Black Holes And Quantum Machine Learning.
Information Retrieval From Hawking Radiation Is A Linear Regression Problem?
Research connects the black hole information paradox with double descent in machine learning, framing information retrieval from Hawking radiation as a linear regression problem. The Page time, identified with the interpolation threshold, correlates with changes in the rank structure of subsystems, suggesting a conceptual link between black hole physics and statistical learning.
The enduring puzzle of what happens to information that falls into a black hole, known as the information paradox, receives fresh scrutiny in new research that connects it to an unexpected area of modern computation: machine learning. Specifically, researchers investigate a correspondence between the behaviour of information retrieved from Hawking radiation – the thermal radiation emitted by black holes – and the ‘double descent’ phenomenon observed when training excessively complex machine learning models. Jae-Weon Lee of Jungwon University and Zae Young Kim of Spinor Media Inc., alongside colleagues, present their findings in a paper entitled ‘Black hole/quantum machine learning correspondence’. They propose a framework where information retrieval can be modelled as a linear regression problem over the multitude of possible internal states of a black hole, termed ‘microstates’, with a critical time, the ‘Page time’, acting as a threshold beyond which model performance unexpectedly improves despite increasing complexity. Their analysis, utilising the Marchenko-Pastur law – a result from random matrix theory describing the distribution of eigenvalues of large random matrices – suggests a fundamental link between the rank structure of these microstates and the transition across the Page time, potentially offering novel insights into both black hole physics and the behaviour of overparameterised machine learning models.
Recent investigations reveal a surprising connection between high-dimensional linear regression and the black hole information paradox. The analysis demonstrates that increasing the number of features—the input variables used for prediction—beyond the point of perfect training data fit does not invariably lead to increased test error, the measure of a model’s performance on unseen data. Instead, generalisation performance can improve again, creating a distinct second descent in the error rate. This counterintuitive result alters the established relationship between model capacity—a model’s ability to learn complex patterns—and overfitting, where a model learns the training data too well and performs poorly on new data, and it prompts a reevaluation of optimisation strategies and regularisation techniques.
The study identifies three distinct regimes governing the behaviour of linear regression models. In the underparameterised regime, the number of data points significantly exceeds the number of features, and the generalisation error decreases as model complexity increases. At the interpolation threshold, where the number of features equals the number of data points, the model fits the training data exactly but often suffers from high test error, indicating a lack of ability to generalise to unseen data. Crucially, the analysis demonstrates that moving into the overparameterised regime, where features outnumber data points, can reduce test error due to implicit regularisation effects, offering a pathway to improved predictive performance.
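A minimal numerical sketch, assuming only the generic setup described above (random Gaussian features and a noisy linear target, not the paper’s construction), illustrates the three regimes: the minimum-norm least-squares fit is computed while the number of features p sweeps past the number of data points N.

```python
# A minimal sketch, not the authors' code: test error of the minimum-norm
# least-squares fit as the number of features p sweeps past the number of
# data points N. The Gaussian features and noisy linear target are
# assumptions made purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
N, p_total, noise = 50, 300, 0.5
X_train = rng.normal(size=(N, p_total))           # full pool of candidate features
X_test = rng.normal(size=(2000, p_total))
w_true = rng.normal(size=p_total) / np.sqrt(p_total)
y_train = X_train @ w_true + noise * rng.normal(size=N)
y_test = X_test @ w_true + noise * rng.normal(size=2000)

for p in (10, 25, 40, 50, 60, 100, 200, 300):     # sweep model capacity
    Xp, Xp_test = X_train[:, :p], X_test[:, :p]
    w_hat = np.linalg.pinv(Xp) @ y_train          # minimum-norm least squares
    test_mse = np.mean((Xp_test @ w_hat - y_test) ** 2)
    print(f"p = {p:3d}  (p/N = {p/N:.1f})  test MSE = {test_mse:.2f}")
```

In runs of this kind the test error typically peaks near p = N, the interpolation threshold, and falls again as p grows, which is the second descent described above.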
A key mechanism driving this behaviour is the implicit regularisation inherent in stochastic gradient descent (SGD), the iterative optimisation algorithm used to fit the model’s parameters. SGD favours solutions with smaller norms—a measure of the magnitude of the model’s parameters—effectively mitigating overfitting in high-dimensional spaces. Even without explicit regularisation terms, it guides the model towards simpler, more generalisable solutions that perform well on unseen data. This implicit regularisation, combined with a nuanced bias-variance tradeoff—the balance between a model’s tendency to underfit or overfit—explains why increasing model complexity can paradoxically improve generalisation beyond the interpolation threshold, offering a new perspective on model optimisation.
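The implicit-bias claim can be checked directly in a toy problem. In the sketch below (an illustration, not the authors’ code, with full-batch gradient descent standing in for SGD to keep the run deterministic), gradient descent on the squared loss, started from zero, converges to the same minimum-norm interpolating solution that the pseudoinverse delivers.

```python
# A minimal sketch, not from the paper: in the overparameterised regime,
# full-batch gradient descent on the squared loss, initialised at zero,
# converges to the minimum-norm interpolating solution (the pseudoinverse
# solution). Step size and iteration count are arbitrary demo choices.
import numpy as np

rng = np.random.default_rng(1)
N, p = 20, 100                       # more features than data points
X = rng.normal(size=(N, p))
y = rng.normal(size=N)

w = np.zeros(p)                      # zero initialisation keeps w in the row space of X
lr = 1e-2
for _ in range(20_000):
    grad = X.T @ (X @ w - y) / N     # gradient of (1/2N) * ||Xw - y||^2
    w -= lr * grad

w_min_norm = np.linalg.pinv(X) @ y   # minimum-norm interpolator
print("distance to min-norm solution:", np.linalg.norm(w - w_min_norm))
print("training residual:", np.linalg.norm(X @ w - y))
```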
The analysis leverages concepts from random matrix theory to provide a rigorous understanding of the data covariance matrix—a measure of the relationships between different features—and the stability of solutions found by SGD, solidifying the theoretical foundation of this phenomenon. Furthermore, this research establishes a conceptual connection between the double descent phenomenon and the black hole information paradox, treating information retrieval from Hawking radiation—the theoretical emission of particles from black holes—as a linear regression problem over black hole microstates, the possible configurations of a black hole.
The Page time—the point at which Hawking radiation begins to carry information about the black hole’s interior—corresponds to the interpolation threshold, marking a critical transition in information retrieval. The analysis demonstrates that crossing this threshold is associated with a change in the rank structure of subsystems, mirroring the observed double descent in the regression problem and suggesting a shared underlying mathematical structure. This connection points to a potentially fruitful interplay between black hole physics and machine learning, prompting exploration of conceptual parallels between these seemingly disparate fields.
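The rank statement has a simple toy analogue. For a random pure state of n qubits, the rank of a k-qubit subsystem’s reduced density matrix saturates at min(2^k, 2^(n-k)), so its growth changes character once the subsystem passes half the total system, the counterpart of the Page time in this picture. The sketch below is a generic random-state illustration, not the paper’s calculation.

```python
# A toy illustration under standard random-state assumptions, not the
# paper's computation: for a random pure state of n qubits, the rank of a
# k-qubit subsystem's reduced density matrix saturates at
# min(2^k, 2^(n-k)), changing behaviour once k exceeds n/2.
import numpy as np

rng = np.random.default_rng(2)
n = 10                                     # total number of qubits
psi = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
psi /= np.linalg.norm(psi)                 # random pure state

for k in range(1, n):
    M = psi.reshape(2**k, 2**(n - k))      # bipartition: k qubits vs the rest
    s = np.linalg.svd(M, compute_uv=False)
    rank = int(np.sum(s > 1e-12))          # rank of the reduced density matrix
    entropy = -np.sum((s**2) * np.log(s**2))   # entanglement entropy
    print(f"k = {k}  rank = {rank:4d}  entropy = {entropy:.3f}")
```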
The researchers apply the Marchenko-Pastur law, a cornerstone of random matrix theory describing the distribution of eigenvalues of large sample covariance matrices, to predict the variance in test error for the linear regression problem. They demonstrate that the transition occurring at the Page time correlates with a shift in the rank structure of the system’s subsystems, revealing a deeper connection between information retrieval and statistical learning.
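To see the law in action, the following sketch, a generic random-matrix check rather than anything specific to the paper, compares the eigenvalue histogram of a sample covariance matrix X^T X / N with the Marchenko-Pastur density at aspect ratio gamma = p/N.

```python
# A minimal sketch of the Marchenko-Pastur law (standard random matrix
# theory, not paper-specific): compare the empirical eigenvalue histogram
# of X^T X / N with the MP density for aspect ratio gamma = p/N <= 1.
import numpy as np

rng = np.random.default_rng(3)
N, p = 2000, 500
gamma = p / N
X = rng.normal(size=(N, p))
eigvals = np.linalg.eigvalsh(X.T @ X / N)          # sample covariance spectrum

def mp_density(lam, gamma):
    """Marchenko-Pastur density for unit-variance entries, gamma = p/N <= 1."""
    lam_minus = (1 - np.sqrt(gamma)) ** 2          # lower edge of the support
    lam_plus = (1 + np.sqrt(gamma)) ** 2           # upper edge of the support
    out = np.zeros_like(lam, dtype=float)
    inside = (lam > lam_minus) & (lam < lam_plus)
    out[inside] = np.sqrt((lam_plus - lam[inside]) * (lam[inside] - lam_minus)) / (
        2 * np.pi * gamma * lam[inside]
    )
    return out

hist, edges = np.histogram(eigvals, bins=30, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for c, h in zip(centers, hist):
    print(f"lambda = {c:5.2f}  empirical = {h:.3f}  MP = {mp_density(np.array([c]), gamma)[0]:.3f}")
```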
The researchers also apply singular value decomposition (SVD) to the data matrix, revealing the underlying structure of the system and the relationships between data points and features. The ratio of data points to features (N/p) proves critical, as it determines whether the system is underdetermined, well-determined, or overdetermined, and thereby the stability and accuracy of the model. The Marchenko-Pastur law then predicts the distribution of singular values, providing insight into the variance of the test error and allowing a quantitative comparison between the behaviour of linear regression and the information leakage from black holes.
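A short sketch, again illustrative rather than taken from the paper, makes the N/p dependence concrete: the same SVD-based least-squares routine handles the underdetermined, well-determined and overdetermined cases, and the number and size of the retained singular values change with the ratio.

```python
# A minimal sketch, not the authors' code: minimum-norm least squares via
# the SVD of the data matrix. The same routine covers N < p, N = p and
# N > p; the ratio N/p controls how many singular values are retained and
# how small the smallest of them is.
import numpy as np

def svd_least_squares(X, y, rcond=1e-10):
    """Minimum-norm least-squares fit via the SVD X = U diag(s) V^T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    keep = s > rcond * s[0]                        # drop (near-)zero directions
    w = Vt[keep].T @ ((U[:, keep].T @ y) / s[keep])
    return w, s[keep]

rng = np.random.default_rng(4)
p = 80
for N in (40, 80, 160):                            # under-, well-, over-determined
    X = rng.normal(size=(N, p))
    y = rng.normal(size=N)
    w, s_kept = svd_least_squares(X, y)
    print(f"N/p = {N/p:.2f}  rank = {s_kept.size:3d}  "
          f"smallest kept singular value = {s_kept.min():.3f}  "
          f"training residual = {np.linalg.norm(X @ w - y):.3f}")
```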
The study proposes a novel perspective on the information paradox, suggesting that the statistical mechanics of high-dimensional systems, as described by random matrix theory, may offer a framework for understanding the fundamental principles governing black hole physics. By drawing parallels between the rank structure changes at the Page time and the behaviour of eigenvalues in random matrices, the authors open new avenues for exploring the connection between information theory, statistical learning, and the mysteries of black holes, potentially leading to breakthroughs in both fields.
Future research will focus on extending these findings to more complex models and datasets, exploring the implications for other areas of machine learning and physics, and developing new algorithms that can exploit the benefits of the double descent phenomenon.
👉 More information
🗞 Black hole/quantum machine learning correspondence
🧠 DOI: https://doi.org/10.48550/arXiv.2506.09678