Some considerations about the spectral entropies framework
Here I illustrate some of the ideas that emerged before, during, and after the conference. The first question is how to compute the quantity
\begin{equation} \mathbb{E}\lbrack S(\rho | \sigma) \rbrack \end{equation}
where the expectation is taken over the disorder induced by the random nature of the networks obtained from a model with some given parameters $\theta$.
By linearity of the trace and of the expectation, and because for positive definite matrices the matrix logarithm of the matrix exponential is the matrix itself, we can simplify the expression above as:
\begin{equation} \mathbb{E}\lbrack S(\rho | \sigma) \rbrack = \mathrm{Tr}\lbrack \rho \log \rho \rbrack + \beta\, \mathrm{Tr}\lbrack \rho\, \mathbb{E}\lbrack L \rbrack \rbrack + \mathbb{E}\lbrack \log Z \rbrack \end{equation}
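For completeness, the intermediate step is the following (assuming, as usual in this framework, that $\sigma = e^{-\beta L}/Z$ with $Z = \mathrm{Tr}\, e^{-\beta L}$): since $\log \sigma = -\beta L - (\log Z)\, \mathbb{I}$ and $\mathrm{Tr}\,\rho = 1$, we have
\begin{equation} S(\rho | \sigma) = \mathrm{Tr}\lbrack \rho \log \rho \rbrack - \mathrm{Tr}\lbrack \rho \log \sigma \rbrack = \mathrm{Tr}\lbrack \rho \log \rho \rbrack + \beta\, \mathrm{Tr}\lbrack \rho L \rbrack + \log Z \end{equation}
and taking the expectation over the ensemble gives the formula above.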
While computing the expectation over the random realizations of $L$ is simple, computing $\mathbb{E}\lbrack \log Z \rbrack$ is more difficult. In statistical mechanics this quantity is called the quenched free energy (up to a factor $-1/\beta$), and it can be written as:
\begin{equation} \mathbb{E}[\log Z] = \mathbb{E} \log \left(\sum_{i=1}^n e^{-\beta \lambda_i} \right) \end{equation}
where $\lambda_i$ are the eigenvalues of a specific random realization of the model at parameters $\theta$. This problem is reminiscent of calculations done in the context of the Ising model.
What I have observed, at least numerically, and what is also justified by some statistical mechanics texts, is that one can “replace” the quenched average, which is very difficult to compute, with the annealed average, of which it is a lower bound. The annealed average consists in taking the expectation of the partition function rather than of the log-partition function; by Jensen’s inequality, since the logarithm is concave,
\begin{equation} \mathbb{E}[\log Z] \leq \log \mathbb{E}[Z] \end{equation}
Moreover, in the high-temperature limit the two quantities get closer and closer, eventually becoming equal. The code showing the relation between these quantities is attached here (“quenched_free_energy.py”). In the script I also checked to what extent the approximation I used in my work is good and, as I expected, it works very well in the $N > 100$, $\beta < 1$ regime.
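Since the script itself is not reproduced here, below is a minimal self-contained sketch of the same kind of comparison (the function names are mine, not those of “quenched_free_energy.py”): it estimates the quenched average $\mathbb{E}[\log Z]$ and the annealed average $\log \mathbb{E}[Z]$ by Monte Carlo over Erdős–Rényi realizations.

```python
import numpy as np
from scipy.special import logsumexp

def er_laplacian(n, p, rng):
    """Laplacian L = D - A of one Erdos-Renyi realization G(n, p)."""
    A = np.triu(rng.random((n, n)) < p, k=1)  # upper-triangular edge indicators
    A = (A | A.T).astype(float)               # symmetrize, no self-loops
    return np.diag(A.sum(axis=1)) - A

def log_partition(L, beta):
    """log Z = log sum_i exp(-beta * lambda_i), via log-sum-exp for stability."""
    return logsumexp(-beta * np.linalg.eigvalsh(L))

def quenched_vs_annealed(n, p, beta, n_samples=200, seed=0):
    """Monte Carlo estimates of E[log Z] (quenched) and log E[Z] (annealed)."""
    rng = np.random.default_rng(seed)
    log_zs = np.array([log_partition(er_laplacian(n, p, rng), beta)
                       for _ in range(n_samples)])
    quenched = log_zs.mean()                          # E[log Z]
    annealed = logsumexp(log_zs) - np.log(n_samples)  # log of the sample mean of Z
    return quenched, annealed

for beta in [0.01, 0.1, 1.0]:
    q, a = quenched_vs_annealed(n=100, p=0.1, beta=beta)
    print(f"beta={beta:5.2f}   E[log Z] = {q:8.4f} <= log E[Z] = {a:8.4f}")
```

At small $\beta$ the two printed values essentially coincide, while at larger $\beta$ the quenched estimate stays below the annealed one, consistent with the inequality above.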
Gradients and optimization
Getting the gradients of the expected relative entropy would enable practical stochastic optimization methods to work well. Minimizing the “ensemble” relative entropy would make it possible to fit any such model to the observed density matrix $\rho$. However, I expect the gradients (in the sense of calculus) to exist only approximately, in the large-$N$ limit. The smaller the number of nodes, the smaller the ensemble of networks generated by the parameters $\theta$. To make this concrete, consider all Erdős–Rényi networks with $N=2$ nodes and $p=0.5$: the ensemble contains only two networks, with Laplacians $L_1=\begin{pmatrix}0 & 0\\ 0 & 0\end{pmatrix}$ and $L_2=\begin{pmatrix}1 & -1\\ -1 & 1\end{pmatrix}$, so the variability of the ensemble is huge compared to its size.
In this case the expected relative entropy is, in layman’s terms, not a smooth function of the parameters, hence it has no well-defined gradients. This intuition is confirmed by numerical simulations: using automatic differentiation packages, the gradient estimates become more and more stable as $N$ increases, as expected.
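My checks used automatic differentiation; as a rough stand-in, the sketch below estimates the gradient of the problematic $\mathbb{E}[\log Z]$ term by central finite differences (reusing `er_laplacian` and `log_partition` from the previous snippet; the helper names are mine) and repeats the estimate over several seeds, so that the spread relative to the mean illustrates how the gradient estimate stabilizes as $N$ grows.

```python
import numpy as np
# reuses er_laplacian(n, p, rng) and log_partition(L, beta) defined above

def mean_log_z(n, p, beta, n_samples, rng):
    """Sample average of log Z over n_samples realizations of G(n, p)."""
    return np.mean([log_partition(er_laplacian(n, p, rng), beta)
                    for _ in range(n_samples)])

def grad_p(n, p, beta, dp=0.01, n_samples=100, seed=0):
    """Central finite-difference estimate of d E[log Z] / dp."""
    rng = np.random.default_rng(seed)
    return (mean_log_z(n, p + dp, beta, n_samples, rng)
            - mean_log_z(n, p - dp, beta, n_samples, rng)) / (2 * dp)

for n in [4, 16, 64]:
    g = np.array([grad_p(n, p=0.3, beta=0.1, seed=s) for s in range(8)])
    print(f"N={n:3d}  gradient ~ {g.mean():8.3f}  relative spread = {g.std() / abs(g.mean()):.2f}")
```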
A question whose solution is far from trivial is how to estimate the effect of a small perturbation of the parameters, $\theta \to \theta + \delta\theta$, on the quenched free energy. What mathematical tools are available to compute the effect of such a perturbation on the quenched log-partition function, i.e. to evaluate
\begin{equation} \lim \limits_{\delta\theta \to 0} \frac{\mathbb{E}\lbrack \log Z(\theta +\delta\theta) \rbrack - \mathbb{E}\lbrack \log Z(\theta) \rbrack}{\delta\theta} \; ? \end{equation}
Take for example the Erdős–Rényi random graph model, whose only parameter is the link probability $p$. How can we estimate the variation of $\mathbb{E}\lbrack \log Z(p +\delta p) \rbrack$? I believe we need to rely on methods from random matrix theory and eigenvalue perturbation theory to quantify this effect. The problem can be stated as follows: given a random matrix $L=D-A$ with eigenvalues $\lambda_i$, where the entries of $A$ are i.i.d. Bernoulli random variables with parameter $p$, how do the eigenvalues change if the parameter $p$ is increased by a small amount $\delta p$?
Numerical simulations, again with automatic differentiation tools, helped me calculate the effect of this perturbation, but an analytical treatment would help greatly. For this reason I believe the Heims perturbation theory can be of help. For references about this, take a look at the book “Probability Theory: The Logic of Science” by E. T. Jaynes, where in the 30th chapter he formally analyzes this kind of problem in great detail.
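As a self-contained illustration of this question (not the automatic differentiation setup I actually used), the sketch below couples the realizations at $p$ and $p+\delta p$ through the same uniform random variables, so that increasing $p$ can only add edges, and then compares the two Laplacian spectra. As a sanity check, the average eigenvalue shift divided by $\delta p$ is compared with the exact value $n-1$, which follows from $\mathrm{Tr}\, L$ being the sum of the degrees.

```python
import numpy as np

def coupled_laplacians(n, p, dp, rng):
    """Laplacians of G(n, p) and G(n, p + dp) built from the same uniforms,
    so that raising the link probability can only add edges."""
    U = rng.random((n, n))
    def lap(prob):
        A = np.triu(U < prob, k=1)   # upper-triangular edge indicators
        A = (A | A.T).astype(float)  # symmetrize, no self-loops
        return np.diag(A.sum(axis=1)) - A
    return lap(p), lap(p + dp)

n, p, dp = 100, 0.1, 0.01
rng = np.random.default_rng(0)
shifts = []
for _ in range(20):
    L0, L1 = coupled_laplacians(n, p, dp, rng)
    shifts.append((np.linalg.eigvalsh(L1) - np.linalg.eigvalsh(L0)).mean() / dp)
print("mean eigenvalue shift / dp :", np.mean(shifts))
print("exact value (n - 1)        :", n - 1)
```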
Open questions
A few open questions remain that I would like to answer, or at least to understand better:
- Is $\mathbb{E}\lbrack S(\rho | \sigma) \rbrack$ an unbiased estimator? In other words, by minimizing this quantity can we recover the empirical parameters $\theta^*$?
- To what extent is it safe to approximate the quenched average with the annealed average?
- Is a Schottky anomaly lurking somewhere? For small $N$ a system has a finite number of energy levels, hence the so-called Schottky anomalies could be present in the heat capacity. I was looking for this reference in MacKay’s book “Information Theory, Inference, and Learning Algorithms”. These anomalies disappear in the large-$N$ limit; a minimal numerical check is sketched after this list.
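As a minimal check of this last point, the sketch below uses the textbook definition $C(\beta) = \beta^2\, \partial^2 \log Z / \partial \beta^2$, i.e. $\beta^2$ times the variance of the Laplacian eigenvalues under the Gibbs weights, and evaluates it for the two-level spectrum $\{0, 2\}$ of the $N=2$ example above.

```python
import numpy as np

def heat_capacity(eigs, beta):
    """C(beta) = beta^2 * d^2(log Z)/d(beta)^2 = beta^2 * Var(lambda) under
    the Gibbs weights exp(-beta * lambda) / Z."""
    w = np.exp(-beta * (eigs - eigs.min()))  # shift for numerical stability
    w /= w.sum()
    mean = (w * eigs).sum()
    return beta**2 * ((w * eigs**2).sum() - mean**2)

# Two-node example from above: L2 = [[1, -1], [-1, 1]] has eigenvalues {0, 2}.
eigs = np.array([0.0, 2.0])
for beta in [0.2, 0.5, 1.0, 1.2, 2.0, 5.0]:
    print(f"beta = {beta:4.1f}   C = {heat_capacity(eigs, beta):.4f}")
```

The printed values rise to a single bump (near $\beta \approx 1.2$ for this gap) and then decay, which is the Schottky-like signature expected for a system with only a handful of energy levels.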