From the three-body problem to KAM

I believe that many physicists have come across KAM theory in the physics literature. For me, I have read about KAM many times in papers discussing the discrete regime of wave turbulence at low nonlinearity. Nevertheless, I never had a deep understanding of the concept beyond knowing that the system behaves "like a linear one" or "like an integrable one". Recently, in a project with one of my students, we found a breather solution in a derivative NLS, for which we suspected that the oscillating part is supported on a KAM torus. This motivated me to seek a deeper understanding of KAM theory. However, most of the mathematical materials I found online are not immediately accessible, as many of them are written in terms of symplectic manifolds, which require further effort to understand. That was the case until I finally found the book "The KAM story: a friendly introduction to the content, history, and significance of classical Kolmogorov-Arnold-Moser theory". The contents of the book are very understandable to me (and, I believe, to physicists in general), and are presented from the perspective of historical development (of which I am a big fan). Although the book does not contain a detailed mathematical proof of KAM theory, it provides enough of the intuition underlying the theory, including its connections to concepts such as integrable systems, chaos and ergodicity. In this article, I will summarize and re-organize the materials I learned from reading the book (with many descriptions taken directly from it), following the order of historical development. 


Newton, the Principia and integrability

Newton's monograph "the Principia" inaugurated the modern form of mathematical physics — modeling phenomena by way of differential equations. With this approach Newton was able to describe precisely the movements of the planets. On a mathematical level, the calculation of such movements can be cast as the n-body problem.

n-body problem:  the dynamical system that describes the motions of $n$ given masses interacting by mutual attraction according to Newton’s law of gravity.

The simplest non-trivial instance of the n-body problem is Kepler's problem, which describes the motion of a point mass in an inward central force field whose strength is inversely proportional to the square of the distance from the center. One can visualize Kepler's problem as a special case of the two-body problem ($n=2$) in which the mass of one body is much greater than that of the other (e.g., the sun and the earth). Kepler's problem is analytically solvable, and Newton was able to solve it, providing theoretical support for Kepler's laws of planetary motion, which were originally obtained from observation. For the general two-body problem ($n=2$ with comparable masses), an analytical solution is still straightforward to obtain, since such problems can be readily reduced to Kepler's problem through a very simple manipulation.

The real trouble with the n-body problem begins when $n$ is three or greater, as Newton discovered to his dismay when he tried to use his newly enunciated laws to calculate the motions of the earth-moon-sun system (in particular, to describe the motion of the moon). Newton famously remarked that "his head never ached but with his studies on the moon". Newton's unsuccessful attempt at the three-body problem led to deeper questions in at least two directions: (1) Is the solar system stable? That is, even if its motions aren't periodic, or completely predictable, can we determine whether it will "hang together", or suffer calamity (fly apart or suffer an internal collision), over some given long interval of time? (2) What kinds of differential equations can be solved exactly, in the sense that the system's solution is given by explicit formulas or integrals (which may not be evaluable in terms of elementary functions)?

Question (1) is not easy to answer, and we will come back to it later in the article. Here we mention that at the beginning of the 18th century, Newton famously wrote that the solar system needed occasional divine intervention (presumably a nudge here and there from the hand of God) in order to remain stable. This has been interpreted to mean that Newton believed his mathematical model of the solar system — the n-body problem — did not have stable solutions. 

To answer question (2), we need to discuss the concept of an integrable system. Let us consider a Hamiltonian system

$\frac{dq_i}{dt}=\frac{\partial H}{\partial p_i}$, $\frac{dp_i}{dt}=-\frac{\partial H}{\partial q_i}$, $i=1,...,n$, 

with $n$ degrees of freedom and $2n$ dimensions in phase space $M$. This Hamiltonian system is completely integrable (with Liouville integrability) if it has $n$ constants of motion $f_1, f_2,..., f_n$ which are independent and in involution.

In particular, two smooth functions $f_i$, $f_j$ on $M$ are in involution if $\{f_i,f_j\}=0$ (i.e., their Poisson bracket vanishes on $M$). A finite set of smooth functions $\{ f_1, f_2, ..., f_n \}$ is independent on $M$ if the corresponding gradient vectors $\{ Df_1, Df_2, ..., Df_n \}$ are linearly independent at almost every point of $M$. A non-constant smooth function $f:M\rightarrow \mathbb{R}$ is a first integral (or constant of motion) if it is constant along solutions of the Hamiltonian system, i.e., $f(q(t),p(t))=\text{const}$ for any solution $(q(t), p(t))$ of the Hamiltonian system, or equivalently $\{f,H\}=0$.

While the definition of integrability involves some technical points, the main idea can be described as follows: for a $2n$-dimensional phase space $M$, setting each of the $n$ first integrals to a (separate) constant produces an $n$-dimensional smooth surface (manifold). Because it is "cut out" in this way by first integrals, this surface is clearly invariant under the flow; i.e., any solution beginning on the surface remains on it for all later times (instead of wandering through the whole phase space). The structure of this surface is described by the LMAJ theorem.  

Liouville-Mineur-Arnold-Jost (LMAJ) theorem: Whenever such a surface is bounded and connected, it is an $n$-dimensional torus, denoted $\mathbb{T}^n$. Moreover, near these tori there are special canonical coordinates $(\theta, I)=(\theta_1, ..., \theta_n, I_1, ..., I_n)$, called action-angle variables, in which the Hamiltonian takes the simple form $H=h(I)$, with canonical equations

$\frac{dI_k}{dt}=0$,   $\frac{d\theta_k}{dt}=\omega_k$,    $k=1,...,n$

having very simple solutions

$I_k(t)=I_k^0$,   $\theta_k(t)=\theta_k^0 + \omega_k t\ (\mathrm{mod}\ 1)$,   $k=1,...,n$.

Here $I_k^0$ and $\theta_k^0$ are the initial conditions, and $\omega_k=\frac{\partial h(I)}{\partial I_k}$ are the frequencies. 
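As a concrete illustration (mine, not from the book), the simple solutions above can be simulated directly. The following minimal Python sketch evolves the angles of a two-degree-of-freedom integrable system in action-angle variables and checks that the actions never change:

```python
import numpy as np

def torus_flow(theta0, I0, omega, t):
    """Flow in action-angle variables: actions are frozen,
    angles advance linearly on the torus [0, 1)^n."""
    theta0, I0, omega = map(np.asarray, (theta0, I0, omega))
    return (theta0 + omega * t) % 1.0, I0.copy()

# n = 2 degrees of freedom with incommensurate frequencies
theta0 = np.array([0.0, 0.25])
I0 = np.array([1.0, 2.0])
omega = np.array([1.0, np.sqrt(2.0)])

for t in np.linspace(0.0, 100.0, 11):
    theta_t, I_t = torus_flow(theta0, I0, omega, t)
    assert np.allclose(I_t, I0)                     # actions are constants of motion
    assert np.all((0 <= theta_t) & (theta_t < 1))   # angles stay on the torus
```

Each trajectory is thus confined to the torus labeled by $I_0$, winding around it with the fixed frequencies $\omega$.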

Remarks: The invariant surfaces that are tori are called invariant tori (or Lagrangian tori). When they occupy a region of $M$, mathematicians say that the $n$-dimensional tori foliate the region of $2n$ dimensional phase space they occupy. Since action values are fixed under the flow, each torus may be labeled (or parameterized) by the initial action $I_0$ of a trajectory on it. Once an individual torus $I = I_0$ is selected, the angles $\theta = (\theta_1, . . . , \theta_n)$ form a coordinate system on it. Individual trajectories of the flow wrap around the torus with constant angular frequency $\omega_k$ in each angular coordinate $\theta_k$ (the $\omega_k=\omega_k(I)$ are constant on each torus because the actions $I=I_0$ are constant there). Some simple examples of integrable systems with phase spaces and trajectories are shown in the following figure.

In general, if a system is integrable, then it is exactly "solvable", as guaranteed by the LMAJ theorem. For Kepler's problem with 3 degrees of freedom, one can find 5 independent first integrals (the total energy, the three components of the angular momentum, and the three components of the Laplace-Runge-Lenz vector, related by two equations), which makes the problem superintegrable and the solutions periodic. For n-body problems with $n\geq 3$, it was Poincaré who showed that such problems are not completely integrable, as will be discussed in the next section.

Before going into Poincaré's work, we will need to introduce two concepts that will become important later.

Resonant vs. non-resonant tori: On an $n$-dimensional torus with frequency vector $\omega =(\omega_1, ..., \omega_n)$, consider the equation $k\cdot\omega=0$, where $k=(k_1, ..., k_n)$ is a vector with integer components. If this equation has a non-zero solution $k$, we say the frequency $\omega$ is resonant. If the only solution is $k=0$, we say it is non-resonant. Non-resonant linear flow on $\mathbb{T}^n$ is also called quasi-periodic flow. See the following figure for examples on $\mathbb{T}^2$
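To make the distinction concrete, one can search by brute force for an integer relation among the frequencies. The helper below (an illustration of mine, not from the book) looks for a nonzero $k$ with $k\cdot\omega=0$ up to a cutoff:

```python
import itertools
import numpy as np

def find_resonance(omega, kmax=20, tol=1e-12):
    """Return a nonzero integer vector k with |k . omega| < tol,
    searching |k_i| <= kmax; return None if none is found."""
    for k in itertools.product(range(-kmax, kmax + 1), repeat=len(omega)):
        if any(k) and abs(np.dot(k, omega)) < tol:
            return k
    return None

# omega = (2, 3) is resonant (e.g. k = (-3, 2) gives k . omega = 0)
print(find_resonance((2.0, 3.0)))         # finds a nonzero k
# omega = (1, sqrt(2)) admits no integer relation
print(find_resonance((1.0, np.sqrt(2))))  # None within the search range
```

Of course a finite search can only certify resonance, not non-resonance; for $(1, \sqrt{2})$ the non-resonance follows from the irrationality of $\sqrt{2}$.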

(a) resonant and (b) non-resonant flow on $\mathbb{T}^2$.

Nondegeneracy: In action-angle coordinates $(\theta, I)$, the Hamiltonian $H$ of a completely integrable system takes the simple form $H=h(I)$, depending only on the actions $I$. To each invariant torus $I=I_0$ we then associate the single fixed frequency vector $\omega=\frac{\partial h}{\partial I}(I_0)$. This establishes a map $I \rightarrow \omega$ from action vectors to frequencies. The idea of nondegeneracy is, roughly, that a change in actions should induce a change in frequencies. The simplest case to visualize is $n=2$, shown in the following figure: as we move out radially from torus to torus, the frequency map should change its slope instead of collapsing to one point.



Remark: For a nondegenerate system with $n=2$, as we move out radially in $r$, the resonant tori are distributed among the nonresonant tori in the same way that the rational numbers are distributed among the irrationals. In particular, the resonant tori are dense and the nonresonant tori have full measure. See what follows for some definitions and the Appendix for why this is the case.

Dense: Given a subspace $B$ of the topological space $X$, the subset $A$ of $B$ is dense in $B$ if the closure of $A$ (smallest closed set in $X$ that contains $A$, or the union of $A$ and all limiting points) contains $B$.

Full measure: In a measure space $(M, \Sigma, \mu)$, the set $A\in \Sigma$ is of (or has) full measure in $B\in \Sigma$ provided $\mu (B \setminus A)=0$. The reference set $B$ is often understood to be some nice set, such as a ball, or the base set $M$, in which case one simply says "$A$ is of full measure". 


The era of Poincaré and the aftermath

This part of the story starts with Karl Weierstrass in the early 1880s, during his professorship at the University of Berlin. As the world's leading figure in mathematics, Weierstrass took a keen interest in the n-body problem. He calculated formal series solutions (akin to so-called Lindstedt series) for the n-body problem and suspected that they converge, but had difficulty establishing the convergence rigorously. Meanwhile, in Stockholm, Weierstrass' protégé Gösta Mittag-Leffler had risen to a high station in Swedish academic circles, high enough that the scientifically enlightened Oscar II, King of Norway and Sweden, asked him to suggest questions for a scientific prize competition that would honor the king's upcoming 60th birthday in 1889. Mittag-Leffler in turn consulted Weierstrass and Kowalevski (a student of Weierstrass), who suggested the convergence problem. One of the three prize questions was therefore on the convergence of these series for the three-body problem or, as it was dramatically phrased in the competition, "on the stability of our planetary system".

The prize was, of course, later awarded to the young French mathematician Henri Poincaré for his magnificent 270-page memoir published in Acta Mathematica (in fact, the first version of the paper contained a fundamental error, discovered by L.E. Phragmén, and Poincaré had to work over the next year to correct it and recreate the "great manuscript" we know today). Poincaré is often called the "last universalist of mathematics" and the "father of dynamical systems", and stands second only to Newton in revolutionizing the science of mechanics. In order to understand what Poincaré did (which changed the course of astronomical dynamics forever), let us first introduce Hamiltonian perturbation theory (HPT), which Poincaré called "the fundamental problem of dynamics".

Hamiltonian perturbation theory (HPT): we consider a smooth (or even analytic) completely integrable n-degree-of-freedom Hamiltonian system with Hamiltonian $h=h(I)$ in the action-angle variables $(\theta,I)$, valid in a nice region $U$ of phase space. To this integrable system, we add a smooth (or analytic) function of both actions and angles having the form $\epsilon f(\theta, I, \epsilon)$, called a perturbation of $h$. This gives a so-called perturbed Hamiltonian system (or nearly integrable Hamiltonian system) $H(\theta, I, \epsilon) = h(I) + \epsilon f(\theta, I, \epsilon)$. 

This idea of starting from a known entity (here $h = h(I)$), then "perturbing" it slightly as a way of exploring the unknown, was introduced by Newton in the Principia and soon became the standard means of studying the n-body problem for $n \geq 3$. In this case, a two-body problem is the known entity, and smaller additional bodies comprise the perturbation. It was by varying this approach that many of the fundamental advances in planetary motion were made in the 18th and 19th centuries.

In HPT, one is interested in whether the perturbed system $H(\theta, I, \epsilon)$ is integrable or not. The basic method of verifying integrability, due to C.G.J. Jacobi, is to look for a smooth canonical change of coordinates $(\theta,I) \rightarrow (\phi, J)$ such that in the new action-angle variables $(\phi, J)$ the Hamiltonian $H$ depends only on $J$. Such a coordinate change can be effectively formulated through the Lie series method (which in fact appeared after Poincaré's time). 

Lie series method: We look for a generating function $\chi(\phi, J)$, define the operator $L_{\chi}(H) \equiv \{ \chi, H \}$, and build a near-identity change of variables $e^{\epsilon L_{\chi}}=\mathrm{id}+\epsilon L_{\chi} + O(\epsilon^2)$. Consider the symplectic change of variables $T: D' \rightarrow D$ defined as the time-1 flow of the Hamiltonian $\epsilon \chi$, with $(\phi, J) \in D'$ and $(\theta, I) \in D$. Then the new Hamiltonian is $K(\phi, J) = e^{\epsilon L_{\chi}} H = H \circ T$. 

Formal proof: Let $\Psi^t (\phi, J)$ be the time-$t$ flow map of $\epsilon \chi(\phi, J)$. By Taylor expansion, $H\circ \Psi^1 = H\circ \Psi^0 + \frac{d}{dt} (H\circ \Psi^t) |_{t=0} + ...$, with $\frac{d}{dt} (H\circ \Psi^t) = \nabla H(\Psi^t(\phi, J)) \cdot  \frac{d}{dt} \Psi^t(\phi, J) =  \nabla H(\Psi^t(\phi, J)) \cdot \epsilon \left(\frac{\partial \chi}{\partial J}, -\frac{\partial \chi}{\partial \phi}\right) \big|_{\Psi^t} = \epsilon \{ \chi, H \}\circ\Psi^t (\phi, J)$. Iterating this computation for the higher-order terms gives $H \circ T(\phi, J) = H\circ \Psi^1(\phi, J)= e^{\epsilon L_{\chi}} H(\phi, J) = K(\phi, J)$.

Performing the Lie series method on $H=h+\epsilon f$ gives 
$K=e^{\epsilon L_{\chi}} H = H + \epsilon L_{\chi}(H) + O(\epsilon^2) = h(J) + \epsilon f(\phi, J) + \epsilon \{\chi, h\} (\phi, J) + O(\epsilon^2)$.                          (1)
In order to get rid of the $\phi$-dependence in the $O(\epsilon)$ term, we need $f+\{ \chi, h \}=0$ or more explicitly  
                $f(\phi, J) = -\sum_{k=1}^n \frac{\partial \chi}{\partial \phi_k}\frac{\partial h}{\partial J_k}$.                                           (2)
This is an example of a "homological equation", or fundamental linear equation of perturbation theory. To solve it, we note that $f(\phi,J)$ is periodic in $\phi$ for given $J$ (because $\phi \in \mathbb{T}^n$), so we can expand $f$ (and also the sought $\chi$) in Fourier series: $f(\phi, J) = \sum_{k\in \mathbb{Z}^n}\hat{f}_k(J) e^{2\pi i k\cdot \phi}$,   $\chi(\phi, J) = \sum_{k\in \mathbb{Z}^n}\hat{\chi}_k(J) e^{2\pi i k\cdot \phi}$. Substituting these expressions into the homological equation and using $\omega(J)=\partial h/\partial J$, we get
$\sum_{k\in \mathbb{Z}^n}\hat{f}_k(J) e^{2\pi i k\cdot \phi}= -2\pi i \sum_{k\in \mathbb{Z}^n}\hat{\chi}_k(J) k \cdot \omega(J) e^{2\pi i k\cdot \phi}$. We then solve for the Fourier coefficients of $\chi$ to find (formally)
                $\hat{\chi}_k(J)=\frac{-\hat{f}_k(J)}{2\pi i k \cdot \omega(J)}$                                         (3)
In examining the above expression, we begin to worry about the possibility of "zero divisors". In earlier discussion, we saw that the resonant tori (those with $k\cdot \omega(J)=0$) of nondegenerate systems are dense, and we arrive at the following conclusion.

Small divisor problem: The nearly integrable Hamiltonian $H=h+\epsilon f$ with nondegenerate integrable part $h$ will have zero divisors $k\cdot \omega(J)=0$ (i.e. resonances) in any open set of phase space. For "ordinary" perturbations $\epsilon f$, there is therefore no hope of using the above equation to construct the desired variable change in the classical sense (i.e., so that it’s defined on a "nice" domain—one that’s open, or contains nonempty open sets). While it’s true that the portion of phase space where the divisors vanish has Lebesgue measure 0, even if this portion is removed, the divisors still become arbitrarily small on the remaining part of phase space. Thus, even on this (highly spongiform) part of phase space from which all zero divisors have been removed (and which contains no nonempty open sets), we are unable to ensure the convergence of the Fourier series for $\chi$ using the coefficients of the above equation.
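The severity of the problem is easy to see numerically. The sketch below (my illustration, not from the book) computes $\min_{0<|k|\leq K}|k\cdot\omega|$ for a non-resonant $\omega$ and shows the divisors shrinking as the cutoff $K$ grows:

```python
import itertools
import numpy as np

def smallest_divisor(omega, kmax):
    """min |k . omega| over nonzero integer vectors with |k_i| <= kmax."""
    best = np.inf
    for k in itertools.product(range(-kmax, kmax + 1), repeat=len(omega)):
        if any(k):
            best = min(best, abs(np.dot(k, omega)))
    return best

omega = (1.0, np.sqrt(2.0))  # non-resonant: k . omega is never exactly 0
for kmax in (2, 5, 12, 29):  # cutoffs near the continued-fraction denominators of sqrt(2)
    print(kmax, smallest_divisor(omega, kmax))
```

Even though $k\cdot\omega$ never vanishes here, the printed minima decrease steadily with the cutoff, which is exactly what threatens the convergence of the Fourier series for $\chi$.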

It was Poincaré who first recognized that the small divisor problem made integrable Hamiltonian systems exceedingly rare and that most systems are not integrable in the classical sense defined above. The fact that no classical transformation to integrable form exists for generic systems is called "Poincaré’s non-existence theorem". This finding by Poincaré became closely related to the historical development of two scientific/mathematical fields: the chaos theory and the ergodic theory.

For the former, Poincaré found that, under perturbation, the majority of resonant tori are forced to follow the contortions of what he called "a sort of trellis" and what modern dynamicists call a "homoclinic tangle". Although it is now commonplace knowledge, it seemed astonishing at the time that the vast majority of Hamiltonian systems are nonintegrable and contain homo- or heteroclinic tangles. The existence of tangles implies a number of dynamical features now seen as hallmarks of "chaos": among other things, orbits starting very near each other can quickly end up far apart (sensitive dependence on initial conditions). All these concepts were present in Poincaré's work on HPT and the three-body problem, which embodies most of "chaos theory" as it became fashionably known in the 1970s. The most compelling reason for the sudden growth of chaos theory around 1970 is simply the arrival of easily programmable computers, with which scientists of all sorts could experiment with simple mathematical models of nonlinear systems. To a large extent, they rediscovered by experiment what Poincaré and his followers had discovered by thinking.

For the latter, let us first describe ergodicity as a slightly stronger form of chaos (see the Appendix for the mathematical formulation of the "ergodic theorem", which appeared in the 1930s after the development of measure theory). An ergodic system has trajectories that do not confine themselves to separate portions of phase space; instead, trajectories pass arbitrarily close to almost all points of phase space that are energetically accessible. After the introduction of the concept of ergodicity by Ludwig Boltzmann, the keen interest in ergodic systems merged with Poincaré's proof of the nonintegrability of generic Hamiltonian systems. Many of the best researchers of the time enthusiastically followed Boltzmann's idea and thought, or hoped, that nonintegrability entails ergodicity (which, after the birth of KAM, was later shown not to be true). Such researchers included Enrico Fermi and George Birkhoff (Poincaré's foremost disciple), although Fermi was later "disappointed" by the FPUT numerical experiment, which surprisingly showed the system's resistance to thermalization (i.e., ergodicity).

Theoreticians' over-indulgence in chaos and ergodicity led to a long-standing paradox in mechanics. While theoreticians believed that as soon as $\epsilon$ is turned on in HPT, the integrability of the system is destroyed, those who worked with solutions of such systems in practice saw things differently. In the experience of the latter, for small values of $\epsilon$, most of their approximate solutions (in particular, series computed by perturbation methods) of the perturbed system $H = h + \epsilon f$ stayed fairly close to solutions of the integrable system $h$ for long intervals of time, and both the closeness and the time interval could be controlled by varying $\epsilon$. In other words, experience taught them that there was some kind of continuity to integrable behavior, despite the immediate breakdown claimed by theoreticians.  

Poincaré had partly explained this paradox by showing that most of the series used in celestial mechanics are divergent asymptotic expansions (although they are useful approximations when only a finite number of terms is retained). For the Lindstedt series, the subject of King Oscar's prize, after showing that the series are plagued by small divisors and therefore cannot be everywhere uniformly convergent, Poincaré considered the question of whether they might converge for special initial conditions chosen in advance. He asserted his belief twice in his manuscript that, even in these special cases, the Lindstedt series diverge. But in each case, he qualified the statement by saying that his arguments did not allow him to assert the divergence with absolute rigor. It was the birth of KAM that finally resolved the paradox, as a result of which mathematicians came to understand that the Lindstedt series does in fact converge for special initial conditions. Everyone schooled in the views of Newton's and Poincaré's eras was about to be re-educated in a dramatic way. 



The birth of KAM

The first essential breakthrough on the path to KAM theory was a way of overcoming the small divisor problem, first constructed by the German number theorist Carl Ludwig Siegel. At the time, Siegel was not in fact working on HPT, and we shall directly describe how Andrey Kolmogorov adapted Siegel's techniques for the purposes of KAM theory (although there is no direct evidence that Kolmogorov knew of Siegel's earlier work). Let us now return to equation (3) and recall that, following Poincaré, the presence of a dense set of resonant tori means that small divisors prevent $\chi$, and hence also $T$ or $e^{\epsilon L_{\chi}}$, from being defined classically. This remains true, but Kolmogorov took a different approach. The basic idea is not to insist on a classical transformation defined on a nice set (one with interior), but simply to look for some non-empty set on which the transformation can be defined. 

Let us be more specific. Consider a special set of frequency vectors (Diophantine vectors, defined by a Diophantine condition): $D^{\omega}(\gamma, \tau) = \{ \omega\in\mathbb{R}^n \mid \text{for each nonzero } k\in\mathbb{Z}^n,\ |k\cdot\omega|\geq \frac{\gamma}{|k|^\tau} \}$, and the corresponding set of $J$-values $D^J(\gamma, \tau) = \{ J\in\mathbb{R}^n \mid \text{for each nonzero } k\in\mathbb{Z}^n,\ |k\cdot\omega(J)|\geq \frac{\gamma}{|k|^\tau} \}$. These two sets are nonempty provided $\gamma>0$ is small enough, $\tau > n-1$, and the unperturbed Hamiltonian $h=h(J)$ is nondegenerate (so that the frequency map $\omega(J)=\partial h/\partial J$ preserves the basic structure of the $\omega$'s when pulled back to the $J$'s). The frequencies in $D^{\omega}(\gamma, \tau)$ form a strange sort of set closely related to a Cantor set, a type of fractal with no interior (see the figure below for the most common construction of the Cantor ternary set and an example of $D^{\omega}(\gamma, \tau)$ with $n=2$). And because the frequency map is nondegenerate, the preimage $D^J(\gamma, \tau)$ in $J$-space is also a Cantor-like set. The set $D^J(\gamma, \tau)$ has no interior, but for small $\gamma$ it is still large in the sense of Lebesgue measure, and it can be shown to generate uncountably many invariant tori for the original system.
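One can probe the Diophantine condition numerically. The sketch below (my illustration; the parameters are arbitrary) estimates the largest admissible $\gamma$ for a given $\omega$ and $\tau$ by minimizing $|k\cdot\omega|\,|k|^\tau$ over a finite range of $k$. For $\omega=(1,\varphi)$ with $\varphi$ the golden ratio (a famously badly approximable number) and $\tau=1$, the margin stays bounded away from zero:

```python
import itertools
import numpy as np

def diophantine_margin(omega, tau, kmax):
    """min |k . omega| * |k|^tau over nonzero integer k with |k_i| <= kmax.
    If omega is Diophantine with exponent tau, this stays bounded below
    by some gamma > 0 as kmax grows."""
    worst = np.inf
    for k in itertools.product(range(-kmax, kmax + 1), repeat=len(omega)):
        if any(k):
            kn = np.linalg.norm(k)
            worst = min(worst, abs(np.dot(k, omega)) * kn ** tau)
    return worst

golden = (1.0 + np.sqrt(5.0)) / 2.0
print(diophantine_margin((1.0, golden), tau=1.0, kmax=60))  # stays of order 1
```

A finite scan cannot prove the Diophantine property, but it illustrates why such frequency vectors are the "safe" ones on which the small divisors remain under control.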


Let us further consider an analytic function $f$ with Fourier coefficients $\hat{f}_k=\hat{f}_k(J)$ that decay rapidly with index $k$: $||\hat{f}_k||_B \leq \frac{C}{|k|^b}$, where $C>0$ and $b>0$ (which can be made as large as desired by taking $f$ to be sufficiently smooth). The subscript $B$ indicates that we use the uniform norm on a closed ball $B$ to measure $\hat{f}_k$, i.e., $||\hat{f}_k||_B=\max_{J\in B}||\hat{f}_k(J)||$. Now consider $J$-values in the subset $S=D^J(\gamma, \tau) \cap B$, on which the Fourier coefficients of $\chi$ satisfy $|| \hat{\chi}_k ||_S= || \frac{\hat{f}_k(J)}{2\pi i k\cdot \omega(J)} ||_S \leq \frac{C}{2\pi \gamma |k|^{b-\tau}}$. If we choose $f$ smooth enough to ensure that the exponent satisfies $b-\tau >n$, it follows that the Fourier coefficients $\hat{\chi}_k(J)$ decay rapidly enough to give absolute convergence of the Fourier series. In this way we define the generating function $\chi$, and hence also the desired transformation $e^{\epsilon L_{\chi}}$. This shows that the transformation is more than merely formal, since we have specified the domain $S$ on which its generating function converges. In doing this, we violate nothing that Poincaré proved — the transformation is not classical, because $S$ has no interior.
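To spell out the convergence step (a standard estimate, sketched here rather than quoted from the book): the number of integer vectors $k$ with $|k|\approx m$ grows like $m^{n-1}$, so the coefficient bound is summable whenever $b-\tau>n$:

```latex
\sum_{k \in \mathbb{Z}^n \setminus \{0\}} \|\hat{\chi}_k\|_S
  \;\leq\; \frac{C}{2\pi\gamma} \sum_{k \neq 0} \frac{1}{|k|^{\,b-\tau}}
  \;\leq\; \frac{C\, c_n}{2\pi\gamma} \sum_{m=1}^{\infty} \frac{m^{\,n-1}}{m^{\,b-\tau}}
  \;<\; \infty
  \qquad \text{for } b-\tau > n,
```

where $c_n$ is a combinatorial constant bounding the number of lattice points at distance $\approx m$ from the origin.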

The idea of using Diophantine conditions to get convergence of a Fourier series for $\chi$ was the first essential breakthrough on the path to the KAM theory. Yet, even if the transformation $e^{\epsilon L_{\chi}}$ can be made to work in this way, it only serves to make Eq. (2) true, i.e., it only gets rid of the $O(\epsilon)$ term in Eq. (1). In other words, our transformation is literally only the first step in a succession of infinitely many steps required to eliminate the $\phi$-dependence entirely. It was Kolmogorov who pointed out how to take the remaining steps in his 1954 ICM lecture and his paper published in the same year.

Before introducing Kolmogorov's scheme, let us discuss what happens if we set up a regular iterative perturbation procedure which, at the $k$th step, solves a homological equation at order $\epsilon^k$. We might naïvely expect that after $n$ steps, the original Hamiltonian will have been transformed to integrable form through $O(\epsilon^n)$, i.e., to 
                  $K(\phi, J, \epsilon) = h_n(J, \epsilon) + \epsilon^{n+1} R(\phi, J, \epsilon)$.           (4)
We might hope to continue this process "to infinity", but a technical problem arises: if we look at the transformed actions $J$ at the $n$th step, we find that their distance $|I-J|$ from the old actions may be up to $O(\epsilon^{n+1})$. In turn, the new frequency $\omega(J)$ will be similarly displaced from the old one. It turns out that this "wiggle room" is too much: $\omega$ can move so close to resonance that the next step in the iteration cannot proceed, due to violation of the Diophantine condition.

Kolmogorov was able to address the above issue by applying a generalized Newton's method in an appropriate functional space. As a result, in stark contrast to Eq. (4), after $n$ steps Kolmogorov’s procedure would transform the original Hamiltonian to something like
                  $K(\phi, J, \epsilon) = h_n(J, \epsilon) + \epsilon^{\alpha^n} R(\phi, J, \epsilon)$,
with $\alpha>1$. Here the $O(\epsilon^{\alpha^n})$ remainder shrinks much more quickly than the $O(\epsilon^{n+1})$ remainder in (4). This rapid convergence (characteristic of Newton's method) combined with Diophantine conditions is enough to overcome the small divisors at all orders. In the end, this permits a transformation of the perturbed Hamiltonian to the integrable form $K(\phi, J, \epsilon) = h_\infty (J, \epsilon)$ for any given $J$ in the Cantor-like set $D^J(\gamma, \tau)$.         

Kolmogorov's specific iteration procedure goes as follows. Choose an action value $I_0 \in B$ so that the corresponding frequency is Diophantine. Our goal is to show that for small enough $\epsilon >0$, this torus persists (though slightly distorted) as an invariant torus of the perturbed system. Now consider an appropriate complex neighborhood $D_0$ of $\mathbb{T}^n \times \{I_0\}$. We want to find a slightly smaller complex domain $D_1 \subset D_0$ and a symplectic near-identity transformation $T_1: D_1 \rightarrow D_0$ (the time-1 flow of a generating function $\chi(\phi, J)$) which transforms $H$ to $H_1=h_1(J,\epsilon) + \epsilon^2 R_1(\phi, J, \epsilon)$. We also want to be sure that the preimage $I_1=T_1^{-1}(I_0)$ lies inside $D_1$. We then repeat the process (now with $H_1=h_1+\epsilon^2 R_1$ in place of $H = h + \epsilon f$) to create a transformation $T_2: D_2 \rightarrow D_1$ transforming $H_1$ to $H_2=h_2 + (\epsilon^2)^2 R_2$, while maintaining $I_2=T_2^{-1}(I_1) \in D_2$ as needed. Note the appearance of the "quadratically shrinking remainder" of order $(\epsilon^2)^2=\epsilon^4$. At the next step, this remainder will again be squared to produce a new remainder of order $(\epsilon^4)^2=\epsilon^8$. The process generalizes to a sequence of transformations $T_k: D_k \rightarrow D_{k-1}$ on nested domains $D_k \subset D_{k-1}$, on which $T_k$ transforms $H_{k-1}$ to $H_k = h_k + \epsilon^{2^k} R_k$ with $I_k=T_k^{-1}(I_{k-1}) \in D_k$. Finally, look at the composite maps $U_m=T_1\circ T_2\circ ...\circ T_m: D_m \rightarrow D_0$. Applying $U_m$ to $H$ transforms it to $H_m=h_m + \epsilon^{2^m} R_m$, which rapidly approaches integrable form on the shrinking domains $D_m$. In the limit, $\lim_{m\rightarrow \infty}H_m = h_{\infty}(J,\epsilon)$ is completely integrable on the complex domain $D_{\infty}$, which contains $\mathbb{T}^n \times \{I_\infty\}$. This last object is clearly an invariant torus of $h_\infty$ supporting quasiperiodic flow with frequency $\omega$ (a slightly distorted version of the original torus, with the same quasi-periodic frequency $\omega$).

With the idea of proof outlined above, we now state a prototype KAM theorem. 
Prototype KAM theorem: Let $(\theta, I)=(\theta_1, ..., \theta_n, I_1, ..., I_n)$ be action-angle variables for the smooth completely integrable Hamiltonian $h: M \rightarrow \mathbb{R}$ with $n\geq 2$ degrees of freedom. Assume $h$ is nondegenerate, and let $\epsilon \in \mathbb{R}$ be a small parameter. Then for any smooth perturbation $\epsilon f(\theta, I, \epsilon)$ there is a threshold $\epsilon_0 >0$ such that whenever $||\epsilon f||_M <\epsilon_0$, the perturbed Hamiltonian $H(\theta, I, \epsilon)=h(I)+ \epsilon f(\theta, I, \epsilon)$ has a nonempty set $\mathcal{T}$ of invariant $n$-dimensional (Lagrangian or KAM) tori in its phase space. On each invariant torus of $\mathcal{T}$, the flow of $H$ is quasiperiodic with a highly non-resonant frequency. Furthermore, the set $\mathcal{T}$ is large in the sense that its measure becomes full as $\epsilon \rightarrow 0$.

Although Kolmogorov outlined the idea of the proof in his 1954 paper, there is no direct evidence that he ever wrote down a detailed proof. The complete proof of the theorem was eventually provided separately by Vladimir Igorevich Arnold and Jürgen Moser, and both of their results were announced at the 1962 ICM in Stockholm. Arnold applied Kolmogorov's ideas to the problem of stability in (certain versions of) the n-body problem. The next year he also published a complete proof of a slightly modified version of Kolmogorov's theorem (dedicating the paper to Kolmogorov on his 60th birthday). Moser's 1962 result was both less and more general than what Kolmogorov had claimed. It was stated as a theorem about the persistence of invariant curves for twist maps of the plane; in that sense, it implied the existence of invariant tori only for perturbed Hamiltonian systems with $n = 2$ degrees of freedom, rather than for arbitrary $n$ as in Kolmogorov's theorem. But Moser's theorem replaced the assumption that the Hamiltonian is analytic with the assumption that it is only finitely differentiable (he required 333 derivatives). This drastic reduction in the smoothness hypothesis greatly strengthened the theorem, and opened up a line of research into the minimal smoothness required for the persistence of invariant tori. Later, in 1967, Moser also showed that the existence of the invariant tori (and the quasiperiodic solutions they contain) settles the question of the convergence of the Lindstedt series, and can thus be seen as a long-delayed answer to the question posed in the King Oscar prize competition: the series do indeed converge for a large (and strange) set of initial conditions.



Other results in HPT

Now let us return to HPT and mention some other relevant concepts/results. In a paper published in 1999, P. Lochak divides HPT into three basic parts: results pertaining to geometric stability, classical stability, and instability. 

Geometric stability refers to invariant sets in phase space (such as invariant tori, or invariant manifolds generally) to which orbits are confined for all time, thus providing stability over infinite time intervals. KAM tori are among (but are not the only) geometric stability results in HPT. Classical stability is the sort of stability arising out of classical Hamiltonian perturbation theory. In other words, when faced with the small divisor problem, one way to proceed is to "go non-classical", ultimately leading to the set of KAM tori, interpreted as a transformation to integrable form on a Cantor-like set. On the other hand, one can instead "go classical", requiring transformations to be defined on sets with interior. This leads to stability over finite — but very long — time intervals. There is a trade-off with respect to geometric stability: one loses the infinite time intervals, but gains a nice, classical set of initial conditions for which the results hold (see the Nekhoroshev theorem described below for more details). Though perhaps less aesthetically satisfying than KAM theory, under the right conditions (long enough time intervals, big enough thresholds, etc.) this is probably the kind of stability result best suited to physical applications. Instability in HPT is basically everything else. In systems far from integrable, solutions are not subject to either kind of stability mentioned above, and may behave chaotically even over relatively short time intervals. For near-integrable systems, solutions may be subject to classical stability over long time intervals, but display a weak drifting behavior beyond these intervals.

We now discuss the Nekhoroshev theorem (first laid out in the 1970s by Arnold's student N.N. Nekhoroshev), which is often referred to as "physicists' KAM theory". This is so partly because the threshold $\epsilon_0$ in the Nekhoroshev theorem can be much greater than that in KAM, making it more suitable for application to physical problems ($\epsilon_0$ in KAM is often absurdly small for any practical situation). Theorems of this sort show adiabatic invariance of the action variables for analytic, nearly integrable systems satisfying certain steepness or "convexity" conditions, which replace the nondegeneracy conditions of KAM theory. Adiabatic invariance means that the actions $I = I(t)$ stay near their initial values $I(0)$ over some (long) time interval. We state a prototype Nekhoroshev theorem below.

Prototype Nekhoroshev theorem: Suppose that the nearly integrable Hamiltonian $H(\theta, I, \epsilon) = h(I) + \epsilon f(\theta, I, \epsilon)$ is analytic and that the unperturbed part $h = h(I)$ is steep (or convex, or quasiconvex) on some nice domain. Then there is a threshold $\epsilon_0 >0$, and positive constants $R$, $T$, $a$, and $b$ such that whenever $|\epsilon|<\epsilon_0$, for all initial actions $I(0)$ in the domain (and far enough from the boundary) we have $||I(t)-I(0)|| \leq R \epsilon^b$ for times $t\leq T\exp(\epsilon^{-a})$.

Remark 1: To say that $h = h(I)$ is convex on a domain means that its Hessian matrix $\partial^2 h/\partial I^2$ is (uniformly positive, or uniformly negative) definite on the $I$-domain; and $h$ is "quasiconvex" if it is convex on every fixed energy level $h(I) = E$ (over a range of appropriate energies $E_1 < E < E_2$).

Remark 2: Note that the conclusion $||I(t)-I(0)|| \leq R\epsilon^b$ is weaker than — but still closely analogous to — the conclusion that trajectories lie on invariant tori, and that it is further weakened by holding for finite times rather than for infinite times. But to compensate, there is also an important gain: the conclusion holds for all initial conditions in the domain, rather than only for those in the Cantor family $\mathcal{T}$ of the KAM theorems. This is another fact that makes the Nekhoroshev theorem a (hypothetically) better choice in physical applications than the KAM theorem, especially when one realizes that exponential time intervals (of length $T\exp(\epsilon^{-a})$) may be "practically as long as infinity" (e.g. as long as the age of the universe), depending on the parameters involved.
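To get a feel for how long "exponentially long" is, here is a quick back-of-the-envelope computation (the exponent $a = 1/2$ and the sample values of $\epsilon$ are purely illustrative, not taken from any specific theorem):

```python
import math

a = 0.5   # illustrative Nekhoroshev exponent
for eps in (1e-2, 1e-4, 1e-6):
    # Stability time ~ T * exp(eps^{-a}); report log10 to avoid float overflow.
    log10_t = eps ** -a / math.log(10)
    print(f"eps = {eps:g}:  t ~ 10^{log10_t:.0f} (in units of T)")
```

Already at $\epsilon = 10^{-4}$ the stability time is about $10^{43}$ in units of $T$; for comparison, the age of the universe is only about $4\times 10^{17}$ seconds.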

Finally, we mention two types of instability named after Chirikov and Arnold. Chirikov diffusion is the type of instability found in the Chirikov regime, which comprises systems whose perturbations are bigger than the Nekhoroshev threshold. In this regime the KAM tori are sparse or nonexistent, and chaotic behavior of various kinds predominates in phase space. Arnold diffusion is the type of instability found in the Nekhoroshev regime, in which systems are close to integrable. In this regime, KAM tori fill a large part of phase space, and any large-scale instability must manifest itself over very long times, beyond the exponentially long timescales on which Nekhoroshev theory guarantees stability of the actions.


Is the solar system stable?

Equipped with all of the above knowledge, let us come back to our original problem, rooted in Newton's work: Is our solar system stable or not? To make things clear from the start, we have to point out that, unfortunately, KAM theory says very little about our own physical solar system. Although the masses of the planets appear very small relative to the Sun (Jupiter 0.001 solar mass, Saturn 0.0003, Neptune 0.00005, and so on), by the standards of rigorous KAM theory they are "astronomical"! The study of the stability and chaotic behavior of the actual solar system had to await the emergence of modern computers. However, KAM theory does have a great deal to say about small subsystems and idealized n-body problems, which we review first.

We consider the scaled $1+n$ body problem, where a small parameter $\epsilon>0$ roughly represents the ratio of the masses of the planets ($i\geq 1$) to the mass of the sun ($i=0$). The simplest case in this class of problems is the "restricted, planar, circular three-body problem", or RPC3BP for short. Specifically, the word "restricted" means that the third body is a kind of ghost, or infinitesimal body with no mass. In other words, it is assumed to be so small that it exerts no force whatsoever on the first two bodies; it exists only as a "sensor" to test the motion of a body of negligible mass under the gravitational influence of the first two bodies with non-negligible mass. (One often imagines an asteroid moving between the Sun and Jupiter, or a high-orbit satellite between the Earth and Moon.) Next, the word "planar" simply means that all three bodies remain in a fixed plane in space. Finally, "circular" means that the integrable motion of the first two massive bodies is circular. The RPC3BP can be summarized in this way: given two bodies in circular orbit around their barycenter, what is the motion of a massless test particle restricted to the orbital plane?
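In the rotating frame in which the two primaries stand still, the test body obeys the well-known equations $\ddot{x} - 2\dot{y} = \partial\Omega/\partial x$, $\ddot{y} + 2\dot{x} = \partial\Omega/\partial y$, with effective potential $\Omega = \frac{1}{2}(x^2+y^2) + (1-\mu)/r_1 + \mu/r_2$, where $\mu$ is the mass ratio and $r_1, r_2$ are the distances to the two primaries. A minimal numerical sketch (the mass ratio, initial orbit, and step size are my own arbitrary choices) integrates these equations with a hand-rolled RK4 step and checks conservation of the Jacobi constant $C = 2\Omega - \dot{x}^2 - \dot{y}^2$, the one classical integral the RPC3BP does possess:

```python
import math

MU = 0.01  # illustrative mass ratio; primaries sit at (-MU, 0) and (1 - MU, 0)

def omega_and_grad(x, y):
    """Effective potential Omega in the rotating frame, and its gradient."""
    r1 = math.hypot(x + MU, y)          # distance to primary of mass 1 - MU
    r2 = math.hypot(x - 1 + MU, y)      # distance to primary of mass MU
    om = 0.5 * (x * x + y * y) + (1 - MU) / r1 + MU / r2
    ox = x - (1 - MU) * (x + MU) / r1**3 - MU * (x - 1 + MU) / r2**3
    oy = y - (1 - MU) * y / r1**3 - MU * y / r2**3
    return om, ox, oy

def deriv(s):
    """Rotating-frame equations of motion: Coriolis terms plus grad(Omega)."""
    x, y, vx, vy = s
    _, ox, oy = omega_and_grad(x, y)
    return (vx, vy, 2 * vy + ox, -2 * vx + oy)

def rk4_step(s, dt):
    k1 = deriv(s)
    k2 = deriv(tuple(si + 0.5 * dt * ki for si, ki in zip(s, k1)))
    k3 = deriv(tuple(si + 0.5 * dt * ki for si, ki in zip(s, k2)))
    k4 = deriv(tuple(si + dt * ki for si, ki in zip(s, k3)))
    return tuple(si + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))

def jacobi(s):
    x, y, vx, vy = s
    om, _, _ = omega_and_grad(x, y)
    return 2 * om - vx * vx - vy * vy

# Start on a near-circular orbit of radius 0.3 about the heavy primary.
r = 0.3
n = math.sqrt((1 - MU) / r**3)           # Keplerian angular velocity (inertial frame)
state = (r - MU, 0.0, 0.0, (n - 1) * r)  # subtract the frame rotation from the velocity

c0 = jacobi(state)
for _ in range(2000):                    # integrate to t = 2 with dt = 1e-3
    state = rk4_step(state, 1e-3)
drift = abs(jacobi(state) - c0)
print(drift)   # tiny: the Jacobi constant is conserved along the flow
```

The drift is at the level of the integrator's truncation error; for KAM-type questions one would instead study whether orbits stay on invariant curves of the associated Poincaré section.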

KAM theory can be applied directly to the RPC3BP, where it roughly states that when the small parameter $\epsilon$ (the mass ratio of the two primary bodies) is below a threshold, there exists a set of analytic KAM tori of positive Liouville measure. Furthermore, there is also a KAM theorem for the $1+n$ body problem, proved by J. Féjoz in 2004.

Stability theorem for the $1 + n$ body problem: Consider $1+n$ point bodies with masses $\bar{m}_0$, $\epsilon \bar{m}_1$, ..., $\epsilon \bar{m}_n$ whose motions are governed by their mutual gravitational attraction. For all collections $\bar{m}_0, \bar{m}_1, ..., \bar{m}_n>0$ and $a_1>...>a_n>0$, there exists a critical value $\epsilon_0>0$ such that whenever $0<\epsilon<\epsilon_0$, in the neighborhood of circular and coplanar Keplerian motions with semi-major axes $a_1, ..., a_n$, there is a subset of initial conditions with positive Lebesgue measure leading to quasiperiodic motions with $3n-1$ frequencies (i.e., there is a collection of invariant tori with positive Lebesgue measure).

Now, for the physical solar system, the KAM theory does not enter the picture on any practical and computational level. Modern astronomers make use of the best observations and the best computational models available to them, relying on the former to check the latter. The modern era of such work may be said to begin with the construction of the "Digital Orrery", a small, special-purpose computer built at Caltech in 1984 by G.J. Sussman and coworkers, then transported to MIT, where it was used over a period of seven years to simulate the dynamics of the solar system. One of its most notable discoveries concerned the orbit of (then-planet) Pluto. Working with colleague J. Wisdom at MIT, Sussman used the Orrery to integrate select orbit-elements of the five outer planets over an interval corresponding to 845 million years. They found that the simulated orbit of Pluto was noticeably chaotic; nearby orbits of Pluto diverged exponentially from one another with an e-folding time of only about 20 million years, indicating relatively large Lyapunov exponents and a strong exponential sensitivity to initial conditions. Sussman and Wisdom published their results on Pluto in 1988, and many see this as the "discovery" that the solar system is chaotic.

Not long after Sussman and Wisdom's discovery, others took up the problem. Jacques Laskar at the Bureau des Longitudes (BdL) in Paris took a somewhat different approach to simulating the solar system over long times. Laskar computed perturbation expansions (by machine) with as many as 150,000 terms in some cases, which permitted him to reliably calculate orbit elements using much longer time steps than is possible with more direct methods. Using these techniques, Laskar turned his attention to models of the inner planets (Mercury, Venus, Earth, Mars), obtaining the remarkable result that they, like Pluto, have chaotic orbits, with even larger Lyapunov exponents (stronger sensitivity to initial conditions). Laskar's finding was later supported by Wisdom and Sussman, who in 1992 performed a more direct integration of the entire solar system over a span of 100 million years. Similar outcomes from such different methods offered strong evidence of chaos right in the heart of the solar system.

Today, after cross-checks and refinements, the chaotic nature of the solar system is a generally accepted fact among dynamical astronomers. This does not, however, lead automatically to the conclusion that the solar system is unstable. What can be said in a general way is this. Given phase-orbits of a solar system model starting at nearby initial conditions separated by a distance $d_0$ in phase space, their distance $d$ at a time $t$ million years later will be roughly $d \approx d_0 \cdot 10^{t/10}$. Thus an initial indeterminacy in phase space of order $10^{-10}$ grows to order 1 after 100 million years. This is a significant level of chaos given that the solar system is on the order of five billion years old.
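As a sanity check on this rule of thumb (a trivial computation, using the $d \approx d_0 \cdot 10^{t/10}$ growth law quoted above):

```python
def divergence(d0, t_myr):
    """Separation of nearby phase orbits after t_myr million years, d ~ d0 * 10**(t/10)."""
    return d0 * 10 ** (t_myr / 10)

# An initial uncertainty of order 10^-10 reaches order 1 after 100 million years:
print(divergence(1e-10, 100))   # order 1
```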

More precise statements can be made by focusing on particular planets or groups of planets. Not surprisingly, those with the least chaotic orbits are the giant, or jovian, planets (Jupiter, Saturn, Uranus, Neptune). Though subject to some chaotic excursions, the jovian planets may reasonably be expected to maintain their present configurations and orbital parameters over timescales of billions of years. In other words, they have a very good chance of remaining much as they are for the remainder of the solar system's life span, estimated at roughly 4 billion years.

The smaller bodies of the solar system — including the inner, or tellurian, planets — have orbits that are considerably more prone to instability. As the heaviest of the solar system's small bodies, and by virtue of its nice orbital parameters, the Earth may be the solar system's most stable residence after the jovian planets; its orbit possesses only "average" chaos. In addition, variations in its obliquity (hence also its climate) have likely been moderated by the presence of the Moon, one of the largest satellites in the solar system. The other tellurian planets are not so lucky. Mars has probably experienced very large chaotic variations in obliquity (as much as 60 or 80 degrees) with correspondingly drastic changes in climate. And Venus probably suffered still more radical change: one of the most likely scenarios leading to its present slow retrograde rotation is that it was flipped upside down by chaotic interactions with Mercury. But no planet has a more uncertain future than the innermost one, Mercury itself. There is a significant probability (about 1%) that over timescales on the order of a billion years, well inside the expected lifetime of the solar system, Mercury could collide with either Venus or the Sun.


Appendix 1: Rational and irrational numbers

The rational numbers are dense in $\mathbb{R}$ because (intuitively) every irrational number can be obtained as a limit point of rational numbers. To see that the rational numbers have measure zero (so that the irrational numbers have full measure), we use the fact that the set of rationals $S=\{ s_1, s_2, s_3, ... \}$ is countable (i.e., it can be put in one-to-one correspondence with $\mathbb{N}=\{ 1,2,3,... \}$). For a set to have measure zero, we need to show that, given any $\epsilon >0$, the set can be covered by intervals whose total measure (length, area, or volume; in this case length) is less than $\epsilon$. We pick an interval around $s_1$ that covers $( s_1-\epsilon/8, s_1+\epsilon/8 )$, an interval around $s_2$ that covers $( s_2-\epsilon/16, s_2+\epsilon/16 )$, etc. After this is done for the countably many rational numbers, we have an infinite collection of intervals whose lengths are $\epsilon/4, \epsilon/8, \epsilon/16, ...$, and whose sum is $\epsilon/2$, which is less than $\epsilon$. This completes the proof.
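The interval lengths form a geometric series, which a quick numerical check confirms sums to $\epsilon/2$ (truncated at a finite number of terms, of course):

```python
eps = 1.0
# The interval around s_k has length eps / 2**(k+1): eps/4, eps/8, eps/16, ...
total = sum(eps / 2 ** (k + 1) for k in range(1, 60))
print(total)   # approaches eps/2, comfortably below eps
```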

Appendix 2: Ergodic theorem

It took a long time to develop the first mathematically precise results along the lines of the ergodic hypothesis—six decades, if one counts from Boltzmann's first mention of it in 1871 to the first rigorous ergodic theorems announced in 1931 (proved by J. von Neumann and G.D. Birkhoff). The key mathematical tool which gave proper expression to the ergodic hypothesis was the Lebesgue-Borel theory of measure and integration. The new theory of integration allowed the evolution law (flow map) $\phi_t: M\rightarrow M$ to be abstracted as a measure-preserving flow on a probability space $(M, \Sigma, \mu)$. Or, to explain in more detail, if we consider that $\phi_t$ is a Hamiltonian flow, we know from Liouville's theorem that the canonical volume element (Liouville measure) on $M$ is preserved by $\phi_t$. We can then use the volume element to define a measure $\rho$ on a natural class $\Sigma$ of measurable subsets of $M$, so $(M, \Sigma, \rho)$ becomes a measure space, with measure $\rho$ invariant with respect to $\phi_t$. Next, assuming the energy surface $M$ has finite measure $\rho(M)$, we can normalize $\rho$ so that it becomes the invariant probability measure $\mu=\rho/\rho(M)$ (in other words, $\mu(M)=1$), and we have the probability space $(M, \Sigma, \mu)$. Finally, we can now give a more precise meaning to the "space average": For any $\mu$-Lebesgue-integrable function $f : M \rightarrow \mathbb{R}$, we define its space average $\bar{f}$ by $\bar{f}:=\int_M f(x) d\mu(x)$.

Birkhoff's ergodic theorem: Consider the probability space $(M, \Sigma, \mu)$ described above, with $\mu$ invariant with respect to $\phi_t$. Then given any observable $f$ that is Lebesgue-integrable on $M$ with respect to $\mu$, the time average $\langle f \rangle (x) := \lim_{T\rightarrow \infty} \frac{1}{T}\int_0^T f(\phi_t(x))\, dt$ exists for almost every $x$ in $M$ (so that $\langle f \rangle$ is Lebesgue-integrable on $M$). Furthermore, the space average of $\langle f \rangle$ equals the space average of $f$.

Note that this theorem does not have an equality like "space average equals time average" as its conclusion; instead, that follows as a corollary for a certain class of flows $\phi_t$ called ergodic, defined as follows.

Ergodic flow: If $(M, \Sigma, \mu)$ is a probability space, the measure-preserving flow $\phi_t$ : $M \rightarrow M$ is ergodic with respect to $\mu$ provided every measurable subset $A$ of $M$ which is invariant under $\phi_t$ has either $\mu(A)=0$ or $\mu(A) = 1$.

This very economical definition says that any measurable subset $B$ of $M$ with $0<\mu(B)<1$ must be "moved around all over the place" by $\phi_t$. Any exceptional sets must either have measure zero (and are thus negligible), or full measure 1 (thus essentially all of $M$, having "nowhere to move"). It is this property of "moving sets around all over the place" that takes the place of Boltzmann’s trajectories that pass through every point of $M$. This subtle shift is key, and allows one to prove the following corollary to Birkhoff’s ergodic theorem.

Corollary: Suppose $(M, \Sigma, \mu)$ is a probability space and the measure-preserving flow $\phi_t: M \rightarrow M$ is ergodic with respect to $\mu$. Then given any Lebesgue-integrable function $f : M \rightarrow \mathbb{R}$, for almost all $x\in M$, the time average exists and equals the space average, i.e., $\langle f \rangle (x) = \int_M f(y)\, d\mu(y)$.
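A standard concrete example (my own numerical sketch, not from the book) is the irrational rotation of the circle, $x \mapsto x + \alpha \pmod 1$, which is ergodic with respect to Lebesgue measure when $\alpha$ is irrational; the time average of an observable along any orbit then matches its space average:

```python
import math

alpha = math.sqrt(2) - 1   # irrational rotation number => ergodic w.r.t. Lebesgue measure
f = lambda x: x            # observable; its space average over [0, 1) is 1/2

x0, n_steps = 0.1, 100_000
time_avg = sum(f((x0 + n * alpha) % 1.0) for n in range(n_steps)) / n_steps
print(time_avg)   # close to the space average 0.5
```

Changing the starting point $x_0$ leaves the limit unchanged, which is exactly the "for almost all $x$" statement of the corollary (here, in fact, it holds for every $x$).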

This finally gave precise mathematical expression to Boltzmann's intuition from six decades earlier, and launched the mathematical discipline of ergodic theory, which continues as a large and vigorous branch of dynamical systems to the present day. In addition, the physicists'/engineers' definition of ergodicity, in terms of "spatial/ensemble average equals time average", also originated here.



