February 2024



Question: Should "information theory" be viewed so broadly, can it really be applied everywhere as you suggest?


Answer: Who would have thought that discussions about the meaninglessness of what we do not see and will never perceive, about potential infinities and the infinitesimal calculus that arose from them, would lead to the solving of differential equations becoming the dominant task of ENIAC (1945), the Electronic Numerical Integrator and Computer, which actually computed ballistic trajectories? That banal task of artillery and killing people. Perhaps only Babbage's collaborator Ada (Ada Lovelace, 1815–1852, English mathematician and writer) foresaw such a development of computer science, at the time when she was working with him on the design of the first analytical engines.

Information is not an ordinary quantity; it is the structure of all others. The setting of my theory is that information is the basis of space, time and matter and, on the other hand, that the essence of information is uncertainty. Its universality follows from this, even in phenomena that we may not be able to imagine. We believe that physical nature is a matter of truth and only truth, and by taking that so seriously we overlook its other side.

What we see is mostly not what it is, because it becomes hard to believe the obvious (The Truth). However, the natural sciences are persistent and methodical, and through their consistency they succeed in discovering the less attractive parts of nature: a physical nature that knows only the bare truth, cannot lie, notices lies and does not react to them. But lies are also part of the world around us, according to this information theory, so we come back to the same thing: it looks more broadly than we expect from ordinary science.

Surface II

Question: How can I, for example, understand the surface as information; is it simply "surface news" communicated to the interlocutor?


Answer: Yes, among other things, but there is much more to "the rest". I will explain one such example, which is probably unknown to you and may at first glance seem irrelevant.

We've all heard of Kepler's Second Law, that the line segment between the Sun and a planet sweeps out equal areas in equal times, which we see in the picture on the left. However, it is less known that the same applies to all constant central forces (Notes to information theory I, Central Movement).

We also have this in the case of inertial rectilinear motion in the absence of force (zero force), when the body traverses equal paths in equal times (d = AB, Δt = const.). We then take that path as the base of a triangle whose vertex (C) is any point at a constant distance (h) from the line of motion. The area of the triangle is constant, Area(ABC) = hd/2 = const, as long as the body moves inertially, under the action of zero force.
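The zero-force case above can be checked numerically. The following is a minimal sketch, assuming a hypothetical start point, velocity, time step and observer point C (none of these values come from the text); it computes the triangle areas swept in successive equal time intervals.

```python
def triangle_area(a, b, c):
    """Area of triangle ABC in the plane, from the 2D cross product."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return abs(cross) / 2

# Hypothetical values chosen only for illustration.
start, velocity, dt = (0.0, 0.0), (2.0, 1.0), 0.5
C = (3.0, -4.0)  # any fixed point off the line of motion

def position(t):
    """Position of a body moving inertially (constant velocity)."""
    return (start[0] + velocity[0] * t, start[1] + velocity[1] * t)

# Areas swept from C during five consecutive equal time intervals.
areas = [triangle_area(position(k * dt), position((k + 1) * dt), C)
         for k in range(5)]
print(areas)
```

All five areas coincide, because the base d and the height h of each triangle are the same.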

A straight line, an ellipse, a hyperbola and a parabola are "conics" (intersections of the surface of a cone with a plane), so the same Kepler's law also applies to repulsive forces that are point (central) and constant, which move their charges along conics. Like charges, such as two electrons or two protons, repel each other along hyperbolas. We will find the same form (statistically, approximately) with the force of probability.

That constant Kepler surface is "information" that should be as small as possible (by the principle of least action), but no one else will take it from its source, so it cannot be given away (law of conservation), and it is therefore a measure of the amount of uncertainty of two charges. In this way, the surface measures the amount of communication between forces and what those forces act on, and thus it itself becomes the "quantity of options". The above interpretation does not challenge the principle of diversity; the rule not to communicate everything with everything persists in all of this.


There is a bijection (one-to-one mapping) from physical action (product of energy and time) to information (scope of options), so the constancy of these surfaces follows from the principle of least action together with the law of its conservation. The converse also holds: the aforementioned laws follow from the conservation of "Kepler's surface". If we assume either one, we derive the other. In an analogous way, the principle of minimalism follows from the more frequent occurrence of more likely, less informative outcomes.

Acceleration II

Question: How is the analogy of state and process possible, if space has three dimensions and the flow of time is only one?


Answer: A witty question! It would not be possible if time did not have as many dimensions as space (Dimensions).

It has already been proven (Space-Time, 1.2.8 Vertical drop) that a body in free gravitational fall has, due to its velocity alone, exactly as much relative slowing of time as it would have if held fixed at that place in the gravitational field. However, that part of the relatively invisible events, or time, that the body loses, we say passes through different time dimensions, analogous to its walking through spatial dimensions. In the previous image (Surface) this is happening with the planets in motion around the Sun, from the point of view of distant observers.

On the other hand, this information theory predicts that even "only one" time stream should change speed (Acceleration), but I guess that is not the topic of this question. Otherwise, whenever we have a state that changes, that undergoes some process, the process itself is defined by, or we should say dependent in an analogous way on, the given state. Additional "walks" of time flow along pseudo-real trajectories arising from possible but unrealized outcomes, which this theory otherwise assumes.

Vectors II

Question: Do you have any further explanation of the process as "information"?


Answer: The question is naive, because there are many of them, like "I will tell you an important piece of information about such and such a process." But I also have deeper and more interesting connections.

Perception is communication:

a = (a1, a2, ..., an),
b = (b1, b2, ..., bn),

let's say the specified subject a with the object b, which we can define as vectors, or sequences of their respective independent properties. These can be pairs (ak, bk) of sensory perceptions such as intensity (sight, sight), or (hearing, hearing), or (touch, touch), but also many other types of communication of our cells that we are not aware of.

We extend the term "perception" from the unconscious activities of parts of living beings to the very physical substance of which they are composed and arrive at a general expression

Q = a1b1 + a2b2 + ... + anbn,

sum of products in general, and information of perception in particular. Formally it is the scalar product of the vectors a and b. Exactly such vectors, together with Hermitian operators, give an accurate description of the states and processes of quantum mechanics. A step further are the states and processes of macro-mechanics.
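As a small illustration of the sum of products Q, here is a sketch for real-valued components. The function name `perception` and the sample values are assumptions made only for this example.

```python
def perception(a, b):
    """Sum of products Q = a1*b1 + a2*b2 + ... + an*bn of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("subject and object must have the same number of components")
    return sum(x * y for x, y in zip(a, b))

# Hypothetical subject and object with three independent properties each.
a = [0.5, 1.0, 2.0]
b = [4.0, 0.0, 1.5]
Q = perception(a, b)
print(Q)
```

A zero component on either side contributes nothing to Q, which matches the idea that only paired properties communicate.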

The beauty of this story is in the definition of vector spaces, which are not only the "oriented segments" on which we are trained in this area of algebra in secondary school; they are also the real numbers ℝ themselves over the real numbers, or the complex numbers ℂ over the complex numbers. Also, the sequence of powers 1, t, t², ..., tᵏ, ... over the field of scalars Φ, usually the real or complex numbers, spans a vector space, so polynomials also form a vector space.

Solutions of the integral equation

\[ \int_a^b K(t,s)f(s)\ ds = f(t) \]

where K is a given function, form a vector space. The solutions of linear homogeneous partial differential equations of the second order also form a vector space, hence so do the solutions of wave equations as their subspecies, and of the Schrödinger wave equation of quantum mechanics.

Algebra derives its theorems consistently, in this case from relatively few axioms of vector spaces, without regard to the representations, some of which are mentioned above. This means that each of the theorems can be interpreted equally well in any of them, provided the interpretation is consistent with the axioms. Since vectors are interpreted as physical states, and linear operators are also vectors, processes are also types of states. In this way, processes are also "information".


Question: When is a "state" also a "process"?


Answer: Always. For example, the state of society through exciting times is an obvious process. But an ordinary stone changes slowly, and seems too simple to us, and yet, in the micro-world it is a small cosmic wave in constant motion.

Like the standing waves in the image to the right, the particle-waves that make up physical space, time, and matter are types of harmonic oscillators. These particles are "trapped" in standing waves of higher energies, as their nodes are denser, until eventual interaction (communication) "frees" them, so that they merge (interfere) with other particle-waves, or turn into a third one.

The number of wave packet nodes labels the vector |n⟩. Here n = 0, 1, 2, ..., and we omit the detailed description of the vector for now. It is possible to avoid solving the wave equation, and especially the Schrödinger equation of quantum mechanics, and proceed to the results. Simply put, there is a quantum-mechanical operator, the Hamiltonian Ĥ, equivalent to the classical H = T + V, that is, to the sum of the kinetic T and potential V energy of the given system. Its values are the constant total energy of, say, a body in free fall.

At the quantum level, there is a ground state of energy, the vector |0⟩, above which comes the next larger |1⟩, then |2⟩, and so on, with ever denser nodes of the harmonic oscillator drawn above. The corresponding eigen (characteristic) equation will be

\[ \hat{H}|n\rangle = E_n|n\rangle, \]

where En is not an operator, but a scalar (number). It is the eigenvalue, the nth energy level of the Hamiltonian to which the eigenvector |n⟩ belongs. We thus arrive at linear algebra, which more or less does not need the differential equations of waves.

When looking for a matrix representation of the Hamiltonian, we compute the coefficients by algebraic methods using Dirac (bra-ket) notation, usual in physics for scalar multiplication:

\[ \langle m|\hat{H}|n\rangle = \langle m|E_n|n\rangle = E_n\langle m|n\rangle = E_n\delta_{m,n} = \hbar\omega\left(n + \tfrac12\right)\delta_{m,n}. \]

Here we used the possible energy levels En = ℏω(n + 1/2), which were calculated from the Schrödinger equation and then well verified by experiments, and δm,n = 1 when m = n and δm,n = 0 when m ≠ n. Thus we find the operator matrix of the Hamiltonian

\[ \hat{H} = \hbar\omega\begin{pmatrix} \frac12 & 0 & 0 & 0 & ... \\ 0 & \frac32 & 0 & 0 & ... \\ 0 & 0 & \frac52 & 0 & ... \\ 0 & 0 & 0 & \frac72 & ... \\ ... & ... & ... & ... & ... \end{pmatrix}. \]

Here ℏ = h/(2π) ≈ 6.582 × 10⁻¹⁶ eV⋅s is the reduced Planck constant, while ω = 2πν is the circular frequency for the frequency ν from Planck's equality E = hν. The matrix is infinite, because there is no upper limit to the energy levels. Comparing with the characteristic equation above, we find:

\[ |0\rangle = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \end{pmatrix}, \quad |1\rangle = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \end{pmatrix}, \quad |2\rangle = \begin{pmatrix} 0 \\ 0 \\ 1 \\ \vdots \end{pmatrix}, \quad ... \]

The eigenvectors |n⟩ belonging to the eigenvalues En of the Hamiltonian operator are orthonormal: their norms are one and they are mutually perpendicular. Moreover, in this form they are the standard basis of the vector space.

In the micro-world, the Hamiltonian is an operator and a process. There is no stillness there, amidst the ubiquitous undulations and their significant indeterminacy. But in the macro-world, where averaging and therefore laws of large numbers dominate, the Hamiltonian is the state of the total energy, H = T + V, kinetic and potential. For us, normal macro-systems, which we see and consider as solid bodies, unchanging states, are actually "cosmic" waves and cannot be calmed down.

As information is the equivalent of physical action, energy change and elapsed time, the smallest parts of the Hamiltonian are constantly changing, maintaining a (statistical) mean, which we see as something solid.


Question: At the micro level energy is constantly changing?


Answer: Yes, a change in energy ΔE over time Δt produces an action ΔEΔt whose smallest amount is a quantum, Planck's constant

h ≈ 6.626 × 10⁻³⁴ J⋅Hz⁻¹.

Information theory will then find that this action is equivalent to information. By assuming that information is a weaving of space, time and matter and that its essence is uncertainty, we come to the conclusion that there is no rest of energy, nor is there any stopping of time.

When we toss a coin, we will rarely find a sequence in which "tails" and "heads" alternate strictly one after the other. However, statistical regularities apply.

A fair coin has the same probabilities of landing "tails" and "heads", p = q = 1/2. According to the binomial distribution, for n ∈ ℕ throws the mean value of the number of "tails" is μ = np = n/2. The variance, i.e. the mean square deviation from μ, is σ² = npq = n/4, and the standard deviation (dispersion) σ is the root of this number. Averaged per throw, we find:

\[ \frac{\mu}{n} = \frac{1}{2}, \quad \frac{\sigma}{n} = \frac{1}{2\sqrt{n}}, \]

which means that in the case of a large number of throws, n → ∞, we expect the shares of "tails" and "heads" to approach one half each, with the deviation from this σ/n → 0.
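The per-throw averages can be verified directly from the binomial distribution. This is a minimal sketch using only the Python standard library; the helper name `binomial_stats` is an assumption for this example. Since σ² = npq = n/4 for a fair coin, σ/n = 1/(2√n).

```python
from math import comb, sqrt

def binomial_stats(n, p=0.5):
    """Mean and standard deviation of the number of 'tails' in n throws,
    computed by summing over the exact binomial probabilities."""
    q = 1 - p
    probs = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(probs))
    variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
    return mean, sqrt(variance)

for n in (4, 100, 400):
    mu, sigma = binomial_stats(n)
    # Columns: n, mean per throw, deviation per throw, closed form 1/(2*sqrt(n))
    print(n, mu / n, sigma / n, 1 / (2 * sqrt(n)))
```

The third and fourth columns agree, and both shrink as n grows, which is the law of large numbers at work.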

Mostly in this way energy levels jump spontaneously, for example electrons jumping from orbit to orbit in atoms. In detail, let's look at the processes, the lowering \( \hat{a}^- \) and raising \( \hat{a}^+ \) operators for the smallest changes of energy:

\[ \hat{a}^-|n\rangle = \sqrt{n}|n-1\rangle, \quad \hat{a}^+ |n\rangle = \sqrt{n+1}|n+1\rangle. \]

We also write the first one without the minus in the upper index, \( \hat{a} = \hat{a}^- \), and then the latter carries a dagger in the upper index, \( \hat{a}^\dagger = \hat{a}^+ \). As with the Hamiltonian (Harmonic), scalar multiplication yields the coefficients of the matrix representation of these ladder operators:

\[ \langle m|\hat{a}^-|n\rangle = \sqrt{n}\langle m|n-1\rangle = \sqrt{n} \ \delta_{m,n-1}, \] \[ \langle m|\hat{a}^+|n\rangle = \sqrt{n+1}\langle m|n+1\rangle = \sqrt{n+1} \ \delta_{m,n+1}. \]

Those equalities define the matrix of the lowering (annihilation) operator

\[ \hat{a}^- = \begin{pmatrix} 0 & \sqrt{1} & 0 & 0 & ... \\ 0 & 0 & \sqrt{2} & 0 & ... \\ 0 & 0 & 0 & \sqrt{3} & ... \\ 0 & 0 & 0 & 0 & ... \\ ... & ... & ... & ... & ... \end{pmatrix} \]

and the matrix of the raising (creation) operator

\[ \hat{a}^+ = \begin{pmatrix} 0 & 0 & 0 & 0 & ... \\ \sqrt{1} & 0 & 0 & 0 & ... \\ 0 & \sqrt{2} & 0 & 0 & ... \\ 0 & 0 & \sqrt{3} & 0 & ... \\ ... & ... & ... & ... & ... \end{pmatrix}. \]

Like the Hamiltonian matrix, these matrices are infinite due to the absence of an upper bound on the energies. You will find details about this in the book Quantum Mechanics (1.4.9 Oscillator operators).

Analogous to the tossing of a coin and the falling of "tails" and "heads", the energy levels of the quantum world are in a constant process of that "lowering" and "raising" of their energy levels. Their uncertainties cannot be stopped. However, the macro-world comes with averaging, the laws of large numbers of probability theory, from which comes the reduction of instability and the impossibility of seeing these fluctuations directly.


Question: Where is the "information" in the Ladder operators?


Answer: The commutator of the coordinates of two points A = (Ax, Ay) and B = (Bx, By) is the area

[A, B] = AxBy - AyBx,

of the parallelogram spanned by the sides OA and OB. Information is equivalent to the area (2-dim volume) (Surface), and in general the determinant of a matrix measures the (n-dim) volume. At the same time, the commutator of operators (matrices)

\[ [\hat{A}, \hat{B}] = \hat{A}\hat{B} - \hat{B}\hat{A} = \hat{C} \]

is also an operator (a matrix of the same order), whose determinant is a measure of volume and an equivalent of information. Only non-commutative operators communicate in this way.

We easily find that the ladder operators are not commutative and that:

\[ [\hat{a}^-, \hat{a}^+] = \hat{a}^- \hat{a}^+ - \hat{a}^+ \hat{a}^- = \hat{I}, \]

where \( \hat{I} \) is the identity matrix, square with ones on the diagonal and all other elements zero. Therefore, the constant oscillation of the energy of the lowest levels of physical existence is a confirmation of the possession of information.
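The commutator of the ladder operators can be checked on finite pieces of their matrices. The sketch below is only an approximation under a stated assumption: the infinite matrices are cut to N × N (here N = 5, an arbitrary choice), so the identity holds in the upper-left block while the last diagonal entry is distorted by the cut.

```python
from math import sqrt

N = 5  # truncation size; an assumption, since the true matrices are infinite

def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

# <m|a^-|n> = sqrt(n) when m = n - 1, otherwise zero.
lowering = [[sqrt(n) if m == n - 1 else 0.0 for n in range(N)] for m in range(N)]
# The raising operator is the transpose of the (real) lowering operator.
raising = [[lowering[n][m] for n in range(N)] for m in range(N)]

down_up = matmul(lowering, raising)
up_down = matmul(raising, lowering)
commutator = [[down_up[i][j] - up_down[i][j] for j in range(N)] for i in range(N)]

for row in commutator:
    print([round(value, 10) for value in row])
```

The printed matrix has ones on the diagonal except in the last position, where the truncation leaves -(N - 1). That entry moves further away as N grows and is absent in the infinite case.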

Momentum II

Question: Ladder operators are not Hermitian, so how are they quantum-mechanical?


Answer: Quantum mechanics uses Hermitian operators, equal to their own conjugate transpose, because they have real eigenvalues, the magnitudes that measure the intensity of observables.

Ladder operators, \( \hat{a}^- \) and \( \hat{a}^+ \), are real and transposing one gives the other, so there are two simple ways to form self-adjoint (Hermitian) operators from them:

\[ \hat{x} = \sqrt{\frac{\hbar}{2m\omega}}(\hat{a}^+ + \hat{a}^-), \quad \hat{p} = i\sqrt{\frac{\hbar m\omega}{2}}(\hat{a}^+ - \hat{a}^-). \]

These are the sum and difference of the ladder operators, where we must multiply the difference by an imaginary unit (i² = -1) to cancel the sign change by conjugation, so it is:

\[ \hat{x}^\dagger = (\hat{x}^*)^\top = \hat{x}, \quad \hat{p}^\dagger = (\hat{p}^*)^\top = \hat{p}. \]

The coefficients in front of the brackets give the operators the physical dimensions of position and momentum. By adding and subtracting the above matrices, we first find the matrix form of the position operator

\[ \hat{x} = \sqrt{\frac{\hbar}{2m\omega}}\begin{pmatrix} 0 & \sqrt{1} & 0 & 0 & ... \\ \sqrt{1} & 0 & \sqrt{2} & 0 & ... \\ 0 & \sqrt{2} & 0 & \sqrt{3} & ... \\ 0 & 0 & \sqrt{3} & 0 & ... \\ ... & ... & ... & ... & ... \end{pmatrix} \]

and second, the matrix form of the momentum operator

\[ \hat{p} = i\sqrt{\frac{\hbar m \omega}{2}}\begin{pmatrix} 0 & -\sqrt{1} & 0 & 0 & ... \\ \sqrt{1} & 0 & -\sqrt{2} & 0 & ... \\ 0 & \sqrt{2} & 0 & -\sqrt{3} & ... \\ 0 & 0 & \sqrt{3} & 0 & ... \\ ... & ... & ... & ... & ... \end{pmatrix} \]

for which we often prefer the differential form, \( \hat{p}\psi = -i\hbar\,\partial\psi/\partial x \), in the representation where the position operator is a simple multiplication \( \hat{x}\psi = x\psi \).

The product of the change of path and momentum is action, so we have two cases

\[ \hat{x}\hat{p} = \frac{i\hbar}{2}\begin{pmatrix} 1 & 0 & -\sqrt{1\cdot 2} & 0 & ... \\ 0 & 1 & 0 & -\sqrt{2\cdot 3} & ... \\ \sqrt{1\cdot 2} & 0 & 1 & 0 & ... \\ 0 & \sqrt{2\cdot 3} & 0 & 1 & ... \\ ... & ... & ... & ... & ... \end{pmatrix} \]

and multiplication in reverse order

\[ \hat{p}\hat{x} = \frac{i\hbar}{2}\begin{pmatrix} -1 & 0 & -\sqrt{1\cdot 2} & 0 & ... \\ 0 & -1 & 0 & -\sqrt{2\cdot 3} & ... \\ \sqrt{1\cdot 2} & 0 & -1 & 0 & ... \\ 0 & \sqrt{2\cdot 3} & 0 & -1 & ... \\ ... & ... & ... & ... & ... \end{pmatrix} \]

so the commutator is constant

\[ [\hat{x}, \hat{p}] = \hat{x}\hat{p} - \hat{p}\hat{x} = i\hbar\hat{I}, \]

where \( \hat{I} \) is the identity matrix. The commutator is equivalent to a "surface", and thus to "information", and the constant result iℏÎ means that information (a quantum of action) is conserved at the lowest level.
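The position and momentum matrices and their commutator can be reproduced on truncated matrices. This is a sketch under two assumptions that are not in the text: units with ℏ = m = ω = 1, and a cut to N × N (N = 6 here), which spoils only the last diagonal entry.

```python
from math import sqrt

N = 6  # truncation size (assumption); units with hbar = m = omega = 1 (assumption)

def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def is_hermitian(M, tol=1e-12):
    """True when M equals its own conjugate transpose."""
    size = len(M)
    return all(abs(M[i][j] - complex(M[j][i]).conjugate()) < tol
               for i in range(size) for j in range(size))

lowering = [[sqrt(n) if m == n - 1 else 0.0 for n in range(N)] for m in range(N)]
raising = [[lowering[n][m] for n in range(N)] for m in range(N)]

# x = (a+ + a-)/sqrt(2) and p = i(a+ - a-)/sqrt(2) in the chosen units.
x = [[(raising[i][j] + lowering[i][j]) / sqrt(2) for j in range(N)] for i in range(N)]
p = [[1j * (raising[i][j] - lowering[i][j]) / sqrt(2) for j in range(N)] for i in range(N)]

xp, px = matmul(x, p), matmul(p, x)
commutator = [[xp[i][j] - px[i][j] for j in range(N)] for i in range(N)]

print(is_hermitian(x), is_hermitian(p))
print([commutator[k][k] for k in range(N)])
```

Both operators come out Hermitian, and the diagonal of the commutator equals the imaginary unit everywhere except the last, truncation-affected entry.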


Question: What about angular momentum?


Answer: The image on the right shows the classical angular momentum. That physical quantity expresses the tendency of a material body to continue rotating. It is the vector product of the position vector r and the momentum p:

L = r × p,     L = rp⋅sin∠(r, p).

The vector L is perpendicular to the plane of position and momentum, its direction follows the right-hand (right screw) rule, and its intensity is the area of the parallelogram spanned by those two vectors.

The angular momentum operator of quantum mechanics, due to the macro-mechanical definition itself, is a type of information. In the rectangular Cartesian coordinate system (Oxyz) the components of the angular momentum vector are:

Lx = ypz - zpy,   Ly = zpx - xpz,   Lz = xpy - ypx,

so L = (Lx, Ly, Lz). These components are also "surfaces". It turns out that the angular momentum vector is a surface; we can say that the information-vector is composed of information-components. That is why

[Lx, Ly] = iℏLz,   [Ly, Lz] = iℏLx,   [Lz, Lx] = iℏLy,

which means that the "surface" (information) spanned (built) by the first two components, La and Lb, gives the value of the third, Lc, and so on cyclically.

Due to the expansion in all three dimensions, we do not proceed with the direct application of the above matrices (Momentum II), but we find the matrix representation of the moment in a roundabout way, through the analytical forms of the position and momentum operators. But it is also possible to find them as matrix (3 x 3) solutions of the above equalities. In short, they are:

\[ \frac{\hbar}{\sqrt{2}}\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad \frac{\hbar}{\sqrt{2}i}\begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix}, \quad \hbar\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \]

respectively Lx, Ly and Lz. It is easy to check that the commutator [Lx, Ly] = iℏLz, and then also cyclically. This idea can be generalized.
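The check of the three 3 × 3 matrices can be sketched as follows, assuming units where ℏ = 1 so that the relations read [Lx, Ly] = iLz and cyclically. The helper names are chosen only for this example.

```python
from math import sqrt

def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def commutator(A, B):
    """[A, B] = AB - BA for square matrices."""
    ab, ba = matmul(A, B), matmul(B, A)
    return [[ab[i][j] - ba[i][j] for j in range(len(A))] for i in range(len(A))]

def close(A, B, tol=1e-12):
    """Entrywise comparison of two matrices within a tolerance."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

s = 1 / sqrt(2)  # hbar = 1 is assumed throughout
Lx = [[0, s, 0], [s, 0, s], [0, s, 0]]
Ly = [[0, -1j * s, 0], [1j * s, 0, -1j * s], [0, 1j * s, 0]]  # (1/(sqrt(2) i)) times the real pattern
Lz = [[1, 0, 0], [0, 0, 0], [0, 0, -1]]

def times_i(M):
    return [[1j * entry for entry in row] for row in M]

print(close(commutator(Lx, Ly), times_i(Lz)),
      close(commutator(Ly, Lz), times_i(Lx)),
      close(commutator(Lz, Lx), times_i(Ly)))
```

Restoring ℏ multiplies each matrix by ℏ, so each commutator gains a factor ℏ² on the left and the right side becomes iℏ times the third matrix.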

When we set the condition \( [\hat{\sigma}_1, \hat{\sigma}_2] = 2i\hat{\sigma}_3 \) and further cyclically, for matrices of the second order (2 × 2) we find:

\[ \hat{\sigma}_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \hat{\sigma}_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \hat{\sigma}_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}; \quad \hat{I} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \]

These are the Pauli matrices (Fredkin, 3). Multiplied by the imaginary unit (i), they give the quaternions and the corresponding commutator relations. Multiplying by other constants gives analogous formulas, expressions of the law of conservation of "information" (Pauli matrices can be of higher order).
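The defining condition for the second-order matrices can be verified in a few lines. This is a minimal sketch with helper names chosen only for this example.

```python
def matmul(A, B):
    """Product of two 2x2 (or any square) matrices given as nested lists."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def commutator(A, B):
    ab, ba = matmul(A, B), matmul(B, A)
    return [[ab[i][j] - ba[i][j] for j in range(len(A))] for i in range(len(A))]

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

sigma1 = [[0, 1], [1, 0]]
sigma2 = [[0, -1j], [1j, 0]]
sigma3 = [[1, 0], [0, -1]]

def scaled(M, c):
    return [[c * entry for entry in row] for row in M]

# Cyclic relations [sigma_a, sigma_b] = 2i sigma_c.
checks = [
    close(commutator(sigma1, sigma2), scaled(sigma3, 2j)),
    close(commutator(sigma2, sigma3), scaled(sigma1, 2j)),
    close(commutator(sigma3, sigma1), scaled(sigma2, 2j)),
]
print(checks)
```

Each Pauli matrix also squares to the identity, which the same `matmul` helper confirms.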


Question: What are observables?


Answer: An observable is what can be perceived, especially what we feel, hear, see. In quantum mechanics, an observable is a physically measurable quantity.

In ordinary life we have many unobservable phenomena such as emotions or ideas. Similarly, there are physically immeasurable quantities that we consider real in information theory.

1. In quantum mechanics, processes are represented by Hermitian operators (A), and states by vectors (x). Eigenvalues (λ) define observables through the characteristic equations (Ax = λx). They depend on the operator as well as on the associated vectors, which we then call eigenvectors. It is preferable to work with an orthonormal set of eigenvectors, a sequence x1, x2, ..., xn that forms a basis of the vector space; then the squares of the moduli of the components of a given vector determine the probabilities of measuring a particular state, the vector xk, in the given experimental quantum apparatus.

When the sum of the squares of the moduli of the components is one, the sequence of possible outcomes represents a probability distribution. Each such vector (string) is also called a superposition of states, and we give the same name to their sum, the state vector x = λ1x1 + λ2x2 + ... + λnxn. A special role is played by the eigenvector (auxiliary) matrix P = P[x1, x2, ..., xn] whose columns are the eigenvectors. With it we calculate the diagonal matrix D = P⁻¹AP of the given matrix A, which has the eigenvalues of A on the diagonal and zeros elsewhere.

2. The operator A is Hermitian when it is self-adjoint, \( A^\dagger = (A^*)^\top = A \), which means it is equal to its own conjugate transpose. Since, for Hermitian A and B,

\[ (AB)^\dagger = B^\dagger A^\dagger = BA, \]

for Hermitian operators A, B and C and a constant scalar α we have:

\[ [A, B] = \alpha C, \]

\[ (AB - BA)^\dagger = (\alpha C)^\dagger, \]

\[ (AB)^\dagger - (BA)^\dagger = \alpha^* C^\dagger, \]

\[ BA - AB = \alpha^* C, \]

\[ -[A, B] = \alpha^* C, \]

\[ \alpha^* = -\alpha, \quad \alpha = i\beta, \]

where i is the imaginary unit (i² = -1), and β is a constant real number (but it can also be a Hermitian operator). So, a nonzero commutator of Hermitian operators is not a Hermitian operator.

3. In other words, the constant information contained in commutators, for example of the ladder operators, or in the commutation of position and momentum, is not an observable state, nor is it an observable process. It is a latent vagueness that may be delivered in some interaction, when it becomes the subject of a process such as measurement, and only then is it actual information (for the participant in the communication).


Question: Explain to me "orthonormal" vectors in quantum mechanics?


Answer: A set S of vectors is orthonormal when every vector of that set has intensity one, and every two vectors are mutually perpendicular.

The picture on the right shows two mutually perpendicular "oriented lengths", vectors \( \vec{a} \perp \vec{b} \), which we usually first meet in high school as a first acquaintance with vector spaces. However, they are not of unit length; they are just "orthogonal". In order to be "orthonormal", in addition to "orthogonality", they must also have unit norms, length one.

1. In physics the vector \( \vec{a} \), or a, is written in the Dirac (bra-ket) way as |a⟩. A Hermitian matrix (operator) is self-adjoint, \( A^\dagger = (A^*)^\top = A \). Transposing and at the same time conjugating does not change a Hermitian matrix. In general, when the columns of a matrix are linearly independent vectors (orthonormal or not), its determinant is nonzero and the matrix is "invertible" (there is a matrix A⁻¹ such that A⁻¹A = I, where I is the unit matrix).

Process observables A and states |a⟩ are defined by eigenvalues λ in the characteristic equations A|a⟩ = λ|a⟩, from which it follows:

A|a⟩ - λ|a⟩ = 0,

(A - λI)|a⟩ = 0,

det(A - λI) = 0.

This determinant expands into a polynomial in the unknown λ, whose roots are the eigenvalues (proper values). When A is a Hermitian matrix (or operator), all roots of this (characteristic) polynomial are real numbers, and if all roots are different from each other, then their eigenvectors, like |a⟩, are mutually perpendicular.

2. Let us prove that the eigenvalue of a Hermitian operator is a real number. From the characteristic equation A|a⟩ = λ|a⟩, multiplying from the left by the bra ⟨a|, we find:

\[ \langle a|A|a\rangle = \langle a|\lambda|a\rangle = \lambda\langle a|a\rangle. \]

Taking the adjoint of the characteristic equation gives ⟨a|A† = ⟨a|λ*, which due to self-adjointness is ⟨a|A = ⟨a|λ*, so we also find:

\[ \langle a|A|a\rangle = \langle a|\lambda^*|a\rangle = \lambda^*\langle a|a\rangle. \]

Comparing with the previous, λ* = λ, which means that the eigenvalue (λ) is a real number (equal to its conjugate).

3. That the eigenvectors |a1⟩ and |a2⟩ of a Hermitian operator A are orthogonal if they belong to different eigenvalues λ1 and λ2 can be proved as follows (using that λ1 is real):

A|a1⟩ = λ1|a1⟩,   A|a2⟩ = λ2|a2⟩,

⟨a1|λ1|a2⟩ = ⟨a1|A|a2⟩ = ⟨a1|λ2|a2⟩,

(λ1 - λ2)⟨a1|a2⟩ = 0,

from where, if λ1 ≠ λ2, then the vector |a1⟩ is perpendicular to |a2⟩. This argument can be extended to the case of repeated eigenvalues; it is always possible to find an orthonormal basis of eigenvectors for any Hermitian matrix.

4. For example, the matrix

\[ A = \begin{pmatrix} 3 & 1 - i \\ 1 + i & 2 \end{pmatrix} \]

we check by taking the adjoint, \( A^\dagger = (A^*)^\top = A \). So, it is Hermitian. Then (Invariant subspaces, 1st Assertion) we calculate the eigenvalues λ1 = 1 and λ2 = 4:

det(A - λI) = 0,

\[ \det\left[\begin{pmatrix} 3 & 1 - i \\ 1 + i & 2 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\right] = 0, \] \[ \det \begin{pmatrix} 3 - \lambda & 1 - i \\ 1 + i & 2 - \lambda \end{pmatrix} = 0, \]

(3 - λ)(2 - λ) - (1 + i)(1 - i) = 0,

λ² - 5λ + 4 = 0,

λ1 = 1,   λ2 = 4.

Now with the known λ ∈ {1, 4} we solve A|p⟩ = λ|p⟩, the characteristic system of equations twice (the labels p will be handy). We find their corresponding eigenvectors:

\[ |1\rangle = \begin{pmatrix} \frac{1}{\sqrt{3}} \\ -\frac{1 + i}{\sqrt{3}} \end{pmatrix}, \quad |2\rangle = \begin{pmatrix} \frac{2}{\sqrt{6}} \\ \frac{1 + i}{\sqrt{6}} \end{pmatrix}. \]

Dividing their coefficients by their intensities, I reduced them to unity, ⟨1|1⟩ = ⟨2|2⟩ = 1, and their mutual scalar product is zero:

\[ \langle 1|2\rangle = \left(\frac{1}{\sqrt{3}}\right)^*\frac{2}{\sqrt{6}} + \left(-\frac{1 + i}{\sqrt{3}}\right)^*\frac{1 + i}{\sqrt{6}} = 0. \]

This means that these two eigenvectors form an "orthonormal" set S.
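The worked example in this point can be confirmed numerically. The sketch below uses only the matrix A and the two normalized eigenvectors given above; the helper names `apply` and `inner` are assumptions for this example.

```python
from math import sqrt

A = [[3, 1 - 1j], [1 + 1j, 2]]

# (eigenvalue, normalized eigenvector) pairs from the worked example.
eigenpairs = [
    (1, [1 / sqrt(3), -(1 + 1j) / sqrt(3)]),
    (4, [2 / sqrt(6), (1 + 1j) / sqrt(6)]),
]

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def inner(u, v):
    """Scalar product <u|v>, conjugating the first argument."""
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))

# Largest component of A v - lambda v for each pair; should vanish.
residuals = [max(abs(av - lam * x) for av, x in zip(apply(A, v), v))
             for lam, v in eigenpairs]
norms = [inner(v, v).real for _, v in eigenpairs]
overlap = inner(eigenpairs[0][1], eigenpairs[1][1])

print(residuals, norms, abs(overlap))
```

Vanishing residuals confirm the characteristic equations, unit norms and zero overlap confirm that the set is orthonormal.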

5. The eigenvectors p1 and p2 of the matrix A are the columns of the eigenvector (auxiliary) matrix:

P = P[p1, p2],   P⁻¹P = I,

\[ P = \begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \\ -\frac{1+i}{\sqrt{3}} & \frac{1+i}{\sqrt{6}} \end{pmatrix}, \quad P^{-1} = \frac{1-i}{\sqrt{2}}\begin{pmatrix} \frac{1+i}{\sqrt{6}} & -\frac{2}{\sqrt{6}} \\ \frac{1+i}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{pmatrix}. \]

The matrix P⁻¹ is the inverse of the matrix P, but note that

\[ P^{-1} = P^\dagger = \begin{pmatrix} \frac{1}{\sqrt{3}} & -\frac{1-i}{\sqrt{3}} \\ \frac{2}{\sqrt{6}} & \frac{1-i}{\sqrt{6}} \end{pmatrix}, \]

i.e. the inverse of the eigenvector matrix is equal to its adjoint. This is always the case when the eigenvalues of the Hermitian matrix A are different and its eigenvectors are normalized, i.e. when the columns of P are orthonormal.

6. The diagonal matrix D of A is found using the eigenvector (auxiliary) matrix P:

D = P⁻¹AP,

\[ \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{3}} & -\frac{1-i}{\sqrt{3}} \\ \frac{2}{\sqrt{6}} & \frac{1-i}{\sqrt{6}} \end{pmatrix} \begin{pmatrix} 3 & 1 - i \\ 1 + i & 2 \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \\ -\frac{1+i}{\sqrt{3}} & \frac{1+i}{\sqrt{6}} \end{pmatrix}. \]

Note that from the previous property (5) it follows by taking the adjoint:

\[ D^\dagger = (P^{-1}AP)^\dagger = P^\dagger A^\dagger (P^{-1})^\dagger = P^{-1}AP = D, \]

therefore the diagonal matrix has real coefficients. By the way, these are the eigenvalues of the matrix A, and as we saw here (2), they are always real for Hermitian matrices.

7. This is why the auxiliary matrix P[1, 2, ..., n] is just the way it is. When A is a Hermitian matrix of order n, and all its eigenvalues λ1, λ2, ..., λn are different (real) numbers, it will be:

A|p1⟩ = λ1|p1⟩,   A|p2⟩ = λ2|p2⟩,   ...,   A|pn⟩ = λn|pn⟩,

A(|p1⟩, |p2⟩, ..., |pn⟩) = (λ1|p1⟩, λ2|p2⟩, ..., λn|pn⟩),

AP[1, 2, ..., n] = P[1, 2, ..., n]D,

P⁻¹AP = D.

When the columns of the matrix P are orthonormal vectors, the sum of the squares of the moduli of the components of each column is one. Therefore, the squares of the moduli of the entries of such a matrix P form probability distributions, and the resulting matrix is stochastic (Markov chain). More precisely, if P = (pjk) is the auxiliary matrix, then Q = (qjk) is a stochastic matrix, where each qjk = |pjk|². If P is not a permutation of the columns of the unit matrix, the columns of the matrix Q are not orthogonal.
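As an illustration of the last paragraph, the matrix Q = (|pjk|²) can be built from the eigenvector matrix P of the earlier 2 × 2 example. This is a sketch with variable names chosen only for this example.

```python
from math import sqrt

# Eigenvector matrix P from the worked 2x2 example (orthonormal columns).
P = [[1 / sqrt(3), 2 / sqrt(6)],
     [-(1 + 1j) / sqrt(3), (1 + 1j) / sqrt(6)]]

# q_jk = |p_jk|^2
Q = [[abs(entry) ** 2 for entry in row] for row in P]

column_sums = [sum(Q[j][k] for j in range(2)) for k in range(2)]
row_sums = [sum(row) for row in Q]
# Scalar product of the two columns of Q; nonzero means they are not orthogonal.
column_dot = sum(Q[j][0] * Q[j][1] for j in range(2))

print(Q)
print(column_sums, row_sums, column_dot)
```

Here both the columns and the rows of Q sum to one, and the columns of Q are not orthogonal because P is not a permutation of the unit matrix.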


Question: How do states define processes?


Answer: In various ways: the presence of a magnet will move metal filings, the application of heat will melt ice, participants will produce flows of states. And that is clear to us.

At the elementary (Harmonic) level of events, where the eigenvalue means the energy of the state, the difference of eigenvalues defines the linear independence of states and phenomena (Orthonormal, 3). Hence the significance of the announcement (to the macro-world) of the differences of the quantities themselves.

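1. A set of linearly independent vectors S = {s1, s2, ..., sn} can always be made mutually perpendicular (for example by the Gram–Schmidt procedure), so let us assume that has been done. We then normalize them (which is always possible) into vectors of unit intensity

\[ \textbf{p}_k = \frac{\textbf{s}_k}{\|\textbf{s}_k\|}, \quad \|\textbf{s}_k\| = \sqrt{\sum_{j=1}^n |s_{jk}|^2}, \quad k = 1, 2, ..., n \]

where sjk is the j-th component of sk, and we get an orthonormal set of vectors, a basis of some vector space. Let them be the columns of the "auxiliary" matrix P = P[p1, p2, ..., pn], i.e.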

\[ P = \begin{pmatrix} p_{11} & p_{12} & ... & p_{1n} \\ p_{21} & p_{22} & ... & p_{2n} \\ ... & ... & ... & ... \\ p_{n1} & p_{n2} & ... & p_{nn} \end{pmatrix} \]

wherein (j, k = 1, 2, ..., n):

\[ \sum_{i=1}^n p_{ij}^*p_{ik} = \begin{cases} 1 & j = k \\ 0 & j \ne k \end{cases} \]

which is the definition of orthonormality. Therefore, by adjoining and multiplying, we get P†P = I = (δjk). In more detail:

\[ \begin{pmatrix} p_{11}^* & ... & p_{n1}^* \\ ... & ... & ... \\ p_{1n}^* & ... & p_{nn}^* \end{pmatrix}\begin{pmatrix} p_{11} & ... & p_{1n} \\ ... & ... & ... \\ p_{n1} & ... & p_{nn} \end{pmatrix} = \begin{pmatrix} 1 & ... & 0 \\ ... & ... & ... \\ 0 & ... & 1 \end{pmatrix}. \]

Note that the adjoint and inverse matrix in this case are equal, P† = P⁻¹. Then we form a diagonal matrix of the same order from arbitrary but mutually different and non-zero scalars (λk ∈ ℂ)

\[ D = \begin{pmatrix} \lambda_1 & 0 & ... & 0 \\ 0 & \lambda_2 & ... & 0 \\ ... & ... & ... & ... \\ 0 & 0 & ... & \lambda_n \end{pmatrix} \]

and we form the matrix A = PDP†. The matrix A is now a process with eigenequations Apk = λkpk, respectively for k = 1, 2, ..., n, where pk are precisely the eigenvectors of the previously arbitrarily set eigenvalues λk.

By means of the given states pk and, in addition, the parameters λk chosen at will, we create processes A which have their interpretation in quantum mechanics. Therefore, the same series of eigenvalues (λk) can correspond to different eigenvectors (pk) depending on the operator (A).
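The construction of a process from chosen states and chosen eigenvalues can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the two orthonormal columns of `P`, the eigenvalues in `lams`, and the helper names `matmul`, `adjoint` and `residual`.

```python
import math

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def adjoint(X):
    """Conjugate transpose of a matrix."""
    return [[complex(X[i][j]).conjugate() for i in range(len(X))]
            for j in range(len(X[0]))]

s = 1 / math.sqrt(2)
# Columns p1 = (1, i)/sqrt(2) and p2 = (1, -i)/sqrt(2) are orthonormal.
P = [[s, s], [s * 1j, -s * 1j]]

# Eigenvalues chosen at will: different, non-zero, possibly complex.
lams = [2.0, 0.5j]
D = [[lams[0], 0], [0, lams[1]]]

A = matmul(matmul(P, D), adjoint(P))   # the process A = P D P†
gram = matmul(adjoint(P), P)           # should be the unit matrix

def residual(k):
    """Largest component of A p_k - lambda_k p_k."""
    col = [[P[0][k]], [P[1][k]]]
    Ap = matmul(A, col)
    return max(abs(Ap[i][0] - lams[k] * col[i][0]) for i in range(2))

print(residual(0), residual(1))
```

Both residuals come out at rounding level, so the chosen columns are indeed eigenvectors of the constructed A for the chosen eigenvalues.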

2. There is always an operator \( \hat{B}: \vec{p} \to \vec{q} \) which will map a given vector, say one of the previous normed ones, into an arbitrary vector. When we write it, it will be a matrix

\[ \begin{pmatrix} q_1 \\ q_2 \\ ... \\ q_m \end{pmatrix} = \begin{pmatrix} q_1p_1^* & q_1p_2^* & ... & q_1p_n^* \\ q_2p_1^* & q_2p_2^* & ... & q_2p_n^* \\ ... & ... & ... & ... \\ q_mp_1^* & q_mp_2^* & ... & q_mp_n^* \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ ... \\ p_n \end{pmatrix}. \]

In the free interpretation, the vectors \( \vec{p} \) and \( \vec{q} \) are freely chosen states, and \( \hat{B} \) is a one-time process. It is an irreversible process, because the rows of that matrix are proportional and its determinant is zero.

This possibility supports the principle of (my) information theory according to which, at least in theory, every state could transition to every other state with some probability, including zero or infinitesimal ones. It is clear why such random events are not reversible: their matrices need not be invertible. Consistent with this interpretation, quantum processes (A) that are reversible, that is, those with det(A) = det(PDP†) = det(D) = λ1λ2⋅...⋅λn ≠ 0, which holds if all eigenvalues λk ≠ 0, can, for example, influence the past based on the present.
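The one-time process of item 2 can be checked numerically. This is a minimal sketch with assumed sample vectors `p` (normalized) and `q` (arbitrary); it builds the matrix with entries q_i p_j* and confirms that it maps p to q and has zero determinant.

```python
# Assumed sample states: p is normalized, q is arbitrary.
p = [0.6, 0.8j]          # |0.6|^2 + |0.8i|^2 = 1
q = [1.0, 2.0 - 1.0j]

# Matrix of the operator B with entries q_i * conj(p_j).
B = [[qi * complex(pj).conjugate() for pj in p] for qi in q]

# B applied to p returns q, because the sum of |p_j|^2 is one.
Bp = [sum(B[i][j] * p[j] for j in range(2)) for i in range(2)]

# The rows of B are proportional, so the determinant vanishes.
det_B = B[0][0] * B[1][1] - B[0][1] * B[1][0]

print(Bp, det_B)
```

The vanishing determinant is the numerical face of the irreversibility described above: many different inputs are sent to multiples of the same q.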

3. That an event now affects the past, on a quantum level, demonstrates the interaction interpretation. At the beginning of the 20th century, Heisenberg noticed how the path of the electron is defined by the measurement process. We now establish the communication and delivery of electron uncertainty to the apparatus, which makes its previous state more certain.

Brine Tank

Question: How do you relate continuous changes and discrete matrices?


Answer: One of the most famous applications of matrices to continuous (time) changes is in differential equations. Here I will interpret it on the most common school assignment, the uniform mixing of salt water that cascades from one container to another.

1. Water flows at a uniform rate in cascade through three vessels of given volumes, so that the coefficients α, β and γ (flow rate/volume) are the only ones relevant to us. In addition to the uniform mixing in each tank, a uniform concentration of salt in each of the tanks (vessels) is also implied, for example due to a sufficiently slow flow and overflow.

Let x1(t), x2(t) and x3(t) be the amounts of salt at time t in each tank. We assume that salt-free water is added to the first tank, due to which the salt is gradually washed through all three containers and eventually lost to the drain. This cascading flow is modeled by a balance of rates

(change) = (input) − (output).

We write the same as a system of linear differential equations:

\[ \begin{cases} \frac{dx_1}{dt} = -\alpha x_1 \\ \frac{dx_2}{dt} = \alpha x_1 - \beta x_2 \\ \frac{dx_3}{dt} = \beta x_2 - \gamma x_3 \end{cases} \]

and we can write this as a matrix

\[ \frac{d}{dt}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} -\alpha & 0 & 0 \\ \alpha & -\beta & 0 \\ 0 & \beta & -\gamma \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \]

summarized x' = Ax. And that's it, for starters.

2. Let x1(0), x2(0) and x3(0) be the initial amounts of salt in the respective pools (containers). We solve the first, homogeneous linear differential equation (First Order) of the given triangular system:

\[ \frac{dx_1(t)}{dt} = -\alpha x_1(t), \] \[ \frac{dx_1}{x_1} = -\alpha dt, \] \[ \ln \frac{x_1(t)}{x_1(0)} = -\alpha t, \] \[ x_1(t) = x_1(0)e^{-\alpha t}. \]

We solve the second one, which is now an inhomogeneous linear differential equation of the first order

\[ \frac{dx_2(t)}{dt} = \alpha x_1(t) - \beta x_2(t). \]

We start from the assumption β ≠ α and the shape of the solution

\[ x_2(t) = f(t) e^{-\beta t}, \]

so it is:

\[ x_2'(t) = f'(t)e^{-\beta t} - f(t)\beta e^{-\beta t}, \]

where by comparing and integrating we find:

\[ f'(t) = \alpha x_1(0) e^{(\beta - \alpha)t}, \quad f(t) = \frac{\alpha}{\beta - \alpha} x_1(0) e^{(\beta - \alpha)t} + C. \]

Putting it into solution form, it comes out:

\[ x_2(t) = \left[\frac{\alpha}{\beta - \alpha}x_1(0) e^{(\beta - \alpha)t} + C\right] e^{-\beta t}, \] \[ C = x_2(0) - \frac{\alpha}{\beta - \alpha} x_1(0), \] \[ x_2(t) = \frac{\alpha}{\beta - \alpha}x_1(0)(e^{-\alpha t} - e^{-\beta t}) + x_2(0)e^{-\beta t}. \]

We solve the third one, which is formally equal to the previous second differential equation, and we get the solution

\[ x_3(t) = e^{-\gamma t} \left[C + \beta\int x_2(t) e^{\gamma t} \ dt \right] = \] \[ = Ce^{-\gamma t} + \frac{\alpha\beta}{\beta - \alpha}x_1(0)\left(\frac{e^{-\alpha t}}{\gamma - \alpha} - \frac{e^{-\beta t}}{\gamma - \beta}\right) + \beta x_2(0)\frac{e^{-\beta t}}{\gamma - \beta}. \]

From the initial condition (t = 0) we find the constant

\[ C = x_3(0) - \frac{\beta}{\gamma - \beta}x_2(0) + \frac{\alpha\beta}{(\gamma - \alpha)(\gamma - \beta)}x_1(0). \]

All this holds in the case when α, β and γ are mutually different values.

3. Let's say that the second pool is twice and the third three times the size of the first. Let the volume of the first pool be 2 (liters, barrels, or similar) and let one unit of water flow per unit of time. Then α = 1/2, β = 1/4 and γ = 1/6. When the concentrations of salt at the initial moment in each of the three basins are equal and its total amount in the first one is x1(0) = a, then in the second it is x2(0) = 2a, and in the third x3(0) = 3a. After time t the total amounts of salt in the pools are:

\[ x_1(t) = ae^{-t/2}, \]

\[ x_2(t) = 2ae^{-t/4}(2 - e^{-t/4}), \]

\[ x_3(t) = \frac{27}{2}ae^{-t/6} - 12ae^{-t/4} + \frac{3}{2}ae^{-t/2}. \]

Brine Tank 2

On the graph Oxy is the amount of salt y over time x in the first (blue), second (red) and third (green) pools, from the bottom up. The initial amount of salt in the first pool is a = 1. We can see that the salt washes out fastest from the first pool, a little slower from the second, and much slower from the third, because the salt from the first passes through the second and third pools.
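The closed forms of item 3 can be compared with a direct numerical integration of the system x' = Ax. The sketch below assumes a = 1, uses a classical fourth-order Runge-Kutta step (my choice of method, not the author's), and takes the last term of x3 as (3/2)a·e^(-t/2), which is what the general formula of item 2 gives for these coefficients.

```python
import math

alpha, beta, gamma = 1 / 2, 1 / 4, 1 / 6
a = 1.0  # assumed initial amount of salt in the first pool

def rhs(x):
    """Right-hand side of the cascade system x' = A x."""
    x1, x2, x3 = x
    return (-alpha * x1, alpha * x1 - beta * x2, beta * x2 - gamma * x3)

def closed_form(t):
    """Closed-form amounts of salt for x(0) = (a, 2a, 3a)."""
    x1 = a * math.exp(-t / 2)
    x2 = 2 * a * math.exp(-t / 4) * (2 - math.exp(-t / 4))
    x3 = (13.5 * a * math.exp(-t / 6) - 12 * a * math.exp(-t / 4)
          + 1.5 * a * math.exp(-t / 2))
    return (x1, x2, x3)

def rk4(x, t_end, steps=2000):
    """Integrate from t = 0 to t_end with fixed-step Runge-Kutta."""
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(x)
        k2 = rhs(tuple(xi + h / 2 * k for xi, k in zip(x, k1)))
        k3 = rhs(tuple(xi + h / 2 * k for xi, k in zip(x, k2)))
        k4 = rhs(tuple(xi + h * k for xi, k in zip(x, k3)))
        x = tuple(xi + h / 6 * (p + 2 * q + 2 * r + s)
                  for xi, p, q, r, s in zip(x, k1, k2, k3, k4))
    return x

numeric = rk4((a, 2 * a, 3 * a), 10.0)
exact = closed_form(10.0)
max_error = max(abs(n - e) for n, e in zip(numeric, exact))
print(max_error)
```

Evaluating `closed_form` on a grid of times would reproduce the three curves of the graph.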

4. This well-known example is also instructive because of the lower triangular matrix A, which would be upper triangular if the differential equations were written in reverse order, and whose diagonal elements are (as always for triangular matrices) its eigenvalues. We will notice that the difference of the eigenvalues is important for the solutions, because with equal eigenvalues the formulas above would fail due to division by zero (with the denominator γ - β, γ - α, or β - α).

Otherwise, the example can be used as a model to "flush" any fluids (gases or liquids) saturated with "substances" for which the law of conservation applies. It can also be used to analyze the idea of reducing the information of the present at the expense of the growing past (Alignment, 2).


Question: What reduction of the information of the present are you referring to?


Answer: In the picture on the right is the scheme of the energy "waterfall". Energy injected into the system flows into large structures that break down into smaller and smaller streams that eventually die. Looking from the present, an analogy is happening with the increasingly distant past.

1. As in the previous answer (Brine Tank), now with that new application, we form a system of n linear differential equations (k = 1, 2, ..., n), where α0 = 0, so x0 doesn't matter

\[ x_k' = \alpha_{k-1}x_{k-1} - \alpha_k x_k \]

with constant coefficients αk. In matrix form x' = Âx, that is

\[ \frac{d}{dt}\begin{pmatrix} x_1 \\ x_2 \\ ... \\ x_n \end{pmatrix} = \begin{pmatrix} -\alpha_1 & 0 & ... & 0 \\ \alpha_1 & -\alpha_2 & ... & 0 \\ ... & ... & ... & ... \\ 0 & 0 & ... & -\alpha_n \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ ... \\ x_n \end{pmatrix}, \]

where all variables xk = xk(t) are functions of time t. It is a lower triangular matrix, which becomes upper triangular when the equations of the system are written in the opposite order, without changing the solution values. The coefficients αk form a series of monotonically increasing positive real numbers (0 < α1 < ... < αn), so the eigenvalues, the diagonal elements of the matrix Â, are all distinct real numbers. Therefore, the eigenvectors, Âvk = -λkvk with λk = αk, more precisely

\[ \mathbf{v}_k = (0, ..., 1, \frac{\alpha_k}{\alpha_{k+1} - \alpha_k}, ..., \frac{\alpha_k...\alpha_{n-1}}{(\alpha_{k+1} - \alpha_k)...(\alpha_n - \alpha_k)})^\top \]

are linearly independent (Diagonalization, 19. Proposition).
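The eigenvector formula above follows from the recurrence v_j = α_{j-1} v_{j-1} / (α_j - α_k) for j > k, starting from v_k = 1. A small sketch checks it against the cascade matrix; the sample coefficients `alphas` and the function names are assumptions for illustration, and the index k is counted from zero as usual in Python.

```python
def cascade_matrix(alphas):
    """Lower bidiagonal matrix of the cascade x_k' = a_{k-1} x_{k-1} - a_k x_k."""
    n = len(alphas)
    A = [[0.0] * n for _ in range(n)]
    for k in range(n):
        A[k][k] = -alphas[k]
        if k > 0:
            A[k][k - 1] = alphas[k - 1]
    return A

def eigenvector(alphas, k):
    """Eigenvector for the eigenvalue -alphas[k] (k counted from 0)."""
    n = len(alphas)
    v = [0.0] * n
    v[k] = 1.0
    for j in range(k + 1, n):
        v[j] = alphas[j - 1] * v[j - 1] / (alphas[j] - alphas[k])
    return v

alphas = [0.5, 1.0, 1.5, 2.5]   # assumed increasing positive coefficients
A = cascade_matrix(alphas)

def residual(k):
    """Largest component of A v_k + alpha_k v_k, zero for a true eigenvector."""
    v = eigenvector(alphas, k)
    Av = [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
    return max(abs(Av[i] + alphas[k] * v[i]) for i in range(len(v)))

print([residual(k) for k in range(len(alphas))])
```

The division by α_j - α_k also shows why the coefficients must be mutually different.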

2. The general solution of that system of differential equations (3. Theorem) is

\[ \textbf{x} = C_1\textbf{v}_1 e^{-\alpha_1 t} + C_2 \textbf{v}_2 e^{-\alpha_2 t} + ... + C_n \textbf{v}_n e^{-\alpha_n t}, \]

where C1, C2, ..., Cn are constants of integration. Since αk are positive numbers, the coefficients of time (t) in the exponents are negative and the summands tend to zero exponentially. The summands of the solution reach zero faster the larger the number αk, and so the larger the k, that is, the further the summand is from the initial one (α1).

This is how we observe the "flushing" that the present is doing through the mass of options, choosing one outcome at a time. Accordingly, αk are the degrees of the reciprocal of the average of some number of possibilities, so the sum of the solutions will decrease very quickly. Quantizing the action (information) will make that sequence finite. In contrast to this, looking at used options, reduced uncertainties, a system of differential equations that highlight options differently could be much simpler, up to diagonal matrices. The second is a matter of accuracy.

3. In the given question, there was an interpretation of the present (the first summand) whose density of average (Shannon's) information decreases due to the increasingly long past (increasing n), so that the total sum of information now and all information of the past visible from it would remain constant. Interpreting the sum of the present and the past in this way is acceptable to information theory (mine), and the application of cascade flushing is one attempt to use simple systems of differential equations, of course, in addition to Markov chains (Alignment).

When we notice that this interpretation should be more accurate, because the coefficients of the system of equations could be variable, we notice a similar difficulty in the Markov chain. Messages adapt to the process, but the process also adapts to the messages it transmits. However, going down to the bottom to better analyze the upper, more complex structures is also possible (Packages), so this groping for complexity starting from simpler models could make sense.


Below I will list (abbreviated) answers to several questions that I received in connection with the previous couple of answers, or similar. I thank the collaborators who informed me of the errors in the previous text, which is why they have now been corrected (they asked for anonymity, so I will not mention their names).


Question: Is there any application of this method to living beings?


Answer: The question involves the solutions of a system of differential equations with constant coefficients and the eigenvalues of the matrix of the system. The answer is, of course, positive and is not unknown to the applications of mathematics. I will elaborate it on an interesting example, precisely because of the previous requirement to connect continuous changes with discrete matrix.

1. In the container with water is a radioactive isotope that can be used as a tracer for the food chain of aquatic plankton varieties I and II, in the previous picture, on the left phytoplankton, and on the right is zooplankton. These plant and animal species of plankton are aquatic organisms that float with currents. Let it be:

  • ξ(t) — isotope concentration in water,
  • η(t) — isotope concentration in phytoplankton (I),
  • ζ(t) — isotope concentration in zooplankton (II).

Typical differential equations are then:

\[ \begin{cases} \xi'(t) = a_{11}\xi(t) + a_{12}\eta(t) + a_{13}\zeta(t) \\ \eta'(t) = a_{21}\xi(t) + a_{22}\eta(t) + a_{23}\zeta(t) \\ \zeta'(t) = a_{31}\xi(t) + a_{32}\eta(t) + a_{33}\zeta(t) \end{cases} \]

or in matrix form x' = Âx, that is:

\[ \frac{d}{dt} \begin{pmatrix} \xi \\ \eta \\ \zeta \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} \xi \\ \eta \\ \zeta \end{pmatrix}. \]

Thus, the task of infinitesimal calculus was reduced to the algebra of vectors. For example, a12 is the factor by which the phytoplankton isotope concentration η contributes to the change of the water isotope concentration ξ at the moment t.

The characteristic equation Âv = λv is equivalent to (Â - λÎ)v = 0, and this reduces to the characteristic polynomial in the eigenvalues. It is obtained from the determinant condition det(Â - λÎ) = 0, which we usually use to calculate the eigenvalues. With the eigenvalues, one by one, we return to the characteristic equation to find the corresponding eigenvectors.

2. When the eigenvalues of the matrix are different numbers, the solution has the form (3. Theorem)

\[ \textbf{x} = C_1\textbf{v}_1 e^{\lambda_1 t} + C_2 \textbf{v}_2 e^{\lambda_2 t} + C_3\textbf{v}_3 e^{\lambda_3 t}, \]

where C1, C2 and C3 are constants of integration given by the initial values of the isotope concentrations, and vk are the eigenvectors belonging to the eigenvalues λk of the given matrix (Âvk = λkvk, for k = 1, 2, 3).

For example, take x' = Âx. The matrix of that system

\[ \frac{d}{dt} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} -2 & -4 & 2 \\ -2 & 1 & 2 \\ 4 & 2 & 5 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \]

has eigenvalues λ1 = -5, λ2 = 3 and λ3 = 6 with eigenvectors:

\[ \textbf{v}_1 = \begin{pmatrix} -2 \\ -1 \\ 1 \end{pmatrix}, \quad \textbf{v}_2 = \begin{pmatrix} 2 \\ -3 \\ -1 \end{pmatrix}, \quad \textbf{v}_3 = \begin{pmatrix} 1 \\ 6 \\ 16 \end{pmatrix}. \]

We check this by direct multiplication, Âvk = λkvk, for all three k = 1, 2, 3. Hence the solution of the differential equations:

\[ \textbf{x} = C_1e^{-5t}\textbf{v}_1 + C_2e^{3t}\textbf{v}_2 + C_3e^{6t}\textbf{v}_3. \]

The integration constants are determined by the initial conditions, but it is generally true that the solution vector (x) over time takes the direction of the eigenvector (v3) of the largest eigenvalue (λ3 = 6), the most dominant of the dominant.
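The direct multiplication and the drift towards the dominant eigenvector can both be checked in a few lines. In this sketch the integration constants are simply assumed to be one, and the helper names are illustrative.

```python
import math

A = [[-2, -4, 2], [-2, 1, 2], [4, 2, 5]]
pairs = [(-5, [-2, -1, 1]), (3, [2, -3, -1]), (6, [1, 6, 16])]

def apply(M, v):
    """Matrix times vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

# Exact integer check of A v_k = lambda_k v_k for all three pairs.
eigen_ok = all(apply(A, v) == [lam * c for c in v] for lam, v in pairs)

def solution(t, C=(1.0, 1.0, 1.0)):
    """x(t) = sum of C_k e^{lambda_k t} v_k, with assumed constants C."""
    return [sum(Ck * math.exp(lam * t) * v[i] for Ck, (lam, v) in zip(C, pairs))
            for i in range(3)]

def direction(x):
    """Unit vector along x."""
    norm = math.sqrt(sum(c * c for c in x))
    return [c / norm for c in x]

late = direction(solution(10.0))
dominant = direction(pairs[2][1])
gap = max(abs(p - q) for p, q in zip(late, dominant))
print(eigen_ok, gap)
```

Already at t = 10 the direction of x(t) agrees with v3 to many decimal places, whatever non-zero C3 is chosen.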

3. When these eigenvalues are real numbers, the summands with positive (negative) exponents increase (decrease) in time, multiplied by the corresponding components of the eigenvectors. If one of the eigenvalues is a complex number (which can be the case even with all real ajk), then due to Euler's equality (6. Theorem)

\[ e^{\alpha + i\beta} = e^\alpha(\cos\beta + i\sin\beta), \]

we get periodic (cosine and sine) changes in concentrations in the corresponding isotopes.

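4. If the eigenvalue (real or complex) is repeated, then we associate two vectors with it, an eigenvector v and a generalized eigenvector u:

Âv = λv   and   Âu = λu + v.

For example, if we have λ1 = λ2 = λ and λ3 ≠ λ, the part of the general solution of x' = Âx that belongs to λ is

\[ \textbf{x} = C_1e^{\lambda t}\textbf{v} + C_2(te^{\lambda t}\textbf{v} + e^{\lambda t}\textbf{u}). \]

Indeed, the derivative of the solution is

\[ \textbf{x}' = C_1\lambda e^{\lambda t}\textbf{v} + C_2(e^{\lambda t}\textbf{v} + t\lambda e^{\lambda t}\textbf{v} + \lambda e^{\lambda t}\textbf{u}), \]

and on the other side of the equation is

\[ \hat{A}\textbf{x} = C_1e^{\lambda t}\lambda\textbf{v} + C_2[te^{\lambda t}\lambda\textbf{v} + e^{\lambda t}(\lambda\textbf{u} + \textbf{v})]. \]

So x' = Âx is true.

For even larger multiplicities of eigenvalues, see additional explanations (9. Proposition). Such pairs of dominants (λ, vk) are equal goals of convergence of solutions (direction x towards vk), while these "degenerate" vectors are the basis of an "eigenspace" (each eigenvector v of that space satisfies Âv = λv).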
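The verification of item 4 can be repeated on numbers. As an assumed test case the sketch takes the 2 × 2 matrix with rows (-2, 1) and (-1, 0), which has the double eigenvalue λ = -1, the eigenvector v = (1, 1) and the generalized eigenvector u = (0, 1); the constants C1 and C2 are arbitrary.

```python
import math

A = [[-2.0, 1.0], [-1.0, 0.0]]
lam = -1.0
v = [1.0, 1.0]   # A v = lam v
u = [0.0, 1.0]   # A u = lam u + v

def x(t, C1=2.0, C2=3.0):
    """x(t) = C1 e^{lam t} v + C2 (t e^{lam t} v + e^{lam t} u)."""
    e = math.exp(lam * t)
    return [C1 * e * v[i] + C2 * (t * e * v[i] + e * u[i]) for i in range(2)]

def x_prime(t, C1=2.0, C2=3.0):
    """Term-by-term derivative of x(t)."""
    e = math.exp(lam * t)
    return [C1 * lam * e * v[i]
            + C2 * (e * v[i] + t * lam * e * v[i] + lam * e * u[i])
            for i in range(2)]

def A_times_x(t):
    xt = x(t)
    return [sum(A[i][j] * xt[j] for j in range(2)) for i in range(2)]

# Compare both sides of x' = A x at a few sample times.
mismatch = max(abs(p - q)
               for t in (0.0, 0.7, 2.0)
               for p, q in zip(x_prime(t), A_times_x(t)))
print(mismatch)
```

The two sides agree to rounding error, as the symbolic check above predicts.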

5. Like the concentration of isotopes in the above example, their typical differential equations express changes that flow in the direction of the eigenvector (space) whose eigenvalue is the largest. If the characteristic is not one-dimensional (it is space), all its directions are equally dominant and the process has a dominant package of directions, reminiscent of the transfer of the possibilities of living beings during the reproduction of a species.

The development according to the "package of directions" will be the maintenance of such a package in the case of complex eigenvalues, analogous to such case matrices of different complex eigenvalues. This way will also open up new applications of this field of mathematics for modeling living systems, imitating their stages of birth and death. I have been using them for a long time in information theory, but more on that another time.


Question: Does the solution of differential equations also contain a "black box"?


Answer: Of course. If we have a vector x = x(t) of functions of time t, or of some other continuous variable, and if Â is the matrix of a system of differential equations

x' = Âx,

then (3. Theorem) a solution is

\[ \textbf{x} = c_1e^{\lambda_1 t}\textbf{v}_1 + c_2e^{\lambda_2 t}\textbf{v}_2 + ... + c_ne^{\lambda_n t}\textbf{v}_n, \]

where ck are arbitrary constants, λk are the different eigenvalues of the given matrix and vk are their eigenvectors, respectively for k = 1, 2, ..., n.

1. For example, the double stochastic matrix

\[ \hat{A} = \begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix} \]

has eigenvalues λ1 = 1 and λ2 = 0.4 with corresponding eigenvectors (Âv = λv):

\[ \textbf{v}_1 = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}, \quad \textbf{v}_2 = \begin{pmatrix} 0.5 \\ -0.5 \end{pmatrix}. \]

The latter is not a probability distribution, but still participates in the solution

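\[ \textbf{x} = c_1e^{t}\textbf{v}_1 + c_2e^{0.4t}\textbf{v}_2 \]

of the system of differential equations x' = Âx. We can write this

\[ e^{-t}\textbf{x} = c_1\textbf{v}_1 + c_2e^{-0.6t}\textbf{v}_2 \to c_1\textbf{v}_1, \quad t \to \infty, \]

so, if we normalize the result to be a probability distribution, then the limit of the solution becomes exactly v1. The matrix which would send every input distribution to this output vector is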

\[ \hat{B} = \begin{pmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{pmatrix}. \]

On the other hand, indeed \( \hat{A}^s \to \hat{B} \) when s → ∞. That limit stochastic matrix is the "black box" of exactly the Markov chain generated by the double stochastic matrix Â.
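Both routes to the "black box" of this 2 × 2 example can be followed numerically. The sketch is only illustrative: the power s = 60, the time t = 80 and the constants c1, c2 are assumed values, and the normalization divides the solution by the sum of its components.

```python
import math

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[0.7, 0.3], [0.3, 0.7]]
v1, v2 = [0.5, 0.5], [0.5, -0.5]

# Discrete side: a high power of the Markov chain generator.
A_power = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(60):
    A_power = matmul(A_power, A)

# Continuous side: x(t) = c1 e^t v1 + c2 e^{0.4 t} v2, rescaled to sum one.
def normalized_solution(t, c1=1.0, c2=3.0):
    x = [c1 * math.exp(t) * v1[i] + c2 * math.exp(0.4 * t) * v2[i]
         for i in range(2)]
    total = sum(x)
    return [xi / total for xi in x]

limit_state = normalized_solution(80.0)
print(A_power, limit_state)
```

The power of the matrix approaches the matrix with all entries 0.5, and the normalized solution approaches its column v1.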

2. For example, the ordinary stochastic matrix (7. Example)

\[ A = \begin{pmatrix} 0.8 & 0.2 & 0.2 \\ 0.2 & 0.7 & 0.2 \\ 0.0 & 0.1 & 0.6 \end{pmatrix} \]

has eigenvalues λ1 = 1, λ2 = 0.6 and λ3 = 0.5. The eigenvectors are:

\[ \textbf{v}_1 = \begin{pmatrix} 0.5 \\ 0.4 \\ 0.1 \end{pmatrix}, \quad \textbf{v}_2 = \begin{pmatrix} 0.5 \\ 0 \\ -0.5 \end{pmatrix}, \quad \textbf{v}_3 = \begin{pmatrix} 0 \\ 0.5 \\ -0.5 \end{pmatrix}, \]

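so, for the general solution of the system x' = Ax, with x = x(t), the following is valid:

\[ \textbf{x} = c_1e^{t}\textbf{v}_1 + c_2e^{0.6t}\textbf{v}_2 + c_3e^{0.5t}\textbf{v}_3, \]

\[ e^{-t}\textbf{x} = c_1\textbf{v}_1 + c_2e^{-0.4t}\textbf{v}_2 + c_3e^{-0.5t}\textbf{v}_3 \to c_1\textbf{v}_1, \quad t \to \infty. \]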

If we normalize the left side to be a probability distribution and take the corresponding constant of integration, we find for the limit vector of the result exactly the characteristic, the eigenvector v1. The corresponding matrix, which would transform any distribution into this one, is

\[ B = \begin{pmatrix} 0.5 & 0.5 & 0.5 \\ 0.4 & 0.4 & 0.4 \\ 0.1 & 0.1 & 0.1 \end{pmatrix}, \]

so, it is also the "black box" of message transmission.

On the other hand, if we consider the matrix A as a Markov chain generator, and with the eigenvectors we form the auxiliary matrix P[v1, v2, v3], whose columns are the given eigenvectors, then D = P⁻¹AP will be a diagonal matrix with the given eigenvalues on the diagonal and zeros elsewhere. Knowing the eigenvalues and the auxiliary matrix, we write A = PDP⁻¹, and then the powers Aˢ = PDˢP⁻¹, where a power of a diagonal matrix is the diagonal matrix of the powers of its elements. Thus we prove the limit Aˢ → B when s → ∞. This has already been done (13. Example).
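The limit can also be observed by plain repeated multiplication, without forming P and its inverse. In this sketch the number of steps (200) and the starting distribution `start` are assumptions chosen for illustration.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[0.8, 0.2, 0.2],
     [0.2, 0.7, 0.2],
     [0.0, 0.1, 0.6]]
B = [[0.5] * 3, [0.4] * 3, [0.1] * 3]

# A^s for a large s, starting from the unit matrix.
A_power = [[float(i == j) for j in range(3)] for i in range(3)]
for _ in range(200):
    A_power = matmul(A_power, A)

gap = max(abs(A_power[i][j] - B[i][j]) for i in range(3) for j in range(3))

# Any starting distribution ends up in the eigenvector v1 = (0.5, 0.4, 0.1).
start = [0.2, 0.3, 0.5]
end = [sum(A_power[i][j] * start[j] for j in range(3)) for i in range(3)]
print(gap, end)
```

The remaining eigenvalues 0.6 and 0.5 are smaller than one, so their powers die out and only the column v1 survives.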

3. It is easy to see that the described method of calculating the "black box", using a system of differential equations, is also valid for non-stochastic matrices, moreover, for non-Hermitian ones. For me, it is one of the mainstays of the generalization of information theory to quantum states and processes, followed by the interpretation of (physical) states and processes using vectors and operators.

The common lesson of these interpretations is the adaptation of the state to the process, so that it becomes inherent in the ongoing change. To a lesser extent, the reverse will also apply, adapting the process to the conditions to which it reacts. All of them converge to a fictitious, each one in particular, let's call it the "black box" of the cosmos.

Limes II

Question: What does the solution tend to in the case of the same eigenvalues?


Answer: Again, we have y = y(t), whose components are functions of time t, or some other continuous variable, as well as the system of equations y' = Ây, which defines the derivatives y' = dy/dt.

When an eigenvalue λ of the matrix Â is repeated (9. Proposition), say m ≤ n times, and when it has corresponding (generalized) eigenvectors u1, ..., um for which:

Âu1 = λu1,   Âu2 = λu2 + u1,   ...,   Âum = λum + um-1,

then we have special solutions:

\[ \textbf{y}_k = \left[\textbf{u}_k + t\textbf{u}_{k-1} + ... + \frac{t^{k-1}}{(k-1)!}\textbf{u}_1\right]e^{\lambda t}, \quad k = 1, 2, ..., m. \]

Their linear combination with the arbitrary constants c1, c2, ..., cm

y = c1y1 + c2y2 + ... + cmym

is part of the general solution of the given system.

1. For example, when working with a matrix Â of order n = 5, 6, 7, ..., let the triple eigenvalue λ1 = λ2 = λ3 = 2 have eigenvectors whose first three coordinates are as in 8. Example:

\[ \textbf{u}_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \\ ... \end{pmatrix}, \quad \textbf{u}_2 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ ... \end{pmatrix}, \quad \textbf{u}_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ ... \end{pmatrix}. \]

Let the next two eigenvalues λ4 = λ5 = -1 be as in 7. Example, with the fourth and fifth coordinates of the eigenvectors (2):

\[ \textbf{u}_4 = \begin{pmatrix} ... \\ 1 \\ 1 \\ ... \end{pmatrix}, \quad \textbf{u}_5 = \begin{pmatrix} ... \\ 0 \\ 1 \\ ... \end{pmatrix}, \]

with all other eigenvalues less than 2. The general solution is

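\[ \textbf{y} = \textbf{u}_1e^{2t} + (\textbf{u}_2 + t\textbf{u}_1)e^{2t} + \left(\textbf{u}_3 + t\textbf{u}_2 + \frac{t^2}{2}\textbf{u}_1\right)e^{2t} + \textbf{u}_4e^{-t} + (\textbf{u}_5 + t\textbf{u}_4)e^{-t} + ... \]

\[ = \left[\left(1 + t + \frac{t^2}{2}\right)\textbf{u}_1 + (1 + t)\textbf{u}_2 + \textbf{u}_3\right]e^{2t} + [(1 + t)\textbf{u}_4 + \textbf{u}_5]e^{-t} + ..., \]

\[ e^{-2t}\textbf{y} = \left[\left(1 + t + \frac{t^2}{2}\right)\textbf{u}_1 + (1 + t)\textbf{u}_2 + \textbf{u}_3\right] + [(1 + t)\textbf{u}_4 + \textbf{u}_5]e^{-3t} + ... \]

\[ \to \left[\left(1 + t + \frac{t^2}{2}\right)\textbf{u}_1 + (1 + t)\textbf{u}_2 + \textbf{u}_3\right], \quad t \to \infty. \]

This expression has summands that diverge. Simple normalization, as in the previous cases (Limes), now does not lead to convergence.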

2. However, when the largest eigenvalue of the given matrix of the system is not degenerate (no two or more of them are the same), then there is convergence of the solution and we have a situation analogous to the previous one. Finally, those solutions move towards the most dominant of the dominant pairs (λ, v).

For example, the matrix

\[ \hat{A} = \begin{pmatrix} \begin{pmatrix} 3 & 0 & -1 \\ 0 & 2 & 0 \\ 1 & 3 & 1 \end{pmatrix} & \begin{matrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{matrix} & \begin{matrix} 0 \\ 0 \\ 0 \end{matrix} \\ \begin{matrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{matrix} & \begin{pmatrix} -2 & 1 \\ -1 & 0 \end{pmatrix} & \begin{matrix} 0 \\ 0 \end{matrix} \\ \begin{matrix} 0 & 0 & 0 \end{matrix} & \begin{matrix} 0 & 0 \end{matrix} & (3) \end{pmatrix} = \begin{pmatrix} \hat{A}_1 & 0 & 0 \\ 0 & \hat{A}_2 & 0 \\ 0 & 0 & \hat{A}_3 \end{pmatrix} \]

is block-diagonal. On the diagonal are the matrices Â1 and Â2 of the first and second examples from (1), and one more, the one-element matrix Â3 = (3). The eigenvalue of this third block is λ6 = 3, a number greater than all five previous lambdas.

That is why the solution, now normalized by its fastest-growing factor (\( e^{-3t}\textbf{y} \) instead of \( e^{-2t}\textbf{y} \)), converges to a multiple of the eigenvector v6 = (0, 0, 0, 0, 0, 1) of this sixth eigenvalue. Consistently, there is a limit matrix, for example

\[ \hat{B} = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix} \]

which would transform every normalized vector (probability distribution) into one and the same v6. Depending on the integration constants and according to the needs of normalization of the vectors we work with, this limit matrix can be different.


Question: Do you have more examples of variable matrix coefficients?


Answer: Yes, as many as it takes, but let's keep this theory interesting. If the order of the derivative (Aquarium) is only one step higher, the system becomes

x'' = Êx',

that is, the expression of the acceleration x'' by the velocities x', where x = x(t), and the vector

x = (x1, x2, ..., xn)

denotes the position of a given point at time t. The solution is now the velocity x' = x'(t). Convergence (Limes) then has a simple explanation: the solution tends to the points with the highest eigenvalue (λ), i.e. the fastest ones (Êv = λv).

1. In the world of physical spacetime, constant matrix coefficients Ê = (ejk) mean constant velocity increments. However, that is not possible, because the speed of light c ≈ 300,000 km/s cannot be exceeded. As the material point approaches that speed, its acceleration weakens. Maintaining constant growth requires investing more and more driving energy, and finally the stake becomes infinite.

As previously stated, the state adapts to the process and to the circumstances of a new level of movement, so that the ejk coefficients change. They change faster and faster at higher speeds, disproportionately, that is, non-linearly as the movement gets closer to that of light. At lower velocities, the changes of the matrix coefficients are imperceptible to us, and they become significant only at velocities close to light.

2. The space-time of the theory of relativity is 4-dimensional, but for a geometric picture, as in the image above left, we imagine it as a 2-dimensional surface. Let the radial units of length towards the center of these circles be shorter and shorter, while those perpendicular to them (along the circles) do not change. We say that the space defined in this way is "curved", and it was the subject of the study of non-Euclidean geometries, first of all by Gauss (Carl Friedrich Gauss, 1777-1855, German mathematician) and Lobachevsky (Nikolai Lobachevsky, 1792-1856, Russian), with others joining them later.

Let's try to understand that then the direct joint AB, the red dashed length in the picture, can be longer than the arc, the blue dashed on circle. We can observe two coordinate systems of such an image. The first is Euclidean-like, which would try to ignore changes in length units, while the second is curvilinear and exactly mimics the defined length units.

The Ricci tensor Rαβ is a matrix-like quantity of type n × n, with n = 1, 2, 3, ... the number of dimensions of the observed space. This tensor represents the difference between the volume element at a given point of curved space and the Euclidean one. The components of the Ricci tensor describe how the volume element changes along a geodesic line (the path of shortest distance AB) in any direction. Therefore, the above differential equations will not have constant matrix coefficients along those paths of curved space, when they represent motions.

3. Einstein (1916) came up with the idea of viewing these volume changes in 4-dimensional physical space-time as energy. He wrote

\[ G_{\mu\nu} + \Lambda g_{\mu\nu} = \kappa T_{\mu\nu} \]

where the first summand, the geometry tensor Gμν with 4 × 4 components, contains the Ricci tensor and measures the curvature of space, and on the right side is the energy tensor Tμν. By equating the "curvature" and "energy" of space, Einstein abandoned Newton's earlier "force" of gravity, discovering formulas for even more accurate predictions than those of classical celestial mechanics. The cosmological constant Λ, the metric tensor gμν and the constant κ fine-tune those general relativity equations.


The physical model of that curvilinear geometry, in addition to three spatial dimensions (x1, x2, x3) contains the path that imaginary light travels in time t, which makes the time coordinate (x4 = ict), so the shortest paths are not always the fastest. Such a case is described by the Brachistochrone curve in the picture on the right.

A ball that falls along a straight line, or along any blue line in the picture, reaches the other end more slowly than one that goes along the (middle) red Brachistochrone. Everywhere in physics, trajectories are consequences of the principle of least action, including here. It includes the shortest distances and the shortest durations, when the two are not in conflict. By the way, the general theory of relativity is also derivable from that very principle (Minimalism of Information, 2.5 Einstein’s general equations), that bodies strive to minimize the spent action (the product of changed energy and elapsed time).

4. Bodies move in the gravitational field so that they have the least action, which means interaction, that is, communication with the environment. The energy potential of the lower places of the field is smaller, so under conditions of equal time intervals the effect of the lower information is growing, to saving action, and during free fall the body moves in that direction. But there is also the law of conservation, and the body compensates for the lack of potential energy by increasing kinetic and speed. That's why movements along ellipses also occur, for example the planet around the sun according to Kepler's law (Surface).

Places of stronger gravity are relatively less informative. There, bodies are pursued by principled minimalism. The relatively slower flow of time, which is not perceived at the target place, is attractive, so the bodies circle tirelessly. What is not perceived from other places are pieces of information that are just added in other time dimensions, outside of relative perceptions. Thus, as if due to some hallucinations, bodies are launched into free fall and orbits. Constant differences, which are the essence of information theory, are the reason for the volatility of the coefficients of the matrix.

Convergence (Limes) now also has a simple explanation, the furthest reaches the highest eigenvalue (λ) which means more prominent certainties. Uniform probabilities create a higher density of information (Extremes), so states tend to be more certain and stick "like a drunken man on a fence" of the most certain ones — natural laws.

Energy II

Question: Energy is information?


Answer: Better to say, the change in energy multiplied by the elapsed time is information. There is no world without the flow of time, and there is no world without energy, so these two factors combined tell us about information.

How far-reaching is the principle of minimalism, that nature saves emissions of information, that by striving for states of smaller emissions, it successfully describes the worlds around us, we can see from Einstein's formula E = mc², or from a lump of sand whose grains do not reveal to our senses the least of the enormous complexity of their structure, if we know those structures.

However, energy and momentum are components of the 4-momentum vector of relativistic spacetime, say (p0, p1, p2, p3), where p0 = iE/c is followed by the momenta along the spatial coordinates (abscissa, ordinate and applicate). To distinguish it from the spatial momentum, the vector p = (p1, p2, p3), this 4-vector is denoted in a tensor way by pμ. Hence, the total momentum, i.e. energy, follows from the product of covariant and contravariant coordinates:

\[ \sum_{\mu=0}^3 p_\mu p_\mu = p_\mu p^\mu = -\frac{E^2}{c^2} + p_1^2 + p_2^2 + p_3^2 = -\frac{E^2}{c^2} + p^2. \]

Given that this is equal to the Lorentz scalar -m²c², we find

\[ E = \sqrt{(mc^2)^2 + (pc)^2}. \]
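The step from the Lorentz scalar to the energy formula can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `total_energy` and `invariant` are illustrative, and the electron mass is a rounded value chosen only as an example.

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact SI value)


def total_energy(m, p, c=C):
    """Total energy E = sqrt((m c^2)^2 + (p c)^2) for rest mass m and momentum magnitude p."""
    return math.hypot(m * c**2, p * c)


def invariant(E, p, c=C):
    """Lorentz scalar -E^2/c^2 + p^2, expected to equal -m^2 c^2."""
    return -(E / c) ** 2 + p**2


m_e = 9.1093837e-31  # electron rest mass in kg (rounded, illustrative)
p = m_e * C          # a momentum of order m*c, so that both terms matter

E = total_energy(m_e, p)
print(E / (m_e * C**2))                   # ratio of total energy to rest energy
print(invariant(E, p), -(m_e * C) ** 2)   # the two values should agree
```

For p = 0 the function reduces to the rest energy mc², and for m = 0 it gives E = pc, the two familiar limiting cases of the formula.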

It is an expression for total energy that is very accurate and therefore useful for physics in general. Along with it go the 4-vectors of coordinates (x0, x1, x2, x3), where x0 = ict and the rest are the spatial coordinates of the vector r = (x1, x2, x3), say along the x, y and z axes. If we denote this 4-vector by sμ, then:

\[ s_\mu s^\mu = -c^2t^2 + x_1^2 + x_2^2 + x_3^2 = r^2 - (ct)^2 \]

is the squared length of an "event" in 4-dim space-time.

In the macro world, elemental properties can be very well hidden, with new macro properties that are not easily recognized in the micro world. That's why we emphasize that these are small sizes, when we work with them, and which from the macro world can also be viewed as infinitesimals. Then we write the length of the event more typically as a space-time interval

ds² = dx² + dy² + dz² - c²dt²,

which is invariant (it does not change under coordinate transformations), because it is the product of covariant and contravariant components. Consistent with this, the 4-actions are Δpμ⋅Δxμ, individually Δpx⋅Δx, Δpy⋅Δy, Δpz⋅Δz and ΔE⋅Δt. These products are the equivalent of information.
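The invariance of the interval can be illustrated with a Lorentz boost along the x-axis. This is a small sketch under stated assumptions: units with c = 1, an arbitrarily chosen event and velocity, and the helper names `boost_x` and `interval2`, none of which come from the original text.

```python
import math


def boost_x(t, x, y, z, v, c=1.0):
    """Lorentz boost with velocity v along the x-axis; returns the transformed (t, x, y, z)."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    t_new = gamma * (t - v * x / c**2)
    x_new = gamma * (x - v * t)
    return t_new, x_new, y, z


def interval2(t, x, y, z, c=1.0):
    """Squared space-time interval x^2 + y^2 + z^2 - (c t)^2."""
    return x * x + y * y + z * z - (c * t) ** 2


event = (2.0, 1.0, 0.5, -0.3)        # (t, x, y, z), chosen arbitrarily
boosted = boost_x(*event, v=0.6)     # the same event seen from a frame moving at 0.6 c

print(interval2(*event), interval2(*boosted))  # the two values agree up to rounding
```

The coordinates t and x change under the boost, while the combination x² + y² + z² - c²t² does not.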

From the point of view of applying algebra (Aquarium) we also have an interesting outcome. As we know (Quantum Mechanics, (1.5), or (1.431)), the evolution of the state vector |ψ⟩ is described by the time-dependent Schrödinger equation

\[ \hat{H}|\psi\rangle = i\hbar\frac{\partial}{\partial t}|\psi\rangle. \]

It is easily translated into the algebraic form x' = Âx, where Â = Ĥ/(iℏ), with iℏ∂t the energy operator, which I wrote about recently (Harmonic), and the vector x = |ψ⟩. It is now clear why the eigenvalues (λ) and eigenvectors (v) of the algebraic equation (Âv = λv) are parts of the solution

\[ \textbf{x} = c_1\textbf{v}_1 e^{\lambda_1 t} + c_2 \textbf{v}_2 e^{\lambda_2 t} + ... + c_n\textbf{v}_n e^{\lambda_n t}, \]

because they represent a superposition of particle-wave quantum states. The summands are orthogonal vectors, observable quantum states, and the solution has physical meaning only if it is written in this form.
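The superposition formula above can be tried on the smallest non-trivial case. The sketch below is an illustration under loud assumptions, not the author's calculation: a hypothetical two-level Hamiltonian H = [[0, W], [W, 0]], units with ℏ = 1, and the matrix A = -iH read off from the Schrödinger equation. Its eigenpairs are written out by hand, and the coefficients c_k follow from projecting the initial state onto the orthonormal eigenvectors.

```python
import cmath
import math

W = 1.5  # hypothetical coupling strength, in units where hbar = 1

# Eigenpairs (lambda_k, v_k) of A = -i*H for H = [[0, W], [W, 0]].
# H has eigenvalues +W and -W, so A has eigenvalues -iW and +iW.
S = 1.0 / math.sqrt(2.0)
EIGENPAIRS = [
    (-1j * W, (S, S)),
    (+1j * W, (S, -S)),
]


def coefficients(x0):
    """c_k = <v_k|x0>; valid because the eigenvectors here are orthonormal."""
    return [
        sum(complex(vi).conjugate() * xi for vi, xi in zip(v, x0))
        for _, v in EIGENPAIRS
    ]


def state(t, x0):
    """x(t) = sum_k c_k v_k exp(lambda_k t), the superposition of eigen-solutions."""
    x = [0j, 0j]
    for c, (lam, v) in zip(coefficients(x0), EIGENPAIRS):
        phase = cmath.exp(lam * t)
        for i in range(2):
            x[i] += c * v[i] * phase
    return x


def norm2(x):
    """Total probability |x_1|^2 + |x_2|^2."""
    return sum(abs(a) ** 2 for a in x)


x = state(0.7, (1, 0))   # start in the first basis state
print(x, norm2(x))       # components oscillate, total probability stays 1
```

For this choice the sum collapses to x(t) = (cos Wt, -i sin Wt): each summand only rotates in phase, and the observable oscillation between the two basis states arises from their superposition.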

