﻿ Blog, December 2023

# December 2023

## Exponential II

Question: What kind of information increases with probability?

Answer: That's an astute question. Although I have answered similar questions before, I have never been asked this so directly. Hartley's information, -log p, is the larger the smaller the probability p is. If an event is known in advance and happens, it is not news.

A distribution consists of probabilities p1, p2, ..., pn that add up to one; it describes an experiment in which exactly one outcome is realized. Each outcome has some Hartley information, -log pk, and their mean is the Shannon information

$S = -p_1\log p_1 - p_2\log p_2 - \dots - p_n\log p_n.$

As n = 1, 2, 3, ... grows, there are more and more of these summands and the probabilities get smaller; as they decrease, the Hartley factors of the Shannon summands increase, but the summands themselves get smaller. When n → ∞, these probabilities become the densities of the distribution, pk → ρ(x), along a certain interval of possibilities (the x-axis). The Shannon summands take the form φ(x) = -ρ(x)⋅log ρ(x), the mean information density. Where the probability densities ρ(x) are smaller (strictly, once ρ(x) < 1/e), the information densities φ(x) will also be smaller. This can be seen in the graph above, which shows the exponential probability density in blue and the information density in red:

$\rho(x) = e^{-x}, \quad \varphi(x) = -\rho(x)\ln\rho(x).$
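As a minimal sketch, the two curves can be tabulated directly. The function names `rho` and `phi` and the sample points are my own choices for illustration, not part of the original text.

```python
import math

def rho(x: float) -> float:
    """Exponential probability density with rate 1: rho(x) = e^(-x)."""
    return math.exp(-x)

def phi(x: float) -> float:
    """Information density phi(x) = -rho(x) * ln rho(x)."""
    r = rho(x)
    return -r * math.log(r)

# Tabulate both densities at a few illustrative points.
for x in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"x={x:4.1f}  rho={rho(x):.5f}  phi={phi(x):.5f}")
```

For this density φ(x) = x e^{-x}, so the two curves fall together for x > 1, where ρ(x) < 1/e.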

Sums of a large number of independent and identically distributed random variables tend to a normal (Gaussian) distribution, whereas the exponential distribution usually describes random events related to the amount of time until a certain outcome. Exponential processes consist of continuous and independent events with a constant average rate, and in general they have the highest information of all distributions on the positive half-line with a given expectation.

When p ∈ (0, 1) is the constant probability that some outcome occurs in each trial (tossing a coin, rolling dice, drawing a lotto ball, etc.), that outcome will occur for the first time in exactly the nth trial with probability $q^{n-1}p$, where q = 1 - p is the probability that the desired outcome will not occur (in each trial). Given that we can always write $b^x = e^{-ax}$, where a is a positive real number determined by the probability p and the base e, any such process will have an exponential distribution. In general, these probability densities, with λ ∈ (0, e), have expectation (μ) and variance (σ²) respectively:

$\rho(x) = \lambda e^{-\lambda x}, \quad \int_0^\infty x\rho(x)\ dx = \frac{1}{\lambda}, \quad \int_0^\infty (x - \mu)^2\rho(x) \ dx = \frac{1}{\lambda^2}$

and the average (Shannon) information:

$\Phi = \int_0^\infty \varphi(x)\ dx = -\int_0^\infty \rho(x) \ln \rho(x)\ dx =$ $= -\int_0^\infty \lambda e^{-\lambda x} \ln (\lambda e^{-\lambda x})\ dx = - \int_0^\infty \lambda e^{-\lambda x} (\ln \lambda - \lambda x)\ dx$ $= -\ln \lambda \int_0^\infty \lambda e^{-\lambda x}\ dx + \int_0^\infty \lambda^2 x e^{-\lambda x}\ dx$ $= - \ln \lambda - \lambda\int_0^\infty x \ d(e^{-\lambda x}) = 1 - \ln \lambda.$
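The closed form Φ = 1 - ln λ can be checked numerically. The following is only a sketch: the function name, the midpoint rule, the truncation point and the sample values of λ are my assumptions, chosen for illustration.

```python
import math

def exponential_entropy(lam: float, upper: float = 60.0, steps: int = 200_000) -> float:
    """Midpoint-rule estimate of -integral(rho * ln rho) for rho(x) = lam * exp(-lam * x).

    The integral is truncated at x = upper / lam, where the tail is negligible.
    """
    b = upper / lam
    h = b / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        r = lam * math.exp(-lam * x)
        total -= r * math.log(r) * h
    return total

# Compare the numerical estimate with the closed form 1 - ln(lam).
for lam in (0.5, 1.0, 2.0):
    print(lam, exponential_entropy(lam), 1 - math.log(lam))
```

The two printed columns should agree to several decimal places for each λ.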

I note in particular, because it is not mentioned anywhere in the literature on Shannon's information (as of October 2023), that the number 1 - ln λ must be positive, as it is for the aforementioned λ ∈ (0, e). We take ρ(x) ∈ (0, 1), here for x > 0, and the density defines the probability

$\Pr(a < x < b) = \int_a^b \rho(x)\ dx, \quad (a, b) \subseteq (0, \infty),$

that the random variable x ∈ (a, b) will be realized.

The conclusion is that in the macro-world, the information of the mean values of many average (Shannon) information distributions increases with probability!

## Parameters

Question: Can you clarify a bit about the parameter of the exponential distribution?

Answer: We repeat the random experiment until the given outcome, of probability p ∈ (0, 1), occurs. The probability that the desired outcome occurs exactly on the nth trial is $p_n = q^{n-1}p$, where q = 1 - p is the probability that the desired outcome will not happen. We have thus defined a sequence of probabilities, in order:

p1, p2, ..., pn, ...

which represents one probability distribution.

Indeed, each of these probabilities is a positive number less than 1, and the sum of all of them is:

$p_1 + p_2 + \dots + p_n + \dots = p + qp + \dots + q^{n-1}p + \dots$

$= p(1 + q + \dots + q^{n-1} + \dots) = \frac{p}{1 - q} = \frac{p}{p} = 1.$

Therefore, we have a probability distribution.

We write it approximately in the exponential form $p_n = (1 - p)^{n-1}p \approx \lambda e^{-\lambda n}$, so that the sum of all terms is close enough to one, i.e. p1 + p2 + ... + pn + ... ≈ 1. Summing the geometric series of the terms $\lambda e^{-\lambda n}$ gives $\lambda/(e^\lambda - 1)$, so this condition entails $e^\lambda \approx \lambda + 1$, which is all the more accurate the closer λ is to 0. Therefore, λ cannot be a large number (not even close to e = 2.71828...) if we are to have a process of continuous and independent occurrences with a constant average rate, and exponential distribution density $\rho(x) = \lambda e^{-\lambda x}$.
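A short sketch of both sums follows. It assumes the exponential terms are λe^{-λn} for n = 1, 2, ..., whose total is λ/(e^λ - 1); the function names and sample values are mine.

```python
import math

def geometric_total(p: float, terms: int = 10_000) -> float:
    """Partial sum of p_n = q^(n-1) * p, which should approach 1."""
    q = 1.0 - p
    return sum(q ** (n - 1) * p for n in range(1, terms + 1))

def exponential_total(lam: float) -> float:
    """Closed form of the sum over n >= 1 of lam * exp(-lam * n): lam / (e^lam - 1)."""
    return lam / math.expm1(lam)

print("geometric:", geometric_total(0.2))
# The exponential total is close to 1 only when lam is close to 0.
for lam in (0.01, 0.1, 1.0, 2.0):
    print(lam, exponential_total(lam))
```

The total stays near one for small λ and drops well below one as λ approaches e.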

If we take too large a λ > 0, we make a series of rounding errors, from which the disputed Shannon information of the exponential distribution, 1 - ln λ, comes out less than zero, which is impossible.

## Benford

Question: If they are dual, can the probability be the logarithm of the information?

Answer: In a way, we can do that too. Let's note that we often confuse the name with what it signifies when working with abstractions. In the written notation of numbers and quantities, the first digits are not equally distributed, and this holds in various number bases.

For example, binary $1101_2$ has the value $1\cdot 2^3 + 1\cdot 2^2 + 0\cdot 2^1 + 1\cdot 2^0 = 13$ in decimal, which is also $1\cdot 10^1 + 3\cdot 10^0 = 1\cdot 9^1 + 4\cdot 9^0 = 14_9$ in bases 10 and 9. The largest two-digit decimal number, 99, has three digits in lower bases: $120_9 = 1\cdot 9^2 + 2\cdot 9^1 + 0\cdot 9^0$, $143_8 = 1\cdot 8^2 + 4\cdot 8^1 + 3\cdot 8^0$, and $201_7 = 2\cdot 7^2 + 0\cdot 7^1 + 1\cdot 7^0$, in bases 9, 8 and 7.
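The conversions can be reproduced with a few lines of code. This is a sketch with a helper name of my own (`to_base`), limited to bases 2 to 10.

```python
def to_base(n: int, base: int) -> str:
    """Return the digits of a non-negative integer n in the given base (2..10)."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, base)
        digits.append(str(r))
    return "".join(reversed(digits))

# 13 in bases 2 and 9, then 99 in bases 9, 8 and 7.
print(to_base(13, 2), to_base(13, 9))
print([to_base(99, b) for b in (9, 8, 7)])
```

Python's built-in `int(text, base)` performs the reverse conversion and can be used to check each result.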

In lower bases, the larger digits disappear and the smaller ones appear disproportionately more often, so we reasonably assume that something similar also happens in higher bases, and that only in much higher bases do the digits become far more evenly distributed. Canadian-American astronomer Simon Newcomb noticed this phenomenon in 1881 in logarithm tables: the earlier pages (which started with smaller digits) were more worn than the later ones. He proposed the formula p(c) = log(c + 1) - log(c) for the probability of the first digit c of a number.

The phenomenon was noticed again in 1938 by physicist Frank Benford, who tested it on data from 20 different domains. His data set included the surface areas of 335 rivers, the sizes of 3,259 US populations, 104 physical constants, 1,800 molecular weights, 5,000 entries from various mathematical handbooks, 308 numbers contained in an issue of Reader's Digest, the street addresses of the first 342 people listed in American Men of Science, and 418 death rates. The total number of observations used in his paper is 20,229. The discovery was later named Benford's law after him, and other confirmations and various analytical proofs followed.

Benford's law determines the statistics of the occurrence of a leading digit d, one of 1, 2, ..., b - 1 in the base of numbers b ≥ 2. It appears with probability

$p(d) = \log_b(d+1) - \log_b(d) = \log_b\left(1+\frac{1}{d}\right).$

So in the decimal system (b = 10) we have:

| d | p(d)    |
|---|---------|
| 1 | 0.30103 |
| 2 | 0.17609 |
| 3 | 0.12494 |
| 4 | 0.09691 |
| 5 | 0.07918 |
| 6 | 0.06695 |
| 7 | 0.05799 |
| 8 | 0.05115 |
| 9 | 0.04576 |
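The decimal values follow directly from the formula for p(d). In this sketch the function name `benford` is my own, and the `base` parameter mirrors b from the text.

```python
import math

def benford(d: int, base: int = 10) -> float:
    """Probability of leading digit d (1 <= d < base) under Benford's law."""
    return math.log(1 + 1 / d, base)

# Leading-digit probabilities in base 10, rounded to five decimals.
for d in range(1, 10):
    print(d, f"{benford(d):.5f}")

# The probabilities telescope to log_b(b) = 1.
print(sum(benford(d) for d in range(1, 10)))
```

Calling `benford(d, base=b)` for d = 1, ..., b - 1 gives the corresponding probabilities in another base.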

Microeconomist Hal Varian, in 1972 a PhD student at the University of California, Berkeley, noted that Benford's law can indicate possible fraud in socio-economic data submitted in support of public planning. In 2020, because of doubts at the time about the corona pandemic, I reported anomalies in health organization statistics to some people, and I admit, mostly in vain.

However, the probability formula p(d) is expressed through the logarithm of the first digit of the number (and likewise for the following digits in the sequence), which is the answer to the above question. Benford's law tells us about possible falsification in numerical reports, when the fraudster misses the natural order of the numbers, usually while aiming for a "realistic" picture. The very way numbers are written hides information about itself, whose logarithm is a probability, and which we are not trained to guess intuitively.
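A first-digit check of this kind might look like the sketch below. The two data sets are illustrative assumptions of mine: the powers of 2 stand in for "natural" data, and the evenly spread integers 100 to 999 stand in for numbers guessed to look "realistic".

```python
import math
from collections import Counter

def leading_digit(n: int) -> int:
    """First decimal digit of a positive integer."""
    return int(str(n)[0])

def first_digit_frequencies(values):
    """Relative frequency of each leading digit 1..9 in the given values."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

def max_gap(values):
    """Largest absolute difference between observed and Benford frequencies."""
    observed = first_digit_frequencies(values)
    return max(abs(observed[d] - math.log10(1 + 1 / d)) for d in range(1, 10))

powers = [2 ** k for k in range(1, 1001)]   # hypothetical "natural" data
evenly_spread = list(range(100, 1000))      # hypothetical "guessed" data

print("powers of 2:  ", max_gap(powers))
print("evenly spread:", max_gap(evenly_spread))
```

A large gap does not prove fraud; it only marks the data as worth a closer look.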

## Coincidence

Question: How else can we test the numbers?

Answer: Following the answer above (Benford), new questions arise about randomness testing. If we work to expose fraud, we first distinguish between two situations.

When the physical state of the measured phenomena could itself be "set up", with more or less falsification, before the digits are read, then some of the "randomness of numbers" tests will not help. An example would be election fraud where fake votes arrive in the mail before counting (perhaps US 46); applying Benford's first-digit statistics afterwards will then show no violation. The second situation is the attempt of the "fraudster" to guess by heart, by intuition, for example the exponential (red) curve in the picture on the right, or the (blue) Gaussian bell curve, whether or not he knows the distribution parameters (mean value μ and dispersion σ).

In connection with this, I received another interesting question, about predicting results from the constraint on first digits, that is, about the very freedom of choice framed by Benford's formula. First, I would answer with a counter-question: what if the distribution is uniform, with each number equally probable; wouldn't that also be a limitation of some kind? So let's note that a "narrowing of choices" always exists within the very nature of randomness; the question is only to what extent. Here we stick to the given topic.

The graph of a continuous uniform distribution is a line parallel to the abscissa, from point a to b, at the height h = 1/(b - a). All values of its random variable x ∈ (a, b) have equal chances, with probability density ρ(x) = h, mean value (the so-called mathematical expectation) μ = (a + b)/2, and variance σ² = (b - a)²/12. The discrete uniform distribution is similar. It is not difficult to understand that in guessing "random" numbers by heart, other values of these parameters (μ and σ) would appear, and this is the difference detected by the "randomness test".
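The two formulas for the uniform distribution can be checked by direct numerical integration. This is a sketch only: the function name, the midpoint rule and the interval (2, 8) are my assumptions.

```python
def uniform_moments(a: float, b: float, steps: int = 100_000):
    """Midpoint-rule mean and variance of the uniform density h = 1/(b - a) on (a, b)."""
    h = 1.0 / (b - a)
    dx = (b - a) / steps
    xs = [a + (i + 0.5) * dx for i in range(steps)]
    mean = sum(x * h * dx for x in xs)
    var = sum((x - mean) ** 2 * h * dx for x in xs)
    return mean, var

a, b = 2.0, 8.0  # illustrative interval
mean, var = uniform_moments(a, b)
print(mean, (a + b) / 2)        # numerical mean vs (a + b)/2
print(var, (b - a) ** 2 / 12)   # numerical variance vs (b - a)^2 / 12
```

A list of "guessed" numbers could be compared against the same two values to see how far its mean and spread drift from the uniform ones.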

The problems of fraudsters, who adjust the final results to their own needs, are often compounded by not knowing the type of distribution that would occur in the natural course of things. Compared with the parameters that would arise without their intervention, even their best guess of the expectation and dispersion of the wrong distribution would look suspicious and expose their failure. Already from the above three examples, the two graphs in the picture (exponential and Gaussian normal) and the uniform distribution, we see that different random processes need not have the same mean value μ or the same mean scatter σ around it (dispersion).

## Mystical

Question: How far can the "information of perception" throw into mysticism?

Answer: When you look at the clouds long enough, you can see any face you expected in advance. If you roll the dice long enough, any previously chosen number will come up. It's real magic. It is nature's ability to have produced us, to whom lies are accessible and who can believe a deception to be the truth.

A special side of the mystery would be the stupid theory, the special one may be correct, the next is mathematics, the practice is equally distant. Space is a great mystery, spatially or temporally as well as theoretically. In other words, what we think we know, and are only just finding out, is the tip of the iceberg, and it could also be very wrong. But even if we live in delusion, we are still doing well.

Our most successful logic taken by itself is an incredibly valid fantasy. Pure mathematics is, in its own way, the poetry of logical ideas (Einstein). Nature seems to be written in its language, and some believe that all that exists is that language itself. Why should the areas of mysticism be inaccessible to its abstract tools — is the right question. Especially when we consider information as the structure of space, time and matter, and uncertainty as its essence. However, the further we go it seems that we are further away from the mysticism we expect.

Information theory, with Hartley (1928), Shannon (1948) and others, was the first to step into this zone in a mathematically consistent way. What could expand that area, I believe, is the information of perception. It includes parts of the inanimate and living world, physical substances with the wonder of biology, and the methods of algebra. It is the first to take into account freedom of choice as an excess in the "quantity of possibilities", more precisely in the form of "information of perception". It gives vitality to substances, and thus the possibility of physically leaving the trajectories of the principle of least action of theoretical physics.

It is important to note that, with the transition to the macro-world, the information of the mean values of a set of average information distributions, such as Shannon's, increases with probability (Exponential II). The consequence is that the "amount of possibility" expressed by the information of perception (Amount) in the quantum world works with probabilities, states of the quantum system, while in classical technology communication works with Markov chains. At the level of normal quantities, we write the same as the scalar product of the vector of probabilities and information, almost equally.

Such information of perception, in "our" macro-world (complex macro-systems), we interpret as the amount of freedom. It is the freedom possessed by the inanimate world of physical substance plus any liveliness, i.e. vitality. It is therefore a convenient measure of situations arising from vitality, such as games, hence the economy, and then the character of a person. However, it is not the path into "mysticism" that is widely expected. I do not believe that the known world of the occult will appear with the mathematics of information that is "under the radar" of physical action.

We are in new and unexplored fields of science, so it follows from the very objectivity of uncertainty in this theory that there will be surprises along the way. As was said at the beginning, the magic is also in the very ability of uncertainty to let us, with enough expectation, see almost any pre-imagined figure in the clouds.

## Non-physical

Question: What is non-physical information?

Answer: There are at least two types of non-physical phenomena, according to this information theory. The assumption is that all of them are also informational structures of some kind.

The first one was mentioned in the previous answer (Mystical), as one of those unrealized options when rolling the dice. If, for example, the number "three" fell, then the number "six" did not fall and we consider it to have remained in some "pseudo-reality" (Dimensions). That non-reality, it can be shown, has its own rules of the game, its own (unreal) space, time and matter, which forms a 6-dim space-time with this (reality).

I hope it is easy to understand that, within the assumed theory, a pseudo-physics of that pseudo-real world of unrealized outcomes of random events has a validity analogous to that of physics. There are some (real) physical phenomena that could confirm such a thesis. Perhaps we already have them in the phenomenon of bypassing, or quantum tunneling (Solenoid).

I mentioned the second one recently, in the answer about half-truths (Half Truths). This would be information with insufficient action to excite dead physical substance, such as photons with energy "under the radar" of the substrate's electrons, whose arrival does not dislodge them. Starting from universal information, we would have to arrive at such phenomena, which, after all, we experience as everyday thoughts, fantasies, ideas, and abstractions about "worlds" that are not real.

Moreover, this kind of information theory aspires to interpret untruths as well, now as a third kind, as diluted information (The Truth). In this sense, tautology (a statement that is always true) and contradiction (always false) are two halves of a wider, mixed "world of truth and falsehood". Physical reality is only one part of such an extended world, and systems with vitality (living things) understand more than that.

The information of the past (Spreading) is a special story about fictions, which in its own way really reaches the present, maintaining the overall real information in the process of its decline over time. Mathematical logic, the structure of zero uncertainty, should also be part of a similar world (Theorems). In the end, I believe, this is all just another tip of the iceberg of non-physical information.

## Constancy

Question: Why do natural phenomena strive for permanence?

Answer: The deepest cause of natural occurrences is their randomness, and next comes the persistence in realizing more likely outcomes more often.

This is the information theory position (mine) with which I consistently answer questions. It implies some "conservation law" of randomness, from which constancy follows. The same follows immediately for the logarithms of those probabilities, and consequently for Hartley's information.

Because the summand -p log p decreases with decreasing probability p (for p below 1/e), despite the accompanying increase of Hartley's information -log p, the sum that gives the mean value of the corresponding distribution, the so-called Shannon information

$S = -p_1\log p_1 - p_2\log p_2 - \dots - p_n\log p_n,$

will have its largest summands where the probabilities are highest. The approximation of Shannon's information by the probabilities will be the more accurate the greater the number n ∈ ℕ of possibilities of a given distribution. If there are infinitely many outcomes, then with probability one only finitely many of them will be realized, and the infinitely many others are irrelevant to the outcome (Borel–Cantelli lemma).
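The ordering of the summands can be seen on a small example. The distribution below is hypothetical, chosen by me so that every probability is below 1/e, and the function name is likewise my own.

```python
import math

def shannon_terms(probs):
    """Summands -p * log2(p) of Shannon's information, one per outcome."""
    return [-p * math.log2(p) for p in probs]

probs = [0.30, 0.25, 0.20, 0.15, 0.10]  # hypothetical distribution, all p < 1/e
terms = shannon_terms(probs)

# Hartley information grows as p falls, while the Shannon summand shrinks.
for p, t in zip(probs, terms):
    print(f"p={p:.2f}  -log2 p={-math.log2(p):.4f}  -p log2 p={t:.4f}")
print("S =", sum(terms))
```

With probabilities above 1/e the ordering can reverse, since -p log p reaches its maximum at p = 1/e.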

In the macro-world, and in general in situations where many outcomes are realized at once (in a short interval of time), Shannon's information takes the form of the sums of the information of perception, which we can then speak of as products of probabilities (of the subject and the object of communication). The continuities of chance are thus transferred to that same information.

That is why all the trajectories of theoretical physics known today can be derived from the Euler-Lagrange equations, and these from the principle of least action. You will find how the gravity equations are derived from them in the book Minimalism of Information (2.5 Einstein's general equations). Such regularities can therefore be derived from the law of chance, and then from informatic minimalism, if they have not been already.

Uniqueness has the same root. It does not only mean that each subject differs in relation to the objects with which it can communicate, which it observes and which observe it; from the above we also see the constancy of natural phenomena for each subject, regardless of the differences in the conditions and processes it might observe. That different subjects see differences in the same truths, as demonstrated by the previous image, does not mean that any of those views gives up the described constancy.

Question: Everything is a trade, can you comment on that saying?

Answer: Trading is the buying and selling of securities, even goods for services, or money and other means of exchange.

Looking deeper, we are talking about the types of communication, the exchange of information that is immanent in everything we "trade". Then there is the law of conservation during the process.

In this sense, the trader's classic gain will be accompanied by some loss that is less visible or less important than the acquired profit. From this follows the relativity of evaluating the "amount of options", that is, the information being exchanged. For example, a politician trades for his own benefit (to avoid sanctions or a trial, or simply to earn something personally), a relatively big deal for himself, while ceding some state assets to a cunning bidder.

From the point of view of information theory, giving more for less means losing vitality (amount of options, power), so the politician who then profits personally at the expense of the state gains locally but loses globally. By reducing the power of the state, he undermines his own position for the sake of narrow comfort. Conversely, by giving less for more, the negotiator can get both, but the personal "income" is usually so delayed that it is unattractive to many.

What I'm talking about here is the Pareto rule, or a similar relationship of personality types (Traits), in which the number of the "good" stands to the number of the others as, among those others, the number of "manipulators" stands to the number of the "evil" (Suppression). Those who manage to delay profit generally rank higher in the competition to win than the impatient ones (Reciprocity). That is why the number of "impatient" (bad) trades in politics is significantly higher than the number of more useful ones, but it is also why it pays more to support the strongest players who can master this, and the states (Deep State).

Regardless of (my) information theory, simulations have long shown that the "surely" game, by combining only good with good, will almost always lose to the occasional "sacrifice" game, by risky giving less for more. Thus, they are Markale typically Anglo-Saxon maneuver. Now everyone knows that Markale and the explosion in Vase Miskina Street (February 5, 1994, when 68 people were killed and 144 wounded, and another attack on August 28, 1995, when 37 people were killed and 90 were wounded) were a set-up, which Aliya made a profit.

The "sacrifice to victory" strategy owes its success to the mentioned relativity of values, and on the other hand to "constancy". The latter are deviations from good habits developed through evolution. Other "forces of probability" are added to the inevitable changes, which constantly occur in the environment, finding the subject somewhat unadjusted, to the power of "risk". What we are comfortable with, in other words, is rare and profitable.

Trade in the narrower sense (exchange of goods, money and services) is also subject to change. The manufacturer and the seller are constantly obliged to innovate in order to maintain and increase their level of profit. With the same goal, they avoid the "surely game" and take loans (risk for profit), and they see "patience" as a value but do not always manage to practice it, because top-league vitality belongs to the few. The rarest ones, with both vitality and morality, who can win and be honest, we recognize as charismatic.

## Autonomy

Question: How many degrees of freedom does "information" have?

Answer: Elementary "news" should have only two "arms", one in the present and the other in the future. It is something 2-dim, like a sphere into which a thinning object that collapses into a "black hole" is stretched, due to the shortening of radial lengths and the slower flow of time under the influence of its increasingly powerful gravity.

Those are already two arguments for only two degrees of freedom of elementary, let's say physical, information. The third is the Heisenberg effect, the product of the uncertainties of time Δt and energy ΔE, or of position Δx and momentum Δp, where ΔtΔE ≈ ΔxΔp ≈ h, written approximately for comparison with the Planck quantum of energy E = hf, where f = 1/τ is the oscillation frequency of light, τ is the period of oscillation, and h is the elementary action, Planck's constant. In those packages, quanta, are the smallest amounts of physical information.

Hence the value of the commutator as an interpreter of the surface in the understanding of Kepler's second law, which says that the radius vector from the Sun to the planet in equal times covers equal surfaces — by "force of probability".

A different insight into the two-dimensionality of least uncertainty gives us Chebyshev's inequality

Pr{|X - μ| ≥ ε} ≤ σ²/ε²

where X is the random variable, μ its mean value, σ its dispersion and σ² its variance, and ε is the assumed deviation of the random variable from its mean value. When the distribution of a random variable is spread over a larger set of values, a smaller number of members remain relevant to the mean (Borel–Cantelli lemma), so that the product of the probability of that deviation (|X - μ| ≥ ε) and the square of the deviation (ε²) stays bounded by the constant σ². We interpret this as the two-dimensional randomness of the deviation.
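Chebyshev's inequality can be verified exactly on a small example. The fair six-sided die and the chosen values of ε are illustrative assumptions of mine; exact fractions avoid rounding questions.

```python
from fractions import Fraction

outcomes = range(1, 7)          # fair six-sided die (illustrative choice)
p = Fraction(1, 6)              # probability of each face
mu = sum(p * x for x in outcomes)
var = sum(p * (x - mu) ** 2 for x in outcomes)

def tail(eps):
    """Exact Pr{|X - mu| >= eps} for the die."""
    return sum(p for x in outcomes if abs(x - mu) >= eps)

print("mu =", mu, " variance =", var)
# Compare the exact tail probability with the Chebyshev bound var / eps^2.
for eps in (Fraction(3, 2), Fraction(2), Fraction(5, 2)):
    bound = var / eps ** 2
    print(eps, tail(eps), bound, tail(eps) <= bound)
```

In each row the product of the tail probability and ε² stays at or below the variance, which is the content of the inequality.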

As the system grows, with the accumulation of random events and distributions, its smallest parts become less and less visible, so that the phenomenon is properly called the law of large numbers of probability theory.

## Deceit

Question: Deceptions are "non-physical" information?

Answer: Imagine yourself in the company of a murderer who hides a crime thinking that you are not a witness, that you are uninformed about it. You pretend to be naive by deceiving him, because otherwise he would have to kill you.

It's a deception-deception situation, like a game of "reciprocity", at the level of League I strategy. None of these lies affect physical substance directly, although their omission can drastically alter reality. It is a kind of deception that does not sit in the "front row" of natural performances, but "screams" from the background that it also participates in physical reality.

Deceptions do not have to have such drastic consequences, like the triggers of chaos theory or the cumulative effects of "butterflies in Mexico whose wing flaps would cause a hurricane in Texas", in the words of Lorenz (Edward Lorenz, 1917-2008, American mathematician and meteorologist). We can be aware of those lies that will not cause effects on dead physical matter, and those, if they could, belong to some pseudo-physical reality.

The theory of information that I am working on aims to include all of these.

## Occam's razor

Question: Do physical laws apply to illusions?

Answer: It is possible that "deceit" is somewhat subject to physical laws. As for truths, given that inanimate reality holds on to them like a "drunken man clinging to a fence", and given on the other side the informational structure of the world, the abstractions that would be "true" would have their analogues in theoretical physics.

For example: "of two theories that make the same prediction, choose the simpler one" (Occam's Razor, 14th century) — would be the "principle of least action" of physics. To avoid confusion, we know that even in the real world it is not always possible to get from point A to point B by the shortest route, so let's not expect that in the unreal one either. The main example, of course, is the connection between mathematics and theoretical physics.

The origin of abstract truths is a new topic, and perhaps it will be seriously raised for the first time in information theory. My proposal (Theorems) is to consider the existence of "everything" as longer than 13.8 billion years, where the "Big Bang", which we take as the beginning of the cosmos, is the boundary of the "black box", the farthest moment of visible data from the past of the physical universe.

A history of the universe much longer than those 13.8 billion years would allow the principle of economy (minimalism) of information to give rise to mathematical truths, among other certainties. That idea has no problem with logic (mathematical, of course); it will have one with measurements, but they will not be able to seriously threaten it. There are no mathematical theorems that can be proved or disproved by experiments.

For example, the adherence of all "other events" (less certain) to those "laws" (certainties) is also a consequence of the mentioned principle of minimalism, i.e. more frequent occurrence of more probable outcomes, from which we can also derive physical forces. However, the worlds of "illusions" are infinitely greater than the worlds of "truths themselves", even thus supplemented.

## Perception

Question: What is the "information of perception"?

Answer: A full answer to this question would be a long elaboration, too technical for the interlocutor, so I have extracted and reinforced a few details that were not outlined before.

Information of perception is first a sum of products. Then, it is a number that measures the intensity and overlap of two vectors

$Q = a_1b_1 + a_2b_2 + \dots + a_nb_n = ab\cos\varphi$

which is larger the "longer" the vectors are (the higher the intensities a and b) and the smaller the angle (φ) between them. According to the Cauchy–Schwarz inequality, which is quite obvious from the above equality (Q ≤ ab), Q will be a probability if the vectors are probability distributions.
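As a minimal sketch, the sum of products and the bound Q ≤ ab can be computed for two small vectors. The vectors below are hypothetical probability distributions of my choosing, and the function names are mine.

```python
import math

def perception(a, b):
    """Sum of products Q = a1*b1 + ... + an*bn of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    """Intensity (Euclidean length) of a vector."""
    return math.sqrt(perception(v, v))

a = [0.5, 0.3, 0.2]   # hypothetical probability distribution
b = [0.2, 0.3, 0.5]   # hypothetical probability distribution

Q = perception(a, b)
cos_phi = Q / (norm(a) * norm(b))
print("Q =", Q)
print("a*b =", norm(a) * norm(b))
print("cos(phi) =", cos_phi)
```

Because both vectors have non-negative components that sum to one, Q lands between 0 and 1 here, and it is largest when the two distributions coincide on a single outcome.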

In Hermitian vector spaces (quantum mechanics) all such sums of products, which are scalar products of vectors, are real, while their coefficients can be complex numbers. Therefore, we can also treat the squares of the moduli of the coefficients, $|a_k|^2$ and $|b_k|^2$, as probabilities, and the vectors themselves (in quantum mechanics they are called superpositions) as probability distributions. Therefore, Q is a probability observable. That information of perception (Q) is thus the value of the chance that in the environment $(b_1, b_2, \dots, b_n)$ the particle-wave $(a_1, a_2, \dots, a_n)$ is measured. Different choices of the members of the sequences will give different measurement situations, but will also explain quantum entanglement (Summary).

The states of the quantum system are defined by the Hermitian space vector with its components. They can be quantum numbers (for electrons they are: principal quantum number, orbital, magnetic and spin quantum number) that can be used to describe the path and movement of electrons in an atom. Combined, they must be aligned with the Schrödinger equation.

We interpret the Hermitian space of vectors as a quantum system, vectors are quantum states with Hermitian operators as quantum processes. When we multiply processes, processes are created, products of processes and states are states, products of states are probabilities, and all these are information of perception.

This quantum-mechanical interpretation of probabilities, somewhat unusual for mathematics, is also consistent with the understanding of the force of probability in (my) information theory. However, classical probabilities in information theory have their own interpretations of "information of perception", once one recognizes them (see Transformations et seq.). It's mostly a theory about Markov chains. The novelty here will be the observation of the differences between Markov processes that become "black boxes" and those that preserve information by means of "rotations", which I have already written about.

The novelty is also the observation of the fusion of probability and information in the macro-world, which gives the information of perception a new quality. From the information about the measurement of the quantum state, it thus reaches the "quantity of options" in the world of our sizes via transmission through the Markov channel, so it is also a measure of vitality, or especially, the strength and skill of the game to win. We do not have to be aware of all our perceptions measured in this way, while we possess them. Moreover, we can perceive the same object differently at different times, like, for example, a vase or the two faces in the picture above.

## Volume

Question: Is all information two-dimensional?

Answer: No, not at all; where did such a thought come from, perhaps from the two degrees of freedom (Autonomy)? If small particles are not large, or not at all like large bodies, then complex bodies are not exactly equal to them, because they would then be the same particles.

Binary choices, indicators that something did or did not happen, the so-called "elementary news", would have only two "legs": the first is the probability p1 ∈ (0, 1) of occurrence of the given outcome and the other is the probability p2 = 1 - p1 of its non-occurrence, or one in the present and the other in the pseudo-present. The vector p = (p1, p2) represents an elementary probability distribution.

We write the transfer (q = Ap) of this probability distribution as a matrix

$\begin{pmatrix} q_1 \\ q_2 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \end{pmatrix}.$

Here q = (q1, q2) is also a probability distribution. We assume that the matrix A = (aij), with indices i, j ∈ {1, 2}, is stochastic, that is, both of its columns a.1 = (a11, a21) and a.2 = (a12, a22) are distributions, so we also denote the same matrix by A[a.1, a.2]. The determinant of a matrix is its "volume", a measure of the extent spanned by the array of vectors that the matrix maps.
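The transfer q = Ap can be tried out numerically. The following is a minimal sketch, assuming illustrative column distributions (3/4, 1/4) and (1/5, 4/5) and an input p = (1/3, 2/3); these numbers and the helper names `transfer` and `is_distribution` are hypothetical and do not come from the text.

```python
from fractions import Fraction as F

def transfer(A, p):
    """Apply a 2x2 column-stochastic matrix A to a distribution p (q = A p)."""
    return tuple(sum(A[i][j] * p[j] for j in range(2)) for i in range(2))

def is_distribution(v):
    """True when all components are non-negative and sum to one."""
    return all(x >= 0 for x in v) and sum(v) == 1

# Assumed example values: the columns (3/4, 1/4) and (1/5, 4/5) are distributions.
A = [[F(3, 4), F(1, 5)],
     [F(1, 4), F(4, 5)]]
p = (F(1, 3), F(2, 3))

q = transfer(A, p)
print(q, is_distribution(q))
```

Exact fractions are used so that the check "the components sum to one" is not blurred by floating-point rounding.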

In the case of a second-order matrix, we have a determinant:

$\det \begin{pmatrix} A_x & B_x \\ A_y & B_y \end{pmatrix} = \begin{vmatrix} A_x & B_x \\ A_y & B_y \end{vmatrix} = A_xB_y - A_yB_x = [A, B],$

and it is a commutator that we interpret as a surface area. The determinant of a third-order matrix (which maps three-component arrays, vectors) is a 3-dim volume, and the determinant of an n-th order matrix (which maps ordered n-tuples) is an n-dim "volume", where n is a natural number from the set ℕ = {1, 2, 3, ...}. When n = 1, the matrix has a single entry, it is a number, a scalar, and so is its determinant.
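A short sketch of the determinant read as a signed area and a signed volume follows. The vectors are illustrative assumptions, and `det2` and `det3` are hypothetical helper names.

```python
def det2(A, B):
    """Bracket [A, B] = Ax*By - Ay*Bx: signed area of the parallelogram on A and B."""
    return A[0] * B[1] - A[1] * B[0]

def det3(A, B, C):
    """Signed volume of the parallelepiped on A, B, C (cofactor expansion)."""
    return (A[0] * (B[1] * C[2] - B[2] * C[1])
            - A[1] * (B[0] * C[2] - B[2] * C[0])
            + A[2] * (B[0] * C[1] - B[1] * C[0]))

area = det2((3, 0), (1, 2))        # base 3, height 2
swapped = det2((1, 2), (3, 0))     # swapping the vectors flips the sign
volume = det3((1, 0, 0), (0, 2, 0), (0, 0, 3))  # box with sides 1, 2, 3

print(area, swapped, volume)
```

The sign change under swapping is what makes the bracket behave like a commutator: [A, B] = -[B, A].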

Accepting the logic of this n-dim information inevitably leads to an expansion of the term "perception information" (Perception). This multiplication of matrices, which can be interpreted as information, already exists in quantum mechanics when, for example, we interpret position and momentum as operators whose product is an action. When we multiply processes, processes arise; products of processes and states are states; products of states are probabilities; and all these components are information of perception.

The components of vectors or matrices come from scalars Φ, which are usually the real ℝ or complex ℂ numbers with a commutative multiplication operation, but they do not have to be. Commutativity of multiplication is not necessary for that field structure, so regular (invertible, square) matrices could also be used as scalars of vector spaces, although their multiplication is not commutative. Along with those cases of non-commutativity of processes come their relations of uncertainty, otherwise necessary for vitality.

When the scalars in the sum of products are non-commutative matrices:

Q' = a1b1 + a2b2 + ... + anbn,

Q'' = b1a1 + b2a2 + ... + bnan,

then in general Q' ≠ Q'', which means that the perception of the second subject by the first and the perception of the first by the second are not necessarily equal. That formalism, therefore, will have intuitively acceptable interpretations, and this possibility comes precisely from the existence of information with dimensions greater than two.

We see from the last two expressions that

Q = Q' - Q'' = [a1, b1] + [a2, b2] + ... + [an, bn],

where the individual commutators [ak, bk] = akbk - bkak are matrices of order n, that is, information or actions. Such a Q is again some information of perception and, on the other hand, it is a process.
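The relation between Q', Q'' and the commutators can be checked on small matrices. This is only a sketch with two assumed pairs of 2×2 integer matrices (n = 2 terms); the matrices and the helper names are hypothetical.

```python
def matmul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def commutator(X, Y):
    """[X, Y] = XY - YX."""
    return sub(matmul(X, Y), matmul(Y, X))

# Assumed example "scalars": two pairs of non-commuting matrices.
a = [[[0, 1], [0, 0]], [[1, 0], [0, -1]]]
b = [[[0, 0], [1, 0]], [[0, 1], [1, 0]]]

zero = [[0, 0], [0, 0]]
Q_prime, Q_double, comm_sum = zero, zero, zero
for ak, bk in zip(a, b):
    Q_prime = add(Q_prime, matmul(ak, bk))       # a1b1 + a2b2
    Q_double = add(Q_double, matmul(bk, ak))     # b1a1 + b2a2
    comm_sum = add(comm_sum, commutator(ak, bk)) # [a1, b1] + [a2, b2]

Q = sub(Q_prime, Q_double)
print(Q_prime, Q_double, Q)
```

For these inputs Q' and Q'' differ, and their difference coincides with the sum of the commutators term by term.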

## Far Before

Question: How do you imagine the "time before time"?

Answer: What I am going to say about the history "before the Big Bang" probably does not concern any of the famous fantasy theories, but only the concepts of (this) information theory.

Timelessness, or all-time events, would extend indefinitely through the past of the given present and, given the limitations of n-dim volume, would stretch like very thin threads that could be of infinite length. As information builds structures of physical action, that is, of energy changes over time, the energies of those threads would be infinite, unless they are under-the-radar types of action.

Infinities and other logical knowledge (mathematics) are necessary parts of this theory, and the way to make such durations possible has just been stated. Time is a factor of change, so we see the greater complexity of logical structures in their emergence, that is, in the addition of axioms one by one. Thus, topology (geometry without metrics) could appear earlier, and absolute geometry (without the 5th Euclidean postulate) before the geometries of Euclid, Lobachevsky, or Riemann. Each mathematical structure is enriched by the addition of new independent rules, with a reason described by evolution.

The path length x = ict is traveled by light at speed c in time t, where i² = -1. It is an imaginary number if time is real, and vice versa. It gives meaning to bypassing and secrecy in processes and in spatial distribution. The appearance of space was preceded by bosons (all of which can fit into a single one), as we know, so that fermions were created from them (the Higgs bosons). The development of the universe then took place in the formation of substance and the separation (formation) of the laws of physics, simultaneously with the increasingly important decomposition (melting) of matter into space, from which the expansion of the universe follows.

I have already described this finale, from the Big Bang to the universe today, and I will not go into detail now, but only mention it to emphasize the logical connection of the part with the whole, and of all of it together with the (hypo)theses of information theory. Growing and more numerous obligations make the future more mandatory, the present thinner, information rarer, and uncertainty less and less. The dregs of the past focus what could come next.

## Chance Matrix

Question: Can you interpret probability with a matrix?

Answer: Yes, it is logical to have a probabilistic interpretation for matrices, when we have it in the expression of pairs of products of perception information (Volume). I am not referring only to stochastic matrices (6.2 and further), but also to the "intensities" of matrices, their determinants.

I will demonstrate it here with an easier example, the second order stochastic matrix, where a, b ∈ (0, 1), with a' = 1 - a and b' = 1 - b

$M = \begin{pmatrix} a & b' \\ a' & b \end{pmatrix}.$

The numbers in the columns are the (conditional) probabilities of two independent distributions:

a : x → y,     b : x' → y'.

The matrix transfers (any) second-order (two-component) distribution u = (x, x') to a second-order distribution v = (y, y'), so that v = Mu, which we also write M : u → v. The starting distribution can be given by flipping a fair coin (x = 1/2, x' = 1/2), by rolling a die (x = 1/6, x' = 5/6), or by a similar yes-no outcome (Indicators).
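As a worked instance of v = Mu, take the values a = 7/10 and b = 6/10 (an assumption made only for this illustration) together with the dice distribution u = (1/6, 5/6):

$v = Mu = \begin{pmatrix} \frac{7}{10} & \frac{4}{10} \\ \frac{3}{10} & \frac{6}{10} \end{pmatrix} \begin{pmatrix} \frac16 \\ \frac56 \end{pmatrix} = \begin{pmatrix} \frac{7}{60} + \frac{20}{60} \\ \frac{3}{60} + \frac{30}{60} \end{pmatrix} = \begin{pmatrix} \frac{9}{20} \\ \frac{11}{20} \end{pmatrix}.$

The components of v are non-negative and sum to one, so v is again a distribution.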

A stochastic matrix has a stochastic inverse only when it is obtained by permuting the columns (or rows) of the unit matrix (Informatic Theory I, 07. Channel matrix, Theorem 1). This does not claim that there is no inverse M⁻¹ such that MM⁻¹ = I, but that M⁻¹ is not stochastic, except when M is the result of permuting the columns of the unit matrix I. For example

$MM^{-1} = \begin{pmatrix} \frac{7}{10} & \frac{4}{10} \\ \frac{3}{10} & \frac{6}{10} \end{pmatrix} \begin{pmatrix} 2 & -\frac43 \\ -1 & \frac73 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

and we see that the inverse matrix exists but is not stochastic.
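The example can be reproduced with exact arithmetic. The sketch below inverts the same matrix M with the standard 2×2 adjugate formula and tests stochasticity; the helper names and the permutation matrix P are assumptions added for the illustration.

```python
from fractions import Fraction as F

def inverse2(M):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def is_column_stochastic(M):
    """Non-negative entries and each column summing to one."""
    non_negative = all(x >= 0 for row in M for x in row)
    columns_ok = all(M[0][j] + M[1][j] == 1 for j in range(2))
    return non_negative and columns_ok

M = [[F(7, 10), F(4, 10)],
     [F(3, 10), F(6, 10)]]
M_inv = inverse2(M)

# A permutation of the unit matrix, for comparison (assumed example).
P = [[F(0), F(1)],
     [F(1), F(0)]]

print(M_inv, is_column_stochastic(M_inv), is_column_stochastic(inverse2(P)))
```

The columns of the inverse still sum to one, but the negative entries disqualify it as a stochastic matrix, while the permutation matrix is its own stochastic inverse.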

The following theorem (Channel) states that the determinant of a matrix is nonzero when each of its diagonal elements is greater than the sum of the rest of that row. Then it is possible to reconstruct the sent message from the received one (even without a stochastic inverse matrix). Thus, when every k-th message is more likely to be transmitted into the k-th than into any other, |det M| is a larger number.

However, this determinant cannot be greater than one. For example:

det M = ab - a'b' = ab - (1 - a)(1 - b) = a + b - 1 ≤ 1,

for the specified matrices of the second order. In the general case, let us refer to the eigenvalues of the matrix, whose sum is the trace, that is, the sum of the diagonal elements. In the stochastic case no eigenvalue is greater than 1 in absolute value. The determinant is the product of the characteristic values (eigenvalues) of the (stochastic) matrix, so its absolute value cannot be greater than one.

In other words, the absolute value of the determinant of a stochastic matrix is a number from the interval [0, 1], which means that it takes probability values. That number is greater the more accurate, the more certain, the transfer is; on top of that, the determinant of the n-th order is the n-dim volume of the matrix (Volume), which completes the meaning.
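A quick numerical check of the second-order formula follows, as a sketch over an assumed grid of values a, b ∈ {1/10, ..., 9/10}. Since a + b - 1 is negative whenever a + b < 1, the bound is tested on the absolute value; `det_stochastic2` is a hypothetical helper name.

```python
from fractions import Fraction as F

def det_stochastic2(a, b):
    """Determinant of M = [[a, 1 - b], [1 - a, b]]."""
    return a * b - (1 - a) * (1 - b)

grid = [F(k, 10) for k in range(1, 10)]
checks = [(a, b, det_stochastic2(a, b)) for a in grid for b in grid]

# det M = a + b - 1 on the whole grid, and |det M| never exceeds one.
all_match = all(d == a + b - 1 for a, b, d in checks)
all_bounded = all(abs(d) <= 1 for _, _, d in checks)

accurate = det_stochastic2(F(9, 10), F(9, 10))   # nearly certain transfer
noisy = det_stochastic2(F(6, 10), F(5, 10))      # nearly random transfer
print(all_match, all_bounded, accurate, noisy)
```

The more certain transfer (a and b close to one) gives the larger determinant, in line with the reading of the determinant as a measure of how accurate the channel is.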

In this sense, the matrices by which we would represent sums of products as perception information have a probability meaning regardless of the order of the matrix. For n = 1 this was obvious to us before, and for n > 1 we will understand the same once we notice that macroscopic, intermediate information tends towards more probable outcomes (Exponential II).

## Quantum Matrix

Question: Is the quantum matrix also a probability?

Answer: It is, analogously to the expression of pairs of products of perception information (Volume), which means in the manner of probability in quantum mechanics (Born rule).

First, I will explain a bit about the Born rule with the help of the picture on the right. I will then translate that into Dirac notation and the actual quantum-mechanical explanation.

1. A vector is given

$\overrightarrow{OA} = \vec{a} = (a_x, a_y, a_z)$

in the Cartesian rectangular coordinate system. Let the angles between the vector and the directions of the coordinate axes be:

$\alpha_x = \angle(\vec{a}, \vec{x}), \quad \alpha_y = \angle(\vec{a}, \vec{y}), \quad \alpha_z = \angle(\vec{a}, \vec{z}).$

Then the sum of the squares of the cosines of these angles is

$\cos^2 \alpha_x + \cos^2 \alpha_y + \cos^2 \alpha_z = 1.$

We know that from elementary mathematics. As in the 3-dim coordinate system, the same is true in the n-dim system where we have n ∈ ℕ coordinate axes and as many angles α1, α2, ..., αn of the slope of the vector to the axes. The sum of the squares of the cosines of these angles is again one. □
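The identity is easy to confirm numerically in any dimension. This is a minimal sketch with two assumed vectors, one in 3 and one in 5 dimensions; `direction_cosines` is a hypothetical helper name.

```python
import math

def direction_cosines(v):
    """Cosines of the angles between v and the coordinate axes."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

cos3 = direction_cosines((1.0, 2.0, 2.0))             # length 3
cos5 = direction_cosines((3.0, -1.0, 4.0, 1.0, 5.0))  # any nonzero vector works

total3 = sum(c * c for c in cos3)
total5 = sum(c * c for c in cos5)
print(cos3, total3, total5)
```

Each cosine is a component divided by the length of the vector, so the squares sum to one by the Pythagorean theorem.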

In quantum mechanics, coordinate axes are chosen to represent observables, and vectors are quantum states. The closer the vector is to the k-th axis, the smaller the angle αk and the larger the cosine of that angle, and the greater the chance that the quantum state appears in the measurement as the k-th observable. The unit sum of the squares of the "cosines", their non-negativity, and the chance in the measurement say that they form the probability distribution of finding a state in a given quantum system (vector space).

2. The example was a convenient introduction to the targeted explanation (Quantum Mechanics, 1.1.6 Born rule). The vector corresponding to the previous one is |a⟩, written in Dirac notation. It represents OA in the given image. The orthogonal projection of the point A onto the axis |x⟩ is the point Ax, and let the orthogonal projection of the point Ax onto the given vector be the point B. We write the measurement of the observable

$\overrightarrow{OA_x} = |x\rangle\langle x|a\rangle.$

The vectors representing different physical properties are not collinear (they are orthogonal), and their sum is not equal to the scalar sum of the intensities of the parts. Therefore, the quotient of the number ⟨a|x⟩ and the length of |a⟩ is not a probability. To get the participation of individual coordinates in the total probability, let us find the contribution of each component of the vector along the direction |a⟩. We get:

$\overrightarrow{OB} = |a\rangle \langle a|\overrightarrow{OA_x} = |a\rangle \langle a|x\rangle \langle x|a\rangle = |a\rangle |\langle x|a\rangle|^2.$

We decompose the vector of the system into the sum of the contributions of individual components along the direction |a⟩, however many of them there are

$|a\rangle = |a\rangle|\langle x|a\rangle|^2 + |a\rangle|\langle y|a\rangle|^2 + |a\rangle|\langle z|a\rangle|^2 + \dots = (|\langle x|a\rangle|^2 + |\langle y|a\rangle|^2 + |\langle z|a\rangle|^2 + \dots)|a\rangle = |a\rangle,$

because the sum of the probabilities of the independent outcomes (the expression in parentheses) is one. That's why we have to use squares for probabilities

$|\langle x_k|a\rangle|^2 = |a_k|^2 = a^*_ka_k.$

These, and not some others, are functions of the amplitude ak in the representation of the quantum state vector. □
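The rule |⟨xk|a⟩|² = a*k ak can be sketched for a small state vector. The amplitudes below are an assumed, unnormalized example, and `born_probabilities` is a hypothetical helper that normalizes them before returning the probabilities.

```python
import math

def born_probabilities(amplitudes):
    """Probabilities a_k^* a_k of a state, after normalizing the vector."""
    weights = [(a.conjugate() * a).real for a in amplitudes]
    total = sum(weights)
    return [w / total for w in weights]

# Assumed amplitudes in three orthogonal directions (not normalized).
state = [1 + 1j, 1 - 1j, 2j]
probs = born_probabilities(state)

print(probs, sum(probs))
```

The phases of the amplitudes drop out: 1 + i and 1 - i give the same probability, because only the product of an amplitude with its own conjugate enters.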

No one will tell you that the mathematics of quantum mechanics is an easy subject, so even if you did not understand the explanation, it is not your fault. Trust me, let us say everything is clear, and let us continue.

Hence the formulas for the number n of elements of the sets A and B, for the probability P of the sum of such random events, and, with the same labels now denoting the amplitudes A and B of quantum particle-waves, for the squared modulus |.|², which we can write:

n(A ∪ B) = n(A) + n(B) - n(A ∩ B),

P(A + B) = P(A) + P(B) - P(AB),

|A + B|² = |A|² + |B|² + 2ℛℯ(A*B).

Note the squares of the moduli of the (complex) wave amplitudes in the third row, with the real part ℛℯ(A*B) = axbx + ayby, where A = ax + iay and B = bx + iby, with ax, ay, bx and by real. The imaginary unit squared is i² = -1.
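The first and third rows can be verified directly. The sketch below uses assumed example sets and assumed amplitudes A = 3 + 4i, B = 1 - 2i; the variable and function names are hypothetical.

```python
import math

# Counting elements: n(A ∪ B) = n(A) + n(B) - n(A ∩ B), with assumed sets.
set_a = {1, 2, 3, 4}
set_b = {3, 4, 5}
n_union = len(set_a | set_b)
n_formula = len(set_a) + len(set_b) - len(set_a & set_b)

def interference_sides(A, B):
    """Both sides of |A + B|^2 = |A|^2 + |B|^2 + 2 Re(A* B)."""
    lhs = abs(A + B) ** 2
    rhs = abs(A) ** 2 + abs(B) ** 2 + 2 * (A.conjugate() * B).real
    return lhs, rhs

A = 3 + 4j
B = 1 - 2j
lhs, rhs = interference_sides(A, B)

# Re(A* B) written with real and imaginary parts: ax*bx + ay*by.
cross = A.real * B.real + A.imag * B.imag
print(n_union, n_formula, lhs, rhs, cross)
```

Unlike the subtracted overlap in the first two rows, the interference term can have either sign; here it is negative, so |A + B|² is smaller than |A|² + |B|².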

3. Let A and B be Hermitian matrices or operators, i.e. the ones we use in quantum mechanics. If they are of the second order (n = 2) and A is invertible, then

det(A + B) = det A + det B + (det A)⋅Tr(A⁻¹B),

which is not difficult to prove.

Namely, it is known that the trace of a matrix is the sum of its eigenvalues (λ1 + λ2), and the determinant is the product of those values (λ1 ⋅ λ2), where Mx = λx. The matrix M is A or B, and x is the eigenvector of the given matrix corresponding to the eigenvalue λ. The square of the absolute value, |λ|², is the (upper) probability.

In particular, when the eigenvalues of the matrices A and B are αk and βk respectively (k = 1, 2), and the two matrices share their eigenvectors, we arrive at the previous equality from:

$\det(A + B) = (\alpha_1 + \beta_1)(\alpha_2 + \beta_2) = \alpha_1\alpha_2 + \beta_1\beta_2 + \alpha_1\beta_2 + \beta_1\alpha_2 = \det A + \det B + \alpha_1\alpha_2\left(\frac{\beta_2}{\alpha_2} + \frac{\beta_1}{\alpha_1}\right).$

This third term with parentheses is exactly (det A)⋅Tr(A⁻¹B), because the inverse matrix has the reciprocal determinant of the starting one, with reciprocal eigenvalues. After this, the above expression, the determinant of the sum, must be squared. □
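The second-order identity can also be checked exactly on concrete matrices. The sketch below assumes two real symmetric matrices (a special case of Hermitian ones) that do not commute; the matrices and helper names are hypothetical.

```python
from fractions import Fraction as F

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def trace2(M):
    return M[0][0] + M[1][1]

def add2(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def matmul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse2(M):
    """Inverse of an invertible 2x2 matrix by the adjugate formula."""
    det = det2(M)
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

# Assumed example matrices: real symmetric, A invertible.
A = [[F(2), F(1)], [F(1), F(3)]]
B = [[F(1), F(-1)], [F(-1), F(4)]]

lhs = det2(add2(A, B))
rhs = det2(A) + det2(B) + det2(A) * trace2(matmul2(inverse2(A), B))
commute = matmul2(A, B) == matmul2(B, A)
print(lhs, rhs, commute)
```

For this pair the two sides agree even though AB ≠ BA, which suggests that at order two the equality does not depend on the matrices sharing eigenvectors.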

As we can see, summing these matrices as "quantum probabilities" makes sense and is not impossible, but it is more complicated than reducing them to the usual quantum and ordinary probabilities. It becomes even more complex for quantum states with a greater number of degrees of freedom, that is, dimensions n > 2 (Determinants of Sums). However, the determinant of a product, det(A ⋅ B) = (det A) ⋅ (det B), is the product of the determinants of those matrices, so the probabilistic interpretation of matrices becomes easier in calculation. □

4. Since the determinant of the matrix, det M = λ1 ⋅ λ2 ⋅ ... ⋅ λn, increases in value as the product of all the observable probabilities |λk|² increases, the determinant of the quantum matrix will have a larger value if the measurement probabilities for more of these n options of the quantum state are larger. In this way, we give "matrix probability" the meaning suggested by the question. A more probable process, and the corresponding operator, i.e. the matrix, has a larger determinant. □
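The link between eigenvalues and the determinant can be illustrated at order two. This is a minimal sketch for real symmetric matrices, with eigenvalues obtained from the characteristic polynomial; the two matrices and the helper name `eigenvalues_sym2` are assumptions.

```python
import math

def eigenvalues_sym2(M):
    """Eigenvalues of a real symmetric 2x2 matrix from λ² - (tr M)λ + det M = 0."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = math.sqrt(tr * tr - 4 * det)  # non-negative for symmetric matrices
    return (tr - root) / 2, (tr + root) / 2

small = [[2.0, 1.0], [1.0, 2.0]]   # eigenvalues 1 and 3
large = [[3.0, 1.0], [1.0, 3.0]]   # eigenvalues 2 and 4

lam_small = eigenvalues_sym2(small)
lam_large = eigenvalues_sym2(large)

det_small = lam_small[0] * lam_small[1]
det_large = lam_large[0] * lam_large[1]
print(lam_small, det_small, lam_large, det_large)
```

Raising both eigenvalues raises their product, so the second matrix has the larger determinant.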

Thus, we can consider the answer to the question as positive, but perhaps not in the way expected by the one who would ask it. Although the quantum matrix is far from the stochastic one, this result brings them closer together.

## Absurd Converge

Question: Is the "rapprochement" of information with probability absurd?

Answer: You noted well (context from the broader question) that we have (Matrix) another way of bringing information closer to probability, or of drowning it in probability.

The determinant of a matrix is the product of its eigenvalues, and these come from the measured wave amplitudes of quantum particles and the probability that the state appears as an observable (a physically measurable quantity) in the process of physical observation.

On the other hand, limiting the probabilities to a maximum of 1, and therefore their accumulation around the upper value, results in an equalization of the outcome chances and an increase in information. However, nature "doesn't like" (Minimalism) more information, yet it cannot return it, which leads us to interesting developments.

If the probabilities belong to a distribution (they sum to one), then their product is maximal when they are uniform, and when they are free from distributions (each can go up to one), their product is greater the larger they are, the closer to one. In both cases, the process information grows.
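Both statements about the product can be tested by brute force. The sketch below assumes three probabilities restricted to multiples of 1/12, so it is a finite check rather than a proof; `distributions` is a hypothetical helper.

```python
import math
from fractions import Fraction as F
from itertools import product as cartesian

def distributions(n_parts, steps):
    """All distributions whose components are positive multiples of 1/steps."""
    for combo in cartesian(range(1, steps), repeat=n_parts - 1):
        last = steps - sum(combo)
        if last >= 1:
            yield tuple(F(k, steps) for k in combo) + (F(last, steps),)

# Case 1: probabilities tied into a distribution (they sum to one).
best = max(distributions(3, 12), key=math.prod)
best_product = math.prod(best)

# Case 2: free probabilities, each allowed to approach one on its own.
free_low = math.prod([F(1, 2)] * 3)
free_high = math.prod([F(9, 10)] * 3)

print(best, best_product, free_low, free_high)
```

On this grid the largest product belongs to the uniform distribution (1/3, 1/3, 1/3), and for free probabilities the product simply grows with each factor.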

First, the increasing determinant will be nonzero and represent an invertible matrix. It is not necessarily a stochastic process (Chance Matrix), and it does not have to be reversible itself, but it is one whose input data can be deciphered from the output. Thus, we have a formal-mathematical model of the explanation of "memorization". For the first time in the entire history of science, we can analyze "recollection" with the certainty of geometric probability. For now, we are only interested in the detail that the information-laden process "bursts at the seams" and, among other things, deposits surpluses in "storage".

The second is the phenomenon of "accumulation in the past", which I have mentioned several times (Past II) as a possibility that would explain the conservation of the total amount of information, together with the necessity of changes and the decrease in the density of information of the present. We are now discovering the condition of the origin of the past. Memory thus determines the course of the present and limits its options, but it also has forms like "dark matter" (gravitational action of the past), by which galaxies without it can also be understood (AGC 114905).

The third is the treatment of time as parallel dimensions. These are similar states of the past, but without communication with our present. In fact, they are presents of their own, physically and logically equivalent to ours, of which there is a continuum. More precisely, if the infinitely many moments, the events of our reality, are countable, then there is a continuum of options. And the options build worlds of parallel dimensions, mostly without mutual communication.

The fourth would be the relative slowing down of the time of a system in motion, or in a gravitational field, as a response of principled minimalism to efforts to condense information by adding energy to it. Due to the equivalence of action (energy × time) and information, and the law of conservation, times become that much longer. Processes slow down, as do events that remain with one foot in another reality, invisible to a relative observer.

The slower flow of time and the contraction of lengths in the direction of movement create the effect of a false curve on the surface of a cylinder (roller), as opposed to the real Gaussian curvature (Theorema Egregium), which we express by the Riemann tensor for radial changes in the gravitational field. According to information theory, we consider this curvature to be 4-dim within 6-dim spacetime. The appearance of gravitational attraction is thus, absurdly, a movement due to a relative lack of information caused by its real (own) excess.

The above four cases are examples of real-imaginary space-time mapping (x = ict), which relaxes the "convergence" of information to probability. They are not the only ones, but I believe they are a surprise to those who do not know this (new) theory of information, so I highlight them. Other forms of spontaneous avoidance of denser information, or details like these, will be a topic for another occasion.