
Sufficiency
Question: Why do we have different theories that speak of the same truths?
Answer: This is a question of sufficiency first, then abundance. Let's try to understand it through mathematical truths, using Russell's paradox of sets and by it inspired the Gödel's impossibility theorems, then information theory.
Russell's paradox (1903) is easier to understand with the imaginary "village with a barber that shaves all those who do not shave themselves." The paradox of that situation is that the barber can't shave himself, because he doesn't shave people like them, but then he has to shave himself because he shaves everyone like them. He cannot remain unshaven, let us note, because he shaves those who do not shave himself.
Let's further imagine replacing the concept of "shaving" with various abstract ideas of "belonging" which we have a lot in the fields of mathematics. We thus arrive at the impossible set of all sets that are not themselves members. Hence the very idea of "the set of all sets" is contradictory. A variation on this is the observation that there is no such thing as a "theory of all theories" discovered by Gödel (1931).
Mathematics is the "being of truth" in which we deposit and which returns truths and only truths. The smallest such units are axioms, which we choose so that they are mutually independent and sufficient for a given area of mathematics, and from which, by combining and reasoning, we will obtain conclusions, more and more complex positions, theorems of a given branch. For a particular axiom, let's imagine "a field of attitudes that cannot be derived without a given axiom" and here we are in Russell's paradox, and actually in Gödel’s impossibility of "a theory of all theories".
Said being of mathematics cannot tell us a falsehood, for example that 2 = 3. But it can tell us that it cannot tell us that falsehood. Therefore, it is "aware" of untruths, in the sense that the mathematics can assume them, moreover, it cannot exist without such issues. The latter is evidenced by every proof by the method of contradiction, that by assuming the accuracy of something and reducing it to a contradiction, we obtain its inaccuracy. In some other choice of axioms, that "something", rejected before, would be mathematically acceptable.
In order not to go further here than the different axioms of parallelism and at least three types of geometries, in which the ratio of the circle's circumference and diameter is less than, equal to, or greater than π, let us note that the concept of "truth" is, in addition to its incompleteness, chronical replenishment, and at its own way relative. It's not just that we don't have the truth of "everything about everything," but we also don't have the truth of "everything about something." On the other hand, by changing only one of the given axioms, we can get the other side of the truth of the others.
So, we have different theories that talk about the "same truths", because they have different faces. They are layered because they belong to the world of uncertainty of (my) information theory, without which there are no physical phenomena (let's say space, time and matter) or abstract ideas.
Аbundance
Question: Now explain to me the question of "abundance"?
Answer: In my scripts, for example at the end of Information Theory III, you will find several difficult theorems about data transmission channel matrices that I will now try explain easily, but not trivially.
In short, nature spontaneously tends to minimalism and tries to avoid equal chance probability distributions. That is why there are unequal chances of environments, first of an exponential form that have maximum mean information for a given expectation (μ), then distributions of Gaussian types (bellshaped) with less information, but maximum for a given dispersion (σ), and then further fragmented and rarefied. That is why differences multiply, due to the pressure of the desire for smaller given information and the law of conservation of total information.
Example 1. A uniform distribution of n = 2, 3, 4, ... outcomes would have equal probabilities p_{1} = ... = p_{n} = 1/n and mean (Shannon) information
\[ S_1 = p_1 \log_b p_1  ...  p_n \log_b p_n = \] \[ = \frac{1}{n} \log_b n + ... + \frac{1}{n} \log_b n = \log_b n \]which grows logarithmically with the number of outcomes n. When \(n \to \infty\), then \(S_1 \to \infty\). If the base of the logarithm is \( b = 2 \), the unit of information will be "bit" (binary digit). □
Example 2. If the distribution becomes exponential, like the infinite:
\[ p_1 = \frac12, \ p_2 = \frac{1}{2^2}, \ p_3 = \frac{1}{2^3}, ..., \ p_k = \frac{1}{2^k}, ... \]will have the sum \(\sum_{k=1}^\infty p_k = 1 \) and the final mean information:
\[ S_2 = \sum_{k=1}^\infty p_k \log_2 p_k = \] \[ = \frac12 \log_2 2 + \frac{1}{2^2} + \frac{1}{2^3} \log_2 2^3 + ... + \frac{1}{2^k} \log_2 2^k + ... \] \[ = \frac12 + \frac{2}{2^2} + \frac{3}{2^3} + ... + \frac{k}{2^k} + ... \] \[= \frac12 \left( x + x^2 + x^3 + ... + x^k + ... \right)'_{x = 1/2} = \] \[ = \frac12\left( \frac{1}{1  x} \right)'_{x = 1/2} = \frac12 \left.\frac{1}{(1  x)^2}\right_{x=1/2} = 2 \]in binary units (bit). □
Layering creates multiplicities wherever possible, and it is a spontaneous process. Metaphorically speaking, nature does not like equality. But this theory is not only valid for the material aspect of reality, physically, but also applies to abstractions such as mathematical ones. There does not have to be an excessive analogy between the material and the abstract, although the connection between the two phenomena can already be noticed when applying the multiplication table, or by learning the first laws of mechanics.
The wholes become more and more "their own", dispersed and different, individual more, and more the more they are determined. Systems striving for greater certainty become more detailed and numerous, but on the other hand, they cannot go into complete dispersal, from which they also flee. The optimum difference is somewhere there, in the possibility of proving purely geometrical items also by exact algebraic methods (say, analytical geometry), or mathematical analysis, or from the axioms of probability theory.
Everything that happens is true, even what can happen is true, and what we can prove can happen, and then we come to the abovementioned "mathematical being" because of which we could also accept some assumptions as a kind the truth. It is a way to understand the spread of layering and the mixed worlds of truths and lies (The Truth).
I emphasize once again, nature does not lie, it literally follows the principle of least action (EulerLagrange equations) from which all the trajectories of physics known today can be derived. Lying and defying laws are characteristics of vitality.
Meanings
Question: What did you mean by "changing one axiom we change the meaning of others", is truth relative in mathematics?
Answer: Yes, truth is also a phenomenon subject to the laws of information, to its own personal rules. She is that "barber" of Russell's set paradox (Sufficiency) "who shaves those and only those who do not shave themselves". That is, "the truth regulates everything that is not regulated by itself." The continuation is the same conclusion, that there is no comprehensive truth, neither everything nor anything.
In the picture on the left we will see two faces, or a vase, considering that the ambiguity comes from less information than necessary. But the theory of information (mine) reveals something else to us, that by changing one axiom, we change the meaning of the others, and that they themselves can be so universal that they are meaningless, in an indisputable way. For example, the axiom of incidence from Hilbert's Foundations of Geometry (1899), states that "there are at least two points on a line". Its "line" may be the house and the "points" the windows, or the text may be a detail about the legs of a species of animal in a story.
It is a typical mathematical universality on the basis of which we will say that there is nothing more practical than a good theory. Abstract truths are the most profound because of the breadth of their coverage of both the senses and those derived from the senses. That scope rightly reminds us of the measurement of uncertainty, which is all the greater as it relates to a larger scope of outcomes.
Another important similarity between truth and uncertainty is layering. If we were to progress in understanding the meaning of some truths, as we have seen, we would be in a situation of learning, studying, unraveling uncertainty. Like a hunter who hunts a game then unaware of the hunting traps, we can have more or less truth about something, as we will be in more or less uncertainty in a given situation.
In addition to all that relativity, layering, and even the objectivity of truth and uncertainty, the former as the realization of the latter, the two are like antithesis and thesis, the mutual negations. One would not exist without the other, because information is not only the fabric of space, time and matter in the literal, naked physical sense, but also of the truth.
Axioms
Question: Give an example of an inverted and yet true axiom?
Answer: In 1823, Lobachevski published a model of geometry named after him, which was not appreciated by the St. Petersburg Academy at the time, so it is known to us after its publication in Paris in 1837. He talks about "inverted" Euclidean geometry, where at least two parallel lines can be drawn from a point outside the given line.
Euclidean geometry created in Ancient Greece is based on five postulates often contested, of which the fifth in particular seemed clumsy, extensive and redundant. However, centuries of attempts have passed, but no one has succeeded in deriving that postulate based on the previous ones. Lobachevski's initial idea was instead to take the opposite statement and find a contradiction in the new geometry. That would mean that somehow that one can be derived from the others.
We are talking about lines that are parallel if they are in the same plane and have no common points or coincide. The original fifth postulate establishes that through a given point outside the given line it is possible to draw one and only one line parallel to the given one. Lobachevsky's fifth postulate states that through a point outside a given line it is always possible to draw at least two lines parallel to the given line. After an unsuccessful long search for a contradiction, Lobachevsky noticed that the chords of a given circle, seen as "straight lines", fulfill the postulates of the geometry he was developing. He brilliantly punctuated that finding.
Since circles and chords are elements of Euclidean geometry, and the new, his geometry can be reduced to them, then the accuracy of the new geometry derives from the accuracy of the ancient. Lobachevski noticed that "saddle surfaces" also imitate his geometry, when we consider "straight lines" as the shortest distances between given points in movements where that surface is not left. Because of such a model, we call it hyperbolic geometry, and from the mentioned and similar appliances, we conclude that the old and new geometry are equivalent, they are equally correct, or incorrect.
Checking on the surrounding hills whether the sum of the angles of the triangle is less than, equal to or greater than 180^{o}, Gauss was supposedly trying to find out which of the geometries (hyperbolic, flat, or spherical) we live in. What we might be aware of today is the inevitability of the consequences of the axiomatics to which we belong, but which we frivolously hold to be absolutely correct and the only possible physical reality.
We have an interesting example of "inverted and yet true axioms" in Zermelo's theory of sets (1908). He and later Frankel discovered that between two infinities, the countable and the continuum, it is possible to postulate, inserted or not, another infinity, and that the resulting theories be equally correct in the two cases.
We know that Cantor and Dedekind (1870s) developed the idea of sets and various infinities, the cardinal numbers of their elements. There are infinitely many countable natural numbers (labels ℵ_{o}), they found, and their continuum was the infinity of real numbers. They were careful not to claim that there were no other infinities between such.
Formation
Question: Do you believe that the abstract world develops like the material world?
Answer: Yes, but "belief" is not the best name for finding solutions to problems. A cold assumption and thesis to stick to, knowledge and caution not to miss the point is a better description.
The ultimate statement of the comprehensiveness of information in the structure of our world would be exactly that, the participation of information in the development of, say, the laws of mathematics. Unlike physics, which also deals with time but as an added phenomenon, information theory puts time before everything else. The repeated news is no longer the previous one, if it is news at all, but rather an evolution of the original one. The essence of information is uncertainty, which evolution must also have.
For example, when we are waiting for the result of the drawing of the numbers of the lottery drum, the announcement "the winning numbers are xyz" in us as a participant of the game triggers emotions on a scale from severe disappointment to great delight, while the same news for the organizer of the game would could range from concern to relief. News is equivalent to action, and emotions are its effects, which retroactively define itself. The uncertainty of all actions will not allow us to predict exact outcomes, and therefore neither accurate feedback reactions. It is not possible to definitively define neither the communicated information nor its consequences.
To the repeated announcement about the outcome of the drawing of lotto numbers, neither the game participant nor the organizer can react with the same intensity, nor with the same kind of emotions, because they may already be making plans based on the observed consequences, so that "same news" is especially no longer the same. Let's further notice that there is a lot in this simple sentence "the winning numbers are xyz" that is actually abstract, and on the other hand, coercive.
Every emission of information is a realization of uncertainty. Whether it's rolling a dice with six possibilities and one outcome of information that matches the amount of initial uncertainty, or it is a financial news that would get us very excited. The consequences of uncertainty persist due to the conservation of information, but they are stratified and locally become increasingly certain. It is the process of the emergence of abstractions, that is, phenomena that escape the scope of our basic senses.
Look again at the explanation of "abundance". We find ourselves in an infinity of options, of which only a small part is relevant and on the basis of which our senses were formed, as well as the belief that there are no other possibilities. However, layering, which we can also understand through principled minimalism, leads states of uncertainty along oneway streets that are unpredictable in advance, but then towards intersections with fewer and fewer branches. The force of attraction of smaller choices grows to an enormous one, like that between quarks, trapping and directing further processes.
This part of the evolution of the universe in this (hypo)thesis would be more dominant in its earlier stages, perhaps even (almost) completed at the time of the first fermions.
Recursion
Question: Does the form of material bodies appear in process forms, or in abstract structures?
Answer: Of course, this happens in all interpretations of mathematics in physics. When we think about it, there is a physical memory in physical space, because everything we see around us are past events. They are exactly as old as the time it takes for light to reach us from them.
Pictured at right is the recent discovery by physicists at MIT and elsewhere, of fractal patterns in quantum material that exhibit strange electromagnetic behavior on the atomic scale. By the way, fractal is a geometric figure that can be broken down into smaller parts so that each part is, at least approximately, a reduced copy of the whole. It is also said that a fractal is a figure that is similar to itself. The term was derived by Mandelbrot (1975) from the Latin word fractus, which means broken.
The coastline has a fractal shape, hence the Coastline paradox, that it does not have a welldefined length. The measured length of the coast depends on the method of measurement and the scale of the map, and in the case of fractal structures, by increasing the scale, it can grow indefinitely despite the approximately unchanged surface area of the island (Britain). Koch snowflake is a fractal shape built by adding smaller and smaller triangles, which tends to a finite surface and infinite volume.
A similar conception to fractal is recursion, meaning "definition by itself". It has become a powerful tool in writing algorithms, and it comes directly from mathematics. For example, the Fibonacci sequence 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ... is defined by the recursive function \( f(n) = f(n1) + f(n 2) \) and with two initial values \(f(0) = 0\) and \(f(1) = 1\) for \(n = 0\) and \(n = 1\), and then is \(f(2) = f(1) + f(0) = 1\), so \(f(3) = f(2) + f(1) = 2\) and so on.
The visual property of fractals, that the parts are reduced copies of the whole, no matter how complex they may look at first glance, tells us about the way of repeating nature through the scale of sizes. Going into the world of increasing values, some patterns of the micro world seem to repeat themselves, despite, for example, the law of large numbers of probability, or the fact that the volume of the body (mostly) increases with the cube and the surface area with the square of the length.
There are obvious similarities to this in situations of distance in space or time. On a cosmic scale, the distant galaxies and the universe of that time will still reflect our current laws of physics, and yet they will have parallel and visible differences with today's. The mutual distances of bodies and structures, as well as the entropy of the universe, change, but it seems that everything remains almost the same all the time. In the theory of information, I find that certainties also grow over time and that this is "maturing", or let's say "aging" of the present, like an analogical increase by increasing the number of experiments.
Laxity
Question: Do you avoid interpreting increases in body volume and surface area by (mostly) the cube and square of length?
Answer: This is a good observation from the previous answer! The reason is complexity. It is a big question that has a lot to add to information theory.
A square with an edge of length a will have a perimeter of 4a and an area of a^{2}, and a cube of area 6a^{2} is of volume a^{3}. For increasing lengths of a, we find how much this second measure exceeds the first. The principled minimalism of information stands in the way of their relationship, which is why these growing structures become loose.
Large masses will become gravitational forces spreading their information into additional spacetime dimensions, thus diluting it into to the current reality that volume units are becoming scarce, with a slow flow of time and thus attractive. The deficiencies that arise are comparable to the cavities of Sierpiński triangles, which we would not notice if we were to go by their solid parts, but the calculation would reveal a deficit in relation to the complete space.
The Sierpiński triangle (2D) in steps by removing more and more of its parts gains in the length of the edges (perimeter) and loses in the area, so that in the limiting case the first diverges and the second converges to zero. The threedimensional version will be more and more a hollow body — infinitely increasing surface area and vanishing volume. It is a process that enables increasingly intense connections between the hollow and the full environment, that is, it increases the chances of even greater looseness.
The measure of roughness, or fractal dimension introduced into mathematics by Hausdorff (1918) generalizes the notion of vector space dimension with inner product, expanding it to noninteger numbers. In this sense, fractal geometry goes beyond relativistic and its simple complement in (this) information theory, but it is an instructive example.
Hausdorff
Question: Explain Hausdorff's definition of dimension to me?
Answer: Ok, let's see a simple but not banal explanation of this abstract measure of "roughness" through examples. First, we look at the generation of the first four steps of the Koch snowflake in the image above, and then the Sierpiński triangle in the previous one.
1. The first Koch triangle is equilateral, let's say side 1, from which the second is created by adding 4 such 3 times smaller to each of the 3 sides. The periphery of the first is \(p_1 = 3\), the second \(p_2 = p_1\cdot\frac{4}{3}\), and in general the \(k\)th \( p_k = p_{k1} \cdot \frac43 \), i.e.
\[ p_k = 3\cdot \left(\frac{4}{3}\right)^{k1}. \]Let's pay attention to this fraction, the numerator N = 4 — number of selfsimilar objects and denominator S = 3 — the scale factor of the Hausdorff dimension
\[ D_1 = \frac{\log N}{\log S} = \frac{\log 4}{\log 3} \approx 1,26. \]2. Now let's notice the transition from the second to the third step of the five thumbnails of the Sierpiński triangle. It is characteristically repetitive. In each black triangle, of which there are three times as many (N = 3), twice as small as the given one (S = 2) is hollowed out, so the Hausdorff dimension is
\[ D_2 = \frac{\log N}{\log S} = \frac{\log 3}{\log 2} \approx 1,58. \]3. In Euclidean space, we follow the transitions: length > square > cube, with an increase in dimension D = 1, 2, 3. If the unit of length is reduced, it becomes 1/S steps S = 1, 2, 3 in each spatial direction, the measure will increase \( N = S^D \) times.
Namely, when we cut the length in half (S = 2), the square is divided into four squares (N = 2^{2}), and the cube into eight cubes (N = 2^{3}). When the length is divided into three parts (S = 3), nine squares form from the square (N = 3^{2}), and from the cube 27 cubes (N = 3^{3}). In general, if we observe the Sth part of a unit of length, then a square will have N = S^{2} squares, and a cube N = S ^{3} dices.
Logarithmizing the equation \( N = S^D \) we find \( D = \log N / \log S \) and the choice of the base of the logarithm, \( b > 1 \), is not important, because \( \log_b N/ \log_b S = \log_S N\). And that's it, Hausdorff's definition of dimension. It generalizes Euclidean.
Lessons
Question: Can you share some lessons you draw from Hausdorff's definition of dimension for information theory?
Answer: I can, and I want some of it. First of all, its dimensions are irrational and as such refer to fractals, characters that are similar in themselves. That is close to the definition of infinity.
A set is infinite, we say, iff it can be equivalent to some proper subset of it. Let's say, between all natural numbers (n = 1, 2, 3, ...) and the odd numbers themselves (2n  1 = 1, 3, 5, . ..) stands for bijection (n > 2n  1) and we consider the set of N natural numbers to be infinite.
I join this lesson with the conclusion about the ultimate division of information (Packages), which becomes inevitable when we accept the law of conservation of information. There are no literally the fractal properties in the nature, then, they change little by little by mapping themselves. Many natural phenomena are cyclical, periodic or wavelike, but we now know that their periods will change at least a little from time to time.
On the other hand, when we try to define the set of all those and only those figures not similar to themselves, the opposite of fractals, we come across Russell's paradox "the barber who shaves everyone who doesn't shave himself" (sufficiency). Then we come to a situation where there are characters but also the impossibility of catching them all. One way or another, the world is infinite to us, if it is not contradictory — the lesson is fractal.
The second lesson concerns the gradualness, through the irrationality of the Hausdorff dimension, of the escape of information into the worlds of parallel whole dimensions universes. When there is a greater number of N selfsimilar characters in the given unit, then we have a faster shredding, but we can also say a greater amplitude of fractal repetition. The Hausdorff dimension is proportional to the logarithm of that number. The two of them, N and D, behave as the number of equally likely outcomes (probability of each) and information, when the scale factor (S) is constant.
We remember that my theory of information talks about the development of the universe, so let's add that it can also happen in the gradual clearing of the paths of current realities through broken values of dimensions towards whole numbers.
The next lesson is a continuation of this one, in the observation that larger steps of fractal fragmentation (N) and, therefore, their logarithmically increasing dimensions (D), represent a type of amplitude. That part of the story will go in favor of wave functions in quantum mechanics, for example free particles (I still keep it to myself), when let's extend fractal dimensions, or topological ones, to complex numbers.
Further examples, the socalled lessons, would be too fantastic for the public, until further notice. I sent some of these to the interlocutor with a request that he keep them to himself unless he worked them out successfully.
Ripples
Question: You didn't convince me with the infinities, but maybe try to explain "present travel through the continuum" again?
Answer: I know, everything that is said about, the finite is about the infinite. And as for the second, it fits well, but in a speculative idea. The story is interesting though.
Imagine the present, all of our reality, like the ripples of water in the sea. It cannot last, because it is made up of information, and the law of conservation does not allow it to disappear. It is therefore cyclical, or almost periodic because it slowly dissolves, and is driven by the uncertainty force. It moves away from uncertainty and is increasingly in certainty. Such is our present.
This concept of "movement of the present through uncertainty" does not imply complete unpredictability of the environment, like many other phenomena known to us, but its changeability and gradual reduction. The growth of certainty, through the state of greater probability and savings of communication, will result in a relative slowing down of time, seen from the point of view of the past. It is as if the flow of time loses its momentum, dissipates and solidifies in the surrounding dimensions.
Blinking through time, the present advances discretely in the perceptions of any of its individual participants, but from their multitude we derive it in a Cauchy sequence. From the position of one observer, the events form a countably infinite set, cardinal number ℵ_{0}, which in infinitely many states has at least two possible outcomes. The observer lives in only one of such realizations, but there are them in an infinite series of multiplications 2⋅2⋅... = 2^{ℵ0}, i.e. continuum 𝔠 — the next greater infinity. The path takes us along a continuum of possibilities (Values).
Set theories revealed that the continuum as the next larger infinity after the countable can be supplemented by the assumption of an infinity that would be inserted between the two. The equal accuracy of such tells us about the potential existence of some other worlds around that do not actually concern us and do not necessarily interest us. It seems to be the same with all other infinities.
Space II
Question: I got it! Your "space" is a kind of storehouse of memories, and that's why it's getting bigger, because the history of the universe is getting longer. Did I miss something?
Answer: Space remembers, but it is itself a kind of memory. So it is in (my) information theory. And what we see at the speed of light came from its own, or rather from our common past. The study of the geometry of physical space is the study of processes.
There is more and more space simply because there is more and more memory, and it is growing because the certainty of the ongoing present is increasing! That's the point that took me a long time to make, which is the very core of this story.
What I call the "ergodic theorem" does not have only one form, say in the script "Informatička Teorija II" (61.2), but is a whole series of similar ones in those three volumes. In my interpretation, it is about otherwise wellknown damage to the information that is transmitted through the channels. Messages are states, or vectors, that the transmission process maps further, and they are distorted and converged by the inherent values of the channel they are embedded in. After long transmissions, in the limiting case, the channel becomes a "black box"; based on the output we have no knowledge of the input.
Observing distant galaxies is an example of such transformations. That's why we have a cosmic "event horizon", because we have a limit by the ergodic theorem. Secondarily, we have it because of the everincreasing speed of the increasingly distant galaxies moving away from us, so that we do not even see those beyond the speed of light. The second question, about the reason and consequences of the possibility of having an increasingly long memory, is equally crucial. In short, read the previous answer (Ripples), and for slightly more detailed descriptions, read "Memories".
Black Box
Question: Do you see any analogy between the cosmos and the quantum?
Answer: Of course, there are as many of them as there are applications of mathematical forms. It would take us too much time and text to list them, but until then here is an interesting example.
The distant objects of the cosmos are in the deep past, so the information that reaches us from them very slowly deteriorates. It becomes characteristic (eigen) after at least 13.8 billion years — how old we consider the universe to be. The slowness of the information damage of the cosmicscale transfer process derives from the law of large numbers (probability theory), but the micro world of physics offers us something completely different.
One typical eigenequation \(\hat{K}\vec{v} = \lambda \vec{v} \) in the interpretation of matrices of the second order is given in the figure on the upper right. It is a double stochastic matrix, where \(p\) is the probability that the signal will be correctly transmitted, and \( q = 1  p \) is that it will not. Here, \( x\) is the probability that the signal will be sent, and \( y = 1  x\) that it will not. The equation in the picture is easy to solve.
The two eigenvalues \(\lambda_1 = 1\) and \(\lambda_2 = 2p  1\) lead respectively to the equations of the eigenvectors \(q(x_1  y_1) = 0\) and \(q(x_2 + y_2) = 0\). The first gives \(x = y \), from the second follows \(q = 0\) and that the matrix is then unit. Already with this simple example we recognize the abovementioned ergodic theorem. The eigenvector is impersonal and, if the transmission takes time, all outgoing messages become so.
We know from experience that mathematical positions are extremely synchronized, both with each other and with all observations or inferences derived using them, so it is all the more difficult for us to know whether the "Big Bang" of the universe really happened 13.8 billion years ago, or if it is an illusion derived from the ergodic theorem. Be that as it may, the micro world greatly accelerates processes, that is, makes them uncertain, stripped of the lawfulness of large numbers of cosmic proportions.
In other words, the transition of the sent message into its eigenvalue is a very fast process in the micro world; it is that collapse from the state of superposition that still troubles quantum physicists. That is why the observables (measurable quantities) of quantum mechanics are always the eigenvalues of the corresponding operators, because the state we would otherwise observe is heavily usurped by the measuring devices. We provide them with an environment of less information and lead them to spontaneous emission, to communication with the apparatus (otherwise contrary to principled minimalism).
The micro world has another mode of operation that we rarely see in the big world, our world. "Bypasses" are more frequent there, due to the same greater uncertainties, so the use of complex numbers makes more sense there. As we can see, the theory of information offers physics simple solutions to its difficult dilemmas, but it also seeks a breakthrough into the more abstract, so-called reality.
Reception
Question: Why are we not seeing additional dimensions?
Answer: Everything does not communicate with everything; no message is conveyed to someone who cannot read it. Let's add to that that the environment defines the subject of communication, that states tend to be less informative, that the law of conservation applies.
I am setting the table for informatic interpretation, but there will be others, also correct. Multiplicity is in the nature of mathematics.
When one of the options is declared, it picks up all the previous uncertainty. Thus, the amount of uncertainty before throwing a fair die is log 6, which is also the (amount of) information after the throw, when, say, a "three" falls. This means that if a given particle (uncertainty) is declared in one reality, it becomes determined, its trajectory defined. The elementary particle can then tell the object of interaction only as much about itself, or its appearances, as its now determined state allows, and no more.
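The die example can be checked directly; a minimal sketch, using Shannon entropy in natural-logarithm units (the choice of base is a convention):

```python
import math

# Shannon entropy of a probability distribution; terms with p = 0 contribute nothing.
def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

# A fair die: six equally likely outcomes, uncertainty log 6 before the throw.
die = [1 / 6] * 6
H = entropy(die)

# After the throw (say a "three" falls) one outcome has probability 1:
# the remaining uncertainty is zero; log 6 has become realized information.
H_after = entropy([0, 0, 1, 0, 0, 0])

print(H, math.log(6), H_after)
```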
Determining in this way is a process that goes on and on, defining our reality. At the same time, large bodies, with many stories and interlocutors, can be long-lived and weakly dependent on particular individuals. What they jointly build in our reality defines the transmission channel, profiling the outcomes of all participants (Black Box). It is an ongoing process and, like public opinion, is largely decisive for the outcomes of individual participants.
In other words, the reality we live in does not allow us to see the other one. We cannot go into the past, because it is already declared and inaccessible, and the future is "far away", as is a parallel reality. That is, we do not have a proper "transmission channel" that would transfer us to those other times as its eigenvalue. They cannot come here either. In a world of small sizes, freed from the laws of large numbers (probability theory), such transitions are easier (Bypass).
In short, our reality builds such a channel of information transmission (us) in which what it transmits are eigenvalues and possible observables. And for the other structure, they are not.
Past
Question: Why do you think the past is "impenetrable", how?
Answer: In terms of "present travel through the continuum" (Ripples), the past would remain in less likely states. According to the "cosmos and quantum analogy" (Black Box), the past would be a spent or overcome state of the "transmission channel" for the states of the present. The new states are insufficiently "eigen" to be realized there equally.
However, I also have older observations (Information of Perception, p. 66) that are still current. Time is slower for a body in relative motion. As the body approaches us, its present and ours draw closer to each other, becoming equal at the moment of passing. This means that until the moment of passing, that body was in our future. The opposite state occurs when the body then continues to move away from us; it then goes into our past.
I interpret this example of the special theory of relativity through the Doppler effect, which says that the wavelengths of, say, the light of a moving source are shorter on approach and longer on departure; the shorter wavelengths of the approaching future thus mark states more probable than the present, and especially than the past. Longer wavelengths are more smeared particle-wave positions and define positions of lower probability densities.
Simply because the past is less likely, the present moves towards the future. But one way or another, all such reasons come down to the principle of information minimalism.
Eigenvector
Question: What is the process eigenvalue and why is it so important?
Answer: This is one of the questions I am asked most often (Eigenvalue), and also one of the most rewarding for the variety of possible answers. When (if ever) you attend lectures in higher algebra or analysis, you will find how extensively, and not simply, the eigenvalues of operators are studied in mathematics, but you may again miss what I am about to tell you.
Eigenvalues are the results of processes that we can perceive, or measure. Among the multitude of options that the process (operator) can take to the states (vectors) we are studying, those states that change little or not at all are characteristic. And that means that they are those solid places in the nightmare that we can cling to. In the theory of quantum mechanics, they are, so to speak, the only ones that experimental physicists today consider worthy of attention.
For example, the image above left shows a simple wave function \(\psi\), the solution of the Schrödinger equation for a free particle. Let's say that it extends along the abscissa (x-axis). The process that changes its positions is described by the operator \(\hat{x}\), so that \(\hat{x}\psi = x\psi\). Hence \(\psi = \psi(x)\) is an eigenfunction (it is also a vector) of the position operator, whose eigenvalue is precisely the position \(x\) of the particle-wave.
Another example: the same wave function, written in more detail as \(\psi = e^{ipx/\hbar}\), where \(p\) is the particle-wave momentum and \(\hbar\) is the reduced Planck constant, together with the momentum operator \(\hat{p} = -i\hbar\frac{\partial}{\partial x}\), will give \(\hat{p}\psi = p\psi\). In other words, the same \(\psi\) is the eigenfunction of the momentum operator, now with the momentum \(p\) as eigenvalue.
Third example: let's describe the measurement of momentum then position, and then of position then momentum, on the wave function:
\[ \hat{x}\hat{p}\psi = \hat{x}(\hat{p}\psi) = \hat{x}p\psi = p(\hat{x}\psi) = px\psi, \] \[ \hat{p}\hat{x}\psi = -i\hbar\frac{\partial}{\partial x}(x\psi) = -i\hbar \psi + xp\psi. \]By subtraction we find:
\[ (\hat{x}\hat{p} - \hat{p}\hat{x})\psi = [\hat{x}, \hat{p}]\psi = i\hbar\psi, \] \[ [\hat{x}, \hat{p}] = i\hbar, \]and this is the commutation relation behind the Heisenberg uncertainty of position and momentum.
By measuring the momentum and position in different orders, we get different results. That is, the more accurately we measure the position of a particle, using gamma rays of shorter wavelength but greater momentum, the less accurately we find out its momentum. According to this, the impossibility of exact measurement in general (in information theory) is not an engineering problem, but comes from the very non-commutativity of some operators.
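The order-dependence of non-commuting processes can be illustrated with any two matrices; a minimal sketch with a hypothetical pair \(A\) and \(B\) (not the actual position and momentum operators, which act on infinite-dimensional spaces):

```python
# Sketch: the order of operations matters for non-commuting processes.
# A and B below are illustrative 2x2 matrices chosen so that AB != BA.

def matmul(X, Y):
    # plain 2x2 matrix multiplication
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]

AB = matmul(A, B)
BA = matmul(B, A)
commutator = [[AB[i][j] - BA[i][j] for j in range(2)] for i in range(2)]

# Applying A then B differs from applying B then A; the commutator is non-zero.
print(AB, BA, commutator)
```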
Everything in physics revolves around measurement, so eigenvalues are even more important there than in mathematics. In information theory, moreover, they become an expression of the principle of minimalism: a process that changes a state requires more action, simply put.
Dispersion
Question: How to transfer eigenvalues to perception information?
Answer: Let's say as a dispersion. It is the "range", a simple difference of the extreme values. Or it is the "variance", the mean square deviation from the mean
\[ \sigma^2 = \sum_{k=1}^n p_k(x_k - \mu)^2 \]where \(p_k\) is the probability of the value \(x_k\) with \(n = 2, 3, ...\) possible outcomes, and
\[ \mu = \sum_{k=1}^n p_k x_k \]is the mean value, average value, the socalled mathematical expectation. The root of the variance \(\sigma\) in statistics is called "standard deviation", and is often implied by "dispersion".
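As a worked example, here are the mean and variance of a fair six-sided die, computed straight from these two formulas:

```python
# Sketch: mean and variance of a fair die (p_k = 1/6, x_k = 1..6).
n = 6
p = [1 / n] * n
x = list(range(1, n + 1))

mu = sum(pk * xk for pk, xk in zip(p, x))                 # mathematical expectation
var = sum(pk * (xk - mu) ** 2 for pk, xk in zip(p, x))    # variance sigma^2
sigma = var ** 0.5                                        # standard deviation

print(mu, var, sigma)   # mu = 3.5, var = 35/12
```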
In the book "Quantum Mechanics" (1.4.4 Uncertainty Principle) you will find how Heisenberg's uncertainties are defined by the "implicit" dispersion, called the standard deviation, i.e. by the root of the variance. These statistical estimates are easily transferred to a vector space provided with a scalar product, and thus to operators (which are a type of vectors). And here is an example with the operators in the book worked out in detail.
Let's imagine that under the given circumstances, the state \(\psi\) of the environment, we have players \(A\) and \(B\) who compete with a strategy, either the superior one (Reciprocity) or another. During the game, they conduct the processes \(\hat{A}\) and \(\hat{B}\), which are initiatives, or reactions to the opponent's actions, and we calculate the dispersions of their responses, thus estimating the value of the game. The random variables are from some set \(X\), which is a continuum of values (real or complex numbers).
The variances of these operators will be:
\[ \overline{(\Delta\hat{A})^2} = \int_X \psi^* (\hat{A} - \mu_A)^2\psi, \quad \overline{(\Delta\hat{B})^2} = \int_X \psi^* (\hat{B} - \mu_B)^2\psi, \]where \(\mu_A\) and \(\mu_B\) are the mean values of these operators. Otherwise, observable probabilities are products of conjugated complex states, wave vectors \(\psi^*\psi\), here probability densities. The calculation I provided in the book ends up with:
\[ \sqrt{\overline{(\Delta\hat{A})^2}} \cdot \sqrt{\overline{(\Delta\hat{B})^2}} \ge \left|\frac{[\hat{A}, \hat{B}]}{2}\right|, \]where \( [\hat{A}, \hat{B}] = \hat{A}\hat{B} - \hat{B}\hat{A} \) is the commutator of these operators. It is not zero when the operators are not commutative, which in general they are not.
For the position and momentum operators (previous question) we have \( [\hat{x}, \hat{p}] = i\hbar \), and the time and energy commutator is \( [\hat{t}, \hat{E}] = i\hbar \), so the previous inequality generalizes Heisenberg's uncertainty relations in principle. Whenever processes are performed by non-commutative operators (where the order of application matters), the uncertainty of one increases as the uncertainty of the other decreases. In the example I gave, it is then not possible to predict the exact outcome of the game.
Perceptual information is an expression of such calculations. In the discrete case, we write it simplified as \(S = a_1b_1 + a_2b_2 + ...\), but we understand that this sum can also be an integral. In addition, the factors in the summands can be the uncertainties, \( \sigma_A = \Delta A = \sqrt{\overline{(\Delta\hat{A})^2}} \), similar to Heisenberg's. The interesting thing about this method is that in it, summands of commutative operators (processes) do not contribute to the information of perception.
Commutation
Question: Can you explain to me how and when the commutativity of processes is a sign of the absence of their mutual communication?
Answer: That's the tip of the iceberg. Intuitively, commutativity would be a kind of symmetry in the use of operators, the independence of the changes of each of them from the processes of the other, their perpendicularity, or parallelism, in the corresponding space.
Each of these expectations can be refined, but I chose the demonstration shown in the image above left. Let's find the area of the triangle OAB, interpret it as information, and then look around a bit more.
1. Let's see that the hatched triangle is obtained by adding the areas of one triangle and a rightangled trapezoid and subtracting the area of the other triangle:
\[ S(OAB) = S(OB_xB) + S(B_xA_xAB) - S(OA_xA) = \] \[ = \frac12 B_xB_y + \frac12 (A_y + B_y)(A_x - B_x) - \frac12 A_xA_y \] \[ = \frac12 (A_xB_y - A_yB_x) = \frac12 [A, B]. \]That surface is a "commutator" that I have long used in geometry (Triangle) and which was my inspiration in studying force (Conics), but also for much in information theory. Follow the given links (texts) if you would like to see how extensive and generous the topic opened by this question is.
2. Based on the same picture, the intensities of the vectors \(\vec{a} = \overrightarrow{OA}\) and \(\vec{b} = \overrightarrow{OB}\) determine the area of the triangle in the old-fashioned way:
\[ a^2 = \|\vec{a}\|^2 = A_x^2 + A_y^2, \quad b^2 = \|\vec{b}\|^2 = B_x^2 + B_y^2, \] \[ S(OAB) = \frac12 ab\sin \varphi, \] \[ ab\sin\varphi = [A, B], \]where \(\varphi = \angle AOB\) is the angle between the given vectors.
3. From this we calculate:
\[ \cos^2\varphi = 1 - \sin^2\varphi = 1 - \left(\frac{[A, B]}{ab}\right)^2, \] \[ a^2b^2\cos^2\varphi = a^2b^2 - [A, B]^2 = (A_xB_x + A_yB_y)^2, \] \[ ab\cos\varphi = \{A, B\}, \]where \( \{A, B\} = A_xB_x + A_yB_y\) is the anticommutator.
4. Finally, we form the Euler function
\[ e^{i\varphi} = \cos\varphi + i\sin\varphi = \frac{1}{ab}(\{A, B\} + i[A, B]). \]This is the basic form of the wave functions of quantum mechanics (Free particle), which contains the probability of the observable and the particle-wave information.
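The identity above is easy to verify numerically; a minimal sketch with two arbitrarily chosen plane vectors (the coordinates are illustrative assumptions):

```python
import math
import cmath

# Sketch: for plane vectors a = (A_x, A_y) and b = (B_x, B_y), the commutator
# [A, B] = A_x*B_y - A_y*B_x and the anticommutator {A, B} = A_x*B_x + A_y*B_y
# recover e^{i*phi} = ({A, B} + i*[A, B]) / (ab). Coordinates are arbitrary examples.
Ax, Ay = 3.0, 1.0
Bx, By = 1.0, 2.0

comm = Ax * By - Ay * Bx          # [A, B], twice the triangle area S(OAB)
anti = Ax * Bx + Ay * By          # {A, B}
a = math.hypot(Ax, Ay)
b = math.hypot(Bx, By)

# phi is the angle from vector a to vector b
phi = math.atan2(By, Bx) - math.atan2(Ay, Ax)
lhs = cmath.exp(1j * phi)
rhs = (anti + 1j * comm) / (a * b)

print(lhs, rhs)
```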
From these few examples, we see how much a non-zero commutator, or even anticommutator, means for the presence of mutual information between two states (processes), for their mutual perception, or communication.
Tempo
Question: How do you find the nth power of a matrix?
Answer: The question had nothing to do with information theory, but the answer will. In principle, I do not give instructions in mathematics, and it rarely happens that I take up such a (to my mind easier) problem.
For example, consider a second-order doubly stochastic matrix (Black Box). It is symmetric, and multiplying such matrices again gives symmetric matrices. Namely:
\[ \begin{pmatrix} a & b \\ b & a \end{pmatrix} \begin{pmatrix} p & q \\ q & p \end{pmatrix} = \begin{pmatrix} ap + bq & aq + bp \\ aq + bp & ap + bq \end{pmatrix}. \]Now we are looking for the nth power of such a matrix in the form:
\[ \hat{A}^n = \begin{pmatrix} a & b \\ b & a \end{pmatrix}^n = \begin{pmatrix} A_n & B_n \\ B_n & A_n \end{pmatrix}, \] \[ \hat{A}^{n+1} = \begin{pmatrix} aA_n + bB_n & aB_n + bA_n \\ aB_n + bA_n & aA_n + bB_n \end{pmatrix}, \] \[ \begin{cases} A_{n+1} = aA_n + bB_n \\ B_{n+1} = aB_n + bA_n \end{cases} \]By addition and subtraction we find:
\[ \begin{cases} A_{n+1} + B_{n+1} = (a + b)(A_n + B_n) = ... = (a + b)^{n+1} \\ A_{n+1} - B_{n+1} = (a - b)(A_n - B_n) = ... = (a - b)^{n+1} \end{cases} \] \[ \begin{cases} A_n = \frac12 [(a + b)^n + (a - b)^n] \\ B_n = \frac12 [(a + b)^n - (a - b)^n] \end{cases} \]When \(a \in (0, 1)\) is the probability that a certain outcome will occur, and \(b = 1 - a\) the probability that it will not, then \(a + b = 1\), and let's put \(a - b = \alpha \). Then for the power of this matrix it holds:
\[ \hat{A}^n = \begin{pmatrix} \frac{1 + \alpha^n}{2} & \frac{1  \alpha^n}{2} \\ \frac{1  \alpha^n}{2} & \frac{1 + \alpha^n}{2} \end{pmatrix} \to \begin{pmatrix} \frac12 & \frac12 \\ \frac12 & \frac12 \end{pmatrix}, \]when \(n \to \infty \). This conclusion concerns information theory.
We see that the composition of the matrix \(\hat{A}\) with itself, the power \(\hat{A}^n\), slowly slides into a faceless black box, which will map every distribution into the faceless one:
\[ \begin{pmatrix} \frac12 & \frac12 \\ \frac12 & \frac12 \end{pmatrix} \begin{pmatrix} p \\ q \end{pmatrix} = \begin{pmatrix} \frac12 \\ \frac12 \end{pmatrix}, \]and that is what was previously (Black Box) explained in another way. Looking at the same problem in this way, we find that the transition of the doubly stochastic matrix to the impersonal one is faster the closer the matrix was (in the beginning) to the impersonal one, that is, the smaller the difference \( \alpha = a - b\).
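The convergence described above can be sketched numerically; the closed form for \(\hat{A}^n\) is compared with direct repeated multiplication, assuming an illustrative \(a = 0.7\):

```python
# Sketch: n-th power of the doubly stochastic matrix [[a, b], [b, a]] via the
# closed form A_n = (1 + alpha^n)/2, B_n = (1 - alpha^n)/2, alpha = a - b.
# The value a = 0.7 is an illustrative assumption.
a = 0.7
b = 1 - a
alpha = a - b

def power(n):
    An = (1 + alpha ** n) / 2
    Bn = (1 - alpha ** n) / 2
    return [[An, Bn], [Bn, An]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Direct repeated multiplication must agree with the closed form:
M = [[a, b], [b, a]]
P = [[1, 0], [0, 1]]
for _ in range(20):
    P = matmul(P, M)

# Both approach the impersonal matrix [[1/2, 1/2], [1/2, 1/2]] as n grows.
print(power(20), P)
```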
I note once again that such situations arise in the macro world, with real matrix coefficients, unlike the micro world, where we additionally have complex numbers and, therefore, additional and new concepts. Here are some more similar examples.
Induction II
Question: Aren't you using mathematical induction to find the nth power of a given matrix?
Answer: So-called mathematical induction is suitable when we have assumed solutions whose accuracy needs to be proven. It goes as follows.
We have an unbounded sequence of assertions \(T_1, T_2, ..., T_n, ...\) in which we directly check the correctness of the first term, and then we also check the correctness of the implication \(T_n \Rightarrow T_{n+1}\) for the index \(n = 1, 2, 3, ...\) in general.
Note that the proof of implication is often easier than the immediate proof. So, for example, it is easier to say "if I stood on the moon, I would see that the Earth is round", than to go to the moon and check it. Here's an example.
1. Let's prove by the method of mathematical induction that:
\[ \hat{A}^n = \begin{pmatrix} a & b \\ 0 & c \end{pmatrix}^n = \begin{pmatrix} a^n & b(a^{n-1} + a^{n-2}c + ... + ac^{n-2} + c^{n-1}) \\ 0 & c^n \end{pmatrix}. \]The first step of induction, for \(n = 1\), is obviously correct, because \(\hat{A}^1 = \hat{A}\).
In the second step of the induction, we assume that the statement is true for general \(n\) and check the correctness for \(n + 1\):
\[ \hat{A}^{n+1} = \hat{A}\hat{A}^n = \begin{pmatrix} a & b \\ 0 & c \end{pmatrix} \begin{pmatrix} a^n & b(a^{n-1} + a^{n-2}c + ... + ac^{n-2} + c^{n-1}) \\ 0 & c^n \end{pmatrix} = \] \[ = \begin{pmatrix} a^{n+1} & b(a^n + a^{n-1}c + ... + ac^{n-1}) + bc^n \\ 0 & c^{n+1} \end{pmatrix}, \]and that is exactly the \(n + 1\) power of the given matrix. As both steps of mathematical induction have been verified, the stated claim is proven.
2. Let us reduce the given matrix to an (ordinary) stochastic one, putting \(b = 1 - a\) and \(c = 1\), where \( a \in (0, 1)\) is the probability of correct transmission of the (first) signal:
\[ \hat{A}^n = \begin{pmatrix} a^n & (1 - a)(a^{n-1} + a^{n-2} + ... + 1) \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} a^n & 1 - a^n \\ 0 & 1 \end{pmatrix}. \]If the probability \(a\) is any number other than 1, the informational interpretation of this example reduces to the previous one (Tempo). Then \( a^n \to 0 \), when \( n \to \infty \), and so
\[ \hat{A}^n \to \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}, \]which is also an impersonal matrix of information transmission, a black box, because every signal arrives as the other one, that is, as an error.
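This limit, too, can be sketched numerically, assuming an illustrative \(a = 0.8\); repeated multiplication is compared with the closed form from the induction:

```python
# Sketch: powers of the (ordinary) stochastic matrix [[a, 1 - a], [0, 1]]
# tend to the "black box" [[0, 1], [0, 1]] when 0 < a < 1.
# The value a = 0.8 is an illustrative assumption.
a = 0.8

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

M = [[a, 1 - a], [0, 1]]
P = [[1, 0], [0, 1]]
for _ in range(100):
    P = matmul(P, M)

# Closed form proved by induction: A^n = [[a^n, 1 - a^n], [0, 1]]
closed = [[a ** 100, 1 - a ** 100], [0, 1]]
print(P, closed)
```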
Let's note that the ergodic theorem ("Informatička Teorija II", 61.2.) recognizes both of these "faceless" matrices, in addition to others similar to them.
Nothing
Question: How can "something" come from "nothing"?
Answer: So that the total of "everything" remains as it was. "Nothing" is thus decomposed into +1 and -1, a unit number into the intensity of a unit vector, a unit matrix into the product of two unitary, mutually inverse matrices.
These are the processes that occur more often in the micro world, and we can observe them most easily in a vacuum, where they are already routinely detected by now:
\[ 0 = \begin{pmatrix} 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \] \[ = \begin{pmatrix} 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = 0. \]This describes the process of creation and annihilation that actually, randomly, happens in the quantum world. Physics has its own notation for each of these stages:
\[ 0 = \langle 0 | 1 \rangle = \langle 0 |\hat{I}| 1\rangle = \langle 0 |\hat{\sigma}_x \hat{\sigma}_x| 1\rangle = \langle 1 | 0 \rangle = 0. \]Here \(\hat{\sigma}_x\) is the first of the three Pauli matrices (Quantum Mechanics), which are defined so that their square is the identity, the unit matrix. They transform the co- and contravariant vectors \(|0\rangle\) and \(|1\rangle\) into each other.
It is easy to combine in this way and get "whatever" we want. In fact, these are only temporary phenomena, but of any particles, as long as the basic laws of physics are not violated, primarily the conservation laws.
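The bookkeeping with \(\hat{\sigma}_x\) can be sketched in a few lines, with \(|0\rangle\) and \(|1\rangle\) as the standard basis columns:

```python
# Sketch: |0> = (1, 0), |1> = (0, 1), and the first Pauli matrix
# sigma_x = [[0, 1], [1, 0]], whose square is the identity and which
# swaps the two basis vectors.
sigma_x = [[0, 1], [1, 0]]

def apply(M, v):
    # matrix acting on a column vector
    return [sum(M[i][k] * v[k] for k in range(2)) for i in range(2)]

def braket(u, v):
    # scalar product <u|v> for real components
    return sum(ui * vi for ui, vi in zip(u, v))

ket0 = [1, 0]
ket1 = [0, 1]

print(braket(ket0, ket1))          # <0|1> = 0: "nothing"
print(apply(sigma_x, ket0))        # sigma_x |0> = |1>: the vectors swap
# inserting sigma_x twice changes nothing, the total stays zero:
print(braket(apply(sigma_x, ket0), apply(sigma_x, ket1)))
```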
