
Amount
Question: What is the "amount of possibilities"?
Answer: A body can have various measures of itself: length, speed, weight, volume, and so on, and each of them can be exactly defined. The same holds for the "counting of options".
The five coins on the table can each show heads or tails, which gives 2^{5} = 32 arrangements; if each coin instead takes one of three nominal values, there are 3^{5} = 243 different purchase values. Five binary choices carry five pieces of information, and 32 is the number of their combinations, just as 243 counts the arrangements of the three nominal values. This also agrees with the definition of information by binary search.
With a sequence of k = 1, 2, 3, ... binary digits, 0 or 1, it is possible to write N = 2^{k} different numbers, while the information of all such equally probable k-tuples is the logarithm of their number, log_{2}N = k, where the base of the logarithm determines the unit of measure. In general, according to Hartley (1928), the information of N = 1, 2, 3, ... equally possible chances is log N.
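As a quick numeric illustration (a sketch; the function name `hartley` is my own), the Hartley information of N equally likely options is log N, so each of the five-coin counts above carries exactly five units:

```python
import math

def hartley(n, base=2):
    """Hartley information of n equally likely options; the base sets the unit."""
    return math.log(n, base)

# Five coins, each heads or tails: N = 2**5 = 32 equally likely arrangements.
print(hartley(2 ** 5))          # ~5 bits, one bit per coin
# Five coins with three nominal values each: N = 3**5 = 243 arrangements.
print(hartley(3 ** 5, base=3))  # ~5 trits, one trit per coin
```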
Weber (1834) found a mathematical formulation: approximately, within certain limits, the differential threshold of a stimulus ΔW is a constant fraction of the intensity W of the stimulus
\[ \frac{\Delta W}{W} = \text{const}. \]That is, the amount of energy ΔW that must be added to produce a perceptible difference is proportional to the energy W already present. The differential threshold for a weight of 50 grams might amount to, say, one gram; that is, a weight of at least 51 grams is needed to feel the difference in weight. If the weight is 100 grams, say sand in the hand, the differential threshold is 2 grams, i.e. twice the weight corresponds to twice the insensitivity to differences.
Summing according to this Weber's law over the total (large) number n of changes that one sense can perceive, we find
\[ \frac{\Delta W_1}{W_1} + \frac{\Delta W_2}{W_2} + ... + \frac{\Delta W_n}{W_n} = n\cdot(\text{const}.) \]where W_{1} is the absolute (lowest) threshold of the given sense, then W_{2} = W_{1} + ΔW_{1}, W_{3} = W_{2} + ΔW_{2}, ..., W_{n} = W_{n−1} + ΔW_{n−1}. For extremely sensitive senses the difference ΔW is infinitesimal, and so is the constant of Weber's law, but then the number of distinguishable observations (n) is very large. In that limit we can compute the above sum by infinitesimal calculus and get the logarithm of the quotient W_{n}/W_{1}, while the meaning of the right side remains unchanged. That number, proportional to the number of observations, is a measure of the "amount of possibilities" that a century later Hartley would call information
\[ I(W) = C\cdot\log\frac{W}{W_1}. \]Here C is a constant that depends on the base of the logarithm, i.e. on the unit used to measure information. The number W measures the intensity of the stimulus, and I the total number of data (information) that the sense distinguishes above the absolute threshold (W_{1}). See Rastko Vuković: The Nature of Data, Novi Glas, Banja Luka, 1999 (in Serbian).
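The summation above can be checked numerically (a sketch with values of my own choosing: threshold 1, intensity 100, Weber fraction c = 0.001). Counting just-noticeable steps W → W(1 + c) reproduces the logarithmic law I(W) = C·log(W/W₁) with C = 1/c:

```python
import math

def weber_steps(w1, w, c):
    """Count just-noticeable increments from threshold w1 up to intensity w,
    each step being the constant fraction c of the current intensity."""
    n, wk = 0, w1
    while wk < w:
        wk *= 1 + c   # Weber's law: delta W / W = c
        n += 1
    return n

c = 0.001
n = weber_steps(1.0, 100.0, c)
# For small c the step count approaches log(W/W1) / c.
print(n, math.log(100.0 / 1.0) / c)  # both near 4.6 thousand
```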
Mean
Question: What is "Shannon Information"?
Answer: Shannon's (1948) information is a measure of the "surprise" of the outcome of random events, defined as the average of Hartley's (1928). As we have various "means" in mathematics, among which certain inequalities hold, in addition to some others in statistics, I will dwell on the peculiarities of Shannon's "mean".
The probability distribution is based on a complete set of elements ω_{k} ∈ Ω that are mutually exclusive random events. This means that one and only one of them will certainly happen. When their set is discrete, with n ∈ ℕ members, and the probability of the k-th outcome is p_{k} ∈ (0, 1), the sum of all of them is a certain event:
\[ p_1 + p_2 + ... + p_n = \sum_{k=1}^n p_k = 1. \]We consider each of the events ω_{k} as a set of m_{k} equally probable (fictitious) outcomes, so that p_{k} = 1/m_{k}, just so we can count on the Hartley information I_{k} = log m_{k} = −log p_{k}. Within each of the n groups, the k-th contains m_{k} equally likely elements, so the sum of all of them, m = ∑_{k} m_{k}, is far greater than n. The mean value of the information of those groups, their so-called mathematical expectation, is:
\[ S = - p_1 \log p_1 - ... - p_n \log p_n = -\sum_{k=1}^n p_k \log p_k = \sum_{k=1}^n p_kI_k. \]That is Shannon's discrete information.
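In code (a minimal sketch; `shannon` is my own naming), the formula is the expectation of the Hartley values −log p_k:

```python
import math

def shannon(probs, base=2):
    """S = -sum p_k log p_k, the mean of the Hartley informations -log p_k."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(shannon([0.5, 0.5]))  # fair coin: 1 bit
print(shannon([0.9, 0.1]))  # unfair coin: noticeably less than 1 bit
```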
For example, when 0 < p < 1 is the probability that event A will happen, then q = 1 − p is the probability that it will not happen (that A' will happen). The formula for Shannon's information gives
\[ S = -p \log p - q \log q, \]which, for a coin toss with the event A = "head", p = 0.5 and the binary logarithm, gives S = 1 bit (binary digit).
When there are uncountably many outcomes and we work with a probability density ρ = ρ(ω) > 0, the normalization of the density and Shannon's information are:
\[ \int_\Omega \rho\ d\omega = 1, \quad S = -\int_\Omega \rho \cdot \log \rho \ d\omega. \]Since ρ⋅log ρ → 0 as ρ → 0, the intervals of individual densities and probabilities can be extended to the closed interval [0, 1].
The extreme values of Shannon's information are attained on the uniform, exponential and Gaussian probability distributions, depending on the given constraints, as follows:
- On a given limited interval Ω, the uniform probability density maximizes Shannon's information.
- With a given mean (expectation μ), among densities on [0, ∞), the exponential distribution has maximum Shannon information.
- With a given dispersion (variance σ²), Shannon's information is maximal for the Gaussian distribution.
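The first claim can be illustrated discretely (a sketch; the two example distributions are arbitrary): over a fixed finite support, the uniform distribution has the largest Shannon information.

```python
import math

def entropy(probs):
    """Shannon information in natural units, -sum p log p."""
    return -sum(p * math.log(p) for p in probs if p > 0)

uniform = [0.25] * 4
skewed = [0.7, 0.1, 0.1, 0.1]
print(entropy(uniform))  # log 4, the maximum over four outcomes
print(entropy(skewed))   # strictly smaller
```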
The consequences of these constraints are manifold. The uniform distribution has maximum Shannon information on a limited interval, so the principle of information minimalism will spontaneously change the probabilities into inhomogeneous ones. Free networks (with equal links) will tend towards a few nodes with many links and many nodes with few links. Therefore, among very long strings of possibilities, only a small number will be relevant.
For example, the free market (linking) of money and services develops into a few very rich and many poor. Another example: our relatively small number of senses, compared to the vast multitude of all other stimuli, is the part relevant to the survival of the species. Again, the stem cells of a young organism develop into the required cell types during embryonic development (embryonic stem cells), as do those in the body of an adult (adult stem cells).
The multiplication of bacilli would have an exponential distribution (as medical encyclopedias claim), but it does not, due to the principle of minimalism. The infected become immune or die, and the trend of spreading decreases; this is one way that principle is realized. In the same manner, the process of the present slows down in favor of an increasingly dense memory. The past inhibits the future, so the total amount of information is conserved, with the consequence that current events become more and more certain.
I have often mentioned and interpreted these distributions and their Shannon information, for example, how to avoid the Gaussian distribution (Dispersion II) and how to reduce it to an exponential one in the kinetic energy by interpreting its variable as velocity (Veracity). That is why I do not go into details here.
Surprises
Question: Can you explain the concept of "surprise" to me?
Answer: In this question, the emphasis should be placed on the action, the transition from the state of superposition (options) to one outcome. When we flip a coin and a "tail" or a "head" falls, the surprise is in the act of "flipping".
I mean that we should pay attention to the process of transformation that makes the "surprise" happen, or to the energy and time it takes for it to happen. This is consistent with nature's effort to make more likely events happen more often, that is, to be less informative. We come to the point that the "information theory" (which I am developing) bases every force and physical action on an analogous "natural tendency".
Starting from the principle of least action, the general equations of gravitation can be derived, as well as all other known trajectories of today's theoretical physics. Examples are in Minimalism of Information, 2.5 Einstein's general equations, and the famous law of refraction of light (Speed of light, Snell's law). Many other apparently unrelated phenomena have a common cause in this principle, the principle in the frugality of information, and both in the tendency towards more probable outcomes.
Physical substance is sandwiched between these inclinations, striving for less, at least as far as information is concerned. On one side there is vitality, which is reached with a surplus of information of a system in relation to the separate information of the substance of which it consists; on the other side lie the worlds of ideas and abstraction, unnoticed by physical substance. The interesting thing about this construction of information theory is the possibility of describing intelligent living beings that can perceive parts of each of those three layers. The peculiarity is the residue of ideas that separates living from non-living beings, or the physical substance due to whose absence ideas differ from vital ones.
When the probabilities of some discrete distribution form a decreasing (non-increasing) sequence p_{1} ≥ p_{2} ≥ p_{3} ≥ ... ≥ 0, or we have a decreasing probability density ρ(ω), then (when p < e^{−1}) the Shannon information summands are also decreasing:
\[ -\rho(\omega) \cdot \log \rho(\omega) \ge -\rho(\omega + \varepsilon) \cdot \log \rho(\omega + \varepsilon) \]for each \( \varepsilon \ge 0 \). You can prove this, for example, with the derivative of the function
\[ f(x) = \frac{\log_bx}{x}, \]and see that \( f'(x) \lt 0 \) for every \( x \gt e \), i.e. that the function f(x) is decreasing there. The conclusion is that the less likely outcomes p ∈ (0, 1), although with greater Hartley information I = −log_{b} p > 0, make smaller contributions, the summands of Shannon's information.
Another way to see the same is in the next picture. The upper, blue graph is the function y = −x⋅log_{2} x, and the lower, red one is y = −x⋅log_{10} x. Both ordinates rise up to the same abscissa x = e^{−1} ≈ 0.37, after which they decrease. The abscissas now carry probability values x ∈ [0, 1], and in the long sum of Shannon's formula there are not (many) summands in which they are greater than 0.37.
In a series of decreasing probabilities (abscissas) x_{k} → 0, the ordinates y_{k} = −x_{k}⋅log x_{k} rapidly become smaller and less important for the sum S = ∑_{k} y_{k}, which gives the mean value of the information. The value of Shannon's S is contributed more by outcomes with higher probabilities!
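This decay of the summands is easy to verify (a sketch; the probabilities are arbitrary values below 1/e):

```python
import math

# Summands y = -x log2(x) of Shannon's formula, for a decreasing sequence
# of probabilities below 1/e: each term contributes less than the previous one.
xs = [0.3, 0.1, 0.01, 0.001]
ys = [-x * math.log2(x) for x in xs]
print(ys)  # a strictly decreasing sequence
```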
The consequences of this observation are numerous and varied. For example, very large surprises contribute very little to the mean information, that is, to Shannon's. Second, minimalism changes the distribution of probabilities by separating a smaller group of outcomes significant for the average from the numerous non-informative ones with, paradoxically, less likely outcomes.
We understand freedom as the "quantity of options" of a given system, in the form of information, and especially as its mean value. Then great uncertainties contribute less to greater freedom. It turns out that regulation increases free choices, say in societies. When we ignore improbabilities, we avoid wasting our opportunities, our (Shannon, mean) information.
Disposition
Question: Are the probabilities always in some so-called distribution?
Answer: "Distribution" is the most often and most easily explained structure of probabilities, so much so that one gets the incorrect impression that it is the only one.
When we toss a fair coin, we work with a distribution of two outcomes (tails and heads) of equal probabilities, that is, p = q = 0.5. When we roll a dice, the distribution has six possibilities, the sides of the dice, usually marked with the numbers 1, 2, ..., 6. With a fair dice, each has the same probability p = 1/6.
Shannon's information, the mean uncertainty of tossing a fair coin and a fair dice, expressed in logarithms of base b = 2 (in bits), is:
\[ S(\text{coin}) = -\frac12 \log_2 \frac12 - \frac12 \log_2 \frac12 = \frac12 \log_2 2 + \frac12 \log_2 2 = \log_2 2 = 1, \] \[ S(\text{dice}) = -\frac16 \log_2 \frac16 - ... - \frac16 \log_2 \frac16 = 6\cdot \frac16 \log_2 6 = \log_2 6 \approx 2.58. \]A dice is about two and a half times more uncertain than a coin, according to Shannon. In general, if a distribution has n ∈ ℕ equally likely outcomes, and the logarithms have base b = n, then its Shannon information is S(n) = log_{n} n = 1. But the ratio of the results is the same in any base (b) of the logarithm, because:
\[ \log_2 6 : \log_2 2 = \frac{\log_b 6}{\log_b 2} : \frac{\log_b 2}{\log_b 2} = \frac{\log_b 6}{\log_b 2} : 1 = \log_b 6 : \log_b 2, \]so the ratio is the same in all measurement units of information. Unfair games, with outcomes of different probabilities, will give less (Shannon) information than these.
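A small numeric confirmation (a sketch; `shannon_uniform` is my own naming) that the dice-to-coin ratio does not depend on the unit of information:

```python
import math

def shannon_uniform(n, base):
    """Shannon information of n equally likely outcomes: log_base(n)."""
    return math.log(n, base)

# The ratio log 6 : log 2 is the same in bits, decimal units and nats.
for b in (2, 10, math.e):
    print(shannon_uniform(6, b) / shannon_uniform(2, b))  # about 2.585 each time
```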
Namely, when we know that something will happen, and it does happen, it is lesser news. But from the previous section we see, via the substitution x = 1/p, that for probability p < 1/e the Shannon summand becomes smaller as the probability (p) becomes smaller. A reduction of a tail probability rapidly reduces its Shannon summand, as opposed to contributions from the middle part. Thus we intuitively understand that leaving the uniform distribution reduces the Shannon information; a more rigorous proof is in the attachment Extremes.
A distribution consists of a group of possibilities such that (always) exactly one of them occurs. We say that this group, or set (Ω), is complete and that it has the said dependence, the uniqueness of the outcome. Distributions appear all around us, in drawing a ball from a lotto drum, or in waiting to see whether it will stop raining in the next hour or so. They also run parallel to each other, overlapping.
Let's say we have n = 1, 2, 3, ... probability distributions Ω_{1}, Ω_{2}, ..., Ω_{n}, each with some Shannon information, respectively S_{1}, S_{2}, ..., S_{n}, such that over a period of time we can (statistically) establish that the k-th distribution occurred N_{k} times. Let the sum of all these occurrences be N = N_{1} + N_{2} + ... + N_{n}.
Then the fractions, the numbers P_{k} = N_{k}/N, will express "frequencies", i.e. "statistical probabilities" of the occurrence of the individual distributions. We can further use them as a probability distribution of mean informations, and so again find the mean of the given Shannon informations, L = P_{1}S_{1} + P_{2}S_{2} + ... + P_{n}S_{n}.
It behaves similarly to Shannon's: first as mean values of Hartley's informations (I_{k}), and now as the mean of those means (S_{k}). Because of this ease of creating probability distributions even where there apparently are none, they have become very interesting in mathematics and important in application. Their consequences are mostly similar, which you can research yourself.
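A sketch of this "mean of means" with made-up numbers (the counts N_k and the three component distributions are purely illustrative):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical counts N_k of how often three distributions were observed,
# and their Shannon informations S_k in bits.
counts = [50, 30, 20]                      # N_1, N_2, N_3
S = [entropy([0.5, 0.5]),                  # fair coin
     entropy([0.9, 0.1]),                  # unfair coin
     entropy([1 / 6] * 6)]                 # fair dice
N = sum(counts)
P = [nk / N for nk in counts]              # statistical probabilities P_k
L = sum(p * s for p, s in zip(P, S))       # the mean of the means S_k
print(L)  # lies between min(S) and max(S)
```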
However, one of the differences between the mean values S and L stands out. From a higher probability p, a smaller value of "−log p" is obtained, and vice versa: the (minus) logarithm of a lower probability is a larger number. So in the first case (S) we have the situation of the principle of least action, while in the second (L) this need not be so. A more informative distribution may occur more often, or a less informative one less often.
This observation further explains synergy and emergence, the arising of greater freedoms of a given system from inanimate matter, and with them the announcement of vitality. These can be depicted in complex systems by simultaneous, parallel occurrences of probability distributions, as opposed to a linear, one-dimensional one. We will see how this story goes on later, because it has many sequels.
Tendency
Question: If equivalent to action, does uncertainty create force?
Answer: The question is rhetorical. Due to principled minimalism, the outflow of uncertainty into information does not happen, except, most often, with a shortage of options that would trigger such an activity.
We understand that the most common outcomes are the most probable, but we can also understand this as the action of some abstract force of probability. The same force, which would pull states towards more probable ones, is the opposite of a force of uncertainty, which pushes states away. A third way is to observe the tendency towards less information. So we have three approaches. Let us look at yet another one below.
Let's imagine a vibrating wire stretched between two fixed points, a substitute for the motion of a wave-particle along the abscissa under the influence of the potential V(x) = 0 for x ∈ [0, a], and V(x) = ∞ for x ∉ [0, a], as in the book Quantum Mechanics, 1.3.8 Waves of matter, Example 1.3.74 (Particle in a box). It forms a standing wave so that the whole wire curls up and down while the ends are at rest (n = 1), or two of its parts curl (n = 2) with another stationary point between the ends, or three of its parts curve (n = 3) with two fixed points between the ends, and so on.
The Schrödinger equation for x ∈ [0, a]
\[ -\frac{\hbar^2}{2m}\frac{d^2\psi(x)}{dx^2} = E\psi(x), \]gives eigenvalues and corresponding eigenvectors:
\[ E_n = \frac{\pi^2\hbar^2n^2}{2ma^2}, \quad \psi_n(x) = \sqrt{\frac{2}{a}}\sin\left(\frac{n\pi x}{a}\right), \]for n = 1, 2, 3, ..., and outside that interval, for x ∉ [0, a], these values are zero. Here ℏ is the reduced Planck constant, m is the mass and x is the position of the particle-wave. The function ψ_{n} gives the amplitudes corresponding to the energy E_{n}.
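These eigenfunctions can be checked numerically (a sketch in dimensionless units ℏ = m = a = 1, with my own helper names): each ψ_n is normalized on [0, a], and the energies grow as n².

```python
import math

def psi(n, x, a):
    """Normalized particle-in-a-box eigenfunction on [0, a]."""
    return math.sqrt(2 / a) * math.sin(n * math.pi * x / a)

def norm_sq(n, a, steps=100_000):
    """Midpoint-rule integral of psi_n(x)**2 over [0, a]; should be 1."""
    dx = a / steps
    return sum(psi(n, (i + 0.5) * dx, a) ** 2 for i in range(steps)) * dx

a, hbar, m = 1.0, 1.0, 1.0
print(norm_sq(1, a), norm_sq(3, a))  # both close to 1.0
E = [(math.pi * hbar * n) ** 2 / (2 * m * a ** 2) for n in (1, 2, 3)]
print([e / E[0] for e in E])  # energy ratios 1 : 4 : 9
```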
The book goes on to solve the Schrödinger equation, for example, for free particle-waves, penetration through obstacles, and the hydrogen atom. These results agree very well with experimental findings. Moreover, it is the most accurate agreement between theory and measurement in physics and in science in general, which justifies de Broglie's hypothesis (1924) about matter waves.
This gives us the right to take waves as the deeper basis of matter, with energy and mass derived from them. Measurement defines the previous path of the particle-wave (Heisenberg, 1927), so the bends that give the above sinusoids only indicate the probability densities of eventually finding the particle. Then a change of these "sinusoids" represents a change of probabilities, and changing those requires the energy needed to change the movement of particle-waves. In other words, we are in accord with the introductory part of this answer.
The probability waves of elementary particles in more complex structures are more and more determined, according to the laws of large numbers of probability theory. They also have increasing energy which, starting from the vibration of molecules and ending with sound and waves on the surface of water, we can detect with simple devices and even directly with our senses. The solutions of such differential equations satisfy the axioms of vector spaces; therefore, these waves are vectors. In general, vectors are states, but they are also processes, those that interpret linear operations on vectors.
In the book Space-Time (1.1.5 The universe and chaos, p. 22), I gave an example of a small town of about 10 thousand inhabitants with a periodic rise and fall of trade. There are also examples of the rise, maturation and decay of the vitality of civilizations, behind which new ones arise, or of the evolution of life through similar processes of individuals, which also have elements of matter waves, of sinusoids and mixtures of uncertainty. All of them convey some energy, have their own slowness, inertia, and force of action, although we need not be aware of it.
Summary
Question: Information is just a number?
Answer: Classically, it is said, information is a mere "number of options", which should be distinguished from content, from data. Yet even then, the more adroit accountants see beyond the numbers, sensing what the numbers bridge.
A clear example is Einstein's (1935) intuition of the then disputed "quantum entanglement", seen from the numbers themselves, so clear and at the same time so terribly unusual that it could not be believed.
We know how, by rotating the Cartesian rectangular system, its various positions are obtained and the coordinates of points change, while the coordinate axes always remain mutually perpendicular (orthogonal). Thus the function Ψ can, via the scalar product, be developed in different ways into a series of orthogonal functions:
\[ \Psi(x_1, x_2) = \sum_{n=1}^\infty \psi_n(x_2) u_n(x_1) = \sum_{s=1}^n \varphi_s(x_2) v_s(x_1), \]where the first factors of each summand are the coefficients of the development of the function Ψ into the series of orthogonal functions represented by the second factors. Here, x_{1} and x_{2} are the variables of the first and second systems. Freedom in the choice of coordinates, of systems I and II, makes this equality paradoxical.
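The freedom of basis choice can be sketched in two dimensions (a toy finite-dimensional analogue, not the quantum calculation itself): the same unit vector has different coefficients in two orthonormal bases, while its squared norm, the total probability, stays 1.

```python
import math

def dot(u, w):
    return u[0] * w[0] + u[1] * w[1]

t = 0.7  # an arbitrary rotation angle
e1, e2 = (1.0, 0.0), (0.0, 1.0)                                   # basis I
v1, v2 = (math.cos(t), math.sin(t)), (-math.sin(t), math.cos(t))  # basis II

state = (0.6, 0.8)  # a unit "state" vector

c_std = (dot(state, e1), dot(state, e2))  # coefficients in basis I
c_rot = (dot(state, v1), dot(state, v2))  # different coefficients in basis II
print(c_std, c_rot)
print(dot(c_std, c_std), dot(c_rot, c_rot))  # both squared norms equal 1
```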
Namely, the eigenvalues a_{n} and b_{s}, to which the eigenfunctions u_{n} and v_{s} respectively belong, are interpreted, the first as observables and the second as quantum states, or more precisely, through the probabilities of the occurrence of those situations. The description of the quantum system is redefined by these sums, giving it the meaning of a measurement environment. In two different environments, very far apart, a measurement of one would instantaneously reflect on a measurement of the other and, Einstein noted, would be a "spooky action at a distance" that would violate the limit of information transmission at the speed of light.
The summary of that "harmless" calculus, the scalar product of linear algebra, is in its background the observation of the simultaneity of different states which, considering that in physics we work with transformations of space-time coordinates, means that there are different simultaneities that other participants of the time flow do not notice as such. Also, the spatial distances of the same events differ for relative observers, which at the time of the discovery of the mentioned EPR paradox was well known, yet did not help understanding.
The existence of ideas different from numbers is demonstrated by musicians, painters, even geometers of topology (forms without defined lengths), of graphs, and various others in mathematics for whom metrics are not a topic. Even those sums of products of functions, whether of the above expression Ψ or of the simpler type of "information of perception", say much more than the results of concrete calculations. They actually tell us more than what is calculated.
When we discuss the information of perception, you will notice, we interpret uncalculated quantities, because they are more useful to us as "blurred" wholes than as many "clear" pieces. It is impossible, or almost useless, to define all the details of an otherwise very long series of perceptions of a macro-object, like the futile enumeration of every last molecule of an object that we would wave a hand at. Also, more data can be extracted from the observation of a superposition (probability distribution) than from its individual collapses (outcomes).
This is a topic we often return to (Farming, Q: Knowledge is information endowed with meaning?). The essence of nature is in such layering that you can dress it in various shirts that fit it equally well (starting from the geometry of shapes, through equations, analysis with "pure" numbers, or "nebulae" of probabilities, sets, to various other, recently discovered abstractions), and then there is no "best" one.
Namely, next to Gödel's first incompleteness theorem (all consistent axiomatic forms contain undecidable statements) there is also the second incompleteness theorem (no consistent axiomatic system can prove its own consistency). In other words, there is no such "sense" that would be self-sufficient. On the other hand, there is no such "nonsense" that we cannot peel like an onion, using numbers as tools.
Creativity II
Question: What is creativity?
Answer: The first question was how to achieve creativity; the second is what it could be. The meaning of the latter is a big topic of this kind of information theory.
The assumption of creativity would be vitality, and then a leap into uncertainty. Leaving the comfortable majority opinion, however, is not pleasant. The escape from freedom, coded in principled minimalism, becomes strongly attractive when we are face to face with larger real news. Descending into lower "grips of options" than normal, or than our own, is also repulsive to us, due to occupancy and the conservation laws. In lower, weaker vitalities, routines are denser and creativity remains rarer (in the living world in general).
From a distance, science looks creative. To a layman that stored routine is still fresh, innovative, and therefore difficult or repulsive; but to many practitioners science is a factory assembly line. The better established it is, the emptier it is, and, paradoxically, the more difficult serious innovation becomes. In many areas creativity comes easier than in the exact ones, especially the more abstract and precise.
"All creative people want to do the unexpected" — Hedy Lamarr (1914–2000, Austrian-American actress and inventor); and whoever tastes creativity at least once will want it again, I would add. Such a dependence, like that of light drug addicts, is a puzzling feature of creatives. It may be part of the sexual drive (Origin), or it may be the anxiety of a person squeezed into the wrong place. However, not belonging is the greatest gift that excites us then.
According to the explanation of quantum entanglement (Summary), especially my informatics part of it, that it is about simultaneities that other observers do not see as such, there is the possibility of a large concentration of such quantum-entangled substance which, I further hypothesize, could manipulate "actions under the radar". The thesis is that the brain could contain such unusual substance and that it is therefore special, set apart from predominantly living tissue.
Nature does not like equality, and it turns out that a centralized hierarchy is worse than a scattered one, like the nodes of a free network. That is why, no matter how powerful a brain we have, it could not on its own manage well the body functions of an even more complex living organism (the creation of erythrocytes, liver processes, metabolism in our body). Hence the special ability of the brain that gives consciousness, intelligence and creativity.
Dual Likely
Question: Is "information of perception" more information than probability?
Answer: There is a dualism between information and probability, so it is not enough to find some "percentage of the mixture" to answer such questions. Otherwise, the question could equally be posed of Hartley's information (−log p) as of Shannon's (the mean of Hartley's products with probability).
The figure shows the Venn diagram of the union of sets (A ∪ B) decomposed into three disjoint sets, without common elements (A\B, A∩B and B\A). From this, equalities for the number of elements of these sets, for probabilities when we see the sets as random events, and for interferences, are easily found in the equations:
n(A ∪ B) = n(A) + n(B) − n(A ∩ B),
P(A + B) = P(A) + P(B) − P(AB),
|A + B|² = |A|² + |B|² + 2ℛℯ(A^{*}B).
The labels A and B are the same letters but, consistent with the meaning of the operations, in the first equality they are sets, in the second events, and in the third particle-wave superpositions (or, say, probability waves).
In the third, the doubled real part of the product of conjugated A with B increases for interferences in the same direction (constructive) and decreases for the opposite direction (destructive). This is seen from ℛℯ(A^{*}B) = a_{x}b_{x} + a_{y}b_{y} = a⋅b, which we more easily recognize as the scalar product of the vectors a = (a_{x}, a_{y}) and b = (b_{x}, b_{y}), where the complex numbers are A = a_{x} + ia_{y} and B = b_{x} + ib_{y}, and i² = −1.
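A numeric check of the interference identity (a sketch with arbitrary complex numbers):

```python
# |A + B|^2 = |A|^2 + |B|^2 + 2 Re(A* B), and Re(A* B) is the scalar
# product of the vectors a = (ax, ay) and b = (bx, by).
A = complex(3, 4)
B = complex(1, -2)
lhs = abs(A + B) ** 2
rhs = abs(A) ** 2 + abs(B) ** 2 + 2 * (A.conjugate() * B).real
print(lhs, rhs)  # equal up to rounding
print((A.conjugate() * B).real, A.real * B.real + A.imag * B.imag)  # equal
```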
That universality of application (set, event, wave), typically mathematical, actually arises wherever concepts are handled sufficiently precisely, stripped bare, abstractly. That information comes to us in discrete phenomena (Packages), physically isolated, is not incorrect; they are "algebraically true" in themselves. That we can perceive them as such is supported by the Borel–Cantelli lemma of probability theory, in addition to other findings of information theory (in a large set of clearly defined possibilities, only a small number are relevant).
When the probabilities of a discrete distribution form a decreasing (non-increasing) sequence p_{1} ≥ p_{2} ≥ p_{3} ≥ ... ≥ 0, or we have a decreasing probability density ρ(ω), then (for p < e^{−1}) the summands of Shannon's information are also decreasing (Surprises). Then −p log p → 0 as p → 0. In other words, among the many summands of the average, more precisely Shannon, information, only those with the highest probabilities remain relevant. It is paradoxical but, in the crowd, the outcomes with greater probabilities give the greater total sum of information.
At a higher, more complex level, where the law of large numbers sets in, uncertainty slowly grows into certainty, and probabilities qualitatively change into quantitative "information of perception". Thus (Disposition) the collapse of the superposition into one and only one of the possibilities becomes complex, multidimensional, let us say a parallel event full of outcomes at once. The macro-world around us does not unfold like a chess game, my move then your move, nor like the interaction of elementary particles.
Therefore, the above forms for the number of elements of a set, or for probability, that is, interference, are transferred to macro-information itself; we can say to mass, energy, or other physical quantities of our order. Just as the physical powers of a body come from the molecules contained in it, and these from their vibrations and even lower parts, so macro-information contains a latent "power" manifested by synergy and emergence.
Computable
Question: Is the universe computable?
Answer: No. The assumption of the objectivity of uncertainty is equivalent to such unpredictability that it is not possible to compute and simulate the universe completely.
Deterministic phenomena would be those whose states always have at most one outcome. But we work with a theory that puts information at the heart of everything, and we see its essence as uncertainty. Neither calculability nor predictability of an outcome, as in flipping a coin, is then possible.
Due to the conservation law, infinite divisibility of information is not possible, and it is transmitted in packets. That is why there are discrete steps of paragraphs, clauses and proofs. Such is the quantum of physical action (the product of energy change and elapsed time), and hence the equivalence of physical information and action. On the other hand, every such structure, even the "whole" universe, is in a state of some uncertainty, either internal or external. In this sense, the universe cannot be limited spatially, temporally, or materially.
This is consistent with elaborations of Russell's paradox (Sufficiency). There is no set of all sets, so there is no such universe, nor multiverse, which would include all options. The assumption that there is such coverage is contradictory. Here we are talking about information, not only physical information, which is also in accordance with Kurt Gödel's theorem (1931) that there is no theory of all theories. Consequently, the idea that (somehow, someday) we could have a "supercomputer", mind, or any kind of projector that could exactly copy or simulate "all" of the universe is contradictory.
Question: Can you explain that Gödel’s theorem to me?
Answer: OK. Gödel's proof of his theorem is based on self-reference: in mathematical form, the statement "This sentence is unprovable" is both true and formally unprovable. The details of the proof are difficult, but the American logician Raymond Smullyan (1919–2017) found a great way to convey the essence of Gödel's incompleteness theorem using simple logic puzzles about truth-tellers and liars. Here is one of his.
In the Ocean of Deduction lies the logical island of If. People born there belong to one of two tribes: the Alethians and the Pseudians. The only way to tell an Alethian from a Pseudian is to talk to them. The former always tell the truth; the latter always utter lies, no matter what.
In the center of the island, the Lord of the Alethians keeps the Book of Identity, a book that lists the names of all born islanders together with their tribes. The information in the Book of Identity is accurate and free to anyone who asks. One day an intrepid explorer arrives on If. She meets various inhabitants and identifies them as Alethians or Pseudians by asking wise questions.
After several successful such encounters, she meets a man named Kurt. The explorer does not know his tribe, but before she can ask him a question, he says: "You will never have concrete evidence that I am an Alethian." Such a Kurt is neither an Alethian nor a Pseudian!
Here is how. If he were a Pseudian, then the statement "You will never have proof that I am an Alethian" would be true. But a Pseudian never tells the truth, so Kurt cannot be a Pseudian. If he were an Alethian, then everything he says is true, and so is the statement "You will never have concrete evidence to confirm that I am an Alethian." But we know that statement is false, because the Book contains his name and the confirmation that he is an Alethian. So Kurt is not an Alethian. The bottom line is that Kurt cannot have been born on the island of If.
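Kurt's contradiction can also be checked mechanically. A minimal Python sketch, brute-forcing both tribe assumptions; the encoding of the Book as the only source of proof is my own simplification of the puzzle:

```python
# Brute-force check of Kurt's statement "You will never have proof
# that I am an Alethian". The Book proves every islander's tribe, so
# a proof of "Alethian" exists exactly when Kurt IS an Alethian.

def consistent(tribe: str) -> bool:
    """Could Kurt belong to `tribe` without contradiction?"""
    statement_true = (tribe != "Alethian")    # true iff no proof exists
    must_speak_truth = (tribe == "Alethian")  # Alethians true, Pseudians false
    return statement_true == must_speak_truth

possible = [t for t in ("Alethian", "Pseudian") if consistent(t)]
print(possible)   # -> [] : neither tribe fits, Kurt was not born on If
```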
Gödel's incompleteness theorem states that there are mathematical statements that are true but not formally provable. A version of this puzzle leads us to something similar, a statement that is true, but there is no concrete evidence of its truth.
Distances
Question: Why is the universe flat?
Answer: Whether the universe is flat is a debatable question. Namely, the comoving distance measured from Earth to the edge of the visible universe is about 46 billion light-years, and one light-year is the length that light travels in one year at a speed of about c = 300,000 km/s. On the other hand, the universe was created in the Big Bang 13.8 billion years ago.
Those two numbers tell us that the universe has expanded at speeds greater than light, that is, that we can see farther than we should if the speed of light had always been the same.
Second, we know that the average distance from the Earth to the Sun is about 150 million kilometers. Light travels that distance in about 500 seconds, a little over 8 minutes. A distance of 8 light-minutes versus 46 billion light-years is much smaller than even the best visible distance on Earth compared to the size of our planet, and we know how hard it would be to convince a naked-eye observer that the Earth is not a flat plate. To the perception of an ant, a detail of a circle with a diameter of a kilometer is a straight line.

Unlike the simpler distance measurements from us to distant galaxies, which use fluctuations in surface brightness and galaxy color, searches for the curvature of space sooner or later use triangulation. We need the most distant vantage points from which to observe the universe, so that the measurements are as accurate as possible. And they stand in that proportion, a few light-minutes compared to tens of billions of light-years, so we see the universe as flat even if it is not. The points from which we observe the universe are still only those of the Earth in its orbit around the Sun.
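The scale mismatch in this triangulation argument can be put in one number. A small sketch using only the figures quoted above (8 light-minutes, 46 billion light-years):

```python
# One-number version of the scale mismatch: the triangulation
# baseline (the diameter of Earth's orbit, about 16 light-minutes)
# versus the 46-billion-light-year radius of the visible universe.
MINUTES_PER_YEAR = 365.25 * 24 * 60

earth_sun_ly = 8 / MINUTES_PER_YEAR     # 8 light-minutes in light-years
baseline_ly = 2 * earth_sun_ly          # full diameter of Earth's orbit
horizon_ly = 46e9                       # radius of the visible universe

ratio = baseline_ly / horizon_ly
print(f"baseline/horizon ~ {ratio:.1e}")    # about 7e-16
```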
By the way, I have considered a tensor model of a rotating universe (2014) and the metric of a torus as a model of the universe (2016), to mention only some exotic ones alongside the better-known models, for comparison with possible measurements of the curvature of the spacetime of the universe and more. What can I say except that, the way we measure the universe today, it remains flat until further notice.
Liberty
Question: What would that new informatic meaning of "freedom" be?
Answer: Information Stories popularly talks about freedoms, especially the first of them. However, such a direct question should be answered using Information of Perception, starting from its formula, the sum of products \[ Q = a_1b_1 + a_2b_2 + ... + a_nb_n. \]
The first factors form a sequence, a vector \( \vec{a} = (a_1, a_2, ..., a_n) \); the second factors form a vector \( \vec{b} = (b_1, b_2, ..., b_n) \). These are, for example, the components of the respective perceptions of the subject and the object, and their number n ∈ ℕ is so large that it is more useful to investigate the information of perception theoretically than concretely.
At a higher, more complex level, where the law of large numbers sets in, uncertainty slowly grows into certainty, and probabilities change qualitatively into the quantitative "information of perception" (a sentence from Dual Likely). Hence, such macro-level coefficients are real numbers, a_{k}, b_{k} ∈ ℝ. The vitality of that coupling means greater possibilities of movement than the merely physical ones given by the solutions of the Euler-Lagrange equations, from which it is possible to derive all the trajectories of theoretical physics known to this day (Lagrangian). They are an expression of the principle of least action.
Not everything communicates with everything, so the individual components, the members of these sequences, can really be represented by vector components, and the problem can be treated using the vector spaces of linear algebra. With this we know that a smaller angle φ between the two vectors \( \vec{a} \) and \( \vec{b} \) means a greater similarity of their corresponding coefficients and a greater scalar product \( Q = \vec{a}\cdot\vec{b} = ab \cos\varphi \), where ab denotes the product of the intensities of these vectors. This is because the cosine of φ increases as the angle of the span decreases toward zero.
Try to understand the size of Q as a result of the coupling between yourself and the environment. The higher this sum, the higher the level of your communication, the higher your vitality, the ability not to behave like the dead substance of which you are composed; therefore, you have greater freedoms. The corresponding coefficients a_{k} and b_{k}, for k = 1, 2, ..., n, represent the information that the subject exchanges with the environment.
1. When we respond to stronger actions of the environment (opponent) with stronger reactions, positive to positive and negative to negative, the sum Q is higher, the vitality of the coupling is higher, that is, the level of competition is higher. For example:
The least resistance is offered by giving a weaker response to a stronger action and a stronger one to a weaker, with the same coefficients (unchanged intensities) differently permuted. The option that literally always gives way, which sticks to the left side of this inequality, is still life, and the strategy of the I league sticks to the right.
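The inequality alluded to here is the rearrangement inequality: with fixed coefficients, pairing strongest with strongest maximizes the sum of products, and pairing strongest with weakest minimizes it. A quick check with illustrative values (not from the text):

```python
from itertools import permutations

# Rearrangement inequality: among all pairings of fixed coefficients,
# the same-order pairing maximizes the sum of products, the reversed
# ("always gives way") pairing minimizes it.
a = [1, 2, 3, 4]           # subject's responses, sorted ascending
b = [2, 5, 7, 11]          # environment's actions, sorted ascending

sums = [sum(x * y for x, y in zip(p, b)) for p in permutations(a)]

same_order = sum(x * y for x, y in zip(a, b))                 # strongest vs strongest
always_yields = sum(x * y for x, y in zip(reversed(a), b))    # strongest vs weakest

print(same_order, always_yields)   # -> 77 48
```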
In contrast to "least action", the main determinant of physics, a larger sum of the products Q indicates additional choices of the subject in the coupling. That excess, or vitality, is the augmentation of the subject's freedom against the dead physical substance of which it is composed. A larger sum means a greater ability to outplay simple natural flows, a stronger defiance of spontaneity, or let's call it the powers of reciprocity the subject's response to environmental challenges. At the same time, the environment itself "adapts to the bully"; physical nature "yields".
2. When we understand "freedom" in this way, it is the subject of study of the "sum of products" formula. The first thing we see then is that more summands n mean more senses and perceptions, and thus greater vitality. An entity, say some communication center, with more connections n is more informative. At the same time, a greater intensity of the components in the exchanged messages will contribute to a greater sum when a product in the sum is positive (a_{k} ⋅ b_{k} > 0), and to a smaller sum when such a product is negative (a_{k} ⋅ b_{k} < 0). The first happens when the factors are of the same sign (both positive or both negative), and the second when the real factors a_{k} and b_{k} have different signs.
At the lowest level, in the microworld, where these summands reduce to products of complex numbers, A = a_{x} + ia_{y} and B = b_{x} + ib_{y}, we have seen (Dual Likely) that they can be interpreted as interferences, each with a contribution 2ℛℯ(A^{*}B) ≷ 0, which can amplify or weaken the amplitudes of the summed waves depending on their phases. Higher levels also distinguish constructive and destructive influences in their own way, which is why these real numbers are positive or negative, if the subject manages to distinguish them.
3. Among such distinctions is the "divide and rule" tactic. Namely, a subject who has enough subtlety in the game of competition against the opponent, and manages to break the kth summand of negative factors into corresponding pairs, a_{k}b_{k} = (a_{k}' + a_{k}'')(b_{k}' + b_{k}'') → a_{k}'b_{k}' + a_{k}''b_{k}'', will decrease the total by the mixed products of these parts and lower the opponent's (object's) initiative:
Notice that I separate the responses in a measured way, to maximize the total, so that a smaller attack on the smaller of the two parts of the initial initiative looks like more support (less damage) for the weaker side; hence the name of the tactic.
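The mechanics of this splitting follow from the identity (a' + a'')(b' + b'') = a'b' + a''b'' + (a'b'' + a''b'): keeping only the matched pairs changes the total by exactly the mixed products. A numeric sketch with illustrative values (not from the text):

```python
# Identity behind "divide and rule": the undivided summand equals the
# matched pairs plus the mixed products, so dropping the mixed part
# changes the total by exactly that amount.
a1, a2 = -2.0, -1.0        # subject's two measured counter-responses
b1, b2 = -3.0, -1.0        # two parts of the opponent's initiative

whole = (a1 + a2) * (b1 + b2)     # undivided summand: (-3)*(-4) = 12
split = a1 * b1 + a2 * b2         # matched pairs:      6 + 1     = 7
mixed = a1 * b2 + a2 * b1         # dropped terms:      2 + 3     = 5

print(whole, split, mixed)        # -> 12.0 7.0 5.0
assert whole == split + mixed     # the identity, checked numerically
```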
4. The demonstrated separation of negative influences is desirable for the subject, because it means their overall reduction. On the contrary, the same splitting of a summand with a positive impact into two positive ones means less "good" for the subject, so it is undesirable for them:
It then corresponds to the tactics of "grouping and organizing", in military and similar conflicts, or simply expresses synergy, or the emergence.
Let's note that by grouping and better organizing cities, we achieve a greater power of absorbing residents and a greater range of their activities, and that means greater vitality, that is, freedom. Let us distinguish the freedom of the group from the (average) freedom of its members.
5. Finally, when a positive summand a_{k}b_{k} is split into a_{k}'b_{k}' + a_{k}''b_{k}'', again positive summands but with factors of different signs, the sum increases. This is demonstrated by an example:
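The original example is not reproduced here, so this is a reconstruction under the same identity (a' + a'')(b' + b'') = a'b' + a''b'' + (a'b'' + a''b'): when the parts of each factor have different signs, the dropped mixed products are negative, so the split sum exceeds the original term. The numbers are illustrative:

```python
# Splitting a positive summand into two positive summands whose
# factors have different signs increases the sum, because the
# dropped mixed products are negative.
a1, a2 = 3.0, -2.0         # a = a1 + a2 = 1, parts of different signs
b1, b2 = 3.0, -2.0         # b = b1 + b2 = 1, parts of different signs

whole = (a1 + a2) * (b1 + b2)    # 1 * 1 = 1
split = a1 * b1 + a2 * b2        # 9 + 4 = 13, both summands positive

print(whole, split)              # -> 1.0 13.0
assert split > whole
```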
The lesson is that in the fight for greater profit, vitality, or victory in competition, we will try to be subtle. No matter how positive some of our opponents are, if there are negatives, they should be proportionately prevented, just as it is good to accept the remaining positives.
6. Conversely, when splitting a negative influence, a negative factor (b_{k} < 0) must be opposed by a negative one (a_{k} < 0) for their product to remain positive (a_{k}b_{k} > 0); the game becomes hard, and it is good to avoid it. In other words, with predominantly negative influences it is so much more laborious to separate the "wheat from the chaff" that sometimes it is not worth it:
The cost increases and the intervention can prove harmful, so that such a division is not worth the effort. However, when it is present, it is an indicator of greater vitality, or freedom of the subject.
In short, that would be the new informatic meaning of "freedom". It is quite algebraic and complicated to explain in a couple of sentences, but it is actually much more precise than the earlier intuitive one.
Channeling
Question: What are the limits of information transmission, especially Markov chains?
Answer: The information theory I work with considers all truths (pseudo-)real until further notice. It thus encompasses both physical and mathematical phenomena, together with their infinities.
1. That is why the Borel–Cantelli lemma suddenly becomes more important: if the series of probabilities of a sequence of events converges, then with probability 1 only a finite number of those events are realized. Thus, from almost countless vocal possibilities as children, we develop into people who use only a two-digit number of letters in daily communication.
2. Moreover, the frequency of appearance of the letters of a text depends on the language, that is, the alphabet, and on the topic and the style of the writer. You can check this with the program "rates", where I cited as an example a passage from Andrić's novel "На Дрини ћуприја" (The Bridge on the Drina), with the letter counts found (Ivo Andrić wrote in Serbian Cyrillic):
frequency in descending order:
\[ \frac{579}{5284}, \ \frac{564}{5284}, \ \frac{526}{5284}, \ \frac{459}{5284}, \ \frac{288}{5284}, \ \frac{274}{5284}, \ ... \]It is a series of statistical probabilities of the given letters, approximately 0.110, 0.107, 0.100, 0.087, 0.054, 0.052, ..., out of a total of 1. Let's note that several letters of the Serbian alphabet do not appear at all in Andrić's passage.
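Counts of this kind are easy to reproduce. A sketch of such letter-frequency counting; the sample string is a stand-in, not Andrić's passage, and this is not the "rates" program itself:

```python
from collections import Counter

# Letter-frequency counting of the kind described above: count each
# letter, divide by the total, and sort in descending order.
text = "on the drina bridge the river runs under the bridge"
letters = [ch for ch in text if ch.isalpha()]

total = len(letters)
counts = Counter(letters)
freq = sorted(((c / total, ch) for ch, c in counts.items()), reverse=True)

for p, ch in freq[:3]:                  # the most frequent letters
    print(f"{ch}: {p:.3f}")
```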
3. Different letter frequencies come from the informativeness of the given text (Letter Frequency) and are a kind of "fingerprint" of the language, topic, or writer. The more substantial the text, in the sense that it is clearer, more specific, less scattered and less "hollow talk", the less informative it is! That may seem as absurd as it is logical, so I checked the thesis (in the aforementioned attachment) on different texts.
However, distributions of equal probabilities carry the greatest information, and that is why nature "doesn't like" them. It will spontaneously strive for diversity. Thus even a random, monotonous text becomes less informative, its letters taking on different frequencies, characteristic in their own way.
4. Copying ideas into a story and into text are types of information transfer. In the previous image, the link leads to the interpretation of multichannel and omnichannel. The first term refers to the use of more than one channel in the execution of marketing campaigns, and the second orients them to the customer. These too are ways of transmitting messages. Oral traditions over the centuries, shaped and preserved by rhyme and rhythm, are also specific transmissions in Serbian folk songs (see 7 below).
When we copy a text verbatim, letter by letter, word by word, we avoid creativity, unlike when copying ideas into a story, but no matter how hard we try, transmission errors will appear. Let's say that the probabilities of the aforementioned letters (Andrić's passage) are, in the same descending order, p_{1}, p_{2}, ..., p_{n}, where n is the total number of characters (uppercase, lowercase, spaces, punctuation). If m_{ij} is the probability of copying the jth letter into the ith, then we have the transmission channel matrix M = (m_{ij}), where \( \vec{q} = M\vec{p} \), that is
\[ \begin{pmatrix} q_1 \\ q_2 \\ ... \\ q_n \end{pmatrix} = \begin{pmatrix} m_{11} & m_{12} & ... & m_{1n} \\ m_{21} & m_{22} & ... & m_{2n} \\ ... & ... & ... & ... \\ m_{n1} & m_{n2} & ... & m_{nn} \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ ... \\ p_n \end{pmatrix}, \]where m_{ii} are the probabilities of correct transmission of the ith letter (i = 1, 2, ..., n), and m_{ij} (i ≠ j) is the probability of the error that the jth letter is written as the ith. When these conditional probabilities of the matrix M are constant, that matrix is one of the links of a Markov chain, the so-called chain generator.
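A toy instance of such a channel \( \vec{q} = M\vec{p} \), assuming a three-letter alphabet and illustrative (not measured) error probabilities:

```python
# A toy channel q = Mp for a three-letter alphabet. M[i][j] is the
# probability that input letter j is written as output letter i,
# so every column of M sums to 1.
p = [0.5, 0.3, 0.2]                     # input letter probabilities

M = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
]

q = [sum(M[i][j] * p[j] for j in range(3)) for i in range(3)]
print([round(x, 3) for x in q])         # output letter probabilities
```

A stochastic matrix preserves total probability, which the test below checks.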
The last image was an example of writing (in English) with many mistakes but still recognizable content. It tells us that from a series of letters of decreasing frequency, such as Andrić's mentioned above, several of the last, rarer letters can be removed while the content of the text remains legible.
5. When the least probable letters are removed from the text, the mean (Shannon) information of the text changes the least. In general, when the probability series of a discrete distribution is decreasing, p_{1} ≥ p_{2} ≥ p_{3} ≥ ... ≥ 0, with all probabilities less than e^{−1}, then the sequence of summands −p_{1}⋅log_{b} p_{1} ≥ −p_{2}⋅log_{b} p_{2} ≥ −p_{3}⋅log_{b} p_{3} ≥ ... ≥ 0 is also decreasing, so such terms of the Shannon information become ever less important in the sum (Surprises). For example:
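This monotonicity is easy to verify numerically; a sketch with natural logarithms and illustrative probabilities, all below e^{−1}:

```python
import math

# For p < 1/e the map p -> -p*ln(p) is increasing, so a decreasing
# sequence of small probabilities gives a decreasing sequence of
# Shannon summands.
ps = [0.30, 0.20, 0.10, 0.05, 0.01]          # all below 1/e ~ 0.368
terms = [-p * math.log(p) for p in ps]       # summands of Shannon information

print([round(t, 4) for t in terms])          # a decreasing sequence
```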
I mentioned this as the reason that in the macroworld (Hartley's) information can be (roughly) considered as proportional probabilities, which becomes significant in the expression of perception information.
6. By copying, then repeatedly copying the latest copies into further new ones, the original message is increasingly deformed and lost. The second step is:
\[ M\vec{q} = M(M\vec{p}) = M^2\vec{p}, \]and in general the kth step is \( \vec{s}(k) = M^k\vec{p} \), so that \( \vec{s}(2) = M\vec{q} \) and \( \vec{s}(1) = \vec{q} \). The question arises what happens after long scaling by the Markov matrix (what happens to M^{k} when k → ∞): does the limit \( \vec{s}(k) \to \vec{x} \), k → ∞, exist and, if so, what is its value?
In the contribution "Transformations" (7.5) you will find why each step of the Markov chain narrows the range of probabilities of the input distribution, followed by examples of limiting results after infinitely many steps. The extended examples from "Spectral Theorem" (8.3 to 8.6) are actually ways to search for, or prove the existence of, a limiting distribution \( \vec{x} \) such that \( M\vec{x} = \vec{x} \).
This \( \vec{x} \) is called an "eigenvector", or "characteristic vector", of the given matrix M. The action of its matrix does not change its component values; here it represents the uniform distribution (equal chances), so a long series of scalings (M^{k}, for k → ∞), a long Markov chain, becomes a "black box".
Each input vector, passing through such a chain, absorbs more and more noise, interference and misinformation, until it becomes the characteristic vector with the uniform distribution of maximum information. From the exit, the ever more distant entrance appears loaded with more and more distractions: the steps closer to us are less and less informative, and the further ones more and more blurred and uncertain.
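The "black box" behavior can be simulated directly. A sketch with an illustrative doubly stochastic matrix, the case in which the limiting eigenvector is indeed uniform:

```python
# Long Markov chains as a "black box": repeatedly applying a fixed
# doubly stochastic matrix M drives any input distribution toward
# the uniform eigenvector x with Mx = x.
M = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]

def step(matrix, v):
    """One link of the chain: v -> M v."""
    n = len(v)
    return [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]

v = [1.0, 0.0, 0.0]          # a sharply peaked input distribution
for _ in range(100):         # s(k) = M^k p
    v = step(M, v)

print([round(x, 6) for x in v])   # -> close to [1/3, 1/3, 1/3]
```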
7. The contribution "Unit Roots" explains the situation, the type of transmission channel, in which the input information lasts. These are oscillations: periodic, or let's call them circular, repetitions of the input message and its forms with some finite period. Almost the way rhyme and rhythm have preserved the aforementioned (4) Serbian folk songs. Similar examples of copying are all forms of wave motion, from familiar physical forms to the transmission of life through birth and death.
Stalemate
Question: Are the two players of the "rematch" league always waiting for a draw?
Answer: Of course not. But if the same software plays against itself, draws are more common than wins. Similarly to real situations (wars, politics, economy), where the struggles are multidimensional, the rules of the "rematch" strategy seem to lead from equal to roughly equal, mostly.
However, if we strengthen one of the two programs, a win becomes more likely than a draw. Let's call that the second situation, and it is always possible, simply because of the principle of objective uncertainty. There are always new options that would increase the sum of products of the "perception information", and thus the vitality of the subject-object, i.e. player-opponent coupling, which increases the vitality and power of the game.
Here are some more of my experiences from repeated simulations. Blockades that occur between equal players of that I league arise even though the software has enough crossroads (intersections) with random selection of continuations. I put together various modules with slightly different but essentially the same answers to the same challenges. It happens that the processes run seemingly unpredictably, but actually, in the long run, cyclically: no progression, no winners.
I assume that the same can happen in competitions of plants (for habitat), animals and people, when the individuals of the same species are equal. Strengthening the game gives more frequent assessment of the position (better insight, perceptions), greater possibilities of response (vitality of the subject), greater subtlety. Larger theoretical values are generally confirmed by these simulations. Inputs that increase the "perception information", embedding those finesses in the program packages, increase the chances of victory of the software itself (Liberty).
However, if the level (of both players) is lowered from the "evil" category to the second league, sometimes avoiding "retaliations", opponents with the same software make draws less frequently. A draw becomes even rarer when we relax the software from the league of "manipulators" down to the "good" class. This absurd phenomenon indicates the greater instability of states prone to yielding, generally to lesser action, but also the appearance of something like a force of maintaining vitality as a response to minimalism. Dynamics, chance and change are at the core of nonliving matter. On this absurd skeleton rest universal forms.
Friend of Friend
Question: What do you think of a strategy like "my friend's friend is my friend", if such a strategy exists?
Answer: We can devise such a strategy, even if it doesn't exist yet. It is worth testing, and I expect it to be one of the II league. If there is a "manipulator" in the chain, he hangs us on a bad assessment of good and evil and we drop out of the I league; and again, such a player remains, say, a "victim until triumph", that is, one who can "sacrifice for victory", so he does not fall into the III league.
For now, in simulations and according to the "perception information", the lead is held by strategies of timely, measured and unpredictable reciprocation of "good" for "good" and "bad" for "bad" actions of the opponent, with a few more finesses that increase the "number of options" of the game, i.e. the vitality of the player's coupling. That is the description of the strategy of the I league. It is not surmountable by II league players, who occasionally do not follow those rules. Each of them defeats the strategies of the III league, players who combine only "good with good".
I have yet to develop algorithms in the "token wars" along the suggested lines of "my friend's friend is my friend", perhaps with all the additions: "my enemy's friend is my enemy", "my friend's enemy is my enemy" and, finally, "my enemy's enemy is my friend". I predict that it will not be a strategy of the I league, that is, invincible for the other two leagues, because it is easier to manipulate.
Namely, the term "friend" is relative, variable, and an expression of an average. The I league looks for subtlety: it will look for some "evil" in (almost) every "good" detail, to support the positive and hinder the negative, and it will similarly divide bad initiatives. Such situations are unclear to the strategy of "friends", which will behave like a slow, dull, or naive newcomer who has wandered into a top competition.
Because of this imprecision, the relation "being a friend" is not transitive. Let's say that A and B are friends because they share at least 60 percent of common interests, and that B and C also have a reciprocity coefficient above 0.6. Then (0.6 × 0.6) the connection between A and C could be only 36 percent, and such two are not "friends"; blind trust between A and C is not justified.
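The arithmetic of this non-transitivity, as a check; the 0.6 threshold is from the text, while treating the product of the two overlaps as the worst case is my simplifying assumption:

```python
# Non-transitivity of "being a friend" under a similarity threshold:
# two pairwise overlaps at the threshold can leave the third pair
# well below it.
THRESHOLD = 0.6

ab = 0.6                   # interest overlap between A and B
bc = 0.6                   # interest overlap between B and C
ac_worst = ab * bc         # possible overlap between A and C

print(round(ac_worst, 2))  # -> 0.36, below the friendship threshold
assert ab >= THRESHOLD and bc >= THRESHOLD and ac_worst < THRESHOLD
```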
On the other hand, friendship also implies helping in adversity. The characteristic of the "good guys" strategy (III league), that they let themselves go "down the drain", would amount to giving up a friendship at the first need for effort. The strategy of "friends" is in this sense above the III league.
In short, clarify for me the definition of the relation "being a friend" and I will start working on the algorithms of its skill; but I definitely believe it would stand no chance against "Terminator" (a package of modules of the "retaliation" strategy: timely, measured and partly unpredictable activities).
Datum
Question: How do you connect "knowledge" and "information"?
Answer: The beginning is usually like this. Let's imagine that we work with encyclopedic data from various fields and of multiple types, with computer databases, or similar. Let the set of such facts be Ω, and ω its subset or element.
Terms B_{1}, B_{2}, ..., B_{N}, like house, fire, water, ..., can be formed into sentences. For example, B_{1}B_{2}: the house is on fire; B_{2}B_{3}: water extinguishes fire. The house is on fire and we extinguish it with water would be B_{1}B_{2}B_{3}, and the like. Such B_{k}, or ω_{k}, where k = 1, 2, ..., N, are elements of Ω.
As in a story told only in the infinitive, we have a series of N = 1, 2, ... positions b_{1}b_{2}...b_{N}, where each item has a code, a digit b_{k} at position k = 1, 2, ..., N of the story, marking whether it is present in that sequence (b_{k} = 1) or absent (b_{k} = 0). It is easy to notice that we have a sequence of N binary digits, which can denote M = 2^{N} different possibilities. The number N is the information (in bits), and M is the number of different stories, the amount of knowledge that can be told.
I continue with an even deeper analogy. Knowledges are subsets of the aforementioned Ω and may or may not have common terms B_{1}, B_{2}, ..., B_{N}. It is clear that such elements can be handled as sets usually are, and that we can process them in a probabilistic way (Dual Likely).
1. The elements of two sets, X and Y, are counted by the formula
n(X ∪ Y) = n(X) + n(Y) − n(X ∩ Y),
which we can now interpret as: the "information" of the union of two "knowledges" is equal to the sum of individual X and Y minus the common (repeated) ones.
2. This can be checked with simple examples like this:
X: 100110101 (5)
Y: 001100110 (4)
U: 101110111 (7)
The "knowledge" union of this example has 7 = 5 + 4 − 2 "information" elements. The analogy extends through the concepts of knowledge, information and the number of elements of a set to probabilities, then to superpositions of quantum states, then to interference of waves of all sizes. The power of mathematics in applications is enormous whenever the implementations fit well into its abstractions.
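The bit-string example above can be checked directly with bitwise operations:

```python
# The bit-string example, verified: count the 1s of X, Y, their
# union (bitwise OR) and intersection (bitwise AND).
X = "100110101"
Y = "001100110"

x, y = int(X, 2), int(Y, 2)

n_x = X.count("1")                    # n(X) = 5
n_y = Y.count("1")                    # n(Y) = 4
n_union = bin(x | y).count("1")       # n(X u Y) = 7
n_inter = bin(x & y).count("1")       # n(X n Y) = 2

print(n_x, n_y, n_union, n_inter)     # -> 5 4 7 2
assert n_union == n_x + n_y - n_inter # inclusion-exclusion
```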
Two plus two cows are four cows, 2 + 2 = 4, but also two plus two cups are four cups, although cows, the numbers themselves, as well as cups are completely different concepts.
3. We do not stop at the previous one, but also use the "law of total probability" (formula), reduced to:
\[ m(Y) = \sum_{k=1}^N m(YX_k), \]of course, with the appropriate interpretation. The sets are "knowledges", the sum is their union, and the product is the intersection, while YX_{k} represents the elements of the set Y that are also elements of X_{k}. The disjoint sets X_{k} are "knowledges" that decompose the total "knowledge" (Ω, the union of all N of them) into likewise disjoint sets YX_{k}. The number of elements of a set ω is m(ω).
Disjoint knowledges X_{k} (having no common elements) whose union is Ω form one decomposition, a breaking of Ω into subsets.
4. With them, the "Bayes' formula" of probability applies:
\[ P(X_i\mid Y) = \frac{m(X_iY)}{m(Y)} = \frac{m(YX_i)}{\sum_{k=1}^N m(YX_k)} = \frac{P(X_i)P(Y\mid X_i)}{\sum_{k=1}^N P(X_k)P(Y\mid X_k)}. \]Its interpretation is easily deduced from the previous one. For example, Ω is a city of inhabitants whose neighborhoods X_{k} divide the city, i.e. they are disjoint subsets of residents whose union is Ω, and Y is the set of visitors to a department store during one day.
Then P(X_{i}|Y) is the probability that a visitor to the department store is a resident of the ith neighborhood, while m(X_{i}Y) is the number of department store visitors from the ith quarter, and m(Y) is the total number of (different) visitors to the department store.
The facts
Question: Where else can those formulas be used (Datum: 1, 2, and 3)?
Answer: A mutually unique mapping, a bijection, when conditions (1) are met, is a guarantee of formulas (2) and (3). Notice that in that answer (Datum) I use two measures of the "number of elements" of a set, say n and m = 2^{n}, or similar ones for which that condition applies.
In the image on the right, the universal set is broken into three disjoint sets, X_{1} + X_{2} + X_{3} = Ω, with Y ⊂ Ω which they likewise break, decomposing it into three disjoint parts YX_{1} + YX_{2} + YX_{3} = Y, or precisely A'' = YX_{1}, B'' = YX_{2}, C'' = YX_{3}, where X_{1} = A' + A'', X_{2} = B' + B'' and X_{3} = C' + C''. We can also write A = A' + A'', B = B' + B'' and C = C' + C'', and the image and explanations remain equally accurate. It is similar with the mentioned bijection.
With n = 1, 2, 3, ... digits {0, 1}, binary notation can express m = 2^{n} different numbers. Then m is the number of "interpretations", and the "information" they carry is log_{2} m = n bits. Also, m is the volume of "knowledge" with n (encyclopedic, or database) "terms". By changing the base of the logarithm, η = log_{b} m, and generalizing further by bijection, we arrive at the functional dependencies η = f(m) that we are talking about here.
Applied to sets, we need a measure of the "number of elements" p with at least two of the three properties of the probability axioms:
 Nonnegativity: (∀ω ⊆ Ω) 0 ≤ p(ω) ≤ p(Ω);
 Additivity: for ω_{n} ∈ Ω, where n = 1, 2, ..., and mutually disjoint sets (ω_{i} ⋅ ω_{j} = ∅ if i ≠ j) is valid \[ p\left(\sum_{n=1}^\infty \omega_n\right) = \sum_{n=1}^\infty p(\omega_n). \]
Without the third axiom of probability theory, normedness p(Ω) = 1, equations (1), (2) and (3) of the mentioned answer (Datum) still hold. The measure p then becomes n, or m.
For example, in the above image, let 40 and 20 be the sizes of the sets A' and A'', let 50 and 30 be the sizes of B' and B'', and let 60 and 40 measure C' and C''. Then the previous formula (1) gives, for example:
p(A ∪ Y) = p(A) + p(Y) − p(A ∩ Y),
130 = 60 + 90 − 20,
which is correct. Also, the third (3) formula gives the exact result:
p(Y) = p(YA) + p(YB) + p(YC),
90 = 20 + 30 + 40,
as well as the fourth (4) formula:
\[ P(A\mid Y) = \frac{p(AY)}{p(Y)} = \frac{p(YA)}{p(YA) + p(YB) + p(YC)}, \] \[ \frac{20}{90} = \frac{20}{20 + 30 + 40}, \]which is again correct. All in all, it is possible to view the facts in the stories formally as "information" in "knowledge", that is, to reduce them to counting digits in the quantity of numbers that can be expressed by them.
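The worked example can be checked numerically with the sizes given above:

```python
# The worked example, checked with the given sizes:
# A' = 40, A'' = 20, B' = 50, B'' = 30, C' = 60, C'' = 40,
# where A = A' + A'' and Y = A'' + B'' + C''.
A1, A2 = 40, 20
B1, B2 = 50, 30
C1, C2 = 60, 40

A = A1 + A2                 # p(A) = 60
Y = A2 + B2 + C2            # p(Y) = 90

p_union = A + Y - A2        # p(A u Y) = p(A) + p(Y) - p(A n Y)
p_cond = A2 / Y             # P(A | Y) = 20/90

print(p_union, round(p_cond, 4))    # -> 130 0.2222
```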
For example, suppose the symptom Y appears with the disease X_{k}, for k = 1, 2, ..., N, with known probabilities P(X_{k}) and P(Y|X_{k}). When Y occurs, the probability of the ith disease is P(X_{i}|Y), by Bayes' formula. Computer diagnostics have been doing this for a long time; here we are just expanding the idea.
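A hedged sketch of such Bayesian diagnostics; the priors and likelihoods below are hypothetical numbers, not from any medical source:

```python
# Bayes' formula for diagnosis: posteriors P(X_k|Y) from priors
# P(X_k) and symptom likelihoods P(Y|X_k). All numbers hypothetical.
priors = [0.70, 0.20, 0.10]          # P(X_k) for three disjoint diseases
likelihoods = [0.10, 0.50, 0.90]     # P(Y|X_k), chance of symptom Y

evidence = sum(p * l for p, l in zip(priors, likelihoods))      # P(Y)
posteriors = [p * l / evidence for p, l in zip(priors, likelihoods)]

print([round(x, 3) for x in posteriors])   # P(X_k|Y) for each disease
```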
Effect
Question: Are information and knowledge related like digits and numbers?
Answer: Yes, formally, you can say that. From (Datum):
b_{1}b_{2}...b_{N} → M = b^{N},
where b = 2, 3, 10, ... is the base of the number system, the unit of information, and the base of the Hartley logarithm, log_{b} M = N. That N is the information, and M is then the knowledge; we can say the power, as in the well-known (Serbian) school song by uncle Jova Zmaj: "Knowledge is power, knowledge is might, so learn, children, day and night".
In that poem, the power of knowledge is reflected in the ability of a learned man to rule nature, other people, and himself. However, the power of options lies in the surprise of the Romans when Hannibal attacked them by crossing the Alps with elephants (Hannibal ante portas), in the speed of Napoleon's military actions, thirdly in brutality, or in general in the unpredictability that breaks the concentration of the opponent's strength, stretches it and weakens particular directions. The best I-league strategy (Reciprocity) is the peak of such a "quantity of possibilities".
Like the number of digits N, the frequency of light f = 1/τ is the number of oscillations per unit of time; it defines the energy of light E = hf, where h ≈ 6.626 × 10^{−34} J⋅Hz^{−1} is Planck's constant, and then the momentum of light (photon) p = h/λ, where λ is the wavelength of the particle-wave of light. As we know, the speed of light is c = λf ≈ 300,000 km/s. All specific values here are approximate.
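As a numeric illustration of E = hf and p = h/λ, for green light of wavelength 500 nm (a value I chose, not from the text):

```python
# Photon energy and momentum from the formulas above, for green
# light of wavelength 500 nm.
h = 6.626e-34            # Planck's constant, J/Hz (approximate)
c = 3.0e8                # speed of light, m/s (approximate)

lam = 500e-9             # wavelength, m
f = c / lam              # frequency f = c / lambda
E = h * f                # photon energy E = hf
p = h / lam              # photon momentum p = h / lambda

print(f"f = {f:.2e} Hz, E = {E:.2e} J, p = {p:.2e} kg m/s")
```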
In general, matter waves have a wavelength λ = h/p, in the previous notation, except that they can have mass (m) and then total energy
\[ E = \frac{mc^2}{\sqrt{1 - \frac{v^2}{c^2}}}, \]where v is the speed of movement of the body. Their momentum is p = mv. The change in energy is the work done by the force along the path, ΔE = F⋅Δx, and the force is equal to the change in momentum over time, F = Δp/Δt. From these well-known formulas of physics, we see forms of the connection of uncertainty, energy, and force, for example.
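The formulas above can be evaluated numerically; this sketch uses standard SI constants, and the sample wavelength and mass in the comments are illustrative choices of mine:

```python
import math

# Numeric sketch of the photon and matter-wave formulas above.
h = 6.62607015e-34      # Planck's constant, J*s (i.e., J/Hz)
c = 299_792_458.0       # speed of light, m/s

def photon_energy(wavelength):
    """E = h*f with f = c/lambda; the photon momentum is p = h/lambda."""
    return h * (c / wavelength)

def total_energy(m, v):
    """Relativistic total energy E = m c^2 / sqrt(1 - v^2/c^2)."""
    return m * c**2 / math.sqrt(1 - (v / c) ** 2)

E_green = photon_energy(530e-9)          # green light, ~530 nm (example)
E_rest = total_energy(9.109e-31, 0.0)    # electron at rest: E = m c^2
E_fast = total_energy(9.109e-31, 0.6 * c)  # same electron at 0.6 c
print(E_green, E_rest, E_fast)
```

At v = 0 the root is 1 and the formula reduces to the rest energy mc²; as v grows toward c the denominator shrinks and the total energy climbs without bound.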
That this is not a random coincidence is provable (Force of probability). When we understand more frequent outcomes as driven by some repulsive force of uncertainty, or an attractive force of probability, then there is a way to view the abstract space of random variables like Kepler's celestial mechanics. If we imagine a series of random experiments that we keep repeating, adding their realizations separately into the components of a vector, then such vectors grow longer and longer (without changing the number of components). The vector approaches its expected components, the outcome probabilities multiplied by the number of trials, and the angle between the growing vector and the expected one decreases.
For a constant number of repetitions between two such "random vectors" in the sequence, the angle between them decreases while their lengths increase. By the laws of large numbers, this becomes, statistically, a "motion" along a hyperbola, which is a conic (alongside the ellipse and the parabola), equivalent to motion under the action of a constant central physical force (attractive gravitational, repulsive electromagnetic), when the segment from the force to the charge sweeps equal areas in equal intervals.
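The shrinking angle described above can be watched in a toy simulation; the die, the trial counts, and the seed are my illustrative choices, not the author's:

```python
import math
import random

# Accumulate outcome counts of a fair die into a 6-component vector and
# watch the angle to the expected direction (uniform) shrink with trials.
random.seed(1)

def angle(u, v):
    """Angle between two vectors, clamped against floating-point drift."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))

counts = [0] * 6
angles = []
for trials in (100, 10_000):
    while sum(counts) < trials:
        counts[random.randrange(6)] += 1
    expected = [trials / 6] * 6          # expected components: n * p_k
    angles.append(angle(counts, expected))

print(angles)  # with more trials, the angle to the expected vector shrinks
```

The vector of counts grows in length with every trial, while its direction settles toward the vector of expected values, which is the geometric picture of the law of large numbers used in the text.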
Life
Question: What is life?
Answer: You must have read my popular texts, and the answer seemed possible and easy to you (Information Stories, 1.5 Life). Although we are closer to it than ever, I regret to say that the topic is still mysterious.
Where one might get ahead of the rest in that matter is with the scalar product of vectors. It could prove as amazing as the "harmless" measurement of the velocity of the Earth relative to the ether, which led to the famous formula E = mc² and, eventually, to the atomic bombs dropped in 1945.
Unexpected breakthroughs in science were the discovery of electricity, teamwork, the genome, the Internet, artificial intelligence. Originality goes with surprises, and their consequences are also more or less unpredictable. Once upon a time, the famous Swiss-Russian mathematician Euler wrote his then-mocked formula e^{iπ} + 1 = 0, yet over time the discovery of imaginary numbers (i² = −1) became irreplaceable in mathematics, the sciences, technology, air traffic. This is perhaps also the case with the "scalar product", which was known long before "information theory".
The basic idea is that such a "sum of products" can be a measure of vitality in "information perceptions": that the sum Q = a_{1}b_{1} + a_{2}b_{2} + ... + a_{n}b_{n}, with terms q_{k} = a_{k}b_{k}, can represent separable degrees of freedom, intelligence (a_{k}) and hierarchy (b_{k}) for example. I will not repeat that exposition now, but move on to the later steps. The greater the vitality of the coupling Q, the smaller the angle between the multiplied vectors \( \vec{a} = (a_1, a_2, ..., a_n) \) and \( \vec{b} = (b_1, b_2, ..., b_n) \), and the better they (say, subject and object) are mutually adapted.
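As a minimal sketch of this "sum of products" and the angle it implies, with sample vectors that are mine, not the author's:

```python
import math

# The coupling Q as a scalar product, and the angle between the vectors:
# for fixed lengths, a larger Q means a smaller angle (better adaptation).
def coupling(a, b):
    """Q = a1*b1 + ... + an*bn, the 'information of perception'."""
    return sum(x * y for x, y in zip(a, b))

def angle(a, b):
    """Angle between the vectors, clamped against floating-point drift."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, coupling(a, b) / (na * nb))))

a = (3.0, 1.0, 2.0)          # "intelligence" components (hypothetical)
aligned = (3.0, 1.0, 2.0)    # a well-adapted "hierarchy"
skewed = (1.0, 3.0, 0.5)     # a poorly adapted one
print(coupling(a, aligned), coupling(a, skewed))
print(angle(a, aligned) < angle(a, skewed))
```

The aligned pair yields the larger Q and the smaller angle, which is exactly the adaptation claim in the paragraph above.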
Then the communication between the first and the second becomes maximally vital, considering their capacities, so that, as in competitions of equally "evil" (first league) players, it quickly turns into "cooperation of the strong" (Stalemate). The association of the like-signed, of the same name, leads to synergy and emergence, which increases the group's freedoms at the expense of individual freedoms. We encounter this phenomenon when we hand over some personal rights to society for the sake of general safety or efficiency, usually after each adoption of new regulations. At the same time, the loss of personal freedom is easier for us due to the principle of information minimalism.
Increasing the freedoms of the group increases the vitality, the information of perception (Defiance II), and with it the ability to give it up as needed and with a purpose, creating efficiency. Aren't these descriptions of the creation of cell membranes and the growth of small life forms into larger organisms?
Double Trouble
Question: Must an object have subjects and vice versa?
Answer: The question can be reduced to the priority of coupling, information of perception relative to individual states of the "interlocutor" (subject and object), or to the state of a solitary source of information.
Regarding the first, for example, smaller TV programs have fewer viewers, who also watch them less often than viewers of larger TV channels do. Likewise, smaller digital platforms have fewer subscribers, who also tune in less often than subscribers of larger digital platforms. These are the "double jeopardy" patterns discovered by the statistician Andrew Ehrenberg (1926–2010).
He called the difference between the lowest and highest average number of seconds of attention given to a platform "attention elasticity." Platforms of lower attention have smaller elastic limits of attention, and he says that the efficiency of the platform determines this range. Platforms of lower attention thus have more superficial passers-by, passive and inactive customers, and with a wider gap between attention and visibility, the potential customer is left more interested in the competition.
This is true in the marketing of trade, politics, impression and observation in general, wherever it is about the perception or communication of the subject with the environment. This "double jeopardy" theory of the German-born statistician is clearly consistent with the explanations of "information perception" and synergy. Some of the notes above (Life) are now easier to understand.
Note that the summand of the "sum of products", one component of "freedom", q_{j} = a_{j}b_{j}, can be interpreted as the product of the corresponding "intelligence" (a_{j}) and "hierarchy" (b_{j}), so that intelligence goes directly with freedom, the amount of possibilities, and inversely with limitations (a_{j} = q_{j} / b_{j}). Therefore, restrictions or rules of conduct contribute to freedom as a multiplier of abilities.
In other words, an absolutely unlimited will, which would meet no resistance anywhere, would not be absolutely free; on the contrary, it would remain without freedom at all. It would be a kind of senseless body, like a person who neither sees nor hears nor feels anything, trapped in its own "universe" without communication with anything. In this way, we advance in understanding "life" as a system with an excess of information, but also in understanding a world without communication.
The interaction of the subject and the object in the information of perception is what produces experience and gives reality to the world. If A communicates with B, or with C, or with D, and so on, then they are all mutually real, while some possible object X would be pseudo-real to them if it does not really communicate with any of them. This means that the Moon exists even while we are not looking at it, because there are always enough of its "interlocutors" that are real to us.
This clarifies the second of the options mentioned at the beginning of this answer. If the subject has no objects of perception, or is an "object" without at least one observer, it is at best pseudo-real. To be real, it must "live" the information of perception.
Multi Ethnicity
Question: Does "perception information" have a stance on multiculturalism?
Answer: No, if I understood the question correctly. However, I can try to say something from the domain of the abstract, from simulation, but close to the "target", going by my personal judgement.
I hope the "sum of products" is easy enough to understand, so I can support the story with a few formulas.
Two vectors a = (a_{1}, a_{2}, ..., a_{n}) and b = (b_{1}, b_{2}, ..., b_{n}) are states, abilities, and possibilities of perception of, say, subject and object; their communication, or information of perception, is
Q = a_{1}b_{1} + a_{2}b_{2} + ... + a_{n}b_{n}.
The higher the number Q, the greater the coupling's power to absorb diversity, the greater the "quantity of options" at its disposal, its vitality. This formula is my reference in simulations. The game is shown to reach a stronger flow, a higher level, a greater mastery when it achieves a higher Q. I have described such contests enough.
Seeing this coupling as turned inward, as the tension of internal parts of the same system, it is easy to carry over the previous results, though not as easy to assert their validity there. Let's say every larger system (economy, society) has constructive, positive and destructive, negative elements, but I have not investigated when they group into teams against teams, splitting the community, and when synergy overcomes stratification. The dilemma is whether and when "negative" elements are really harmful to the community.
For example, the frequent practice of culling male cattle to improve a farm's economic performance does not bear out expectations in recent studies (Productive performance). The progress of the species is aided by the presence of the apparently redundant and "useless" male sex, as I have often mentioned. A peaceful or established society will find many forms of "aggressive" or non-team behavior of individuals unacceptable, yet such behavior can be very useful in "interesting" times. Many "disturbed" characters made the greatest contributions to the development of civilizations, or those misfits became very successful military leaders.
For example, one self-destructive behavior, shared and redirected even toward constructive ones, can give an overall positive effect and greater system vitality: 4 = 1⋅4 = (4 − 3)⋅(3 + 1) → 4⋅3 − 3⋅1 = 9. A situation like a sports coach who made his aggressiveness useful with the students he trains, or a bad student, once the shame of the community, becoming with his originality the pride of the educational institution and often of science in general.
The situation 2 = (−1)⋅(−1) + (−1)⋅(−1) → (−1 − 1)⋅(−1 − 1) = 4 is the increase in the vitality of society when two villains join their repressed drives in useful work, instead of, e.g., being separated into prison cells. Similar to the previous one is the account 2 = 1⋅1 + 1⋅1 → (1 + 1)⋅(1 + 1) = 4, which speaks of the advantages of association, but of positive with positive measures. Analogies of internal tensions with competition are possible due to the very nature of the formulas and, I repeat, lie in unexplored territory.
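These little redistribution accounts can be verified directly, treating each case as a scalar product of component vectors before and after regrouping (the helper `q` is mine):

```python
# Check the "vitality" redistributions above as scalar products.
def q(a, b):
    """Sum of products a1*b1 + ... + an*bn."""
    return sum(x * y for x, y in zip(a, b))

# One destructive trait, 1*4 = 4, split and redirected: (4, -3)·(3, 1) = 9.
before = q([1], [4])
after = q([4, -3], [3, 1])
print(before, after)

# Two negatives joined: (-1,-1)·(-1,-1) = 2 becomes (-2)·(-2) = 4.
print(q([-1, -1], [-1, -1]), q([-2], [-2]))

# Two positives joined: (1,1)·(1,1) = 2 becomes (2)·(2) = 4.
print(q([1, 1], [1, 1]), q([2], [2]))
```

In each case the regrouped product exceeds the original sum, which is the arithmetic content of the "association increases vitality" claim.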
In short, I believe that diversity, as long as it does not tear society apart, is as desirable for the survival of vitality as the male individuals of a species, or as a riskier game that sacrifices to win (a gambit in chess, an investment in the economy) is stronger than combining only good with good. Another level would be the advantages or disadvantages of genetic mixing of species.
