Noumenology I


Any attempt to discern a fundamental description of the world we inhabit from its phenomenology is greatly complicated by the fact that we indeed inhabit it.

On one hand, there is the obvious difficulty of observation. At the end of the day, an act of observation is just a subset of the universe interacting with other such subsets. Given that there is only so much room for manoeuvre allowed by the dynamical constraints we are subject to, it may well be the case that there exists a fundamental limit to how deeply we can probe the universe.

On the other hand, there is the issue of rebuilding our everyday experience bottom-up from the fundamental laws that we have identified. This may not be as straightforward as invoking the law of large numbers and all the nice things that happen in the thermodynamic limit, because certain aspects of our experiences, both everyday and otherwise, may have to do with the atypicality of our vantage points rather than some deep fact about Nature. This is what cosmologists and string theorists refer to as the anthropic principle: the conditions of this world are amenable to life because if they weren’t, we wouldn’t be there to observe anything. Divorced from any further context, this does come off as circular, but seen in the light of the claim that there is an entire ensemble of worlds out there, it turns out to be merely selection bias in action.

(Of course, depending on what “world” stands in for, this claim runs the gamut from perfectly innocuous facts such as the existence of other planets to approximately a million times more contentious proposals such as the Multiverse, which incidentally is contentious for about the same reason you might ask, “Why not a billion times more?”)

The first difficulty, that there is a discrepancy between what exists and what we can observe in principle, is pretty much the essence of the second law of thermodynamics. After all, if we had access to information about the velocities of every molecule in some gas kept in a bicameral container, we could exploit that to arrange for the faster molecules to end up on one side and the slower molecules on the other using a valve that we can open or close at will. In other words, we could set up a temperature differential without performing any work, in flagrant violation of the Clausius statement of the second law. And if we tried to set up an actual physical mechanism to acquire this information as Landauer did in his refinement of Maxwell’s original thought experiment, we could see that at the end of the cycle, when everything is returned to its initial state and the information acquired is erased, energy would have to be transferred to the environment as heat, thereby increasing the net entropy of the universe. So, save for a constant of proportionality, entropy in the sense of Clausius (energy dissipated via heat at unit temperature) really is entropy in the sense of Shannon (number of bits required to encode the state of the system). The bridge between these two very different conceptions of entropy was of course Boltzmann.
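For a sense of scale: the constant of proportionality above is k_B \ln 2 per bit. Landauer's bound says that erasing one bit at temperature T dissipates at least k_B T \ln 2 of heat. Here is a minimal Python sketch of that arithmetic. The function name `landauer_heat` is purely illustrative.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact by definition in the 2019 SI)


def landauer_heat(temperature_kelvin, bits=1):
    """Minimum heat in joules dissipated when erasing `bits` bits at the given temperature."""
    return bits * K_B * temperature_kelvin * math.log(2)


# Erasing a single bit at room temperature (300 K).
print(f"{landauer_heat(300.0):.3e} J")  # 2.871e-21 J
```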

Thermodynamics, as Boltzmann saw it, was rooted in the specificity of the human experience. If we take it for granted that shattered teacups don’t spontaneously reassemble into intact ones because something something information, it is due to human bias privileging the intact teacup over any other specific arrangement of shards. So, if we truly wished to unmask the underlying principles behind thermodynamic truths, Boltzmann argued, we would need to adopt a bottom-up microscopic view of the world in which every configuration would be on the same footing as any other. Boltzmann was far ahead of his time and, in the posthumous immortality his ideas brought him, laid the foundation not only of an entire branch of physics but also of its deep connections with disciplines that did not yet exist. But having given the man his due, I still have to say that in excising humans from thermodynamics he threw out the baby with the bathwater, leaving thermodynamics bereft of any role for an observer whatsoever. Consequently, the second difficulty that I mentioned earlier remains unaddressed, and attempts to understand why the second law is true remain woefully incomplete.

This post is my attempt to dig up the baby and bring it back to life.


Before there can be life, there must be a body, and before there can be a body, there must be a universe. I stipulate that my universe consist of two components, A and B. I will use the same notation to denote the respective (finite) sets of states available to these two components, which means that the set of states available to the composite is A\times B. In order for this universe to be a suitable laboratory for understanding how the second law can emerge from reversible deterministic dynamics, we need to equip it with a time evolution operator \tau that permutes the elements of A\times B. If the time evolution operator \tau cannot be expressed as the product of permutations on the individual sets, A and B, the system will be said to be coupled. Since I will eventually be promoting one of these two components to the role of an observer, it is this case that I’m interested in.
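To make the setup concrete, here is a minimal Python sketch. It assumes A and B are lists of integers and \tau is a dictionary mapping pairs (a, b) to pairs. The helper `is_coupled` (a hypothetical name) tests the definition above. \tau fails to factorise exactly when its A-component depends on b or its B-component depends on a.

```python
def is_permutation(tau):
    """tau is a dict from states (a, b) to states; check that it is a bijection."""
    return sorted(tau.values()) == sorted(tau)


def is_coupled(A, B, tau):
    """True iff tau is not a product (a, b) -> (sigma(a), rho(b)).

    tau factorises exactly when its A-component ignores b and its B-component
    ignores a; bijectivity of tau then forces sigma and rho to be permutations."""
    a_part_ignores_b = all(len({tau[(a, b)][0] for b in B}) == 1 for a in A)
    b_part_ignores_a = all(len({tau[(a, b)][1] for a in A}) == 1 for b in B)
    return not (a_part_ignores_b and b_part_ignores_a)


A, B = list(range(3)), list(range(2))
# A cyclic shift on A alongside a flip on B: a product of permutations.
product = {(a, b): ((a + 1) % 3, 1 - b) for a in A for b in B}
# B's update reads the parity of a, so B "feels" A: a coupled system.
shear = {(a, b): (a, (a + b) % 2) for a in A for b in B}

assert is_permutation(product) and is_permutation(shear)
print(is_coupled(A, B, product), is_coupled(A, B, shear))  # False True
```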

The map \tau induces natural equivalence relations on the sets A and B. Namely, two states a and a' in A are declared equivalent iff the (not necessarily bijective) maps \pi_B\circ\tau(a,\cdot) and \pi_B\circ\tau(a',\cdot) induced on B via the projection \pi_B:A\times B\rightarrow B are equal. Likewise, two states b and b' in B are declared equivalent iff the maps \pi_A\circ\tau(\cdot,b) and \pi_A\circ\tau(\cdot,b') induced on A via the projection \pi_A:A\times B\rightarrow A are equal.
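Under the same assumed dictionary representation, the equivalence classes can be computed by keying each state of A on its induced map, written as the tuple of values it takes on B. The same is done for B. The example \tau below leaves A untouched and flips B when a is odd, so B can only ever "see" the parity of a. The function names are illustrative.

```python
from collections import defaultdict


def classes_A(A, B, tau):
    """Partition A by the induced map b -> pi_B(tau(a, b)), keyed by its tuple of values."""
    classes = defaultdict(list)
    for a in A:
        classes[tuple(tau[(a, b)][1] for b in B)].append(a)
    return dict(classes)


def classes_B(A, B, tau):
    """Partition B by the induced map a -> pi_A(tau(a, b)), keyed by its tuple of values."""
    classes = defaultdict(list)
    for b in B:
        classes[tuple(tau[(a, b)][0] for a in A)].append(b)
    return dict(classes)


A, B = list(range(4)), list(range(2))
tau = {(a, b): (a, (a + b) % 2) for a in A for b in B}
print(classes_A(A, B, tau))  # {(0, 1): [0, 2], (1, 0): [1, 3]}
print(classes_B(A, B, tau))  # {(0, 1, 2, 3): [0, 1]}
```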

Now, some care needs to be taken so that the psychological arrow of time we are endowed with as outsiders doesn’t surreptitiously wend its way into our toy universe, at least not yet. As far as the system at hand is concerned, there is nothing intrinsic privileging forward time evolution \tau over backward time evolution \tau^{-1}. Hence, in order to be more egalitarian in how we go about things, I introduce an entire family of equivalence relations, \sim^A_{j} and \sim^B_{j}, indexed by integers j and defined so that a\sim^A_{j}a' iff \pi_B\circ\tau^j(a,\cdot)= \pi_B\circ\tau^j(a',\cdot) and b \sim^B_{j}b' iff \pi_A\circ\tau^j(\cdot,b)= \pi_A\circ\tau^j(\cdot,b').
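Because \tau is a finite permutation, \tau^j is defined for every integer j, with negative powers obtained by inverting the dictionary. Below is a sketch of the \sim^A_j and \sim^B_j partitions under the same assumed representation. It uses the hypothetical example \tau(a,b)=(b,a+b \bmod 2) on two two-state components.

```python
from collections import defaultdict


def tau_power(tau, j):
    """tau^j for any integer j; negative powers iterate the inverse permutation."""
    step = tau if j >= 0 else {image: state for state, image in tau.items()}
    result = {state: state for state in tau}  # tau^0 is the identity
    for _ in range(abs(j)):
        result = {state: step[result[state]] for state in result}
    return result


def classes_A(A, B, tau, j):
    """The ~^A_j partition: a ~ a' iff pi_B o tau^j(a, .) == pi_B o tau^j(a', .)."""
    tj = tau_power(tau, j)
    classes = defaultdict(list)
    for a in A:
        classes[tuple(tj[(a, b)][1] for b in B)].append(a)
    return dict(classes)


def classes_B(A, B, tau, j):
    """The ~^B_j partition: b ~ b' iff pi_A o tau^j(., b) == pi_A o tau^j(., b')."""
    tj = tau_power(tau, j)
    classes = defaultdict(list)
    for b in B:
        classes[tuple(tj[(a, b)][0] for a in A)].append(b)
    return dict(classes)


A, B = [0, 1], [0, 1]
tau = {(a, b): (b, (a + b) % 2) for a in A for b in B}
for j in (-1, 0, 1):
    print(j, classes_A(A, B, tau, j))
# -1 {(0, 0): [0], (1, 1): [1]}
# 0 {(0, 1): [0, 1]}
# 1 {(0, 1): [0], (1, 0): [1]}
```

At j=0 every state of A induces the identity on B, so the partition is trivial. For j=\pm 1 the classes carry different labels, which is why forward and backward evolution are tracked separately.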

The equivalence classes to which the states a\in A and b\in B belong under the respective equivalence relations \sim^A_{j} and \sim^B_{j} shall be denoted [a]_j and [b]_j. (I have refrained from including a superscript indicating whether it is A or B that the class is contained within because it is anyway evident from the representative elements a and b.) In order to further cement my commitment to the three R’s of environmentalism, I will again be recycling notation, so that [a]_j and [b]_j also denote the maps \pi_B\circ\tau^j(a,\cdot) and \pi_A\circ\tau^j(\cdot,b) respectively. This makes sense since there is a one-to-one correspondence between all possible maps on B and A on one hand and the (possibly empty) equivalence classes that we partition A and B into using the relations \sim^A_{j} and \sim^B_{j} on the other. In particular, this means that it is perfectly sensible to talk of things like [a]_j\left([b]_k\right), which denotes the image of the set [b]_k\subseteq B under the map \pi_B\circ\tau^j(a,\cdot):B\rightarrow B corresponding to the class [a]_j.
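Under the same assumed representation, the recycled notation translates directly. [a]_j is a dictionary on B, [b]_k is a subset of B, and [a]_j([b]_k) is the image of one under the other. Here is a sketch with hypothetical helper names, for j=k=1. The example is \tau(a,b)=((a+b) \bmod 2,(b+2a) \bmod 4) with |A|=2 and |B|=4.

```python
def tau_power(tau, j):
    """tau^j for any integer j; negative powers iterate the inverse permutation."""
    step = tau if j >= 0 else {image: state for state, image in tau.items()}
    result = {state: state for state in tau}
    for _ in range(abs(j)):
        result = {state: step[result[state]] for state in result}
    return result


def map_of_a(B, tau, j, a):
    """[a]_j read as the map b -> pi_B(tau^j(a, b)), stored as a dict on B."""
    tj = tau_power(tau, j)
    return {b: tj[(a, b)][1] for b in B}


def class_of_b(A, B, tau, k, b):
    """[b]_k read as a subset of B: all b' inducing the same map on A as b."""
    tk = tau_power(tau, k)

    def signature(x):
        return tuple(tk[(a, x)][0] for a in A)

    return {bp for bp in B if signature(bp) == signature(b)}


A, B = [0, 1], [0, 1, 2, 3]
tau = {(a, b): ((a + b) % 2, (b + 2 * a) % 4) for a in A for b in B}
assert sorted(tau.values()) == sorted(tau)  # tau is a permutation of A x B

subset = class_of_b(A, B, tau, 1, 0)   # [0]_1: even b act identically on A
f = map_of_a(B, tau, 1, 1)             # [1]_1: shifts b by 2 (mod 4)
print(sorted(subset), sorted(f[b] for b in subset))  # [0, 2] [0, 2]
```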

Given an assignment of the a\in A and b\in B to the \sim^A_{j} and \sim^B_{j} equivalence classes (prelabelled by maps on B and A), we can unambiguously reconstruct the map \tau^j. However, an arbitrary assignment would not in general yield a bijective \tau^j on A\times B. The requirement that it does therefore places certain constraints on the assignment. (There are also constraints arising from the fact that \tau^j has to be the j-th power of some map, a property that an arbitrary map doesn’t necessarily have for j\neq 1, but I won’t be considering those here.)
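Since \tau^j(a,b)=([b]_j(a),[a]_j(b)), the reconstruction is a one-liner once the labels are known. The sketch below uses hypothetical helper names and stores labels as tuples indexed by the other component, which assumes A and B are 0, 1, 2, .... It rebuilds the example \tau(a,b)=(b,a+b \bmod 2) from its labels and shows that an arbitrary assignment can fail to give a bijection.

```python
def reconstruct(A, B, label_A, label_B):
    """Rebuild tau^j from class labels via tau^j(a, b) = ([b]_j(a), [a]_j(b)).

    label_A[a] is [a]_j as a tuple over B; label_B[b] is [b]_j as a tuple over A."""
    return {(a, b): (label_B[b][a], label_A[a][b]) for a in A for b in B}


def is_bijective(tau):
    return len(set(tau.values())) == len(tau)


A, B = [0, 1], [0, 1]
# Labels read off from tau(a, b) = (b, (a + b) % 2).
good = reconstruct(A, B, {0: (0, 1), 1: (1, 0)}, {0: (0, 0), 1: (1, 1)})
print(good == {(a, b): (b, (a + b) % 2) for a in A for b in B}, is_bijective(good))  # True True

# An arbitrary assignment: every a induces the identity on B, every b collapses A to 0.
bad = reconstruct(A, B, {0: (0, 1), 1: (0, 1)}, {0: (0, 0), 1: (0, 0)})
print(is_bijective(bad))  # False
```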

The bijectivity of \tau^j boils down to this: if [a]_j(b) = [a]_j(b') for some a\in A and distinct b,b'\in B, then [b]_j(a)\neq [b']_j(a). Likewise, if [b]_j(a) = [b]_j(a') for some distinct a,a'\in A and b\in B, then [a]_j(b)\neq [a']_j(b). This certainly suffices; however, since I am distrustful of anything that Wittgenstein wouldn’t regard as tautological, I would rather have things formulated as equations than as inequalities. To this end, instead of looking at the \sim^A_{j} and \sim^B_{j} partitions, I will be turning attention to the \sim^A_{j} and \sim^B_{-j} partitions, which satisfy the equations |[a]_{j}^{-1}(b)|=|[b]_{-j}^{-1}(a)|, for all a\in A and all b\in B. Here, [a]_j^{-1}(b) and [b]_{-j}^{-1}(a) denote the preimages of b and a under the maps [a]_j and [b]_{-j} respectively.

Why is this true? The image of the set \{a\} \times [a]_j^{-1}(b) under the map \tau^j projects onto b via \pi_B, which is pretty much the definition of [a]_{j}^{-1}(b). The action of \tau^{-j} on the image set \tau^j\left(\{a\} \times [a]_j^{-1}(b)\right) gives us back the set \{a\} \times [a]_j^{-1}(b) we started with, which, as you can see, projects onto a via \pi_A. Thus, \tau^j\left(\{a\} \times [a]_j^{-1}(b)\right), which has the same size as [a]_{j}^{-1}(b), is contained within the set [b]_{-j}^{-1}(a)\times \{b\}, which has the same size as [b]_{-j}^{-1}(a). In other words, we have |[a]_{j}^{-1}(b)|\le|[b]_{-j}^{-1}(a)|. A symmetric argument tells us that |[a]_{j}^{-1}(b)|\ge|[b]_{-j}^{-1}(a)| as well, from which it follows that |[a]_{j}^{-1}(b)|=|[b]_{-j}^{-1}(a)|.

The above argument actually gives us an explicit bijection between the sets [a]_j^{-1}(b) and [b]_{-j}^{-1}(a) in terms of the map \tau^j. In fact, the converse also holds: given any bijection from [a]_j^{-1}(b) to [b]_{-j}^{-1}(a), we can use it to construct the restriction of \tau^j to \{a\}\times [a]_{j}^{-1}(b). And since all the \{a\}\times [a]_{j}^{-1}(b) form a pairwise disjoint cover of the set A\times B, there are no compatibility issues to contend with: once you have ensured that |[a]_{j}^{-1}(b)|=|[b]_{-j}^{-1}(a)| for all a\in A and all b\in B, you can pick arbitrary bijections between them and get a sensible \tau^j out. So, short of the group-theoretic obstructions I have already mentioned above, the constraints so far are, in a sense, maximal.
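Both claims can be checked numerically. The sketch below assumes a dictionary representation and uses random permutations from Python's `random` module as test data; both choices are arbitrary. It verifies |[a]_j^{-1}(b)|=|[b]_{-j}^{-1}(a)|. It then reassembles a permutation from the two sets of labels, using randomly chosen bijections between the matched preimages. The result need not equal \tau^j, and it need not be a j-th power of anything. It is, however, always a permutation whose \sim^A_1 labels reproduce the [a]_j and whose inverse reproduces the [b]_{-j}.

```python
import itertools
import random


def random_tau(A, B, seed):
    """A random permutation of A x B (illustrative test data only)."""
    states = list(itertools.product(A, B))
    images = states[:]
    random.Random(seed).shuffle(images)
    return dict(zip(states, images))


def tau_power(tau, j):
    """tau^j for any integer j; negative powers iterate the inverse permutation."""
    step = tau if j >= 0 else {image: state for state, image in tau.items()}
    result = {state: state for state in tau}
    for _ in range(abs(j)):
        result = {state: step[result[state]] for state in result}
    return result


def labels(A, B, tau, j):
    """[a]_j as tuples over B, and [b]_{-j} as tuples over A (assumes A = 0..n-1)."""
    fwd, bwd = tau_power(tau, j), tau_power(tau, -j)
    label_a = {a: tuple(fwd[(a, b)][1] for b in B) for a in A}
    label_b = {b: tuple(bwd[(a, b)][0] for a in A) for b in B}
    return label_a, label_b


def rebuild(A, B, label_a, label_b, rng):
    """Glue together arbitrary bijections [a]_j^{-1}(b) -> [b]_{-j}^{-1}(a)."""
    new = {}
    for a in A:
        for b in B:
            source = [bp for bp in B if label_a[a][bp] == b]   # [a]_j^{-1}(b)
            target = [ap for ap in A if label_b[b][ap] == a]   # [b]_{-j}^{-1}(a)
            assert len(source) == len(target)                  # the counting identity
            rng.shuffle(target)                                # pick any bijection
            for bp, ap in zip(source, target):
                new[(a, bp)] = (ap, b)
    return new


A, B = list(range(5)), list(range(3))
rng = random.Random(1)
for seed in range(10):
    for j in (-2, -1, 1, 2):
        label_a, label_b = labels(A, B, random_tau(A, B, seed), j)
        new = rebuild(A, B, label_a, label_b, rng)
        assert len(new) == len(set(new.values())) == len(A) * len(B)  # a permutation
        assert labels(A, B, new, 1) == (label_a, label_b)              # same labels
print("counting identity and reconstruction verified")
```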

With the essence of time-reversible dynamics thus distilled into pithy equations relating the \sim^A_{j} and \sim^B_{-j} partitions, we can do something nice. Suppose I had a function g_j defined on the \sim^A_j equivalence classes and wished to see how the average value of this function over a subset A\times \{b\}\subseteq A\times B, denoted \langle g_j\rangle^b_{0}, changed when the subset in question was acted upon by \tau^{-j}; then I could determine this by summing g_j\left([a]_j\right):=g_j(a) over all states a\in A but with the weights |[b]_{-j}^{-1}(a)| instead of unity:

\displaystyle \langle g_j\rangle^b_{-j}=\frac{1}{|A|} \sum_{a\in A} |[b]_{-j}^{-1}(a)|g_j(a).

Since |[a]_{j}^{-1}(b)|=|[b]_{-j}^{-1}(a)|, the above can be rewritten as

\displaystyle \langle g_j\rangle^b_{-j}=\frac{1}{|A|} \sum_{a\in A} |[a]_{j}^{-1}(b)|g_j(a).

Now, if the \sim_j^A equivalence classes are prelabelled by maps f on B and |f|_j denotes the number of elements a assigned to the \sim_j^A equivalence class labelled f, then this can still be massaged further to yield

\displaystyle \langle g_j\rangle^b_{-j}=\frac{1}{|A|} \sum_{f\in B^B} |f^{-1}(b)||f|_jg_j(f).

In the above, I have taken the liberty to write g_j\left([a]_j\right) as g_j(f) with the subscript j in g_j indicating that f is to be interpreted as the label of a \sim_j^A equivalence class. As a result, \langle g_j\rangle^b_{-j} has been expressed entirely in terms of averaging over all the maps on B without any reference to the partitioning of B whatsoever. This is good news because in order to be making contact with the blueprint laid out in the previous section, I’m going to be thinking of B as an observer (and [a]_1 as the induced macrostates), which means it would be necessary to have everything only in terms of what B has access to.
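Here is a numerical check of the three expressions for \langle g_j\rangle^b_{-j}. It uses exact fractions so that equality is exact. The observable `g` and the random test permutations are arbitrary choices made for illustration. The last form sums only over labels of nonempty classes, since empty classes contribute nothing.

```python
import itertools
import random
from collections import Counter
from fractions import Fraction


def random_tau(A, B, seed):
    """A random permutation of A x B (illustrative test data only)."""
    states = list(itertools.product(A, B))
    images = states[:]
    random.Random(seed).shuffle(images)
    return dict(zip(states, images))


def tau_power(tau, j):
    """tau^j for any integer j; negative powers iterate the inverse permutation."""
    step = tau if j >= 0 else {image: state for state, image in tau.items()}
    result = {state: state for state in tau}
    for _ in range(abs(j)):
        result = {state: step[result[state]] for state in result}
    return result


def g(label):
    """A hypothetical observable on ~^A_j classes; any function of the label works."""
    return sum(label) + 1


def three_averages(A, B, tau, j, b):
    fwd, bwd = tau_power(tau, j), tau_power(tau, -j)
    label = {a: tuple(fwd[(a, bp)][1] for bp in B) for a in A}      # [a]_j
    # 1. Directly: push A x {b} through tau^{-j} and average g over the A-components.
    direct = Fraction(sum(g(label[bwd[(a, b)][0]]) for a in A), len(A))
    # 2. Weight each a by |[a]_j^{-1}(b)| instead of |[b]_{-j}^{-1}(a)|.
    weighted = Fraction(sum(label[a].count(b) * g(label[a]) for a in A), len(A))
    # 3. Group by label f, with |f|_j the number of a carrying that label.
    size = Counter(label.values())
    grouped = Fraction(sum(f.count(b) * n * g(f) for f, n in size.items()), len(A))
    return direct, weighted, grouped


A, B = list(range(6)), list(range(3))
ok = all(len(set(three_averages(A, B, random_tau(A, B, seed), j, b))) == 1
         for seed in range(10) for j in (-1, 1, 2) for b in B)
print(ok)  # True
```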

But before I can do that, I need to imbue B with that attribute of qualia that is the key to understanding irreversibility: memory, or as my friend Ronak prefers to call it, “Funesness.”


All creation is subtraction. Michelangelo chiselled away the marble to set his angels free, Shakespeare pruned the ramblings of monkeys raging away at typewriters to impart breath to Hamlet, Borges razed down aisles of the Library of Babel to whelp his cornucopia of immediate irrealities, and now it is our turn.

Funesness is as much an attribute of entire dynamical systems as artistic or literary merit is of blocks of marble or folios of gibberish. Rather, to speak of Funesness is to speak of individual states which fulfil certain conditions abstracting the essential features of human memory from messy complications like its ability to retrospectively fabricate experiences on cue. One man’s meat is another’s bare bones, so there is little I can offer by way of an a priori argument for why I think it is the following two features that we must strive to capture and make precise: (a) a Funes state of an observer exhibits a certain degree of (anti-)concurrence with the immediately preceding input, and (b) the concurrence is higher when the input in question has a lower probability of occurrence. But if I had to try, I would say that (a) has to do with the fact that we remember the past and not the future, while (b) has to do with the fact that we recall bizarre and surprising events with greater ease than run-of-the-mill ones.

Condition (a) may be implemented by requiring that, for at least some f, the marginal probability that the immediately preceding input is f, denoted p(f), differs from the conditional probability that the immediately preceding input is f given that the current state of the observer is b, denoted p(f|b). Meanwhile, condition (b) may be implemented by requiring that p(f|b)/p(f) \leq p(f'|b)/p(f') whenever p(f) \geq p(f'). Note that p(f|b)/p(f) may be regarded as a measure of how dependent the two events f and b are on each other.

As my notational choices may suggest, I am interpreting the \sim_1^A equivalence classes as the inputs and B as the observer. This makes sense because as long as it is only the immediately preceding state of A that is concerned, all B is sensitive to is the \sim_1^A equivalence class that the state is in. The sample space implicit in the assignment of probabilities is the set of consecutive pairs of states, which is a subset of (A\times B)^2, with every such pair taken to be equally likely. So, p(f) is |f|_1/|A|, p(b) is 1/|B|, and as we found out in the previous section, p(f|b) is |f^{-1}(b)||f|_1/|A|, so that p(f|b)/p(f)=|f^{-1}(b)|. Condition (a) becomes the requirement that |f^{-1}(b)| differs from unity for at least some f, while condition (b) becomes the requirement that |f^{-1}(b)|\leq |f'^{-1}(b)| whenever |f|_1 \geq |f'|_1.
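To see that Funes states can actually occur, here is a sketch on a hand-built example chosen purely for illustration. It has |A|=12 and |B|=3, with four \sim^A_1 classes labelled by the maps (0,0,0), (1,1,1), (0,2,2) and (2,0,2), written as tuples of values on B=\{0,1,2\}, of sizes 2, 4, 3 and 3. The hypothetical helper `build_tau` realises such a labelling as a permutation. This works provided exactly |A| pairs (a,b') are sent to each b. `is_funes` reads condition (a) as "|f^{-1}(b)|\neq 1 for at least one nonempty class f".

```python
from collections import Counter


def build_tau(A, B, label):
    """Hypothetical helper: a permutation of A x B whose ~^A_1 label of a is label[a].

    For each target b, the states (a, b') with label[a][b'] == b are sent, in
    order, onto A x {b} (assumes A = 0..n-1); this needs exactly |A| of them."""
    tau = {}
    for b in B:
        sources = [(a, bp) for a in A for bp in B if label[a][bp] == b]
        assert len(sources) == len(A), "labels are incompatible with a permutation"
        for new_a, source in enumerate(sources):
            tau[source] = (new_a, b)
    return tau


def class_sizes(A, B, tau):
    """|f|_1 for every realised label f = pi_B o tau(a, .), as a Counter."""
    return Counter(tuple(tau[(a, bp)][1] for bp in B) for a in A)


def is_funes(sizes, b):
    """Funes test for observer state b, using p(f|b)/p(f) = |f^{-1}(b)|."""
    ratio = {f: f.count(b) for f in sizes}
    cond_a = any(r != 1 for r in ratio.values())
    cond_b = all(ratio[f] <= ratio[h]
                 for f in sizes for h in sizes if sizes[f] >= sizes[h])
    return cond_a and cond_b


A, B = list(range(12)), list(range(3))
label = {a: (0, 0, 0) for a in range(0, 2)}
label.update({a: (1, 1, 1) for a in range(2, 6)})
label.update({a: (0, 2, 2) for a in range(6, 9)})
label.update({a: (2, 0, 2) for a in range(9, 12)})
tau = build_tau(A, B, label)
sizes = class_sizes(A, B, tau)
print([b for b in B if is_funes(sizes, b)])  # [0]
```

Only b=0 qualifies. At b=1 the largest class (1,1,1) has the largest ratio. At b=2 the two mid-sized classes outscore the smallest one.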

To investigate the consequences of these requirements, let’s begin with the observation that

\displaystyle \sum_{f\in B^B} |f^{-1}(b)||f|_1=\sum_{f\in B^B} |f|_1=|A|.

This follows from the fact that both p(f) and p(f|b) yield unity upon being summed over all f\in B^B. A little rearranging yields

\displaystyle \sum_{f\in B^B} \left(|f^{-1}(b)|-1\right)|f|_1=0.

It may well be that all the summands in this sum individually vanish, but that’s the boring case, which we have already taken care to exclude from our definition of Funesness. In general, there will be summands that are negative, summands that vanish, and summands that are positive. Restricting attention to maps f with |f|_1>0 (the others contribute nothing to the sums above), let B^B_- be the set of all such f for which |f^{-1}(b)|<1, let B^B_+ be the set of all such f for which |f^{-1}(b)|>1, and let K, which is strictly positive since the boring case has been excluded, be given by

\displaystyle K=\sum_{f\in B^B_-} \left(1-|f^{-1}(b)|\right)|f|_1=\sum_{f\in B^B_+} \left(|f^{-1}(b)|-1\right)|f|_1.

Furthermore, let’s choose f_-\in B^B_- and f_+\in B^B_+ so that |f_-|_1 is the minimum of |f|_1 over all f\in B^B_- and |f_+|_1 is the maximum of |f|_1 over all f\in B^B_+. Then, since the logarithm is a monotonically increasing function of its (positive) argument, and the weights multiplying \log|f|_1 below are positive and sum to K, it follows that

\displaystyle \sum_{f\in B^B_-} \left(1-|f^{-1}(b)|\right)|f|_1\log|f|_1\geq K\log|f_-|_1,
\displaystyle \sum_{f\in B^B_+} \left(|f^{-1}(b)|-1\right)|f|_1\log|f|_1\leq K\log|f_+|_1.

Now, by definition of a Funes state, |f_-^{-1}(b)|<1<|f_+^{-1}(b)| implies that |f_-|_1> |f_+|_1. Therefore, the two inequalities above may be combined to yield

\displaystyle \sum_{f\in B^B} \left(|f^{-1}(b)|-1\right)|f|_1\log|f|_1\leq K\log\frac{|f_+|_1}{|f_-|_1}<0.

This may be restated as \langle S_1\rangle^b_{0}>\langle S_1\rangle^b_{-1}, where S_1 is the function that sends f\in B^B to \log|f|_1. There is simply not enough structure in our toy universe to admit meaningful discussion of things such as the subjective conscious experience we are accustomed to, but as long as the cognitive architecture enabling it is sculpted out of Funes states, the macrostates that the resulting cognitive model is privy to will be precisely the \sim_1^A equivalence classes. Per Boltzmann’s prescription, the logarithm of the size of a macrostate is the physical entropy associated with a particular (micro)state within the aforementioned macrostate. In other words, given that the current state of the observer is Funes, the entropy of the environment must have increased on average.
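The derivation can be traced on a small hand-built example chosen purely for illustration. It has |A|=12 and |B|=3, with \sim^A_1 classes labelled (0,0,0), (1,1,1), (0,2,2) and (2,0,2) of sizes 2, 4, 3 and 3. In this example only b=0 satisfies both Funes conditions. For each b, the sketch computes the vanishing sum \sum_f(|f^{-1}(b)|-1)|f|_1 and the constant K. It also computes \langle S_1\rangle^b_0 alongside \langle S_1\rangle^b_{-1}, the latter both directly (by pulling A\times\{b\} back through \tau^{-1}) and via the weighted formula. Helper names are hypothetical, and natural logarithms are assumed.

```python
import math
from collections import Counter


def build_tau(A, B, label):
    """Hypothetical helper: realise the ~^A_1 labels label[a] as a permutation of A x B."""
    tau = {}
    for b in B:
        sources = [(a, bp) for a in A for bp in B if label[a][bp] == b]
        assert len(sources) == len(A), "labels are incompatible with a permutation"
        for new_a, source in enumerate(sources):
            tau[source] = (new_a, b)
    return tau


def entropy_report(A, B, tau, b):
    """Vanishing sum, K, <S_1>^b_0 and <S_1>^b_{-1} (directly and by formula)."""
    label = {a: tuple(tau[(a, bp)][1] for bp in B) for a in A}
    size = Counter(label.values())                                   # |f|_1
    vanishing = sum((f.count(b) - 1) * n for f, n in size.items())   # should be 0
    K = sum(n for f, n in size.items() if f.count(b) == 0)           # sum over B^B_-
    S = {a: math.log(size[label[a]]) for a in A}                     # S_1 = log|[a]_1|
    now = sum(S.values()) / len(A)                                   # over A x {b}
    inverse = {image: state for state, image in tau.items()}
    before = sum(S[inverse[(a, b)][0]] for a in A) / len(A)          # over tau^{-1}(A x {b})
    formula = sum(f.count(b) * n * math.log(n) for f, n in size.items()) / len(A)
    return vanishing, K, now, before, formula


A, B = list(range(12)), list(range(3))
label = {a: (0, 0, 0) for a in range(0, 2)}
label.update({a: (1, 1, 1) for a in range(2, 6)})
label.update({a: (0, 2, 2) for a in range(6, 9)})
label.update({a: (2, 0, 2) for a in range(9, 12)})
tau = build_tau(A, B, label)
for b in B:
    vanishing, K, now, before, formula = entropy_report(A, B, tau, b)
    assert vanishing == 0 and math.isclose(before, formula)
    print(b, K, round(now, 3), round(before, 3))
# 0 4 1.127 0.896
# 1 8 1.127 1.386
# 2 6 1.127 1.099
```

The Funes state b=0 is the one whose past had lower average entropy. At b=1 entropy went down, which the argument does not rule out for non-Funes states. At b=2 it went up anyway, so the Funes conditions are sufficient rather than necessary for an increase.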


The second law of thermodynamics enjoys a peculiar sort of epistemic privilege over the rest of physical laws. There is overwhelming consensus that there is no fundamental mechanism behind the second law, unlike the case of conservation laws, for instance. Yet, it is the only thing that we are really, truly, absolutely certain of—even conservation laws are known to fail in the presence of anomalies. It is tempting to chalk this uncharacteristic certainty up to our characteristic cynicism—the Universe may be lovely, dark and deep, but it still won’t let us get away with free lunches—but if there is one thing that I would like you to take away from this post, it is that the second law should be thought of in the same vein as Descartes’ dictum, “I think, therefore I am.” You may espouse radical skepticism and begin doubting the existence of everything around you, but you cannot doubt that of your own, because if you didn’t exist, you wouldn’t be there to doubt the existence of anything. Likewise, you might doubt everything we know about physics, but you can’t doubt the second law of thermodynamics because if it weren’t true, you wouldn’t have the perception of time required to even formulate your doubt.

This idea, that our perception of time and increase in entropy are intimately connected, is not new; indeed, it goes back to Boltzmann himself. At the time, the Big Bang and cosmological expansion were unheard of, and so as far as Boltzmann was aware, the Universe had been around forever. It followed that everything that could happen would have already happened and the net entropy of the Universe would have been already maximised, a conclusion that seemed inconsistent with the fact that we see entropy increasing all the time. Boltzmann resolved this by invoking an anthropic argument: since the Universe has been there forever, even highly improbable statistical fluctuations eventually occur in some corner causing a momentary decrease in entropy there, and since our very existence relies upon the progression from order to disorder, we could only survive during these fluctuations. (Of course, people in the opening leg of the fluctuation would perceive time running backward, while those in the closing leg would perceive it running forward.)

At this point, it may seem that my critique about how Boltzmann didn’t care about observers was a tad misplaced, but hear me out. As several physicists from the post-Hubble period pointed out, if a local entropy fluctuation was improbable, a global one would be even more so. The former is all that is requisite for our existence, so there is no anthropic justification for why all of the observable universe seems to have fantastically low entropy. Of course, cosmologists today seek recourse in the peculiarity of the initial conditions of the Universe, but I disagree that this plays an important role in the resolution of this paradox. What does play a role, in my opinion, is the fact that the partition of the Universe into macrostates upon which the entropy crucially depends is not God-given but determined by the way we are coupled to the rest of the Universe. In other words, the second law of thermodynamics appears to hold throughout the observable universe because we are simply incapable of coupling to an observable with respect to which the net entropy decreases.

Or are we? Just as the Funes conditions concerned coincidences between states of B and the \sim_1^A equivalence classes to which the immediately preceding states of A belonged, we may talk of “Senuf” conditions which concern coincidences between states of B and the \sim_{-1}^A equivalence classes to which the immediately following states of A belong (my sincere apologies to Segrob). As I’ve mentioned above, it’s not possible to talk of the relationship between a mind and its mental states without adding any further structure (and possibly upgrading to a nondiscrete model since mental processes typically seem to involve an interplay of various relaxation timescales), yet there is a rough sense in which we might say that the Funes states “perceive” time as flowing in the same direction as that we have arbitrarily assigned to our toy universe, while Senuf states “perceive” it flowing in the opposite direction. But there isn’t anything precluding a state of B from being both Funes and Senuf at the same time! So, why does being able to remember both the past as well as the future seem so patently absurd?
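Since the Senuf conditions are the Funes conditions with \sim_{-1}^A in place of \sim_1^A, a sketch can obtain them by running a Funes test on \tau^{-1}. Condition (a) is read as "|f^{-1}(b)|\neq 1 for at least one nonempty class f". The example is hand-built purely for illustration: |A|=12 and |B|=3, with \sim^A_1 class labels (0,0,0), (1,1,1), (0,2,2) and (2,0,2) of sizes 2, 4, 3 and 3. Helper names are hypothetical.

```python
from collections import Counter


def build_tau(A, B, label):
    """Hypothetical helper: realise the ~^A_1 labels label[a] as a permutation of A x B."""
    tau = {}
    for b in B:
        sources = [(a, bp) for a in A for bp in B if label[a][bp] == b]
        assert len(sources) == len(A), "labels are incompatible with a permutation"
        for new_a, source in enumerate(sources):
            tau[source] = (new_a, b)
    return tau


def class_sizes(A, B, perm):
    """Sizes of the classes of A keyed by the induced map b -> pi_B(perm(a, b))."""
    return Counter(tuple(perm[(a, bp)][1] for bp in B) for a in A)


def is_funes(sizes, b):
    """Funes test for observer state b, using the ratios |f^{-1}(b)|."""
    ratio = {f: f.count(b) for f in sizes}
    cond_a = any(r != 1 for r in ratio.values())
    cond_b = all(ratio[f] <= ratio[h]
                 for f in sizes for h in sizes if sizes[f] >= sizes[h])
    return cond_a and cond_b


A, B = list(range(12)), list(range(3))
label = {a: (0, 0, 0) for a in range(0, 2)}
label.update({a: (1, 1, 1) for a in range(2, 6)})
label.update({a: (0, 2, 2) for a in range(6, 9)})
label.update({a: (2, 0, 2) for a in range(9, 12)})
tau = build_tau(A, B, label)
inverse = {image: state for state, image in tau.items()}

funes = [b for b in B if is_funes(class_sizes(A, B, tau), b)]
senuf = [b for b in B if is_funes(class_sizes(A, B, inverse), b)]  # ~_{-1} classes
print(funes, senuf)  # [0] []
```

In this example \tau^{-1} splits A into twelve singleton classes, so no state of B is Senuf. The definitions themselves do not forbid a state that passes both tests; this particular example simply lacks one.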

There are two answers that I can think of. The boring one is that the Funes and Senuf conditions are pretty restrictive, so while one or the other may be explained away using an anthropic argument, both being satisfied together is statistically unlikely despite selection bias. The more interesting answer has to do with the limitations of our imagination when it comes to matters of consciousness and requires a slight digression first.

Recurrent epileptic seizures, caused by abnormal neural activity wreaking havoc through the brain, can be so incredibly disruptive to normal day-to-day functioning that those afflicted thus are actually willing to have their brain cut so that the seizures may be contained within the region of origin. And indeed, as a last resort after pharmacological interventions have proved fruitless, doctors turn to corpus callosotomy, the partial or complete severing of the corpus callosum, as a solution. The corpus callosum is a bundle of neural fibres that constitutes the principal connection between the two hemispheres of the brain. Severing it keeps seizures contained within a hemisphere but also inhibits communication between the two hemispheres. As a result, anyone undergoing the complete procedure is rendered split-brain.

Functional responsibilities aren’t distributed homogeneously across the two hemispheres. The right hemisphere controls the left side of the body and is dominant in tasks involving spatial reasoning among other things, while the left hemisphere controls the right side of the body and is dominant in tasks involving verbal reasoning among other things. So, if a split-brain subject were to be seated in front of a screen and the image of an object flashed on its left half so that only the right hemisphere had access to the visual stimulus, then the subject would be able to make an illustration of the object with their left hand but still (truthfully) say that they observed nothing. (This is an actual experiment, by the way.) Hence, it’s not only the brain that is split, but the very sense of self as well.

The neural activity underlying the divided subjective conscious experience is present in a person with an intact corpus callosum as well, but thanks to the communication between the two hemispheres that the corpus callosum permits, their brain is able to fashion these separate selves into a coherent whole. I won’t say that this means that the self is an illusion, since it is exactly as real as everything else we experience, but we’ll have to admit that the brain does a hell of a lot of editing behind the scenes.

The trajectory of Funes mental processes in our brains may possibly intersect that of a Senuf mental process, which may even be incorporeal as far as we can tell, but our brain would simply work overtime to keep the usual show running. So I guess what I am trying to say is that you might be sharing your brain with a Benjamin Button at this very moment and yet have absolutely no inkling of it whatsoever.

Thanks to Sushrut Thorat, Sankeerth Rao and Avradeep Bhowmik for help with resources, and to Ronak M. Soni for discussion. This acknowledgment however does not necessarily mean that Ronak agrees with everything above.