# The memorability heuristic and symplectic reductions

The other day, Florian mentioned to me that a heuristic he frequently employed to check whether some definition or construction was the right way forward was to see if it could be easily reconstructed from memory. This is an incredibly powerful general-purpose heuristic, one which I myself have extensively used without explicitly realising what I am doing (as have many mathematicians, I’m certain). Of course, heuristics are not iron-clad propositions, and sometimes in order to get to our destination, we do have to get our hands dirty and make nasty complicated arguments. But the wonderful thing about the memorability heuristic is that even when you do have to make such arguments, it is often possible to find a formulation which is far easier to contain in your mind. As such, the memorability heuristic is not only useful to see if a concept you are introducing is “natural” but also to test whether you have really understood an existing concept, which is to say, whether you are able to see its “naturality.”

For instance, the implicit function theorem in multivariable calculus is notorious enough for being hard to get right that it is the go-to bow that applicants to faculty positions have to string in order to demonstrate their pedagogical chops (or so my co-advisor Bernd Siebert tells me). But one of the early rewards of introducing the idea of a manifold is that it allows you to repackage the theorem into the statement that if you are given a smooth map between two manifolds then the preimage of any regular value in the target manifold is a submanifold of the domain manifold. In fact, it would be rather fair to say that the reason we care about the low-brow formulation of the implicit function theorem is so that we can have a nice clean result like the one above. And it is only when one has seen this version of the theorem that one can claim to have truly understood it.

Florian’s remark was prompted by my annoyance at having to repeatedly look up the definition of a symplectic reduction, which suggested two things at once: (a) that I had to look it up repeatedly suggested a certain lack of clarity as to why it is defined the way it is—more precisely, it wasn’t clear why moment maps had to enter the picture at all—and (b) that I had to look it up repeatedly suggested that this lack of clarity wasn’t due to a corresponding lack of naturality but due to a fault in my own understanding. Indeed, this has been the case and now I believe that I have figured it out to my satisfaction, so you all can now scarper off, good day to you and goodbye!

Just kidding, just kidding, don’t go away, here’s the reason why a symplectic reduction has to involve moment maps. In §1, I recall the basic notions I am going to be talking about, i.e. group actions on symplectic manifolds, moment maps, and the definition of a symplectic reduction; in §2, I illustrate these with examples; in §3, I revisit the notion of symplectic reductions and note that the reason they appear opaque is that our choice of category for symplectic manifolds is bad; and finally, in §4, I fix this by constructing the right category for symplectic manifolds. It goes without saying, none of this is original to me.

1. Symplectic manifolds, group actions, and moment maps

Symplectic manifolds $(M,\omega)$ are (smooth) manifolds $M$ equipped with a closed nondegenerate $2$-form $\omega$ called the symplectic form. Note that when $M$ has finite dimension, this forces the dimension to be even. A symplectic map $f: (M, \omega)\rightarrow (N,\eta)$ between two symplectic manifolds $(M,\omega)$ and $(N,\eta)$ is a smooth map $f:M\rightarrow N$ such that $(Tf)^*\eta = \omega$. When the underlying smooth map is a diffeomorphism, the symplectic map is said to be a symplectomorphism. Symplectomorphisms are isomorphisms in the naive category of symplectic manifolds (whose objects are symplectic manifolds and whose morphisms are symplectic maps).
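To make the condition $(Tf)^*\eta = \omega$ concrete, here is a minimal sketch in Python for the simplest case I can think of. It assumes $M = N = \mathbb R^2$ with the area form $\omega(u,v) = u_xv_p - u_pv_x$ and a linear map $f$, so that $Tf$ is just the matrix of $f$. The helper names (`omega`, `apply`, `is_symplectic`) are made up for this illustration.

```python
def omega(u, v):
    """Standard area form on R^2: omega(u, v) = u_x * v_p - u_p * v_x."""
    return u[0] * v[1] - u[1] * v[0]


def apply(A, u):
    """Apply a 2x2 matrix A (given as a list of rows) to a vector u."""
    return (A[0][0] * u[0] + A[0][1] * u[1],
            A[1][0] * u[0] + A[1][1] * u[1])


def is_symplectic(A):
    """Check (Tf)^* omega = omega for the linear map f = A on a basis."""
    basis = [(1, 0), (0, 1)]
    return all(omega(apply(A, u), apply(A, v)) == omega(u, v)
               for u in basis for v in basis)


shear = [[1, 3], [0, 1]]  # determinant 1: preserves area
scale = [[2, 0], [0, 2]]  # determinant 4: does not

print(is_symplectic(shear))  # True
print(is_symplectic(scale))  # False
```

In dimension two the condition reduces to $\det A = 1$, which is why the shear passes and the dilation fails.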

The set of symplectomorphisms from a symplectic manifold $(M,\omega)$ to itself forms a Lie group $\mathrm{Symp}(M,\omega)$ under composition, and its associated Lie algebra $\mathfrak{symp}(M,\omega)$ is the Lie algebra of symplectic vector fields, defined to be vector fields along which the Lie derivative of $\omega$ vanishes, with the usual vector field commutator being the Lie bracket. Given a Lie group homomorphism $\rho:G\rightarrow \mathrm{Symp}(M,\omega)$, the Lie group $G$ is said to act on $(M,\omega)$ via the action $\rho$. This induces a Lie algebra homomorphism $T\rho:\mathfrak g\rightarrow \mathfrak{symp}(M,\omega)$ between the associated Lie algebras.

Let us, for the sake of notational convenience, denote $T\rho(v)$ as $X_v$, where $v\in \mathfrak g$. Now, for any vector field $X$, Cartan’s magic formula tells us:

$\displaystyle \mathscr L_X\omega = \mathrm d\circ\iota_ X\omega + \iota_X\circ\mathrm d\omega = \mathrm d\circ\iota_X\omega.$

So for symplectic vector fields $X_v$ in particular, we have that $\omega(X_v,\cdot)$ is closed. If it is furthermore exact, the vector field $X_v$ is said to be Hamiltonian; if for all $v\in\mathfrak g$, the associated vector fields $X_v$ are Hamiltonian, the action $\rho$ is said to be Hamiltonian. The reason for this terminology is that in physics symplectic manifolds represent the phase space of a dynamical system and so admit a natural action of the Lie group $\mathbb R$ via time evolution. There is only one generator $v = \partial_t$ in this case ($t$ being time) and $\omega( X_v,\cdot)$ turns out to be $\mathrm dH$ where $H$ is the Hamiltonian.

The Hamiltonian function $H$ in the case of time evolution may be generalised to any Hamiltonian group action $\rho:G\rightarrow \mathrm{Symp}(M,\omega)$. In the general case we have a smooth map $\mu:M\rightarrow \mathfrak g^*$ called a moment map, which linearly assigns to every $v\in \mathfrak g$, a smooth function $\mu_v:M\rightarrow \mathbb R$ satisfying $\omega( X_v,\cdot)=\mathrm d\mu_v$. We shall see in the next section that things like linear and angular momentum can be regarded as special instances of the moment map construction.

For a sufficiently “nice” value $m\in \mu(M)$ (such as a regular value), the preimage $\mu^{-1}(m)$ is a submanifold of $M$. Assuming that the group $G$ is connected, that $\mu$ is equivariant with respect to the coadjoint action on $\mathfrak g^*$, and that $m$ is fixed by that action (as is automatic when $m=0$ or when $G$ is abelian), the preimage $\mu^{-1}(m)$ is in fact preserved by the action of $G$. (The more general case where $G$ has multiple connected components may be handled by replacing $G$ by its identity component in the subsequent discussion.) In particular, this means that it makes sense to take the (topological) quotient $M/\mkern-6mu/G :=\mu^{-1}(m)/G$. Now $M/\mkern-6mu/G$ may not be a manifold and if you hope to do string theory, this is something you just have to live with. But in order to avoid a long digression into developing the theory of orbifolds, I am instead going to assume that the action of $G$ restricted to $\mu^{-1}(m)$ has only one orbit type and so $M/\mkern-6mu/G$ is an honest-to-goodness manifold.

In fact, more happens to be true: the manifold $M/\mkern-6mu/G$ inherits a symplectic structure from $M$. More precisely, if $\iota: \mu^{-1}(m)\hookrightarrow M$ is the inclusion and $q: \mu^{-1}(m)\twoheadrightarrow M/\mkern-6mu/G$ the canonical quotient, then it turns out that there is a unique $2$-form $\omega'$ such that $(T\iota)^*\omega = (Tq)^*\omega'$. This $2$-form is in fact a symplectic form, and so $(M/\mkern-6mu/G,\omega')$ is a symplectic manifold. This is said to be a symplectic reduction. I say a symplectic reduction, because it’s actually sensitive to the choice of the regular value $m$ (or equivalently, the choice of the moment map $\mu$, which can always be shifted by a constant). In fact, the (smooth) topology of the submanifold $\mu^{-1}(m)$ (and consequently, the quotient $\mu^{-1}(m)/G$) can jump as we vary $m$ through a critical value, since the gradient flow induced by $\mu$ ceases to be a diffeomorphism in that case.

Now let’s get our hands on some examples.

2. Cotangent bundles

Cotangent bundles are an important class of examples of manifolds with a natural symplectic structure associated with them. The story outlined in the previous section happens to play out particularly nicely for cotangent bundles, so they’re an obvious choice of starting point for gaining an understanding into what is actually going on in the above construction.

Given the cotangent bundle $\pi:T^*K\rightarrow K$ over a manifold $K$, we can define the Liouville $1$-form $\lambda$ on $T^*K$ as follows. For a point $x\in K$ and a cotangent vector $p\in T^*_xK$, we define $\lambda_{(x,p)}\in T^*_{(x,p)}(T^*K)$ to be the pullback $(T\pi)^* p:=p\circ T\pi$. Then, we have the following claim.

Claim. The $2$-form $\omega := -\mathrm d\lambda$ is a symplectic form on $T^*K$. □

Proof. That it is closed is obvious. To show that it is also nondegenerate takes a little more work. Observe that for any vector bundle $\varpi: E \rightarrow K$ we have the following short exact sequence in the category of vector bundles over $E$:

$\displaystyle 0\longrightarrow T^{\mathrm{v}}E \longrightarrow TE \stackrel{T\varpi}{\longrightarrow} \varpi^*TK \longrightarrow 0.$

Now, the fibres of the vertical bundle $T^{\mathrm{v}}E$ consist of tangent vectors to the fibres of $E$, which themselves are vector spaces. Since the tangent spaces of a vector space may be canonically identified with the vector space itself, there is a canonical identification $T^{\mathrm{v}}E \cong \varpi^* E$. Thus, we may write:

$\displaystyle 0\longrightarrow \varpi^*E \longrightarrow TE \stackrel{T\varpi}{\longrightarrow} \varpi^*TK \longrightarrow 0.$

Next, we make use of a lemma in homological algebra, that says that if the epimorphism in a short exact sequence has a right-inverse, then the exact sequence splits. Any section $s: K\rightarrow E$ gives a right-inverse $Ts$ to $T\varpi$ and as a consequence, induces a splitting:

$\displaystyle TE \cong \varpi^*E\oplus \varpi^*TK \cong \varpi^*(TK\oplus E).$

But [EDIT: There’s a mistake here. Will fix later.] there is a canonical section (the zero section) and therefore, a canonical splitting as well. Specialising to the case of the cotangent bundle $\pi:T^*K\rightarrow K$ therefore gives us a canonical identification $T(T^*K) \cong \pi^*(TK\oplus T^*K)$. It is then just a matter of following definitions to see that for any $\underline X, \underline Y\in\Gamma(TK)$ and $\underline\xi,\underline\psi\in\Gamma(T^*K)$, we have via the above canonical identification:

$\displaystyle \lambda_{(x,p)}(\underline X_x\oplus\underline \xi_x)=p(\underline X_x),\quad\omega(\underline X\oplus\underline \xi, \underline Y\oplus\underline \psi)=\underline \psi(\underline X)-\underline \xi(\underline Y).$

As we can locally guarantee that for any nonzero vector there is a covector whose evaluation on the vector is nonzero (and vice versa), it follows that $\omega$ is indeed nondegenerate and that $(T^*K,\omega)$ is a symplectic manifold. ■

Since the symplectic structure on $(T^*K,\omega)$ was induced by nothing more than the smooth structure on $K$, it is reasonable to guess that diffeomorphisms of $K$ induce symplectomorphisms of $T^*K$. Actually, we can wring out a stronger result from this.

Claim. Let $\underline{f}:K\rightarrow K'$ be a diffeomorphism and let $f:T^*K\rightarrow T^*K'$ be the diffeomorphism it induces on the cotangent bundles. Then, $(T f)^*\lambda' = \lambda$, where $\lambda$ (respectively $\lambda'$) is the Liouville form on $T^*K$ (respectively $T^*K'$). □

Proof. Concretely, $f$ is given by the following map:

$\displaystyle f(x,p) = (\underline{f}(x), (T\underline{f})^{*-1}p)=: (x',p').$

The induced tangent action $T f$ is therefore given by:

$\displaystyle T f(\underline X\oplus \underline \xi) = T\underline{f}(\underline X)\oplus (T\underline{f})^{*-1}(\underline \xi) =: \underline X'\oplus \underline \xi'.$

A straightforward substitution, writing $\pi':T^*K'\rightarrow K'$ for the bundle projection, therefore gives us:

$\displaystyle (T f)^*\lambda'_{(x',p')}(\underline X_x\oplus\underline \xi_x) = p'\circ T\pi'\circ Tf(\underline X_x\oplus \underline \xi_x) = ((T\underline{f})^{*-1}p)\circ T\pi'(T\underline{f}(\underline X_x)\oplus (T\underline{f})^{*-1}(\underline \xi_x)) = p\circ (T\underline{f})^{-1}\circ T\underline{f}(\underline X_x) = p(\underline X_x) = \lambda_{(x,p)}(\underline X_x\oplus\underline \xi_x).$

In particular, $f$ is a symplectomorphism. This may be shown either via the naturality of the exterior derivative or by direct substitution into the expression for the symplectic form on a cotangent bundle that we derived above. ■
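As a sanity check of the claim, here is a small sketch under simplifying assumptions: $K = K' = \mathbb R^2$ and $\underline f$ is a linear isomorphism with matrix `A`, so that the lift acts on tangent vectors by $\underline X\oplus\underline\xi\mapsto A\underline X\oplus A^{-T}\underline\xi$ at every point. The function names are hypothetical, and exact rational arithmetic is used so that the equalities can be tested literally.

```python
from fractions import Fraction

N = 2  # dimension of the base K = R^2 (an assumption made for brevity)


def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]


def inverse_transpose_2x2(A):
    """Return (A^{-1})^T for an invertible 2x2 matrix with Fraction entries."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    inv = [[A[1][1] / det, -A[0][1] / det],
           [-A[1][0] / det, A[0][0] / det]]
    return [[inv[j][i] for j in range(2)] for i in range(2)]


def liouville(p, u):
    """lambda_{(x,p)}(X + xi) = p(X), with u = X + xi stored as a flat list."""
    return sum(p[i] * u[i] for i in range(N))


def omega(u, v):
    """omega(X + xi, Y + psi) = psi(X) - xi(Y)."""
    X, xi, Y, psi = u[:N], u[N:], v[:N], v[N:]
    return sum(psi[i] * X[i] - xi[i] * Y[i] for i in range(N))


def lift(A, u):
    """Tangent action of the cotangent lift: X + xi -> A X + A^{-T} xi."""
    return matvec(A, u[:N]) + matvec(inverse_transpose_2x2(A), u[N:])


A = [[Fraction(2), Fraction(1)], [Fraction(0), Fraction(3)]]  # any invertible matrix
p = [Fraction(5), Fraction(-7)]
p_new = matvec(inverse_transpose_2x2(A), p)  # fibre coordinate of f(x, p)
basis = [[int(i == j) for j in range(2 * N)] for i in range(2 * N)]

liouville_preserved = all(liouville(p_new, lift(A, u)) == liouville(p, u)
                          for u in basis)
omega_preserved = all(omega(lift(A, u), lift(A, v)) == omega(u, v)
                      for u in basis for v in basis)
print(liouville_preserved, omega_preserved)  # True True
```

Note that `A` here has determinant $6$, so the lift is very far from preserving anything like a metric on the base; it is the inverse transpose on the fibres that compensates.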

The above turns out to have a neat consequence.

Claim. Let $\underline{\rho}: G\rightarrow \mathrm{Diff}(K)$ be a group action on $K$. Then the induced group action $\rho:G\rightarrow\mathrm{Symp}(T^*K,\omega)$ is Hamiltonian. □

Proof. Our previous result implies that the image of $\rho$ consists of symplectomorphisms that preserve not only the symplectic form, but the Liouville form as well. So, the image of the differential $T\rho$ consists of symplectic vector fields $X$ such that $\mathscr L_{X}\lambda =0$. Cartan’s formula then tells us that:

$\displaystyle \iota_{X} \omega = -\iota_{X}\circ \mathrm d\lambda = -\mathscr L_{X}\lambda+ \mathrm d\circ\iota_{X}\lambda = \mathrm d(\lambda(X)).$

So, $\iota_{X} \omega$ is exact and the action $\rho$ is Hamiltonian. ■

The above proof suggests a natural choice for the moment map $\mu:T^*K\rightarrow \mathfrak g^*$, namely:

$\displaystyle (x,p) \mapsto \lambda_{(x,p)}( T\rho(\cdot)_{(x,p)}) = p(T\underline{\rho}(\cdot)_x).$
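Here is a quick numerical check of the defining relation $\omega(X_v,\cdot) = \mathrm d\mu_v$ for this choice of moment map. The sketch assumes a linear action on $K=\mathbb R^2$, so that a generator $v$ acts through a matrix `a` with $\underline X_v(x) = ax$; differentiating $(e^{ta}x, e^{-ta^T}p)$ at $t=0$ then gives the lifted field $X_v(x,p) = (ax)\oplus(-a^Tp)$. All function names are made up for the illustration.

```python
def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def omega(X, xi, Y, psi):
    """omega(X + xi, Y + psi) = psi(X) - xi(Y)."""
    return dot(psi, X) - dot(xi, Y)


def moment(a, x, p):
    """mu_v(x, p) = p(X_v(x)) with X_v(x) = a x on the base."""
    return dot(p, matvec(a, x))


def d_moment(a, x, p, Y, psi):
    """d(mu_v) at (x, p) applied to Y + psi; mu_v is bilinear in (x, p)."""
    return dot(psi, matvec(a, x)) + dot(p, matvec(a, Y))


def lifted_field(a, x, p):
    """X_v(x, p) = (a x) + (-a^T p)."""
    return matvec(a, x), [-c for c in matvec(transpose(a), p)]


a = [[0, -1], [1, 0]]      # generator of rotations of R^2
x, p = [3, 4], [1, 2]      # a point of T^*K
Y, psi = [5, -2], [7, 1]   # an arbitrary tangent vector at (x, p)

X, xi = lifted_field(a, x, p)
lhs = omega(X, xi, Y, psi)
rhs = d_moment(a, x, p, Y, psi)
print(lhs, rhs)  # -13 -13
```

The same check goes through for any other matrix `a`, for instance the identity matrix, which generates the scaling action.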

Now, we want to allow for nontrivial stabilisers at every point (but require that they vary smoothly and isomorphically). Unfortunately, this means that there are no regular values in the image of $\mu$ since the image $\mu(\pi^{-1}(x))$ of a fibre $\pi^{-1}(x)$ of the cotangent bundle would be $\mathfrak{stab}(x)^\perp$, i.e. the space of elements of $\mathfrak g^*$ which vanish on the Lie subalgebra $\mathfrak{stab}(x)$ corresponding to the stabiliser $\mathrm{Stab}(x)$ of $x\in K$ under the $\underline{\rho}$ action. But the situation can be salvaged. The assignment of the fibres $\mathfrak{stab}(x)^\perp$ to $x$ gives us a (not necessarily trivial) vector bundle over $K$. In some local patch of $K$ we may then choose some local trivialisation and use that trivialisation to think of local sections as maps from $K$ to the typical fibre $\mathfrak h^\perp$, where $\mathfrak h$ is (noncanonically) isomorphic to the stabilisers $\mathfrak{stab}(x)$ which are all assumed to be isomorphic. These local maps are submersions (check this!) and so the preimage of any point in $\mathfrak h^\perp$ is a submanifold of $T^*K$. In order to be able to patch all these together into a global manifold, we look for a global section that corresponds to an image point in $\mathfrak g^*$ independently of the choice of local trivialisations. The canonical choice is the zero section. And so it follows that even though $0\in \mathfrak g^*$ may not be regular, $\mu^{-1}(0)$ is nevertheless a manifold.

Claim. The reduction $\mu^{-1}(0)/G$ is canonically symplectomorphic to $T^*(K/G)$. □

Proof. We shall first show that they are indeed diffeomorphic. Let $\underline q: K \rightarrow K/G$ be the canonical quotient map thought of as a fibre bundle. Since the fibres of $\underline q: K \rightarrow K/G$ are the orbits of the $G$-action $\underline \rho$, the vertical subbundle of $TK$, which is to say the subbundle of vectors tangent to the fibres, is precisely the subbundle spanned by vector fields generated by the $G$-action. As a result, we see that $\mu^{-1}(0)$ is precisely the horizontal subbundle of $T^*K$, which is to say the subbundle of covectors which vanish on the vertical subbundle. It is a general fact about any fibre bundle that the horizontal covectors are precisely the covectors that are obtained by pulling back covectors on the base. So, we may write $\mu^{-1}(0)$ as follows:

$\displaystyle \mu^{-1}(0)=\{(x,p'\circ T\underline q)|x\in K, p'\in T_{\underline q(x)}^*(K/G)\}.$

Now let $\underline f$ be a diffeomorphism in the image of $\underline\rho$. Then, since it sends a fibre to itself, we have $\underline q\circ\underline f^{-1} = \underline q$. In particular, the pushforwards $T\underline q$ and $(T\underline f)^{-1}$ satisfy $T\underline q\circ (T \underline f)^{-1}=T\underline q$. Therefore the symplectomorphism $f$ induced on $\mu^{-1}(0)\subseteq T^*K$ by the diffeomorphism $\underline f: K\rightarrow K$ is given by:

$\displaystyle f(x,p'\circ T\underline q)= (\underline f(x), p'\circ T\underline q\circ (T\underline f)^{-1}) = (\underline f(x), p'\circ T\underline q).$

Thus, the quotient $\mu^{-1}(0)/G$ may be (canonically) identified with $T^*(K/G)$ at least at the level of manifolds via the map $q:\mu^{-1}(0)\rightarrow T^*(K/G)$ which is given by $(x,p'\circ T\underline q)\mapsto (\underline q(x),p')$. All that remains to be shown is that this diffeomorphism is actually a symplectomorphism with respect to the symplectic structures we have defined on them. To this end, it will suffice to show that $(T\iota)^*\lambda = (Tq)^*\lambda'$, where $\iota:\mu^{-1}(0)\hookrightarrow T^*K$ is the inclusion, and $\lambda$ and $\lambda'$ are the Liouville forms of $T^*K$ and $T^*(K/G)$ respectively.

We follow the definitions. The Liouville forms associated with the cotangent bundles $\pi: T^*K\rightarrow K$ and $\pi': T^*(K/G)\rightarrow K/G$ are given by $\lambda_{(x,p)} = p\circ T\pi$ and $\lambda'_{(x',p')} = p'\circ T\pi'$ respectively. Then $(T\iota)^*\lambda$ and $(Tq)^*\lambda'$ are given by:

$\displaystyle ((T\iota)^*\lambda)_{(x,p'\circ T\underline q)} = p'\circ T\underline q \circ T\pi,\quad ((Tq)^*\lambda')_{(x,p'\circ T\underline q)} = p'\circ T\pi'\circ Tq.$

But note that by the definition of $q$, we have $\underline q \circ \pi = \pi' \circ q$ on $\mu^{-1}(0)$, and hence $T\underline q\circ T\pi = T\pi'\circ Tq$ by the chain rule. So, $(T\iota)^*\lambda = (Tq)^*\lambda'$ indeed. ■

So we see that symplectic reductions are indeed something natural! A few concrete illustrations of the above generalities before we go on to try and make sense of this:

Example. Let $K$ be a vector space $V$ and $G$ be a subspace $W$ that acts on $V$ via the action $\underline{\rho}(w)(x) = x + w$ where $w\in W$ and $x\in V$. The cotangent bundle $T^*V$ is the product $V\times V^*$ and the symplectomorphism induced on it is $\rho(w)(x, p) = (x + w,p)$. The Lie algebra $\mathfrak g$ associated to $W$ is $W$ itself and the tangent bundles of $V$ and $T^*V$ may be identified with $V\times V$ and $T^*V \times (V\oplus V^*)$ respectively. So the associated vector fields $\underline X_w: V \rightarrow TV$ and $X_w: T^*V\rightarrow T(T^*V)$ are given by $x\mapsto (x,w)$ and $(x,p) \mapsto ((x,p),w\oplus 0)$ respectively. The moment map is meanwhile given by $\mu_w(x,p) = p(w)$. In physical terms, this is the linear momentum in the direction $w$, just as we expect. The preimage $\mu^{-1}(0)$ is a subbundle of $T^*V\cong V\times V^*$ given by $V\times W^\perp$, where $W^\perp$ is the subspace of $V^*$ consisting of covectors which vanish on $W$. Note that this may be canonically identified with $(V/W)^*$, so quotienting out the (symplectic) $\rho$ action (which, as noted above, doesn’t do anything within the fibres) gives us $(V/W)\times (V/W)^*$. This is the cotangent bundle $T^*(V/W)$. □

Example. Now equip the vector space $V$ with an inner product $g$ and let $K$ be $V_{\setminus 0} := V\setminus \{0\}$ and $G$ be $\mathrm{SO}(V,g)$ with $\underline{\rho}$ the standard action on $V_{\setminus 0}$. The cotangent bundle $T^*V_{\setminus 0}$ is the product $V_{\setminus 0}\times V^*$ and the symplectomorphism induced on it by $F\in \mathrm{SO}(V,g)$ is as follows:

$\displaystyle \rho(F)(x, p) = (Fx,F^{*-1}p) = (Fx, (Fp^\sharp)^\flat),$

where $\sharp$ and $\flat$ denote the musical isomorphisms with respect to $g$. The Lie algebra $\mathfrak g$ associated to $\mathrm{SO}(V,g)$ is the Lie algebra of skew adjoint endomorphisms $V\wedge_g V^*$. This is generated by elements of the form $v\wedge \xi:= v\otimes \xi -\xi^\sharp\otimes v^\flat$, where $v\in V$, $\xi\in V^*$. The tangent bundles of $V_{\setminus 0}$ and $T^*V_{\setminus 0}$ may be identified with $V_{\setminus 0}\times V$ and $T^*V_{\setminus 0}\times (V\oplus V^*)$ respectively. The vector fields $\underline X_{ v\wedge \xi}: V_{\setminus 0}\rightarrow TV_{\setminus 0}$ and $X_{v\wedge \xi}: T^*V_{\setminus 0}\rightarrow T( T^*V_{\setminus 0})$ are given by:

$\displaystyle \underline X_{ v\wedge \xi}(x) = (x,\xi(x)v - g(v,x)\xi^\sharp),$
$\displaystyle X_{ v\wedge \xi}(x,p) = ((x,p),(\xi(x)v - g(v,x)\xi^\sharp)\oplus (g(\xi, p)v^\flat - p(v)\xi)).$

The moment map is meanwhile given by:

$\displaystyle \mu_{ v\wedge \xi}(x,p) = p(\xi(x)v - g(v,x)\xi^\sharp) = \xi(x)p(v) - g(v,x)g(\xi, p) = g(x \wedge p, v \wedge \xi).$

In physical terms, this is the angular momentum in the plane spanned by the vectors $v$ and $\xi^\sharp$, again just as we expect. The preimage $\mu^{-1}(0)$ is a subbundle of $T^*V_{\setminus 0}\cong V_{\setminus 0}\times V^*$ whose points $(x,p)$ satisfy the equation $x\wedge p = 0$. But this means that $p$ is parallel to $x^\flat$, which is to say there exists a real number $r$ such that $p = rx^\flat$. In other words, $\mu^{-1}(0) = \{(x,rx^\flat)|x\in V_{\setminus 0},r\in\mathbb R\}$. Quotienting out the action of $\mathrm{SO}(V,g)$ gives us $\mathbb R_{>0}\times \mathbb R$ via the map $(x,rx^\flat)\mapsto (|x|,r|x|)$. But this may be identified with the cotangent bundle $T^*(V_{\setminus 0}/\mathrm{SO}(V,g))$. □
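The rotation example can be checked by hand in the smallest case. The sketch below assumes $V=\mathbb R^2$ with the Euclidean inner product (so that vectors and covectors have the same components), uses a quarter turn as a sample element of $\mathrm{SO}(V,g)$, and takes $v = e_1$, $\xi = e_2^\flat$. The names `moment`, `rotate90`, and `reduce_point` are hypothetical.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def moment(x, p, v, xi):
    """mu_{v ^ xi}(x, p) = xi(x) p(v) - g(v, x) g(xi, p) for Euclidean g."""
    return dot(xi, x) * dot(p, v) - dot(v, x) * dot(xi, p)


def rotate90(u):
    """A quarter turn F in SO(2). Since F is orthogonal, F^{*-1} acts on
    covector components by the same matrix as F acts on vectors."""
    return (-u[1], u[0])


def reduce_point(x, p):
    """Send a point (x, r x^flat) of mu^{-1}(0) to (|x|, r |x|)."""
    r = dot(p, x) / dot(x, x)
    norm = math.hypot(x[0], x[1])
    return (norm, r * norm)


v, xi = (1, 0), (0, 1)
x, p = (3, 4), (1, 2)

# The moment map is invariant under the lifted action.
print(moment(x, p, v, xi), moment(rotate90(x), rotate90(p), v, xi))  # -2 -2

# A point with p parallel to x^flat lies in the zero level set ...
p0 = (6, 8)  # p0 = 2 x^flat
print(moment(x, p0, v, xi))  # 0

# ... and the quotient map does not see the rotation.
print(reduce_point(x, p0), reduce_point(rotate90(x), rotate90(p0)))
```

The last line prints the same pair twice, namely `(5.0, 10.0)`, which is the point of $\mathbb R_{>0}\times\mathbb R$ that both representatives of the orbit are sent to.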

Example. We let $K=V_{\setminus 0}$ again but this time without the inner product. Let $G$ be $\mathbb R_{>0}$ with the action $\underline \rho$ given by $\underline\rho(a)(x) = ax$ where $a\in \mathbb R_{>0}$ and $x \in V_{\setminus 0}$. The induced action on $T^*V_{\setminus 0}$ is therefore $\rho(a)(x,p) = (ax,a^{-1}p)$. The Lie algebra $\mathfrak g$ associated to $\mathbb R_{>0}$ is $\mathbb R$ and for $r\in \mathbb R$, the associated vector fields $\underline X_r: V_{\setminus 0}\rightarrow TV_{\setminus 0}$ and $X_r: T^*V_{\setminus 0}\rightarrow T(T^*V_{\setminus 0})$ are given by $x\mapsto (x,rx)$ and $(x,p)\mapsto ((x,p), (rx) \oplus (-rp))$ respectively. The moment map is hence given by $\mu_r(x,p) = rp(x)$ and the preimage is the subbundle $\mu^{-1}(0) = \{(x,p)|p(x) = 0\}$ whose quotient by the $\mathbb R_{>0}$ action may be identified with the cotangent bundle $T^*(V_{\setminus 0}/\mathbb R_{>0})$ of the space of rays in $V$ (a sphere, which double covers the projective space $\mathbb P(V)$). □

3. Friendship ended with symplectic maps…

Now that we have seen that symplectic reductions are actually doing something nice, let’s get back to the question we started off with. Why are symplectic reductions defined the way they are? It’s clear that they are performing some kind of quotient on a symplectic manifold, but why do we have to first define moment maps and take the quotient of a level set of this map instead of just going ahead and taking the quotient directly?

The problem is that symplectic maps, despite seeming very natural, are a very bad choice of morphisms for symplectic manifolds. Let’s unpack this assertion.

Say we try to take the quotient of a symplectic manifold $(M,\omega)$ directly with the canonical quotient map $q: M \rightarrow M/G$ being a symplectic map with respect to some symplectic form $\omega'$ on $M/G$. Let $v\in \mathfrak g$ be such that $X_v$, the symplectic vector field on $M$ that it generates, does not identically vanish. Since $X_v$ is tangent to the orbits of the $G$-action on $M$, it must be in the kernel of $Tq$. As a consequence, we have the following:

$\displaystyle \omega(X_v,\cdot) = (Tq)^*\omega'(X_v,\cdot) = \omega'(Tq(X_v),Tq(\cdot))=0$.

So, $\omega(X_v,\cdot)$ identically vanishes despite $X_v$ not identically vanishing. This is not supposed to happen since symplectic forms are nondegenerate by definition. In fact, we see from this argument that a submersion with positive-dimensional fibres can never be a symplectic map and that the only way the quotient under a Lie group action can be symplectic is if the action is locally trivial, i.e. discrete.
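The obstruction is pure linear algebra, and a toy version of it fits in a few lines. The sketch assumes $M=\mathbb R^4$ with coordinates $(x_1,x_2,p_1,p_2)$ and takes $q$ to be the projection onto $(x_1,p_1)$, which is the quotient by translations in $x_2$ and $p_2$; the names below are made up for the illustration.

```python
def omega_prime(u, v):
    """Area form on the would-be quotient R^2."""
    return u[0] * v[1] - u[1] * v[0]


def Tq(u):
    """Differential of the projection q(x1, x2, p1, p2) = (x1, p1)."""
    return (u[0], u[2])


def pullback(u, v):
    """((Tq)^* omega')(u, v) = omega'(Tq(u), Tq(v))."""
    return omega_prime(Tq(u), Tq(v))


basis = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
X_v = (0, 1, 0, 0)  # tangent to the fibres of q, so Tq(X_v) = 0

print([pullback(X_v, e) for e in basis])  # [0, 0, 0, 0]
```

Whatever form $\omega'$ we put on the quotient, its pullback pairs the fibre direction `X_v` to zero with everything, so it cannot agree with a nondegenerate form upstairs.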

How do we fix this? The first thing that comes to mind when one is confronted with an equation that cannot identically hold is to look for solutions to it. That is to say, we try to solve $\omega(X_v,Y)=0$ for vector fields $Y$. We have already seen in §1 that $\omega(X_v,\cdot)$ is closed. But this is the Frobenius integrability condition for the subbundle of vector fields $Y$ satisfying $\omega(X_v,Y)=0$. Locally, $\omega(X_v,\cdot)$ is the exterior derivative of some function $\mu_v$ (defined up to a constant of integration) and the vector fields $Y$ must be tangent to the level sets of $\mu_v$.

We need to find a subbundle of vector fields $Y$ such that the above holds for all $v\in\mathfrak g$. To ensure this, we choose a basis of generators $v_i$ and note that in every local patch if we make choices $\mu_{v_i}$, then for $v =\sum_i a_iv_i$, we have $\omega(X_v,\cdot) = \mathrm d\sum_i a_i\mu_{v_i}$. Thus we can always choose the $\mu_v$ such that the assignment $v\mapsto \mu_v$ is linear. This gives us a map from an open set of $M$ to $\mathfrak g^*$. When the group action is Hamiltonian, the open set may be taken to be the entire manifold $M$ itself. Thus, we see that the definition of a moment map automatically falls out like pulp in an overly ripe tomato, and that the only sensible quotient we can take is that of the level sets of the moment map since these are precisely the “submanifolds” (in quotes because they can have singularities) the tangent vector fields to which lie in the kernel of $\omega(X_v,\cdot)$.

In fact, we get something more. We find that the above argument works just as well for group actions which are not Hamiltonian. We just have to stitch the level sets in different patches together to get a “global level set.” Here’s an illustration.

Example. Consider again the first example in §2, namely the action of $W$ on $T^*V\cong V\times V^*$ but we make a small modification. Let $p_0\in V^*$ be such that there is at least one $w\in W$ satisfying $p_0(w)\neq 0$. Then we take our symplectic manifold to be the discrete quotient $M=V\times (V^*/\mathbb Z p_0)$ with the symplectic structure $\omega$ as inherited from that on $T^*V\cong V\times V^*$. If we let $X: T^*V\rightarrow T(T^*V)$ be the vector field $(x,p)\mapsto ((x,p), 0\oplus p_0)$, then the flow line associated to $X$ is a closed cycle and the integral of $\iota_{X_w}\omega$ along this cycle is $p_0(w)\neq 0$. So, the action is not Hamiltonian, yet the submanifold $\mu^{-1}(0)$ that we had considered earlier descends to this discrete quotient, thus allowing us to take its quotient under the action of $W$ and get a symplectic reduction. □
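To see the failure of exactness in numbers, here is a tiny sketch assuming $V=\mathbb R^2$, $W$ spanned by $w=(1,0)$, and $p_0=(2,5)$; these particular values and the helper names are mine. On $T^*V$ the function $\mu_w(x,p)=p(w)$ is a perfectly good primitive of $\iota_{X_w}\omega$, but on the quotient $p$ and $p+p_0$ are the same point, and the primitive jumps by the period $p_0(w)$ between them.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def add(a, b):
    return tuple(ai + bi for ai, bi in zip(a, b))


def mu(w, p):
    """Candidate moment map mu_w(x, p) = p(w) on T^*V, before quotienting."""
    return dot(p, w)


w = (1, 0)    # generator of the translation action
p0 = (2, 5)   # fibre lattice direction, with p0(w) = 2 != 0
p = (7, 3)    # an arbitrary covector

# p and p + p0 represent the same point of V^* / Z p0, yet mu_w disagrees:
jump = mu(w, add(p, p0)) - mu(w, p)
print(jump)  # 2
```

The jump equals $p_0(w)$, the integral of $\iota_{X_w}\omega$ around the cycle, so no single-valued $\mu_w$ exists on $M$ even though its differential is perfectly well defined there.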

4. …now Lagrangian submanifolds are my best friend

Symplectic maps may have been a bad choice of morphism but the hope still is that if we could construct an appropriate category of symplectic manifolds, then symplectic reductions might be described by some universal property. In particular, we would like to describe symplectic reductions as categorical quotients.

To recapitulate, given a group action $\rho:G\rightarrow \mathrm{End}(X)$ on an object $X$ in some category $C$ and another object $Y$, there is an induced group action $\hat\rho_Y$ on $\mathrm{Hom}(X,Y)$ given by $\hat \rho_Y(g)(f) = f\circ\rho(g)$, where $g\in G$ and $f\in\mathrm{Hom}(X,Y)$. The fixed points of the induced action are said to be $G$-invariant. The categorical quotient of $X$ under a given $G$-action $\rho$ is then defined to be a $G$-invariant morphism $q:X\rightarrow X/\mkern-6mu/_CG$ such that any $G$-invariant morphism $f:X\rightarrow Y$ factors uniquely through $q$.

Let’s make note of some intermediate desiderata. Firstly, whatever our new notion of morphism between symplectic manifolds is, symplectomorphisms have to be special cases of it. Secondly, given that the assignment of cotangent bundles to manifolds is functorial (with respect to the morphisms inherited from the category of star bundles, where maps on the bases go forward but maps on the fibres go backward) and that the categorical quotient of a manifold, when defined, is the usual quotient, we see that at least for cotangent bundles, the symplectic reduction is the categorical quotient. So, we expect morphisms between cotangent bundles to be realised as special cases of morphisms between symplectic manifolds.

Both these desiderata are satisfied by the following tentative notion of (pre-)morphisms introduced by Weinstein.

A Weinstein premorphism from a symplectic manifold $(M,\omega)$ to another symplectic manifold $(N,\eta)$ is a Lagrangian submanifold of the symplectic manifold $M\times N^-:=(M\times N,(T\mathrm{pr}_M)^*\omega-(T\mathrm{pr}_N)^*\eta)$. Recall that a Lagrangian submanifold is a submanifold whose tangent spaces are all Lagrangian and a subspace is said to be Lagrangian if its symplectic perpendicular is exactly itself. Note that when a symplectic manifold has finite dimension, Lagrangian submanifolds have half the dimension of the full manifold. In particular, the restriction of the symplectic form to a Lagrangian submanifold vanishes.

(I will get to the reason for the prefix “pre-” in a moment.)

Claim. Let $f:M\rightarrow N$ be a symplectomorphism. Then the graph submanifold $\mathrm{Graph}(f):=\{(x,f(x))|x\in M\}$ is a Lagrangian submanifold of $M \times N^-$. □

Proof. The graph $\mathrm{Graph}(f)$ is indeed a submanifold by virtue of the implicit function theorem and its tangent space at the point $(x,f(x))\in \mathrm{Graph}(f)$ can be canonically identified with the subspace $W$ of $V:=T_x M\oplus T_{f(x)}N$ consisting of vectors of the form $v\oplus Tf(v)$ where $v\in T_xM$. The goal is to show that $W$ is a Lagrangian subspace of $V$.

To see that $W\subseteq W^\perp$, note that for all $v_1,v_2\in T_xM$, we have:

$\displaystyle ((T\mathrm{pr}_M)^*\omega -(T\mathrm{pr}_N)^*\eta)_{(x,f(x))}(v_1\oplus Tf(v_1), v_2 \oplus Tf(v_2))=\omega_x(v_1,v_2) - \eta_{f(x)}(Tf(v_1),Tf(v_2)) = 0.$

To see that $W\supseteq W^\perp$, first note that because $f$ is a diffeomorphism, $T_{f(x)}N= Tf(T_xM)$ and in particular, $V = T_x M\oplus Tf(T_xM)$. Then if $v_0,w_0\in T_xM$ are such that $v_0\oplus Tf(w_0)\in V$ is symplectically perpendicular to all elements in $W$, which is to say $v_0\oplus Tf(w_0)\in V$ is symplectically perpendicular to $v\oplus Tf(v)$ for all $v\in T_xM$, then we have:

$\displaystyle 0=((T\mathrm{pr}_M)^*\omega -(T\mathrm{pr}_N)^*\eta)_{(x,f(x))}(v_0\oplus Tf(w_0), v \oplus Tf(v)) = \omega_x(v_0,v) - \eta_{f(x)}(Tf(w_0),Tf(v))= \omega_x(v_0-w_0,v).$

Since this is true for all $v\in T_xM$ and $\omega$ is nondegenerate, it follows that $v_0 = w_0$ which implies that $v_0\oplus Tf(w_0)\in W$. ■
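The linear-algebra core of this proof is easy to test. The sketch below assumes $M=N=\mathbb R^2$ with the area form and a linear map $f$ with matrix `A`, so that the tangent space of the graph is spanned by the vectors $e\oplus Ae$; the helper names are hypothetical. Since the graph is two-dimensional inside a four-dimensional space, being isotropic is the same as being Lagrangian here.

```python
def omega(u, v):
    """Area form on R^2."""
    return u[0] * v[1] - u[1] * v[0]


def apply(A, u):
    return (A[0][0] * u[0] + A[0][1] * u[1],
            A[1][0] * u[0] + A[1][1] * u[1])


def twisted(u, v):
    """Form on M x N^-: pr_M^* omega - pr_N^* eta, with u = (u_M, u_N)."""
    return omega(u[0], v[0]) - omega(u[1], v[1])


def graph_basis(A):
    """Basis v + Tf(v) of the tangent space of Graph(f) for linear f = A."""
    return [(e, apply(A, e)) for e in [(1, 0), (0, 1)]]


def is_isotropic(vectors):
    return all(twisted(u, v) == 0 for u in vectors for v in vectors)


shear = [[1, 3], [0, 1]]  # symplectic (determinant 1)
scale = [[2, 0], [0, 2]]  # not symplectic (determinant 4)

print(is_isotropic(graph_basis(shear)))  # True
print(is_isotropic(graph_basis(scale)))  # False
```

The second line illustrates that the sign flip on $N$ is doing real work: the graph of a diffeomorphism which is not symplectic fails to be Lagrangian.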

Claim. Let $T^*K$ and $T^*L$ be two cotangent bundles with $\underline f:K\rightarrow L$ a smooth map between the bases. Then $\mathrm{Graph}^*(f):=\{((x,(T\underline f)^*p'),(\underline f(x),p'))|x\in K, p'\in T^*_{\underline f(x)}L\}$ is a Lagrangian submanifold of $T^*K\times (T^*L)^-$. □

Proof. As in the above case, the implicit function theorem tells us that $\mathrm{Graph}^*(f)$ is indeed a submanifold. Furthermore, its tangent space at the point $((x,(T\underline f)^*p'),(\underline f(x),p'))\in \mathrm{Graph}^*(f)$ may be canonically identified with the subspace $W$ of $V:= T_{(x,(T\underline f)^*p')}(T^*K)\oplus T_{(\underline f(x),p')}(T^*L)$ consisting of vectors of the form $(v \oplus (T\underline f)^*(s'))\oplus (T\underline f(v)\oplus s')$ where $v\in T_xK$ and $s'\in T^*_{\underline f(x)}L$. We need to show $W$ is a Lagrangian subspace of $V$.

To see that $W\subseteq W^\perp$, note that for all $v_1,v_2\in T_xK$ and $s'_1, s'_2\in T^*_{\underline f(x)}L$, we have:

$\displaystyle ((T\mathrm{pr}_{T^*K})^*\omega -(T\mathrm{pr}_{T^*L})^*\eta)((v_1 \oplus (T\underline f)^*(s_1'))\oplus (T\underline f(v_1)\oplus s'_1),(v_2 \oplus (T\underline f)^*(s'_2))\oplus (T\underline f(v_2)\oplus s'_2))=(T\underline f)^*(s'_2)(v_1) - (T\underline f)^*(s'_1)(v_2) - s'_2\circ T\underline f(v_1) + s'_1\circ T\underline f(v_2)=0.$

To see that $W\supseteq W^\perp$, let $(w \oplus t) \oplus (w'\oplus t')\in V$ be symplectically perpendicular to all elements in $W$, which is to say $(w \oplus t) \oplus (w'\oplus t')\in V$ is symplectically perpendicular to $(v \oplus (T\underline f)^*(s'))\oplus (T\underline f(v)\oplus s')$ for all $v\in T_xK$ and $s'\in T^*_{f(x)}L$. Then, we have:

$\displaystyle 0 = ((T\mathrm{pr}_{T^*K})^*\omega -(T\mathrm{pr}_{T^*L})^*\eta)((w \oplus t)\oplus (w'\oplus t'),(v \oplus (T\underline f)^*(s'))\oplus (T\underline f(v)\oplus s'))=s'(T\underline f(w)-w') - (t-(T\underline f)^*(t'))(v).$

Since this is true for all $v\in T_xK$ and $s'\in T^*_{f(x)}L$, it follows that $w' = T\underline f(w)$ and $t = (T\underline f)^*(t')$, which imply that $(w \oplus t) \oplus (w'\oplus t')\in W$. ■
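The linear algebra in this proof can be sanity-checked on a computer. The following is only a minimal sketch under assumptions of my own choosing: $K=\mathbb R^2$, $L=\mathbb R^3$, the matrix `F` standing in for $T\underline f$ is hypothetical, and the sign convention for the canonical form is the one used in the computation above. It checks that $W$ is isotropic and half-dimensional, which together are equivalent to $W$ being Lagrangian.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def apply(mat, v):
    return [dot(row, v) for row in mat]

def transpose(mat):
    return [list(col) for col in zip(*mat)]

def canonical_form(p1, p2):
    """omega((v1, s1), (v2, s2)) = s2(v1) - s1(v2) on a cotangent fibre."""
    (v1, s1), (v2, s2) = p1, p2
    return dot(s2, v1) - dot(s1, v2)

# Hypothetical 3x2 matrix playing the role of T f : T_x K -> T_f(x) L.
F = [[1, 2], [0, 1], [3, -1]]

def element(v, s):
    """The vector (v + F^T s) + (F v + s) of W."""
    return ((v, apply(transpose(F), s)), (apply(F, v), s))

def twisted_form(e1, e2):
    """The form pr_1^* omega - pr_2^* eta on V."""
    return canonical_form(e1[0], e2[0]) - canonical_form(e1[1], e2[1])

basis_K = [[1, 0], [0, 1]]
basis_Lstar = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
basis_W = ([element(v, [0, 0, 0]) for v in basis_K]
           + [element([0, 0], s) for s in basis_Lstar])

is_isotropic = all(twisted_form(e1, e2) == 0
                   for e1 in basis_W for e2 in basis_W)
# Flatten each element of W into a single coordinate vector of V.
dim_W = rank([e[0][0] + e[0][1] + e[1][0] + e[1][1] for e in basis_W])
dim_V = 2 * 2 + 2 * 3
```

Of course, this checks one matrix rather than proving anything, but swapping in other choices of `F` is a quick way to convince oneself that nothing about the argument depends on $\underline f$ being a diffeomorphism.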

So Weinstein premorphisms do fit the bill, except that they are not morphisms, at least not yet. In order to be morphisms, they need to be composable, and it’s not even clear whether the composition of two Lagrangian submanifolds even makes sense. But wait, the submanifolds are subsets of a Cartesian product and so they constitute a relation in set-theoretic terms. We know how to compose relations! If $Z_1\subseteq M\times P$ and $Z_2 \subseteq P\times N$ are relations, then the composition $Z_2\circ Z_1\subseteq M\times N$ is the set of pairs $(x,z)$ such that there exists a $y\in P$ satisfying the conditions $(x,y)\in Z_1$ and $(y,z)\in Z_2$. The question then is if $M,P,N$ are symplectic manifolds and $Z_1$ and $Z_2$ are Lagrangian submanifolds of $M\times P^-$ and $P\times N^-$ respectively, then whether $Z_2\circ Z_1$ is a Lagrangian submanifold of $M\times N^-$.

To analyse this, we follow Weinstein and reformulate the composition operation as the sequence of the following three operations:

1. Take the Cartesian product $Z_1\times Z_2\subseteq M\times P^-\times P\times N^-$.
2. Intersect the submanifold $Z_1 \times Z_2$ with the submanifold $M \times \Delta_P\times N^-$ where $\Delta_P$ is the diagonal of $P^-\times P$.
3. Project the intersection onto the component $M \times N^-$.
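At the level of bare sets, the agreement between the usual composition of relations and the three steps above is easy to test. Here is a small sketch, assuming finite sets and forgetting all symplectic and smooth structure; the function names and the sample relations are mine.

```python
from itertools import product

def compose_direct(z2, z1):
    """Z2 after Z1: pairs (x, z) with some y such that (x, y) in Z1 and (y, z) in Z2."""
    return {(x, z) for (x, y) in z1 for (y2, z) in z2 if y == y2}

def compose_three_steps(z2, z1):
    # Step 1: the Cartesian product Z1 x Z2 inside M x P x P x N.
    prod = {(x, y, y2, z) for (x, y), (y2, z) in product(z1, z2)}
    # Step 2: intersect with M x Delta_P x N, i.e. keep tuples whose
    # two middle entries agree.
    on_diagonal = {t for t in prod if t[1] == t[2]}
    # Step 3: project onto M x N.
    return {(x, z) for (x, _, _, z) in on_diagonal}

# Sample relations with M = {0, 1}, P = {"u", "v"}, N = {"p", "q"}.
z1 = {(0, "u"), (0, "v"), (1, "v")}
z2 = {("u", "p"), ("v", "q")}
composed = compose_three_steps(z2, z1)
```

Both functions return the same set on this example. The whole difficulty in the smooth setting is hidden in step 2, where a set-theoretic intersection always exists but need not be a submanifold.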

Things may go wrong at the second step since the intersection of two submanifolds is not necessarily a submanifold. A sufficient condition for this is transversality and given that $Z_1 \times Z_2$ intersects $M \times \Delta_P\times N^-$ transversally, we may deduce that $Z_2\circ Z_1$ is indeed a submanifold.

In fact, this is the only obstruction there is to the composability of Lagrangian submanifolds in the above sense. In order to prove this, we require the following two lemmas:

Lemma. If $(V, \omega_{V})$ and $(W, \omega_{W})$ are two symplectic vector spaces with $Z_V\subseteq V$ and $Z_W\subseteq W$ Lagrangian subspaces, then $Z:=Z_V\oplus Z_W$ is a Lagrangian subspace of $(V\oplus W,\omega_{V\oplus W}):=(V\oplus W, (\mathrm{pr}_{V})^* \omega_{V} + (\mathrm{pr}_{W})^* \omega_{W})$. □

Proof. Let $v_1\oplus w_1,v_2\oplus w_2\in Z$. Then we have:

$\displaystyle \omega_{V\oplus W}(v_1\oplus w_1,v_2\oplus w_2) = \omega_{V}(v_1,v_2) + \omega_{W}(w_1,w_2)=0.$

This shows that $Z \subseteq Z^\perp$.

Let $v_0\oplus w_0\in V\oplus W$ be such that $\omega_{V\oplus W}(v_0\oplus w_0,v\oplus w) =0$ for all $v\oplus w\in Z$. Setting $w =0$ tells us that $\omega_{V}(v_{0},v) =0$ for all $v\in Z_V$, which implies that $v_{0}\in Z_V$. A similar argument tells us that $w_0\in Z_W$. So, $v_0\oplus w_0\in Z$. This shows that $Z\supseteq Z^\perp$. ■

Lemma. Let $(V,\omega_V)$ be a symplectic vector space with $Z$ a Lagrangian subspace and $W$ a subspace of $Z$. Then, $Z/W$ is a Lagrangian subspace of the symplectic vector space $(W^\perp/W, \omega_{W^\perp/W})$ where $\omega_{W^\perp/W}$ is given by $\omega_{W^\perp/W}(v + W,w+W):=\omega_V(v,w)$ for any $v,w\in W^\perp$. □

Proof. First, we check that everything at least makes sense. Since $\omega_V|_Z=0$ identically, the symplectic perpendicular $W^\perp$ of $W\subseteq Z$ must at least contain $Z$, so $W^\perp/W$ makes sense and $Z/W$ is indeed a subspace of $W^\perp/W$. Furthermore, $\omega_{W^\perp/W}$ is indeed well-defined (i.e. independent of the choice of the lifts $v$ and $w$) and nondegenerate since we quotient out $W^\perp$ by precisely the kernel of $\omega_V|_{W^\perp}$, which is $W^\perp\cap (W^\perp)^\perp = W$.

In order to prove that $Z/W$ is Lagrangian in $(W^\perp/W, \omega_{W^\perp/W})$, we need to show that for any $v+W, w+W\in Z/W$, we have $\omega_{W^\perp/W}(v + W,w+W)=0$, and that if $v_0 + W\in W^\perp/W$ is such that $\omega_{W^\perp/W}(v_0 + W,v+W)=0$ for all $v + W\in Z/W$, then $v_0 + W\in Z/W$. By definition of $\omega_{W^\perp/W}$, this is the same as showing that for any $v, w\in Z$, we have $\omega_{V}(v ,w)=0$, and that if $v_0 \in W^\perp$ is such that $\omega_{V}(v_0,v)=0$ for all $v\in Z$, then $v_0\in Z$. But this just follows from the hypothesis that $Z$ is Lagrangian. ■

Claim. If $Z_1$ and $Z_2$ are Lagrangian submanifolds of $M\times P^-$ and $P\times N^-$ respectively and $Z_1 \times Z_2$ intersects $M \times \Delta_P\times N^-$ transversally, then the submanifold $Z_2\circ Z_1$ is Lagrangian. □

Proof. By applying the first lemma above to the tangent spaces of $Z_1 \times Z_2$, we see that it must be a Lagrangian submanifold of $M\times P^-\times P\times N^-$.

Now let $(x,z,z,y)\in (M\times \Delta_P\times N^-)\cap (Z_1 \times Z_2)$ and let $W=0\oplus T_{(z,z)}\Delta_P\oplus 0$. Then note that $T_{(x,z,z,y)}(M \times \Delta_P\times N^-)\cong T_xM\oplus T_{(z,z)}\Delta_P \oplus T_yN^-$ is contained in $W^\perp$. Moreover, if $u_0\oplus w_0 \oplus w_0' \oplus v_0\in W^\perp$ then it follows from the definitions that for all $w\in T_zP$, we have $\alpha_z(w,w_0 - w_0') = 0$, where $\alpha$ denotes the symplectic form on $P$. Since $\alpha$ is nondegenerate, we gather that $w_0 =w'_0$ and consequently that $W^\perp = T_xM\oplus T_{(z,z)}\Delta_P\oplus T_yN^-$.

Note that $W^\perp/W = T_xM\oplus T_yN^-\cong T_{(x,y)}(M\times N^-)$ and that the tangent space of $Z_2\circ Z_1$ at $(x,y)$ is given by:

$\displaystyle T_{(x,y)}(Z_2\circ Z_1) = ((T_{(x,z)}Z_1 \oplus T_{(z,y)}Z_2)\cap W^\perp)/W = ((T_{(x,z)}Z_1 \oplus T_{(z,y)}Z_2)\cap W^\perp+W)/W.$

Thus, by virtue of the second lemma we proved above, if we can show $Z:= (T_{(x,z)}Z_1 \oplus T_{(z,y)}Z_2)\cap W^\perp+W$ is Lagrangian in $V:=T_xM\oplus T_zP^- \oplus T_zP \oplus T_yN^-$, then we are done. To show this, we make use of the fact that if $A,B,C$ are three subspaces of some vector space such that $B\subseteq C$ then:

$\displaystyle (A+B)\cap C=\{a+b|a\in A,b\in B,a+b\in C\}= \{a+b|a\in A, a\in C,b\in B\} = (A\cap C) + B.$
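The identity above (the modular law) holds over any field, so one cheap way to test it is over the field with two elements, where subspaces are finite and can be enumerated outright. The sketch below makes that substitution, which is an assumption of the illustration and not of the proof: vectors of $\mathbb F_2^4$ are encoded as integers, addition is XOR, and subspaces are sets closed under XOR.

```python
import random

def span(generators):
    """Subspace of F_2^n spanned by the given vectors (ints under XOR)."""
    s = {0}
    for g in generators:
        s |= {x ^ g for x in s}
    return frozenset(s)

def add(a, b):
    """Sum A + B of two subspaces."""
    return frozenset(x ^ y for x in a for y in b)

random.seed(0)
n = 4
results = []
for _ in range(200):
    A = span(random.sample(range(1, 2 ** n), 2))
    C = span(random.sample(range(1, 2 ** n), 3))
    # Drawing the generators of B from C guarantees B is a subspace of C.
    B = span(random.sample(sorted(C), 2))
    lhs = add(A, B) & C          # (A + B) ∩ C
    rhs = add(A & C, B)          # (A ∩ C) + B
    results.append(B <= C and lhs == rhs)

modular_law_holds = all(results)
```

Dropping the hypothesis $B\subseteq C$ (say, by drawing the generators of `B` from all of $\mathbb F_2^4$) quickly produces counterexamples, which is a good reminder that the lattice of subspaces is modular but not distributive.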

Keeping the above in mind along with the observation that $(\mathscr P_{\mathsf{Vect}}(V), +,\cap,\perp)$ is an involutive lattice (exercise), we have the following chain of equalities:

$Z^\perp = ((T_{(x,z)}Z_1 \oplus T_{(z,y)}Z_2)\cap W^\perp+W)^\perp = ((T_{(x,z)}Z_1 \oplus T_{(z,y)}Z_2)^\perp + W)\cap W^\perp = (T_{(x,z)}Z_1 \oplus T_{(z,y)}Z_2)^\perp\cap W^\perp+W = (T_{(x,z)}Z_1 \oplus T_{(z,y)}Z_2)\cap W^\perp+W = Z.$

So, $Z$ is indeed Lagrangian in $V$. ■

Transversality is a property that generically holds so Weinstein premorphisms are generically composable but not always. Weinstein discusses a few ways in which one might go about fixing this issue; the most natural one for our purposes is the prescription of Wehrheim and Woodward which basically amounts to enlarging the set of Weinstein premorphisms by adding in the kinky cases involving nontransversal intersections by hand. The way we formally do this is akin to how we fill in the holes in the rational number line by thinking of real numbers as Cauchy sequences of rational numbers subject to an equivalence relation.

We define WWW morphisms from a symplectic manifold $(M,\omega)$ to a symplectic manifold $(N,\eta)$ to be, up to certain equivalence relations, finite tuples $(Z_k,Z_{k-1},\ldots,Z_1)$ where $Z_i$ is a Lagrangian submanifold of $P_{i-1}\times P_i^-$, with $i$ running from $1$ to $k$, and $(P_i,\alpha_i)$ are some symplectic manifolds such that $(P_0,\alpha_0)=(M,\omega)$ and $(P_{k},\alpha_{k})=(N,\eta)$. The equivalence relation $\sim$ in question is given by $(Z_k, \ldots, Z_{i+1},Z_{i},\ldots,Z_1)\sim (Z_k, \ldots, Z_{i+1}\circ Z_{i},\ldots,Z_1)$ whenever the intersection $(Z_{i} \times Z_{i+1})\cap (P_{i-1} \times \Delta_{P_i}\times P_{i+1}^-)$ is a submanifold. Note that this is more general than requiring transversality since transversality is after all only a sufficient condition. In fact, if you notice, in the proof of the last claim, we never used the fact that the intersection is transversal, but only that it results in a submanifold.

So, to summarise, WWW morphisms are basically the morphisms generated by Weinstein premorphisms.

We thus have a bona fide category $\mathsf{WSymp}$ of symplectic manifolds on our hands and we might naively expect symplectic reductions to be categorical quotients in this category but unfortunately, this is not the case. The problem is that there are too many morphisms between two given objects. In particular, the singleton object $\{*\}$ is not a terminal object; the hom-set $\mathrm{Hom}(M, \{*\})$ is basically the set of (generalised, in the sense of Wehrheim and Woodward) Lagrangian submanifolds of $M$.

Why is this a problem? Suppose we have a categorical quotient $Z_q\in \mathrm{Hom}(M,M/\mkern-6mu/_{\mathsf{WSymp}}G)$. Then this means that for every $G$-invariant Lagrangian submanifold $Z\in\mathrm{Hom}^G(M, \{*\})$, there exists a unique Lagrangian submanifold $Z'\in \mathrm{Hom}(M/\mkern-6mu/G, \{*\})$ such that $Z = Z'\circ Z_q$. This is not obviously untrue, so here’s a concrete counterexample.

Example. Take $M$ to be the manifold $N \times \mathbb R^2$ with coordinates $(s,t)$ on $\mathbb R^2$ and the group $G$ to be the group of translations along the $s$ direction. Then, we would expect $M/\mkern-6mu/_{\mathsf{WSymp}}G$ to be $N$ with $Z_q=\Delta_N \times \{t = c\}$ for some real constant $c$. We see that for any Lagrangian submanifold $Z'\subseteq N$, we have $Z'\circ Z_q = Z'\times \{t=c\}$. So, for a Lagrangian submanifold $Z=Z'\times \{t=c'\}$, where $c\neq c'$, there is no $Z'\in \mathrm{Hom}(M/\mkern-6mu/_{\mathsf{WSymp}}G, \{*\})$ such that $Z = Z'\circ Z_q$. □

The problem is easily patched. We consider the slice category $S:=\mathsf{WSymp}/\{*\}$ whose objects are WWW morphisms $M\rightarrow \{*\}$ and whose morphisms are commuting triangles. In other words, our objects are symplectic manifolds $M$ with distinguished generalised Lagrangian submanifolds $Z_M$ and our morphisms are WWW morphisms compatible with the distinguished generalised Lagrangian submanifolds. Note that in this category, the singleton manifold with itself as the distinguished Lagrangian submanifold is the terminal object.

This brings us to the main result of today: symplectic reductions are indeed categorical quotients in the category $S$ we have defined above!

Proposition. Let $(M,\omega)$ be a symplectic manifold with a connected distinguished Lagrangian submanifold $Z_M$ and let $G$ be a connected Lie group acting on $M$ via a Hamiltonian $G$-action $\rho$ that sends the distinguished Lagrangian submanifold $Z_M$ to itself. Furthermore, let $\mu: M\rightarrow \mathfrak g^*$ and $m\in\mathfrak g^*$ be a choice of moment map and value such that $\mu^{-1}(m)$ is a submanifold of $M$, the intersection $Z_M\cap \mu^{-1}(m)$ is nonempty, and $M/\mkern-6mu/G:=\mu^{-1}(m)/G$ is a (symplectic) manifold. Then, $M/\mkern-6mu/_SG = M/\mkern-6mu/G$ with distinguished generalised Lagrangian submanifold $Z_{M/\mkern-6mu/G}=q(Z_M)$ and $Z_q = \{(x,q(x))|x\in \mu^{-1}(m)\}$, where $q:\mu^{-1}(m)\rightarrow \mu^{-1}(m)/G$ is the canonical quotient map. □

Proof. In order to prove the above proposition, we need to show the following things:

1. The Lagrangian submanifold $Z_M\subseteq M$ is contained inside a level set of the moment map $\mu$. In other words, we can choose an $m\in \mathfrak g^*$ such that $Z_M\subseteq \mu^{-1}(m)$. We need this firstly to ensure that the choice of the level set is fixed once we are given $Z_M$ and secondly for $q(Z_M)$ to even make sense.
2. The submanifold $Z_q$ is indeed Lagrangian in $M\times (M/\mkern-6mu/G)^-$. Note that, since $Z_M$ and $Z_{M/\mkern-6mu/G}$ could just as well be viewed as elements $\overline Z_M\in \mathrm{Hom}(\{*\},M)$ and $\overline Z_{M/\mkern-6mu/G}\in \mathrm{Hom}(\{*\},M/\mkern-6mu/G)$ respectively, composability of (generalised) Lagrangian submanifolds would then imply that $\overline Z_{M/\mkern-6mu/G}=\overline {q(Z_M)} = \overline Z_q\circ \overline Z_M$, which at the level of manifolds is the same as $Z_{M/\mkern-6mu/G}$, is automatically a generalised Lagrangian submanifold in $M/\mkern-6mu/G$.
3. For any symplectic manifold $(N,\eta)$ with distinguished generalised Lagrangian submanifold $Z_N$ and a $G$-invariant generalised Lagrangian submanifold $Z_f\in \mathrm{Hom}^G(M,N)$ satisfying $Z_N\circ Z_f = Z_M$, we can construct a generalised Lagrangian submanifold $Z'\in\mathrm{Hom}(M/\mkern-6mu/G,N)$ such that $Z_{M/\mkern-6mu/G} = Z_N\circ Z'$ and $Z_f = Z'\circ Z_q$. Note that since Lagrangian submanifolds generate generalised Lagrangian submanifolds under (formal) composition, it is enough to consider the case where $Z_f$ is a bona fide Lagrangian submanifold.
4. Such a $Z'$ is unique, i.e. if $Z''$ was an element of $\mathrm{Hom}(M/\mkern-6mu/G,N)$ satisfying $Z_f = Z''\circ Z_q$, then $Z'' = Z'$.

The Lagrangian submanifold $Z_M\subseteq M$ is contained inside a level set of the moment map $\mu$:

Let us denote the bundle spanned by the vector fields generated by the action of the Lie algebra $\mathfrak g$ as $E$. Then since by hypothesis, $Z_M$ is preserved under this action, $E|_{Z_M}$ must be a subbundle of $TZ_M$. Furthermore, since $Z_M$ is Lagrangian, $TZ_M = TZ_M^\perp$ is a subbundle of $E|_{Z_M}^\perp$. We saw in §3 that the level sets of the moment map may be characterised as $T(\mu^{-1}(m)) = E|_{\mu^{-1}(m)}^\perp$. This implies that the tangent space of $Z_M$ at a point is contained in the tangent space of the level set of $\mu$ containing the point (provided the tangent space is defined). Since $Z_M$ is connected, it follows that it is therefore contained within a level set of $\mu$.

The submanifold $Z_q$ is Lagrangian in $M\times (M/\mkern-6mu/G)^-$:

Given an $x\in \mu^{-1}(m)$, the tangent space $T_{x,q(x)}Z_q$ of $Z_q$ consists of vectors of the form $v\oplus Tq(v)$ where $v$ is a vector in $T_x\mu^{-1}(m)$. For any two vectors $v,w\in T_x\mu^{-1}(m)$, we have:

$\displaystyle \omega_x(v,w) - \omega'_{q(x)}(Tq(v),Tq(w)) = \omega_x(v,w) - \omega_{x}(v,w)=0.$

Therefore, $T_{x,q(x)}Z_q\subseteq T_{x,q(x)}Z_q^\perp$. Now, note that $q$ is a submersion, so every vector in $T_{(x,q(x))}(M\times (M/\mkern-6mu/G)^-)$ is of the form $v_0\oplus Tq(w_0)$ where $v_0\in T_xM$ and $w_0\in T_x\mu^{-1}(m)$. So assume that $v_0\oplus Tq(w_0)\in T_{x,q(x)}Z_q^\perp$. This means that for all $v\in T_x\mu^{-1}(m)$, we have:

$\displaystyle 0=\omega_x(v_0,v) - \omega'_{q(x)}(Tq(w_0),Tq(v)) = \omega_x(v_0-w_0,v).$

Since this is true for all $v\in T_x\mu^{-1}(m)$, it means that $v_0-w_0\in T_x\mu^{-1}(m)^\perp$. But we have already mentioned that this is $E_x$ which is in fact the kernel of $Tq$ acting on $T_x\mu^{-1}(m)$. But this means, firstly that $v_0 \in T_x\mu^{-1}(m)$ and secondly that $v_0\oplus Tq(w_0) = v_0\oplus Tq(v_0)$ which is in $T_{x,q(x)}Z_q$ after all. So, $T_{x,q(x)}Z_q\supseteq T_{x,q(x)}Z_q^\perp$ and hence it follows that $Z_q$ is Lagrangian.

For every $G$-invariant Lagrangian submanifold $Z_f\in \mathrm{Hom}^G(M,N)$ satisfying $Z_N\circ Z_f = Z_M$, we can construct a $Z'\in\mathrm{Hom}(M/\mkern-6mu/G,N)$ such that $Z_{M/\mkern-6mu/G} = Z_N\circ Z'$ and $Z_f = Z'\circ Z_q$:

Let $x\in M$ and $y\in N$ be such that $(x,y)\in Z_f$. Since $Z_f$ is $G$-invariant, $E_x\oplus 0_y$ is contained in $T_{(x,y)}Z_f$. And since $Z_f$ is Lagrangian, $T_{(x,y)}Z_f$ is contained in $E_x^\perp\oplus T_yN$ which, as we have argued above, is the same as $T_{(x,y)}(\mu^{-1}(m)\times N)$ for $m = \mu(x)$. So, every connected component of $\mathrm{pr}_M(Z_f)$ is contained in a level set of $\mu$. Distinct connected components may be contained in distinct level sets, but we choose the one containing $Z_M$. Note that there will always be an $x\in \mathrm{pr}_M(Z_f)$ such that $x\in Z_M$ since $Z_M = Z_N\circ Z_f$ by hypothesis, so this is always possible. Let $q:\mu^{-1}(m)\rightarrow \mu^{-1}(m)/G$ be the quotient map. Then we may set $Z' = \{(q(x),y)| (x,y)\in Z_f\}$. Firstly note that if this was indeed a generalised Lagrangian submanifold, it would satisfy $Z_{M/\mkern-6mu/G} = Z_N\circ Z'$ and $Z_f = Z'\circ Z_q$. Secondly, to see that it is indeed a generalised Lagrangian submanifold, consider $\overline Z_q := \{(q(x),x)|x\in\mu^{-1}(m)\}$, which is a Lagrangian submanifold since $Z_q$ is. Then composability implies that $Z' = Z_f\circ \overline Z_q$ is a generalised Lagrangian submanifold.

If $Z'' \in\mathrm{Hom}(M/\mkern-6mu/G,N)$ satisfies $Z_f = Z''\circ Z_q$, then $Z'' = Z'$:

Note that $\overline Z_q$ is a right inverse of $Z_q$, meaning that $Z_q\circ \overline Z_q = \Delta_{M/\mkern-6mu/G}$, the identity morphism on $M/\mkern-6mu/G$. That $Z_q$ has a right inverse implies that $Z_q$ is right-cancellable in $Z'\circ Z_q = Z_f = Z''\circ Z_q$. ■

What we have now achieved is a much cleaner formulation of the idea of symplectic reductions which is much easier to memorise. In fact, we have in the process ended up generalising things a little. We have already seen one way in which the categorical quotient version is a generalisation of symplectic reductions, as originally defined, at the end of §3. Since the Lagrangianness of submanifolds is a local property, we can drop the requirement that the group action be Hamiltonian. The other generalisation has to do with the fact that we are taking into account not only Lagrangian submanifolds but generalised Lagrangian submanifolds as well. This allows us to drop the requirement that $\mu^{-1}(m)$ is a submanifold of $M$. Thus, the following concluding exercise:

Exercise. Construct a symplectic manifold $(P,\alpha)$ with distinguished submanifold $Z_P$ and submanifolds $Z_1\subseteq M\times P^-$ and $Z_2\subset P\times (M/\mkern-6mu/G)^-$, which are Lagrangian in $M\times P^-$ and $P\times (M/\mkern-6mu/G)^-$ respectively and are compatible with all the distinguished generalised Lagrangian submanifolds, such that $Z_2 \circ Z_1 = \{(x,q(x))|x\in \mu^{-1}(m)\}$ holds at the level of set-theoretic relations whether or not $\mu^{-1}(m)$ is a submanifold. (Hint: Regularity of values in the image can be reformulated as a transversality condition.) □

Thanks to Murad Alim, Florian Beck, and Martin Vogrin for discussion and to Áron Szabo for help with the examples and inducting me into the upper echelons of the Church of Basis-Free Computations.

# Coffee Brecht

In the dark times
Will there also be coffee?
Yes, there will also be coffee.
Of the dark kind.

# On toothbrushes

Sylvia Plath:

ORR: Setting aside poetry for a moment, are there other things you would like to write, or that you have written?

PLATH: Well, I always was interested in prose. As a teenager, I published short stories. And I always wanted to write the long short story, I wanted to write a novel. Now that I have attained, shall I say, a respectable age, and have had experiences, I feel much more interested in prose, in the novel. I feel that in a novel, for example, you can get in toothbrushes and all the paraphernalia that one finds in daily life, and I find this more difficult in poetry. Poetry, I feel, is a tyrannical discipline, you’ve got to go so far, so fast, in such a small space that you’ve just got to turn away all the peripherals. And I miss them! I’m a woman, I like my little Lares and Penates, I like trivia, and I find that in a novel I can get more of life, perhaps not such intense life, but certainly more of life, and so I’ve become very interested in novel writing as a result.

ORR: This is almost a Dr. Johnson sort of view, isn’t it? What was it he said, “There are some things that are fit for inclusion in poetry and others which are not”?

PLATH: Well, of course, as a poet I would say pouf! I would say everything should be able to come into a poem, but I can’t put toothbrushes into a poem, I really can’t!

Richard Feynman:

What ceremonies do we believe in?
Every morning we brush our teeth.
What is the evidence
that brushing our teeth does any good
against cavities?

And you start wondering.
Are we all imagining that,
as the earth turns
and the orbit has an edge between light and dark,
that along that edge all the people
are doing the same ritual—
brush, brush, brush—
for no good reason?
Have you tried to picture
this perpetual line of toothbrushes going around the earth?

# ‹br›eath

I often wonder
if poets pause
at the end
of every line
to catch their

(excuse my asthma)

breath

for lung capacity is finite
and poetry endless
and we’d die if we didn’t
breathe
every now and then

but sometimes waking nightmares, alcohol and cock and endless balls distract you long enough that you just happen to forget that you are not any more immune to death

than anyone else

# Noumenology I

I.

Any attempt to discern a fundamental description of the world we inhabit from its phenomenology is greatly complicated by the fact that we indeed inhabit it.

On one hand, there is the obvious difficulty of observation. At the end of the day, an act of observation is just a subset of the universe interacting with other such subsets. Given that there is only so much room for manoeuvre allowed by the dynamical constraints we are subject to, it may well be the case that there exists a fundamental limit to how deeply we can probe the universe.

On the other hand, there is the issue of rebuilding our everyday experience bottom-up from the fundamental laws that we have identified. This may not be as straightforward as invoking the law of large numbers and all the nice things that happen in the thermodynamic limit, because certain aspects of our experiences, both everyday and otherwise, may have to do with the atypicality of our vantage points rather than some deep fact about Nature. This is what cosmologists and string theorists refer to as the anthropic principle: the conditions of this world are amenable to life because if they weren’t, we wouldn’t be there to observe anything. Divorced from any further context, this does come off as circular, but seen in the light of the claim that there is an entire ensemble of worlds out there, it turns out to be merely selection bias in action.

(Of course, depending on what “world” stands in for, this claim runs the gamut from perfectly innocuous facts such as the existence of other planets to approximately a million times more contentious proposals such as the Multiverse, which incidentally is contentious for about the same reason you might ask, “Why not a billion times more?”)

The first difficulty, that there is a discrepancy between what exists and what we can observe in principle, is pretty much the essence of the second law of thermodynamics. After all, if we had access to information about the velocities of every molecule in some gas kept in a bicameral container, we could exploit that to arrange for the faster molecules to end up on one side and the slower molecules on the other using a valve that we can open or close at will. In other words, we could set up a temperature differential without performing any work, in flagrant violation of the Clausius statement of the second law. And if we tried to set up an actual physical mechanism to acquire this information as Landauer did in his refinement of Maxwell’s original thought experiment, we could see that at the end of the cycle, when everything is returned to its initial state and the information acquired is erased, energy would have to be transferred to the environment as heat, thereby increasing the net entropy of the universe. So, save for a constant of proportionality, entropy in the sense of Clausius (energy dissipated via heat at unit temperature) really is entropy in the sense of Shannon (number of bits required to encode the state of the system). The bridge between these two very different conceptions of entropy was of course Boltzmann.

Thermodynamics, as Boltzmann saw it, was rooted in the specificity of the human experience. If we find it absurd that shattered teacups don’t spontaneously reassemble into intact ones because something something information, it is due to human bias privileging the intact teacup over any other specific arrangement of shards. So, if we truly wished to unmask the underlying principles behind thermodynamic truths, Boltzmann argued, we would need to adopt a bottom-up microscopic view of the world in which every configuration would be on the same footing as any other. Boltzmann was far ahead of his time and, in the posthumous immortality his ideas brought him, effected the foundation of not only an entire branch of physics but also deep connections thereof with disciplines that were yet to be in existence. But having given the man his due, I’ll still have to say that in excising humans from thermodynamics he threw out the baby with the bathwater and left it bereft of any role for an observer whatsoever. Consequently, the second difficulty that I had mentioned earlier remains unaddressed and attempts to understand why the second law is true woefully incomplete.

This post is my attempt to dig up the baby and bring it back to life.

II.

Before there can be life, there must be a body, and before there can be a body, there must be a universe. I stipulate that my universe consist of two components, $A$ and $B$. I will use the same notation to denote the respective (finite) sets of states available to these two components, which means that the set of states available to the composite is $A\times B$. In order for this universe to be a suitable laboratory for understanding how the second law can emerge from reversible deterministic dynamics, we need to equip it with a time evolution operator $\tau$ that permutes the elements of $A\times B$. If the time evolution operator $\tau$ cannot be expressed as the product of permutations on the individual sets, $A$ and $B$, the system will be said to be coupled. Since I will eventually be promoting one of these two components to the role of an observer, it is this case that I’m interested in.

The map $\tau$ induces natural equivalence relations on the sets $A$ and $B$. Namely, two states $a$ and $a'$ in $A$ are declared equivalent iff the (not necessarily bijective) maps $\pi_B\circ\tau(a,\cdot)$ and $\pi_B\circ\tau(a',\cdot)$ induced on $B$ via the projection $\pi_B:A\times B\rightarrow B$ are equal. Likewise, two states $b$ and $b'$ in $B$ are declared equivalent iff the maps $\pi_A\circ\tau(\cdot,b)$ and $\pi_A\circ\tau(\cdot,b')$ induced on $A$ via the projection $\pi_A:A\times B\rightarrow A$ are equal.

Now, some care needs to be taken so that the psychological arrow of time we are endowed with as outsiders doesn’t surreptitiously wend its way into our toy universe, at least not yet. As far as the system at hand is concerned, there is nothing intrinsic privileging forward time evolution $\tau$ over backward time evolution $\tau^{-1}$. Hence, in order to be more egalitarian in how we go about things, I introduce an entire family of equivalence relations, $\sim^A_{j}$ and $\sim^B_{j}$, indexed by integers $j$ and defined so that $a\sim^A_{j}a'$ iff $\pi_B\circ\tau^j(a,\cdot)= \pi_B\circ\tau^j(a',\cdot)$ and $b \sim^B_{j}b'$ iff $\pi_A\circ\tau^j(\cdot,b)= \pi_A\circ\tau^j(\cdot,b')$.

The equivalence classes to which the states $a\in A$ and $b\in B$ belong under the respective equivalence relations $\sim^A_{j}$ and $\sim^B_{j}$ shall be denoted $[a]_j$ and $[b]_j$. (I have refrained from including a superscript indicating whether it is $A$ or $B$ that the class is contained within because it is anyway evident from the representative elements $a$ and $b$.) In order to further cement my commitment to the three R’s of environmentalism, I will again be recycling notation, so that $[a]_j$ and $[b]_j$ also denote the maps $\pi_B\circ\tau^j(a,\cdot)$ and $\pi_A\circ\tau^j(\cdot,b)$ respectively. This makes sense since there is a one-to-one correspondence between all possible maps on $B$ and $A$ on one hand and the (possibly empty) equivalence classes that we partition $A$ and $B$ into using the relations $\sim^A_{j}$ and $\sim^B_{j}$ on the other. In particular, this means that it is perfectly sensible to talk of things like $[a]_j\left([b]_k\right)$, which denotes the image of the set $[b]_k\subseteq B$ under the map $\pi_B\circ\tau^j(a,\cdot):B\rightarrow B$ corresponding to the class $[a]_j$.

Given an assignment of the $a\in A$ and $b\in B$ to the $\sim^A_{j}$ and $\sim^B_{j}$ equivalence classes (prelabelled by maps on $B$ and $A$), we can unambiguously reconstruct the map $\tau^j$. However, an arbitrary assignment would not in general yield a bijective $\tau^j$ on $A\times B$. The requirement that it does therefore places certain constraints on the assignment. (There are also constraints arising from the fact that $\tau^j$ has to be the $j$-th power of some map, a property that an arbitrary map doesn’t necessarily have for $j\neq 1$, but I won’t be considering those here.)

The bijectivity of $\tau^j$ boils down to this: if $[a]_j(b) = [a]_j(b')$ for some $a\in A$ and distinct $b,b'\in B$, then $[b]_j(a)\neq [b']_j(a)$. Likewise, if $[b]_j(a) = [b]_j(a')$ for some distinct $a,a'\in A$ and $b\in B$, then $[a]_j(b)\neq [a']_j(b)$. This certainly suffices; however, since I am distrustful of anything that Wittgenstein wouldn’t regard as tautological, I would rather have things formulated in terms of equations rather than inequalities. To this end, instead of looking at the $\sim^A_{j}$ and $\sim^B_{j}$ partitions, I will be turning attention to the $\sim^A_{j}$ and $\sim^B_{-j}$ partitions, which satisfy the equations $|[a]_{j}^{-1}(b)|=|[b]_{-j}^{-1}(a)|$, for all $a\in A$ and all $b\in B$. Here, $[a]_j^{-1}(b)$ and $[b]_{-j}^{-1}(a)$ denote the preimages of $b$ and $a$ under the maps $[a]_j$ and $[b]_{-j}$ respectively.

Why is this true? The image of the set $\{a\} \times [a]_j^{-1}(b)$ under the map $\tau^j$ projects onto $b$ via $\pi_B$, which is pretty much the definition of $[a]_{j}^{-1}(b)$. The action of $\tau^{-j}$ on the image set $\tau^j\left(\{a\} \times [a]_j^{-1}(b)\right)$ gives us back the set $\{a\} \times [a]_j^{-1}(b)$ we started with, which, as you can see, projects onto $a$ via $\pi_A$. Thus, $\tau^j\left(\{a\} \times [a]_j^{-1}(b)\right)$, which has the same size as $[a]_{j}^{-1}(b)$, is contained within the set $[b]_{-j}^{-1}(a)\times \{b\}$, which has the same size as $[b]_{-j}^{-1}(a)$. In other words, we have $|[a]_{j}^{-1}(b)|\le|[b]_{-j}^{-1}(a)|$. A symmetric argument tells us that $|[a]_{j}^{-1}(b)|\ge|[b]_{-j}^{-1}(a)|$ as well, from which it follows that $|[a]_{j}^{-1}(b)|=|[b]_{-j}^{-1}(a)|$.
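Since everything here is finite, the identity can be checked by brute force. Below is a minimal sketch, assuming a toy universe with $|A|=3$ and $|B|=4$ and a time evolution drawn at random; the sizes, the seed, and all function names are arbitrary choices of mine.

```python
import random
from itertools import product

random.seed(1)
A = range(3)
B = range(4)
states = list(product(A, B))
images = states[:]
random.shuffle(images)
tau = dict(zip(states, images))            # a random permutation of A x B
tau_inv = {v: k for k, v in tau.items()}

def power(j):
    """tau^j as a dict, for any integer j (negative j uses tau^{-1})."""
    step = tau if j >= 0 else tau_inv
    result = {s: s for s in states}
    for _ in range(abs(j)):
        result = {s: step[result[s]] for s in states}
    return result

def class_a(j, a):
    """[a]_j viewed as the map b -> pi_B(tau^j(a, b)) on B."""
    t = power(j)
    return {b: t[(a, b)][1] for b in B}

def class_b(j, b):
    """[b]_j viewed as the map a -> pi_A(tau^j(a, b)) on A."""
    t = power(j)
    return {a: t[(a, b)][0] for a in A}

def preimage_size(mapping, target):
    return sum(1 for value in mapping.values() if value == target)

identity_holds = all(
    preimage_size(class_a(j, a), b) == preimage_size(class_b(-j, b), a)
    for j in (1, 2, 3) for a in A for b in B
)
```

Replacing `tau` by a map that is not a bijection (and `tau_inv` by anything at all) breaks the identity, which is the sense in which these equations encode reversibility.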

The above argument actually gives us an explicit bijection between the sets $[a]_j^{-1}(b)$ and $[b]_{-j}^{-1}(a)$ in terms of the map $\tau^j$. In fact, the converse also holds: given any bijection from $[a]_j^{-1}(b)$ to $[b]_{-j}^{-1}(a)$, we can use it to construct the restriction of $\tau^j$ to $\{a\}\times [a]_{j}^{-1}(b)$. And since all the $\{a\}\times [a]_{j}^{-1}(b)$ form a pairwise disjoint cover of the set $A\times B$, there are no compatibility issues to contend with: once you have ensured that $|[a]_{j}^{-1}(b)|=|[b]_{-j}^{-1}(a)|$ for all $a\in A$ and all $b\in B$, you can pick arbitrary bijections between them and get a sensible $\tau^j$ out. So, short of the group theoretic obstructions I have already mentioned above, the constraints so far are, in a sense, maximal.

With the essence of time-reversible dynamics thus distilled into pithy equations relating the $\sim^A_{j}$ and $\sim^B_{-j}$ partitions, we can do something nice. Suppose I had a function $g_j$ defined on the $\sim^A_j$ equivalence classes and wished to see how the average value of this function over a subset $A\times \{b\}\subseteq A\times B$, denoted $\langle g_j\rangle^b_{0}$, changed when the subset in question was acted upon by $\tau^{-j}$; then I could determine this by summing $g_j\left([a]_j\right):=g_j(a)$ over all states $a\in A$ but with the weights $|[b]_{-j}^{-1}(a)|$ instead of unity:

$\displaystyle \langle g_j\rangle^b_{-j}=\frac{1}{|A|} \sum_{a\in A} |[b]_{-j}^{-1}(a)|g_j(a).$

Since $|[a]_{j}^{-1}(b)|=|[b]_{-j}^{-1}(a)|$, the above can be rewritten as

$\displaystyle \langle g_j\rangle^b_{-j}=\frac{1}{|A|} \sum_{a\in A} |[a]_{j}^{-1}(b)|g_j(a).$

Now, if the $\sim_j^A$ equivalence classes are prelabelled by maps $f$ on $B$ and $|f|_j$ denotes the number of elements $a$ assigned to the $\sim_j^A$ equivalence class labelled $f$, then this can still be massaged further to yield

$\displaystyle \langle g_j\rangle^b_{-j}=\frac{1}{|A|} \sum_{f\in B^B} |f^{-1}(b)||f|_jg_j(f).$

In the above, I have taken the liberty of writing $g_j\left([a]_j\right)$ as $g_j(f)$ with the subscript $j$ in $g_j$ indicating that $f$ is to be interpreted as the label of a $\sim_j^A$ equivalence class. As a result, $\langle g_j\rangle^b_{-j}$ has been expressed entirely in terms of averaging over all the maps on $B$ without any reference to the partitioning of $B$ whatsoever. This is good news because in order to make contact with the blueprint laid out in the previous section, I’m going to be thinking of $B$ as an observer (and $[a]_1$ as the induced macrostates), which means it would be necessary to have everything only in terms of what $B$ has access to.
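The chain of rewritings above can be checked numerically. The following is only a sketch for $j=1$ under assumptions of my own choosing: $\tau$ is a random bijection, each map $f$ is stored as the tuple $(f(0), f(1), \ldots)$, and the sample class function `g` is the logarithm of the class size, though any function of the label would do. The names `label`, `average_direct` and `average_by_class` are hypothetical.

```python
import itertools
import math
import random
from collections import Counter

rng = random.Random(1)                      # fixed seed for reproducibility
A, B = range(6), range(3)                   # illustrative sizes only
states = list(itertools.product(A, B))
images = states[:]
rng.shuffle(images)
tau = dict(zip(states, images))             # a random bijection on A x B
tau_inv = {image: state for state, image in tau.items()}

def label(a):
    """The map f on B labelling the class of a, as the tuple (f(0), f(1), ...)."""
    return tuple(tau[(a, b1)][1] for b1 in B)

class_size = Counter(label(a) for a in A)   # |f|_1 for every occupied class

def g(f):
    """A sample class function: the log of the class size."""
    return math.log(class_size[f])

def average_direct(b):
    """Average g over tau^{-1}(A x {b}), visiting the states one at a time."""
    return sum(g(label(tau_inv[(a1, b)][0])) for a1 in A) / len(A)

def average_by_class(b):
    """The same average computed as (1/|A|) * sum_f |f^{-1}(b)| |f|_1 g(f)."""
    return sum(f.count(b) * n * g(f) for f, n in class_size.items()) / len(A)
```

Here `f.count(b)` plays the role of $|f^{-1}(b)|$, and classes with $|f|_1=0$ are simply absent from `class_size`, which is harmless because they contribute nothing to the sum.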

But before I can do that, I need to imbue $B$ with that attribute of qualia that is the key to understanding irreversibility: memory, or as my friend Ronak prefers to call it, “Funesness.”

III.

All creation is subtraction. Michelangelo chiselled away the marble to set his angels free, Shakespeare pruned the ramblings of monkeys raging away at typewriters to impart breath to Hamlet, Borges razed down aisles of the Library of Babel to whelp his cornucopia of immediate irrealities, and now it is our turn.

Funesness is no more an attribute of entire dynamical systems than artistic or literary merit is of blocks of marble or folios of gibberish. Rather, to speak of Funesness is to speak of individual states which fulfil certain conditions abstracting the essential features of human memory from messy complications like its ability to retrospectively fabricate experiences on cue. One man’s meat is another’s bare bones, so there is little I can offer by way of an a priori argument for why I think it is the following two features that we must strive to capture and make precise: (a) a Funes state of an observer exhibits a certain degree of (anti-)concurrence with the immediately preceding input, and (b) the concurrence is higher when the input in question has a lower probability of occurrence. But if I had to try, I would say that (a) has to do with the fact that we remember the past and not the future, while (b) has to do with the fact that we recall bizarre and surprising events with greater ease than run-of-the-mill ones.

Condition (a) may be implemented by requiring that the marginal probability that the immediately preceding input is $f$, denoted $p(f)$, generally differs from the conditional probability that the immediately preceding input is $f$ given that the current state of the observer is $b$, denoted $p(f|b)$. Meanwhile, condition (b) may be implemented by requiring that $p(f|b)/p(f) \leq p(f'|b)/p(f')$ whenever $p(f) \geq p(f')$. Note that $p(f|b)/p(f)$ may be regarded as a measure of how dependent two events $f$ and $b$ are on each other.

As my notational choices may suggest, I am interpreting the $\sim_1^A$ equivalence classes as the inputs and $B$ as the observer. This makes sense because as long as it is only the immediately preceding state of $A$ that is concerned, all $B$ is sensitive to is the $\sim_1^A$ equivalence class that the state is in. The sample space implicit in the assignment of probabilities is the set of consecutive pairs of states, which is a subset of $(A\times B)^2$. So, $p(f)$ is $|f|_1/|A|$, $p(b)$ is $1/|B|$, and as we found out in the previous section, $p(f|b)$ is $|f^{-1}(b)||f|_1/|A|$. Condition (a) becomes the requirement that $|f^{-1}(b)|$ generally differs from unity while condition (b) becomes the requirement that $|f^{-1}(b)|\leq |f'^{-1}(b)|$ whenever $|f|_1 \geq |f'|_1$.

To investigate the consequences of these requirements, let’s begin with the observation that

$\displaystyle \sum_{f\in B^B} |f^{-1}(b)||f|_1=\sum_{f\in B^B} |f|_1=|A|.$

This follows from the fact that both $p(f)$ and $p(f|b)$ yield unity upon being summed over all $f\in B^B$. A little rearranging yields

$\displaystyle \sum_{f\in B^B} \left(|f^{-1}(b)|-1\right)|f|_1=0.$

It may well be that all the summands in this sum individually vanish, but that’s the boring case, which we have already taken care to exclude from our definition of Funesness. In general, there will be summands that are negative, summands that vanish, and summands that are positive. Let $B^B_-$ be the set of all $f\in B^B$ for which $|f^{-1}(b)|<1$, let $B^B_+$ be the set of all $f\in B^B$ for which $|f^{-1}(b)|>1$, and let $K\geq 0$ be given by

$\displaystyle K=\sum_{f\in B^B_-} \left(1-|f^{-1}(b)|\right)|f|_1=\sum_{f\in B^B_+} \left(|f^{-1}(b)|-1\right)|f|_1.$

Furthermore, let’s choose $f_-\in B^B_-$ and $f_+\in B^B_+$ so that $|f_-|_1$ is the minimum of $|f|_1$ over all $f\in B^B_-$ and $|f_+|_1$ is the maximum of $|f|_1$ over all $f\in B^B_+$. Therefore, as the logarithm is a monotonically increasing function of its (positive) argument, it follows that

$\displaystyle \sum_{f\in B^B_-} \left(1-|f^{-1}(b)|\right)|f|_1\log|f|_1\geq K\log|f_-|_1,$
$\displaystyle \sum_{f\in B^B_+} \left(|f^{-1}(b)|-1\right)|f|_1\log|f|_1\leq K\log|f_+|_1.$

Now, by definition of a Funes state, $|f_-^{-1}(b)|<1<|f_+^{-1}(b)|$ implies that $|f_-|_1> |f_+|_1$. Therefore, the two inequalities above may be combined to yield

$\displaystyle \sum_{f\in B^B} \left(|f^{-1}(b)|-1\right)|f|_1\log|f|_1\leq K\log\frac{|f_+|_1}{|f_-|_1}<0.$

This may be restated as $\langle S_1\rangle^b_{0}>\langle S_1\rangle^b_{-1}$, where $S_1$ is the function that sends $f\in B^B$ to $\log|f|_1$. There is simply not enough structure in our toy universe to admit meaningful discussion of things such as the subjective conscious experience we are accustomed to, but as long as the cognitive architecture enabling it is sculpted out of Funes states, the macrostates that the resulting cognitive model is privy to will be precisely the $\sim_1^A$ equivalence classes. Per Boltzmann’s prescription, the logarithm of the size of a macrostate is the physical entropy associated with a particular (micro)state within the aforementioned macrostate. In other words, given that the current state of the observer is Funes, the entropy of the environment must have increased on average.
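To see the inequality at work, here is a small hand-built example, offered as a sketch rather than anything canonical. It assumes $|A|=4$ and $|B|=3$, records only the label $f$ of each $a\in A$ as the tuple $(f(0), f(1), f(2))$, and restricts the Funes conditions to occupied classes (those with $|f|_1>0$), which is my reading of the definition. The function names are hypothetical.

```python
import math
from collections import Counter

def is_balanced(labels, B):
    """Each b must be hit exactly |A| times, else no bijection tau realises the labels."""
    hits = Counter(value for f in labels for value in f)
    return all(hits[b] == len(labels) for b in B)

def is_funes(labels, b):
    """Check conditions (a) and (b) for the state b over the occupied classes."""
    sizes = Counter(labels)                           # |f|_1 per occupied class
    cond_a = any(f.count(b) != 1 for f in sizes)      # |f^{-1}(b)| differs from unity
    cond_b = all(f.count(b) <= f2.count(b)            # bigger class, smaller fibre
                 for f, n in sizes.items()
                 for f2, n2 in sizes.items() if n >= n2)
    return cond_a and cond_b

def mean_entropy(labels, b, before):
    """<S_1>^b_{-1} if `before` is True, else <S_1>^b_0, with S_1(f) = log |f|_1."""
    sizes = Counter(labels)
    total = sum((f.count(b) if before else 1) * n * math.log(n)
                for f, n in sizes.items())
    return total / len(labels)

B = range(3)
# Two singleton classes that send two states of B to 0, and one class of
# size two that never hits 0; labels[a] is the map f attached to a in A.
labels = [(0, 0, 1), (0, 1, 0), (1, 2, 2), (1, 2, 2)]

entropy_before = mean_entropy(labels, 0, before=True)
entropy_now = mean_entropy(labels, 0, before=False)
```

In this example the state $b=0$ is Funes, the average entropy one step earlier is $0$, and the current average is $\tfrac{1}{2}\log 2$, in line with $\langle S_1\rangle^b_{0}>\langle S_1\rangle^b_{-1}$.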

IV.

The second law of thermodynamics enjoys a peculiar sort of epistemic privilege over the rest of the physical laws. There is overwhelming consensus that there is no fundamental mechanism behind the second law, unlike the case of conservation laws, for instance. Yet, it is the only thing that we are really, truly, absolutely certain of; even conservation laws are known to fail in the presence of anomalies. It is tempting to chalk this uncharacteristic certainty up to our characteristic cynicism (the Universe may be lovely, dark and deep, but it still won’t let us get away with free lunches), but if there is one thing that I would like you to take away from this post, it is that the second law should be thought of in the same vein as Descartes’ dictum, “I think, therefore I am.” You may espouse radical skepticism and begin doubting the existence of everything around you, but you cannot doubt that of your own, because if you didn’t exist, you wouldn’t be there to doubt the existence of anything. Likewise, you might doubt everything we know about physics, but you can’t doubt the second law of thermodynamics because if it weren’t true, you wouldn’t have the perception of time required to even formulate your doubt.

This idea, that our perception of time and increase in entropy are intimately connected, is not new; indeed, it goes back to Boltzmann himself. At the time, the Big Bang and cosmological expansion were unheard of, and so as far as Boltzmann was aware, the Universe had been around forever. It followed that everything that could happen would have already happened and the net entropy of the Universe would have been already maximised, a conclusion that seemed inconsistent with the fact that we see entropy increasing all the time. Boltzmann resolved this by invoking an anthropic argument: since the Universe has been there forever, even highly improbable statistical fluctuations eventually occur in some corner causing a momentary decrease in entropy there, and since our very existence relies upon the progression from order to disorder, we could only survive during these fluctuations. (Of course, people in the opening leg of the fluctuation would perceive time running backward, while those in the closing leg would perceive it running forward.)

At this point, it may seem that my critique about how Boltzmann didn’t care about observers was a tad misplaced, but hear me out. As several physicists from the post-Hubble period pointed out, if a local entropy fluctuation was improbable, a global one would be even more so. The former is all that is requisite for our existence, so there is no anthropic justification for why all of the observable universe seems to have fantastically low entropy. Of course, cosmologists today seek recourse in the peculiarity of the initial conditions of the Universe, but I disagree that this plays an important role in the resolution of this paradox. What does play a role, in my opinion, is the fact that the partition of the Universe into macrostates upon which the entropy crucially depends is not God-given but determined by the way we are coupled to the rest of the Universe. In other words, the second law of thermodynamics appears to hold throughout the observable universe because we are simply incapable of coupling to an observable with respect to which the net entropy decreases.

Or are we? Just as the Funes conditions concerned coincidences between states of $B$ and the $\sim_1^A$ equivalence classes to which the immediately preceding states of $A$ belonged, we may talk of “Senuf” conditions which concern coincidences between states of $B$ and the $\sim_{-1}^A$ equivalence classes to which the immediately following states of $A$ belong (my sincere apologies to Segrob). As I’ve mentioned above, it’s not possible to talk of the relationship between a mind and its mental states without adding any further structure (and possibly upgrading to a nondiscrete model since mental processes typically seem to involve an interplay of various relaxation timescales), yet there is a rough sense in which we might say that the Funes states “perceive” time as flowing in the same direction as that we have arbitrarily assigned to our toy universe, while Senuf states “perceive” it flowing in the opposite direction. But there isn’t anything precluding a state of $B$ from being both Funes and Senuf at the same time! So, why does being able to remember both the past as well as the future seem so patently absurd?

There are two answers that I can think of. The boring one is that the Funes and Senuf conditions are pretty restrictive, so while one or the other may be explained away using an anthropic argument, both being satisfied together is statistically unlikely despite selection bias. The more interesting answer has to do with the limitations of our imagination when it comes to matters of consciousness and requires a slight digression first.

Recurrent epileptic seizures, caused by abnormal neural activity wreaking havoc through the brain, can be so incredibly disruptive to normal day-to-day functioning that those afflicted thus are actually willing to have pieces of their brain cut out so that the seizures may be contained within the region of origin. And indeed, as a last resort after pharmacological interventions have proved fruitless, doctors turn to corpus callosotomy, partial or complete removal of the corpus callosum, as a solution. The corpus callosum is a bundle of neural fibres that constitute the only connection between the two hemispheres of the brain. Its removal keeps seizures contained within a hemisphere but also inhibits communication between the two hemispheres. As a result, anyone undergoing this procedure is rendered split-brain.

Functional responsibilities aren’t distributed homogeneously across the two hemispheres. The right hemisphere controls the left side of the body and is dominant in tasks involving spatial reasoning among other things, while the left hemisphere controls the right side of the body and is dominant in tasks involving verbal reasoning among other things. So, if a split-brain subject were to be seated in front of a screen and the image of an object flashed on its left half so that only the right hemisphere had access to the visual stimulus, then the subject would be able to make an illustration of the object with their left hand but still (truthfully) say that they observed nothing. (This is an actual experiment, by the way.) Hence, it’s not only the brain that is split, but the very sense of self as well.

The neural activity underlying the divided subjective conscious experience is present in a person with an intact corpus callosum as well, but as a result of the communication between the two hemispheres it permits, their brain is able to fashion these separate selves into a coherent whole. I won’t say that this means that the self is an illusion since it is exactly as real as everything else we experience, but we’ll have to admit that the brain does a hell of a lot of editing behind the scenes.

The trajectory of Funes mental processes in our brains may possibly intersect that of a Senuf mental process which may even be acorporeal as far as we can tell, but our brain would simply work overtime to keep the usual show running. So I guess what I am trying to say is that you might be sharing your brain with a Benjamin Button at this very moment and yet have absolutely no inkling of it whatsoever.

Thanks to Sushrut Thorat, Sankeerth Rao and Avradeep Bhowmik for help with resources, and to Ronak M. Soni for discussion. This acknowledgment, however, does not necessarily mean that Ronak agrees with everything above.

# Either/or

I.

As long as the nights teeter into dawn,
And embers into dazzling cities,
The poets will walk into the flaming valley
And burn for a thousand years,
Wafting their pregnant worlds across the summer stillness.

And as long as the days dissolve into dusk,
And cities into sputtering embers,
They will behead their echo by the ocean
And let it bleed for a thousand years
So that the ghost may tingle against the frigid face of God.

In all these years of burning and bleeding,
And all these memories welling and receding,
Did you wonder whether you remembered it all backwards

Before you remembered nothing at all?

II.

You burn, you bleed, you drown, reprise,
Each new iteration of hellfire and ice
Swings the world in an ever-widening arc:
The summers grow bright and the winters dark.

Euphoria and desolation, benedictions and lies.

Is it the world that is brimming over and over,
Or is it you shrinking until you are a needle
Precariously perched upon the immediacy of reality?

What would it take for you to tumble away?

III.

You do not know, you do not know,
You with your whimpering beast in tow,
As you crawl into the labyrinth of faces—

The faces that indulge you, so you think at least,
And a moment of truth locked in stasis.

You imagine me waiting in the sanctum sanctorum,
You imagine me waiting for you with an extended thumb
Amid the pages of my inventory of names
Of every beast that was and every beast to come.

With a name, I will set you free,
With a name, I will tip you into the void,
And watch your crisis of being
Combust itself into inanimate meat
And gravitate into the heat death of the Universe.

Are you terrified that I will wait for you?
Or are you terrified that I will not?