Every day, the game refreshes with a new five-letter target English word. You then get six tries to guess the word. With each guess, the game tells you if each letter is correct (green), if it appears in the target word but you have it in the wrong place (yellow), or if it does not appear at all in the target word (gray). For instance, here’s a possible sequence of guesses for one of the recent target words, `ABBEY`:

In [5]:

```
WordlePlot[{"STARE", "CLEAN", "FADED", "ABBEY"}, "ABBEY"]
```

Out[5]:

The first guess, `STARE`, has so far been one of my favorites, since right away you learn about five of the most frequently occurring letters. But while we’re on the topic, what are the most frequent letters for five-letter words? In principle this could be different from the distribution for all English words. Let’s see:

In [8]:

```
words = ToUpperCase /@ Select[WordList["KnownWords"],
StringLength[#] == 5 && AllTrue[Characters[#], x |-> MemberQ[CharacterRange["a", "z"], x]] &];
chardata = Association[Rule @@@ ReverseSortBy[Tally@Flatten@Characters@words, Last]];
BarChart[Labeled[#2,
Style[#1, FontSize -> 24, FontFamily -> "Clear Sans"], Before] & @@@
Reverse@Take[Normal@chardata, 15], BarSpacing -> 0,
BarOrigin -> Left, Axes -> False, ImageSize -> 600]
```

Out[8]:

So, using `STARE` we do hit the three most common letters (A, E, and R), but S and T are not as prevalent as I thought. Could we use this data to find a better starting word? For starters, we could form a very simple heuristic: for each word, add the frequency counts for each of its distinct letters, and let that be a score. By this metric, here are the top 15 starter words:
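The heuristic is easy to reproduce outside of Mathematica. Here's a minimal Python sketch of the same idea, using a tiny stand-in corpus (the post itself uses all five-letter words from `WordList["KnownWords"]`):

```python
from collections import Counter

# A tiny stand-in corpus; the full analysis uses every five-letter known word.
words = ["STARE", "ORATE", "ABBEY", "CLEAN", "RAISE", "IRATE"]

# Frequency of each letter across the whole corpus.
freq = Counter(c for w in words for c in w)

def score(word):
    # Sum the corpus frequency of each *distinct* letter in the word.
    return sum(freq[c] for c in set(word))

ranked = sorted(words, key=score, reverse=True)
```

Counting only distinct letters is what rewards guesses like `STARE` over words with repeated letters.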

In [11]:

```
score[word_] := Total[chardata /@ Union[Characters[word]]]
scores = {#, score[#]} & /@ words;
TableForm[
Style[First[#] <> " (" <> ToString[Last[#]] <> ")",
FontSize -> 24, FontFamily -> "Courier New"] & /@
Take[ReverseSortBy[scores, Last], 15],
TableHeadings -> {Style[#, FontSize -> 24,
FontFamily -> "Courier New"] & /@ Range[15], {}}]
```

Out[11]:

`STARE` is down at #13 (though really #6 if we count groups of anagrams as one). Maybe I’ll start using `ORATE`. But of course, this is just a heuristic. Can we do better with the starting word, or more importantly, develop a strategy for choosing a word at any stage of Wordle?

As with any strategy, we have to decide what we’re optimizing. We could, for instance, aim to minimize the expected number of guesses it takes to get to the target word. This would involve playing with probabilities and occasionally taking measured risks. For instance, maybe such a strategy would boost the chances of guessing the target word using an impressive three guesses, but at the cost of risking failure even after six guesses in some cases.

I’ll take a different approach, that of a conservative player: we simply want to minimize the chance of failure. If we average five or six guesses, so be it, as long as we reliably get the morning rush of discovering the target word. Another way to think of this is playing as if we’re confronting an adversary, rather than a random word. Imagine there’s a person on the other end of Wordle, watching our strategy, and carefully choosing a target word to foil us. Then risk-taking as described above is certain folly. Instead, we should plan for the worst case scenario.

First things first, we’ll need a way to select only the words that are consistent with the constraints Wordle has given us thus far. When playing the game as a human, it’s typical to interpret the colors to build up a list of more abstract constraints: *“okay, so there’s at least one B, it can’t be in the fourth position, …”*. But for a computer, it’s much easier to just keep track of what colors the game has given us, and restrict the possibilities to words that would give those same colors on all our guesses if they were the target words. For instance, after the first two guesses above, we would have the data

In [13]:

```
guessData[guesses_, target_] := AssociationMap[w |-> tag[w, target], guesses];
gd = guessData[{"STARE", "CLEAN"}, "ABBEY"]
```

Out[13]:

<|STARE -> {-1, -1, 0, -1, 0}, CLEAN -> {-1, -1, 0, 0, -1}|>

The function `tag` is defined in the downloaded notebook, and represents a green letter with a 1, a yellow letter with a 0, and a gray letter with a -1. We could then retrieve a list of candidate target words by

In [15]:

```
candidates[guessData_, words_] := Select[words, candidate |-> AllTrue[Keys @ guessData, tag[#, candidate] == guessData[#] &]]
candidates[gd, words]
```

Out[15]:

{ABBEY, ADIEU, AIDED, BAKED, DAZED, FADED, FAMED, FAZED, FOVEA, GAMEY, GEMMA, HOVEA, JADED, JAWED, MAMEY, MAZED, PAVED, WAXED}
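For readers without Mathematica, here's a Python sketch of the same two ingredients: a `tag` function that reproduces the coloring (I'm assuming the standard Wordle rules for repeated letters, with greens claimed before yellows), and a `candidates` filter that keeps only words consistent with every observed coloring:

```python
from collections import Counter

def tag(guess, target):
    # 1 = green, 0 = yellow, -1 = gray.
    colors = [-1] * len(guess)
    unmatched = Counter()
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            colors[i] = 1        # greens are claimed first
        else:
            unmatched[t] += 1    # leftover target letters can earn yellows
    for i, g in enumerate(guess):
        if colors[i] != 1 and unmatched[g] > 0:
            colors[i] = 0
            unmatched[g] -= 1
    return colors

def candidates(guess_data, words):
    # Keep words that would reproduce every observed coloring.
    return [w for w in words
            if all(tag(g, w) == colors for g, colors in guess_data.items())]
```

Running `candidates` with the colorings from the `STARE`/`CLEAN` example above reproduces the same filtering as the Mathematica version.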

Now we can start asking some concrete questions. Any starting word will reduce the pool of candidates for any target word. In some cases we could get lucky and the reduction is dramatic (for instance, if we were to guess the target word right away). But remember, we’re imagining that we play against an adversary. This means we should pick a starting word and then put ourselves in the shoes of the adversary, and pick a target word that maximizes the number of remaining candidates. That is, for any given word `start`, the adversary computes:

In [7]:

```
maxRemaining[start_, words_] :=
Last@SortBy[Table[{tup, Length[candidates[<|start -> tup|>, words]]}, {tup, Tuples[{-1, 0, 1}, 5]}], Last]
```

Note we are only maximizing over the possible colors, not the target words themselves, since two target words that give the same colors are equivalent for our purposes. Let’s see what we get for the two starting words we were looking at earlier:
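Equivalently, since only the color pattern matters, the adversary's computation can be phrased as grouping all target words by the pattern they induce and keeping the largest group. A Python sketch of that idea (with `tag` repeated here so the block runs standalone):

```python
from collections import Counter

def tag(guess, target):
    # 1 = green, 0 = yellow, -1 = gray; greens claimed before yellows.
    colors = [-1] * len(guess)
    unmatched = Counter()
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            colors[i] = 1
        else:
            unmatched[t] += 1
    for i, g in enumerate(guess):
        if colors[i] != 1 and unmatched[g] > 0:
            colors[i] = 0
            unmatched[g] -= 1
    return tuple(colors)

def max_remaining(start, words):
    # Group targets by the color pattern they induce; the adversary
    # picks the pattern with the most target words behind it.
    patterns = Counter(tag(start, w) for w in words)
    return max(patterns.items(), key=lambda kv: kv[1])
```

Grouping by pattern visits each word once, rather than testing all \(3^5\) color tuples against the whole list, so it is also the faster way to phrase the search.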

In [17]:

```
Length[words]
maxRemaining["STARE", words]
maxRemaining["ORATE", words]
```

Out[17]:

4198

{{-1, -1, -1, -1, -1}, 419}

{{-1, -1, 0, -1, -1}, 338}

So indeed, `ORATE` is a better starting word than `STARE`, in terms of the worst-case reduction in the number of candidate words they leave behind. Both of them reduce the pool of words by a factor of at least 10, which is a surprise to me. Also, it may come as a surprise that when the player starts with `ORATE`, it’s in the adversary’s best interest to pick a target word that contains an A. What happens if the target word doesn’t have an A? Then we have

In [20]:

```
Length[candidates[<| "ORATE" -> {-1, -1, -1, -1, -1} |>, words]]
```

Out[20]:

338

Ah, so in fact it was a perfect tie!

On this note, it’s hard to imagine the adversary could ever benefit by giving us a green letter. Let’s just make that assumption, because it speeds up this process considerably:

In [10]:

```
maxRemaining[start_, words_] :=
Last@SortBy[Table[{tup, Length[candidates[<|start -> tup|>, words]]}, {tup, Tuples[{-1, 0}, 5]}], Last]
```

Now we can afford to check several more words. Let’s look at the top 30 words we found using our letter-frequency heuristic, and see how they do according to this new metric:

In [24]:

```
contestants = First /@ Take[ReverseSortBy[scores, Last], 30];
ranking[cs_, words_] := With[{data = SortBy[
Table[With[{mr = maxRemaining[w, words]}, {WordlePlot[w,
First@candidates[<|w -> mr[[1]]|>, words]],
Style[mr[[2]], FontSize -> 36,
FontFamily -> "Clear Sans"]}], {w, cs}], Last]},
TableForm[data[[;;5]], TableSpacing -> {5, 10},
TableHeadings -> {(Style["#" <> ToString@#, 24,
FontFamily -> "Clear Sans"] & /@ Range[5]), None}]
];
ranking[contestants, words]
```

Out[24]:

It seems like `RAISE` is a significant cut above the rest, reducing the worst-case candidate pool by 42 relative to the next-best starting word, `IRATE`. Another trend that jumps out is that it’s occasionally beneficial for the adversary to give a yellow A, but otherwise the adversary will choose to make all letters gray. So, to be sure we’re not missing any potentially strong starting words, let’s sort all of the words in order of how much they reduce the pool when all the letters are gray, and then score those:

In [28]:

```
nonOverlapping[w1_, w2_] :=
Length[Intersection[Characters[w1], Characters[w2]]] == 0;
grayScore[w_, words_] := Length[Select[words, nonOverlapping[#, w] &]];
newContestants = SortBy[words, grayScore[#, words] &][[;; 30]];
newContestants[[;;5]]
ranking[newContestants, words]
```

Out[28]:

Good, so we haven’t missed anything. `ALOES` does well if all the letters are gray, but in that case the adversary can actually do much better by making the A yellow, so it doesn’t make our worst-case scenario ranking.
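The all-gray score used here is just a disjointness count; a quick Python sketch of the same check:

```python
def gray_score(word, words):
    # Candidates left if every letter of `word` comes up gray:
    # exactly the words sharing no letter with it.
    letters = set(word)
    return sum(1 for w in words if not (letters & set(w)))
```

This is the cheap pre-filter: it ranks words by their worst case *assuming* the adversary answers all gray, and the full `maxRemaining` check is then only run on the top of that list.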

Great, so we have our starting word, `RAISE`. Once we play it, we know we’ll be down to no more than 286 candidate words. Where do we go from here?

In principle, we’re allowed to choose any of the 4000+ five-letter words as our next guess. In some cases, though, it would be ridiculous to guess a word that’s not among the candidate target words. This is definitely true if all the letters from the starting word came up gray: once a letter comes up gray, we know it will come up gray in any other word, so that’s a waste of a letter. For green letters, though, the situation is reversed: if we play that same letter in that same position again, we know it will come up green, so why bother, unless only one candidate word remains and we’re ready to finish the game?

To avoid these puzzles, let’s play Wordle as if we’re in “hard mode”. This is a setting that requires you to use what you’ve learned from previous guesses. So, if a letter comes up gray, we can’t use it in the next guess; but if a letter comes up green, we must keep that letter in place in all future guesses. It’s unclear to me whether this is actually “hard”, in the sense that it makes it more difficult to reliably guess the word in six tries, but it does make the search space smaller, so we’ll play by this rule.

So, for starters, let’s imagine we played `RAISE` – our best move – and the adversary makes all the letters gray, her best move. Touché, worthy adversary. We should then repeat the same analysis: which of the remaining candidate words minimizes the maximum possible number of words left over after we guess it?

We’ll use the same trick as before: sort by the number that would remain if our second guess also comes up all gray, and then check for adjustments if we include yellows.

In [32]:

```
raiseWords = candidates[<| "RAISE" -> {-1, -1, -1, -1, -1} |>, words];
raiseContestants = SortBy[raiseWords, grayScore[#, raiseWords] &][[;; 30]];
ranking[raiseContestants, raiseWords]
```

Out[32]:

Perhaps shockingly, by playing the ideal word `HOTLY`, we have again reduced the number of possibilities by more than a factor of 10!

Also, it shouldn’t be surprising here that the adversary benefits by giving us some yellows. We’re assuming that `RAISE` came up all gray, so the target word must contain an O or a U (or at least a Y), for starters.

Let’s play `HOTLY`, and again let the adversary make her best reply, a yellow O and L. Then there are only 15 remaining possibilities, so we might as well just rank them all (with a slight adjustment to the ranking function to now look at adversary words directly rather than color assignments):

In [35]:

```
raiseHotlyWords =
candidates[<|"HOTLY" -> {-1, 0, -1, 0, -1}|>, raiseWords];
ranking[words_] := With[{data = SortBy[
Table[
With[{mr =
Last@SortBy[
Table[{w2,
Length[candidates[<|w -> tag[w, w2]|>, words]]}, {w2,
words}], Last]}, {WordlePlot[w,
mr[[1]]],
Style[mr[[2]], FontSize -> 36,
FontFamily -> "Clear Sans"]}], {w, words}], Last]},
TableForm[data[[;; 4]], TableSpacing -> {5, 10},
TableHeadings -> {(Style["#" <> ToString@#, 24,
FontFamily -> "Clear Sans"] & /@ Range[4]), None}]
];
ranking[raiseHotlyWords]
```

Out[35]:

Great, so we play `BLOCK`:

In [37]:

```
raiseHotlyBlockWords =
candidates[<|"BLOCK" -> {1, 1, 1, -1, -1}|>, raiseHotlyWords];
ranking[raiseHotlyBlockWords]
```

Out[37]:

Now our best move is to play either `BLOOD` or `BLOND`, and in either case only one word remains, namely `BLOWN`. So in other words, here is the perfect game of Wordle, where the player always minimizes the number of remaining words, and the adversary always maximizes this number:

In [38]:

```
WordlePlot[{"RAISE", "HOTLY", "BLOCK", "BLOOD", "BLOWN"}, "BLOWN"]
```

Out[38]:

That’s five guesses, so we win!

This probably means that Wordle is always solvable in five tries. I say probably because, technically, our analysis here wasn’t complete. We only allowed the player and adversary to think “one move ahead”. Perhaps the adversary could do even a bit better by letting the set of possible words shrink more than it needs to, but in a way that makes it more difficult for the player to shrink it further on the next turn. I somewhat doubt this, but we can’t rule out the possibility. However, it seems quite far-fetched for such gamesmanship on the part of the adversary to increase the number of turns all the way to seven. It looks like Wordle is always solvable in six tries, and maybe even five.

The first few Catalan numbers are \(1, 1, 2, 5, 14, 42, \ldots\). They are sequence A000108 in the Online Encyclopedia of Integer Sequences, and that very low call number might be another indication of their importance.

To give just one of the \(\ge 66\) interpretations of these numbers, consider the problem of balanced parentheses. We would like to know, given \(n\) pairs of parentheses, how many ways they can be arranged to give a balanced expression. When \(n = 1\), there is only one way, \(()\). For \(n = 2\), we have two choices, \(()()\) and \((())\). For \(n = 3\), five choices:

\[((())), (())(), ()(()), (()()), ()()()\]And for \(n = 4\), there are fourteen choices:

In [2]:

```
balanced[n_] := Select[Permutations[Flatten[Table[{1, -1}, n]]],
x |-> AllTrue[Accumulate[x], # >= 0 &]];
StringJoin @@@ (balanced[4] /. {1 -> "(", -1 -> ")"})
Length[%]
```

Out[2]:

{()()()(), ()()(()), ()(())(), ()(()()), ()((())), (())()(), (())(()), (()())(), (()()()), (()(())), ((()))(), ((())()), ((()())), (((())))}

14

Note the approach we took in this last case: we think of “(” as +1 and “)” as -1, and then make sure that as we read through the expression, we never dip below 0; that is, we never close a parenthesis before it is opened. This is precisely what it means to have a balanced expression. We can use this idea to give a graphical interpretation to balanced parentheses, by plotting the partial sums that are used in the code snippet above. Here are those plots for \(n = 3\):

In [9]:

```
balancedPlot[xs_] :=
ListLinePlot[Prepend[Accumulate[xs], 0],
PlotLabel ->
MaTeX[StringJoin[xs /. {1 -> "(", -1 -> ")"}],
Magnification -> 1.5], PlotStyle -> Directive[Red, Thick],
Axes -> False, Frame -> True, FrameStyle -> Black,
PlotRange -> {{1, Length[xs] + 1}, {0, Length[xs]/2}},
FrameTicks -> None,
GridLines -> {Range[Length[xs]], Range[Length[xs]]},
AspectRatio -> 1/2];
GraphicsRow[balancedPlot /@ balanced[3]]
```

Out[9]:
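The ±1 encoding also makes a Python version of the `balanced` enumeration straightforward; a minimal sketch mirroring the Mathematica approach:

```python
from itertools import permutations

def balanced(n):
    # Encode "(" as +1 and ")" as -1; keep sequences whose running
    # sums never dip below zero, i.e. the balanced expressions.
    good = []
    for seq in sorted(set(permutations([1, -1] * n))):
        running = 0
        for v in seq:
            running += v
            if running < 0:
                break
        else:
            good.append("".join("(" if v == 1 else ")" for v in seq))
    return good
```

As with the Mathematica version, generating all distinct permutations and filtering is wasteful but perfectly adequate for small \(n\).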

Let’s prove that these counts are given by the Catalan numbers. Let \(a_n\) be the number of balanced expressions of \(n\) pairs of parentheses. To write down a balanced expression, we always start by writing down a left parenthesis. Eventually that left parenthesis has to be closed, and then we can add more parentheses afterwards. So a balanced expression of \(n\) pairs has the form \((A)B\), where \(A\) is a balanced expression of some \(m\) pairs and \(B\) is a balanced expression of the remaining \(n - m - 1\) pairs.

This leads to the recursion

\[a_n = \sum_{m=0}^{n-1} a_m a_{n-m-1},\qquad a_0 = 1.\]To solve the recursion, we can use a generating function. Let

\[f(x) = \sum_{n=0}^\infty a_n x^n.\]Then, formally,

\[f^2 = \sum_{n=0}^\infty \left(\sum_{m=0}^n a_m a_{n-m}\right) x^n.\]Comparing with the recursion relation above, we have

\[f = 1 + xf^2,\]and solving this gives

\[f = \frac{1 - \sqrt{1 - 4x}}{2x}.\]We can check quickly that we have the correct generating function:

In [11]:

```
f[x_] := (1 - Sqrt[1 - 4 x])/(2 x);
Series[f[x], {x, 0, 5}]
```

Out[11]:

Better yet, we can use the generating function to find an explicit form for \(a_n\). We have

\[\sqrt{1-4x} = (1-4x)^{1/2} = \sum_{n=0}^\infty (-4)^n \binom{1/2}{n} x^n = 1 - \sum_{n=1}^\infty \frac{2}{n}\binom{2n-2}{n-1} x^n.\]This gives

\[f = \sum_{n=0}^\infty \frac{1}{n+1}\binom{2n}{n} x^n,\]so indeed \(a_n = C_n\).
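We can also confirm numerically that the recursion and the closed form agree; a quick Python check, with the binomial coefficient taken from the standard library:

```python
from math import comb

def catalan_by_recursion(N):
    # a_0 = 1, a_n = sum_{m=0}^{n-1} a_m * a_{n-m-1}
    a = [1]
    for n in range(1, N + 1):
        a.append(sum(a[m] * a[n - m - 1] for m in range(n)))
    return a

# Closed form: C_n = binom(2n, n) / (n + 1)
closed_form = [comb(2 * n, n) // (n + 1) for n in range(7)]
```

Both give \(1, 1, 2, 5, 14, 42, 132, \ldots\), matching the counts of balanced expressions above.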

The Catalan numbers appear in a somewhat surprising way in the context of eigenvalue perturbation theory. Michael Scheer and I noticed this and had lots of fun working it out. We eventually realized that this result appears in an old paper of V. I. Arnol’d, cited above. Instead of borrowing Arnol’d’s succinct discussion, though, I’ll outline how Michael and I came to the same ideas. This should give an idea of how a mere mortal (i.e., not Arnol’d) could notice these patterns, and it also comes with a useful visual mnemonic for the terms in eigenvalue perturbation theory.

As any quantum mechanic knows, determining the dynamics of a quantum system boils down to solving an eigenvalue problem of the form

\[H\ket{\psi_n} = E_n\ket{\psi_n},\]where \(H\) is the Hamiltonian operator for the system and \(E_n\) is an energy eigenvalue. The time evolution of the eigenstates is then given by

\[\ket{\psi_n(t)} = e^{-iE_n t/\hbar}\ket{\psi_n(0)},\]and so by decomposing the initial state into the energy eigenstates and using this result, we can find the time evolution of any initial state.

Cool. So we need to solve eigenvalue problems. Unfortunately, this is quite hard in general. For a finite-dimensional system, we at least have the option of calculating all the matrix elements \(H_{ij} = \braket{\phi_i\vert H\vert \phi_j}\) and computing the eigenvalues numerically. For an infinite-dimensional system, we can’t even fall back on this brute force approach; we need some more clever way of getting at the spectrum of the Hamiltonian.

A common approach is perturbation theory. If the Hamiltonian can be expressed as a small correction to a Hamiltonian whose spectrum we know, then the new spectrum should be a small correction to the old spectrum. Let’s see how this works out.

We write \(H = H_0 + \epsilon V\), and assume we know the eigenvalues \(E_n^{(0)}\) and normalized eigenvectors \(\ket{n^{(0)}}\) of \(H_0\):

\[H_0 \ket{n^{(0)}} = E_n^{(0)} \ket{n^{(0)}}.\]We also assume all the \(E_n^{(0)}\) are distinct. Then, let \(E_n\) and \(\ket{n}\) be the eigendata of \(H\):

\[H\ket{n} = E_n \ket{n}.\]We can write these as series in \(\epsilon\):

\[E_n = E_n^{(0)} + \epsilon E_n^{(1)} + \ldots, \qquad \ket{n} = \ket{n^{(0)}} + \epsilon \ket{n^{(1)}} + \ldots.\]Substituting this into the eigenvalue equation and using \(H = H_0 + \epsilon V\), we find

\[E_n^{(0)}\ket{n^{(0)}} + \epsilon V \ket{n^{(0)}} + \epsilon H_0 \ket{n^{(1)}} + \mathcal{O}\left(\epsilon^2\right) = E_n^{(0)}\ket{n^{(0)}} + \epsilon E_n^{(1)}\ket{n^{(0)}} + \epsilon E_n^{(0)}\ket{n^{(1)}} + \mathcal{O}\left(\epsilon^2\right).\]Canceling the first terms and acting with \(\bra{n^{(0)}}\), we find at order \(\epsilon\) that

\[\braket{n^{(0)}|V|n^{(0)}} + \braket{n^{(0)}|H_0|n^{(1)}} = \braket{n^{(0)}|E_n^{(1)}|n^{(0)}} + \braket{n^{(0)}|E_n^{(0)}|n^{(1)}}.\]The second terms cancel because \(\bra{n^{(0)}}H_0 = \bra{n^{(0)}}E_n^{(0)}\), and so we find

\[E_n^{(1)} = \braket{n^{(0)}|V|n^{(0)}}.\]We can then insert this back into the equation above, giving

\[V \ket{n^{(0)}} + H_0 \ket{n^{(1)}} = \braket{n^{(0)}|V|n^{(0)}}\ket{n^{(0)}} + E_n^{(0)}\ket{n^{(1)}}.\]Acting with \(\bra{m^{(0)}}\), we find

\[\braket{m^{(0)}|V|n^{(0)}} = (E_n^{(0)}-E_m^{(0)})\braket{m^{(0)}|n^{(1)}}.\]Normalization of \(\ket{n}\) requires \(\braket{n^{(0)}\vert n^{(1)}} = 0\), and we can solve for all the other components of \(\ket{n^{(1)}}\) from the above, so we find

\[\ket{n^{(1)}} = \sum_{m\neq n} \frac{\braket{m^{(0)}\vert V\vert n^{(0)}}}{E_n^{(0)}-E_m^{(0)}}\ket{m^{(0)}}.\]To find the higher-order corrections, the typical approach followed in textbooks is to just keep on chugging: look at the \(\epsilon^n\) terms in the eigenvalue equation, substitute what we found for the \((n-1)\)th order terms, and solve for the \(n\)th order terms. This is a massive pain. Life gets much easier when we realize that \(E_n^{(1)}\) and \(\ket{n^{(1)}}\) are just the derivatives of \(E_n\) and \(\ket{n}\) with respect to \(\epsilon\), evaluated at \(\epsilon = 0\). We can use these derivatives to find any higher terms. For example,

\[E_n^{(2)} = \frac{1}{2}\left.\frac{dE_n^{(1)}}{d\epsilon}\right|_{\epsilon = 0} = \frac{1}{2}\left(\frac{d}{d\epsilon}\braket{n|V|n}\right)_{\epsilon = 0}\]Using our expression above to differentiate \(\ket{n}\), we find

\[E_n^{(2)} = \sum_{m\neq n} \frac{\braket{n^{(0)}|V|m^{(0)}}\braket{m^{(0)}|V|n^{(0)}}}{E_n^{(0)}-E_m^{(0)}}.\]What if we want \(E_n^{(3)}\)? Then we take another derivative, which requires using our rules to differentiate every state and every energy appearing above. This quickly gets messy, but it can be done. Later we’ll show some tricks to make this process much easier.
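These formulas are easy to sanity-check numerically. Here's a Python sketch on an arbitrary illustrative 2×2 Hamiltonian, where the exact lower eigenvalue is available in closed form and should agree with second-order perturbation theory up to \(\mathcal{O}\left(\epsilon^3\right)\):

```python
from math import sqrt

# Arbitrary illustrative 2x2 system: H = H0 + eps * V.
E1, E2 = 1.0, 3.0                # eigenvalues of H0 (distinct, as assumed)
V11, V22, V12 = 0.5, -0.2, 0.3   # a symmetric perturbation

def exact_lower(eps):
    # Smaller eigenvalue of [[E1+eps*V11, eps*V12], [eps*V12, E2+eps*V22]].
    a, d = E1 + eps * V11, E2 + eps * V22
    return (a + d) / 2 - sqrt(((a - d) / 2) ** 2 + (eps * V12) ** 2)

def second_order(eps):
    # E1 + eps <1|V|1> + eps^2 |<2|V|1>|^2 / (E1 - E2)
    return E1 + eps * V11 + eps ** 2 * V12 ** 2 / (E1 - E2)

eps = 0.01
err = abs(exact_lower(eps) - second_order(eps))  # should be O(eps^3)
```

At \(\epsilon = 0.01\) the residual error is of order \(\epsilon^3 \sim 10^{-6}\), while truncating at first order leaves an \(\epsilon^2\) error, as expected.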

As an example of applying perturbation theory, consider the Hamiltonian of a harmonic oscillator,

\[H_0 = \frac{p^2}{2m} + \frac{1}{2}m\omega^2 x^2.\]It is well-known that if we write

\[x = \sqrt{\frac{\hbar}{2m\omega}}\left(a^\dagger + a\right), \qquad p = i\sqrt{\frac{\hbar m\omega}{2}}\left(a^\dagger - a\right)\]then we find

\[H = \hbar\omega\left(a^\dagger a + \frac{1}{2}\right).\]We then define the vacuum by \(a\ket{0} = 0\), and the normalized energy eigenstates are given by

\[\ket{n} = \frac{1}{\sqrt{n!}} \left(a^\dagger\right)^n\ket{0}\]with \(H_0 \ket{n} = \hbar\omega\left(n+\frac{1}{2}\right)\ket{n}\).

This solves the quantum harmonic oscillator problem. What if we then want to add an anharmonic term, such as \(H = H_0 + \epsilon x^k\)? To do perturbation theory, we first need to compute the matrix elements of \(V = x^k\) in the basis of energy eigenstates. The easiest way to do this is to rewrite \(x\) in terms of the ladder operators, and use the fact (which can be shown from the definitions and canonical commutation relations) that \(\left\lbrack a, a^\dagger\right\rbrack = 1\). For instance:

\[\braket{2|\left(a^\dagger + a\right)^2|2} = \braket{2|\left(a^\dagger a + a a^\dagger\right)|2} = \frac{1}{2}\braket{0|a^2\left(a^\dagger a + a a^\dagger\right)\left(a^\dagger\right)^2|0} = \frac{1}{2}\braket{0|a^2\left(-1 + 2a a^\dagger\right)\left(a^\dagger\right)^2|0} = 5,\]where in the last step we use the normalization of \(\ket{2}\) and \(\ket{3}\). In general, it’s easy to show that the expectation value of this operator for \(\ket{n}\) is \(2n+1\).

We can easily automate this process, by telling Mathematica how to move annihilation operators to the right (a process called normal-ordering):

In [6]:

```
VEV[x___, a, ad, y___] := VEV[x, y] + VEV[x, ad, a, y];
State[x___, a, ad, y___] := State[x, y] + State[x, ad, a, y];
ConjugateState[x___, a, ad, y___] :=
ConjugateState[x, y] + ConjugateState[x, ad, a, y];
VEV[ad, ___] = 0; ConjugateState[ad, ___] := 0;
VEV[___, a] = 0; State[___, a] := 0;
VEV[] = 1;
MatrixElements[op_, states_] :=
Table[With[{rhs = ExpandNCM[op ** state]},
Table[ExpandNCM[conj ** rhs], {conj, HermitianConjugate /@ states}]
], {state, states}]
```
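The same rewriting rules translate directly into Python; here's a sketch in which a word in \(a, a^\dagger\) is a tuple of strings and the relation \(a a^\dagger = 1 + a^\dagger a\) is applied until the vacuum expectation value is determined:

```python
def vev(ops):
    # Vacuum expectation <0| ops |0> of a word in "a", "ad",
    # computed by repeatedly applying a a† = 1 + a† a.
    ops = tuple(ops)
    if not ops:
        return 1
    if ops[0] == "ad" or ops[-1] == "a":
        return 0  # a† annihilates <0| on the left; a annihilates |0> on the right
    for i in range(len(ops) - 1):
        if ops[i] == "a" and ops[i + 1] == "ad":
            return (vev(ops[:i] + ops[i + 2:])
                    + vev(ops[:i] + ("ad", "a") + ops[i + 2:]))
    return 0
```

For instance, `vev(["a", "a", "ad", "ad"])` gives \(\braket{0|a^2 (a^\dagger)^2|0} = 2\), and \(\braket{0|a^3 (a^\dagger)^3|0} = 3! = 6\), reproducing the normalizations used in the worked example above.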

Note that there’s lots of other behind-the-scenes code here; download the notebook to see it all. Now let’s make ourselves a basis of the first few eigenstates and get the matrix elements of \(x^2 \propto \left(a^\dagger + a\right)^2\):

In [36]:

```
op[n_] :=
NonCommutativeMultiply @@ Table[Operator[ad] + Operator[a], n];
states = Table[1/Sqrt[n!] State @@ Table[ad, n], {n, 0, 5}];
MatrixElements[op[2], states] // MatrixForm
```

Out[36]:

As expected, we see the values \(2n+1\) along the diagonal. We also find

\[\braket{n+2|\left(a^\dagger + a\right)^2|n} = \sqrt{(n+2)(n+1)},\]which is simple to prove.
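We can reproduce these matrix elements numerically with truncated ladder-operator matrices; a Python sketch (the cutoff dimension is an arbitrary choice, and entries at the truncation edge are unreliable):

```python
from math import sqrt, isclose

N = 8  # truncation dimension (arbitrary illustrative cutoff)

# Number-basis matrices: a|n> = sqrt(n)|n-1>, a†|n> = sqrt(n+1)|n+1>.
a = [[sqrt(n) if m == n - 1 else 0.0 for n in range(N)] for m in range(N)]
ad = [[sqrt(n + 1) if m == n + 1 else 0.0 for n in range(N)] for m in range(N)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

x = [[a[i][j] + ad[i][j] for j in range(N)] for i in range(N)]  # a† + a
x2 = matmul(x, x)

# Away from the truncation edge, the diagonal gives <n|(a†+a)²|n> = 2n+1,
# and the second off-diagonal gives <n+2|(a†+a)²|n> = sqrt((n+2)(n+1)).
```

The factor of \(\sqrt{\hbar/2m\omega}\) in \(x\) is dropped here, since we only want the matrix elements of \(\left(a^\dagger + a\right)^2\) itself.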

We don’t have to use perturbation theory to add an \(x^2\) potential, because in this case we still have a harmonic oscillator. With an extra \(V = \frac{\epsilon}{2}m\omega^2 x^2\), we effectively change \(\omega \mapsto \omega\sqrt{1+\epsilon}\), and so the energy levels become

\[E_n = \hbar\omega\sqrt{1+\epsilon}\left(n + \frac{1}{2}\right) = E_n^{(0)}\left(1 + \frac{1}{2}\epsilon - \frac{1}{8}\epsilon^2 + \ldots\right).\]Since we know this, though, we can check that perturbation theory gives us the right answer. We have

\[V = \frac{1}{2}m\omega^2 x^2 = \frac{1}{4}\hbar\omega \left(a^\dagger + a\right)^2.\]Thus,

\[E_n^{(1)} = \frac{1}{2}\hbar\omega\left(n+\frac{1}{2}\right),\]as expected. Better yet,

\[E_n^{(2)} = \frac{\braket{n|V|n+2}\braket{n+2|V|n}}{E_n-E_{n+2}} + \frac{\braket{n|V|n-2}\braket{n-2|V|n}}{E_n-E_{n-2}} = \frac{(\hbar\omega)^2}{16}\left(\frac{(n+2)(n+1)}{-2\hbar\omega} + \frac{n(n-1)}{2\hbar\omega}\right) = -\frac{1}{8}\hbar\omega\left(n+\frac{1}{2}\right).\]So indeed, perturbation theory is giving us the term by term expansion of the exact energy eigenvalues for the adjusted Hamiltonian.

To get our inspiration for dealing with perturbation theory at all orders, we need to compute \(E_n^{(3)}\). Recall that this means differentiating

\[E_n^{(2)} = \sum_{m\neq n} \frac{\braket{n|V|m}\braket{m|V|n}}{E_n-E_m}.\]Let’s start with the eigenstates. When we hit the \(\ket{n}\) or the \(\bra{n}\), we get an additional sum over states which are not \(n\):

\[\sum_{m\neq n} \sum_{k\neq n}\frac{\braket{n|V|m}\braket{m|V|k}\braket{k|V|n}}{(E_n-E_m)(E_n-E_k)} + \sum_{m\neq n} \sum_{k\neq n}\frac{\braket{n|V|k}\braket{k|V|m}\braket{m|V|n}}{(E_n-E_m)(E_n-E_k)} = 2\sum_{m\neq n} \sum_{k\neq n}\frac{\braket{n|V|m}\braket{m|V|k}\braket{k|V|n}}{(E_n-E_m)(E_n-E_k)}.\]When we hit the \(\ket{m}\) or \(\bra{m}\), the sum ranges instead over states which are not \(m\):

\[\sum_{m\neq n} \sum_{k\neq m}\frac{\braket{n|V|m}\braket{m|V|k}\braket{k|V|n}}{(E_n-E_m)(E_m-E_k)} + \sum_{m\neq n} \sum_{k\neq m}\frac{\braket{n|V|k}\braket{k|V|m}\braket{m|V|n}}{(E_n-E_m)(E_m-E_k)}.\]If we swap the indices \(m\) and \(k\) in the second sum, the numerators become identical, and combining the denominators we end up with

\[\frac{\braket{n|V|m}\braket{m|V|k}\braket{k|V|n}}{E_m-E_k}\left(\frac{1}{E_n-E_m}-\frac{1}{E_n-E_k}\right) = \frac{\braket{n|V|m}\braket{m|V|k}\braket{k|V|n}}{(E_n-E_m)(E_n-E_k)}.\]Of course, this algebra only works for \(k\neq n\), so we should remove those terms from both sums. In total, then, the terms from differentiating \(\ket{m}\) and \(\bra{m}\) are

\[\sum_{m\neq n}\sum_{k\neq n, m} \frac{\braket{n|V|m}\braket{m|V|k}\braket{k|V|n}}{(E_n-E_m)(E_n-E_k)} - 2\sum_{m\neq n}\frac{\braket{n|V|n}\braket{n|V|m}\braket{m|V|n}}{(E_n-E_m)^2}.\]We then almost have another copy of the double sum above; the only problem is that we’re missing the terms with \(k = m\). But very conveniently, the precise terms we need to compensate are generated when we differentiate the denominator:

\[\sum_{m\neq n} \frac{\braket{n|V|m}\braket{m|V|n}(\braket{m|V|m}-\braket{n|V|n})}{(E_n-E_m)^2}.\]The other term combines nicely with what we just had to subtract off. Gathering all this up, we have in total

\[E_n^{(3)} = \frac{1}{3}\left.\frac{dE_n^{(2)}}{d\epsilon}\right|_{\epsilon = 0} = \sum_{m\neq n}\sum_{k\neq n} \frac{\braket{n|V|m}\braket{m|V|k}\braket{k|V|n}}{(E_n-E_m)(E_n-E_k)} - \sum_{m\neq n}\frac{\braket{n|V|n}\braket{n|V|m}\braket{m|V|n}}{(E_n-E_m)^2}.\]Somewhat miraculously, the terms have arranged themselves into a particular form: in the numerators, we have a cyclic arrangement of matrix elements of \(V\). In the denominator, we have a product of differences between \(E_n\) and other energy levels. And all of these other energy levels are simply summed over states not equal to \(n\).

The appearance of these cyclic products of matrix elements makes it very tempting to draw some circles. As a first draft, we’ll do it like this: arrange \(k\) points on a circle, and then draw \(k-1\) directed edges between them, such that no point has both an incoming and an outgoing edge. We interpret every point with outgoing edges, the “sources”, as copies of \(\ket{n}\); every point with incoming edges is a sum over \(m_i\neq n\). The edges are energy differences in the denominator. Finally, for convenience, we add an overall factor of -1 if there is an even number of sources. In this notation, the first three energy corrections are

It turns out that all terms in perturbation theory, to all orders, can be written in terms of diagrams of this form. Proving this is an exercise in careful bookkeeping, taking derivatives of arbitrary diagrams and playing games like those above to resum into new diagrams. The end result is the following two rules for differentiating sources and sinks diagrammatically:

Now for the fun part: we can automate these rules and look at as many orders of perturbation theory as we’d like. We represent a diagram by a `ChordGraph` containing, for each point on the circle, a list of indices of the points to which it is connected by an edge (for sources) or an empty list (for sinks).

In [9]:

```
DFirst[x : ChordGraph[{}, a___]] :=
DualChordGraph@DFirst@DualChordGraph[x];
DFirst[x : ChordGraph[{b__}, a___]] :=
ChordGraph[{b, Length[{a}] + 2}, a, {}] +
ChordGraph[{}, a, {b, 1}] +
With[{sb = Sort[{b}]},
Sum[ChordGraph[sb[[;; n]], a, sb[[n ;;]]], {n, Length[{b}]}]];
DFirst[ChordGraph[{}]] := 2 ChordGraph[{}, {1}];
DChordGraph[x_ChordGraph] :=
Total[DFirst /@
NestList[RotateChordGraph[#, 1] &, x, Length[x] - 1]];
DChordGraph[expr_] := expr /. x_ChordGraph :> DChordGraph[x];
```

The `DualChordGraph` function (not shown) swaps the direction of all edges; to differentiate a sink, we differentiate the corresponding source in the dual graph and then take another dual.

Now we can repeatedly differentiate the first correction and see what diagrams we get:

In [49]:

```
expansions =
NestList[Expand@*CanonicalChordGraph@*DChordGraph, ChordGraph[{}],
3];
expansions //TableForm //TraditionalForm
```

Out[49]: