(Michael Chinen) » 2020

12 Dec 20
16:22

Log probability, entropy, and intuition about uncertainty in random events

Probability is a hard thing for humans to think about. The debate between Bayesian and orthodox (frequentist) statistics around the relationship of event frequency and probability makes that clear. Setting that aside, there are a whole bunch of fields that care about log probability. Log probability is an elemental quantity of information theory. Entropy is a measure of uncertainty, and it sits at the core of information theory and data compression/coding. Entropy is the negative expected log probability:
$H(X) = -\sum_x p(x) \log p(x) = -\mathop{\mathbb{E}} \log p(x)$

For a uniform distribution this can be simplified even further. A fair coin toss, or die throw, for example, has uniform probability of heads/tails or any number, and we get:
$H(X) = -\sum \frac{1}{n} \log \frac{1}{n} = -n \frac{1}{n} \log \frac{1}{n} = -\log \frac{1}{n} = \log{n},$
where $n = 2$ for a coin flip and $n = 8$ for an 8-sided die (because there are 2 and 8 possible values, respectively). So with a base-2 log, we get $H(X_{coin}) = \log_2{2} = 1$ for the coin flip, and $H(X_{d8}) = \log_2{8} = 3$ for an 8-sided die.
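
As a quick numerical check of the two values above, here is a minimal Python sketch. The helper name `entropy_bits` is made up for illustration; it applies the entropy sum directly with a base-2 log.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits: -sum(p * log2(p)), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

coin = [1 / 2] * 2  # fair coin: two equally likely outcomes
d8 = [1 / 8] * 8    # fair 8-sided die: eight equally likely outcomes

print(entropy_bits(coin))  # 1.0
print(entropy_bits(d8))    # 3.0
```

The same function accepts non-uniform distributions; a biased coin such as `[0.9, 0.1]` comes out below 1 bit, since its outcome is less uncertain than a fair flip.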

One thing that might not be immediately obvious is that this allows us to compare different types of events to each other. We can now compare the uncertainty in a coin flip and the uncertainty in an 8-sided die. $H(X_{d8}) = 3 H(X_{coin}),$ so it takes 3 coin flips to have the same uncertainty. In fact, this means you could simulate an 8-sided die with 3 coin flips (but not with 2) using a tree structure: the first flip determines if the die is in 1-4 (heads) or 5-8 (tails); the next narrows that to 1-2 or 3-4 if the first flip was heads, or to 5-6 or 7-8 if it was tails; and the last flip resolves which of those two numbers the die ends up on.
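
The tree scheme can be sketched in a few lines of Python. This is one possible encoding, not the only one: it assumes heads always selects the lower half of the remaining range, and the function names `d8_from_flips` and `roll_d8` are made up for illustration.

```python
import random
from collections import Counter

def d8_from_flips(flips):
    """Map three coin flips (True = heads) to a face 1..8 by halving the range per flip."""
    low, high = 1, 8
    for heads in flips:
        mid = (low + high) // 2
        if heads:
            high = mid     # heads: keep the lower half of the range
        else:
            low = mid + 1  # tails: keep the upper half of the range
    return low

def roll_d8(rng=random):
    """Simulate one 8-sided die roll from three fair coin flips."""
    flips = [rng.random() < 0.5 for _ in range(3)]
    return d8_from_flips(flips)

# Rough empirical check: each face should appear about 1/8 of the time.
counts = Counter(roll_d8() for _ in range(80_000))
for face in sorted(counts):
    print(face, counts[face])
```

Each of the 8 possible flip sequences lands on a different face, which is why 3 flips are enough and 2 (only 4 sequences) are not.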

You probably would have been able to come up with this scheme to simulate die throws from coin flips without knowing how entropy is formulated. I find this interesting for a couple of reasons. First, it means there may be something intuitive about entropy that we have in our brains that we can dig up. Second, it gives us a formal way to verify and check what we intuited about randomness. For this post I want to focus on intuition.

The first time you are presented with entropy, you might wonder why we take the log of probability. That would be a funny thing to do without a reason. Why couldn’t I say, ‘take your logs and build a door and get out, I’ll just take the square or root instead and use that for my measure of uncertainty’, and continue with my life? It turns out there are reasons. I wanted to use this post to capture those reasons and the reference.

If you look at Shannon’s A Mathematical Theory of Communication, you will find a proof-based solution that’s quite nice. Even after reading it, if you haven’t looked at a convexity-based proof in a while, it can still be somewhat unintuitive why there needs to be a logarithm involved. Here is a less formal and incomplete explanation that would have been useful for me to get more perspective on the problem.

There are a few desirable properties of entropy that Shannon describes. For example, entropy should be highest when all the events are equally likely. Another is about how independent events like coin flips or die rolls combine: the possibilities multiply, so the number of outcomes is exponential in the number of coin flips or die rolls. Entropy should nevertheless add up. If I compute the entropy of one coin flip and of another coin flip and add them together, the sum should be the same value as a single entropy computed on those two coin flips together.
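
The additivity property can be checked numerically. The sketch below is illustrative only: `entropy_bits` is a made-up helper, and the joint distribution is built on the assumption that the two events are independent, so their probabilities multiply.

```python
import math
from itertools import product

def entropy_bits(probs):
    """Shannon entropy in bits: -sum(p * log2(p)), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

coin = [1 / 2] * 2
d8 = [1 / 8] * 8

# Independent events: the probability of each combined outcome is the product.
joint = [p * q for p, q in product(coin, d8)]

h_sum = entropy_bits(coin) + entropy_bits(d8)
h_joint = entropy_bits(joint)
print(h_sum, h_joint)  # 4.0 4.0

# The same holds for a non-uniform event, up to floating-point rounding.
biased = [0.9, 0.1]
joint_biased = [p * q for p, q in product(biased, d8)]
print(math.isclose(entropy_bits(biased) + entropy_bits(d8), entropy_bits(joint_biased)))  # True
```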

If you want a measure that grows linearly in the number of coin flips or die rolls and achieves this property, then taking the logarithm of the number of combinations gives you just that. No monotonic function other than a logarithm (up to the choice of base) will do that. This is because the number of possibilities of $n$ coin flips is exponential. Notice that $\log_2(2^n) = n$, where $n$ is the number of coin tosses and $2^n$ is the number of possible outcomes for $n$ coin tosses. So the log inverts the exponent introduced by combining multiple events, which makes entropy linear in $n$.
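
To make the "why not the square or the root" question concrete, here is a small numerical comparison. It is a sanity check on a handful of cases, not a proof, and the helper `is_additive` is made up for illustration. It tests whether a candidate measure $f$ applied to the number of outcomes satisfies $f(2^{a+b}) = f(2^a) + f(2^b)$ for small numbers of coin flips $a$ and $b$.

```python
import math

# Candidate "uncertainty" measures applied to the number of outcomes.
candidates = {
    "log2": math.log2,
    "sqrt": math.sqrt,
    "square": lambda x: x ** 2,
}

def is_additive(f, max_flips=6):
    """True if f(2**(a+b)) == f(2**a) + f(2**b) for all 1 <= a, b < max_flips."""
    return all(
        math.isclose(f(2 ** (a + b)), f(2 ** a) + f(2 ** b))
        for a in range(1, max_flips)
        for b in range(1, max_flips)
    )

for name, f in candidates.items():
    print(name, is_additive(f))
# log2 True
# sqrt False
# square False
```

Already with one flip plus one flip, the square root gives $\sqrt{4} = 2$ against $\sqrt{2} + \sqrt{2} \approx 2.83$, and the square gives $16$ against $4 + 4 = 8$; only the log gives $2 = 1 + 1$.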

07 Dec 20
22:11

Nostalgia, Music, and Utility Functions

We accept that certain events that happened in the early course of our lives influence us with a permanence that lives on in our identity. For many people music will be one of those events. In the opening of High Fidelity the main character sarcastically warns about the dangers of music for kids. In this post I want to consider the aftereffects of music (such as nostalgia) once the novelty wears off. “Sentimental music has this great way of taking you back somewhere at the same time that it takes you forward, so you feel nostalgic and hopeful all at the same time.” That line is also from High Fidelity, and closer to today’s topic.

The older I get, the fewer surprises there seem to be. I probably should restate that. There are still many surprises, but the intensity and the excitement (the amount something grips me) seem to go down with age. This is a bit depressing if you look at it one way, but I don’t think it has to be. Rather, it seems like a natural consequence of understanding more about the world. The most obvious analogue is film and literature: at first you can start with the classics and they are all amazing. Then eventually you get to a point where you can see how those classics influenced the newer works, getting more perspective and insight into the world. This is quite an interesting feeling too. But the novel spark of that first foray into the arts is really something special.

Where did those bright-eyed, wonderful times go? I’m reminded of them most viscerally when I put on music. Sometimes I talk or write about experimental computer music. But the particular music that does this for me is only a few specific bands and genres that are now very far from me culturally – the teenage and college years of grunge, pop-punk, emo, and indie rock, from The Descendents to the popular Garden State soundtrack. Near 40, I still feel a certain strong and invincible euphoria when listening to these genres, even though it’s been years since I listened. If I get into it and shake my head and air-drum it is even better. The music wired up a strong pathway for belonging and excitement in my brain, and then the neurons were set to read-only, to last for much longer than I would have thought. When I am 80, will it still be there?

At the same time, the act of listening to this music feels more than a bit weird. Perhaps it feels masturbatory or escapist. There is a disconnect between who I am now and who I was then, and the music is an instant portal that revives those youthful brain pathways.

And if we take it one step further, where did those bright eyes and times of wonder come from? The act of falling in love with a (sub)culture because of someone you loved is one of the slyest and most curious phenomena near the heart of western individualism. The association with a beautiful face that you might get to see if you go to the next concert. Culture is much more than a sexually transmitted meme. The acceptance from dressing a certain way and having a crew is really powerful, and dancing in unison to a beat is a primal joy even if the footsteps are different. For me, it was always the elusive nature and the promise that glory was just around the corner. The Cat Power song goes “It must just be the colors and kids that keep me alive, because the music is boring me to death”, which is something that a popular artist might be able to say after overwhelming success. But I was always on the outskirts, never really feeling in the middle of the culture that I loved. The colors, the kids, they were great, but the thing that lasted for me was the music. And I think, just as with the friendship paradox, this is the case for most people.

What would it take to find something like that again? Is it a matter of age, hormones, and college or can we recreate that somehow? Would I even want to if I could? Why can’t I, say, take these nostalgic feelings from music that I grew out of and ‘install’ them into music and things I am interested in now?

This is related to rewriting your utility function, which is a very scary thing that will literally annihilate your identity if you get it wrong. Just as we are programmed through evolution to love doing certain things – to feel joy from the wind going through your hair on a good walk, we are programmed through society and culture to a lesser extent during our developmental youth. But we do not get to swap the way we feel about walking through nature and thinking about high-dimensional spaces. The latter has an intellectual pleasure for sure, but it is less ‘natural’. If we could do this, the world would be very different in very little time. I am not sure if we would converge or diverge as a species, but I would posit that if you could make being good at your job feel like dancing for the average person, productivity would blow up globally and previous social problems would give way to newer, probably scarier problems. But this is a rabbit hole that has many tropes and discussions such as wireheading that I would like to leave aside for now. I just want to note that it seems that technology is still a ways off from this, but this feeling I get when I listen to the right music is a sneak preview. It is frightening and feels great at the same time.