22 Apr 21

Review of The Information

All disciplines have interesting histories that explain their development. For some reason, different fields seem to ‘value’ their history differently. Art and music students are required to study art and music history in multiple dedicated classes. Computer science and mathematics students do not have to study math or computer science history in a typical undergraduate program (perhaps one or two classes at most). Instead, if the computer science or math student is lucky, they get a charismatic professor who is a good storyteller and fits in anecdotes about the creators of the topics being studied, or a reference to a book about them.

Presumably, the history of art is useful to an artist, not only for telling interesting stories to their students one day, but also for understanding something deeper about where new art comes from. A deeper understanding may be useful for a number of things, including how to go about creating novel art. The creation of new abstract concepts in math and in art has certain high-level similarities – both provide a new way to look at things. Perhaps the poster child for this type of thinking is Xenakis, whose work united architecture, statistics, algorithms, and music, and for whom knowledge across the fields had an interesting synergy. But even if we ignore cross-discipline examples, I think we will find that the innovators have typically had an interest in the history of their field. Is the assumption for the sciences that this is correlation, and not causation? That argument can be made, but it seems less likely to me. Or is the assumption that, in the interest of time, most undergraduate students don’t need to be innovators, and rather just need to understand how to solve the damn equation, not how it came to be?

Perhaps this is a straw man. I may be generalizing from an unrepresentative experience, and it has been a number of years since I was in school. It seems like folks pursuing graduate degrees in math or computer science have more understanding of the history, and because this information is easier to come by today than it was 20 years ago, people who self-study also pick it up naturally. In that case all I can say is that I was not aware of so much of the history involved in computer science, and I wish I had started my studies with something like James Gleick’s The Information.

Information theory is a relatively new field of the sciences. Of course, it did not spring out of nowhere. There are a few history-oriented books that describe its formation, but not many. Gleick’s coverage is by far the widest I’ve seen.

The book has an excellent cast of characters, starting out with long-distance communication by African drums, Babbage and Lovelace and their early computers, and Laplace. As the book develops, the more typical founding characters of information theory appear: Maxwell and his demon, Clausius and his entropy, Morse with his codes, Hartley, Shannon, Wiener, Turing, and Kolmogorov. What makes the book’s presentation special is the depth in which each character is explored. There are a large number of supporting characters and competitors that I hadn’t heard of, which provides great context for the developments. Naturally, a good deal of time is spent on the juicy rivalries such as the Shannon-Wiener relationship, but also on how they fit into the rest of the world, e.g., how Kolmogorov felt about them.

I was introduced to a range of connections that I was not aware of, including the Schrödinger (yes, that Schrödinger) connection to molecular biology and What is Life?. There were also nice teasers for the parts of information theory I haven’t had exposure to, such as quantum computing and Shor’s and Feynman’s thoughts on it. There are also deeper ties to fundamental math history, such as the early developments in the Greek and Arabic worlds from Aristotle to al-Khwarizmi. I was also unaware of the amount of now-obsolete infrastructure that telegraph networks required, and the book spends a good amount of time on the logistics of this kind of thing.

I very much enjoyed this book, although it still misses a few important areas. Notably, Kullback’s application of information theory to statistics, as well as Bayesian statistics and the related information criteria, are not mentioned. Deep learning is also not mentioned, but the book was published in 2011, before the recent surge. Naturally, Gleick also discusses the fictional works of Borges. Unfortunately, as much as I enjoy Borges, I found this to be the weakest part of the book.

At 426 pages, Gleick’s presentation is almost entirely conceptual and non-technical, so I think this would be a great bedtime read for anyone interested in the topic who isn’t in a rush. For a faster and more technical approach, one might consider John Pierce’s book.