Piano tuners and superforecasting

December 28, 2022

This post is part of Tailwit Capital, my Substack newsletter, which you can subscribe to here.

At one point while reading Superforecasting: The Art and Science of Prediction by Philip E. Tetlock and Dan Gardner, I came across a concept that, as soon as it sank in, made me feel like I could've closed the book right there and still come away with the single most valuable takeaway (for me personally) from the whole book. I finished the book anyway, but that only confirmed I would've been right. So I'm going to explain that one concept here, and hopefully you'll leave this post either knowing you want to read more of the book, or knowing you don't need to because you got the best part.

I won't summarize the whole book here, as there are plenty of summaries you can find. The only backstory that's really interesting to know is that in the 2010s, IARPA, a research-oriented agency associated with the US intelligence community (which includes the CIA and many similar agencies), created a forecasting tournament in which different research teams would experiment to find the best forecasting techniques, presumably because it wanted to get better at predicting what will happen in the world. What's surprising (as Tetlock mentions) is that the US intelligence community willingly initiated a competition that they knew they themselves could lose, which does seem oddly sportsmanlike for the CIA. :)

The questions were all about the likelihood of specific events around the world within specific timeframes—examples from the book include:

"Will North Korea detonate a nuclear device before the end of this year?"
"In the next year, will any country withdraw from the eurozone?"
"Will the Kurdistan Regional Government hold a referendum on national independence this year?"

They're the kind of questions where you might expect the average layperson to know something about some of them, but you definitely wouldn't expect any layperson to have enough context in advance to make an informed forecast (with specific probabilities) about all of them. And the questions are such that it's perfectly expected that on many questions, any given person might start out with no context at all (like the Kurdistan question, for most people, I imagine).

I should also point out that the questions are all concrete and force predictions that are falsifiable, meaning that when the timeframe is up, everyone can more or less agree on whether the answer was yes or no. There's room for some hair-splitting in cases where what happened in real life was ambiguous, e.g. over what constitutes a "nuclear device," or what constitutes a "referendum" or "independence." But you can't "move the goalposts," e.g. by saying that your prediction just hasn't happened yet.

Tetlock and his partner Barbara Mellers started a team called the Good Judgment Project (GJP) that aggregated the forecasts of thousands of volunteers—untrained normies, basically—retirees and all!—who had a little bit of free time and a lot of curiosity—and, in short, they CRUSHED. (And by the way, performance on a given question was not correlated with whether one was an "expert" on that topic or knew nothing about it to begin with. In fact, if I recall correctly, "experts" tended to do a little worse, due to being overconfident and attached to their preconceptions.)

Here's an excerpt about how the tournament went down:

Each team would effectively be its own research project, free to improvise whatever methods it thought would work, but required to submit forecasts at 9 a.m. eastern standard time every day from September 2011 to June 2015. By requiring teams to forecast the same questions at the same time, the tournament created a level playing field—and a rich trove of data about what works, how well, and when. Over four years, IARPA posed nearly five hundred questions about world affairs [...] with the vast majority of forecasts extending out more than one month and less than one year. In all, we gathered over one million individual judgments about the future.

In year 1, GJP beat the official control group by 60%. In year 2, we beat the control group by 78%. GJP also beat its university-affiliated competitors, including the University of Michigan and MIT, by hefty margins, from 30% to 70%, and even outperformed professional intelligence analysts with access to classified data. (emphasis mine)

Most of the rest of the book is about the particular forecasters they discovered within their volunteer pool who were unbelievably good (the superforecasters), and what made them good. There are a lot of takeaways, but as I said, I'm only going to go into the one that was by far the most valuable for me.

Okay, but to explain it, I have to start by asking an annoying question: How many piano tuners are there in the city of Chicago?

This style of question is called a Fermi problem. My first awareness of these came from the fact that it used to be fetch for companies to ask questions like this in engineering interviews. I wish I were joking. Thankfully I have never been asked this in an interview. I won't go into my opinions on the correlation between the ability to answer this question and the ability to build great digital products, but suffice it to say that Google themselves, one of the industry leaders in asking piano tuner questions, eventually decided not to do that anymore.

You can google for in-depth walkthroughs of how to answer such questions (ironic, eh?), but the gist is that it's a matter of breaking down an unguessable number into several component numbers that might be more guessable, like: the number of people who live in Chicago, the percentage of people/households that own a piano, how often a piano needs to be tuned, how long it takes to tune a piano, and so on. You don't necessarily know these, either, but at least you can make a guess, and identify the assumptions your guesses are based on. Then you multiply across, and all the units cancel out, kind of like in those high school chem or physics problems where it's super satisfying to cross out all the units. (No? Just me?)
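To make the "multiply across" step concrete, here is a minimal sketch in Python. Every number in it is a made-up guess for illustration (none of them come from the book or from real data), and the function and parameter names are my own. The point is the shape of the calculation, not the answer.

```python
# A Fermi estimate: break one unguessable number into several guessable ones.
# All default values are illustrative guesses, not looked-up data.

def estimate_piano_tuners(
    population=2_700_000,          # people living in Chicago (guess)
    people_per_household=2.5,      # average household size (guess)
    share_with_piano=0.05,         # fraction of households owning a piano (guess)
    tunings_per_piano_per_year=1,  # how often a piano gets tuned (guess)
    hours_per_tuning=2,            # tuning time including travel (guess)
    hours_per_tuner_per_year=40 * 50,  # 40 hours/week for 50 weeks (guess)
):
    households = population / people_per_household
    pianos = households * share_with_piano
    # pianos * tunings/piano/year * hours/tuning = tuning hours needed per year
    tuning_hours_needed = pianos * tunings_per_piano_per_year * hours_per_tuning
    # hours/year divided by hours/tuner/year leaves just "tuners"
    return tuning_hours_needed / hours_per_tuner_per_year


print(round(estimate_piano_tuners()))
```

Because each guess is a named parameter, you can see exactly which assumption you're leaning on, and rerun the estimate with a different value when you disagree with one of them.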

The key takeaway of the book for me is that superforecasters treat every question as if it's a Fermi problem. No matter the question, they Fermi-ize it. Fermi all the things. Once I read that, it seemed so obvious, but it hadn't occurred to me before, and it immediately took me from "totally stumped" in the face of one of those geopolitics questions, to "hey, maybe I could get halfway decent at that, too!"

When it comes to forecasting real life things and not silly interview questions, what you want to break the question into specifically is base rates vs. differentiating factors. ("Base rates" is a real term, "differentiating factors" I just made up because I don't know what that's called. Do correct me if you know it.) In other words: "which parts of this situation are like other situations that I can get data on... and which parts are unique?" The idea is to first come to an estimate of what the likelihood of an event would be if the situation were not special at all compared to comparable known situations, and then adjust for any factors that do make it special.

Let me digress here and say that one of the best things you can do for yourself, in terms of accurately predicting events in your own life, is to start from the assumption that you are not special, and your situation is not special, until proven otherwise. I only say to start there! Of course everyone is ultimately a little bit special. But by briefly suspending your specialness and allowing yourself to be comparable to other people of certain demographics and certain circumstances, you get to be grounded in a massive foundation of human stories and data that can help you know what you can expect to happen. That is why we always start with base rates.

Back to our Fermi problem. Geopolitics is hard and abstract, so let's use something closer to home as an example. What is the probability that you'll lose your job (involuntarily) within the next year? (Sorry to choose a downer question, but I think #studiesshow something scary is more likely to keep your attention, so I'm doing this for you.)

Most likely there is no single statistic out there that can just give you the answer, or you don't know of one, so we can start by asking, "How is my situation like other situations that I could get data on?" Possible answers include: the rate of layoffs across similar companies in your country (especially in whatever economic conditions you would predict for the next year, which is kind of a sub-forecast in this forecast); the rate of layoffs of people with your job/role, or your seniority or length of tenure, across all companies in your industry; and I bet you can think of a few more. If you can look up data for any of these, you'll end up with some numbers, and can think about how to take all of them into account to come up with some informed estimate of a base rate.

And next: "How is my situation different from those other situations?" Is your company doing particularly well or poorly compared to others in the sector? Are you new to the company, which might make you more likely to be cut first if layoffs happen? Or do you happen to have evidence that your team or project is so high priority that it would be the least likely to be affected? Are you 10x and everyone knows it? (Hell yeah.)
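Here is one way the two steps could be sketched in Python. This is my own illustration, not a method from the book: the base rates and adjustment factors are hypothetical placeholders, and averaging the base rates and multiplying the odds by a factor per differentiating factor are just simple assumptions chosen to keep the example short.

```python
# Step 1: estimate a base rate. Step 2: adjust for what makes your case special.
# All numbers are hypothetical placeholders, not real layoff statistics.

def to_odds(probability):
    return probability / (1 - probability)


def to_probability(odds):
    return odds / (1 + odds)


def forecast(base_rates, adjustments):
    """Average the base rates, then scale the odds by each adjustment factor.

    A factor above 1 makes the event more likely; below 1, less likely.
    """
    base_rate = sum(base_rates) / len(base_rates)
    odds = to_odds(base_rate)
    for factor in adjustments.values():
        odds *= factor
    return to_probability(odds)


# Hypothetical yearly layoff rates from three comparison groups.
base_rates = [0.04, 0.06, 0.05]

# Hypothetical differentiating factors and how hard each one pushes.
adjustments = {
    "company doing worse than its sector": 1.5,
    "newest person on the team": 1.3,
    "team is a top priority": 0.6,
}

print(f"{forecast(base_rates, adjustments):.1%}")
```

The adjustments are applied to odds rather than directly to the probability so that the result can never drift below 0% or above 100%, no matter how many factors you stack up. The size of each factor is still a guess.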

But how do you know exactly how to weight all the different pieces of data that went into your base rate? And if you have individual factors that make the thing more likely or less likely than the base rate, how do you know how much you should adjust for those factors? My guess is that getting a sense for all those specifics is a big part of what forecasters get better at over time if they keep at it and if they review the results of their predictions and make an effort to get more accurate. (Which pundits don't do, almost by definition.) And I suspect that's part of what distinguishes an OK forecaster from a superforecaster.
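If you want to review your own predictions, you need some way to score them. One common option is the Brier score, which is how the book grades forecasters. I haven't discussed it in this post, so treat the following as a side illustration: it uses the simple 0-to-1 version for yes/no questions, and the forecasts in it are invented.

```python
# Brier score for yes/no questions: lower is better, 0 is perfect.
# Each forecast is (probability you gave, what happened: 1 = yes, 0 = no).

def brier_score(forecasts):
    """Mean squared gap between the stated probability and the outcome."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)


# Invented track record: three past forecasts and how they turned out.
my_forecasts = [(0.06, 0), (0.80, 1), (0.30, 1)]

# Someone who shrugs and says 50% on every question always scores 0.25.
always_fifty = [(0.5, outcome) for _, outcome in my_forecasts]

print(brier_score(my_forecasts), brier_score(always_fifty))
```

Beating 0.25 means your forecasts carried more information than a coin flip, which is a reasonable first bar for being "not the worst."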

It's okay that I don't know how to handle all those specifics right now. As a tailwit, I don't need to be the best at anything. I mostly just want to be not the worst. And to be not the worst at forecasting, my hunch is that all you have to do is to make a guess based on any actual data and reasoning. Just like with the piano tuner question, any guess involving the tiniest iota of logic is definitely going to be closer to the answer than no guess at all, and will probably also be closer than a super random guess based purely on feelings (e.g., "I'll definitely get laid off in the next year because my boss hates me," or, "I definitely won't get laid off in the next year because I don't want to think about that so it's definitely not going to happen.")

With this Fermi trick in hand, you can go forth and start forecasting all the things. Most of all—and I'm sure I will be harping on this again and again along our journey—most of all, a true tailwit must keep in mind: NEVER FORGET BASE RATES! If somebody tries to tell you that such-and-such a metric is a whopping 7.2% right now, your response had better be (in your best Marcel the Shell voice): "Compared to WHAT?!"

A tailwit who knows their base rates is ready to take on the world.

Appendix

I asked ChatGPT: "How many piano tuners would you guess there are in Chicago?"

ChatGPT's response:

It would be difficult for me to accurately estimate the number of piano tuners in Chicago, as I don't have access to current information about the local piano tuning market. The number of piano tuners in a given location can vary based on a number of factors, such as the size of the population, the number of pianos in the area, and the demand for piano tuning services. Without more specific information, it would be difficult for me to provide a reliable estimate of the number of piano tuners in Chicago.