[Non-typo UPDATE below.]
OK I need to get this out of my system, so I can proceed to more important things… If you need to get up to speed on Landsburg’s population brain teaser, go to this post. Here, I am going to assume every reader is hip deep in the problem and needs my careful coaching to escape the black hole into which I nearly fell myself, lo this past fortnight.
So here are my thoughts:
* I am assuming we are talking about the population of children (not the parents), and that nobody dies. The kids don’t in turn reproduce; there is just an initial set of couples, who then start cranking out babies.
* I am not dead certain, but I am very very confident, that Landsburg’s (modest) point is correct. In particular, I think if we did computer simulations, he would win. That is to say, for any finite number C of couples, and for any finite number G of generations (where in principle, a given couple could have a string of G girls and not a boy when the cutoff is reached), if we ran that simulation a large but finite number S times, we would observe: (A) More often than not, the population would have more boys than girls, and (B) for each of the S simulations, if we calculated the fraction of girls in the population (of children), and then took the mean of that calculation over all S simulations, the answer would be lower than 0.5.
* The important thing going on here is that “mathematical expectation” means something very specific, and in some contexts it is misleading. For example, consider two independent random variables X and Y that are each uniformly distributed on the interval [1, 2]. The expectation of each is the midpoint, i.e. 1.5. However, if you calculate the expectation of (X/Y) it is bigger than 1. (Specifically it’s 3ln(2)/2, or about 1.04, because independence gives E(X/Y) = E(X)·E(1/Y), and E(1/Y) = ln(2).) Of course by symmetry that means the expectation of (Y/X) > 1 as well. So at first glance that’s weird. It seems like you’re saying you “expect” X to be bigger than Y, and Y to be bigger than X, at the same time. But that’s not what you’re saying. The general moral is that the ratio of expectations is not equal to the expectation of the ratio. I.e. even though E(X) = E(Y), it’s not true that E(X/Y) = 1, even though these are independent variables.
* Back to the population puzzler: Yes, it is certainly true that nature doesn’t care what rules the couples are using. Every time someone has another kid, there is a 50/50 chance it is a girl. But this doesn’t prove what a lot of people think it does. It’s true that within any population, or even within an individual family, the expected number of boys equals the expected number of girls. I.e. E(G) = E(B). But as we just saw, that fact by itself doesn’t mean we can conclude that E(G/B) = 1, or that E(G/(G+B)) = 0.5.
* What IS true is that if we look at the entire set of children IN ALL POSSIBLE POPULATIONS, then E(G/(G+B)) = 0.5. But that’s not what the question asked.
* What Landsburg interpreted the question as asking is the mathematical expectation of the fraction of girls *for any particular population* that is randomly drawn from the set of all possible populations. For populations consisting of any finite C number of initial couples (who then crank out kids), that answer is quite simply NOT 0.5. It’s true that as C goes to infinity, the answer approaches 0.5. It’s also true that most people would have assumed that the spirit of the question meant, “For arbitrarily large populations.” But Landsburg (quite fairly, in my opinion) doesn’t think this means the basic intuition is right. No, people are skipping a huge step in the problem, when they go from the observation “each child is 50/50 a girl” to “I think in a large enough population, the expected fraction of girls is 50%.”
* Specifically, here’s what’s going on: The set of all possible populations has populations with DIFFERENT SIZES. There are a bunch of possible populations where many of the couples have one boy and no girls. In such populations, the fraction of girls is less than 0.5. On the other hand, there are a bunch of possible populations where many of the couples have tons of girls, and one boy. In such populations, the fraction of girls is greater than 0.5. However, if we assign the same weight to each potential population, even though some have more children than others, then the mean of all such fractions will be smaller than 0.5. In other words, in fewer than 50% of the possible populations, will there be more girls than boys.
* We know this is correct, and ironically, for the very reason that the Landsburg-haters keep harping on. Clearly, in the whole UNIVERSE of possible populations, the ratio is 0.5. In other words, there are as many boys as girls, if we look at the entire group of “possible children” in the universe. Now then, in one potential population that has more boys than girls, how is that going to happen? With the particular stopping rule under consideration, that will only happen when there are a lot of families with just one boy, and not so many families with a ton of girls. In contrast, the potential populations with more girls, will be ones where there aren’t many families with just one boy, i.e. there are a lot of families who had a string of girls before getting a boy. So this means that for a finite C number of couples, if you look at populations with more boys than girls, you will see small populations, while if you look at populations with more girls than boys, you will see big populations.
* Do the Landsburg Haters feel the vice closing? If you already admit–proudly in fact–that in the total universe of all possible offspring, there is an equal number of boys and girls, and you now see (though perhaps you didn’t think about it much before) that an actual population with more boys will have a smaller population, while a population with more girls will have a big population, then maybe it is dawning on you what Landsburg is saying: There are MORE POPULATIONS with a ratio of girls smaller than 0.5. The way the “total offspring in all possible populations” achieves the perfect 0.5 ratio, is that there are a (relatively small) number of possible populations with a ton of girls. You can’t have it both ways: If you still insist that in any randomly drawn population, the ratio of girls is just as likely to be under as over 0.5, then you must think that in the set of all possible offspring (across all potential populations), there are more girls than boys. (Or, you must deny that populations with a higher ratio of girls, have more total children than populations with a smaller ratio of girls.)
* This might help it click for some of you: Suppose we assume there are 1 billion couples, and we cut a simulated computer run off after 1000 generations. There is one potential outcome in which there are 1 billion boys, and 0 girls. The fraction of girls in this outcome is 0. There is also a potential outcome in which there are 0 boys and 1 trillion girls. (Each of the billion couples has a string of 1000 girls without getting a boy, and then the simulation ends.) The fraction of girls in this outcome is 1. But it is far more likely that an actual simulation will hit that first outcome than the second, so the 0% value of girls will get a much higher weight (to calculate the expectation) than the 100% value will get. Note that it’s still true that for any given simulation, the expected number of girls equals the expected number of boys (I think it’s 1 billion girls and 1 billion boys?). What’s happening is that there is a small probability of getting a bunch of girls, and that drives up the expectation of the number of girls. But no matter how many girls there are, the fraction can’t ever be higher than 100%. So that’s the most such an outcome can “push up” the overall expectation of the *fraction*.
* So for all these reasons, I think Landsburg is basically correct, and that simulations would show this. However, let me now give a shout out to his critics…
* Most important, I think Landsburg has seized on a particular formulation of the problem that misses the spirit of the puzzle. I think the point of the question is, “Should we expect this type of parental stopping rule to give a tendency for more boys?” At first you think, “Of course it would!” but then you realize, “Wait a second, there are no men with brothers in this society, but some guys have a ton of sisters…” So you try to use intuition to figure out if the families with 1 boy more than offset the families with a ton of girls, and you then have the “aha!” moment and realize that we have no reason to expect there to be more boys than girls. In other words, if someone in a job interview described the couples’ intentions and then said, “So, if we let this population grow for many centuries and then took a look, would you expect to see more males or females?” I think that would be fine. That is the interesting part of the puzzle, not the quirks of the Expectation operator.
* The physicist makes a good point that the standard approach for calculating probabilities takes a family (or group of families) and runs through the various scenarios of how they could come up with a boy. In other words, the probability of a given family having 0 boys is 0, the way everybody is tackling this problem. And yet, if we want our answer to reflect a humongous (but finite) population in progress, surely it is stacking the deck in favor of the boys to do it this way.
* Extending the last point, what if we captured the finite aspect of it, not by taking C couples, but rather by fixing O total offspring? In other words, what if we said, “Choose any finite O number of total offspring for the population. What is the expectation of the fraction of girls?” I’m pretty sure the answer is 50%. The reason nobody models it this way, is that it’s not nearly as intuitive as taking the couple approach, because it gets tricky to figure out all the ways you could generate that many offspring. For example, for O = 1, you clearly are dealing with only one couple, and the answer is obviously 50%. For O = 2, you could be dealing with 1 or 2 couples. (If 1 couple, then 2 generations, while if 2 couples, then 1 generation.) I haven’t done this rigorously, but I’m pretty sure if you do it this way, for any finite O, the expected fraction of girls is 0.5. UPDATE: Actually this approach is much weirder than the couple approach, because you can’t come up with a good way of assigning probabilities to the different scenarios. E.g. with O = 2, if there’s 1 couple then the proportion of girls is 100% half the time, and 50% the other half. So the expected proportion of girls is 75%. But if there are 2 couples then it’s one generation and a coin toss, so the expected proportion of girls is 50%. But to figure out the overall expected proportion of girls, conditional on O = 2, you need to know how likely it is that there is 1 couple versus 2 couples. But that doesn’t really make any sense. (Also, for this particular O, notice that regardless of the probabilities, the expected proportion of girls is higher than 50%, unless I did something stupid. So probably the intuition of “let’s focus on the offspring, not the couples” is not nearly as good as I thought ten minutes ago.)
* As I said, there is no reason that the finite O approach is any stranger than the finite C approach, except for the fact that when we picture a specific iteration of the random process, we are picturing an initial group of couples who then throw on Barry White. But I think Landsburg is not giving due credit to his critics who are really upset at his particular approach to the solution, when alternate (and equally plausible) approaches would vindicate the “wrong” intuition.
* This leads to my final point: It is absolutely hilarious to see how heated the debate got in the comments at Landsburg’s blog. Landsburg kept calling one guy “too stupid to understand this,” and–surprise!–that only made the guy angrier. If people can have this much controversy and bullheadedness over a problem that is 95% math, no wonder we can’t get anywhere in the social sciences.
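
For anyone who wants to check the simulation claim rather than argue about it, here is a minimal sketch in Python. It is my own illustration, not anything Landsburg or his critics ran. The function names (`simulate_population`, `summarize`, `mean_ratio_uniform`) and the particular values chosen for C couples, the G-generation cutoff, and S runs are all assumptions made for the example. It reports the mean of the per-population fraction of girls, the pooled fraction across every child in every run, and the shares of runs with more boys and with more girls (reported separately, since ties are possible). It also does a quick Monte Carlo check of the E(X/Y) example.

```python
import random


def simulate_population(couples, max_children, rng):
    """One population: every couple keeps having children until the first boy,
    or until the cutoff of max_children is reached. Returns (girls, boys)."""
    girls = boys = 0
    for _ in range(couples):
        for _ in range(max_children):
            if rng.random() < 0.5:  # each birth is a fair coin toss
                boys += 1
                break
            girls += 1
    return girls, boys


def summarize(couples, max_children, runs, seed=0):
    """Run the simulation `runs` times and collect the statistics under debate.
    All names and default values here are hypothetical, chosen for illustration."""
    rng = random.Random(seed)
    total_girls = total_boys = 0
    fraction_sum = 0.0
    more_boys = more_girls = 0
    for _ in range(runs):
        g, b = simulate_population(couples, max_children, rng)
        total_girls += g
        total_boys += b
        fraction_sum += g / (g + b)  # every couple has at least one child
        if b > g:
            more_boys += 1
        elif g > b:
            more_girls += 1
    return {
        # mean over runs of (girls / children) within each population
        "mean_fraction_girls": fraction_sum / runs,
        # girls / children across the whole "universe" of simulated runs
        "pooled_fraction_girls": total_girls / (total_girls + total_boys),
        "share_more_boys": more_boys / runs,
        "share_more_girls": more_girls / runs,
    }


def mean_ratio_uniform(samples, seed=0):
    """Monte Carlo estimate of E(X/Y) for independent X, Y uniform on [1, 2]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += rng.uniform(1.0, 2.0) / rng.uniform(1.0, 2.0)
    return total / samples


if __name__ == "__main__":
    print(summarize(couples=10, max_children=50, runs=20000))
    print(mean_ratio_uniform(100000))
```

If the argument above is right, `mean_fraction_girls` should come out below 0.5 for small C while `pooled_fraction_girls` sits close to 0.5, and the gap should shrink as you raise `couples`. For a single couple with no cutoff, the expected fraction works out to 1 - ln(2), roughly 0.31, which makes a handy sanity check.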