29 Dec 2010

## Fallacies Multiplying Like Rabbits

[Non-typo UPDATE below.]

OK I need to get this out of my system, so I can proceed to more important things… If you need to get up to speed on Landsburg’s population brain teaser, go to this post. Here, I am going to assume every reader is hip deep in the problem and needs my careful coaching to escape the black hole into which I nearly fell myself, lo this past fortnight.

So here are my thoughts:

* I am assuming we are talking about the population of children (not the parents), and that nobody dies. The kids don’t in turn reproduce; there is just an initial set of couples, who then start cranking out babies.

* I am not dead certain, but I am very very confident, that Landsburg’s (modest) point is correct. In particular, I think if we did computer simulations, he would win. That is to say, for any finite number C of couples, and for any finite number G of generations (where in principle, a given couple could have a string of G girls and not a boy when the cutoff is reached), if we ran that simulation a large but finite number S times, we would observe: (A) More often than not, the population would have more boys than girls, and (B) for each of the S simulations, if we calculated the fraction of girls in the population (of children), and then took the mean of that calculation over all S simulations, the answer would be lower than 0.5.

* The important thing going on here is that “mathematical expectation” means something very specific, and in some contexts it is misleading. For example, consider two independent random variables X and Y that are each uniformly distributed on the interval [1, 2]. The expectation of each is the midpoint, i.e. 1.5. However, if you calculate the expectation of (X/Y) it is bigger than 1. (Specifically it’s 3ln(2)/2, about 1.04.) Of course by symmetry that means the expectation of (Y/X) > 1 as well. So at first glance that’s weird. It seems like you’re saying you “expect” X to be bigger than Y, and Y to be bigger than X, at the same time. But that’s not what you’re saying. The general moral is that the ratio of expectations is not equal to the expectation of the ratio. I.e. even though E(X) = E(Y), it’s not true that E(X/Y) = 1, even though these are independent variables.

* Back to the population puzzler: Yes, it is certainly true that nature doesn’t care what rules the couples are using. Every time someone has another kid, there is a 50/50 chance it is a girl. But this doesn’t prove what a lot of people think it does. It’s true that within any population, or even within an individual family, the expected number of boys equals the expected number of girls. I.e. E(G) = E(B). But as we just saw, that fact by itself doesn’t mean we can conclude that E(G/B) = 1, or that E(G/(G+B)) = 0.5

* What IS true is that if we look at the entire set of children IN ALL POSSIBLE POPULATIONS, then E(G/(G+B)) = 0.5. But that’s not what the question asked.

* What Landsburg interpreted the question as asking, is the mathematical expectation of the fraction of girls *for any particular population* that is randomly drawn from the set of all possible populations. For populations consisting of any finite C number of initial couples (who then crank out kids), that answer is quite simply NOT 0.5. It’s true, as C goes to infinity, the answer approaches 0.5. It’s also true that most people would have assumed that the spirit of the question meant, “For arbitrarily large populations.” But Landsburg (quite fairly, in my opinion) doesn’t think this means the basic intuition is right. No, people are skipping a huge step in the problem, when they go from the observation “each child is 50/50 a girl” to “I think in a large enough population, the expected fraction of girls is 50%.”

* Specifically, here’s what’s going on: The set of all possible populations has populations with DIFFERENT SIZES. There are a bunch of possible populations where many of the couples have one boy and no girls. In such populations, the fraction of girls is less than 0.5. On the other hand, there are a bunch of possible populations where many of the couples have tons of girls, and one boy. In such populations, the fraction of girls is greater than 0.5. However, if we assign the same weight to each potential population, even though some have more children than others, then the mean of all such fractions will be smaller than 0.5. In other words, in fewer than 50% of the possible populations, will there be more girls than boys.

* We know this is correct, and ironically, for the very reason that the Landsburg-haters keep harping on. Clearly, in the whole UNIVERSE of possible populations, the ratio is 0.5. In other words, there are as many boys as girls, if we look at the entire group of “possible children” in the universe. Now then, in one potential population that has more boys than girls, how is that going to happen? With the particular stopping rule under consideration, that will only happen when there are a lot of families with just one boy, and not so many families with a ton of girls. In contrast, the potential populations with more girls, will be ones where there aren’t many families with just one boy, i.e. there are a lot of families who had a string of girls before getting a boy. So this means that for a finite C number of couples, if you look at populations with more boys than girls, you will see small populations, while if you look at populations with more girls than boys, you will see big populations.

* Do the Landsburg Haters feel the vice closing? If you already admit–proudly in fact–that in the total universe of all possible offspring, there is an equal number of boys and girls, and you now see (though perhaps you didn’t think about it much before) that an actual population with more boys will have a smaller population, while a population with more girls will have a big population, then maybe it is dawning on you what Landsburg is saying: There are MORE POPULATIONS with a ratio of girls smaller than 0.5. The way the “total offspring in all possible populations” achieves the perfect 0.5 ratio, is that there are a (relatively small) number of possible populations with a ton of girls. You can’t have it both ways: If you still insist that in any randomly drawn population, the ratio of girls is just as likely to be under as over 0.5, then you must think that in the set of all possible offspring (across all potential populations), there are more girls than boys. (Or, you must deny that populations with a higher ratio of girls, have more total children than populations with a smaller ratio of girls.)

* This might help it click for some of you: Suppose we assume there are 1 billion couples, and we cut a simulated computer run off after 1000 generations. There is one potential outcome in which there are 1 billion boys, and 0 girls. The ratio of girls to boys in this outcome is 0. There is also a potential outcome in which there are 0 boys and 1 trillion girls. (Each of the billion couples has a string of 1000 girls without getting a boy, and then the simulation ends.) The ratio of girls to boys in this outcome is 1. But it is far more likely that an actual simulation will hit that first outcome than the second, so the 0% value of girls will get a much higher weight (to calculate the expectation) than the 100% value will get. Note that it’s still true that for any given simulation, the expected number of girls equals the expected number of boys (I think it’s 1 billion girls and 1 billion boys?). What’s happening is that there is a small probability of getting a bunch of girls, and that drives up the expectation of the number of girls. But no matter how many girls there are, the fraction can’t ever be higher than 100%. So that’s the most such an outcome can “push up” the overall expectation of the *fraction*.

* So for all these reasons, I think Landsburg is basically correct, and that simulations would show this. However, let me now give a shout out to his critics…

* Most important, I think Landsburg has seized on a particular formulation of the problem that misses the spirit of the puzzle. I think the point of the question is, “Should we expect this type of parental stopping rule to give a tendency for more boys?” At first you think, “Of course it would!” but then you realize, “Wait a second, there are no men with brothers in this society, but some guys have a ton of sisters…” So you try to use intuition to figure out if the families with 1 boy more than offset the families with a ton of girls, and you then have the “aha!” moment and realize that we have no reason to expect there to be more boys than girls. In other words, if someone in a job interview described the couples’ intentions and then said, “So, if we let this population grow for many centuries and then took a look, would you expect to see more males or females?” I think that would be fine. That is the interesting part of the puzzle, not the quirks of the Expectation operator.

* The physicist makes a good point that the standard approach for calculating probabilities takes a family (or group of families) and runs through the various scenarios of how they could come up with a boy. In other words, the probability of a given family having 0 boys is 0, the way everybody is tackling this problem. And yet, if we want our answer to reflect a humongous (but finite) population in progress, surely it is stacking the deck in favor of the boys to do it this way.

* Extending the last point, what if we captured the finite aspect of it, not by taking C couples, but rather by fixing O total offspring? In other words, what if we said, “Choose any finite O number of total offspring for the population. What is the expectation of the fraction of girls?” I’m pretty sure the answer is 50%. The reason nobody models it this way, is that it’s not nearly as intuitive as taking the couple approach, because it gets tricky to figure out all the ways you could generate that many offspring. For example, for O = 1, you clearly are dealing with only one couple, and the answer is obviously 50%. For O = 2, you could be dealing with 1 or 2 couples. (If 1 couple, then 2 generations, while if 2 couples, then 1 generation.) I haven’t done this rigorously, but I’m pretty sure if you do it this way, for any finite O, the expected fraction of girls is 0.5. UPDATE: Actually this approach is much weirder than the couple approach, because you can’t come up with a good way of assigning probabilities to the different scenarios. E.g. with O = 2, if there’s 1 couple then the proportion of girls is 100% half the time, and 50% the other half. So the expected proportion of girls is 75%. But if there are 2 couples then it’s one generation and a coin toss, so the expected proportion of girls is 50%. But to figure out the overall expected proportion of girls, conditional on O = 2, you need to know how likely it is that there is 1 couple versus 2 couples. But that doesn’t really make any sense. (Also, for this particular O, notice that regardless of the probabilities, the expected proportion of girls is higher than 50%, unless I did something stupid. So probably the intuition of “let’s focus on the offspring, not the couples” is not nearly as good as I thought ten minutes ago.)

* As I said, there is no reason that the finite O approach is any stranger than the finite C approach, except for the fact that when we picture a specific iteration of the random process, we are picturing an initial group of couples who then throw on Barry White. But I think Landsburg is not giving due credit to his critics who are really upset at his particular approach to the solution, when alternate (and equally plausible) approaches would vindicate the “wrong” intuition.

* This leads to my final point: It is absolutely hilarious to see how heated the debate got in the comments at Landsburg’s blog. Landsburg kept calling one guy “too stupid to understand this,” and–surprise!–that only made the guy angrier. If people can have this much controversy and bullheadedness over a problem that is 95% math, no wonder we can’t get anywhere in the social sciences.
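For readers who would rather check the claims in the bullets above without trusting a random-number generator, here is a minimal sketch in Python that enumerates every possible population exactly. It assumes each birth is an independent 50/50 draw, that each couple stops at its first boy, and that a couple is cut off after `max_births` children (playing the role of G above). The function names and the `couples`/`max_births` parameters are illustrative choices of mine, not anything from Landsburg's post.

```python
from collections import defaultdict
from fractions import Fraction


def family_distribution(max_births):
    """Exact distribution of (girls, boys) for one couple that stops at the
    first boy, or after max_births children, whichever comes first."""
    dist = {}
    for g in range(max_births):
        dist[(g, 1)] = Fraction(1, 2 ** (g + 1))  # g girls, then a boy
    dist[(max_births, 0)] = Fraction(1, 2 ** max_births)  # all girls, cut off
    return dist


def population_distribution(couples, max_births):
    """Exact distribution of total (girls, boys) over independent couples.
    Assumes couples >= 1 and max_births >= 1."""
    fam = family_distribution(max_births)
    pop = {(0, 0): Fraction(1)}
    for _ in range(couples):
        nxt = defaultdict(Fraction)
        for (g, b), p in pop.items():
            for (fg, fb), q in fam.items():
                nxt[(g + fg, b + fb)] += p * q
        pop = dict(nxt)
    return pop


def summarize(couples, max_births):
    """Expected counts, expected girl fraction, and how often each sex leads."""
    pop = population_distribution(couples, max_births)
    return {
        "E_girls": sum(p * g for (g, b), p in pop.items()),
        "E_boys": sum(p * b for (g, b), p in pop.items()),
        "E_girl_fraction": sum(p * Fraction(g, g + b) for (g, b), p in pop.items()),
        "P_more_boys": sum(p for (g, b), p in pop.items() if b > g),
        "P_more_girls": sum(p for (g, b), p in pop.items() if g > b),
    }


if __name__ == "__main__":
    for name, value in summarize(couples=4, max_births=10).items():
        print(name, float(value))
```

As a hand-checkable case, one couple with a cutoff of two births can end up as B, GB, or GG with probabilities 1/2, 1/4, and 1/4. The expected number of girls and of boys is 3/4 each, yet the expected fraction of girls is 3/8. Ties between boys and girls are counted in neither `P_more_boys` nor `P_more_girls`.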

#### 20 Responses to “Fallacies Multiplying Like Rabbits”

1. Steven E. Landsburg says:

Bob: Thanks for this thoughtful and accurate post.

I will note only that if you look through my comments, you will find that I used harsh words only with commenters who persisted in repeating the same incorrect arguments over and over and over and over and over and over and over and over, after I’d carefully explained why those arguments were incorrect, and without ever even acknowledging those responses. (I was gentle with those who acknowledged the responses but failed to understand them.)

• bobmurphy says:

I understand why you said what you did. BTW you know that the chance of having a girl is 50%, right? And that there are countries with more than 1 family?

2. Joseph says:

Landsburg’s argument about the expectation is right but I think he is being unreasonable. Calculating the expectation of the proportion is not necessarily the best way to answer the question “what proportion are girls?”

In fact, rather than looking at the definition of expectation, let’s look at the definition of probability itself. If the probability of an event is 1/2, that means by definition that the long-run proportion of times we get the event happening is 1/2.

Landsburg claims that the “intuitive” answer is wholly wrong. But I think there is a valid line of argument to give the intuitive answer. We just have to note that the problem asked for the fraction in the population. The population is “big,” therefore the long-run proportion is the probability.

Much of the difficulty of probability-based questions like this is knowing how to correctly mathematically formulate the question. We need to work out when we are viewing questions as being asked and what the possible answers are.

First of all, the initial experiment of children being born must already have occurred before you choose O of them; otherwise, there is a small probability that fewer than O children will be born. You must then choose O children without knowing their genders. It doesn’t matter how many couples the children came from; the expectation is still 0.5. There is an elegant argument for this. Expectation is additive, the problem is symmetric in boys and girls, so 1 = proportion of girls + proportion of boys, so E(1) = 1 = E(proportion of girls + proportion of boys) = E(proportion of girls) + E(proportion of boys) = 2.E(proportion of girls), so E(proportion of girls) = 1/2.

In fact, this leads to an interesting point. If we know the number of children, the expectation is 1/2. This may seem to be a contradiction, but it depends on what we are ranging over. A similar phenomenon is http://en.wikipedia.org/wiki/Simpson%27s_paradox.

3. Joseph says:

Actually, I don’t think what I’ve said above about the “finite O approach” is correct. The problem is not symmetric in boys and girls. (E.g. if there is only one pair of parents, there is only one allowed fraction of girls for each number of children.)

(Can I mention that it is difficult to tell whether comments have been received or not on this blog? I assume they are awaiting moderation but there is a chance they have been “lost in the aether.”)

4. Yancey Ward says:

Dr. Murphy,

I don’t know if it makes any difference to your analysis (my head is still hurting), but there is something wrong with this part of your discussion:

There is one potential outcome in which there are 1 billion boys, and 0 girls. The ratio of girls to boys in this outcome is 0. There is also a potential outcome in which there are 0 boys and 1 trillion girls. (Each of the billion couples has a string of 1000 girls without getting a boy, and then the simulation ends.) The ratio of girls to boys in this outcome is 1

• bobmurphy says:

Yancey, do you just mean I’m being sloppy with “fraction,” “ratio,” “proportion,” etc.? Yeah, I caught some of those mistakes when I re-read it, but was too lazy to fix it. Everyone should take these things with a grain of salt. E.g. in the quotation above, obviously what I meant to say in the last sentence was, “The ratio of girls to total children in this outcome is 1.”

5. f4kingit says:

I remember one point in my life (during Calc classes in college) where I could follow stuff like this, but sadly, no longer. However, I’m very impressed by the likes of Dr. Murphy and others! How did you keep all this stuff in your brain after college?

6. Evan says:

OK, I get that such a policy would mean that 50% of the families would be 100% Boys, and since at least some of the remaining 50% of families will also have at least 1 Boy, the expected B:G ratio of a randomly-selected family would necessarily be greater than 1:1, since we are not weighting the ratio by # of children.

What I can’t quite conceptualize, though, is why this would hold when we discuss “countries”. In a country, composed of an aggregation of individual families, wouldn’t we expect the larger number of families with 1B:0G and the smaller number of families with a low B:G ratio to exactly balance out for an aggregate ratio of 1:1? Does Landsburg’s answer still make sense if we move beyond randomly selecting individual families?

I also don’t get how, if E(X) = 1.5 and E(Y) = 1.5, that it doesn’t follow that E(X/Y)=1. What else would it be? If it were just X=1.5 and Y=1.5, then clearly X/Y=1. Why does the Expectation change anything?

• bobmurphy says:

Evan, I don’t have the energy to argue about the population stuff right now; maybe someone else can grab the baton. I get what you’re saying, but I still think Landsburg is right. Keep in mind that in practice, any particular population (not just family) will not have exactly 50% boys and 50% girls. So if you think about the types of populations that have more boys, you realize it is the ones that have lots of little families. So when Nature is distributing the equal numbers of boys and girls across all potential populations, she dumps a bunch of girls into a relatively small number of them. That’s why more such populations (not just families) have a higher fraction of boys, even though the total fraction of all potential girls divided by total potential children is 50%.

As far as the E(X/Y), I can’t give you an intuitive answer. Just do it. You take the integral from y=1 to y=2 of (1/y)dy, times the integral from x=1 to x=2 of xdx. (You technically are also multiplying each by the pdf, but since it’s uniform distribution the pdf is 1 everywhere.)

So the inner integral is 1/2(x^2), evaluated at x=2 minus x=1. So that’s where the 3/2 comes from.

Then the outer integral is ln(y) evaluated at y=2 minus y=1, i.e. ln2-0.

So the overall answer is 3ln(2)/2, and since ln(2) is slightly bigger than 2/3, the whole thing is bigger than 1.
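The 3ln(2)/2 figure can also be checked numerically. The sketch below is an illustration only: it draws X and Y independently and uniformly on [1, 2] and averages X/Y. The function name `mean_ratio`, the sample count, and the seed are arbitrary choices of mine.

```python
import math
import random


def mean_ratio(samples, seed=0):
    """Monte Carlo estimate of E(X/Y) for independent X, Y ~ Uniform[1, 2]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x = rng.uniform(1.0, 2.0)
        y = rng.uniform(1.0, 2.0)
        total += x / y
    return total / samples


if __name__ == "__main__":
    exact = 1.5 * math.log(2)  # E(X) * E(1/Y) = (3/2) * ln(2)
    print("estimate:", mean_ratio(200_000))
    print("exact:   ", exact)
```

Because X and Y are independent, E(X/Y) factors into E(X) times E(1/Y), and E(1/Y) = ln(2) is larger than 1/E(Y) = 2/3. That gap is the whole effect.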

7. Silas Barta says:

Excellent summary and resolution, Bob.

8. Randy says:

Mr. Murphy,

I do not presume to settle the issue, but I do have difficulty following your logic. To me, the biggest problem is in this statement:

“However, if we assign the same weight to each potential population, even though some have more children than others, then the mean of all such fractions will be smaller than 0.5. In other words, in fewer than 50% of the possible populations, will there be more girls than boys.”

I would draw the exact opposite conclusion! If, for example, we were to calculate the expected ratio for a single family in this manner (giving equal weight to all potential populations B, GB, GGB, GGGB, GGGGB, etc.) we would arrive at a ratio much greater than 0.5, wouldn’t we? Wouldn’t it be true, in this case, that in MORE than 50% of the possible populations, there would be more girls than boys?

I understand your point about larger populations having a higher ratio than smaller populations, but I don’t see the relevancy. If we ran a boatload of simulations (using the same number of families), I suspect we would find that there were some population sizes that would occur far more frequently than others, a “convergent” size, if you will. In calculating the expected ratio, we wouldn’t give equal weight to all potential population sizes, but rather we would weight populations more heavily as they approached the convergent size. In other words, if we were to pick one result randomly from our simulation results, we would be much more likely to choose a population closer to our convergent size than an “extreme” result.

Another interesting way to look at it is to imagine that we pre-determined the birth outcomes with a flip of a coin, creating a super-duper loooong results string. It would look like this: BBGBGGBBGBGGGBGBBBGBGB… Then, suppose we determined our population makeup for any given number of families by moving along this results string, assigning children to them until all our families had managed to produce boys. We could use the same string and run the simulation for 1000, 2000, 10000, however many families we liked. Each time we did it, we would stop at a different place in the string. However, examining each substring, we would still arrive at the 50:50 ratio result over the long haul. We could even create a huge number of these results strings and run our simulation for each number of families over every individual results string. No matter where we ended up, or how we measured the makeup of our resulting “substrings” — even if we tried various weightings for larger vs. smaller populations — the makeup of all the resulting populations (the substrings) would remain 50:50.

Now we could, if we so chose, purposely create some crazy strings (like BBBBBBBBBBBBBBBBB or GGGGGGGGGGGGG) and run those through the simulation, but this would completely invalidate the results! The point is not to cover all possible results strings with equal weight, as you seem to be suggesting, but to generate every string with a fair coin every time, which would obviously lead us to the 50:50 result.

Anyway, keep up the good fight! I’m enjoying your book. No other economics textbook would contain the sentence:

“Rather than grabbing the coconut or watch, he could have decided to use his hands to punch himself in the face.”

9. Randy says:

Forgive me one final thought:

Imagine we were bound and determined to introduce a legitimate extreme GGGGGGGGGGGGGGGG string into our simulation. We could keep flipping coins until we finally managed to generate one. But wouldn’t we then be bound to also include in our simulation the thousands upon thousands of mundane 50:50 style strings we created along the way? And wouldn’t these serve, ultimately, to offset the effect our extreme string had on our simulation?

Happy New Year!

10. Sieben says:

I just ran a simulation. The outcome was 50%, which is annoying.

The simulation is over 100000 families. The matlab code looks like this.

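    % Simulate 100000 families; each keeps having children until the first boy.
    boys = 0;
    girls = 0;
    for i = 1:100000
        boybirth = 0;
        while boybirth == 0
            birth = rand;           % uniform draw on (0,1)
            if birth > .5
                boybirth = 1;       % a boy: this family stops
            else
                girls = girls + 1;  % a girl: this family tries again
            end
        end
        boys = boys + 1;            % every family ends with exactly one boy
    end

    % Fraction of boys among all children, pooled over every family
    fraction = (boys)/(boys+girls)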

Families have, on average, 2 kids… :I

Funnily, if families are RESTRICTED to 2 children per family, the fraction of boys increases to .53, and the average # of kids per family decreases to 1.877

• bobmurphy says:

Sieben,

That’s what Landsburg would have expected. As the number of families approaches infinity, the expected fraction approaches 0.5.

What you should do is test, say, a population of 4 families, but do it 100,000 times. If the simple insight “every birth has 50/50 chance of being a girl” is really all there is to this, then that overall average should be close to 0.5. But Landsburg predicts it will be noticeably lower.
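The test described above could be sketched as follows. This is a separate illustration in Python rather than a continuation of the MATLAB snippet earlier in the thread, and it assumes independent 50/50 births with every family continuing until its first boy (no generation cutoff). The names `simulate_country` and `girl_fractions` are mine.

```python
import random


def simulate_country(families, rng):
    """Return (girls, boys) for one country whose families each stop at a boy."""
    girls = 0
    boys = 0
    for _ in range(families):
        while rng.random() < 0.5:  # treat this half of the draws as a girl
            girls += 1
        boys += 1  # the draw that ended the loop is the family's one boy
    return girls, boys


def girl_fractions(families, runs, seed=0):
    """Return (mean of per-country girl fractions, pooled girl fraction)."""
    rng = random.Random(seed)
    sum_of_fractions = 0.0
    total_girls = 0
    total_children = 0
    for _ in range(runs):
        g, b = simulate_country(families, rng)
        sum_of_fractions += g / (g + b)
        total_girls += g
        total_children += g + b
    return sum_of_fractions / runs, total_girls / total_children


if __name__ == "__main__":
    per_country, pooled = girl_fractions(families=4, runs=100_000)
    print("mean of per-country fractions:", per_country)
    print("pooled fraction of girls:     ", pooled)
```

The two numbers answer different questions. The pooled figure lumps all children from all runs together and should sit near 0.5, while the per-country mean gives each simulated country equal weight regardless of its size. For four families that mean should come out noticeably below 0.5, somewhere around 0.44.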

• Daniel Hewitt says:

Finally read all of this (well, most of it) and Landsburg did a good job explaining why lower populations will be more skewed towards boys, therefore 50% as a blanket answer is wrong. But the other guy, in the comments on his blog, acknowledged this right away…..

“every birth has 50/50 chance of being a girl” seems to be an assumption that everyone made, but the problem statement does not mention it. In real life, boys born outnumber girls.

11. Randy says:

Having stepped back a little bit from the problem, it seems quite clear to me that Landsburg has used an arcane interpretation of the wording of the problem to convince himself and others that there is some non-intuitive result that only a select few are smart & clever enough to understand.

It seems abundantly clear to me that the problem, as originally worded, isn’t meant to determine if a person fully understands the mathematical properties of the expectation function, if they are diligent enough to also consider countries that consist only of 4 families, or whether or not the measurement of the fraction differs if we allow all families to complete themselves or not before measuring.

The problem is meant to conjure in us a vision of a modern country, consisting of millions of families, all procreating according to the given algorithm, but at their own pace, and the measurement the problem asks for is meant to be taken at any point in time.

I hope you will revisit this craziness sometime in the future, when you’ve had a chance to regain some perspective, and acknowledge that the answer to the problem really is 50% and that only through extreme interpretation and arcane mental gymnastics does one arrive at any other conclusion. After all, you are an Austrian, right? This seems like the kind of crazy “I’ve figured out something incredible” sort of thinking that a Keynesian macroeconomist would engage in to convince us that the state really can — honestly, just look at the math! — put our money to better use than the market can.

• bobmurphy says:

Randall,

I hope to revisit this within the upcoming week. Feel free to remind me if I don’t. I understand what you are saying, but I think you are still wrong. What’s good is that we can test it with computers. I.e. we could agree on a simulation that captures what you think the spirit of the question is, and Landsburg would still think it wouldn’t be 50% in that case. Then he will bet you.

12. Randy says:

Mr. Murphy,

I’ve run the sim that you suggest here, the result is .43, assuming I am calculating the fraction the right way. What I’m doing is this: After simulating all the families in the country, I calculate the ratio of girls to boys. Then, I average this ratio equally over all simulations.

But I have to ask — what country in the world contains only 4 families?

Landsburg introduces us to his brilliant non-intuitive answer with a story about families living on his block, pointing out that the ratio of girls in the average family isn’t the same as the ratio of girls on the block. Fine. But then he decides to promote families to “country” status, and this is just wrong. There is a big difference between a family and a country. A family is the atomic decision-making unit in the problem. A country is a container for an overabundance of INDEPENDENT families.

Go back over Landsburg’s family vs. block analogy and ask yourself this question: Is a country more like a single family, or a block of families?

If we really wanted to equate a country with a family in the way Landsburg tries to, we’d need a country of strictly DEPENDENT families, where the results of every family’s birth in the entire country were the same, every time. We might as well compute the ratio of girls-to-unicorns in such a magical place!

The results of my sim bear all this reasoning out. If I reduce the number of families in my country to 1, I get .30. As I increase the number to something even remotely resembling a mid-sized town, it quickly approaches .5
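The .30 figure for a single family agrees with a short calculation, sketched here under the assumption that the family keeps going until its first boy with no cutoff. Then B = 1, the number of girls G equals g with probability 1/2^(g+1), and substituting n = g + 1:

```latex
\begin{aligned}
E\!\left[\frac{G}{G+1}\right]
  &= \sum_{g=0}^{\infty} \frac{1}{2^{\,g+1}} \cdot \frac{g}{g+1}
   = \sum_{n=1}^{\infty} \frac{1}{2^{n}} \left(1 - \frac{1}{n}\right) \\
  &= 1 - \sum_{n=1}^{\infty} \frac{(1/2)^{n}}{n}
   = 1 - \ln 2 \approx 0.307
\end{aligned}
```

The last step uses the series for -ln(1 - x), which is the sum of x^n/n over n ≥ 1, evaluated at x = 1/2.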

13. Randy says:

Actually, a better question for you:

What fraction of the population on Landsburg’s block is female?