Notes on The Book of Why


I read (most of) The Book of Why by Judea Pearl. I read the first four chapters carefully, and the remainder of the book less so, after I thought I understood its main arguments and conclusions. I may come back, particularly if I think I missed key aspects of it, but I will summarize here my current understanding and impressions.

For most of my reading I was somewhat put off by the constant claim of how revolutionary and new the content of the book was, when much of it seemed simply to reflect standard scientific methodology. At first I thought that the claim to a revolutionary new science (“Causal Inference”) perhaps reflected Pearl’s own trajectory, stemming from a computer science background and in routine dialogue with statisticians, and that perhaps he did not have a strong science background. But that is not the case: I understand he has actually made contributions to physics.

Where I finally saw a rupture with traditional academic practice was when I got to chapter 4 and his discussion of the back-door criterion for selecting the variables to control for when assessing models applied to observational data. Social scientists (to exemplify with the academic practice I am most familiar with), when trying to measure the potential impact of a variable X on Y, will typically “control for” any other variable that may also be impacting Y, under the argument that this isolates the effect of X and therefore avoids capturing a “spurious” correlation between X and Y, given that we are dealing with observational data and not a controlled experiment (at least in the typical case). This is where Pearl’s contribution became embarrassingly clear to me: he insists that modelers draw the causal diagram (the theoretical framework) they have in mind, and that they not control for variables that actually capture an indirect effect of X. I say embarrassingly both because I am probably guilty of controlling for variables I shouldn’t have in the past, and because I realize that what he is saying should be somewhat obvious…and yet that has not always been the case.

If I can summarize his main arguments, they seem to be that:

First, P(Y|X) ≠ P(Y|do(X)). In other words, if P(Y|do(X)) reflects the isolated causal effect of X on Y, the one we would observe in a controlled experiment (like a randomized control trial), then the P(Y|X) that we capture in observational data is something completely different, for at least two reasons: a) it captures the effects of other confounding variables; b) it actually tells us, in itself, nothing about causality. It is the old “correlation does not mean causality” argument. Up to this point, his argument seems to me nothing new. He claims, however, that the language is important to allow us to talk about causality: that the “do(X)” needs to be introduced in our lexicon. Fine.
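The gap between the two quantities is easy to see in a toy simulation (mine, not from the book, with made-up numbers): a confounder Z drives both X and Y, while X has no causal effect on Y at all. Conditioning on X=1 in the observational data still shifts the distribution of Y, because X=1 is evidence for Z=1; intervening with do(X=1) does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

z = rng.random(n) < 0.5                            # confounder
x = rng.random(n) < np.where(z, 0.9, 0.1)          # Z strongly raises P(X=1)
y = rng.random(n) < np.where(z, 0.8, 0.2)          # Z alone drives Y; X is not a cause

# Observational: condition on X=1 (confounded, since X=1 is evidence of Z=1)
p_y_given_x = y[x].mean()

# Interventional: do(X=1) severs the Z->X arrow and forces X=1 for everyone;
# since X is not a cause of Y, Y keeps its marginal distribution
p_y_do_x = y.mean()

print(f"P(Y|X=1)     ~ {p_y_given_x:.2f}")   # well above 0.5
print(f"P(Y|do(X=1)) ~ {p_y_do_x:.2f}")      # ~ 0.5
```

Here an observational analyst who reads P(Y|X=1) as a causal effect would be entirely misled, which is exactly the point of the do-notation.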

Second, causal diagrams are key to properly modeling the causal relationships to be tested against observed data. Consider the model below (Figure 1, not in the book; I made this up). Assume the average success of, say, high school basketball players in dunking the ball is modeled as a function of the average jumping height of each player, and that this jumping height, in turn, is a function of the average player height and, say, the frequency and intensity of jumping practice on the team. A typical econometric test for the effect of jumping practice on the average dunking success of the players might control for player height to isolate the effect of jumping practice. Pearl’s argument would be that there is no need to control for player height. One way to see this is that not controlling simulates the situation we would have in an RCT, where any other factors are assumed away because, on average, they should be the same in the treatment and control populations, assuming these are large enough. That said, in this case my understanding is that, if we do control for player height, no harm is done and the result should be the same, assuming the model is correct.
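A quick sketch of this Figure-1 structure (my own toy coefficients, not the book's) bears that out: when height and practice are independent causes of jumping height, controlling for height leaves the estimated practice effect essentially unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

height   = rng.normal(0, 1, n)                  # standardized player height
practice = rng.normal(0, 1, n)                  # independent of height
jump     = 0.5 * height + 0.8 * practice + rng.normal(0, 0.1, n)
dunk     = 1.0 * jump + rng.normal(0, 0.1, n)   # true practice effect on dunk: 0.8

def ols(y, *cols):
    """Least-squares coefficients of y on the given columns plus an intercept."""
    X = np.column_stack(cols + (np.ones(len(y)),))
    return np.linalg.lstsq(X, y, rcond=None)[0]

effect_plain      = ols(dunk, practice)[0]           # no control
effect_controlled = ols(dunk, practice, height)[0]   # controlling for height

print(effect_plain, effect_controlled)   # both ~ 0.8
```

Because height is not a common cause of practice and dunking success, including it costs nothing here beyond a slightly different standard error.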

Now assume the true model is the one in Figure 2, perhaps because height affects players’ expectations of benefiting from practice. Now we do need to control for player height to establish the effect of practice alone on average dunking success. Otherwise, when looking only at the two variables of jumping practice and dunking success, we may be capturing in part the effect of player height.
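Adding that single arrow (height also drives practice) to the same toy simulation turns height into a confounder, and the naive estimate becomes biased while the controlled one recovers the true effect. Again, the numbers are my own illustration, not the book's.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

height   = rng.normal(0, 1, n)
practice = 0.6 * height + rng.normal(0, 1, n)   # height now drives practice too
jump     = 0.5 * height + 0.8 * practice + rng.normal(0, 0.1, n)
dunk     = 1.0 * jump + rng.normal(0, 0.1, n)   # true practice effect: still 0.8

def ols(y, *cols):
    """Least-squares coefficients of y on the given columns plus an intercept."""
    X = np.column_stack(cols + (np.ones(len(y)),))
    return np.linalg.lstsq(X, y, rcond=None)[0]

effect_plain      = ols(dunk, practice)[0]           # biased upward by height
effect_controlled = ols(dunk, practice, height)[0]   # ~ 0.8 once height is blocked

print(effect_plain, effect_controlled)
```

The uncontrolled regression absorbs part of the height effect into the practice coefficient; closing the back-door path through height removes that bias.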

Let’s look at yet one other model (Figure 3), where jumping practice and average jumping height signal the likelihood of any high school basketball player also competing in the high jump event on the track and field team. Pearl would warn us against controlling for observed participation in the high jump event, because this would bias the measured effect of jumping practice (the “explain-away effect”).
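This collider case is the least intuitive, so here is a sketch of the Figure-3 structure with made-up numbers: practice and jumping ability independently raise the chance of competing in the high jump. Restricting the analysis to (i.e., conditioning on) high-jump participants induces a spurious negative link between the two causes and drags the practice estimate away from its true value.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

practice = rng.normal(0, 1, n)
jump     = rng.normal(0, 1, n)                               # independent of practice
dunk     = 0.8 * practice + 0.5 * jump + rng.normal(0, 0.3, n)
highjump = (practice + jump + rng.normal(0, 0.3, n)) > 1.0   # collider of both causes

def slope(y, x):
    """Simple-regression slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x)

effect_all      = slope(dunk, practice)                          # ~ 0.8, unbiased
effect_selected = slope(dunk[highjump], practice[highjump])      # biased downward

print(effect_all, effect_selected)
```

Among high-jump participants, a player with little practice must have high jumping ability to have made the team, so conditioning on participation makes practice look less effective than it is.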

More generally, Pearl proposes the following rules when deciding what to control or not for, to be able to mimic the effect of RCTs in observational data (Pearl and Mackenzie (2018), pgs 157-158):

a) In a chain junction, A→B→C, controlling for B prevents information about A from getting to C or vice versa;

b) Likewise, in a fork or confounding junction A←B→C, controlling for B prevents information about A from getting to C or vice versa;

c) Finally, in a collider, A→B←C, exactly the opposite rule holds. The variables A and C start out independent, so that information about A tells you nothing about C. But if you control for B, then information starts flowing through the “pipe,” due to the explain-away effect.

[…]

d) Controlling for descendants (or proxies) of a variable is like “partially” controlling for the variable itself.

The idea is that we would not want to control for mediators (B in item “a” above), colliders (B in item “c” above), or proxies (B in item “d” above) but we do want to control for confounders (B in item “b” above), all represented in Figure 4 below.

Pearl actually discusses controlling more in terms of paths than of variables. He calls “back-door adjustment” ensuring that any path connecting the variables X and Y other than the causal path we wish to test is blocked (by appropriate controls), and that there are no blockers in the path we do want to test.
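Once the back-door paths are identified, the adjustment itself is just the formula P(Y|do(X)) = Σz P(Y|X, z) P(z): estimate the X-contrast within each stratum of the blocking set Z and average with the marginal weights of Z. A minimal sketch on simulated observational data with one binary confounder (my own toy setup):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000

z = rng.random(n) < 0.5                             # binary confounder
x = rng.random(n) < np.where(z, 0.8, 0.2)           # Z raises P(X=1)
y = rng.random(n) < np.where(z, 0.6, 0.2) + 0.2 * x # true effect of X: +0.2

# Naive observational contrast, confounded by Z
naive = y[x].mean() - y[~x].mean()

# Back-door adjustment: X-contrast within each stratum of Z, weighted by P(z)
adjusted = sum(
    (y[x & (z == v)].mean() - y[~x & (z == v)].mean()) * (z == v).mean()
    for v in (True, False)
)

print(f"naive ~ {naive:.2f}, adjusted ~ {adjusted:.2f}")   # adjusted ~ 0.20
```

The naive contrast mixes the X effect with the fact that X=1 players mostly come from the Z=1 group; the stratified average recovers the +0.2 an RCT would measure.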

As Pearl went on discussing RCTs, instrumental variables, and observational studies controlling for confounding variables, I grew increasingly intrigued by what I might be missing that was so revolutionary:

  • Perhaps it is the emphasis on the diagrams, which I do find useful, although I still have trouble thinking of them as revolutionary, but rather as a way to bring clarity to our models. I would have benefited from doing this, for example, during my graduate studies where, yes, my tendency was to control for everything under the sun without clearly realizing the implications.
  • Perhaps it is the do-calculus, and I may need to find more examples where it is used to see how this generates responses that otherwise we would not have.

What I was not able to find, however – and it may be just me missing it – was some discussion of how we can use observed data not just to reject assumed causal relationships, but to help us better define our causal models. Most of the book “thinks” in a very traditional scientific way: from model to data. There is some discussion towards the end of how data mining can help direct our focus to certain correlations (and, thus, potential causal connections to be investigated). We also know there are things we can do to help at least inform how robust our models are to the assumptions we make, such as sensitivity analysis, which he touches on only very briefly in chapter 5. It also seems to me that Bayesian Networks, described and discussed in the book, should be useful in feeding into the reverse discussion, from evidence to models. However, Pearl seems to go out of his way to constantly make the point that the data themselves say nothing about causality, without discussing where, then, our causal reasoning comes from. It seems we are simply wired for it, or, as he discusses, it comes from our imagination. Perhaps it was just my misplaced expectation that this book would explore this further.

In the future I hope to further explore some of the aspects of this book and Pearl’s thinking that I have not adequately covered here: particularly Bayesian Networks, the causal ladder and the role of imagination.

Sources:

Pearl, Judea, and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. New York: Basic Books.
