Statistical Inference, Observational Data and Machine Learning
- Post author: Alex Uriarte
- Post published: April 28, 2025
- Post category: Fan

I spent some time thinking about where statistical inference enters the predictions made by Machine Learning (ML) models, and came to the conclusion that, most (if not all) of the time, the answer is nowhere. The reason is the typical use of observational data by ML models. Here is my reasoning.
Statistical Inference
Statistical inference consists of using observed events to draw conclusions about the underlying process that generated them. Most commonly, this is used to draw conclusions about an unobserved population based on an observed sample.
Perhaps the most important tool in statistical inference is random sampling. Random sampling consists of selecting subsets of a population (samples) at random, that is, through some experiment where the outcome cannot be predicted in advance and where each unit of observation has the same chance of being selected for the sample.
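To make this concrete, here is a minimal sketch of simple random sampling, assuming NumPy is available; the finite "population" and all numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical finite population of 100,000 values (made up for illustration).
population = rng.lognormal(mean=10, sigma=0.5, size=100_000)

# Simple random sample without replacement: the draw cannot be predicted in
# advance, and every unit has the same chance of ending up in the sample.
sample = rng.choice(population, size=500, replace=False)

print(f"Population mean: {population.mean():,.0f}")
print(f"Sample mean:     {sample.mean():,.0f}")
```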
Random samples have a few properties that allow us to make statements about the population (or the generating process). In particular, if we have a random sample:
- The law of large numbers (LLN) states that the average of the variables in that sample converges to the average of those variables in the population as the size of the sample grows.
- The central limit theorem (CLT) states that, as multiple random samples are pulled from the same population, the average of the samples is distributed approximately normally and can be normalized to follow a standard normal distribution (with mean equal to 0 and variance equal to 1). A short simulation of both results follows this list.
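This is a small simulation sketch of the LLN and the CLT, assuming NumPy; the "population" is an arbitrary skewed distribution chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# An arbitrary skewed "population" with true mean 2.0 (illustrative only).
population = rng.exponential(scale=2.0, size=100_000)
true_mean, true_sd = population.mean(), population.std()

# LLN: the sample mean gets closer to the population mean as the sample grows.
for n in (10, 100, 10_000):
    sample = rng.choice(population, size=n, replace=False)
    print(f"n={n:>6}: sample mean = {sample.mean():.3f} (population mean = {true_mean:.3f})")

# CLT: across many random samples of a fixed size (drawn with replacement here,
# for speed), the standardized sample means are approximately standard normal.
n, n_samples = 200, 2_000
means = np.array([rng.choice(population, size=n).mean() for _ in range(n_samples)])
z = (means - true_mean) / (true_sd / np.sqrt(n))
print(f"standardized sample means: mean = {z.mean():.2f}, variance = {z.var():.2f}")
```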
Random sampling, the LLN and the CLT are all key in allowing us to draw conclusions about a population (or generating process) based on an observed sample.
Note that all the above refers to statistical inference, not causal inference. In statistical inference we draw conclusions about a population based on a sample. In causal inference we draw conclusions about how one factor may cause another. Random sampling by itself does not help with causal inference the way it does with statistical inference. Causal inference is where, for example, randomized controlled trials (RCTs) are particularly powerful. But that may be the subject for another post.
Observational Data
So what do you do when you have a set of data that was not generated through some random process, but rather where the data were simply observed? As far as I understand, the only real option is to limit ourselves to the analysis of the sample data itself, that is, to draw no inferences about the generating process. This, however, is rarely the purpose of data analysis. We most often analyze data in order to draw conclusions about the generating process and to predict future outcomes, not just to understand past observed data.
The more common practice seems to be to continue the analysis of the data as if they were a random sample from some generating process. For example, we often see econometric models where someone conducts a linear regression on observational data, estimates the p-values of the parameters (the coefficients of the regression), and then interprets those p-values as the probability of observing coefficient values as far from zero or farther than the ones estimated, in a situation where the true value of the coefficients in a hypothetical population were in fact zero. In other words, if they see a p-value low enough (say 5%), they say that in only 5% of samples extracted from such a population would we estimate a coefficient as far from zero as the one we estimated, and therefore they conclude that there is a very good chance that the true coefficient in the population is in fact not zero. They then celebrate that they found a “significant” correlation in the data. But all of this interpretation assumes that the observed data were randomly sampled from some generating process. With observational data, that is not the case. So the use of statistical inference, the calculation of p-values and hypothesis testing make no sense.
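For concreteness, the sketch below reproduces the mechanics being described, assuming NumPy and statsmodels are available; the data are synthetic and the coefficient value is made up. The point is that the reported p-values are only meaningful under the random-sampling story just described.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(seed=1)

# Synthetic data standing in for an "observational" data set (illustrative only).
n = 500
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)    # the "true" coefficient here is 0.3 by construction

X = sm.add_constant(x)              # add an intercept column
results = sm.OLS(y, X).fit()

print(results.params)               # estimated intercept and slope
print(results.pvalues)              # p-values for H0: the coefficient is zero,
                                    # which presume the data are a random sample
```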
Additional confusion is generated by terminology commonly used in linear regression. For example, the Gauss-Markov theorem gives us conditions under which an estimator is the Best Linear Unbiased Estimator (BLUE). However:
- “Estimator” refers to a linear function for obtaining an expected value of the explained variable Y given values of the explanatory variables X. It does NOT refer to a function for obtaining estimates of population parameters based on a sample;
- Similarly, “unbiased” refers to the estimator of Y in the observed data set, given the explanatory X variables, not to an estimator of the population Y. A small sketch of this reading of “estimator” follows this list.
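To make the distinction concrete, here is a minimal sketch, assuming NumPy and an entirely synthetic data set: the OLS coefficients define a linear rule for producing expected values of Y for the X values in the data at hand.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Entirely synthetic data set: an intercept column plus one explanatory variable.
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n)

# OLS coefficients: beta_hat = (X'X)^{-1} X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The "estimator" in this sense is the linear rule X @ beta_hat: expected values
# of Y for the X values in this data set, not estimates about a wider population.
y_hat = X @ beta_hat
print("coefficients:", beta_hat)
print("first fitted values:", y_hat[:3])
```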
So what does this mean for ML models?
Machine Learning
Because ML typically relies on observational data, my conclusion is that traditional machine learning models suffer from the same constraints as any other statistical analysis of observational data: because the observations are not pulled at random from a population, there is very little we can say or do to ensure that our ML estimators are unbiased estimators of an underlying generating process.
What we can do is:
- Knowing that data sets used in ML models are typically large (“big data”), consider the extent to which the data set is likely very close to the universe of interest, or even consider that the data may themselves be the universe of interest. In that case, no statistical inference is needed. In other words, consider the potential for “bias” in the observed data relative to a hypothetical population of interest.
- Consider whether bootstrapping sub-samples of the data and testing the model on many sub-samples is likely to generate estimators robust enough that, perhaps combined with the large size of the data set, we can be sufficiently confident that biases in the sample will not meaningfully affect predicted outcomes (a minimal bootstrap sketch follows this list).
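This is a minimal bootstrap sketch, assuming NumPy and scikit-learn and a synthetic data set (all names and sizes are made up): refit the model on many resamples of the observed data and see how stable the estimated coefficients are. As noted further down, this only measures variability within the observed data, not bias relative to a wider population.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(seed=3)

# Synthetic stand-in for an observational data set (illustrative only).
n = 1_000
X = rng.normal(size=(n, 2))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

# Bootstrap: resample rows with replacement many times and refit the model,
# then look at how much the estimated coefficients move across resamples.
coefs = []
for _ in range(1_000):
    idx = rng.integers(0, n, size=n)
    coefs.append(LinearRegression().fit(X[idx], y[idx]).coef_)

coefs = np.array(coefs)
print("mean of coefficients across resamples:", coefs.mean(axis=0))
print("std. dev. of coefficients across resamples:", coefs.std(axis=0))
```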
To understand the above, let us take a closer look at traditional ML models.
Using again the example of linear regression, regression analysis is now commonly thought of as one more application of traditional ML. In traditional ML approaches, an algorithm is applied to a (typically large) data set in search of patterns. When the outcomes in the data are “labelled” and the algorithm looks for patterns that identify the outcome, this is called supervised machine learning. This is the case of regression analysis, since we observe the dependent variable in the data set. The algorithm typically applies some optimization criterion – such as least squares or maximum likelihood – but sometimes the criteria are based on sufficiency. In ML, the machine is said to “learn” when it has generated a model, that is, when it has identified patterns by applying an algorithm to a dataset. In an unsupervised ML approach, there is no labelling of the outcome in the dataset. In either case, some algorithm searches for patterns (the sketch after the list below contrasts the two approaches), and this means that:
- “Features” of the data set in which patterns should be found are either selected by the modeler before applying the algorithm or are part of the algorithm itself.
- The criteria for identifying a pattern are defined by the choice of algorithm.
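As a rough illustration of the two approaches, here is a sketch assuming NumPy and scikit-learn, with made-up data: a supervised model (least squares) fits labelled outcomes, while an unsupervised one (k-means) searches for structure in the features alone.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(seed=4)
X = rng.normal(size=(300, 2))

# Supervised: the outcome y is labelled in the data, and the algorithm
# (least squares here) searches for a pattern linking the features to y.
y = 2.0 * X[:, 0] + rng.normal(size=300)
reg = LinearRegression().fit(X, y)
print("fitted coefficients:", reg.coef_)

# Unsupervised: no labelled outcome; the algorithm (k-means here) searches for
# structure in the features alone. The number of clusters is a modelling choice.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", np.bincount(clusters))
```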
The choice of algorithm is key, and the choice is often guided by the purpose of the model, that is, by what type of event we are trying to identify in the data. However, when we want to compare different algorithms, and also when testing how good the model we obtain is:
- The common criteria typically focus on internal validation, that is, how well the model fits the data. In supervised models, this means criteria such as precision, recall and the F1 score (which balances the two). In unsupervised models, there are different types of internal validation criteria depending on the algorithm used.
- To increase the chances that the model will work well on other datasets, the dataset is typically broken into sub-samples and model parameters are estimated on more than one sub-sample. This reduces the “overfitting” of a model to one specific sub-sample. A short sketch of both points follows this list.
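This sketch illustrates both points, assuming scikit-learn and a synthetic classification problem: hold out part of the data, fit on the rest, and judge the model on the held-out observations with precision, recall and F1.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score

rng = np.random.default_rng(seed=5)

# Synthetic classification data (illustrative only).
X = rng.normal(size=(2_000, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=2_000) > 0).astype(int)

# Hold out 30% of the observations so that fit is judged on data the model
# did not see, which reduces overfitting to one specific sub-sample.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
pred = model.predict(X_test)

print("precision:", precision_score(y_test, pred))
print("recall:   ", recall_score(y_test, pred))
print("F1 score: ", f1_score(y_test, pred))
```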
So, back to the question of how to deal with observational data in ML models. It seems to me that the best we can do is (again) to:
- ask ourselves whether the data at hand can be considered our universe of interest, or else look for potential bias in the data (relative to our targeted universe); and
- if we think there is bias relative to our population of interest, consider whether testing the model on many bootstrapped sub-samples is likely to generate estimators robust enough that we can be sufficiently confident that biases in the sample will not meaningfully affect predicted outcomes.
But the bootstrapped sub-samples, even though they are randomly generated from the data at hand, still do not allow us to draw conclusions about an underlying process and population, unless that population is assumed to be the observed data set itself.
Sources (and note)
I did not do a very good job of registering my sources for this post. I relied mainly on my own knowledge while searching Wikipedia and interacting with ChatGPT 4o. I have become a big fan of ChatGPT 4o and have been using it considerably to learn. I find it is particularly good when we are interacting about subjects we know enough about to identify possible weak or incomplete answers, when we are able to follow up with questions that lead to answers closer to what we wanted to know, and when we are able to judge somewhat the reliability of the answers based on previous knowledge. In the future I will try to better register Wikipedia pages and ChatGPT prompts.
OpenAI. ChatGPT 4o (omni, May 2024). Accessed March–April 2025.
Wikipedia contributors. Various pages. Accessed March–April 2025.