Notes on The Book of Why

Image by Nik. Downloaded from unsplash.com

I read (most of) The Book of Why by Judea Pearl. I read the first four chapters carefully, but the rest of the book less so, after I thought I understood its main arguments and conclusions. I may come back to it, particularly if I think I missed key aspects, but I will summarize here my current understanding and impressions.

For most of my reading I was somewhat put off by the constant claim of how revolutionary and new the content of the book was, when much of it seemed to simply reflect standard scientific methodology. At first I thought that the claim to a revolutionary new science (“Causal Inference”) perhaps reflected Pearl’s own trajectory, stemming from a computer science background and dialoguing routinely with statisticians, and that perhaps he did not have a strong science background. But that is not the case: I understand he has actually made contributions to physics.

Where I finally saw a rupture with traditional academic practice was when I got to chapter 4 and his discussion of the back-door criterion for selecting variables to control for when assessing models applied to observational data. Social scientists (to exemplify with the academic practice I am most familiar with), when trying to measure the potential impact of a variable X on Y, will typically “control for” any other variable that may also be impacting Y, under the argument that this will isolate the effect of X and, therefore, avoid capturing a “spurious” correlation between X and Y, given that we are dealing with observational data and not a controlled experiment (at least in the typical case). Here is where Pearl’s contribution of insisting that modelers make explicit the causal diagram (the theoretical framework) they have in mind, and not control for variables that actually capture an indirect effect of X, became embarrassingly clear to me. I say embarrassingly both because I am probably guilty of controlling for variables I shouldn’t have in the past and because I realize that what he is saying should be somewhat obvious…and yet that has not always been the case.

If I can summarize his main arguments, they seem to be that:

First, P(Y|X) ≠ P(Y|do(X)). In other words, if P(Y|do(X)) reflects the isolated causal effect of X on Y that we would observe in a controlled experiment (like a randomized controlled trial), the P(Y|X) that we capture in observational data is something completely different, for at least two reasons: a) it captures the effects of other confounding variables; b) it actually tells us, in itself, nothing about causality. It is the old “correlation does not mean causation” argument. Up to this point, his argument seems to me nothing new. He claims, however, that the language is important to allow us to talk about causality: that the “do(X)” needs to be introduced into our lexicon. Fine.

Second, causal diagrams are key for our proper modeling of causal relationships to be tested by observed data. Consider the model below (Figure 1, not in the book; I made this up). Assume the average success of, say, high school basketball players in dunking the ball is modeled as a function of the average jumping height of each player, and that this jumping height, in turn, is a function of the average player height and, say, the frequency and intensity of jumping practice on the team. A typical econometric test for the effect of jumping practice on the average dunking success of the players might control for player height to isolate the effect of jumping practice. Pearl’s argument would be that there is no need to control for player height, and the way to see this is that not controlling simulates the situation we would have in an RCT, where other factors are assumed away because, on average, they should be the same across the treatment and control populations, assuming these are large enough. That said, in this case my understanding is that, if we do control for player height, no harm would be done and the result should be the same, assuming the model is correct.

Now assume the true model is the one in Figure 2, perhaps because height affects players’ expectations from benefiting from practice. Now we do need to control for player height to establish the effect of practice alone on average dunking success. Otherwise, we may be capturing in part the effect of player height, when only looking at the two variables of jumping practice and dunking success.

Let’s look at yet one other model (Figure 3), where jumping practice and average jumping height signal the likelihood of any high school basketball player also competing in the high jump event on the track and field team. Pearl would warn us against controlling for observed participation in the high jump event, because this would detract from the measured effect of jumping practice (“explain-away effect”).

More generally, Pearl proposes the following rules when deciding what to control or not for, to be able to mimic the effect of RCTs in observational data (Pearl and Mackenzie (2018), pgs 157-158):

a) In a chain junction, A→B→C, controlling for B prevents information about A from getting to C or vice versa;

b) Likewise, in a fork or confounding junction A←B→C, controlling for B prevents information about A from getting to C or vice versa;

c) Finally, in a collider, A→B←C, exactly the opposite rule holds. The variables A and C start out independent, so that information about A tells you nothing about C. But if you control for B, then information starts flowing through the “pipe,” due to the explain-away effect.

[…]

d) Controlling for descendants (or proxies) of a variable is like “partially” controlling for the variable itself.

The idea is that we would not want to control for mediators (B in item “a” above), colliders (B in item “c” above), or proxies (B in item “d” above) but we do want to control for confounders (B in item “b” above), all represented in Figure 4 below.

Pearl actually discusses controlling more in terms of paths rather than variables, and calls “back-door adjustment” ensuring that any path connecting the variables X and Y that is not the causal path we wish to test for is blocked (by appropriate controls), and that there are no blockers in the path that we do want to test for.
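
These junction rules can be checked numerically. Below is a small simulation (not from the book; the variable names and linear model are my own invention) showing that, in a fork, controlling for the confounder removes a spurious association, while in a collider, controlling for the common effect creates one:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def coef(y, x, *controls):
    """OLS coefficient on x, optionally controlling for other variables."""
    X = np.column_stack([np.ones(n), x, *controls])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Fork (confounder): A <- B -> C. A has no causal effect on C.
B = rng.normal(size=n)
A = B + rng.normal(size=n)
C = B + rng.normal(size=n)
print(coef(C, A))      # ~0.5: spurious association via B
print(coef(C, A, B))   # ~0.0: controlling for the confounder removes it

# Collider: A -> B <- C. A and C start out independent.
A2, C2 = rng.normal(size=n), rng.normal(size=n)
B2 = A2 + C2 + rng.normal(size=n)
print(coef(C2, A2))        # ~0.0: no association
print(coef(C2, A2, B2))    # ~-0.5: controlling for the collider creates one
```

The last line is the explain-away effect in action: once we condition on B, learning about A changes what we infer about C, even though the two are causally unrelated.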

As Pearl went on discussing RCTs, instrumental variables, and observational studies controlling for confounding variables, I grew increasingly intrigued by what I might be missing that was so revolutionary:

  • Perhaps it is the emphasis on the diagrams, which I do find useful, although I still have trouble thinking of them as revolutionary, but rather as a way to bring clarity to our models. I would have benefited from doing this, for example, during my graduate studies where, yes, my tendency was to control for everything under the sun without clearly realizing the implications.
  • Perhaps it is the do-calculus, and I may need to find more examples where it is used to see how this generates responses that otherwise we would not have.

What I was not able to find, however – and it may be just me missing it – was some discussion of how we can use observed data not just to reject assumed causal relationships, but to help us better define our causal models. Most of the book “thinks” in a very traditional scientific way: from model to data. There is some discussion towards the end of how data mining can help direct our focus to certain correlations (and, thus, potential causal connections to be investigated). We also know there are things we can do to at least help inform how robust our models are to the assumptions we make, such as sensitivity analysis, which he touches on very briefly in chapter 5. It also seems to me that Bayesian networks, described and discussed in the book, should be useful in feeding into the reverse discussion, from evidence to models. However, Pearl seems to go out of his way to constantly make the point that the data themselves say nothing about causality, without discussing where, then, our causal reasoning comes from. It seems we are simply wired for it or, as he discusses, it comes from our imagination. Perhaps it was just my misplaced expectation that this book would explore this further.

In the future I hope to further explore some of the aspects of this book and Pearl’s thinking that I have not adequately covered here: particularly Bayesian Networks, the causal ladder and the role of imagination.

Sources:

Pearl, Judea and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. New York: Basic Books.


Customizing LLMs – Part 2: an Experiment

Image by Samuel Ijimakin. Downloaded from Pixabay

[For the data and Python code used for this blog post, please visit my GitHub repository https://github.com/EngelbergHuller/Customizing-LLMs-Experiment/tree/main]

As mentioned in Part 1 of this two-part series (the first part is on my FAN page and titled Customizing LLMs – Part 1: the Concept), I wanted to better understand our capacity to customize LLMs and decided to explore a bit the concept and then experiment a bit myself with building a RAG system on my laptop. I am not a developer, so I “vibe coded” with ChatGPT 4o to set up the RAG system, and then compared the results with those I obtained by simply uploading a set of documents to ChatGPT. Because my RAG system connected with the OpenAI LLM, the thought was that it would help me understand those components that are particular to RAG and not related to the LLM it is based on. I describe this experiment here.

A Bit More Background and Outline

I have the ChatGPT personal plus plan where I pay $20 a month. This plan allows me to upload a limited number of documents to ChatGPT and ask it to analyze them. If I understand correctly, the limits are up to about 50 MB with no file being larger than about 20 MB. If more needs to be analyzed concurrently, ChatGPT will offer to analyze them in batches and then compare the batches for a broader summary. I understand that when ChatGPT analyzes documents uploaded to its projects, it does so following a RAG type pipeline, but using its specific tools.

Below I:

  1. Describe the specifications of the RAG system on my laptop relative to those that ChatGPT tells me it uses when analyzing documents uploaded to it
  2. Describe the documents I used as input to both the laptop RAG system and ChatGPT
  3. Describe the questions (prompts) I used and the relative responses I received from each of the two systems
  4. Try to understand how the different responses resulted from different system specifications

Laptop RAG and ChatGPT Specifications

I used PyCharm as an IDE (Integrated Development Environment), created an account with OpenAI to access their LLM, and then asked ChatGPT 4o to walk me through the process of creating a RAG system on my laptop. The system on my laptop uses:

  • A customized script for chunking the PDFs:
    • It defined a maximum number of words per chunk (800), not tokens
    • It defined an overlap of 100 words between adjacent chunks (for continuity)
  • The OpenAI embedding model “text-embedding-3-small”
  • The FAISS library of algorithms for indexing and searching vectors by similarity
  • The OpenAI LLM ChatGPT 4o
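
As an illustration, a word-based chunker matching those settings (800-word maximum, 100-word overlap) might look like the sketch below. This is a reconstruction for illustration, not the actual script:

```python
def chunk_words(text, max_words=800, overlap=100):
    """Split text into overlapping word-based chunks.

    max_words and overlap mirror the settings described above;
    the overlap preserves continuity across chunk boundaries.
    """
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the final chunk already covers the rest of the text
    return chunks

# A 2,000-word document yields three chunks, each sharing 100 words
# with its neighbor.
doc = " ".join(str(i) for i in range(2000))
print(len(chunk_words(doc)))  # 3
```

A token-based chunker (as ChatGPT reports using) would follow the same logic but count tokens via a tokenizer instead of splitting on whitespace.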

In comparison, when documents are uploaded to ChatGPT 4o, it tells me it uses:

  • A chunking pipeline with:
    • Chunk sizes of ~500–700 tokens (~400–600 words)
    • Overlap of ~50–100 tokens
    • Splitting by semantic logic, often prioritizing punctuation
  • An embedding model “similar in capability to”:
    • text-embedding-3-large or
    • text-embedding-ada-002 (depending on optimization and routing)
  • “an internal vector index that behaves similarly to FAISS, but is not FAISS itself” 
  • Also OpenAI LLM ChatGPT 4o

Input

I used as input 20 PDF documents containing old newspaper editorials that I downloaded from a ProQuest database (accessed through my local public library) and that resulted from a search for the keywords “foreign aid,” “foreign assistance,” or USAID. I won’t discuss the more general document search because, for the purposes of this experiment with RAG, the 20-document set is the universe of interest. It is relevant, however, that these are scanned images of old newspaper editorials because, as I discuss further below, one of the issues I encountered seemed to stem from the quality of the Optical Character Recognition (OCR) software that I was able to use.

Q&A

Having uploaded the 20 pdfs to both ChatGPT’s interface and to the RAG system on my laptop, I asked the same three questions to both:

  • In one short paragraph, tell me what these documents are about
  • Are all articles critical of foreign aid or do any of them praise or defend it?
  • Please provide a table with these 20 articles categorized by stance, date, and key quotes

One advantage of the ChatGPT tool that became clear upfront was that it followed its responses with meaningful questions or suggestions, in a way that my RAG system did not. The last bullet above I added at the suggestion of ChatGPT.

I also modified the laptop RAG a couple of times as I noticed some of the limitations in the results I was obtaining:

First, as I collected the responses from my custom RAG system, a sentence in one of its responses made clear that it was not accessing the entire content of the PDF documents but seemed to be relying mostly on the title and perhaps some other metadata. It became clear that, because the PDFs were mostly scans of newspapers, the RAG needed to use Optical Character Recognition (OCR) and was not doing so, instead relying on whatever it could read as text. I had to install two new pieces of software, add them to my Windows environment, and ensure PyCharm was accessing them: Tesseract and Poppler, two open-source tools that work together to enable OCR.

After rebuilding the RAG system with OCR, the results were better, but I noticed the responses did not seem to be making use of all 20 documents. It turns out that the FAISS tool was retrieving information from chunks using an 8-nearest-neighbor criterion and ignoring other information. So I expanded that to 20. ChatGPT warned me that by doing so I could decrease the relevance of the response; yet comprehensive coverage was what I was looking for.
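
The effect of that k parameter can be illustrated with a plain-numpy stand-in for the FAISS nearest-neighbor search (the actual system uses FAISS; the embedding dimensions and chunk counts here are made up):

```python
import numpy as np

def retrieve(query_vec, chunk_vecs, k=8):
    """Indices of the k chunks most similar to the query (cosine similarity).

    A plain-numpy stand-in for a FAISS similarity search: normalize,
    take dot products, and return the top-k indices.
    """
    q = query_vec / np.linalg.norm(query_vec)
    M = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    return np.argsort(-(M @ q))[:k]

# 40 hypothetical chunk embeddings of dimension 16
rng = np.random.default_rng(42)
chunks = rng.normal(size=(40, 16))
query = rng.normal(size=16)

top8 = retrieve(query, chunks, k=8)    # narrow context: higher relevance
top20 = retrieve(query, chunks, k=20)  # wider context: better coverage
print(len(top8), len(top20))           # 8 20
```

With 20 documents chunked into many pieces, k=8 can easily leave whole documents out of the context passed to the LLM; raising k trades some retrieval precision for coverage, which is exactly the trade-off ChatGPT warned about.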

Below I copy the responses obtained by the two approaches (I reformatted the responses considerably for presentation but did not modify the content).

Prompt 1. In one short paragraph, tell me what these documents are about

Prompt 2. Are all articles critical of foreign aid or do any of them praise or defend it?

Prompt 3. Please provide a table with these 20 articles categorized by stance, date, and key quotes

ChatGPT 4o

Custom RAG – OCR K=20

Comments on the responses to questions

On the responses to question 1. Unlike ChatGPT, the laptop RAG did not seem able to identify the Washington Post as one of the newspapers from which editorials were sourced. Three of the 20 articles were from the Washington Post; the other 17 were from the New York Times. Since the RAG system reports that it “falls back on OCR” when the text content of a chunk is low, I thought that perhaps it no longer included content from that chunk that was in text format. But I asked the program, and that seems not to be the case: it refers to “references from the context” as references extracted in text format, and stated that it used both content extracted with OCR and content in text format when responding to questions.

The laptop RAG also did not identify foreign aid and foreign assistance as the central theme of the 20 documents as clearly as ChatGPT did. After further examining the documents, I found that two of the 20 are full pages from the Washington Post containing several editorials each, with only one editorial per page referring to foreign aid. It seems the laptop RAG system took the other editorials in those two documents into account much more than ChatGPT did when providing an overall summary.

On the responses to question 2. The ChatGPT answers provide a sentence summarizing the articles with a common stance (e.g. critical, favorable…) and then provide examples. The laptop RAG system stays focused on the individual chunks, providing only short summary sentences at the beginning and end. Also, the ChatGPT answers refer to specific documents when providing examples (numbers in brackets), while the laptop RAG system refers to chunks.

On the responses to question 3. The ChatGPT answers responded to the request to categorize the articles by “stance” by defining four buckets in which it divided all 20 articles (supportive, critical, mixed/reformist, and neutral/analytical). The laptop RAG system responded with individualized “stances” for each article.

The laptop RAG system did not interpret each document as being one “article” as mentioned in the prompt. I believe this was likely an issue with my prompt. As mentioned, 2 of the 20 documents had more than one editorial in them. But we may also need to think about how to best customize the retrieval of information. The FAISS library works to retrieve information by clustering. When it was set to retrieve the 8 nearest-neighbor chunks, it produced a list of 12 articles in response to question 3. When I expanded this to the 20 nearest-neighbor chunks, it retrieved 19.

Where the responses became most troubling to me was when I noticed that neither ChatGPT nor the laptop RAG correctly identified the titles of all the articles. The laptop RAG actually performed better here than ChatGPT, getting 13 titles correct, while ChatGPT only got 9. As previously mentioned, 2 of the 20 documents contained more than one editorial, which would have contributed to the difficulty of selecting one title for the document, and this seems to be reflected in some of the titles offered by the laptop RAG system. But the titles offered by ChatGPT were particularly bewildering, and I could not figure out where they came from.

ChatGPT also seemed to have sometimes identified the dates incorrectly, while the laptop RAG sometimes provided NA when it was not able to identify the date. For example, ChatGPT listed five documents as being from 1972 when only two of the documents are from that year.

So what do I draw from the experiment?

First, I would not currently feel comfortable relying on either ChatGPT or this initial laptop RAG system to provide me with good information about scanned pages of newspaper articles. That said, given the responses to question 3, I am left with the impression that what may have seemed to be better responses by ChatGPT to questions 1 and 2 may actually reflect a greater inclination of ChatGPT to “fill in gaps” with made-up information compared to the laptop RAG system.

In addition, given that there seem to be several ways to improve on this laptop RAG (based on my discussion about it with ChatGPT), the laptop RAG may actually be a more promising avenue to obtain more reliable information from such types of documents. 

Second, based on information provided by ChatGPT itself, the two systems differed in the chunking approach taken (words vs. tokens), the embedding models used (even though both were OpenAI embedding models), and the indexing method used for search and retrieval. Because the laptop RAG system is transparent and customizable in each of these elements in ways that directly using ChatGPT does not seem to be, it should allow room for improvement.

Third, one of the main reasons to look into a customized RAG system (for my purposes) is the existence of limitations in directly using an LLM like ChatGPT to analyze large amounts of scanned documents. However, in further exploring a customized RAG system for this purpose it seems like some effort should go into:

  1. Further scrutinizing the types of documents that will be well interpreted by the RAG system and those that may not and finding ways to, perhaps, exclude documents that would not be well read. For example, how can we better deal with documents that scan entire newspaper pages where only one of the articles on the page is of interest?
  2. Further looking into how well OCR is working and how well the RAG system is capturing information from OCR in tandem with information in text format
A final note on prompt engineering: I seem not to have paid sufficient attention to it in Part 1 of this two-post series, nor in this exercise. Given the limitations of both ChatGPT and the laptop RAG system, it is possible that I would have obtained better results just by more clearly specifying to the systems the output I was looking for.
 
Oh, and on the OpenAI cost of the laptop RAG system exercise: $0.15
 

Sources

OpenAI. ChatGPT 4o, accessed October 2025

ProQuest access to The New York Times and Washington Post historical editions. Accessed through Fairfax County Public Libraries, October 2025 


Customizing LLMs – Part 1: the Concept

Image by Samuel Ijimakin. Downloaded from Pixabay

I wanted to better understand our capacity to customize Large Language Models (LLMs). By “our” I mean us, users. I will register my current understanding in two parts:

  • Part 1 (this post) describes my understanding of how LLMs work and dives a bit into Retrieval Augmented Generation (RAG).
  • Part 2 (see post on my Engleberg Huller page) describes my experiment with building a RAG system on my laptop computer (and, yes, I am not a developer, just a user, so check it out).

How LLMs work

LLMs are neural network AI models that are developed to process, understand and generate human language based on a large number of parameters. They have two main components:

  1. The first – and the fundamental breakthrough in the development of LLMs – is a transformer. A transformer is a form of neural network architecture that creates vectors of parameters representing not just a token (a unit of analysis in neural networks, like a word or part of a word, or a pixel if we were thinking of images) but also the context in which that token lies (for example, the position of a word in a phrase, the syntax of the phrase, and the word’s semantics). This consideration of context is referred to as “self-attention.” The final vectors generated by transformers are referred to as “final embeddings” (although it seems like sometimes the term “embedding” is used just for the vectors representing the tokens and not their context)
  2. The second component is a task specific model that takes the final embeddings and generates outputs, such as the predicted next word in a sentence.
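
The self-attention idea at the heart of the transformer can be sketched in a few lines of numpy. This is a single attention head with made-up random weight matrices, ignoring multi-head structure, positional encodings, and everything else a real transformer layer includes:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """One head of scaled dot-product self-attention.

    X: (n_tokens, d) token embeddings. Returns embeddings of the same
    shape, where each row now also reflects the other tokens (context).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # each token scores every other token
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)        # softmax: rows are attention weights
    return w @ V                              # mix token values by attention

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                   # 5 tokens, 8 dimensions each
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                              # (5, 8)
```

The output has the same shape as the input, but each token's vector has been updated with a weighted mix of the other tokens — which is exactly the "context" ingredient described above.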

Each of the two components above is developed through training of optimization models: the first component requires training a model to find the best vector of parameters (final embedding) that represents a token and its context, and the second component requires training a model to find the best output given a collection or sequence of embeddings.

An important detail, however, is that these components and trainings are not developed in sequence but, rather, at the same time, using feedback loops (backpropagation). Below I reproduce a diagram that ChatGPT made for me (with some alterations; it seems ChatGPT is not yet very good at making these diagrams).

Some aspects that may influence the power of a transformer include the number of dimensions (parameters – usually in the billions) considered in embeddings and the number of tokens that a transformer can consider simultaneously (the size of the context window).

Customizing LLMs

I understand there are two main ways of customizing LLMs: Fine Tuning and Retrieval Augmented Generation (RAG). Before exploring those, however, a few words on “prompt engineering.”

Prompt engineering is also seen as a way to customize the results obtained from using an LLM. You can find lots of recommendations online and do full courses on it. I do not want to diminish the attention this seems to get, but it essentially means learning how to interact with LLMs to obtain the best possible answers to your questions. From my experience, the most useful asset in doing so is your own knowledge about the subject you are exploring. This allows you to pursue your questions in detail, inducing the LLM to refine its answers. I often do more traditional Google research on a subject before interacting with ChatGPT, so I better know what to ask and how to phrase my questions.

Fine-Tuning

Fine-tuning is an approach to customizing an LLM that consists of altering the collection of parameter values (weights) that the LLM uses in responding to a question (prompt), to improve its performance for specific purposes. Traditionally, it is considered expensive because LLMs can have billions of parameters, and “retraining” them would typically be out of reach for all but the companies that own the LLMs (or their transformers). But there are approaches that avoid having to “retrain” an LLM.

A common approach is Parameter-Efficient Fine-Tuning (PEFT). This approach makes use of adapters: small neural network modules that are trained for a specific purpose and then typically added to an LLM without needing to touch the remaining parameter values. One approach in particular, known as Low-Rank Adaptation (LoRA), seems to have substantially reduced the cost of fine-tuning in projects where it was used, by relying on low-rank matrices (with a limited number of parameters).
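
The LoRA idea can be shown with a toy parameter count. The layer width and rank below are hypothetical, and the matrices are placeholders rather than trained weights; the point is only how few parameters the adapter adds:

```python
import numpy as np

d, r = 4096, 8                  # hypothetical layer width and LoRA rank
W = np.zeros((d, d))            # stand-in for the frozen pretrained weight matrix
A = np.random.default_rng(0).normal(size=(d, r)) * 0.01
B = np.zeros((r, d))            # B starts at zero, so the adapter is initially a no-op

def adapted(x):
    # forward pass: frozen weights plus the low-rank update (x @ A) @ B;
    # only A and B would be trained during fine-tuning
    return x @ W + (x @ A) @ B

full_params = d * d             # parameters in the full weight matrix
lora_params = 2 * d * r         # parameters actually trained by LoRA
print(lora_params / full_params)   # 0.00390625: well under half a percent
```

The d-by-d update matrix is factored as A (d-by-r) times B (r-by-d), so the number of trainable parameters drops from d² to 2dr, which is where the cost savings come from.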

One limitation of this approach seems to be that, once “fine-tuned,” the resulting set of parameter values is applied to every interaction with that LLM. This may generate better results for the specific case the LLM was fine-tuned for, but worse results for other use cases. The circumstances in which it is worthwhile for an organization to invest in fine-tuning an LLM would then need to be considered carefully.

Retrieval Augmented Generation (RAG)

RAG consists of submitting to an LLM a specific set of information that you ask the LLM to consider when responding to your prompts. To do so, you need to build a pipeline that ingests the additional information, breaks it down into small parts (chunks), and transforms those into embeddings (vectors) in a way that aligns with how your LLM uses embeddings. The system will then use the embedding of your prompt to search for similar embeddings of the (chunked) additional information, and feed all of that to the LLM to generate a response. The diagram below illustrates how RAG works, and I discuss the steps in red further below.

1. Chunking – this step breaks the added information down into segments called chunks, so that the number of tokens per segment fits within the limits of the embedding model’s and the LLM’s context windows. The chunks still need to retain semantic meaning, however, and there is discussion out there around the optimal size of chunks, which may depend on each situation. A general ballpark common in discussions seems to be the range of 128–256 tokens, which seems quite specific to me (there must be a back story to this range; I just don’t know what it is). Chunks typically end up being sentences or small paragraphs. Examples of tools that can assist in doing this are LangChain and LlamaIndex.

2. Embedding – the chunks are then turned into vectors intended to capture both the content and context of the chunks. These embeddings are typically stored in what is called a vector database, which can then be searched. The result is that chunks become “findable” based on their meaning and context. There are various embedding tools out there from OpenAI, Google, and others. However, some of these models may be better aligned than others with the specific LLM that a RAG project intends to use. So, for example, an OpenAI embedding model would be appropriate for use with OpenAI transformers, or Sentence-BERT with Hugging Face transformers. The vector database, on the other hand, may not need such alignment, and there are many popular options being used in RAG projects (e.g. Pinecone, Weaviate, Milvus). The user questions also need to be embedded to feed into the LLM, but they are typically not stored in the database because they are typically not for repeated use; LLMs do this embedding every time they are prompted. The existence of embedding models actually seems to be one of the reasons why prompts can be better or worse “engineered” to extract the desired information from LLMs.

3 & 4. Retrieval and Generation – these steps consist of feeding the LLM the embeddings of both the new material we wish it to consider and the questions we are prompting it to answer – and then obtaining the answer.
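
The steps above can be tied together in a toy end-to-end sketch. Here a bag-of-words vector stands in for a real embedding model, and the final prompt is just printed rather than sent to an LLM; the documents and question are made up for illustration:

```python
import numpy as np

# Toy "corpus": in a real pipeline these would be chunks of ingested documents.
docs = [
    "foreign aid budgets were cut sharply",
    "the editorial defends foreign assistance programs",
    "basketball practice improves jumping height",
]

vocab = sorted({w for d in docs for w in d.split()})

def embed(text):
    # Stand-in embedding: normalized word counts over the corpus vocabulary.
    # A real system would call an embedding model here.
    v = np.array([text.split().count(w) for w in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

index = np.array([embed(d) for d in docs])   # the "vector database"

def rag_prompt(question, k=2):
    sims = index @ embed(question)           # retrieval: similarity search
    top = np.argsort(-sims)[:k]              # keep the k most similar chunks
    context = "\n".join(docs[i] for i in top)
    # generation: this combined prompt would be sent to the LLM
    return f"Context:\n{context}\n\nQuestion: {question}"

print(rag_prompt("which editorials discuss foreign aid"))
```

Even this toy version shows the core behavior: the basketball chunk is never retrieved for a foreign-aid question, so the LLM only ever sees the relevant context.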

I reached the understanding above, in part, by dialoguing with ChatGPT 5, which in some situations seems to generate not very good responses, perhaps worse than those of ChatGPT 4o (see sources below). But, as I mentioned when discussing prompt engineering, I do have the habit of cross referencing with other information I find on the internet, including feeding it back to ChatGPT, so I am hopeful the understanding above is relatively accurate. 

I then proceeded to try to build a RAG system on my laptop and compare it with simply uploading a set of documents to ChatGPT (I did use the 4o version in this case). Because my RAG system connected with the OpenAI LLM, I figured it would highlight the RAG components of the system built on my laptop and help me better understand those components. For the results of this exercise, please see the Engelberg Huller post Customizing LLMs – Part 2: an Experiment.

Sources

3Blue1Brown. 2025 (last updated Sept 26). Neural Networks. Course (9 videos), YouTube. Available: https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi. Accessed: October 06, 2025

Belcic I and Cole Stryker. Undated. What is LLM Customization? IBM Think. Available: https://www.ibm.com/think/topics/llm-customization. Accessed: October 06, 2025

Google. Gemini backed AI Overview on Google search

OpenAI. ChatGPT 5, accessed October 2025


A Brief Incursion into Epistemology (to be continued)

Image by sunjong77. Downloaded from pixabay.com

I walk my dog in the mornings while typically listening to a news podcast. Sometimes I get tired of the news and listen to music or search for some other type of podcast to accompany my walk. Recently I listened to a few episodes of one called “European Intellectual History since Nietzsche,” which consists essentially of recordings of a class given at Yale by an Associate Professor of History called Marci Shore. I enjoyed the first few classes but soon some things didn’t sound quite right to me – admittedly, based on my very limited knowledge – and these issues were enough to make me stop listening. No demerit to the professor, this may only reflect my own limitations and I won’t get into what those issues were because what matters is that it got me wanting to learn more about the epistemology of different philosophers.

Before jumping into epistemology, here is a brief summary of what I got out of Prof Shore’s first two episodes with a broad overview of the Enlightenment and Romanticism.

I next spent some time trying to pin down the epistemological view of different philosophers making use of a few introductory sources: the Oxford Companion to Philosophy (Honderich 1995), which I found in a used book store, Wikipedia and ChatGPT. Yes, I went there. However, I had very little confidence in what I was getting from those sources.

So I finally shifted my efforts and went to my go-to philosopher – Bertrand Russell – for, rather than a history of epistemological thought, at least his own views. 

I should first clarify that epistemology can be defined in different ways, but it essentially refers to the branch of philosophy that addresses how we know. The Oxford Companion to Philosophy defines it differently in different entries written by different contributors. The entry for history of epistemology, written by Prof. D.W. Hamlyn of Birkbeck College, London, defines it as “the branch of philosophy concerned with the nature of knowledge, its possibility, scope, and general basis” (Honderich 1995, p. 242). The entry for problems of epistemology, written by Prof. Jonathan Dancy of Keele University, defines it as the “study of our right to the beliefs we have” (Honderich 1995, p. 245). I take these definitions, in combination, to be sufficient to convey what the focus of this post (and my interest) is.

Bertrand Russell has a little book called “The Problems of Philosophy,” published in 1912, which largely focuses on epistemology. The book is not exclusively about epistemology, however, and wanders into ontological questions about the nature of reality (e.g. addressing in several parts the issue of “idealism”). I try to focus on the epistemology parts, but I do understand how the two issues are intertwined.

Here is my understanding of Russell’s views, based on the book.

What we perceive through our senses is only indirectly a physical object. We perceive what he calls “sense-data,” which are signals of the actual physical object. “Sensation” is the awareness of things through sense-data. The collection of physical objects is “matter.”

Russell distinguishes knowing truths (e.g. savoir in French, saber in Spanish and Portuguese) from knowing things (e.g. connaître in French, conocer in Spanish, conhecer in Portuguese). He then turns to focus on the knowledge of things.

We can “know” things directly or indirectly. The former he will call “knowledge by acquaintance,” the latter “knowledge by description.”

Knowledge by acquaintance can happen in several ways, such as through sense-data, memory, or introspection (self-consciousness, knowledge of self).

Figure 2 below summarizes his thought so far.

Figure 2. Knowledge of Things

The fact that we are able to generate inferences from what we know about things means that we are drawing on some general principles to do so. Examples are:

  • The principles of induction: the more often two things have been observed together, the more we expect them to occur together in the future;
  • The principles of logic. E.g.: if it is known that a) if this is true, then that is true; and b) this is true; then c) that is true

Principles of inference are examples of what he calls “a priori” knowledge. Other examples are mathematics and knowledge of ethical value (or the intrinsic desirability of things). Russell argues that these principles (or a priori knowledge) cannot be proved by experience. This is an old debate between empiricists and rationalists, and key in defining an epistemological view. The debate often uses the term “innate” knowledge rather than a priori. Russell prefers “a priori” to “innate” because, although a priori knowledge cannot be proved by experience, he considers it to be elicited and caused by experience. In the debate between empiricists (e.g. Locke, Berkeley, Hume) and rationalists (e.g. Descartes, Leibniz), Russell considers that the rationalists were correct in that a priori knowledge cannot be derived from experience. But he thinks the rationalists were incorrect in believing that they could deduce what they know from a priori knowledge alone since, as mentioned above, a priori knowledge is itself elicited and caused by experience.

How Russell argues that the general principles cannot result from experience alone is central to his understanding of how we know. His main argument seems to be that any generalization from experience (induction) presupposes some general principle. Therefore, no general principle can be proven by experience. He gives the example of a chicken who expects food every time it sees the person who feeds it. Every day the expectation is confirmed…until the day the person breaks the chicken’s neck. There is no logical reason that simple repetition should guarantee continuity, no matter how much we expect it, unless we associate with that expectation some general principle (e.g. a logical principle). A note: Russell does not mention causality in his argumentation, but it is my understanding that all causal argumentation presupposes logical principles, so Russell’s argument is consistent with someone bringing causality into the discussion to justify expectations based on experience. The point made in this paragraph seems simple enough, but it is key to establishing an epistemological view and, as mentioned in the previous paragraph, it addresses a long-standing epistemological debate. On this point, Russell makes a lot of sense to me.

A consequence for scientific thought:

“The general principles of science […] are as completely dependent on the inductive principle as are the beliefs of daily life […]. Thus all knowledge which, on the basis of experience tells us something about what is not experienced, is based upon a belief which experience can neither confirm nor refute […].” (Russell 1912, p. 40)

Deduction then also plays a part in the building of our knowledge, because we can often know general principles without inferring them from their instances (e.g. 2+2=4). Deduction goes from the general to the general or to the particular; induction goes from the particular to the particular or to the general.

Russell then asks how a priori knowledge is possible. Here the discussion seems to veer again quite a bit into ontological questions, since it becomes not just about how to “know” a priori principles but also about the nature of a priori principles.

He first explains Kant’s view, which states that a priori knowledge is generated from the interaction of ourselves and physical objects (the “things in themselves”), what he calls “phenomenon.” We cannot know a thing in itself, only to the extent that it conforms with our own nature. If I understood Russell’s explanation, according to Kant a priori knowledge would be a product of our interaction with physical objects. This view would not quite fit empirical views, because we are not just observers, but our own nature is part of what generates our knowledge, just as much as the things in themselves and the perception we have of them. Kant’s view on this makes a lot of sense to me.

Russell, however, proceeds to make a point that I find less immediately obvious and that I am still not sold on. He argues that we should not think of our part in this phenomenon as reflecting the nature of our minds, but rather that a priori knowledge must have a nature that is neither material nor an idea. He gives as an example the law of contradiction (nothing can both be and not be), which he argues is not just a statement about our beliefs (our minds) but about the things themselves. I do not see how this necessarily follows. Why would the law of contradiction not be something that we take as given because of the structure of our minds? How do we know that, in fact, this law must apply to things, if we cannot even perceive things directly? I do not follow Russell’s argument here. At the same time, this is an ontological question and, therefore, not of particular interest to me. Whether the law of contradiction is imposed on things by our minds or is something that exists beyond matter and ideas, it does not seem to have immediate consequences for how we know, in any practical way. On this matter, for now, I will stick to Kant’s view, which is more intuitive to me.

Russell then goes on to discuss the nature of a priori knowledge as being neither material nor a product of our minds. To do so, he reaches out to Plato. To avoid thinking of a priori knowledge as an “idea,” he suggests using the term “universal,” and states that the essence of universals is that they do not arise from a given sensation. He goes on to discuss the nature of universals, which I will skip here, both because he lost me and because it seems like too deep a dive into ontological questions for this post.

Russell’s next step is to suggest that our knowledge of universals can also be acquired by acquaintance or by description, just like our knowledge of particulars. Through acquaintance, we come to know many different types of universals, such as sensible qualities (e.g. “whiteness”) and relations (e.g. before and after, above and below, greater and smaller than); he states that all a priori knowledge deals with relations of universals (Russell 1912, p. 63).

Figure 3 below modifies Figure 2 to include universals in the picture, the nature of which will remain a mystery to me for now. I must say, however, that there is more to be discussed regarding the knowledge of truths. Russell’s book contains a few more chapters on this, including on intuitive knowledge, truth and falsehood, probable opinion and the value and limits of philosophy. I stopped before these, however, since the discussion of the nature of universals already stumped me and is, in any case, as far as I am willing and have time to go at the moment.

Figure 3. Knowledge of Truths*

*There is more to discuss regarding the knowledge of truths based on Russell’s book. To be continued.

 

Sources

Russell, Bertrand. 1912. The Problems of Philosophy. Printed version of work in the public domain.

Honderich, Ted (Editor). 1995. The Oxford Companion to Philosophy. Oxford University Press

Shore, Marci. 2024. European Intellectual History since Nietzsche. Podcast with recording of classes offered at Yale in 2023. Available on Spotify. Accessed: January 2025


Output and Income Indicators

Image by Michael Reichelt. Downloaded from pixabay.com

In my previous Fan post (“Indicators of Government Expenditures”) I noted that, when using output indicators such as GDP, we should keep in mind that: a) there are important limitations to this indicator, and b) when used, there are different indicators that may be more or less appropriate for different purposes. I develop a bit on those two points here.

On the first point, an assessment was done in 2008 by a commission led by three economists, two of them Nobel laureates, at the request of the Government of France, and later summarized in a book. I draw from it here, although additional details are available online.1

The commission was led by Joseph Stiglitz, Amartya Sen (both Nobel laureates), and Jean-Paul Fitoussi. Other economists were also part of the commission. The commission was divided into three working groups:

  • One to focus on standard issues of national accounting, such as measuring government output and treatment of household production;
  • A second group focused on the relationship between output measures and efforts to measure well-being or quality of life;
  • A third group looked at attempts to capture sustainability in measures of output.

On the classical GDP issues: GDP mainly measures market production, and one reason money measures have come to play an important role in our evaluation of economic performance is that money valuations facilitate aggregation. However:

  • Prices do not exist for some types of output (e.g. government services provided free of charge or household services such as child care);
  • Market prices may not reflect consumers’ appreciation of goods and services if there is imperfect information (e.g. financial products, telecommunications bundles);
  • Market prices may not fully reflect societal evaluation due to externalities (e.g. environmental costs);
  • Collecting accurate data may be challenging when there are sales or differences in prices among alternative selling mechanisms (e.g. online vs store prices);
  • Accounting for quality of products and changes in quality is challenging and may not always be reflected in prices;
  • Underestimating quality improvement means overestimating inflation, which, in turn, means underestimating real income.
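
The last bullet can be illustrated with a small numerical sketch (all figures made up): if measured inflation overstates true inflation because quality gains are missed, measured real income growth understates true real growth.

```python
# Made-up figures: nominal income grows 5%; true quality-adjusted
# inflation is 2%, but measured inflation (quality gains missed) is 3%.
nominal_growth = 0.05
true_inflation = 0.02
measured_inflation = 0.03  # overstated by one percentage point

# Real growth deflates nominal growth by inflation.
true_real_growth = (1 + nominal_growth) / (1 + true_inflation) - 1
measured_real_growth = (1 + nominal_growth) / (1 + measured_inflation) - 1

print(f"{true_real_growth:.4f}")      # 0.0294
print(f"{measured_real_growth:.4f}")  # 0.0194
```

Overstating inflation by one percentage point understates real income growth by roughly the same amount.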

These are not minor inconveniences but real issues, and the extent to which GDP measures are distorted by them is not clear. The authors discuss at some length the issues with measuring services, for example. Services account for up to two thirds of output, and measuring their quality is challenging. Measuring government provision of services, for example, is often done through inputs, which leaves aside the possibility of capturing changes in productivity. Attempts to measure government services using outputs face known challenges, such as accounting for quality. Which services should be considered final and which intermediate (or “defensive”) is also difficult to define. E.g.: government spending on prisons? Private costs of commuting?

The authors suggest five ways of dealing with some of the deficiencies of GDP as an indicator of living standards:

  1. Emphasize well-established indicators other than GDP
    • Gross, rather than Net, has the issue of not accounting for the amount of output needed to maintain capital goods (depreciation). When technology is changing rapidly, depreciation can be substantial and the difference between Gross and Net considerable. So: consider “Net” (although depreciation is hard to estimate);
    • Product, rather than Income, has the issue of not being as good for capturing household consumption and, therefore, associated well-being. The difference is the purchasing power sent to and received from abroad (net income from abroad). Also, changes in the relative prices of exports and imports will affect national income even if domestic product stays the same. So: consider “Income”;
  2. Consider wealth jointly with consumption to capture consumption possibilities over time;
  3. Bring out the household perspective
    • Adjusted disposable income accounts for government taxes and monetary transfers but not for transfers in kind;
  4. Add information on the distribution of income, consumption and wealth:
    • Median is better than average, but it depends on survey data, which have known challenges:
      • Unit of measurement? Consumption unit?
      • Measuring property income?
      • International comparability
      • Whose bundle of consumption?
      • Shifts in the provision of services from within households or between families to markets create distortions
    • Also, we should be looking at the distribution of full income, not just market income, including values such as household production and leisure
  5. Widen the scope of what is being measured (may require imputation):
    • Recommendation is to keep a satellite account because: a) imputed values are not as reliable as observed values; b) non-observed values could end up being a very large share of total output. E.g.:
      • Household work, under the authors’ estimates, could be 30% of currently measured GDP;
      • Leisure could be 80%;
    • They still recommend it be done for: a) completeness; b) the invariance principle, under which the value of a good or service should not depend on the institutional arrangement under which it is provided (e.g. free by the state or charged by the private sector).

The other two areas taken on by the commission working groups are more intuitive to me, even if not easy to address, so I only briefly summarize the conclusions of the corresponding working groups:

  1. On the relationship between output measures and efforts to measure well-being or quality of life, the argument is that these latter concepts cannot be reduced to resources. Efforts to measure well-being and quality of life have either attempted to measure subjective perceptions, tried to assess capabilities that would enable and support human functioning (health, education, security…), or tried to identify how individuals themselves weigh the non-monetary aspects of their well-being. All these attempts face challenges, including how to incorporate inequalities, how to assess the linkages between the various dimensions of well-being or quality of life, and how to aggregate them;
  2. On attempts to capture sustainability in measures of output, there is a large and varied literature that the commission divided into four groups: attempts to establish large dashboards with sets of indicators addressing different aspects of sustainability; attempts to develop composite indices; attempts to develop adjusted GDP indicators; and indicators focusing on overconsumption or overinvestment.

What do I draw from the above? A few initial thoughts:

  • When using an indicator of output growth for a selected country or group of countries, I have typically used the World Bank, World Development Indicators (WDI), Gross Domestic Product (GDP) series in Local Currency Units (LCUs). I have used LCUs when looking at growth, instead of alternative monetary units, to avoid the influence of short-term fluctuations of exchange rates. Attempts to correct for this influence, such as the World Bank’s Atlas measure (more on this below) or the use of Purchasing Power Parity (PPP) measures, seem unnecessary, given their imperfections and that we are only interested in growth and not in comparing the absolute value of output among countries. This series can be used to break down domestic output into its expenditure components (G+C+I+Ex-Im+changes in inventories), as well as by sector of the economy (agriculture, industry and services)2. It is available for a period of over 60 years for most countries. Based on the input above:
    • The use of output rather than income indicators when looking at growth seems reasonable to me and perhaps more relevant: it better reflects the production capacity of a country (rather than its standard of living) and, for most countries, output and income do not tend to diverge much over time (although this may not always be the case, and it would be interesting to look at the data).
    • The fact that GDP indicators do not capture household production means that growth is likely overestimated during periods where agricultural production for own consumption is reduced and production for the market is increased. GDP growth is also likely overestimated during periods of increased entry of women into the labor market, if this also means decreased services within the household. I would need to further research the WB WDI methodology to see the extent to which the WB tries to address this issue in their measurements;
    • The extent to which the informal economy is captured also requires a further look into the WB WDI indicator methodology. If it does not capture the informal economy well, growth would also be overestimated during periods of formalization.
  • I have used the World Bank, World Development Indicators (WDI), Gross National Income (GNI) series in Purchasing Power Parity (PPP) when comparing countries. I have preferred the concept of income (what belongs to the residents of a country) over product (what is produced within the boundaries of a country) when comparing countries because it is a better indicator of the resources available to the local population. For cross-country comparisons, PPP measures (even if imperfect) allow some correction for price and exchange rate distortions regarding how much residents of two compared countries can actually purchase with their income. This series is available for fewer years and countries. Based on the input above:
    • Periods of rapid technological transformation – such as the one we are in now – are likely generating considerable distortion in our relative measurements of income by country, given the challenges in addressing quality of products and services. To the extent that we are able to use net indicators (as opposed to gross), accounting for depreciation in such periods is also a more serious challenge and a source of distortion.
    • Does our association of value with market prices mean that our association of income per capita with productivity is somewhat distorted? I explain: think of luxury goods, where price is not necessarily associated with quality but where status of a brand plays an important role in product prices. Countries with heavy presence of luxury industries will have their per capita incomes associated with this higher price that is fabricated by the status of their products rather than by the quality of their products. How we understand the productivity of their population would need to be interpreted in this context (Italy, I am thinking of you).
    • Do the decaying European houses (that we think of as so charming) mean that European household income tends to be overestimated by the use of gross measurements?
    • On the other hand, does the fact that we do not capture the value of leisure underestimate European household income relative to countries like the US?
  • The World Bank uses GNI per capita in US dollars, converted from local currency through the Atlas method, to classify countries into income groups (low income, lower middle income, upper middle income and high income). The Atlas method is based on three-year moving averages of exchange rates. The WB uses the Atlas method rather than PPP, arguing that “issues concerning methodology, geographic coverage, timeliness, quality and extrapolation techniques have precluded the use of PPP conversion factors for this purpose” (World Bank, undated). This also seems to be the indicator the WB uses for establishing the annual threshold for countries to qualify for International Development Association (IDA) loans. The US Millennium Challenge Corporation (MCC) uses the WB country income groups to select countries that qualify for its assistance (low income and lower middle income). Based on the input above:
    • If we underestimate income in low-income economies, given that they often also have larger portions of their economies not captured by GNI measurements (greater presence of subsistence agriculture, household production and services, informality), what does this mean for our categorization of countries in income groups? How distorted are these classifications? Should we be interpreting them as rather “market income” groups? If so, to what extent are our foreign assistance programs directed at increasing “market income,” rather than income as a whole? To what extent are our foreign assistance impact evaluations distorted by not recognizing this distinction?
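
As a rough sketch of the smoothing idea behind the Atlas method (all numbers made up; the actual method also adjusts earlier years’ exchange rates for the difference between domestic and international inflation, which this toy version omits):

```python
def atlas_conversion_factor(rates):
    """Toy version: simple three-year average of exchange rates
    (LCU per USD). The real Atlas method also inflation-adjusts
    the two earlier years before averaging."""
    assert len(rates) == 3  # current year and two preceding years
    return sum(rates) / 3

gni_lcu = 5_000_000_000   # hypothetical GNI in local currency units
rates = [6.0, 5.5, 5.0]   # hypothetical LCU per USD, most recent first

# Averaging dampens the effect of the most recent depreciation.
gni_usd = gni_lcu / atlas_conversion_factor(rates)
print(f"{gni_usd:,.0f}")  # 909,090,909
```

Using only the latest rate (6.0) would have given a noticeably lower dollar GNI; the three-year average smooths that swing.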

Notes

  1. There used to be a site with technical papers at the URL: www.stiglitz-sen-fitoussi.fr . This seems to no longer be available but I found a link to the content here: https://web.archive.org/web/20150622185128/http://www.stiglitz-sen-fitoussi.fr/en/index.htm
  2. The WB World Development Indicators reports total value added at basic or producer prices and GDP at purchaser prices. That is why their measurements differ. Purchaser prices include taxes and exclude subsidies. For more information, see here: https://datahelpdesk.worldbank.org/knowledgebase/articles/114948-what-is-the-difference-between-total-value-added-a

References

Stiglitz, Joseph E.; Sen, Amartya; and Fitoussi, Jean-Paul. 2010. Mismeasuring Our Lives: Why GDP Doesn’t Add Up. The Report by the Commission on the Measurement of Economic Performance and Social Progress. The New Press.

World Bank. Undated. Why use GNI per capita to classify economies into income groupings?. Available: https://datahelpdesk.worldbank.org/knowledgebase/articles/378831-why-use-gni-per-capita-to-classify-economies-into. Accessed: June 08, 2024.


Indicators of Government Expenditures

Image by Abraham Bosse. Downloaded from picryl.com

The International Monetary Fund (IMF) has a couple of public dashboards showing government expenditures as a percentage of Gross Domestic Product (GDP), by country. See here and here. There is nothing wrong with doing this if we keep in mind that we are using GDP as a denominator just as a tool to give us a reference for the relative size of government expenditures in different countries. But, based on this kind of data, it is common to hear things like “government expenditures were 61% of the entire French economy or 45% of the US economy in 2020,” as if these numbers were breaking the total of the economy (100%) down into its government and non-government portions. This would be incorrect and, unfortunately, it ends up supporting all sorts of confused discussions about the role of government in the economy.

The comparison between government expenditures and GDP is one of apples and oranges and only makes sense if we understand, again, that GDP is being used as a denominator only as a convenient tool to facilitate country comparisons. Government expenditures, as reflected in databases like that of the IMF, are measures of total expenditures, either by central and local governments or just by central governments (depending on the country), over a one-year period. GDP does not measure total expenditures, but rather “value added” by the economy over a one-year period. The difference is that value added discounts from expenditures the purchases of intermediate goods and services used to produce the goods and services of the sector in question. Value added is used when measuring output by sector, to allow summing across sectors without double counting. The result is a general measure of output, such as GDP.
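
A toy numerical sketch of the expenditures-versus-value-added distinction (all numbers made up):

```python
# A government buys 40 in intermediate goods and services from firms and
# pays 60 in employee compensation to deliver a public service.
intermediate_purchases = 40
compensation = 60

government_expenditures = intermediate_purchases + compensation   # 100
government_value_added = government_expenditures - intermediate_purchases  # 60

# Only value added enters GDP: the 40 of intermediates is already counted
# as the value added of the firms that supplied them.
print(government_expenditures, government_value_added)  # 100 60
```

Comparing the 100 of expenditures to a GDP built from value added (60 here, plus the suppliers’ value added) mixes two different accounting concepts.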

To illustrate, see the table below (Figure 1). The second column shows government expenditures as a share of GDP in 2020 for selected countries, as reported by the IMF. The third column shows government consumption as a share of GDP, as measured in value added and reported by the World Bank World Development Indicators. The actual share of GDP that corresponds to the government would require adding government investment (fixed capital formation) to government consumption. These data were not readily available for most countries in the WB WDI dataset, and it seems that disentangling government and private fixed capital formation is not very simple. So I added total fixed capital formation (public and private) to government consumption, for the sake of comparison with the IMF numbers (fourth column). The actual weight of the government in GDP should be somewhere between columns three and four.

Figure 1. Government Relative to GDP, Selected Countries, 2020

Country | Government Expenditures as % of GDP (IMF)1 | Government Consumption (value added) as % of GDP (WB)2 | Government Consumption + Total (public and private) Fixed Capital Formation (value added) as % of GDP (WB)2
France | 61.35 | 24.84 | 48.12
Germany | 50.46 | 22.02 | 43.57
Brazil | 49.92 | 20.14 | 36.70
United Kingdom | 49.87 | 22.60 | 40.07
United States | 44.82 | 15.09 | 36.94

Sources: 1. IMF DATAMAPPER. Fiscal Monitor, October 2023, https://www.imf.org/external/datamapper/G_X_G01_GDP_PT@FM/ADVEC/FM_EMG/FM_LIDC. 2. World Bank World Development Indicators. Accessed April 2024, https://databank.worldbank.org/source/world-development-indicators.

Note: government expenditures in 2020 were generally higher than usual, as countries tried to minimize the economic effects of the COVID-19 pandemic.

I am sure there are better data out there somewhere but, after spending some time trying to unbury the IMF metadata (it should be more easily findable), my patience was running low. For the US, see data from the Bureau of Economic Analysis, which defines the value added by government as “the sum of compensation paid to general government employees plus consumption of government owned fixed capital (CFC), which is commonly known as depreciation” (BEA, 2008, p. 29). My point still holds.

Another way of looking at the actual weight of government expenditures in the economy would be to compare them, not with GDP, but with total output in an economy over a one-year period, that is, without discounting intermediate products and services. Country national accounts typically do show this indicator, and it tends to be roughly twice as large as the total value added in any one year. The ratio of total output to value added is available in Table 2.6 of the United Nations (UN) National Accounts Statistics. Figure 2 below applies that ratio to the IMF indicator of government expenditures as a share of GDP to obtain a rough estimate of the share of government expenditures over total output in the last column of the table. Note that the resulting estimates are within the range of columns 3 and 4 of Figure 1.

Figure 2. Government Relative to Total Output, Selected Countries, 2020

Country | Government Expenditures as % of GDP (IMF)1 (a) | Ratio of Total Output to Value Added (UN)2 (b) | Rough Estimate of Government Expenditures as % of Total Output (a/b)
France | 61.35 | 1.95 | 31.42
Germany | 50.46 | 2.03 | 24.83
Brazil | 49.92 | 2.07 | 24.14
United Kingdom | 49.87 | 1.89 | 26.40
United States | 44.82 | 1.77 | 25.39

Sources: 1. IMF DATAMAPPER. Fiscal Monitor, October 2023, https://www.imf.org/external/datamapper/G_X_G01_GDP_PT@FM/ADVEC/FM_EMG/FM_LIDC; 2. UN National Accounts Statistics. Main Aggregates and Detailed Tables. Table 2.6, Accessed April 2024, https://unstats.un.org/unsd/nationalaccount/madt.asp?SB=1&#SBG
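
The last column of Figure 2 can be recomputed from the two reported columns; small differences from the published figures reflect rounding of the reported ratios:

```python
# (expenditures as % of GDP, ratio of total output to value added),
# as reported in Figure 2.
data = {
    "France":         (61.35, 1.95),
    "Germany":        (50.46, 2.03),
    "Brazil":         (49.92, 2.07),
    "United Kingdom": (49.87, 1.89),
    "United States":  (44.82, 1.77),
}
for country, (exp_pct_gdp, output_to_va) in data.items():
    # Rough estimate of expenditures as % of total output = a / b
    print(f"{country}: {exp_pct_gdp / output_to_va:.2f}")
```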

Again, I am sure there are better data out there, but the fact that I had to spend considerable time deciphering the data above and still don’t have non-misleading, comparable cross-country data for the actual size of government expenditures relative to total output is itself relevant for my purposes on this blog.

Other than the issue of comparing apples and oranges, there are additional considerations to make when assessing statements like the ones I made above (“government expenditures were 61% of the entire French economy or 45% of the US economy in 2020”). One is about what we are supposed to infer from looking at government expenditures. If the measure is provided as a reference for the extent to which governments participate in the economy, using expenditures ignores the entire side of government regulation, which, in market economies, is likely at least as important as government expenditures for understanding the influence of the government on the functioning of an economy. Looking beyond total expenditures and into their breakdown by level of government, by consumption and investment, and other disaggregated data would likely also contribute to a much richer and more productive discussion, not to mention the large literature on taxation, as well as financial indicators of debt and debt sustainability. These are all subjects that the IMF delves into professionally and about which it publicly releases a lot of information, even if not always easy to decipher. I can’t help wondering, however, whether sites like the IMF dashboards linked above are actually doing more harm than good by stressing one small and misleading indicator of government participation in the economy.

Another consideration in interpreting data such as that shown in the IMF dashboards is about GDP and what it represents. Although we often think of it as an indicator of the size of the economy: a) there are important limitations to this indicator, and b) when used, there are different indicators that may be more or less appropriate for different purposes. I will look at these issues in a future post.

References

BEA (Bureau of Economic Analysis). 2008. A Primer on BEA’s Government Accounts, by Bruce E. Baker and Pamela A. Kelly. Available: https://apps.bea.gov/scb/pdf/2008/03%20March/0308_primer.pdf?_gl=1*1anuf1l*_ga*NjM4MDQ4ODA2LjE3MTI3Nzc2ODE.*_ga_J4698JNNFT*MTcxMzExMzg4NC44LjAuMTcxMzExMzg4NC42MC4wLjA. Accessed: April 14, 2024.

BEA (Bureau of Economic Analysis). 2010. Frequently Asked Questions: BEA seems to have several different measures of government spending. What are they for and what do they measure? Available: https://www.bea.gov/help/faq/552 Accessed: April 12, 2024

International Monetary Fund (IMF). 2023. IMF DATAMAPPER. Fiscal Monitor, October. Available: https://www.imf.org/external/datamapper/G_X_G01_GDP_PT@FM/ADVEC/FM_EMG/FM_LIDC; Accessed: April 14, 2024.

United Nations (UN). 2024. UN National Accounts Statistics. Main Aggregates and Detailed Tables. Table 2.6, Available: https://unstats.un.org/unsd/nationalaccount/madt.asp?SB=1&#SBG; Accessed: April 14, 2024.

World Bank. 2024. World Development Indicators. Available:  https://databank.worldbank.org/source/world-development-indicators; Accessed: April 14, 2024 


Mental Models and Academic Models

Image by Jobin Scaria. Downloaded from pixabay.com

Every year, the World Bank publishes a World Development Report, an analysis of a selected aspect of Economic Development and its status in the world at the time. In 2015, the selected theme was “Mind, Society, and Behavior.” In this report, the WB argues that there have been advances in our understanding of how people make decisions, and that this better understanding can be used to increase the effectiveness of development interventions.

They highlight three principles of human decision making:

  1. Many of our decisions are made quickly, using an automatic and effortless system of thinking that contrasts with the slower, more deliberative and thoughtful process we often identify with rational decision making. This argument builds on the work of psychologists such as Daniel Kahneman and Amos Tversky, and I have discussed it in other posts on this site as well.

  2. Our individual decision making is not really just individual; it is influenced by the society around us: social preferences, norms, identities. We cannot assume that the factors (preferences) taken into consideration in individual decision making are not shaped by the communities in which we are embedded.

  3. The social influences we receive are embedded in “mental models:” worldviews, stereotypes, and simplifying concepts and categories that we use for decision making.

The consequence of the three principles is that our decision making is influenced by “culture,” deeply rooted beliefs and practices that we often take for granted and may not even recognize. These beliefs and practices may favor or be detrimental to the achievement of desired development goals by any community. When they are detrimental, breaking the cultural patterns may require addressing social practices and institutions before individual incentives and decision-making can change.

 

The authors argue that “recognizing that individuals think automatically, think socially, and think with mental models expands the set of assumptions policy makers can use to analyze a given policy problem and suggests three main ways for improving the intervention cycle and development effectiveness:” (p. 192)

  • “First, concentrating more on the definition and diagnosis of problems, and expending more cognitive and financial investments at that stage, can lead to better-designed interventions. […]

  • Second, an experimental approach that incorporates testing during the implementation phase and tolerates failure can help identify cost-effective interventions […]

  • Third, since development practitioners themselves face cognitive constraints, abide by social norms, and use mental models in their work, development organizations may need to change their incentive structures, budget processes, and institutional culture to promote better diagnosis and experimentation so that evidence can feed back into midcourse adaptations and future intervention designs.” (p.192-193). 


The World Bank’s recognition of the role that culture plays in development, through the functioning of mental models, came on the tails of increased attention paid to the behavioral sciences. The report often cites, for example, the work of Nobel laureates Esther Duflo and Abhijit Banerjee, who (among other things) call attention to evidence from randomized controlled trials that how foreign aid is designed and delivered often matters for its effectiveness. One of the members of the Advisory Panel to the World Bank report was Cass Sunstein, a legal scholar who, among other things, co-wrote a book called “Nudge” arguing that policy design and delivery can affect the choices people make. As I write this post, he also happens to be the husband of USAID administrator Samantha Power. When she took office in 2021, she seemed to bring the belief that the Agency could use more insights from behavioral science in its own design and delivery of foreign assistance activities, and even brought her husband to speak to USAID staff.

 

Of particular interest to me is the role played by mental models as devices that seem inherent to our nature, to how our brains work, and necessary for our daily functioning and (often unconscious) decision making, but that simultaneously can be detrimental to our goals and hard to break from.

How far can a parallel be drawn with academic models?

 

Academic models would seem, at first, to be quite the opposite of mental models. They belong to the “thinking slow” realm: we are conscious of their assumptions and of the connections between those assumptions and their implications, and we are able to modify them, consciously, as needed to better explain what we observe in reality. They would seem to have in common with mental models only the fact that they are, umm… “models,” simplifications of reality that allow us to deal with its complexities in a productive way. However, academic models too have a way of inserting themselves into our unconscious and biasing our thinking over time, to the point that we are no longer able to recognize the effect.

 

Hoping for some more insight on how academic models allow us to better understand reality, I found a 2008 paper by Mary S. Morgan and Tarja Knuuttila titled “Models and Modelling in Economics,” which I understand was later (in 2012) published in the “Handbook of Philosophy of Economics” edited by Uskali Mäki and published by Elsevier. I should state upfront that I do not know the extent to which the draft I found was edited before publication. A quick internet search shows that Oxford published its own Handbook of Philosophy of Economics in 2009, as did Routledge in 2021. Philosophy of economics is the kind of subject that interests, annoys, and troubles me all at the same time. I do have a genuine interest in how we claim to know things, but my interest in economics always came from the practical standpoint of wanting to improve the conditions in which the populations I came from lived. So having to spend too much time on these issues to be able to digest economic theory always struck me as simultaneously necessary but too time consuming, perhaps beyond my capacity to fully grasp, and potentially a waste of my time. In the end, my failure to overcome my methodological or philosophical discomfort with economic theory became a source of personal internal conflict and, hence, the troubling nature that these discussions have for me.

 

But Morgan and Knuuttila’s paper did seem promising to shed some light on these matters, so I dived into it and I will summarize my understanding and takeaways here.

They distinguish between two major views of models in economics: as “idealized entities” or as “purpose-built constructions.”

 

As idealizations, models can be viewed as generalizing, abstracting, simplifying, and/or isolating, for reasons such as facilitating deductive reasoning or ensuring mathematical tractability. This can be done by identifying aspects of reality that are considered absent or negligible, or that can be ignored because they remain unchanged over the time, place, or scope of analysis. Often the idea is that, once a model has served its analytical purpose, it can be “de-idealized,” or made more concrete, by adding back specificity. The practice of “idealizing” and “de-idealizing” is not simple or inconsequential, however. Deductions from an idealized model may not hold once more specificity is added back. In fact, the risk of distortion in each direction of the process is considerable.

 

Two discussions have traditionally surrounded this view of models. The first concerns the traditional position that data without models reflecting theory can suggest spurious relationships, and that, therefore, data should follow models, which should in turn follow theory. A more recent discussion is whether it is best to build models from the general to the specific or the other way around. Theory and data jointly feed model building under either approach.

The second view of models, as purpose-built constructions, sees them as “fictional entities” or “autonomous objects” that are not constructed in relation to an observed reality, nor necessarily related to theory, or perhaps related to just one particular aspect of observed reality or theory. They are simply tools for thought, perhaps creating a parallel, stylized reality that allows for an understanding of the connection between cause and effect, or serving “a variety of purposes.”

 

My experience with economic theory suggests a greater prevalence of the second view relative to the first, at least in the academic circles I was a part of. That is why I also came to see as a reasonable justification for economic theory that it can show that some results are possible, and it can show a set of conditions under which those results can be observed (sufficient conditions). This is often useful to demonstrate that what we observe does not necessarily mean that “a” or “b” is true, but could also mean that “c” is true. It helps dispel many myths that we create in our daily lives by not understanding the many circumstances that can lead to what we observe. However, showing all possible sufficient conditions for an observation to be true (i.e., the collection of scenarios that, seen as a whole, constitute the necessary conditions for that observation) is much harder. A set of identified sufficient conditions may turn out to be only one of many sets of conditions under which the same result is observed. That is where the usefulness of these theoretical models ends. In other words, these constructed models tell us a lot about what reality is not necessarily, but very little about what reality is.
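The asymmetry between sufficient and necessary conditions can be sketched with a toy model. The “market” below and its three mechanisms are entirely invented for illustration (they come from me, not from Morgan and Knuuttila); the point is only that many distinct scenarios can produce the same observation, so the observation alone cannot identify its cause:

```python
from itertools import product

# Toy model: a price increase (the observation) can be produced by any of
# three hypothetical mechanisms. Each mechanism alone is sufficient.
def price_rises(demand_up, supply_down, new_tax):
    return demand_up or supply_down or new_tax

# Enumerate all 8 possible scenarios and keep those consistent with
# observing a price rise.
consistent = [
    scenario
    for scenario in product([False, True], repeat=3)
    if price_rises(*scenario)
]

# Observing the price rise rules out only 1 of 8 candidate worlds:
# each cause is sufficient, but none is necessary.
print(len(consistent))  # prints 7
```

In this sketch, a theorist who exhibits one sufficient scenario (say, a demand increase) has demonstrated that the observation is possible, but has narrowed down very little about which world we are actually in.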

 

Morgan and Knuuttila then seem to suggest that, under either view of models, a more useful way to look at them could be one focused on their function: “instead of trying to define models in terms of what they are, a focus could be directed on what they are used to do” (p. 28). They go on to argue that this would shift the focus from the models themselves to the process of modeling.

 

The parallel I was seeking with mental models is not in this paper, after all. The authors only very tangentially allude to it early on, when they state that “economics shares an hermeneutic character with other social sciences […] individuals’ knowledge of economics feeds back into their economic behavior, and that of economic scientists feeds in turn into economic policy advice, giving economics a reflexive character quite unlike the natural sciences.” In my experience, no matter how rigorous academic economists may believe themselves to be in their views and uses of models, the moment they are asked to give their opinion about reality they will refer to those models (irrespective of their constructed or simplified nature) and draw conclusions about reality that the models themselves do not entitle them to. Academic models become academics’ mental models, and the myriad assumptions, conditions, and circumstances under which they may inform reality get lost in the process.

 

I also found lacking in Morgan and Knuuttila’s paper a discussion of whether it is actually possible to “test” models with data. Here too they allude to it only tangentially, noting that econometric models do not just test mathematical models built on theory (the selection of variables and causal relationships) but often simultaneously test the data, assuming probability distributions, functional forms, and the nature of observed errors and stochastic behavior (p. 15).

 

The bottom line for me is that, just as I found the World Bank’s effort to assess the role of mental models in development practice refreshing, I suspect I would find equally refreshing a critical look at the impact that academic modeling has had on our economic understanding as applied to our daily practice, the good and the bad of it. The discussion of the “robustness” of academic models, touched on in passing in Morgan and Knuuttila’s paper, makes some headway in this direction, as it recognizes the need to question whether models hold up under changes in assumptions, place, time, and circumstances. A step further would be to ask ourselves the extent to which we depart from academic rigor when we translate academic models into our daily view of the world and start confusing our models with reality. To be continued.

 

References

 

Morgan, Mary S. and Tarja Knuuttila. 2008. Models and Modelling in Economics. Forthcoming in U. Mäki (ed.) Handbook of the Philosophy of Economics [one volume in Handbook of the Philosophy of Science, general editors: Dov Gabbay, Paul Thagard and John Woods]. Available: https://sites.pitt.edu/~jdnorton/teaching/Phil_Sci_Core/HPS_2501_2020/more_pdf/Knuuttila_Morgan_Models_2009.pdf. Accessed: April 07, 2024.

 

World Bank. 2015. World Development Report 2015: Mind, Society, and Behavior. Available: https://www.worldbank.org/en/publication/wdr2015. Accessed: April 07, 2024.

 

Management as a Balancing Act: A Personal Account

Image by Miguel Á. Padriñán. Downloaded from pexels.com

As I explain in the “Home” and “About” pages of this site, how we know has been a lifelong interest of mine and is a theme throughout this site. It is, therefore, with some discomfort that I write this personal account. Any personal account is anecdotal evidence and of a particular kind: it rings true to ourselves because we lived it, but it is also subject to our own unrecognized biases. So the value of a personal account in a learning blog is hard for me to gauge. In any case, as I write this post, I have been for eight years leading a team providing services to USAID. I feel it is almost an obligation to myself to want to try to draw lessons from this experience. So here goes.

In my role leading and managing a team, I see my performance as having greatly benefited from luck of two kinds: a) I benefited from a few personal traits that I developed over time through no merit of my own; and b) I benefited from the environment I was placed into when I started leading this team. The personal traits that I think helped me are some degree of humility and empathy, born out of not-so-memorable events that resulted in conflicting feelings of superiority (ugly, yes, I know) and failure, confidence and insecurity, and that I think translated into a relatable and approachable style of leadership (I will not delve any further here). The environment that I was placed in, and that also helped me, was a receptive group of young, kind, and competent staff that was delivering day after day on its own and had built a culture of collaboration and collegiality. For whatever reason, I was embraced. Independent of my own feelings of luck, my personal psychological history, or additional details of my team, the important part is that, early on, my role on this team was steered by supportive personal relationships rather than by any particular management capabilities I brought to the team.

Over time, however, my team grew both in size and in the scope and complexity of the work we were asked to take on. It gradually became clear to me – and I believe to others on the team – that supportive and collegial relationships would not be enough to sustain successful team performance. We needed to improve in ways that none of us were very familiar or comfortable with.

The first direction we sought was towards better defined and established processes, management tools and standards of operation and behavior across the team. The Project Management Institute’s Project Management Body of Knowledge (PMBOK), 6th edition (2017, the only one I have) has a table distinguishing between leadership and management (p. 64) that I reproduce below:

Source: Project Management Institute (PMI). 2017. PMBOK Guide. P. 64

To me the central point is the fourth row: leadership focuses on people, management focuses on systems and structure (I would say processes and tools). We needed to continue nurturing a leadership structure and culture that we thought had been successful until then, while improving on processes and workflows, and providing the team with tools (software, templates, established standards and procedures) that would enable us to gain in effectiveness and efficiency. This all may sound like jargon, but it is really what we thought we needed to do.

I believe we have advanced considerably in this direction, although there is still much to be done and it is an ongoing effort, so I will not provide details in this post (perhaps in a future one). But I would like to highlight that moving towards better established processes, standards, and tools does not replace the role of leadership. I have found that professional managers often dive into the PMI management jargon and guidance while forgetting that the PMBOK itself distinguishes between management and leadership, and that attention to both is necessary for the good functioning of a team. Our own efforts and time must be geared towards both management and leadership. This is the first balancing act.

A second and more recent direction we’ve been pursuing is in assessing what kinds of top-down authority we need to allow ourselves to exercise and enforce, unapologetically. As mentioned, our team relied largely on a supportive and collegial culture to function. That is a good thing. But, as such, establishing authorities of a more hierarchical nature is not always easy: it comes with a risk of creating unhealthy power relations. A common illustration that I have found in several places on the internet, mostly blogs, is the one below.

Source: [I can’t remember which blog I first pulled this from, but it appears in several. If anyone knows the original source, I will provide credit or pull it down, if need be.]

In several blogs, I have seen this picture accompanied by text that goes something like: “when the top guy looks down, they only see s***; when the bottom guy looks up, they only see a**h*****.” The top guy is portrayed as a manager or a CEO, and the layers typically reflect layers of management. This is a common view of a top-down management structure. We wanted to avoid the negative relations often associated with such structures. However, we did find that some degree of top-down enforcement is needed to maintain minimum standards across a team, and minimum levels of accountability and fairness.

As with the effort to establish better management processes and tools, this effort to better establish authorities and accountability is also ongoing, and here too I will not get into detail in this post. But this is a second balancing act: establishing clear expectations, responsibility, accountability and a structure to enforce such accountability, without losing the supportive and collegial culture built collectively over time.

So, for whatever it’s worth, there is my personal account. I’m sure I will come back to this in the future, with the critical eyes of our ever transforming selves, as it should be, neither kind, nor mean, but hopefully as honest as self-assessments can possibly be. I also hope to, in the future, further develop how we’ve been rolling out our efforts to better balance leadership and management, bottom-up and top-down structures and processes, the extent to which our efforts succeeded and any insights from the experience.

References

Project Management Institute (PMI). 2017. A Guide to the Project Management Body of Knowledge (PMBOK Guide). Sixth Edition.


All Things Shining Part II

He deals the cards as a meditation

And those he plays never suspect

He doesn’t play for the money he wins

He don’t play for respect

 

He deals the cards to find the answer

The sacred geometry of chance

The hidden law of a probable outcome

The numbers lead a dance

 

– Shape of My Heart, Sting

So, even after skipping to the final chapter and registering my initial thoughts on this site (see below my post “Initial Thoughts on ‘All Things Shining’”), I went back to the other chapters and finished reading the book (yep, I actually did). Doing so gave me a lot more context on where the authors are coming from and simultaneously served as an organized introduction to some philosophers and literature pieces I knew little or nothing about.

Grossly oversimplifying, their main point, as I understand it, is that there are opportunities to experience the sacred in a Godless world. There is no need to believe in a God (or Gods) to do so, only the predisposition to perceive and experience “moods” in the world around us, to be part of those moods that come and go like a wave (“whoosh”), and to nurture our capacity to do so the way an artisan nurtures a craft. They contrast their view with that of other philosophers and writers, briefly describing a history of western thought that evolved from a polytheistic experience of “moods,” such as those in Homer’s work, to a more monotheistic worldview in classical Greece, and then through two paradigm shifts (Jesus Christ and Christianity, and the Enlightenment and René Descartes) that gradually brought us to a nihilist reality centered on the self-sufficient individual (I find my own pretense of summarizing much of a book in one paragraph astonishing, but…there it is!). For my own benefit, I attempted to organize my understanding of their portrayal of this history of philosophy in Figure 1 below (including a few of my unresolved questions).

Figure 1 – Notes that likely only I am able to follow


There seems to be: a) an essential assumption in the authors’ reasoning, and b) an essential observation.

The assumption is that meaning is to be found outside of ourselves; we cannot successfully impose it. They make this argument in the second chapter of the book, when discussing David Foster Wallace’s nihilism and his proposition that we can impose meaning on our reality. They state this possibility is “the most demanding and the most impoverished all at once” (p. 47):

  • Most demanding because:
    • It raises the stakes for happiness and demands a kind of bliss that supersedes any kind of earthly condition (pp. 47-48). They later equate this with the Buddhist concept of Nirvana (see pp. 163-164).
    • It demands that bliss be constant and achieved at all times (p. 48).
  • Impoverished because there is no place for gratitude (p. 48).

They reject this option as a source of meaning and re-emphasize the rejection on several occasions throughout the book. For example, on p. 142: “The history of the last 150 years suggests that we are not the proper source for meaning in the world.”

I, quite frankly, don’t see how they deduce any of the points above. The fact that they seem convinced that they have made their case, and discuss the rest of the book as if meaning must be found outside the self, unfortunately weakens a book that I otherwise find very interesting, illuminating even. To top it off, as mentioned in my previous post below, they take the cheap shot of using the example of David Foster Wallace’s suicide to emphasize that this is the only possible ending. Thinking back, if all this had sunk in while I was reading the book, I probably would have stopped right there. But I am glad that I continued reading and can enjoy the rest of the book by treating the above as an assumption made by the authors, not something they really set out to demonstrate. I can live with that and address the assumption elsewhere.

The observation I can easily agree with: we cannot control everything that happens outside of ourselves. We do not fully control our world and the events that surround us. This seems to me to be patently and observably true. It is even arguable that we do not control everything about ourselves (such as quick, intuitive thinking and involuntary biological processes).

Given the “assumption” and the “observation” (by the way, these designations are mine, not the authors’), they follow a path of wanting to be in sync with, immersed in, and aware of this uncontrolled and uncontrollable environment and the external “moods” it generates, with the idea that, in doing so, we create an opportunity to experience meaning, something we can call sacred. I see the appeal of this way of thinking, and I see parallels with Eastern philosophy that, for some reason, the authors seem to want to negate.

Now here is where my thoughts go next: what if we look at the whooshing, the moods, the physis we are exposed to as being generated by random events? We can even choose to look at our own choices and the choices of those around us as generated by random events, or as individual outcomes of random events (leaving aside the fact that, in that case, we probably wouldn’t discuss them as “choices”). If we think about the examples the authors give in these terms, how would our outlook on meaning differ?

To use one of the authors’ examples, say we are at a sporting event and collectively experience an athlete “in the zone.” We all realize what is happening and collectively feel we have been part of something special. For this special moment to occur, we need to have gone to the sporting event, and we need to have been open to experiencing the athlete’s performance as something special. In addition, those around us need to have done so as well, and at least one athlete must have been “in the zone” that day. We can treat the aspects of this experience that we do not control as random events: the choices and behaviors of others attending the game, and the likelihood that at least one athlete will be “in the zone.” In that case, our special experience, our experience of the “sacred,” would be one possible outcome of a joint probability distribution over the random events we are exposed to. Why is there a stronger argument to feel gratitude in this case than in any other outcome of the joint distribution?
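The joint-distribution framing can be made concrete with a small simulation. All of the probabilities below are invented for illustration (they are mine, not the book’s), and the events are assumed independent; the point is only that the “sacred” moment is one low-probability outcome among many:

```python
import random

random.seed(0)

# Hypothetical probabilities, chosen purely for illustration.
P_WE_ATTEND = 0.3      # we chose to go to the game
P_CROWD_OPEN = 0.6     # those around us are receptive to the moment
P_IN_THE_ZONE = 0.05   # at least one athlete is "in the zone" that day

def shared_sacred_moment():
    """One draw from the joint distribution of three independent events."""
    return (random.random() < P_WE_ATTEND
            and random.random() < P_CROWD_OPEN
            and random.random() < P_IN_THE_ZONE)

trials = 100_000
hits = sum(shared_sacred_moment() for _ in range(trials))

# For independent events, the joint probability is just the product.
expected = P_WE_ATTEND * P_CROWD_OPEN * P_IN_THE_ZONE  # 0.009
print(f"simulated: {hits / trials:.4f}, analytic: {expected:.4f}")
```

Under these made-up numbers, the collective moment occurs in roughly 1% of the possible worlds; every other draw from the same joint distribution is an equally legitimate outcome of the same random process.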

In the end, it seems to me that what really matters is what the authors assumed away in the first two chapters of the book: whether meaning can be attributed from within, whether it needs to be defined in relation to something outside ourselves, and whether meaning should be attributed at all, at least if we look at the world as one that can be described by joint distributions of random events. It is as if the authors’ effort to propose an alternative to today’s nihilism relies on first simply assuming away nihilism as a viable outlook on life. It seems to me the central question is to what extent we think of what we experience in the world as random or deterministic.

The discussion above takes me to a memory of my father. My father was an engineer. He was very intelligent and liked math. He was also very Catholic. I have a recollection of a period when he was logging the results of the national lottery where he lived into a table. The idea was that, perhaps, he could find a pattern in the lottery numbers. A pattern that would maybe signal some higher order, something he could be in sync with, tuned into. I am sure he would have liked to win the lottery, and he would have interpreted it as a gift from God. Many Catholics view the world as a place where everything happens for a reason. But I honestly think that the discovery of a pattern itself would have been far more rewarding to him than any payout from winning the lottery. It would have proven the existence of a higher order and at the same time provided some personal intellectual gratification. It is possible to think of my father’s attempt, and it seems to me of the entire book discussed above, as ultimately stemming from a kind of wishful thinking. One that is not unappealing to me, yet difficult for me to accept. One that is difficult for me to accept, yet not unappealing to me.

I don’t think my father knew much statistics, or perhaps he didn’t care much for it. And, spoiler alert, to my knowledge he never found a pattern in the lottery numbers, or at least not one that proved successful in winning the national lottery. Interestingly enough, many years later he did win a car in a lottery run by the club he belonged to: it was close to Christmas time, and he quite desperately needed a new car. He interpreted it as a gift, perhaps even as a reward for his faith, and I am sure he was thankful. Relevant to the discussion above: he did not pick the numbers.
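A short simulation shows how readily “patterns” arise in purely random draws, which is the statistical trap my father’s table-keeping invited. The lottery format below is invented for illustration and does not correspond to any real lottery:

```python
import random

random.seed(42)

# A toy lottery: 6 numbers drawn from 1-60 each week.
def draw():
    return set(random.sample(range(1, 61), 6))

draws = [draw() for _ in range(520)]  # ten years of weekly draws

# Candidate "pattern": at least two numbers shared between consecutive draws.
repeats = sum(
    len(draws[i] & draws[i + 1]) >= 2
    for i in range(len(draws) - 1)
)

# Even in purely random data, such coincidences show up again and again,
# which is why an apparent pattern is not, by itself, evidence of any
# higher order.
print(f"{repeats} of {len(draws) - 1} consecutive pairs share 2+ numbers")
```

With independent 6-of-60 draws, two consecutive draws share at least two numbers roughly 10% of the time, so a decade of results contains dozens of these "echoes" by chance alone; a pattern-seeker logging results into a table would have no shortage of tantalizing coincidences.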

Reference:

Dreyfus, Hubert and Sean Dorrance Kelly. 2011. All Things Shining. Reading the Western Classics to Find Meaning in a Secular Age. New York: Free Press.


Initial Thoughts on “All Things Shining”

I started reading “All Things Shining: Reading the Western Classics to Find Meaning in a Secular Age,” by philosophy professors Hubert Dreyfus (Berkeley) and Sean Dorrance Kelly (Harvard). At the end of the second chapter, after using authors David Foster Wallace and Elizabeth Gilbert to illustrate contrasting views on how meaning is generated or attributed in our daily lives, the chapter closes with the following paragraph:

“The question that remains is whether Gilbert and Wallace between them have completely covered the terrain. In Wallace’s Nietzschean view, we are the sole active agents in the universe, responsible for generating out of nothing whatever notion of the sacred and divine there can ever be. Gilbert, by contrast, takes a kind of mature Lutheran view. On her account we are purely passive recipients of God’s divine will, nothing but receptacles for the grace he may choose to offer. Is there anything in between? We think there is, and we will try to develop it in the final chapter of the book.” (p. 57)

Needless to say, I skipped to the final chapter.

In that chapter, they suggest that there are present, today, in our culture, opportunities for a kind of experience in which we neither need to impose meaning on our world nor passively await meaning to descend upon our lives.

They start by suggesting that there are often collective experiences of marvel, bliss, exultation. Experiences like those where spectators attending a live sporting event witness an athlete “in the zone” and are collectively taken by the experience. Or when, also collectively, we rejoice in a skilled orator’s speech. They equate these experiences with moments of realization of Homer’s notion of physis: how the world is insofar as it presents itself to us. They also suggest the exultation of these collective experiences comes and goes like a wave. They use the term “whoosh” to describe it.

They then recognize the dangers of “whooshing,” like when the collective experience is dominated by some type of mass mentality and pack behavior, and where rational self-control is obliterated.

But they claim there is another type of experience available to us, one that also offers an opportunity for experiencing meaning in our physical connection to the world and that, in addition, can be used to discern between “good” and “bad” whooshing. They equate that experience with the Aristotelian term: poiesis. Poiesis captures a craftsman-like practice of developing an intimate understanding of, and relationship with, some aspect of our world. A relationship characterized by a “feedback loop between craftsman and craft” (p. 211) and through which meaning also arises.

But poiesis too has its limitations: in our world, it is under attack by technology that reduces our need to deeply understand our world in order to reach our goals. An example they give is GPS, with which we can move from one place to another simply by following orders, without ever building significant knowledge of our surroundings.

The authors then argue that it is up to us to discover, in what we already care about, the opportunities for poiesis, whether in drinking a cup of coffee in the morning, enjoying a walk, or the company of a friend. In other words, we should discover it in our relationship to the world and then nurture it, transforming routine into ritual. They argue that this is the realm of the sacred that currently exists in our world: a rich polytheistic world where the sacred manifests itself through physis and poiesis, and where technology has its place without completely erasing the opportunities for the sacred.

I am very much enjoying the book and intend to read the chapters in between (really…at some point), but here are a few thoughts based on the three chapters I have read so far:

  • In trying to describe how we can discover what we care about and build rituals around routines to nourish our experience of being in this world through poiesis, they use the example of having coffee each morning. They suggest asking what we like about this routine, whether it is the warmth of the coffee, the striking black color, or the aroma. That is, they appeal to the senses. This appeal to the senses is very similar to how I have learned to practice mindfulness, to be present in this world, based on my understanding of the practice and the teachings of Thich Nhat Hanh. Other parallels to Eastern ways of thinking could be drawn when discussing Wallace’s “This is Water” commencement speech in chapter two. Yet the authors do not seem interested in exploring these potential parallels, at least not in the parts I have read.
  • In discussing Wallace’s nihilism, they propose that, to the extent that he can see any space for the sacred, his attempt to reach it is by imposing meaning on experience, creating this meaning out of nothing, and doing so constantly; there are no constraints on the meaning Wallace can impose on his experience. They suggest this self-imposed task is not humanly possible. In fact, they state that “in such a world, as Melville understood, grim perseverance is possible for a while; but in the end suicide is the only choice” (p. 50). I a) do not see how or why “suicide is the only choice,” and b) find this statement in bad taste, to say the least, since Wallace did commit suicide. Wallace seems an unfair choice of example, unnecessarily making use of someone’s actual suicide to advance their argument.
  • The authors describe Wallace and Gilbert as having a similar view of the purpose of writing: translating life, conveying what it means to be human, becoming less alone. If interpreting our world and our condition is itself an intrinsic and inseparable part of the human condition, one could argue that storytelling does not just interpret what it means to be human but actually contributes to creating that condition. Storytelling would then be a way of creating, rather than just interpreting, our world. I have for a while found this way of thinking appealing, and it differs from what the authors claim to be the views of Wallace and Gilbert. It is perhaps more in tune with Mario Vargas Llosa’s story in his book “El Hablador.” Perhaps something I will explore further in the future.

Reference:

Dreyfus, Hubert, and Sean Dorrance Kelly. 2011. All Things Shining: Reading the Western Classics to Find Meaning in a Secular Age. New York: Free Press.
