Similarities in GDP Per Capita Trajectories

[For the data and R code for this blog post, please visit my GitHub repository EngelbergHuller/GDP-Growth-Similarities]

In my previous Engelberg Huller post, “Catch-up” (January 20), GDP per capita growth data over a 60-year period seemed to suggest similarities in the growth trajectories of countries geographically close to each other, whether reflecting similar institutions and histories, patterns of economic integration and interdependence, or some other factor.

In an attempt to explore these similarities further, and also to teach myself a bit of the open-source statistical software R, I decided to look at the growth data using an R package called “SimilarityMeasures.” This package offers functions built to compare two vectors and assess the numerical proximity between their elements. Functions such as these are often used to compare the distance between two geographical trajectories, such as those of migrating animals or traffic. But they can also be used to compare trajectories of single variables over time.

I used the same dataset of GDP per capita in constant local currency units (LCUs) over the 60-year 1961-2020 period that I used in the “Catch-up” post. I had to exclude 3 of the 93 countries used in “Catch-up” for lack of complete data for all 60 years, and I transformed GDP per capita in constant LCUs into an index with 1961 = 100, so as to compare trajectories in the same unit of measurement.
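Since the analysis itself was done in R (see the repository), here is only a minimal Python sketch of that indexing step, using made-up constant-LCU values:

```python
def to_index(series, base=None):
    """Rescale a series so that its first element (or a given base value) equals 100."""
    b = series[0] if base is None else base
    return [100 * v / b for v in series]

# Hypothetical GDP per capita in constant LCUs, 1961 onward:
gdp_lcu = [250.0, 260.0, 255.0, 270.0]
print(to_index(gdp_lcu))  # -> [100.0, 104.0, 102.0, 108.0]
```

Indexing this way puts every country on the same 1961 = 100 scale, so levels, and not just growth rates, become comparable across countries.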

I used a function called Longest Common Subsequence (LCSS). This function counts the number of elements of two trajectories that are considered equivalent under certain criteria. The criteria are determined by three parameters. The following is my understanding of these parameters:

  • The first establishes which elements in each vector are compared. In the R function, this is the “pointSpacing” argument. A value of 2 means that the indices of the elements compared may be up to 2 positions apart.
  • The second parameter establishes the difference allowed in the values between elements compared, for those elements to be considered equivalent. In the R function, this is the “pointDistance” argument.
  • The third parameter I understand less well, but it is a margin of error established for the algorithm’s calculations, and it influences the “accuracy and speed of the calculation.” In the R function, this is the “errorMarg” argument. In calibration, this parameter seemed to make little difference in the outcomes.

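My mental model of the first two criteria can be sketched in a few lines of Python. This is not the SimilarityMeasures implementation (in particular, it omits the errorMarg translation machinery); it is just a simplified, thresholded longest-common-subsequence, enough to see how pointSpacing and pointDistance interact:

```python
def lcss(a, b, point_spacing=2, point_distance=2.0):
    """Count matched elements between two numeric trajectories.

    Elements a[i] and b[j] may be matched only if their indices are at most
    point_spacing apart (the spacing criterion) and their values differ by at
    most point_distance (the distance criterion). Returns the length of the
    longest in-order sequence of such matches: the LCSS score.
    """
    n, m = len(a), len(b)
    # dp[i][j] = LCSS score of the first i elements of a vs. the first j of b
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(i - j) <= point_spacing and abs(a[i - 1] - b[j - 1]) <= point_distance:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

# Identical growth-rate trajectories match in every year:
growth_a = [3.0, 2.5, -1.0, 4.0]
print(lcss(growth_a, growth_a))  # -> 4
# A trajectory shifted by more than point_distance matches nowhere:
growth_b = [g + 10 for g in growth_a]
print(lcss(growth_a, growth_b))  # -> 0
```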
I initially applied the LCSS function to the country GDP per capita index trajectories, where 1961 was set to 100 for all countries. Because the LCSS function compares years based on the distance between their values, many more years are considered equivalent by the function in the early period of the trajectories (say, the 1960s) than in the later period (say, the 2010s). This is not what we would like: we would like all periods of the trajectories to be weighted the same when assessing similarity between two trajectories.

So I turned to applying the LCSS function to the growth rates themselves. Doing so means that, when one country’s GDP per capita index goes up by, say, 3 percentage points in a year and another country’s does too, the two trajectories are considered equivalent in that year, even if by that point their cumulative growth histories had pulled their index levels apart.

To calibrate the LCSS function (i.e., choose the parameters to use), I used the trajectories for Argentina and Uruguay, two countries whose GDP per capita growth trajectories appeared to be closely related in my “Catch-up” post. I chose parameters that seemed intuitively reasonable and that didn’t generate extreme outcomes (e.g., the entire trajectories of two countries being considered equivalent, or only the first year, 1961 = 100, being considered equivalent). I ended up with:

  • pointSpacing = 2
  • pointDistance = 2
  • errorMarg = 0.5

Running the LCSS function to compare the 90 countries, 2 by 2, in all possible combinations generates a 90 × 90 matrix of 8,100 results, in which each diagonal entry equals 59 (each country’s trajectory, compared with itself, shows all 59 annual growth rates as equivalent). This leaves 8,100 − 90 = 8,010 results comparing different countries. Because the function compares, say, Argentina to Uruguay and then Uruguay to Argentina, the number of unique results comparing two different countries is actually 8,010 / 2 = 4,005. Because my laptop took a few seconds to compare each pair of trajectories, running the function for the entire set of 90 countries took over 11 hours (so I did each run overnight).
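The counts and the runtime estimate are easy to verify. The snippet below assumes roughly 10 seconds per comparison, which is only an assumption consistent with the “few seconds” per pair observed:

```python
n = 90
total = n * n                      # full comparison matrix, self-comparisons included
off_diagonal = total - n           # drop the country-vs-itself results
unique_pairs = off_diagonal // 2   # A-vs-B duplicates B-vs-A

print(total, off_diagonal, unique_pairs)  # -> 8100 8010 4005

seconds_per_pair = 10              # assumed average; actual timing varies by machine
print(round(unique_pairs * seconds_per_pair / 3600, 1))  # -> 11.1 (hours)
```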

Out of the 90 countries, the two that had the closest GDP growth trajectories were France and Austria. With the parameters chosen, their growth rates were equivalent in 58 of the 59 years. The least similar trajectories had growth rates that were equivalent in 20 of the 59 years and there were three pairs of trajectories with that score: Burma and Bahamas, Greece and Chad, Iran and Indonesia.

The eight most similar pairs of trajectories were among five European countries: France, Belgium, Netherlands, Italy and Austria. Their growth trajectories are shown in Figure 1, below.

Figure 2 shows the most similar GDP per capita trajectories, those of France and Belgium, and Figure 3 shows their growth rates.

The South American Southern Cone countries had GDP per capita similarity scores in the 20s and 30s, i.e., their growth rates were equivalent in 20 to 40 of the 59 years compared (Figure 4).

From the “Catch-up” post, we saw that the two highest growth countries in the 1960-2020 period were China and South Korea. Figures 5 and 6 below show how their growth trajectories compare. Their growth rates were comparable until the early 1990s, when South Korea’s growth rate slowed down and China continued its accelerated pace.

Two other interesting pairs of growth trajectories are the United States and the United Kingdom, and Bolivia and Guatemala. After the five aforementioned European countries, the next closest pair of growth trajectories is that of the United States and the United Kingdom (Figure 7). All other countries with trajectories similar to others’ in 50 or more of the 59 years compared are rich countries (other European countries and Australia), the exception being the pair Guatemala and Bolivia (Figure 8). Both countries saw their GDP per capita fall in the first half of the 1980s. I will leave the reasons to explore in a potential future post.

The exercise above suggests strong connections between the growth trajectories of rich countries, but not as much for the rest of the world. It also proved to be a nice little contribution to my own R learning. I intend to further explore growth data in future posts.

References

World Bank: World Development Indicators. Available from USAID IDEA: https://idea.usaid.gov/.  Accessed: January 14, 2023


Catch-up

Look at Figure 1 below. It shows Gross National Income (GNI) per capita for countries in the Southern Cone of South America relative to that of the United States over a period of 26 years (1995-2020), as much data as I found available in Purchasing Power Parity (PPP) terms. What do you see?

I see two main things:

  • Paraguay’s per capita income is pretty much the same share of the U.S.’s in 2020 as it was in 1995. Chile’s and Uruguay’s are slightly higher in 2020 than in 1995, Brazil’s is slightly lower than it was, and Argentina’s is quite lower than it was.
  • The biggest fluctuation in the ratio of GNI per capita relative to the U.S. was that of Argentina, particularly during the 10-year period between 1998 and 2008, when the ratio fell from around 0.4 to about 0.3 and then back up to 0.4 (interval shown by the vertical blue lines).

For someone interested in the economic development of the Southern Cone of South America, the two bullets are not very comforting. They suggest little to no “catch-up” happening relative to the United States. More generally, they show little movement at all in the ratio of national per capita incomes relative to the U.S., raising the question of how easy or hard it is to achieve some kind of catch-up. Even Argentina’s growth between 2002 and 2008 was likely mostly recovery from the decline between 1998 and 2002.

I looked at similar data for Central America, an area of particular importance to the U.S. and its foreign assistance, given the strong links of its population to the U.S. through migration flows.

Here too the main trend seems to be relative stability in the ratio of national income of Central American countries relative to the U.S., the exception being some apparent progress by Panama since 2006.

Perhaps a secondary suggestion of both charts above is that there seem to be stronger similarities between the trajectories of some countries in the same region than between others: for example, Argentina and Uruguay, or perhaps Brazil and Paraguay. Costa Rica, Panama’s neighbor, shows a slight upward trend from 2006, potentially associated with Panama’s. In other words, it is worth exploring the strength of economic integration between neighboring countries (in a future post).

I decided to look at longer-term growth trends. I used Gross Domestic Product (GDP) per capita data measured in constant local currency units (LCUs) for three reasons: the World Bank has data for over 90 countries starting in 1960 for this indicator; GDP is presumably a better indicator of productivity growth inside a country than GNI; and constant LCUs circumvent the exchange-rate issues that other units of measurement (like constant U.S. dollars or PPP international dollars) have to deal with. The drawback is that the absolute measures of output are not comparable between countries: it only makes sense to use LCUs to compare growth rates. I divided the average GDP per capita of a country over 2018-2020 by the average for that same country over 1961-1963. The result is how many times the GDP per capita of that country was multiplied over the 60-year period, in constant local currency. This is a measure of productivity growth.
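The growth-factor calculation, again as a Python sketch with hypothetical numbers (the real computation used World Bank WDI data):

```python
def growth_factor(gdp_by_year):
    """Ratio of average GDP per capita in 2018-2020 to the average in 1961-1963."""
    start = sum(gdp_by_year[y] for y in (1961, 1962, 1963)) / 3
    end = sum(gdp_by_year[y] for y in (2018, 2019, 2020)) / 3
    return end / start

# Hypothetical constant-LCU values for one country:
series = {1961: 95.0, 1962: 100.0, 1963: 105.0,
          2018: 280.0, 2019: 300.0, 2020: 320.0}
print(growth_factor(series))  # -> 3.0
```

Averaging three years at each end smooths out the effect of an unusually good or bad single base or final year.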

Figure 3 below is a histogram for the 93 countries for which data were available. For 82 of those countries, the resulting growth factor in per capita GDP over the 60-year period was between 0 and 6. Three countries had a growth factor between 6 and 7, and the remaining 8 countries had higher growth factors, including factors of 58 for China, 28 for South Korea, 17 for Botswana and 15 for Singapore. I did not include a 94th country, Somalia, for which the factor was 551, which seemed unreasonably large to me; I hope to explore it in a future post.

Looking at these data, I again have two observations:

  • If the U.S. GDP per capita was almost 3 times higher in 2020 than in 1960, all the other countries that grew their GDP per capita by multiples between 0 and 4 or 5 did little or no catching up. If your GDP per capita is, say, a quarter of that of the U.S. and the U.S. grows its GDP per capita 3 times over a given period, you would need to grow yours by 3 × 4 = 12 times to catch up. If your GDP per capita was a tenth of that of the U.S., you would need to grow it by a multiple of 30.
  • The countries that did some catching up seem geographically concentrated around China, with the exception of Botswana, as shown by the circle in the map below. The darker the red, the higher the GDP per capita growth factor. The darker the blue, the lower the GDP per capita growth factor.
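The catch-up arithmetic in the first bullet generalizes to a one-line formula: the growth factor you need is the benchmark’s growth factor divided by your starting income ratio. A small sketch (my own illustrative helper, not from the post’s code):

```python
def required_catch_up_factor(benchmark_growth_factor, income_ratio):
    """Growth factor needed to match the benchmark's GDP per capita by period end.

    income_ratio: your GDP per capita as a fraction of the benchmark's at the start.
    """
    return benchmark_growth_factor / income_ratio

print(required_catch_up_factor(3, 0.25))        # a quarter of the U.S. level -> 12.0
print(round(required_catch_up_factor(3, 0.1)))  # a tenth of the U.S. level  -> 30
```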

Figure 4. Map: countries in red shade are those in the first two buckets of the above histogram

Data Source: World Bank, World Development Indicators (WDI). GDP Constant LCUs and Population data. Map built using Tableau Public.

If China performed so well over the 60-year period, how come its GDP per capita today is not comparable to, or even larger than, that of the U.S.? We do not have data in comparable units (e.g., PPP) going back to 1960. But based on GNI data measured in current US$ (the Atlas method, which averages exchange rates over a three-year period), China’s GNI per capita in 1962 (the oldest year available) was approximately 2% (1/50) of that of the U.S. China would have had to grow by a factor of 50 × 3 = 150 over that period to have caught up with the U.S.

So here are some questions for potential exploration in future posts:

  1. How common/rare is it for a country to catch up? Are there particular circumstances that are always/often present when countries do catch up? Are these circumstances different for countries at different levels of GDP/GNI per capita?
  2. How good/bad are GDP and GNI as indicators of the standards of living and/or well-being of the population of a country?
  3. To what extent do fluctuations in exchange rates affect standards of living and well-being? How well are the different units of measurement of GDP and GNI able to remove the effect of any share of those fluctuations that do not reflect standards of living or well-being?
  4. What accounts for the apparent similarity in growth trajectories of some neighboring countries? Is it level of trade and/or economic integration? Is it similarity in their economies and exposure to similar external circumstances (shocks)?
  5. On the Southern Cone countries: Uruguay’s GNI per capita fluctuations seem to follow somewhat those of Argentina up to around 2013 or so, but then not so much. The same seems to have happened to Paraguay’s relative to Brazil’s. Was that actually the case, and what could explain it?
  6. On Central America: what explains Panama’s performance after 2006?

References

World Bank: World Development Indicators. Available from USAID IDEA: https://idea.usaid.gov/.  Accessed: January 14, 2023


Poverty Traps and Foreign Aid

Image by Nambasi. Downloaded from pixabay.com

I recently finished reading Esther Duflo and Abhijit Banerjee’s “Poor Economics.”

I have at least three topics in my mind from this reading and will need to discuss them in separate posts:

  • The concept of a Poverty Trap and its application.
  • The value and applicability of Randomized Control Trials (RCTs).
  • The value of foreign assistance for poor countries and the conditionality of that value on exactly what is done with foreign assistance resources.

I might be able to address the last two bullets jointly in a future post, given current trends in US foreign assistance towards evidence from RCTs. In this post, however, I will focus on Poverty Traps.

Duflo and Banerjee describe poverty traps as follows:

“There will be a poverty trap whenever the scope for growing income or wealth at a very fast rate is limited for those who have too little to invest, but expands dramatically for those who can invest some more. On the other hand, if the potential for fast growth is high among the poor, and then tapers off as one gets richer, there is no poverty trap” (p.11).

Adapting figures 1 and 2 from their book, in the figure below, function G (the S-shaped curve) represents a situation where an individual faces a poverty trap, if their income is currently below the level C. In that situation, a boost in income from say, income level A to income level B, is insufficient to support a sustainable increase in income over time and income levels will tend to fall back to A. Function F, on the other hand, represents a situation where an individual faces no poverty trap at any income level. A boost in income above the income level A would allow the individual to continue investing until income level D is reached (sustainable levels of income are those where Yt+1 = Yt, that is, where the function that maps Yt+1 to Yt crosses the 45° line).
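The dynamics can be sketched numerically. The functions below are my own toy versions of G and F, not the book’s: G is S-shaped, with stable steady states at 0 (the trap) and 9 and an unstable threshold at 3 (the level C); F is concave, so any positive income converges to 9 (the level D). Steady states are where each map crosses the 45° line, i.e. where Yt+1 = Yt:

```python
import math

def G(y):
    # S-shaped map: fixed points at y = 0 (the trap), y = 3 (unstable
    # threshold C) and y = 9 (high steady state); check: G(3) = 3, G(9) = 9
    return 12 * y**2 / (27 + y**2)

def F(y):
    # Concave map with no trap: fixed point at y = 9, reached from any y > 0
    return 3 * math.sqrt(y)

def long_run(step, y0, n=200):
    """Iterate Yt+1 = step(Yt) from y0 and return the long-run income level."""
    y = y0
    for _ in range(n):
        y = step(y)
    return y

print(round(long_run(G, 2.5), 2))  # boost that stays below C falls back -> 0.0
print(round(long_run(G, 3.5), 2))  # boost past C keeps climbing        -> 9.0
print(round(long_run(F, 0.5), 2))  # no trap: even a tiny start reaches -> 9.0
```

Under G, a boost from A to B is wasted unless it pushes income past C; under F, any boost compounds toward D.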

The concept of a poverty trap is at the center of their book. The authors argue that, in a situation where there is a poverty trap, foreign aid can be of assistance to allow people to climb out of poverty. In a situation where there is no poverty trap, there is little justification for foreign aid.

They argue that whether there is or not a poverty trap – and, therefore, whether foreign aid is justified – is an empirical question that needs to be answered with evidence for any particular situation, and that it cannot be answered generally for all situations.

They contrast their position with economists who advocate broadly for foreign assistance, or even for greatly expanding it (e.g., Jeffrey Sachs in his well-known book The End of Poverty), and other economists who largely oppose it (e.g., William Easterly in his also well-known book The White Man’s Burden).

Much of the remainder of the book is spent looking at particular cases and whether they show evidence of a poverty trap or not. Those cases pertain to hunger, health, schooling, family planning, risk, access to credit, savings, and entrepreneurship. Evidence is collected typically from RCTs.

In some cases the authors find evidence of a poverty trap, in others they do not. Their main message seems to be that aid can be effective if directed to projects where there is evidence of impact, under conditions where there is evidence of impact, implemented based on evidence of what works and what doesn’t, and avoiding what they call the “three Is:” ideology, ignorance and inertia.

As previously mentioned, there is much more to this book and I hope to come back to it in future posts. But for this post, I want to register a few thoughts on the poverty trap concept and its application to foreign aid.

First, if poverty traps are looked at as circumstances that certain individuals face and that impede them from overcoming poverty, they could exist in both rich and poor countries. If so, the difference in the presence of poverty between rich and poor countries would need to be explained by: a) relatively more people facing poverty traps in poor countries than in rich countries; or b) better standards of living for the poor facing poverty traps in rich countries than for those in poor countries, due to other factors such as government assistance; or c) both. The question is then whether foreign assistance should target helping people overcome poverty traps or helping reduce the existence of poverty traps in the first place. The book’s last chapter, on policies and politics, can be interpreted as arguing that there is space for direct assistance to the poor, and that the resulting changes can, over time, affect the prevalence of poverty traps (by affecting political and economic institutions from the bottom up).

Second, if we study a poor country and come to the conclusion that there are few poverty traps and that the poor are poor because of their own choices, Duflo and Banerjee would seem to suggest that there is little that foreign assistance can or should do, and most economists (from Easterly to Sachs) would likely agree: in this scenario, the poor were given a choice and chose to be poor; it is their right, and we have no place imposing anything different. Is this really the case? What about the children affected by the choices of their parents? What about inter-temporal inconsistencies in preferences: would the current behavioral-science literature suggest there would still be value in assisting the poor to address those inconsistencies? Are there externalities where someone’s choice to be poor affects the well-being of the rest of society, or would attempting to affect those choices just be stepping on individual liberties? I have not given much thought to these questions but leave them here for rhetorical purposes, potentially to address in the future.

Third, having grown up in poor countries with large income inequalities (Brazil and Paraguay), I always thought that exposure to the consumption habits of the rich negatively affected the choices made by the poor. The greater the income inequality in a country, the more the poor seem routinely tempted to make consumption choices they cannot afford. Similarly, those living in poor countries are routinely exposed to the living standards of rich countries through movies, television, commercials, social media, tourism and trade, with potentially a similar effect on the consumption and investment choices they make. This would seem to generate an added deterrent to saving and investment in poor countries, relative to the choices faced by countries that grew at the forefront of technological development. Is there anything rich countries can or should do to help poor countries deal with an effect that they may not have suffered themselves? Again, the question is rhetorical. I may give it some thought in the future.

  1. The debate between Easterly and Sachs has served as a reference for the discussion of the value of foreign aid for over a decade. I actually had the opportunity to provide both readings to students of mine in a course I gave in 2007 on Foreign Aid at the Catholic University of Rio de Janeiro, shortly after the books came out.

References

Duflo, Esther and Banerjee, Abhijit V. 2011. Poor Economics. A Radical Rethinking of the Way to Fight Global Poverty. New York: Public Affairs.


Our Brains and Decision Making

Image by Gordon Johnson. Downloaded from pixabay.com

I recently read David Eagleman’s book “The Brain. The Story of You.” I also watched the associated PBS documentary. The book and documentary follow each other very closely, and the documentary allows you to see some of the people and experiments referred to in the book. A couple of points made by David Eagleman made me think again about the common use of the terms “data-driven decision making” and “data-informed decision making,” the extent to which our decisions are made based on evidence presented to us, and what evidence exactly we base our decisions on.

I will leave a more detailed review of the use of the terms “data-driven,” “data-informed” or “evidence-based” for another post. But these terms are typically used without a recognition of how much of our prior beliefs and assumptions we impose on our analysis of data. From the moment we ask ourselves a question, we are choosing what interests us. When we decide what data to look at, we are making assumptions about what data matter, based on our experience, reasoning and assumed knowledge of the world. When we actually obtain data, our analysis is constrained by data availability and by what the data represent: how they were defined and collected. Recognizing this dependency on prior beliefs and “learned” experiences should make the use of the term “data-driven” highly problematic. But it should also make us question what exactly we mean by “data-informed” or “evidence-based” decision making. What evidence exactly are we talking about, and how exactly are we using it?

With this in mind, I found a couple of points made by Eagleman to be illuminating.

One of the points is that decisions often (perhaps most often) require connecting the analytical parts/networks of the brain to the emotional ones. Without the connections to our emotions, we are often unable to make decisions. The book provides a couple of examples, such as a woman who, due to a motorcycle accident, had these brain connections weakened and found herself unable to make daily decisions, such as what to wear, what to eat or what to do during the day. Another example was an experiment where decisions were reversed when emotional factors were brought into play, even though the choices were, analytically speaking, unaltered. The insight is that choices often involve many factors offering trade-offs, and our logical brain often cannot assign values to those trade-offs to make a decision. The values are assigned based on bodily/emotional signatures built from past experience; without those, decisions are often not possible. These signals are often embodied in the release or suppression of hormones, such as dopamine or oxytocin, affecting the transmission of stimuli between neurons. As we acquire new experiences, the stimuli these neurotransmitters produce in our brains are adjusted based on the confirmation or frustration of past experiences (differences between expectations and reality). That is how we learn.

Another point made in Eagleman’s book is that our brains are primed for social interaction. We are wired to see social intention where it does not exist. This is exemplified by an experiment where a short film of geometrical objects moving around a screen tends to induce subjects to interpret the movements as if telling a story, where the objects would move intentionally as if they represented humans or animals. Further, in the same way as we tend to humanize objects, we also sometimes dehumanize other people, presumably when seeing them as humans creates a burden we consider too much to bear (e.g. experiments show this often happens when we are faced with the homeless).

The first point means that we typically will not make decisions based only on the data or evidence presented to us. No matter how much we may want to make data- and evidence-based decisions, when weighing options we will likely bring to bear, consciously or unconsciously, our lifetime of experiences, transmitted to our brain through chemical stimuli.

The second point means that, in interpreting events, occurrences, phenomena of all kinds, from social phenomena to purely physical ones, we tend to attribute intentionality to those events; we tend to attribute human characteristics to phenomena that may not have them or may not be reducible to them. Eagleman sees in this tendency evidence of the importance of human interactions for our brains and for who we are. But it can also be seen as a potential factor in our tendency to see organizations, firms and governments behaving as if they were individual decision-making units rather than composed of people themselves. Perhaps this attribution of human intentionality to anything but a person could help explain conspiracy theories, where large networks are assumed to work in unison towards a common goal; and perhaps it could help explain situations where we see cause and effect where there is none, simply because we attributed agency to entities that do not have it.

Both points made by Eagleman, the role of emotions in decision-making and our tendency to attribute human intentionality to entities other than a person, should make us question the extent to which our minds are pre-conditioned to make decisions largely based on factors beyond the data and evidence put in front of us on any given decision-making occasion. They should make us think of ways to build into decision-making processes awareness of how our brain works and the possible implications for the decisions we end up making.

References

Eagleman, David. 2017. The Brain. The Story of You. Vintage Books.


Illusions and Delusions

I recently read a bit about the Buddhist concept of “pratitya samutpada,” translated literally or liberally as “in dependence, things rise up,” “interdependent co-arising,” or simply “dependent rising” (Hanh 1998; Namgyel 2018). There seem to be two main aspects of the concept. The first is that what we perceive as separate entities are only so at a superficial level. In truth, they are part of a whole and, as part of that whole, they are connected and mutually affect each other, rather than one entity being the cause of the other or being independent of it. The second aspect of the concept of pratitya samutpada is that those entities that we perceive as separate are constantly changing, morphing into other and new aspects of the whole. The consequence of this concept is that, if we focus on the separate entities that we perceive, we can fall into a kind of delusion, where we do not see the dynamic interdependence that governs the entities we perceive.

This concept seems to have similarities with other concepts in Asian philosophy such as that of yin and yang, where opposites are part of a whole, but also with common ideas in western science and philosophy: from Lavoisier’s formulation that in nature “nothing is lost, nothing is created, everything is transformed,” to Hegel’s dialectics and the concept of “aufheben,” often translated as “self-sublation,” a process that simultaneously negates and preserves forms or concepts that previously seemed well defined and stable (Maybee 2020; Wikipedia Contributors 2021)¹. How often and to what extent does this idea of a dynamic, interdependent world lead to conclusions about our capacity to see through the temporary, perhaps time and space specific formations, and grasp the whole of what is actually going on? How often are we “deluded” into thinking that the temporary and time-specific reality that we perceive is more permanent than it is or that it is all there is? How often does it matter?

The dictionary distinguishes between the terms illusion and delusion in subtle ways. Merriam-Webster’s definitions:

Illusion:

    1. something that looks or seems different from what it is : something that is false or not real but that seems to be true or real
    2.  an incorrect idea : an idea that is based on something that is not true

[Merriam-Webster. Undated (a)]

Delusion:

    1. a belief that is not true : a false idea
    2. a false idea or belief that is caused by mental illness

[Merriam-Webster. Undated (b)]

The definitions above seem to suggest that illusion happens in the realm of perception and ideas, while delusion is closer to beliefs and mental illness. One of my Buddhist references for this post distinguishes between illusion and delusion by stating that “illusion refers to seeing through appearances by recognizing their interdependent nature. Delusion, on the other hand, refers to misapprehending things to have an independent reality from their own side” (Namgyel 2018, p. 25). In other words, illusions do not necessarily fool you into beliefs; delusions do.

Joni Mitchell’s beautifully mesmerizing song “Both Sides Now” uses the term illusion similarly, in the sense that the songwriter is aware that her recollections are illusions, whether they be about clouds, love or life, and concludes that she really doesn’t know them at all. E.g.:

I've looked at life from both sides now
From win and lose and still somehow
It's life's illusions I recall
I really don't know life at all

The distinction between illusion and delusion brings to mind (for me, at least) the challenge of translating social science modeling into public policy without losing sight of model limitations. 

Social science models are often able to represent mathematically the two aspects of “pratitya samutpada:” interdependence and dynamics. But, as with all models, simplifications are needed for tractability, and the conclusions of the model will depend on the simplifications made: assumptions about which variables are more or less important, functional relationships, bounding of magnitudes, temporal lags. These assumptions can be supported or rejected by empirical work, to some extent. What exactly that extent is, and how much certainty academics attribute to their models, is, based on my humble experience, influenced early on by human flaws. Whether it is an overemphasis on thinking quickly within the confines of established methodological approaches, which leads to a poor understanding of the limitations of those approaches themselves, or the difficulty of living with uncertainty, or perhaps just plain vanity, it is my impression that academics themselves often lose sight of the limitations of their models and fall into the temptation of making grand but unsupported statements about the world they live in.

When the next step is taken (whether by academics themselves, policy makers, or mere practitioners like me) to translate conclusions of limited validity into policy for a specific time and place, the assumptions, limitations and caveats of academic discourse seem to be further forgotten. Before we know it, the illusion of general principles, guidelines, best practices and rules of thumb, which we would hope to be well understood as the illusions they are, morphs into the delusion of ideological constructs: over-simplified, over-generalized, distorted by the influence of a kaleidoscope of interest groups, and imbued with a certainty they do not merit.

In a world of unmerited certainty, Joni Mitchell’s illusions, and the awareness of them, seem something to strive for, to appreciate in their melancholic beauty, and to sing in a song.

Footnote:

  1. Antoine Lavoisier, French chemist, and Georg Wilhelm Friedrich Hegel, German philosopher, were contemporaries during the late 18th century.

References

Hanh, Thich Nhat. 1998. The Heart of the Buddha’s Teaching: Transforming Suffering into Peace, Joy, and Liberation. Harmony Books.

Maybee, Julie E., Hegel’s Dialectics. In: The Stanford Encyclopedia of Philosophy (Winter 2020 Edition), Edward N. Zalta (ed.). Available: https://plato.stanford.edu/entries/hegel-dialectics/. Accessed: February 13, 2022

Merriam-Webster. Undated (a). Illusion. In Merriam-Webster.com dictionary. Available: https://www.merriam-webster.com/dictionary/illusion. Accessed: February 13, 2022

Merriam-Webster. Undated (b). Delusion. In Merriam-Webster.com dictionary. Available: https://www.merriam-webster.com/dictionary/delusion. Accessed: February 13, 2022

Namgyel, Elizabeth Mattis. 2018. The Logic of Faith: A Buddhist Approach to Finding Certainty Beyond Belief and Doubt. Shambhala Publications. 

Wikipedia contributors. 2021. Aufheben. In Wikipedia, The Free Encyclopedia. Available: https://en.wikipedia.org/w/index.php?title=Aufheben&oldid=1050479001. Accessed: February 13, 2022


On Data and Evidence in the Social Sciences

Photo by Timothy Grindall. Downloaded from pexels.com

On a recent trip to the local public library I happened to find a copy of a book of collected works by Bertrand Russell that I used to own but that, like many other books, had been a victim of my international moves. I always admired Bertrand Russell’s clear, simple and straightforward way of discussing not-so-simple topics without distorting them (at least not in ways that were obvious to me). The library was selling the book as part of a used book sale and I bought it, together with a copy of Russell’s “The Scientific Outlook.”

In reading this latter book, I found myself sucked into a web of interrelated methodological discussions; some old ones (e.g., how scientific the social sciences are or can be) and some newer ones (e.g., whether the huge amount and speed of data availability, and easy access to it, brought by information technology has challenged traditional scientific methodology and put correlation, with no theory, front and center on the research agenda). I remember delving into economic methodological discussions some thirty years ago as an economics student but have distanced myself from economic theory since. After getting lost in the rabbit hole, I decided I did not have enough time to dig deep enough into these discussions, but thought I would register what I found, perhaps for continuing or revisiting at a later date.

So here goes.

Both Mlodinow (2009) and Russell (1962) place the origins of the scientific revolution in the late sixteenth and early seventeenth century, pretty much on the shoulders of Galileo Galilei (1564-1642), his contemporaries and those coming soon after him (e.g., Isaac Newton, 1643-1727). They also both characterize the scientific revolution as being centered on induction and experimentation, as opposed to deduction, as a source of knowledge. Both deduction (theory) and induction (evidence) have a role in science, and Russell describes the scientific method as including three stages:

  • Observing significant facts

  • Arriving at a hypothesis, which, if true, would account for these facts

  • Deducing from this hypothesis consequences which can be tested by observation (some, quoting Karl Popper, would say “refuted” by evidence)

This characterization of the scientific method (and its variations) seems to have been criticized over time as not adequately portraying how science evolves. The idea that science progresses by refuting hypotheses empirically, for example, seems to have been criticized repeatedly. A recent opinion article in Scientific American (Singham 2020) claims that it must be abandoned for good for at least two reasons: first, because empirical experiments are themselves framed by many theories, making their results more reflective of comparisons between theories than between theory and evidence; second, because this is not really how science has advanced historically. Rather, the author claims, “It is the single-minded focus on finding what works that gives science its strength, not any philosophy.” Similar arguments have been made by various philosophers of science, including Thomas Kuhn (Wikipedia 2021).

The use of empirical evidence may vary from one branch of science or research program to another. I particularly looked for discussions among economists, because that is an area I have more of a background in and because of its relevance to international development. In a well-known paper, Larry Summers (1991) argues that elaborate statistical tests aimed at estimating model parameters have done little to advance economic thinking, that most papers remembered as having advanced economic theory have little empirical content at all, and that successful empirical research in economics has relied mostly on attempts to gauge strength of association and on persuasiveness. He criticizes models that have been overspecified to enable testing, arguing that the results tend to be of little worth, and, comparing economics to the natural sciences, he states that “The image of an economic theorist nervously awaiting the result of a decisive econometric test does not ring true.”

In general, the Popperian criterion of falsifiability through testing seems to be nominally accepted and yet in practice not met in economics, with theory moving forward anyway, based on the use of empirical evidence to support argumentation. Hausman (2018) summarizes the challenges of applying the Popperian criterion to economics (presumably applicable to the social sciences more generally) and how several authors have abandoned the criterion entirely, arguing that economics (and, again, presumably the social sciences more generally) advances by using a more comprehensive blend of theory and empirical evidence. Durlauf (2012) states that “while some empirical economics involves the full delineation of an economic environment, so that empirical analysis is conducted through the prism of a fully specified general equilibrium model, other forms of empirical work use economic theory in order to guide, as opposed to determine, statistical model specification. Further, a distinct body of empirical economics is explicitly atheoretical, employing so-called natural experiments to evaluate economic propositions and to measure objects such as incentives.”

A more recent discussion on the use of evidence to advance our knowledge of society gained traction with the rapid growth of “big data.” Some claimed that data in large volume would make the scientific method obsolete and that correlation alone would suffice to advance our knowledge, even if these claims came mostly from outside the academic community. For example, an article in WIRED magazine, written by its Editor in Chief, claimed that “Petabytes allow us to say: ‘correlation is enough.’ We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot” (Anderson 2008). These claims have been countered by others (e.g., Mazzocchi 2015), and it is hard for me to imagine how computerized analysis of data would not be imbued with human theorizing, no matter how much one tries to step aside and let the “data speak.” In addition, for my purposes, much of the data in international development is not “big data.” Even if it were, it is not clear to me how we would separate the many variables that in international development tend to move together (in the same or opposite direction) with correlation as the only criterion (and no theory).

There is a large literature to review on this topic, and I have not even looked at the randomized controlled trial-based research that gave Esther Duflo, Abhijit Banerjee and Michael Kremer the 2019 Nobel prize in economics, or at what that line of research means for the discussion above. But I am thinking (for now) that there may not be a clear rule on the use of evidence and theory for discussing international development knowledge, and I am satisfied (for now) with looking for the reasonable use of theory, evidence, skepticism and caution in thinking about development policy and practice. I am sure I will come back to this discussion at a later date.

References:

Anderson, Chris. 2008. The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. WIRED. June. Available: https://www.wired.com/2008/06/pb-theory/. Accessed: October 30, 2021.

Boland, Lawrence. 2006. Seven decades of economic methodology: a Popperian perspective. In: Karl Popper: a Centenary Assessment: Science, I. Jarvie, K. Milford and D. Miller (Eds), 2006, 219–27. Available: http://www.sfu.ca/~boland/wien02.pdf. Accessed: October 30, 2021

Durlauf, Steven. 2012. Complexity, Economics, and Public Policy. Politics, Philosophy & Economics 11(1) 45–75. Sage. Available: http://home.uchicago.edu/sdurlauf/includes/pdf/Durlauf%20-%20Complexity%20Economics%20and%20Public%20Policy.pdf. Accessed: October 30, 2021.

Hausman, Daniel. 2018. Philosophy of Economics. Stanford Encyclopedia of Philosophy. Available: https://plato.stanford.edu/entries/economics/#RhetEcon. Accessed: October 30, 2021

Mazzocchi, Fulvio. 2015. Could Big Data be the end of theory in science? A few remarks on the epistemology of data-driven science. EMBO reports. EMBO Press. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4766450/pdf/EMBR-16-1250.pdf. Accessed: October 30, 2021. 

Mlodinow, Leonard. 2009. The Drunkard’s Walk. How Randomness Rules Our Lives. New York: Vintage Books. A Division of Random House.

Russell, Bertrand. 1962 (first copyrighted in 1931). The Scientific Outlook. The Norton Library. W.W. Norton & Company.

Singham, Mano. 2020. The Idea That a Scientific Theory Can Be ‘Falsified’ Is a Myth. It’s time we abandoned the notion. In Scientific American, September 2020. Available: https://www.scientificamerican.com/article/the-idea-that-a-scientific-theory-can-be-falsified-is-a-myth/. Accessed: October 30, 2021

Summers, Larry. 1991. The Scientific Illusion in Empirical Macroeconomics. The Scandinavian Journal of Economics. Vol. 93, No. 2, Proceedings of a Conference on New Approaches to Empirical Macroeconomics (Jun., 1991), pp. 129-148 (20 pages). Wiley. Available: http://faculty.econ.ucdavis.edu/faculty/kdsalyer/LECTURES/Ecn200e/summers_illusion.pdf. Accessed: October 30, 2021.

Wikipedia contributors. 2021. Scientific Method. Available: https://en.wikipedia.org/wiki/Scientific_method. Accessed: October 30, 2021


Descriptive and Inferential Statistics

Image by Michael Siebert. Downloaded from pixabay.com

Since I posted “Challenges in Exploratory Data Analysis” (February 1, 2021), I have found myself struggling with the distinction between Exploratory Data Analysis and Confirmatory Data Analysis on one hand, and the distinction between Descriptive Statistics and Inferential Statistics on the other. The former distinction is relevant to what you can say with any one set of data versus what you can say with more than one data set, while the latter comes into play when deciding whether our interest lies in the sample at hand or in the process generating the sample we have (the population).

Clarifying these distinctions is more than an academic exercise: doing so, and understanding how the terms are used, helps us understand what we can say with the data and what we cannot, and what assumptions we are making when inferring from the data and at what point in our analysis we are making them. It helps develop our own guidelines for disciplining our thought process when thinking with data.

According to Wikipedia (Wikipedia contributors 2021a), Exploratory Data Analysis was promoted by US mathematician John Tukey in the 1960s and 1970s as a way of unearthing hypotheses to be tested with data, before jumping into testing hypotheses based on assumptions already made. It was to be in contrast with Confirmatory Data Analysis (hypothesis testing). It was a way of exploring what information was contained in the data, independent of any already existing hypotheses about the relevant subject matter. It included a myriad of techniques, such as looking at variable maximums, minimums, means, medians and quartiles, but was characterized more by the attitude than by the techniques. A number of techniques applied in exploratory data analysis can be applied whether our focus is on the sample at hand (descriptive statistics) or on the underlying generating process (inferential statistics). In thinking about these concepts, I produced the diagram below, which is useful to me and may be useful to others as well (I relied mostly on my accumulated knowledge at this point, but suggest readers start with the Wikipedia entries for Descriptive Statistics [2021b] and Statistical Inference [2021c] for further reading).

Source: author's take
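As a concrete illustration, the univariate summaries mentioned above (maximums, minimums, means, medians and quartiles) take only a few lines of, say, Python with pandas; the data below are made up purely for illustration:

```python
import pandas as pd

# Hypothetical data: an index (first year = 100) for two imaginary countries
df = pd.DataFrame({
    "country_a": [100, 104, 110, 108, 115],
    "country_b": [100, 101, 99, 102, 103],
})

# describe() bundles the classic exploratory summaries in one call
summary = df.describe()

# Minimum, quartiles, median (50%) and maximum for each variable
print(summary.loc[["min", "25%", "50%", "75%", "max"]])
```

The same numbers can be read descriptively (facts about the sample) or inferentially (estimates for a population); the computation is identical, and only the interpretation changes.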

Although exploratory data analysis techniques can be applied whether our focus is on the sample at hand or the underlying generating process, how things are done in each case may be different. In the table below I tried to establish some distinctions on how we would proceed with exploratory data analysis in descriptive and inferential statistics.

Source: author's take

In either case, during exploratory data analysis, we do not talk about significance of correlation, causality or hypothesis testing. These require modeling and a second sample drawn from the same population (or treatment and control groups).

A final note on the terms used by Cassie Kozyrkov in her popular blogs and vlogs (Kozyrkov 2018; 2019a; 2019b; 2020). She refers to data analytics as what is used when there is no uncertainty (what I refer to as descriptive statistics) and refers generally to statistics when there is uncertainty (what I refer to as inferential statistics). She also refers to data analytics as being for inspiration (what I refer to here as exploratory data analysis), as opposed to hypothesis testing, which would require another sample. From what I can tell from the literature, these are less traditional uses of the terms, and I find the traditional uses (which I believe I capture here) better highlight the difference between analyzing sample and population data.

References

Kozyrkov, Cassie. 2018. “Don’t Waste Your Time on Statistics.” Towards Data Science. May 29. Available: https://towardsdatascience.com/whats-the-point-of-statistics-8163635da56c. Accessed: May 23, 2021.

———-. 2019a. “Statistics for People in a Hurry.” Towards Data Science, May 29. Available: https://towardsdatascience.com/statistics-for-people-in-a-hurry-a9613c0ed0b. Accessed: May 23, 2021.

———-. 2019b.  “The Most Powerful Idea in Data Science.” Towards Data Science. August 09. Available: https://towardsdatascience.com/the-most-powerful-idea-in-data-science-78b9cd451e72. Accessed: May 23, 2021

———-. 2020. “How to Spot a Data Charlatan.” Towards Data Science. October 09. Available: https://towardsdatascience.com/how-to-spot-a-data-charlatan-85785c991433. Accessed: May 23, 2021.

Wikipedia contributors. 2021a. “Exploratory Data Analysis.” In Wikipedia, The Free Encyclopedia. Available: https://en.wikipedia.org/w/index.php?title=Exploratory_data_analysis&oldid=1021890236. Accessed: May 15, 2021.

———-. 2021b. “Descriptive Statistics.” In Wikipedia; The Free Encyclopedia. Available: https://en.wikipedia.org/wiki/Descriptive_statistics. Accessed May 23, 2021.

———-. 2021c. “Statistical Inference.” In Wikipedia; The Free Encyclopedia. Available: https://en.wikipedia.org/wiki/Statistical_inference. Accessed May 23, 2021.


Statistical Thinking

Image by Matthias Groeneveld. Downloaded from pexels.com

In my social media feed a couple of weeks ago, someone posted an image of a television news piece (from Detroit’s CW50) and a short paragraph under the headline “Former Detroit TV Anchor Dies One Day After Taking COVID Vaccine.” There was no actual link to a site, and the headline, image and paragraph seemed not to have been put together by the original source of the news. But the suggestion was clear: the COVID vaccine could be the cause of death. The paragraph referred to Karen Hudson-Samuels, a Detroit news producer and anchor who, indeed, seems to have died a day after taking the vaccine. Some articles on the internet quoted her husband as saying the immediate cause may have been a stroke with no clear relation to the vaccine (e.g., Rahal 2021).

An off-the-cuff calculation would show that, given the number of people in the US who die every day and the number of COVID vaccinations taking place every day, particularly among the population over 65 (Karen Hudson-Samuels was 68), chances are that there will be people dying the day after taking a COVID vaccine for completely unrelated reasons. For example, as of February 27 there were approximately 12 million 65-74 year olds with at least one dose of the vaccine (CDC 2021). If there are at least 31 million 65-74 year olds in the country (there were more in 2019 according to the Census Bureau, USCB 2020) and if there were 75 days of vaccination between December 14 (the first day of vaccination in the US) and February 27, the chance of a 65-74 year old receiving the first dose of the vaccine on any given day during that period was approximately 0.5% (12 million, divided by 75 days, out of 31 million). The likelihood actually gets greater towards the end of the period, when Karen Hudson-Samuels received it, because the number of 65-74 year olds who have not yet received their first dose decreases (the pool left to receive the dose keeps getting smaller), assuming vaccination continues at the same pace. According to the CDC, the mortality rate of 65-74 year olds in the US in 2019 was approximately 1.7% (Kochanek et al. 2020). Divided by 365 days, that means approximately 0.0047% of 65-74 year olds died on any given day of causes unrelated to COVID (COVID was not yet present). If the same holds true in 2021, the likelihood that a given 65-74 year old received a first dose of the COVID vaccine on a given day and died of an unrelated cause the following day is 0.5% (chance of receiving a vaccine on any given day) x 0.0047% (chance of dying of an unrelated cause on any given day). That equals 0.000024%, or approximately 1 in 4 million. That seems like a very low chance. But with over 30 million 65-74 year olds in the US, that likely happened to about 7 people on any given day, or several hundred times between December 14 and February 27. And on any given day there were likely about 7 more who died the same day as the vaccine, about 7 more two days after, and so on.
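For readers who want to check the arithmetic, here is the same back-of-the-envelope calculation as a short Python sketch; every input is the rounded figure used above, not a precise statistic:

```python
# Rounded inputs from the discussion above
vaccinated = 12_000_000     # 65-74 year olds with at least one dose by Feb 27
population = 31_000_000     # 65-74 year olds in the US (conservative)
days = 75                   # December 14 through February 27

# Chance a given 65-74 year old received a first dose on a given day
p_dose_per_day = vaccinated / days / population          # ~0.5%

# Annual non-COVID mortality of 1.7%, spread evenly over the year
p_death_per_day = 0.017 / 365                            # ~0.0047%

# Chance of both, for one person on one day, assuming independence
p_coincidence = p_dose_per_day * p_death_per_day         # ~1 in 4 million

# Expected vaccine/death coincidences per day across the whole age group
per_day = p_coincidence * population                     # ~7

print(f"{p_coincidence:.2e} per person per day, ~{per_day:.0f} cases per day")
```

The independence assumption is the same one flagged in the text: if people close to death are more (or less) likely to be vaccinated, the count shifts up (or down), but the qualitative point survives.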

The numbers above may be a bit off and the calculations assume receiving a COVID vaccine and dying of a non-related cause are independent events. This may not be the case if, for example, someone with a life threatening underlying health condition would be more likely to get a vaccine. In this case the chances of observing someone dying the day after receiving a vaccine would be even larger. On the other hand, if the fact that someone is visibly about to die makes them less likely to receive a vaccine, the chances would be smaller. But the general point should remain valid, even if the actual numbers are a bit different under varying circumstances: chances are there are several cases like that of Karen Hudson-Samuels that can be explained by pure chance, with no relationship to the COVID vaccine at all. Yet, all you need is 1 case to make the news and to persuade some people that the vaccine is likely the cause.

The post seems a clear example of the point that Leonard Mlodinow is making in the book I am currently reading, The Drunkard’s Walk: How Randomness Rules Our Lives (Mlodinow 2009). In this book he argues that random processes are all around us and play an important role in daily events. Yet, we seem to be ill equipped to recognize them and we often don’t. I am about half way through the book and so far have found most interesting his account of the development of statistical concepts and understanding over time. It is also a very pleasant read. I’ll be referring to this book again in future posts.

References

CDC (Centers for Disease Control and Prevention). 2021.  Demographic Characteristics of People Receiving COVID-19 Vaccinations in the United States. Available: https://covid.cdc.gov/covid-data-tracker/#vaccination-demographic. Accessed: 02/28/2021

Kochanek, Kenneth D., M.A., Jiaquan Xu, M.D., and Elizabeth Arias, Ph.D. 2020. Mortality in the United States, 2019. National Center for Health Statistics (NCHS) Data Brief 395, December. CDC (Centers for Disease Control and Prevention). Available: https://www.cdc.gov/nchs/data/databriefs/db395-H.pdf. Accessed: 02/28/2021

Mlodinow, Leonard. 2009. The Drunkard’s Walk. How Randomness Rules Our Lives. New York: Vintage Books. A Division of Random House.

Rahal, Nour. 2021. Karen Hudson-Samuels remembered as Black TV news pioneer and Detroit history promoter. Detroit Free Press, 02/20/2021. Available: https://www.freep.com/story/news/obituary/2021/02/20/karen-hudson-samuels-black-tv-news-pioneer/6784796002/. Accessed: 02/28/2021.

USCB (United States Census Bureau). 2020. Annual Estimates of the Resident Population by Single Year of Age and Sex for the United States: April 1, 2010 to July 1, 2019 (NC-EST2019-AGESEX-RES). Available: https://www.census.gov/data/tables/time-series/demo/popest/2010s-national-detail.html. Accessed: 02/28/2021


A Thought After Reading Daniel Kahneman’s “Thinking, Fast and Slow”

Image by Maryam62. Downloaded from pixabay.com

I can tell that Daniel Kahneman’s book, Thinking, Fast and Slow, is one of those books that I will find myself coming back to over and over again. Why? On one hand, it provides evidence from experimental psychology for phenomena that I was already inclined to believe, that is, it appeals to my own confirmation bias. At the same time, it offers a number of insights and explanations that are new to me and eye opening. Kahneman, a psychologist, won a Nobel Prize in Economics for his work on decision-making under uncertainty. His book reflects a lifetime of learning about human judgement and is an absolute delight to read.

Its central framework revolves around the idea that our minds address problems in two distinct ways, which he describes using the metaphors of System 1 and System 2. The table below summarizes some of the characteristics of each system.

System 1 | System 2
---------|---------
Fast | Slow
Intuitive | Deliberate
Automatic, cannot be turned off at will | Effortful and/but/therefore lazy
Impulsive | Requires self-control
Associative and causal, but does not have the capacity for statistical thinking | Requires training for statistical thinking
Prone to bias and to believing | In charge of doubting and unbelieving

Much of Kahneman’s book is then focused on System 1 and its characteristics and biases. We learn of its need for coherence and the central role of cognitive ease in driving belief. System 2 is much less explored in the book and I was left with the desire to learn more about what can be done to strengthen our use of System 2.

Why would we want to strengthen our reliance on System 2? It is not obvious that doing so would necessarily lead to better social outcomes if we are willing to rely on expertise. System 1’s intuitive thought draws both on heuristics (shortcuts that replace a more complex problem with a simpler one) and on expertise (recognition of information previously acquired). Becoming an expert means you can draw on your acquired information to reach better conclusions with less effort. In other words, the more we rely on experts, the more likely we are to avoid incorrect conclusions in a world governed by System 1 thinkers.

However, we are unlikely (and it is probably impossible) to seek expertise in the myriad of problems and decisions we encounter on a daily basis. And as Kahneman points out in a reference to the work of another psychologist, Daniel Gilbert, System 1 is prone to believe (Kahneman, 2011, Chapter 7). “Unbelieving” is the effortful task of System 2. If we want to form opinions and draw conclusions about much of the information we encounter on any given day, we will need to spend energy, potentially a lot of energy: we will need to be willing to lead an effortful, even if potentially fulfilling, life. Kahneman calls those less susceptible to System 1 biases and more prone to calling on System 2 “engaged.”

Should we strive to be more “engaged?” If so, how?

References

Kahneman, Daniel, 2011. Thinking, Fast and Slow. Farrar, Straus and Giroux.


Challenges in Exploratory Data Analysis

Image by bluebudgie. Downloaded from pixabay.com

You are given a dataset and asked: “What do the data tell us? Do not assume we know anything about the subject; just tell us what the data say.” This is the task often referred to as “exploratory data analysis,” and it is harder than it might seem. I see two main challenges.

The first is the request to “not assume we know anything about the subject.” This request is easy to violate without realizing it. For example, say you have a dataset with twenty variables. It is perfectly fine during exploratory analysis to want to look at not just individual variables in your dataset but also how variables fluctuate relative to each other, that is, correlation. Now, how easy is it to look at correlations within the dataset with no prior inclination to think some of the twenty variables are more likely to be correlated than others? We can fight the urge to pay more attention to those by always including all twenty variables in any and all considerations about correlation, but this requires discipline. One could even argue that we should, indeed, spend more time exploring correlations that we have a basis to believe reflect a causal connection, and that focusing equally on other correlations is a waste of time and possibly misleading. In any case, how to explore data given the mental models we all approach them with is a potential issue to be dealt with. I will likely return to this in a future post.
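One mechanical way to impose that discipline is to always rank all pairwise correlations rather than inspect a favored subset. A minimal sketch in Python follows; the dataset and variable names are made up:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical dataset: twenty variables, two hundred observations
df = pd.DataFrame(rng.normal(size=(200, 20)),
                  columns=[f"var_{i:02d}" for i in range(20)])

# Compute the full correlation matrix so every pair gets equal attention
corr = df.corr()

# Rank ALL pairs by absolute correlation, rather than looking only at
# the pairs we already suspected were related
cols = corr.columns
pairs = sorted(
    ((abs(corr.iloc[i, j]), cols[i], cols[j])
     for i in range(len(cols)) for j in range(i + 1, len(cols))),
    reverse=True,
)
for r, a, b in pairs[:5]:
    print(f"{a} ~ {b}: |r| = {r:.2f}")
```

Even with pure noise, a few of the 190 pairs will show nontrivial correlations, which is itself a reminder that ranking correlations does not establish meaning.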

The second challenge I see in exploratory data analysis is identifying, and keeping in mind at all times, the sources of uncertainty in our data. The sources of uncertainty are several: from what we don’t know about how the variables were chosen and the data were collected, cleaned, stored and checked, to whether we are, consciously or not, asking questions, not about the dataset itself, but about the underlying generating process, that is, about a population of which we can consider the dataset to be a sample.

This last point, I find, is often overlooked. In some cases, we know that we are looking at a sample and asking questions about a population. For example, survey data is often clearly extracted from a broader population in which we are interested. This is the classic use of inferential statistics that we all learn about in college, although, even in this case, we often see analyses focusing on point estimates rather than the more appropriate confidence intervals. But there are cases where we lose track of the sources of uncertainty in our data (or of sources of uncertainty in our analysis) and must maintain discipline to correctly assess what our analysis is actually telling us.

For example, say we have data for five characteristics (five variables) for every inhabitant of a community. We are only interested in that community, so we understand we have “population” data (not a sample). In looking at correlation between our five variables, we decide to look at linear correlation among them through a linear regression. Our statistical software spits out a summary of results from our linear regression that includes coefficients and p-values for those coefficients. But p-values assume a distribution for the observed coefficients. If there is a distribution, there is a source of uncertainty (a random variable). Where did that uncertainty come from? Aren’t we looking at population data and, therefore, what we see is all there is to know?

My answer is that the uncertainty stems from assuming there is a linear relationship between variables when what we observe does not perfectly fit that linear relationship. There is, therefore, an “error” term associated with each observation relative to the fitted linear relationship. The whole linear regression exercise is asking questions about an assumed underlying generating process in the data, not about the observed data itself. We started making assumptions about the data and asking questions about an underlying process, very possibly without noticing.
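To make the point concrete, here is a sketch in Python using scipy's linregress on made-up "population" data: the fitted coefficients simply describe the observed points, while the reported p-value silently assumes a stochastic generating process behind them.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical "population" data: one record per inhabitant of a community
x = rng.normal(50, 10, size=300)            # e.g. age
y = 2.0 * x + rng.normal(0, 15, size=300)   # e.g. some outcome of interest

res = stats.linregress(x, y)

# The slope and intercept are pure description: they summarize the
# observed points, with no assumptions attached
print(f"slope = {res.slope:.2f}, intercept = {res.intercept:.2f}")

# The p-value, by contrast, only makes sense under an assumed generating
# process, y = a + b*x + error, in which the error term is random
print(f"p-value (meaningful only under that model): {res.pvalue:.3g}")
```

Nothing in the software distinguishes the two outputs; it is up to us to notice where description ends and inference begins.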

So here are my tentative initial guidelines for doing exploratory data analysis:

  1. Start by understanding the data: publishing source; when and where the data were collected and who collected them; what universe they are supposed to represent and whether they were intended as a sample of a larger population; definitions (are the variables well defined?); and what errors may have been introduced into the data during transmission, cleaning, storing or other manipulation.
  2. Go on to univariate analyses and then cover correlations, being mindful of any potential assumptions we are making and, if we feel we absolutely need to make these assumptions, being explicit about them and keeping them in mind when drawing conclusions.
  3. Keep in mind at all times whether our questions are focusing on the data at hand or on an underlying generating process, i.e., whether we are “going beyond the data.” Again, be explicit if doing so.
  4. Be aware that exploratory analysis is supposed to focus on extracting inspiration from our data. It is not sufficient to draw conclusions. These require a separate step: testing hypotheses with a second set of data that can be assumed to be extracted through the same generating process (from the same population). We do not test hypotheses during exploratory data analysis, nor do we discuss causality and modeling, other than possibly as suggestions for the next step of hypothesis testing.
