Our Brains and Decision Making

Image by Gordon Johnson. Downloaded from pixabay.com

I recently read David Eagleman’s book “The Brain: The Story of You.” I also watched the associated PBS documentary. The book and documentary follow each other very closely, and the documentary allows you to see some of the people and experiments referred to in the book. A couple of points made by David Eagleman made me think again about the common use of the terms “data-driven decision making” and “data-informed decision making,” the extent to which our decisions are made based on evidence presented to us, and what evidence exactly we base our decisions on.

I will leave a more detailed review of the use of the terms “data-driven,” “data-informed” or “evidence-based” for another post. But these terms are typically used without a recognition of how much we impose our prior beliefs and assumptions on our analysis of data. From the moment we ask ourselves a question, we are choosing what interests us. When we decide what data we need to look at, we are making assumptions about what data matter, based on our experience, reasoning and assumed knowledge of the world. When we actually obtain data, our analysis is constrained by data availability and by what the data represent: how they were defined and collected. Recognizing this dependency on prior beliefs and “learned” experiences should make the use of the term “data-driven” highly problematic. But it should also make us question what exactly we mean by “data-informed” or “evidence-based” decision making. What evidence exactly are we talking about, and how exactly are we using it?

With this in mind, I found a couple of points made by Eagleman to be illuminating.

One of the points is that decisions often (perhaps most often) require connecting the analytical parts/networks of the brain to the emotional ones. Without the connections to our emotions, we are often unable to make decisions. The book provides a couple of examples, such as a woman who, due to a motorcycle accident, had these brain connections weakened and found herself unable to make daily decisions, such as what to wear, what to eat or what to do during the day. Another example was an experiment where decisions were reversed when emotional factors were brought into play even though the choices were, analytically speaking, unaltered. The insight is that choices often involve many factors offering trade-offs and that our logical brain often cannot assign values to those trade-offs to make a decision. The values are assigned based on bodily/emotional signatures built from past experience. Without those, decisions are often not possible. These signals are often embodied in the release or suppression of chemical messengers, such as dopamine or oxytocin, that affect the transmission of stimuli between neurons. As we acquire new experiences, the stimuli that these neurotransmitters produce in our brains are adjusted based on the confirmation or frustration of expectations (differences between expectations and reality). That is how we learn.

Another point made in Eagleman’s book is that our brains are primed for social interaction. We are wired to see social intention where it does not exist. This is exemplified by an experiment where a short film of geometrical objects moving around a screen tends to induce subjects to interpret the movements as if telling a story, where the objects would move intentionally as if they represented humans or animals. Further, in the same way as we tend to humanize objects, we also sometimes dehumanize other people, presumably when seeing them as humans creates a burden we consider too much to bear (e.g. experiments show this often happens when we are faced with the homeless).

The first point means that we typically will not make decisions based solely on the data or evidence presented to us. No matter how much we may want to make data- and evidence-based decisions, when weighing options we will likely bring to bear, consciously or unconsciously, our lifetime of experiences, transmitted to our brain through chemical stimuli.

The second point means that, in interpreting events, occurrences and phenomena of all kinds, from social phenomena to purely physical ones, we tend to attribute intentionality to those events; we tend to attribute human characteristics to phenomena that may not have them or may not be reducible to them. Eagleman sees in this tendency evidence of the importance of human interactions for our brains and for who we are. But it can also be seen as a potential factor in our tendency to see organizations, firms and governments behaving as if they were individual decision-making units rather than composed of people themselves. Perhaps this attribution of human intentionality to anything but a person could help explain conspiracy theories, where large networks are assumed to work in unison towards a common goal; and perhaps it could help explain situations where we see cause and effect where there is none, simply because we attributed agency to entities that do not have it.

Both points made by Eagleman, the role of emotions in decision-making and our tendency to attribute human intentionality to entities other than a person, should make us question the extent to which our minds are pre-conditioned to make decisions largely based on factors beyond the data and evidence put in front of us on any given decision-making occasion. They should make us think of ways to build into decision-making processes awareness of how our brain works and the possible implications for the decisions we end up making.

References

Eagleman, David. 2017. The Brain: The Story of You. Vintage Books.


Illusions and Delusions

I recently read a bit about the Buddhist concept of “pratitya samutpada,” translated literally or liberally as “in dependence, things rise up,” “interdependent co-arising,” or simply “dependent rising” (Hanh 1998; Namgyel 2018). There seem to be two main aspects of the concept. The first is that what we perceive as separate entities are only so at a superficial level. In truth, they are part of a whole and, as part of that whole, they are connected and mutually affect each other, rather than one entity being the cause of the other or being independent of the other. The second aspect of the concept of pratitya samutpada is that those entities that we perceive as separate are constantly changing, morphing into other and new aspects of the whole. The consequence of this concept is that, if we focus on the separate entities that we perceive, we can fall into a kind of delusion, where we do not see the dynamic interdependence that governs the entities we perceive.

This concept seems to have similarities with other concepts in Asian philosophy such as that of yin and yang, where opposites are part of a whole, but also with common ideas in western science and philosophy: from Lavoisier’s formulation that in nature “nothing is lost, nothing is created, everything is transformed,” to Hegel’s dialectics and the concept of “aufheben,” often translated as “self-sublation,” a process that simultaneously negates and preserves forms or concepts that previously seemed well defined and stable (Maybee 2020; Wikipedia Contributors 2021)¹. How often and to what extent does this idea of a dynamic, interdependent world lead to conclusions about our capacity to see through the temporary, perhaps time and space specific formations, and grasp the whole of what is actually going on? How often are we “deluded” into thinking that the temporary and time-specific reality that we perceive is more permanent than it is or that it is all there is? How often does it matter?

The dictionary distinguishes between the terms illusion and delusion in subtle ways. Merriam-Webster’s definitions:

Illusion:

    1. something that looks or seems different from what it is : something that is false or not real but that seems to be true or real
    2.  an incorrect idea : an idea that is based on something that is not true

[Merriam-Webster. Undated (a)]

Delusion:

    1. a belief that is not true : a false idea
    2. a false idea or belief that is caused by mental illness

[Merriam-Webster. Undated (b)]

The definitions above seem to suggest illusion happens in the realm of perception and ideas; delusion is closer to beliefs and mental illness. One of my Buddhist references for this post distinguishes between illusion and delusion by stating that “illusion refers to seeing through appearances by recognizing their interdependent nature. Delusion, on the other hand, refers to misapprehending things to have an independent reality from their own side” (Namgyel 2018, p. 25). In other words, illusions do not necessarily fool you into beliefs; delusions do.

Joni Mitchell’s beautifully mesmerizing song “Both Sides Now” uses the term illusion similarly, in the sense that the songwriter is aware that her recollections are illusions, whether of clouds, love or life, and concludes that she knows nothing about them at all:

I've looked at life from both sides now
From win and lose and still somehow
It's life's illusions I recall
I really don't know life at all

The distinction between illusion and delusion brings to mind (for me, at least) the challenge of translating social science modeling into public policy without losing sight of model limitations. 

Social science models are often able to represent mathematically the two aspects of “pratitya samutpada:” interdependence and dynamics. But, as with all models, simplifications are needed for tractability, and the conclusions of a model will depend on the simplifications made: assumptions about which variables are more or less important, functional relationships, bounds on magnitudes, temporal lags. These assumptions can be informed or rejected by empirical work, to some extent. What exactly that extent is, and how much certainty academics attribute to their models, is, based on my humble experience, influenced early on by human flaws. Whether it is an overemphasis on thinking quickly within the confines of established methodological approaches, which leads to a poor understanding of the limitations of those approaches themselves, or the difficulty of living with uncertainty, or perhaps just plain vanity, it is my impression that academics themselves often lose sight of the limitations of their models and fall into the temptation of making grand but unsupported statements about the world they live in.

When the next step is taken (whether by academics themselves, policy makers, or mere practitioners like me) to translate conclusions of limited validity into policy that needs to be developed for a specific time and place, the assumptions, limitations and caveats of academic discourse seem to be further forgotten. Before we know it, the illusion of general principles, guidelines, best practices and rules of thumb, which we would hope to be well understood as the illusions they are, morphs into the delusion of ideological constructs: over-simplified, over-generalized, distorted by the influence of a kaleidoscope of interest groups, and imbued with a certainty they do not merit.

In a world of unmerited certainty, Joni Mitchell’s illusions, the awareness of them, seems something to strive for, to appreciate in its melancholic beauty, and to sing in a song.

Footnote:

  1. Antoine Lavoisier, French chemist, and Georg Wilhelm Friedrich Hegel, German philosopher, were contemporaries during the late 18th century.

References

Hanh, Thich Nhat. 1998. The Heart of the Buddha’s Teaching: Transforming Suffering into Peace, Joy, and Liberation. Harmony Books.

Maybee, Julie E., Hegel’s Dialectics. In: The Stanford Encyclopedia of Philosophy (Winter 2020 Edition), Edward N. Zalta (ed.). Available: https://plato.stanford.edu/entries/hegel-dialectics/. Accessed: February 13, 2022

Merriam-Webster. Undated (a). Illusion. In Merriam-Webster.com dictionary. Available: https://www.merriam-webster.com/dictionary/illusion. Accessed: February 13, 2022

Merriam-Webster. Undated (b). Delusion. In Merriam-Webster.com dictionary. Available: https://www.merriam-webster.com/dictionary/delusion. Accessed: February 13, 2022

Namgyel, Elizabeth Mattis. 2018. The Logic of Faith: A Buddhist Approach to Finding Certainty Beyond Belief and Doubt. Shambhala Publications. 

Wikipedia contributors. 2021. Aufheben. In Wikipedia, The Free Encyclopedia. Available: https://en.wikipedia.org/w/index.php?title=Aufheben&oldid=1050479001. Accessed: February 13, 2022


On Data and Evidence in the Social Sciences

Photo by Timothy Grindall. Downloaded from pexels.com

On a recent trip to the local public library I happened to find a copy of a book of collected works by Bertrand Russell that I used to own but that, like many other books, had been a victim of my international moves. I always admired Bertrand Russell’s clear, simple and straightforward way of discussing not-so-simple topics without distorting them (at least not in ways that were obvious to me). The library was selling the book as part of a used book sale and I bought it, together with a copy of Russell’s “The Scientific Outlook.”

In reading this latter book, I found myself sucked into a web of interrelated methodological discussions; some old ones (e.g. how scientific the social sciences are or can be) and some newer ones (e.g. whether the huge amount and speed of data availability, and easy access to it, brought by information technology, has challenged traditional scientific methodology and put correlation – with no theory – front and center on the research agenda). I remember delving into economic methodological discussions some thirty years ago as an economics student but have distanced myself from economic theory since.  After getting lost in the rabbit hole, I decided I did not have enough time to dig deep enough into these discussions, but thought I would register what I found, perhaps for continuing/revisiting at a later date.

So here goes.

Both Mlodinow (2009) and Russell (1962) place the origins of the scientific revolution in the late sixteenth and early seventeenth centuries, resting largely on the shoulders of Galileo Galilei (1564-1642), his contemporaries and those coming soon after him (e.g. Isaac Newton, 1643-1727). They also both characterize the scientific revolution as being centered on induction and experimentation, as opposed to deduction, as a source of knowledge. Both deduction (theory) and induction (evidence) have a role in science, and Russell describes the scientific method as including three stages:

  • Observing significant facts

  • Arriving at a hypothesis, which, if true, would account for these facts

  • Deducing from this hypothesis consequences which can be tested by observation (some, quoting Karl Popper, would say “refuted” by evidence)

This characterization of the scientific method (and its variations) seems to have been criticized over time as not adequately portraying how science evolves. The idea that science progresses by refuting hypotheses empirically, for example, seems to have been criticized repeatedly. A recent opinion article in Scientific American (Singham 2020) claims that it must be abandoned for good for at least two reasons: first, because empirical experiments are themselves framed by many theories, making their results more reflective of comparisons between theories than between theory and evidence; second, because this is not really how science has advanced historically. Rather, the author claims, “It is the single-minded focus on finding what works that gives science its strength, not any philosophy.“ Similar arguments have been made by various philosophers of science, including Thomas Kuhn (Wikipedia contributors 2021).

The use of empirical evidence may vary from one branch of science or research program to another. I particularly looked for discussions among economists, because that is an area I have more of a background in and because of its relevance to international development. In a well-known paper, Larry Summers (1991) argues that elaborate statistical tests aimed at estimating model parameters have done little to advance economic thinking, that most papers remembered as having advanced economic theory have little empirical content at all, and that successful empirical research in economics has relied mostly on attempts to gauge strength of association and on persuasiveness. He criticizes models that have been overspecified to enable testing, arguing that the results tend to be of little worth, and, comparing economics to the natural sciences, he states that “The image of an economic theorist nervously awaiting the result of a decisive econometric test does not ring true.”

In general, the Popperian criterion of falsifiability through testing seems to be nominally accepted and yet in practice not met in economics, with theory moving forward anyway, based on the use of empirical evidence to support argumentation. Hausman (2018) summarizes the challenges of applying Popperian criteria to economics (presumably applicable to the social sciences more generally) and how several authors have abandoned the criteria completely, arguing that economics (and, again, presumably the social sciences more generally) advances by using a more comprehensive blend of theory and empirical evidence. Durlauf (2012) states that “while some empirical economics involves the full delineation of an economic environment, so that empirical analysis is conducted through the prism of a fully specified general equilibrium model, other forms of empirical work use economic theory in order to guide, as opposed to determine, statistical model specification. Further, a distinct body of empirical economics is explicitly atheoretical, employing so-called natural experiments to evaluate economic propositions and to measure objects such as incentives.”

A more recent discussion on the use of evidence to advance our knowledge of society gained traction with the rapid growth of “big data.” Some claimed that data in large volume would make the scientific method obsolete and that correlation alone would suffice to advance our knowledge, even if these claims tended to come from outside the academic community. For example, an article in WIRED magazine, written by its editor in chief, claimed that “Petabytes allow us to say: ‘Correlation is enough.’ We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot” (Anderson 2008). These kinds of claims have been countered by others (e.g. Mazzocchi 2015), and it is hard for me to imagine how computerized analysis of data would not be imbued with human theorizing, no matter how much one tries to step aside and let the “data speak.” In addition, for my purposes, much of the data in international development is not “big data.” Even if it were, it is not clear to me how we would separate the many variables that in international development tend to move together (in the same or opposite directions) with correlation alone as a criterion (and no theory).
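That worry about co-moving variables can be illustrated with a small simulation: two hypothetical series that share a common underlying trend come out strongly correlated even though neither causes the other. (The data and the `pearson` helper below are mine, for illustration only.)

```python
import random

random.seed(42)

n = 200
trend = [0.05 * t for t in range(n)]  # common driver, e.g. overall growth

# Two outcomes that each track the common trend plus independent noise.
x = [trend[t] + random.gauss(0, 1) for t in range(n)]
y = [trend[t] + random.gauss(0, 1) for t in range(n)]

def pearson(a, b):
    """Plain Pearson correlation coefficient."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

# Strong correlation, yet neither variable causes the other.
print(f"correlation of x and y: {pearson(x, y):.2f}")

# Removing the common trend makes the "relationship" largely disappear.
x_detrended = [xi - ti for xi, ti in zip(x, trend)]
y_detrended = [yi - ti for yi, ti in zip(y, trend)]
print(f"after detrending: {pearson(x_detrended, y_detrended):.2f}")
```

Correlation alone cannot tell these two situations apart; some theory about what drives what is needed to decide whether detrending (or any other adjustment) is the right move.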

There is a large literature to review on this topic and I have not even looked at the randomized controlled trial based research that earned Esther Duflo, Abhijit Banerjee and Michael Kremer the 2019 Nobel Prize in economics, or at what that line of research means for the discussion above. But I am thinking (for now) that there may not be a clear rule for the use of evidence and theory in discussing international development knowledge, and I am satisfied (for now) with looking for the reasonable use of theory, evidence, skepticism and caution in thinking about development policy and practice. I am sure I will come back to this discussion at a later date.

References:

Anderson, Chris. 2008. The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. WIRED. June. Available: https://www.wired.com/2008/06/pb-theory/. Accessed: October 30, 2021.

Boland, Lawrence. 2006. Seven decades of economic methodology: a Popperian perspective. In: Karl Popper: a Centenary Assessment: Science, I. Jarvie, K. Milford and D. Miller (Eds), 2006, 219–27. Available: http://www.sfu.ca/~boland/wien02.pdf. Accessed: October 30, 2021

Durlauf, Steven. 2012. Complexity, Economics, and Public Policy. Politics, Philosophy & Economics 11(1) 45–75. Sage. Available: http://home.uchicago.edu/sdurlauf/includes/pdf/Durlauf%20-%20Complexity%20Economics%20and%20Public%20Policy.pdf. Accessed: October 30, 2021.

Hausman, Daniel. 2018. Philosophy of Economics. Stanford Encyclopedia of Philosophy. Available: https://plato.stanford.edu/entries/economics/#RhetEcon. Accessed: October 30, 2021

Mazzocchi, Fulvio. 2015. Could Big Data be the end of theory in science? A few remarks on the epistemology of data-driven science. EMBO reports. EMBO Press. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4766450/pdf/EMBR-16-1250.pdf. Accessed: October 30, 2021. 

Mlodinow, Leonard. 2009. The Drunkard’s Walk. How Randomness Rules Our Lives. New York: Vintage Books. A Division of Random House.

Russell, Bertrand. 1962 (first copyrighted in 1931). The Scientific Outlook. The Norton Library. W.W. Norton & Company.

Singham, Manu. 2020. The Idea That a Scientific Theory Can Be ‘Falsified’ Is a Myth. It’s time we abandoned the notion. In Scientific American, September 2020. Available: https://www.scientificamerican.com/article/the-idea-that-a-scientific-theory-can-be-falsified-is-a-myth/. Accessed: October 30, 2021

Summers, Larry. 1991. The Scientific Illusion in Empirical Macroeconomics. The Scandinavian Journal of Economics. Vol. 93, No. 2, Proceedings of a Conference on New Approaches to Empirical Macroeconomics (Jun., 1991), pp. 129-148 (20 pages). Wiley. Available: http://faculty.econ.ucdavis.edu/faculty/kdsalyer/LECTURES/Ecn200e/summers_illusion.pdf. Accessed: October 30, 2021.

Wikipedia contributors. 2021. Scientific Method. Available: https://en.wikipedia.org/wiki/Scientific_method. Accessed: October 30, 2021


Descriptive and Inferential Statistics

Image by Michael Siebert. Downloaded from pixabay.com

Since I posted “Challenges in Exploratory Data Analysis” (February 1, 2021), I have found myself struggling with the distinction between Exploratory Data Analysis and Confirmatory Data Analysis on one hand, and the distinction between Descriptive Statistics and Inferential Statistics on the other. The former distinction is relevant to what you can say with any one data set versus what you can say with more than one, while the latter comes into play when deciding whether our interest lies in the sample at hand or in the process generating the sample (the population).

Clarifying these distinctions is more than an academic exercise: doing so, and understanding how the terms are used, help us understand what we can say with the data and what we cannot, what assumptions we are making when inferring from the data and at what point in our analysis we are making those assumptions. It helps develop our own guidelines for disciplining our thought process when thinking with data.

According to Wikipedia (Wikipedia contributors 2021a), Exploratory Data Analysis was promoted by US mathematician John Tukey in the 1960s and 70s as a way of unearthing hypotheses to be tested with data, before jumping into testing hypotheses based on assumptions already made. It was to stand in contrast with Confirmatory Data Analysis (hypothesis testing). It was a way of exploring what information is contained in the data, independent of any existing hypotheses about the relevant subject matter. It included a myriad of techniques, such as looking at variable maximums, minimums, means, medians and quartiles, but was characterized more by the attitude than by the techniques. A number of techniques applied in exploratory data analysis can be used whether our focus is on the sample at hand (descriptive statistics) or on the underlying generating process (inferential statistics). In thinking about these concepts, I produced the diagram below, which is useful to me and may be useful to others as well (I relied mostly on my accumulated knowledge at this point, but suggest readers start with the Wikipedia entries for Descriptive Statistics [2021b] and Statistical Inference [2021c] for further reading).

Source: author's take

Although exploratory data analysis techniques can be applied whether our focus is on the sample at hand or the underlying generating process, how things are done in each case may be different. In the table below I tried to establish some distinctions on how we would proceed with exploratory data analysis in descriptive and inferential statistics.

Source: author's take

 In either case, during exploratory data analysis, we do not talk about significance of correlation, causality or hypothesis testing. These require modeling and a second sample drawn from the same population (or treatment and control groups).
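The sample/population distinction can be made concrete with a minimal sketch (standard library only, with a made-up sample; the 1.96 normal approximation for the interval is a simplifying assumption, not something the data themselves justify):

```python
import math
import statistics

# A hypothetical sample of ten measurements (made-up data).
sample = [4.1, 5.3, 3.8, 6.0, 4.9, 5.5, 4.4, 5.1, 4.7, 5.6]

# Descriptive statistics: statements about the sample at hand.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Inferential statistics: a statement about the process that generated
# the sample. This rough 95% confidence interval for the population mean
# assumes independent draws from an approximately normal population --
# an assumption we bring to the data, not something the data state.
n = len(sample)
margin = 1.96 * sample_sd / math.sqrt(n)  # normal approximation
ci = (sample_mean - margin, sample_mean + margin)

print(f"sample mean (descriptive): {sample_mean:.2f}")
print(f"95% CI for population mean (inferential): ({ci[0]:.2f}, {ci[1]:.2f})")
```

The descriptive half makes no assumptions beyond the numbers themselves; the inferential half is where the modeling assumptions, and the uncertainty, enter.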

A final note on the terms used by Cassie Kozyrkov in her popular blogs and vlogs (Kozyrkov 2018; 2019a; 2019b; 2020). She refers to data analytics as being used when there is no uncertainty (what I refer to as descriptive statistics) and refers generally to statistics when there is uncertainty (what I refer to as inferential statistics). She also refers to data analytics as being for inspiration (what I refer to here as exploratory data analysis), as opposed to hypothesis testing, which would require another sample. From what I can tell from the literature, these are less traditional uses of the terms, and the traditional uses (which I believe I capture here) seem to better highlight the difference between analyzing sample and population data.

References

Kozyrkov, Cassie. 2018. “Don’t Waste Your Time on Statistics.” Towards Data Science. May 29. Available: https://towardsdatascience.com/whats-the-point-of-statistics-8163635da56c. Accessed: May 23, 2021.

———-. 2019a. “Statistics for People in a Hurry.” Towards Data Science, May 29. Available: https://towardsdatascience.com/statistics-for-people-in-a-hurry-a9613c0ed0b. Accessed. May 23, 2021.

———-. 2019b.  “The Most Powerful Idea in Data Science.” Towards Data Science. August 09. Available: https://towardsdatascience.com/the-most-powerful-idea-in-data-science-78b9cd451e72. Accessed: May 23, 2021

———- 2020. “How to Spot a Data Charlatan.” Towards Data Science. October 09. Available: https://towardsdatascience.com/how-to-spot-a-data-charlatan-85785c991433. Accessed: May 23, 2020. 

Wikipedia contributors, 2021a.”Exploratory data analysis.”  In Wikipedia, The Free Encyclopedia. Available: https://en.wikipedia.org/w/index.php?title=Exploratory_data_analysis&oldid=1021890236. Accessed May 15, 2021.

———-. 2021b. “Descriptive Statistics.” In Wikipedia; The Free Encyclopedia. Available: https://en.wikipedia.org/wiki/Descriptive_statistics. Accessed May 23, 2021.

———-. 2021c. “Statistical Inference.” In Wikipedia; The Free Encyclopedia. Available: https://en.wikipedia.org/wiki/Statistical_inference. Accessed May 23, 2021.


Statistical Thinking

Image by Matthias Groeneveld. Downloaded from pexels.com

In my social media feed a couple of weeks ago, someone posted an image of a television news piece (from Detroit’s CW50) and a short paragraph under the headline “Former Detroit TV Anchor Dies One Day After Taking COVID Vaccine.” There was no actual link to a site, and the headline, image and paragraph seemed not to have been put together by the original source of the news. But the suggestion was clear: the COVID vaccine could be the cause of death. The paragraph referred to Karen Hudson-Samuels, a Detroit news producer and anchor who, indeed, seems to have died a day after taking the vaccine. Some articles on the internet quoted her husband as saying the immediate cause may have been a stroke with no clear relation to the vaccine (e.g. Rahal 2021).

An off-the-cuff calculation shows that, given the number of people in the US who die every day and the number of COVID vaccinations taking place every day, particularly among the population over 65 (Karen Hudson-Samuels was 68), chances are that some people will die the day after taking a COVID vaccine for completely unrelated reasons. For example, as of February 27 there were approximately 12 million 65-74 year olds with at least one dose of the vaccine (CDC 2021). If there are at least 31 million 65-74 year olds in the country (there were more in 2019 according to the Census Bureau, USCB 2020) and there were 75 days of vaccination between December 14 (the first day of vaccination in the US) and February 27, the chance of a 65-74 year old receiving a first dose of the vaccine on any given day during that period was approximately 0.5% (12 million, divided by 75 days, out of 31 million). The likelihood actually gets greater towards the end of the period, when Karen Hudson-Samuels received hers, because the pool of 65-74 year olds who have not yet received a first dose keeps getting smaller, assuming vaccination continues at the same pace. According to the CDC, the mortality rate of 65-74 year olds in the US in 2019 was approximately 1.7% (Kochanek et al. 2020). Divided by 365 days, that means approximately 0.0047% of 65-74 year olds died on any given day of causes unrelated to COVID (COVID was not present at the time). If the same holds true in 2021, the likelihood that a given 65-74 year old received a first dose of the vaccine on a given day and died the day after of an unrelated cause is 0.5% (chance of receiving a vaccine on any given day) x 0.0047% (chance of dying of an unrelated cause on any given day). That equals 0.000024%, or approximately 1 in 4 million. That seems like a very low chance. But with over 30 million 65-74 year olds in the US, it translates into roughly 7 such deaths on any given day of the period, or several hundred between December 14 and February 27. And on any given day there were likely roughly 7 more who died the same day as the vaccine, 7 more two days after, and so on.
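The back-of-the-envelope arithmetic above can be checked with a few lines of Python, using the rounded figures cited in the text (the variable names are mine):

```python
# Back-of-the-envelope estimate of purely coincidental
# "died the day after the vaccine" cases among US 65-74 year olds.
population = 31_000_000    # 65-74 year olds in the US (USCB 2020)
vaccinated = 12_000_000    # at least one dose by Feb 27 (CDC 2021)
days = 75                  # Dec 14 to Feb 27 vaccination period
annual_mortality = 0.017   # 2019 mortality rate, ages 65-74 (Kochanek et al. 2020)

# Chance a given person receives a first dose on a given day (~0.5%)
p_vaccine_day = vaccinated / population / days

# Chance a given person dies on a given day of a non-COVID cause (~0.0047%)
p_death_day = annual_mortality / 365

# Joint probability, assuming the two events are independent (~1 in 4 million)
p_joint = p_vaccine_day * p_death_day

# Expected coincidental day-after deaths, per day and over the period
expected_per_day = p_joint * population    # roughly 7
expected_period = expected_per_day * days  # several hundred

print(f"joint probability: {p_joint:.2e}")
print(f"expected cases per day: {expected_per_day:.1f}")
print(f"expected cases over the period: {expected_period:.0f}")
```

Even with deliberately rough, rounded inputs, the conclusion is robust: coincidences of this kind are expected to occur many times over, and only one needs to make the news.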

The numbers above may be a bit off and the calculations assume receiving a COVID vaccine and dying of a non-related cause are independent events. This may not be the case if, for example, someone with a life threatening underlying health condition would be more likely to get a vaccine. In this case the chances of observing someone dying the day after receiving a vaccine would be even larger. On the other hand, if the fact that someone is visibly about to die makes them less likely to receive a vaccine, the chances would be smaller. But the general point should remain valid, even if the actual numbers are a bit different under varying circumstances: chances are there are several cases like that of Karen Hudson-Samuels that can be explained by pure chance, with no relationship to the COVID vaccine at all. Yet, all you need is 1 case to make the news and to persuade some people that the vaccine is likely the cause.

The post seems a clear example of the point that Leonard Mlodinow is making in the book I am currently reading, The Drunkard’s Walk: How Randomness Rules Our Lives (Mlodinow 2009). In this book he argues that random processes are all around us and play an important role in daily events. Yet, we seem to be ill equipped to recognize them and we often don’t. I am about half way through the book and so far have found most interesting his account of the development of statistical concepts and understanding over time. It is also a very pleasant read. I’ll be referring to this book again in future posts.

References

CDC (Centers for Disease Control and Prevention). 2021.  Demographic Characteristics of People Receiving COVID-19 Vaccinations in the United States. Available: https://covid.cdc.gov/covid-data-tracker/#vaccination-demographic. Accessed: 02/28/2021

Kochanek, Kenneth D., M.A., Jiaquan Xu, M.D., and Elizabeth Arias, Ph.D. 2020. Mortality in the United States, 2019. National Center for Health Statistics (NCHS) Data Brief 395, December. CDC (Centers for Disease Control and Prevention). Available: https://www.cdc.gov/nchs/data/databriefs/db395-H.pdf. Accessed: 02/28/2021

Mlodinow, Leonard. 2009. The Drunkard’s Walk. How Randomness Rules Our Lives. New York: Vintage Books. A Division of Random House.

Rahal, Nour. 2021. Karen Hudson-Samuels remembered as Black TV news pioneer and Detroit history promoter. Detroit Free Press, 02/20/2021. Available: https://www.freep.com/story/news/obituary/2021/02/20/karen-hudson-samuels-black-tv-news-pioneer/6784796002/. Accessed: 02/28/2021.

USCB (United States Census Bureau). 2020. Annual Estimates of the Resident Population by Single Year of Age and Sex for the United States: April 1, 2010 to July 1, 2019 (NC-EST2019-AGESEX-RES). Available: https://www.census.gov/data/tables/time-series/demo/popest/2010s-national-detail.html. Accessed: 02/28/2021


A Thought After Reading Daniel Kahneman’s “Thinking, Fast and Slow”

Image by Maryam62. Downloaded from pixabay.com

I can tell that Daniel Kahneman’s Thinking, Fast and Slow is one of those books that I will find myself coming back to over and over again. Why? On one hand, it provides evidence from experimental psychology for phenomena that I was already inclined to believe, that is, it appeals to my own confirmation bias. At the same time, it offers a number of insights and explanations that are new to me and eye-opening. Kahneman, a psychologist, won a Nobel Prize in Economics for his work on decision-making under uncertainty. His book reflects a lifetime of learning about human judgement and is an absolute delight to read.

Its central framework revolves around the idea that our minds address problems in two distinct ways, which he describes using the metaphors of System 1 and System 2. The table below summarizes some of the characteristics of each system.

System 1 | System 2
Fast | Slow
Intuitive | Deliberate
Automatic, cannot be turned off at will | Effortful and/but/therefore lazy
Impulsive | Requires self-control
Associative and causal, but lacks the capacity for statistical thinking | Requires training for statistical thinking
Prone to bias and to believing | In charge of doubting and unbelieving

Much of Kahneman’s book is then focused on System 1 and its characteristics and biases. We learn of its need for coherence and the central role of cognitive ease in driving belief. System 2 is much less explored in the book and I was left with the desire to learn more about what can be done to strengthen our use of System 2.

Why would we want to strengthen our reliance on System 2? It is not obvious that doing so would necessarily lead to better social outcomes if we are willing to rely on expertise. System 1’s intuitive thought draws both on heuristics (shortcuts that replace a more complex problem with a simpler one) and on expertise (recognition of information previously acquired). Becoming an expert means you can draw on your acquired information to reach better conclusions with less effort. In other words, the more we rely on experts, the more likely we are to avoid incorrect conclusions in a world governed by System 1 thinkers.

However, we are unlikely (and it is probably impossible) to seek expertise in the myriad problems and decisions we encounter on a daily basis. And as Kahneman points out in a reference to the work of another psychologist, Daniel Gilbert, System 1 is prone to believing (Kahneman, 2011, Chapter 7). “Unbelieving” is the effortful task of System 2. If we want to form opinions and draw conclusions about much of the information we encounter on any given day, we will need to spend energy, potentially a lot of energy: we will need to be willing to lead an effortful, even if potentially fulfilling, life. Those less susceptible to the System 1 biases and more prone to calling on System 2 Kahneman calls “engaged.”

Should we strive to be more “engaged?” If so, how?

References

Kahneman, Daniel, 2011. Thinking, Fast and Slow. Farrar, Straus and Giroux.


Challenges in Exploratory Data Analysis

Image by bluebudgie. Downloaded from pixabay.com

You are given a dataset and asked: “What do the data tell us? Do not assume we know anything about the subject; just tell us what the data say.” This is the task often referred to as “exploratory data analysis,” and it is harder than it might seem. I see two main challenges.

The first is the request to “not assume we know anything about the subject.” This request is easy to violate without realizing it. For example, say you have a dataset with twenty variables. It is perfectly fine during exploratory analysis to want to look at not just individual variables in your dataset, but also how variables fluctuate relative to each other, that is, correlation. Now, how easy is it to look at correlations within the dataset with no prior inclination to think some of the twenty variables will be more likely correlated than others? We can fight the urge to pay more attention to those by always including all twenty variables in any and all considerations about correlation, but this requires discipline. One could even argue that we should, indeed, spend more time exploring correlations that we have a basis to believe have a causal connection, and that focusing equally on other correlations is a waste of time and possibly misleading. In any case, how to explore data given the mental models we inevitably bring to them is an issue to be dealt with. I will likely return to this in a future post.
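To make that discipline concrete: one way to avoid privileging the pairs we expect to be correlated is to always compute and rank the full correlation matrix. A minimal sketch, using random stand-in data since no real dataset is at hand:

```python
import numpy as np

# Twenty random variables standing in for a real dataset: with pure noise,
# any "interesting" correlations found below are there by chance alone.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 20))        # 500 observations, 20 variables

# Full 20x20 correlation matrix: every pair, no prior about which matters
corr = np.corrcoef(data, rowvar=False)

# Rank all 190 distinct pairs by absolute correlation
pairs = [(i, j, corr[i, j]) for i in range(20) for j in range(i + 1, 20)]
pairs.sort(key=lambda p: abs(p[2]), reverse=True)

for i, j, r in pairs[:5]:
    print(f"var{i:02d} vs var{j:02d}: r = {r:+.3f}")
```

Even here the top-ranked pairs will show nonzero correlations, a reminder that ranking alone does not separate signal from chance.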

The second challenge I see in exploratory data analysis is identifying, and keeping in mind at all times, the sources of uncertainty in our data. The sources of uncertainty are several: from what we don’t know about how the variables were chosen and the data were collected, cleaned, stored and checked, to whether we are, consciously or not, asking questions, not about the dataset itself, but about the underlying generating process, that is, about a population of which we can consider the dataset to be a sample.

This last point, I find, is often overlooked. In some cases we know that we are looking at a sample and asking questions about a population. For example, survey data are often clearly extracted from a broader population in which we are interested. This is the classic use of inferential statistics we all learn about in college, although even in this case we often see analyses focusing on point estimates rather than the more appropriate confidence intervals. But there are cases where we lose track of the sources of uncertainty in our data (or in our analysis) and must maintain discipline to correctly assess what our analysis is actually telling us.

For example, say we have data for five characteristics (five variables) for every inhabitant of a community. We are only interested in that community, so we understand we have “population” data (not a sample). To look at linear correlation among our five variables, we decide to run a linear regression. Our statistical software spits out a summary of results that includes coefficients and p-values for those coefficients. But p-values assume a distribution for the estimated coefficients. If there is a distribution, there is a source of uncertainty (a random variable). Where did that uncertainty come from? Aren’t we looking at population data and, therefore, isn’t what we see all there is to know?

My answer is that the uncertainty stems from assuming a linear relationship among the variables when what we observe does not perfectly fit that relationship. There is, therefore, an “error” term associated with each observation relative to the fitted linear relationship. The whole linear regression exercise is asking questions about an assumed underlying generating process, not about the observed data themselves. We started making assumptions about the data and asking questions about an underlying process, very possibly without noticing.
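The point can be seen by computing a regression “by hand.” In the sketch below (synthetic data and plain numpy rather than any particular statistical package), the fitted coefficients are a pure description of the observed data, but the standard errors and t statistics only exist because we assumed y = a + b*x + error, that is, an underlying generating process:

```python
import numpy as np

# Synthetic "community" data: every inhabitant observed, no sampling involved
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, n)

# Fitting the line is purely descriptive: beta summarizes the observed data
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Everything below presupposes a stochastic error term: we are no longer
# describing the data, but an assumed generating process behind them
resid = y - X @ beta
sigma2 = resid @ resid / (n - 2)        # estimated error variance
cov = sigma2 * np.linalg.inv(X.T @ X)   # assumes i.i.d. errors
se = np.sqrt(np.diag(cov))              # standard errors of the coefficients
t_stats = beta / se                     # basis for the p-values in a software summary

print("coefficients:", beta.round(2))   # close to the true [2.0, 0.5]
print("t statistics:", t_stats.round(1))
```

A statistical package would go on to convert the t statistics into p-values; the point is that every step after the fit rests on the assumed error model, not on the “population” data alone.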

So here are my tentative initial guidelines for doing exploratory data analysis:

  1. Start by understanding the data: the publishing source; when and where the data were collected and by whom; what universe they are supposed to represent and whether they were intended as a sample of a larger population; definitions – are the variables well defined; and what errors may have been introduced into the data during transmission, cleaning, storing or other manipulation.
  2. Move on to univariate analyses and then cover correlations, being mindful of any assumptions we are making; if we feel we absolutely need to make these assumptions, be explicit about them and keep them in mind when drawing conclusions.
  3. Keep in mind at all times whether our questions are focusing on the data at hand or on an underlying generating process, i.e., whether we are “going beyond the data.” Again, be explicit if doing so.
  4. Be aware that exploratory analysis is meant to extract inspiration from the data. It is not sufficient for drawing conclusions. Those require a separate step: testing hypotheses with a second set of data that can be assumed extracted through the same generating process (from the same population). We do not test hypotheses during exploratory data analysis, nor discuss causality and modelling, other than possibly as suggestions for the next step of hypothesis testing.

Repetition and Belief

Image by Pham Thoai. Downloaded from pexels.com

“You can fool all the people some of the time and some of the people all the time, but you cannot fool all the people all the time.” Respond quickly: who is the author of this statement? The most common attribution is to Abraham Lincoln although, as often happens with quotes, there is little evidence to support it. There seems to be some evidence that the statement was actually made by the seventeenth-century French Protestant Jacques Abbadie (Quote Investigator, 2013; Parker, 2016). However, we have heard the attribution to Abraham Lincoln so often that we assume it to be true.

Over the end-of-year holidays I read (most of) Daniel Kahneman’s book, Thinking, Fast and Slow. Kahneman, a psychologist, won the Nobel Prize for Economics in 2002 (shared with Vernon Smith) for the contributions to economics from his research on human judgement and decision-making under uncertainty. One aspect of human cognition he describes in his book is that we are more likely to believe what we find easy to process. Various factors can contribute to this “cognitive ease,” including a clear display of the information we are exposed to, having been “primed” by association with a prior piece of information, being in a good mood, and repeated exposure to the information, whether that information is true or not (Kahneman, 2011, Chapter 5).

Repetition is also often discussed as an important component in learning (see, for example, the literature on spaced repetition), change management and other aspects of our daily life that depend on our understanding and perception of reality. I once read (and never forgot) a passage in Machiavelli’s “The Prince” where he states that injuries should be inflicted all at once, while benefits should be provided piecemeal, over time, if a ruler is to ensure permanence in power. I always understood this as reflecting how repetition affects perception (Machiavelli, 1998, Chapter VIII).

Recent political discussions in the U.S. have invoked the idea of the “Big Lie”: that implausible lies are often easier to sell to the public than small ones… if sufficiently repeated (RationalWiki contributors, 2021). A recent paper by Fazio et al. (2015) argues, based on a couple of experiments, that repetition of false information affects belief even when those exposed know better, an effect they call “knowledge neglect” that reflects a primacy of processing fluency (cognitive ease) over retrieval of knowledge under certain conditions, including repetition.

What do I take from the above? The more confident I am in newly acquired knowledge, the more I will repeat it and remind myself of it, to better internalize it and harness the power of repetition. The less confident I am about new knowledge, the more suspicious I will be when I see it repeated. I guess that is a New Year’s resolution and, yes, those work: I heard it many times.

References

Fazio, L. K; N. M. Brashier; B. K. Payne and E. J. Marsh. 2015. “Knowledge Does Not Protect Against Illusory Truth.” In Journal of Experimental Psychology 144(5): 993-1002. Available: https://www.apa.org/pubs/journals/features/xge-0000098.pdf. Accessed: January 18, 2021

Kahneman, Daniel, 2011. Thinking, Fast and Slow. Farrar, Straus and Giroux.

Machiavelli, Nicolo, 1998. The Prince. The Project Gutenberg eBook. Translated by W.K.Marriott. Available: https://www.gutenberg.org/files/1232/1232-h/1232-h.htm#chap08. Accessed: January 09, 2021.

Parker, David B., 2016. “You Can Fool All the People”: Did Lincoln Say It?. History News Network. Available: https://historynewsnetwork.org/article/161924. Accessed: January 09, 2021

Quote Investigator, 2013. You cannot fool all the people all the time. Available: https://quoteinvestigator.com/2013/12/11/cannot-fool/#return-note-7793-2. Accessed: January 09, 2021.

RationalWiki contributors, 2021. “Big Lie,” In RationalWiki. Available: https://rationalwiki.org/wiki/Big_lie. Accessed: January 09, 2021.


Defining Data

Image by Alex Uriarte

A few weeks ago I watched a few of Crash Course’s Data Literacy e-learning videos on YouTube (Arizona State University and Crash Course 2020). Its first episode defines “data” as “specific information we collect to make decisions.” This is a different definition from others I have heard, but it does have some interesting aspects. Under this definition:

  • Data would be a subset of information. That is, all data would be information but not all information data.
  • It uses collection and decision making to define what information is data and what is not.

Other definitions are very different.

A common distinction between data and information is that found in the so-called DIKW pyramid or similar representations. DIKW stands for Data – Information – Knowledge – Wisdom, and usually suggests a hierarchy where data is the broader concept, which is then filtered into information, which is in turn filtered into knowledge and finally into wisdom. This usage seems common in the knowledge management community and is often attributed to an article by Russell Lincoln Ackoff in the Journal of Applied Systems Analysis in 1989 (e.g., see Bernstein 2009).

Under this representation, data are often interpreted as facts, noise or signals. There are many criticisms of this representation, from whether “filtering” is actually a good way of thinking about the connections between these concepts, to proposed changes to the pyramid, to what is actually the broader concept, data or information (for just a few examples of a relatively large literature, see Weinberger 2010; Tuomi 1999; and Dammann 2018).

Yet a third way of thinking about data is the definition contained in US law. US federal statutes define data as “recorded information, regardless of form or the media on which the data is recorded” (44 U.S. Code § 3502). The definition is less innocuous than it may seem at first. Recording information is in good part what distinguishes our handling of information from cultures that rely (or relied) on word-of-mouth transmission, with the potential loss of content associated with such practices: think of the telephone game that kids play, whispering a sentence into one child’s ear, who then whispers it to another, and so on, until the last child says out loud his or her understanding of the sentence, often to find it completely altered or distorted by the end of the communication chain. Under this definition, however, information is the broader concept.

The table below summarizes the three different definitions of “data.”

 | ASU and Crash Course 2020 | U.S. Federal Statutes | DIKW pyramid
Definition or understanding | Specific information we collect to make decisions | Recorded information, regardless of form or the media on which the data is recorded | Facts, noise, signals
Highlight | Data has a purpose: decision-making | Data must be recorded | Data as facts, no specific purpose or characteristic
Data relative to information | Information > Data | Information > Data | Data > Information

I do not find the last row – showing the relation between data and information – particularly useful in understanding data: it results from how we define not just data but also information, and may be more useful for discussions focused on knowledge. I include it in the table only for the sake of comparison and may explore it in other posts. I do find the “highlight” of each definition useful in thinking about data and how to manage and use them:

  • Data should reflect facts. Whether they do or not depends on how they were collected and managed. This is important to keep in mind in discussions about data collection, data curation and trusted repositories.
  • Data should be recorded. This reinforces the importance of data curation, and particularly of metadata, in enabling us to understand what “facts” the data actually capture.
  • Data may be used for decision making. Hence it is important to keep in mind the many considerations around data bias, completeness, presentation and interpretation.

In this blog, I will use the highlight of each of the three definitions to discuss data.

References

44 U.S. Code § 3502. Legal Information Institute. Cornell Law School. Available: https://www.law.cornell.edu/uscode/text/44/3502#:~:text=(A)%20means%20the%20obtaining%2C,or%20format%2C%20calling%20for%20either%E2%80%94. Accessed: November 14, 2020

Arizona State University and Crash Course, 2020. Study Hall: Data Literacy. Available: https://www.youtube.com/watch?v=0H8awA3GBPg&list=PLNrrxHpJhC8m_ifiOWl1hquDmdgvcviOt&index=14. Accessed: November 27, 2020

Bernstein, J. H., 2009. The Data-Information-Knowledge-Wisdom Hierarchy and its Antithesis. In: Proceedings from North American Symposium on Knowledge Organization. Vol. 2.  Available: https://journals.lib.washington.edu/index.php/nasko/article/viewFile/12806/11288. Accessed: November 27, 2020.

Dammann, Olaf. 2018. Data, Information, Evidence, and Knowledge: A Proposal for Health Informatics and Data Science. In: Online Journal of Public Health Informatics 10(3): e224. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6435353/pdf/ojphi-10-e224.pdf. Accessed: November 27, 2020.

Tuomi, Ilkka. 1999. Data is more than knowledge: Implications of the reversed knowledge hierarchy for knowledge management and organizational memory. In: Journal of Management Information Systems 16(3): 103-117. Available: https://www.researchgate.net/publication/328803142_Data_is_more_than_knowledge_Implications_of_the_reversed_knowledge_hierarchy_for_knowledge_management_and_organizational_memory. Accessed: November 27, 2020

Weinberger, David. 2010. The Problem with the Data-Information-Knowledge-Wisdom Hierarchy. In: Harvard Business Review. Available: https://hbr.org/2010/02/data-is-to-info-as-info-is-not. Accessed: November 27, 2020.


On Using and Citing Wikipedia

Image by Unattributed. Downloaded from pexels.com

I am a big fan of Wikipedia. Although I expect to never use it as a sole source of information in any of my posts (ahem: with the exception of this article), I do often use it as a starting point when researching any subject, and I often click on some of the references cited in Wikipedia entries to follow-up on whatever I am researching. You will see me referencing Wikipedia articles often.

Wikipedia warns that “nothing found here has necessarily been reviewed by people with the expertise required to provide you with complete, accurate or reliable information” (Wikipedia contributors, 2020a). In addition, they warn that, as a community-built reference, it may contain errors. Yet, I often cite Wikipedia for two reasons:

  • First, I typically use it as a starting point and in combination with other sources in my posts. Its encyclopedic nature – comprehensive, even if not necessarily in-depth – makes it exactly the great starting point it is intended to be;
  • Second, because it is a community-built source that has come to occupy such a central role as a reference on the internet, it is actually very subject to spot checking and review, even if not necessarily by recognized specialists in any given area, and not consistently across all entries. I do use my judgment on which entries I am more likely to trust. I am more likely to trust one on, say, butterflies than one on a small country’s minor dictator from a hundred years ago, where there may be fewer people capable of or interested in verifying it, as well as fewer reliable sources. A subject for which there is less information and less general interest will not receive the same amount and frequency of authoritative checks.

Some minimum parameters do exist for Wikipedia entries. Based on its “About” page, material “must fit within Wikipedia’s policies, including being verifiable against a published reliable source. Editors’ opinions and beliefs and unreviewed research will not remain.” And “many experienced editors are watching to ensure that edits are improvements. […] its contributors work on improving quality, removing or repairing misinformation and other errors” (Wikipedia contributors, 2020b).

As food for thought: compare what you find on Wikipedia to what you would typically find circulating on social media about any specific subject. I know, that is a very low bar but, nonetheless, it illustrates that it is possible to build value collaboratively under certain rules and within defined processes (I am now thinking of open source software, but will leave that for another post). 

For those ready to never read one of my posts again, check the article on “Criticism of Wikipedia” on, ummm… shhh, Wikipedia (Wikipedia contributors, 2020c).

References

Wikipedia contributors, 2020a. “Wikipedia: General Disclaimer,” In Wikipedia, The Free Encyclopedia. Available: https://en.wikipedia.org/wiki/Wikipedia:General_disclaimer. Accessed: October 25, 2020.

Wikipedia contributors, 2020b. “Wikipedia: About,” In Wikipedia, The Free Encyclopedia. Available: https://en.wikipedia.org/wiki/Wikipedia:About. Accessed: October 25, 2020.

Wikipedia contributors, 2020c. “Criticism of Wikipedia,” In Wikipedia, The Free Encyclopedia. Available: https://en.wikipedia.org/w/index.php?title=Criticism_of_Wikipedia&oldid=985922235. Accessed: November 1, 2020.

