CAUSAL LANGUAGE AND STATISTICS INSTRUCTION: EVIDENCE FROM A RANDOMIZED EXPERIMENT
DOI:
https://doi.org/10.52041/serj.v23i1.673
Keywords:
Statistics education research, Causal inference, Causal language, Introductory statistics, Statistics instruction
Abstract
Most current statistics courses include some instruction relevant to causal inference. Whether this instruction is incorporated as material on randomized experiments or as an interpretation of associations measured by correlation or regression coefficients, the way in which this material is presented may have important implications for understanding causal inference fundamentals. Although the connection between study design and the ability to infer causality is often described well, the link between the language used to describe study results and causal attribution typically is not well defined. The current study investigates this relationship experimentally using a sample of students in a statistics course at a large western university in the United States. It also provides (non-experimental) evidence about the association between statistics instruction and the ability to understand appropriate causal attribution. The results from our experimental vignette study suggest that the wording of study findings impacts causal attribution by the reader, and, perhaps more surprisingly, that this variation in level of causal attribution across different wording conditions seems to pale in comparison to the variation across study contexts. More research, however, is needed to better understand how to tailor statistics instruction to make students sufficiently wary of unwarranted causal interpretation.