TOWARD HOLISTIC DATA SCIENCE EDUCATION

Authors

  • RICHARD DE VEAUX Williams College
  • ROGER HOERL Union College
  • RON SNEE Snee Associates, LLC
  • PAUL VELLEMAN Cornell University

DOI:

https://doi.org/10.52041/serj.v21i2.40

Keywords:

Statistics education research, Data science, Data provenance, Human-machine interaction, Data analysis ethics, Problem-solving

Abstract

Holistic data science education places data science in the context of real-world applications, emphasizing the purpose for which data were collected, the pedigree of the data, the meaning inherent in the data, the deployment of sustainable solutions, and the communication of key findings for addressing the original problem. As such, it puts less emphasis on coding, computing, and high-end black-box algorithms. We argue that data science education must move toward a holistic curriculum, and we provide examples and reasons for this emphasis.

 

Published

2022-07-04