INVESTIGATING DATA LIKE A DATA SCIENTIST: KEY PRACTICES AND PROCESSES

Authors

  • HOLLYLYNNE LEE, North Carolina State University
  • GEMMA MOJICA, North Carolina State University
  • EMILY THRASHER, North Carolina State University
  • PETER BAUMGARTNER, Explosion AI

DOI:

https://doi.org/10.52041/serj.v21i2.41

Keywords:

Statistics education research, Data science education, Industry ethnography, Data investigation framework, Literature review

Abstract

With a call for schools to infuse data across the curriculum, many are creating curricula and examining students’ thinking in data-intensive problems. As the discipline of statistics education broadens to data science education, there is a need to examine how practices in data science can inform work in K-12. We synthesize literature about statistics investigation processes, data science as a field, and practices of data scientists. Further, we provide results from an ethnographic and interview study of the work of data scientists. Together, these inform a new framework to support data investigation processes. We explicate the practices and dispositions needed and offer a glimpse of how the framework can be used to move the discipline of data science education forward.

Published

2022-07-04