INVESTIGATING DATA LIKE A DATA SCIENTIST: KEY PRACTICES AND PROCESSES
DOI: https://doi.org/10.52041/serj.v21i2.41
Keywords: Statistics education research, Data science education, Industry ethnography, Data investigation framework, Literature review
Abstract
With a call for schools to infuse data across the curriculum, many educators and researchers are creating curricula and examining students’ thinking in data-intensive problems. As the discipline of statistics education broadens to include data science education, there is a need to examine how practices in data science can inform work in K-12. We synthesize literature about statistics investigation processes, data science as a field, and the practices of data scientists. Further, we report results from an ethnographic and interview study of the work of data scientists. Together, these inform a new framework to support data investigation processes. We explicate the practices and dispositions the framework calls for and offer a glimpse of how it can be used to move the discipline of data science education forward.