Data cleaning doesn’t happen in a vacuum: An initial exploration of high school statistics teachers’ data practices with messy data
DOI: https://doi.org/10.52041/iase24.301
Abstract
Cleaning data is an important facet of statistical practice. The research literature on examining data practices of learners when dealing with messy data that needs cleaning, however, is scarce. As part of a larger study, six Grade 12 high school statistics teachers engaged with a height estimation task, for which the data were drawn from a publicly available website containing 39,195 rows of text entries in a variety of measurement systems. The teachers’ observed data practices were characterised as inspecting, ideating, sorting, sampling, converting, visualising, creating, and describing. The implications of the findings with regard to statistical enquiry pathways are discussed.