The 5Ws and 1H of Term Projects in the Introductory Data Science Classroom
DOI: https://doi.org/10.52041/serj.v21i2.37
Keywords: Statistics education research, Data science, Teaching statistics, Statistics curriculum, R language
Abstract
Many data science applications involve generating questions, acquiring data and preparing it for analysis (be it exploratory, inferential, or modeling-focused), and communicating findings. Most data science curricula address each of these steps as a separate unit within a course or as a separate course. Open-ended term projects, on the other hand, allow students to put all of these steps into practice, sequentially and iteratively. In this paper we discuss what we mean by data science projects, why they are crucial in introductory data science courses, who works on these projects and how, when in the term they can be implemented, and where they can be shared.