MACHINE LEARNING FOR NON-MAJORS: A WHITE BOX APPROACH
DOI:
https://doi.org/10.52041/serj.v21i2.45
Keywords:
Statistics education research, Data science education, Machine learning
Abstract
Data science is a new field of research that has attracted growing interest in recent years as it focuses on turning raw data into understanding, insight, knowledge, and value. New data science education programs, which are being launched at an increasing rate, are designed for multiple education levels and populations. Machine learning (ML) is an essential element of data science that requires an extensive background in mathematics. Whereas it is possible to teach the principles of ML only as a black box, novice learners might find it difficult to improve an algorithm’s performance without a white box understanding of the underlying ML algorithms. In this paper, we suggest a pedagogical method, based on hands-on pen-and-paper tasks, to support white box understanding of ML algorithms for learners who lack the level of mathematics knowledge required for this purpose. Data were collected using a comprehension questionnaire and analyzed according to the process-object theory borrowed from mathematics education research. We present evidence of the effectiveness of this method based on data collected in an introduction-level data science course for graduate psychology students. This population had extensive psychology domain knowledge, as well as an established background in statistics, but had gaps in mathematical and computer science knowledge compared with data science majors. The research contribution is both practical and theoretical. Practically, we present a learning module that supports non-major data science students’ white box understanding of ML. Theoretically, we propose a data analysis method to evaluate students’ conceptions of ML algorithms.