Toward Automated Classroom Observation: Multimodal Machine Learning to Estimate CLASS Positive Climate and Negative Climate

Ramakrishnan, Anand; Zylich, Brian; Ottmar, Erin; LoCasale-Crouch, Jennifer; Whitehill, Jacob

doi:10.1109/TAFFC.2021.3059209


arXiv:2005.09525 (cs)

This paper has been withdrawn by Jacob Whitehill

[Submitted on 19 May 2020 (v1), last revised 23 Jul 2021 (this version, v3)]



Abstract: In this work we present a multi-modal machine learning-based system, which we call ACORN, to analyze videos of school classrooms for the Positive Climate (PC) and Negative Climate (NC) dimensions of the CLASS observation protocol that is widely used in educational research. ACORN uses convolutional neural networks to analyze spectral audio features, the faces of teachers and students, and the pixels of each image frame, and then integrates this information over time using Temporal Convolutional Networks. The audiovisual ACORN's PC and NC predictions have Pearson correlations of $0.55$ and $0.63$ with ground-truth scores provided by expert CLASS coders on the UVA Toddler dataset (cross-validation on $n=300$ 15-min video segments), and a purely auditory ACORN predicts PC and NC with correlations of $0.36$ and $0.41$ on the MET dataset (test set of $n=2000$ video segments). These numbers are similar to the inter-coder reliability of human coders. Finally, using Graph Convolutional Networks we make early strides (AUC=$0.70$) toward predicting the specific moments (45-90 sec clips) when the PC is particularly weak/strong. Our findings inform the design of automatic classroom observation and also more general video activity recognition and summary recognition systems.
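The abstract reports two kinds of evaluation metric: Pearson correlation between predicted and coder-assigned scores, and AUC for detecting moments of weak or strong PC. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. The function names `pearson_r` and `roc_auc` and all numbers in the usage lines are hypothetical illustrations and are not taken from the paper or its datasets.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    labels are 0 or 1; tied scores count as half a win.
    This pairwise form is O(n^2) and is meant only for small examples.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy values on a 1-7 scale, made up for illustration only.
coder_scores = [5.0, 6.0, 4.5, 3.0, 6.5]
predicted_scores = [4.8, 5.5, 5.0, 3.5, 6.0]
print("Pearson r:", pearson_r(coder_scores, predicted_scores))

# Hypothetical clip labels (1 = strong PC moment) and model scores.
clip_labels = [0, 0, 1, 1]
clip_scores = [0.1, 0.4, 0.35, 0.8]
print("AUC:", roc_auc(clip_labels, clip_scores))
```

In practice a library routine such as SciPy's `pearsonr` or scikit-learn's `roc_auc_score` would normally be used; the hand-written versions here only make the definitions explicit.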

Comments:	The authors discovered that the results are not reproducible
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2005.09525 [cs.CV]
	(or arXiv:2005.09525v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2005.09525
Journal reference:	IEEE Transactions on Affective Computing, 2021
Related DOI:	https://doi.org/10.1109/TAFFC.2021.3059209

Submission history

From: Jacob Whitehill
[v1] Tue, 19 May 2020 15:36:32 UTC (7,800 KB)
[v2] Wed, 10 Feb 2021 23:02:07 UTC (17,785 KB)
[v3] Fri, 23 Jul 2021 15:24:49 UTC (1 KB) (withdrawn)
