“Developing Strategies and Technology for Generation and Analysis of Longitudinal High Frequency Data Streams from Faculty and Students”.
(This project has been presented
to the NSF in the form of a proposal for The NSF 2026 Idea Machine)
Synopsis
Every developed science is the result of the scrupulous analysis of a vast amount of data.
To advance the science of education, researchers need (a) to establish reliable sources of data, which can later be (b) analyzed.
This project aims to establish a reliable procedure, or procedures, for generating a large amount of reliable data using techniques developed in another field (physics) for high frequency “data mining”.
The proposal was developed by Prof. Plamen Ivanov and me more than a year ago. Prof. Ivanov is the director of the Keck Laboratory for Network Physiology, where he applies various data mining methods (and more) to study correlations within the physiological networks of the human body. We believe that similar methods can and need to be applied to study various aspects of the learning and teaching practices of individuals and groups of individuals. However, despite our best efforts, so far we have not found any interest in funding our research, either inside our institutions or from the NSF. This fact is another confirmation that the NSF currently has more interest in funding socially oriented projects in education (making improvements “here and now”) than in supporting fundamental research (for which no one can predict if, where, and how it will bring fruitful social or economic results).
However, we hope the NSF may soon shift its approach toward funding fundamental research in the field of education; one indication of this possibility is the NSF’s call for “big ideas”.
We have submitted our proposal for The NSF 2026
Idea Machine. However, we welcome the attention and support from any party
interested in the development of a fundamental science of education.
Below is a short description of the goals and proposed methods (copied from the full proposal).
I. Introduction:
This project, when realized, has the potential to transform the science of education and, hence, education itself. The essence of the project is the development of an innovative, science-based approach to describing, structuring, analyzing, and assessing the teaching and learning process.
This 2-page presentation is the shortest version of the proposal; the innovative nature of the project demands a more detailed description, which is offered later.
Big data analysis has entered many important human practices. Examples include the Human Genome Project (DNA v. health), healthcare and epidemiology (spread of diseases), particle physics, social and business networking (Facebook, Twitter, Snapchat, Instagram, cellphone communication, telemedicine, remote business communication), national security (trends in various networks), business network analysis (Airbnb, Uber, Lyft, Netflix), stock trading, and currency exchange (live records of massive volumes of transactions). Within all those fields, data scientists have been able to (1) establish protocols and procedures for quantifying data and for collecting, structuring, comparing, and sharing vast amounts of data; and (2) mine large databases to extract valuable and reliable information on the correlations between multiple parameters, based on various types and levels of data coming from multiple sources.
However, although education is one of the most widespread and most important human practices, the methods developed in other fields for (1) collecting and (2) mining BIG data have not found applications in the field of education. Current approaches do not provide an understanding of the deep structure of teaching and learning processes, and do not lead to the development of quantitative measures of the quality of teaching, of trends in teaching (e.g. a measure of improvement in teaching), or of student progress correlated with student learning outcomes.
II. Description of the current state of Educational Data Mining (a.k.a. EDM):
1. EDM is at an early stage of development and is, rather, Advanced Educational Statistics (e.g. the Educational Data Mining Society was formed only five years ago: educationaldatamining.org).
2. Currently the following approaches are used to obtain various educational data:
· Observing school teachers or college faculty while they teach and assessing the teacher’s actions using various observation protocols (e.g. BOPR, COPUS, MarzanoOP, RTOP, GORP).
· Observing school and college students while they are being taught, using various observation protocols (e.g. a “STEM class observation protocol”).
· Collecting responses to various surveys (e.g. “National Survey of Student Engagement”, “National Survey of College Faculty”).
· Collecting data during various student-computer interactions involving computer-based media (MOOCs, computer games, intelligent tutoring systems, online content delivery systems, online homework delivery systems).
It is important to stress that:
(A) When data collection methods are based on surveys or observation protocols, they are typically used only once or twice during a teaching period (a semester or a year), and they typically cover only a small percentage of teachers and students.
(B) Data collected using computer-based media do not give access to the everyday reflection of students on the learning process (the actions taken to absorb information and develop skills, and the resulting outcomes and satisfaction), nor to the everyday reflection of teaching faculty on the teaching process and on student progress. These data typically present the aggregated student response to the course as a whole (ranking the difficulty of the course, ranking homework assignments, indicating the relevance of a textbook and other resources, overall satisfaction) and mostly yield two-parameter correlations such as “time used for homework” versus “final grade”.
Currently, educational data: are collected during isolated educational projects; do not represent longitudinal streams of high frequency data collected during the full term of learning; do not satisfy the criteria for being “big data” (except for a few data sets collected via student-computer interactions); do not involve data streams with a large number of parameters; do not allow cross analysis in search of stable correlations between multiple parameters. In its current state, EDM is rather Advanced Educational Statistics.
Currently, there is NO research which:
(1) regularly and frequently (e.g. several times a week) collects data simultaneously from teaching faculty and from students during the whole period of teaching a course (not just via observing one lecture);
(2) uses media technologies, including phone apps, to collect the desired sets of educational data coming from multiple sources (faculty, disciplines, departments, institutions);
(3) uses technologies to mine data in search of stable correlations between different factors affecting teaching-learning practices and students’ performance in a multivariable (multi-parametric) space.
Currently, there is no “brick-and-mortar” educational institution which collects high frequency responses from faculty and from students about multiple features of the teaching and learning processes. There is no institution which collects and cross-correlates multiple responses across various disciplines over a long period of time.
III. The scope and immediate goals of the proposed project:
The project will pioneer (A) the development of a new type of big database by collecting longitudinal streams of high frequency data in the field of education; and (B) the development of a new methodology for mining this new type of educational data and extracting valuable and reliable information on the correlations between various parameters of multiple data sources of different types and levels (faculty, departments, institutions).
Every day countless apps are used by millions of people. People already have the habit of tracking information every day (calorie intake, calories burned, steps taken, miles traveled, etc.). Why not harness the new technologies and the new habit to generate a stream of high frequency educational data?
The goals:
1. Establishing a set of measurable and universal (but modifiable) parameters which will be used for describing the state and structure of any teaching and learning process (i.e. for any course).
2. Developing one questionnaire for teaching faculty and one questionnaire for students, which they will use regularly and frequently during a course for self-observation, for assessing students’ actions and progress, and for assessing faculty teaching actions and traits.
3. Developing an app for collecting the data provided by students and faculty.
4. Developing the strategy for analyzing the data coming from faculty and students in search of correlations.
5. Developing a website for collecting the data coming from faculty and students.
6. Piloting the program.
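As an illustration only, the first three goals above (a shared parameter set, questionnaires for two roles, and an app that collects the answers) suggest a simple record format. The sketch below is not part of the proposal: the parameter names, the 1 to 5 rating scale, the course code, and the names `Response` and `validation_errors` are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

# Hypothetical parameter set (goal 1); the names and the 1-5 scale are assumptions.
PARAMETERS = ("lecture_clarity", "assignment_difficulty", "satisfaction")
ROLES = ("student", "faculty")
SCALE_MIN, SCALE_MAX = 1, 5


@dataclass(frozen=True)
class Response:
    """One questionnaire submission from one anonymous respondent."""
    respondent_id: str       # anonymous identifier, not a real name
    role: str                # "student" or "faculty"
    course_id: str
    submitted_at: datetime
    ratings: Dict[str, int]  # parameter name -> rating on the assumed scale


def validation_errors(response: Response) -> List[str]:
    """Return a list of problems; an empty list means the record is usable."""
    errors = []
    if response.role not in ROLES:
        errors.append(f"unknown role: {response.role}")
    for name, value in response.ratings.items():
        if name not in PARAMETERS:
            errors.append(f"unknown parameter: {name}")
        elif not SCALE_MIN <= value <= SCALE_MAX:
            errors.append(f"rating out of range for {name}: {value}")
    return errors


# A made-up submission, used only to show the record shape.
example = Response(
    respondent_id="anon-001",
    role="student",
    course_id="PHYS-101",  # hypothetical course code
    submitted_at=datetime(2018, 9, 10, 14, 30),
    ratings={"lecture_clarity": 4, "assignment_difficulty": 3},
)
print(validation_errors(example))
```

Keeping the parameter set in one place reflects the "universal (but modifiable)" requirement: a course could extend `PARAMETERS` without changing the record format.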
We propose collecting high frequency longitudinal responses from faculty and students: before the beginning of the course, after each lecture, after each exam, summative responses after two weeks of the course about lectures, labs, and all other features of the course, generalized responses after each third of a semester, and cumulative responses just before and after the final examination. The goal is to develop procedures that make it possible to visualize the structure of the responses, changes in that structure, and trends in those changes. This should give regular access to students’ reflection on the course and on their own performance during the course (how students assess the difficulty of various assignments, the clarity or helpfulness of lectures, workbooks, the textbook, office hours, etc., and the helpful traits of a lecturer). It should also give regular access to the structured reflection of faculty on the teaching approach selected for the course and on students’ readiness, behavior, performance, and success. The next goal is to demonstrate the existence of stable trends in the correlations between various parameters affecting the learning process of students.
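To make the idea of a "stable trend in correlations" concrete, here is a minimal sketch, not the analysis method of the proposal, which does not specify one. It assumes two parameters recorded after each lecture and computes the Pearson correlation over a sliding window, so one can see whether the relationship holds throughout the term. The function names, the window size, and the data are invented for illustration.

```python
from math import sqrt
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def windowed_correlation(xs: Sequence[float], ys: Sequence[float],
                         window: int) -> List[float]:
    """Correlation inside each sliding window of `window` consecutive points."""
    if window < 2 or window > len(xs):
        raise ValueError("window must be between 2 and the series length")
    return [
        pearson(xs[i:i + window], ys[i:i + window])
        for i in range(len(xs) - window + 1)
    ]


# Made-up class-average ratings after six lectures (illustrative data only).
difficulty = [2, 3, 4, 3, 5, 4]
clarity = [5, 4, 3, 4, 2, 3]
print(windowed_correlation(difficulty, clarity, window=3))
```

If the windowed values stay close to one another over the term, the correlation can be called stable in this simple sense; a real study would need many more parameters and respondents, and a more careful statistical treatment.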
IV. Resources.
The project will leverage the expertise and resources located at Boston University, including: scientists with deep expertise in developing and applying methods for collecting and organizing big data coming from multiple sources, quantifying data, extracting information from big data on important correlations between multiple parameters describing the functioning of various systems or subsystems, finding cross relations, describing information transfer between multiple sources, using noise reduction methods, finding critical points, and visualizing state transitions (PI Prof. Plamen Ivanov); experienced teaching faculty (co-PI Dr. Valentin Voroshilov); and a high-performance computing facility (GHPCC).
V. Future development.
The proposed approach to educational data mining pioneers the development of a new type of educational data and a new methodology for collecting and mining that data.
It has the potential to follow the history of the Human Genome Project (started at Boston University).
Appendix I
Naturally, a similar approach to collecting live data, followed by correlation analysis, can be used for many other applications. For example, the organizers of any event can develop a website with corresponding phone/tablet/computer apps, which participants can download and use under an anonymous account. Then, during each talk, the audience can use the app for live assessment of and reactions to the talk, which could be summarized and presented right after the talk is over, as well as after the whole event (to observe the evolution).
Appendix II
The links to all six of my applications to the NSF 2026 Idea Machine (from August 31, 2018 to October 26, 2018):
1. Entry125253: High Frequency Data Streams in Education
2. Entry124656: objective measures of physics knowledge
3. Entry125317: National database teacher PD
4. Entry124655: role of NSF in funding education
5. Entry125719: The new type of a science course for science teachers.
6. Entry126205: The development of the uniform standard for measuring content knowledge in physics.