Machine Learning Meetup Notes: 2010-05-23
Erin, Theo and Andy met to define the KDD machine learning problem.
We decided to remove (-), perhaps-remove (?) and add (+) the following features:
- Row
  Anon Student Id
? Problem Hierarchy
? Problem Name
? Problem View
- Step Name
+ unique step name (step name + problem name)
- Step Start Time
- First Transaction Time
- Correct Transaction Time
- Step End Time
- Step Duration (sec)
- Correct Step Duration (sec)
- Error Step Duration (sec)
  Correct First Attempt
- Incorrects
- Hints
- Corrects
- KC(...)
- Opportunity(...)
+ set of superskills (either boolean or opportunity value) (superskills = clustered skills)
+ step success chance (% of successes total for this step)
+ student IQ (% successful answers)
+ complexity (number of skills required)
+ frequency of skills (e.g. discretize into low/medium/high frequency)
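A minimal sketch of how some of the added (+) features could be computed, to make the definitions above concrete. The sample rows, the dictionary keys, the "~~" skill separator and the low/medium/high thresholds are all assumptions made for illustration; superskills are left out because the clustering was not defined at the meetup.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows; the keys loosely mirror the columns listed above.
rows = [
    {"student": "s1", "problem": "P1", "step": "x+1=3", "kc": "add~~isolate", "correct_first_attempt": 1},
    {"student": "s1", "problem": "P2", "step": "x+1=3", "kc": "add", "correct_first_attempt": 0},
    {"student": "s2", "problem": "P1", "step": "x+1=3", "kc": "add~~isolate", "correct_first_attempt": 1},
    {"student": "s2", "problem": "P2", "step": "2x=4", "kc": "", "correct_first_attempt": 1},
]

KC_SEPARATOR = "~~"  # assumption: several skills in one KC cell are joined like this


def split_skills(kc):
    """Return the list of skills in a KC cell (empty list if none)."""
    return [skill for skill in kc.split(KC_SEPARATOR) if skill]


def unique_step(row):
    """Unique step name = problem name + step name."""
    return row["problem"] + "|" + row["step"]


def success_rates(rows, key):
    """Fraction of correct first attempts per group, where key(row) names the group."""
    totals, hits = defaultdict(int), defaultdict(int)
    for row in rows:
        group = key(row)
        totals[group] += 1
        hits[group] += row["correct_first_attempt"]
    return {group: hits[group] / totals[group] for group in totals}


def frequency_bucket(count, low_max=1, medium_max=2):
    """Discretize a skill count; the thresholds are placeholders."""
    if count <= low_max:
        return "low"
    if count <= medium_max:
        return "medium"
    return "high"


def build_features(rows):
    step_rate = success_rates(rows, unique_step)
    student_rate = success_rates(rows, lambda row: row["student"])
    skill_counts = Counter(s for row in rows for s in split_skills(row["kc"]))
    features = []
    for row in rows:
        skills = split_skills(row["kc"])
        features.append({
            "unique_step": unique_step(row),
            "step_success": step_rate[unique_step(row)],  # step success chance
            "student_iq": student_rate[row["student"]],   # student IQ
            "complexity": len(skills),                    # number of skills required
            "skill_frequency": [frequency_bucket(skill_counts[s]) for s in skills],
        })
    return features


features = build_features(rows)
```

Note that the rates here are computed over the same rows they are attached to, so each row's own outcome leaks into its features. On real data they should be computed from training rows only.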
Submission datasets naming convention:
- "algebra" -> also has KC rules model (need to orthogonalize them as well; TODO Erin)
- After discussing the features above, we had the idea of using multiple algorithms to predict our output variable (the probability that this student gets this step right on the first try): one algorithm predicting student success, one predicting step difficulty, perhaps additional ones. An aggregate function would then learn the overall success probability from their outputs. Since we have about 3000 steps per student, we should have enough data to train a model for each student.
- We agreed that the accepted features above generally make sense to us; defining the feature set for each of the individual multi-algorithm sub-problems is still TODO.
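The multi-algorithm idea could be sketched roughly as follows. This is not a design we settled on: the two base "models" are reduced to smoothed success rates (one per student, one per step), and logistic regression stands in for the aggregate function. The training rows, the smoothing and the hyperparameters are all hypothetical.

```python
import math

# Hypothetical training rows: (student, step, correct_first_attempt).
train = [
    ("s1", "a", 1), ("s1", "b", 1), ("s1", "c", 0),
    ("s2", "a", 1), ("s2", "b", 0), ("s2", "c", 0),
    ("s3", "a", 1), ("s3", "b", 1), ("s3", "c", 1),
]


def smoothed_rates(rows, key_index, prior=0.5, weight=1.0):
    """Per-key success rate, shrunk toward `prior` (the smoothing is an assumption)."""
    hits, totals = {}, {}
    for row in rows:
        key = row[key_index]
        hits[key] = hits.get(key, 0) + row[2]
        totals[key] = totals.get(key, 0) + 1
    return {k: (hits[k] + prior * weight) / (totals[k] + weight) for k in totals}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_aggregator(examples, labels, epochs=2000, learning_rate=0.5):
    """Logistic regression on the base-model outputs, fitted by batch gradient descent."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(examples, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Base model 1: how often each student succeeds. Base model 2: how easy each step is.
student_model = smoothed_rates(train, 0)
step_model = smoothed_rates(train, 1)

# The aggregator learns how to combine the two base predictions.
X = [[student_model[student], step_model[step]] for student, step, _ in train]
y = [label for _, _, label in train]
weights, bias = fit_aggregator(X, y)


def predict(student, step):
    """Probability of a correct first attempt; unseen keys fall back to 0.5."""
    x = [student_model.get(student, 0.5), step_model.get(step, 0.5)]
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
```

Calling `predict("s3", "a")` should give a higher probability than `predict("s2", "c")`, since s3 is the strongest student and "a" the easiest step in the toy data. For a real run, the base models and the aggregator should be fitted on different rows; otherwise the aggregator sees base predictions that already include the label it is trying to learn.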