Machine Learning Meetup Notes: 2010-05-23

From Noisebridge

Erin, Theo and Andy met to work out the problem definition for the KDD machine learning problem.

We decided to remove (-) and add (+) features as marked below; unmarked features are kept:

  • Row (only used for submission, not for ML algorithms),
  • Anon Student Id,
  • Problem Hierarchy,
  • Problem Name,
  • Problem View,
  • - Step Name,
  • + unique step name (step name+problem name)
  • - Step Start Time,
  • - First Transaction Time,
  • - Correct Transaction Time,
  • - Step End Time,
  • - Step Duration (sec),
  • - Correct Step Duration (sec),
  • - Error Step Duration (sec),
  • Correct First Attempt,
  • - Incorrects,
  • - Hints,
  • - Corrects,
  • - KC(...),
  • - Opportunity(...)
  • + set of superskills (either boolean or opportunity value) (superskills = clustered skills)
  • + step success chance (% of successes total for this unique stepname)
  • + student "IQ" (% successful answers by this student)
  • + complexity (number of skills required for this unique stepname)
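
A minimal sketch of how the added (+) features could be derived, assuming each row is a dict with hypothetical keys `student`, `problem`, `step`, `kc` and `cfa` (Correct First Attempt). The `~~` separator for multiple skills in the KC field and the `|` used to join problem name and step name are also assumptions, and the toy rows are made up for illustration. The superskill clustering is not covered here.

```python
from collections import defaultdict

# Toy rows; keys are hypothetical stand-ins for the columns listed above.
rows = [
    {"student": "s1", "problem": "P1", "step": "x+1=3", "kc": "add~~isolate", "cfa": 1},
    {"student": "s1", "problem": "P2", "step": "x+1=3", "kc": "add", "cfa": 0},
    {"student": "s2", "problem": "P1", "step": "x+1=3", "kc": "add~~isolate", "cfa": 0},
    {"student": "s2", "problem": "P2", "step": "x+1=3", "kc": "add", "cfa": 0},
]

def unique_step(row):
    # Step names repeat across problems, so combine problem name and step name.
    return row["problem"] + "|" + row["step"]

def mean_by(rows, key_fn):
    # Fraction of correct first attempts per group (per step or per student).
    totals = defaultdict(lambda: [0, 0])  # key -> [successes, attempts]
    for r in rows:
        t = totals[key_fn(r)]
        t[0] += r["cfa"]
        t[1] += 1
    return {k: s / n for k, (s, n) in totals.items()}

def complexity(row, sep="~~"):
    # Number of skills listed for the step; the separator is an assumption.
    return len([k for k in row["kc"].split(sep) if k])

step_success = mean_by(rows, unique_step)            # step success chance
student_iq = mean_by(rows, lambda r: r["student"])   # student "IQ"

features = [
    {
        "unique_step": unique_step(r),
        "step_success": step_success[unique_step(r)],
        "student_iq": student_iq[r["student"]],
        "complexity": complexity(r),
    }
    for r in rows
]
```

Note that both rates here are computed over the same rows they are attached to, so they include each row's own outcome; for actual training they would need to come from separate data.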



Submission datasets naming convention:

  • "bridge"
  • "algebra" -> also has KC rules model (need to orthogonalize them as well; TODO Erin)


Multi-algorithm idea:

  • After discussing the features above, we had the idea of using multiple algorithms to predict our output variable (the probability that this student gets this step right on the first try): one algorithm predicting student success, one predicting step difficulty, perhaps additional ones. An aggregate function would then learn the overall success probability. Since we have about 3000 steps per student, we should have enough data to train a model for each student.
  • We agreed that the accepted features above generally make sense to us. TODO: define the feature set for each of the individual multi-algorithm sub-problems.
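
One way the multi-algorithm idea could look, as a rough sketch only: the two base "algorithms" are reduced to plain success rates (per student and per unique step), and the aggregate function is a small logistic regression fitted by gradient descent. These choices, the function names and the toy records are all assumptions for illustration; the notes do not fix any of them.

```python
import math
from collections import defaultdict

# Toy (student, unique step, correct-first-attempt) records; values are made up.
records = [
    ("s1", "a", 1), ("s1", "b", 1), ("s1", "c", 1), ("s1", "d", 0),
    ("s2", "a", 1), ("s2", "b", 0), ("s2", "c", 0), ("s2", "d", 0),
]

def rate_by(index):
    # Success rate grouped by student (index 0) or by step (index 1).
    sums = defaultdict(lambda: [0, 0])  # key -> [successes, attempts]
    for rec in records:
        s = sums[rec[index]]
        s[0] += rec[2]
        s[1] += 1
    return {k: c / n for k, (c, n) in sums.items()}

student_model = rate_by(0)  # stand-in for the "student success" algorithm
step_model = rate_by(1)     # stand-in for the "step difficulty" algorithm

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_aggregate(epochs=2000, lr=0.5):
    # Logistic regression on [bias, student score, step score].
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for student, step, y in records:
            x = (1.0, student_model[student], step_model[step])
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for i in range(3):
                grad[i] += err * x[i]
        w = [wi - lr * g / len(records) for wi, g in zip(w, grad)]
    return w

weights = fit_aggregate()

def predict(student, step):
    # Aggregate probability of a correct first attempt for this student-step.
    x = (1.0, student_model[student], step_model[step])
    return sigmoid(sum(wi * xi for wi, xi in zip(weights, x)))
```

With this toy data, `predict("s1", "a")` (strong student, easy step) comes out higher than `predict("s2", "d")` (weak student, hard step). In a real setup the base models would be trained on different rows than the aggregate, otherwise the aggregate sees scores that already contain its own labels.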


Andy