Machine Learning Meetup Notes: 2010-05-23

Revision as of 21:39, 26 May 2010

Erin, Theo and Andy met to work out the machine learning problem definition for KDD.

We decided to remove (-) and add (+) the following features (unmarked features are kept as they are):

  • Row, (only used for submission, not for ML algorithms)
  • Anon Student Id,
  • Problem Hierarchy,
  • Problem Name,
  • Problem View,
  • - Step Name,
  • + unique step name (step name+problem name)
  • - Step Start Time,
  • - First Transaction Time,
  • - Correct Transaction Time,
  • - Step End Time,
  • - Step Duration (sec),
  • - Correct Step Duration (sec),
  • - Error Step Duration (sec),
  • Correct First Attempt,
  • - Incorrects,
  • - Hints,
  • - Corrects,
  • - KC(...),
  • - Opportunity(...)
  • + set of superskills (either boolean or opportunity value) (superskills = clustered skills)
  • + step success chance (% of successes total for this unique stepname)
  • + student "IQ" (% successful answers by this student)
  • + complexity (number of skills required for this step)
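
The added (+) features above could be derived roughly as follows. This is only a minimal sketch: the short key names, the sample rows, and the "~~" separator between skills in the KC column are assumptions for illustration and should be checked against the real data files.

```python
from collections import defaultdict

# Made-up rows for illustration; the keys are hypothetical short names
# for the columns listed above.
rows = [
    {"student": "s1", "problem": "P1", "step": "step1", "correct_first_attempt": 1, "kc": "add~~isolate"},
    {"student": "s1", "problem": "P2", "step": "step1", "correct_first_attempt": 0, "kc": "add"},
    {"student": "s2", "problem": "P1", "step": "step1", "correct_first_attempt": 0, "kc": "add~~isolate"},
    {"student": "s2", "problem": "P2", "step": "step1", "correct_first_attempt": 0, "kc": ""},
]

def unique_step(row):
    """Unique step name = problem name + step name."""
    return row["problem"] + "|" + row["step"]

def success_rates(rows, key):
    """Fraction of correct first attempts per group, grouped by key(row)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        group = key(row)
        totals[group] += 1
        hits[group] += row["correct_first_attempt"]
    return {group: hits[group] / totals[group] for group in totals}

def complexity(row, sep="~~"):
    """Number of skills required for the step (assumed separator: '~~')."""
    return len(row["kc"].split(sep)) if row["kc"] else 0

def build_features(rows):
    step_rate = success_rates(rows, unique_step)
    student_rate = success_rates(rows, lambda r: r["student"])
    return [
        {
            "unique_step": unique_step(row),
            "step_success": step_rate[unique_step(row)],   # step success chance
            "student_iq": student_rate[row["student"]],    # student "IQ"
            "complexity": complexity(row),
        }
        for row in rows
    ]

features = build_features(rows)
```

Note that the rates here are computed over the same rows they are attached to, so each row's own outcome leaks into its features. On the real data they would probably need to be computed from training rows only.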

Submission datasets naming convention:

  • "bridge"
  • "algebra" -> also has a KC rules model (need to orthogonalize them as well; TODO: Erin)

Multi-algorithm idea:

  • After discussing the features above, we had the idea of using multiple algorithms to predict our output variable (the probability that this student gets this step right on the first attempt): one algorithm predicting student success, one predicting step difficulty, perhaps additional ones, with an aggregate function then learning the overall success probability. Since we have about 3000 steps per student, we should have enough data to train a model for each student.
  • We agreed that the accepted features above generally make sense to us; defining the feature set for each individual algorithm in the multi-algorithm setup is still TODO.
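
One way the aggregate function could look is sketched below. The notes do not say which aggregate to use, so logistic regression over the base-model outputs (a simple form of stacking) is an assumption here, and the base-model outputs and labels are made-up numbers, not results.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Aggregate success probability from base-model outputs x."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

def fit_aggregate(base_preds, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights over base-model outputs
    with plain batch gradient descent on the log loss."""
    n_feats = len(base_preds[0])
    n = len(labels)
    w = [0.0] * n_feats
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feats
        grad_b = 0.0
        for x, y in zip(base_preds, labels):
            err = predict(w, b, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical base-model outputs per row:
# (student success estimate, step success estimate)
base_preds = [(0.9, 0.8), (0.9, 0.2), (0.3, 0.8), (0.3, 0.2), (0.8, 0.9), (0.2, 0.1)]
# Hypothetical "Correct First Attempt" labels for the same rows
labels = [1, 1, 0, 0, 1, 0]

w, b = fit_aggregate(base_preds, labels)
p_strong = predict(w, b, (0.9, 0.8))  # strong student, easy step
p_weak = predict(w, b, (0.2, 0.1))    # weak student, hard step
```

In a real setup the base predictions fed to the aggregate should come from held-out data, otherwise the aggregate learns to trust base models that have simply memorized their training rows.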

