Machine Learning Meetup Notes: 2010-05-23

From Noisebridge
Latest revision as of 15:36, 28 May 2010

Erin, Theo and Andy came together to work out the problem definition for the KDD machine learning task.

We decided to remove (-) and add (+) the following features (entries with no marker stay as they are):

  • Row, (only used for submission, not for ML algorithms)
  • Anon Student Id,
  • Problem Hierarchy,
  • Problem Name,
  • Problem View,
  • - Step Name,
  • + unique step name (step name+problem name)
  • - Step Start Time,
  • - First Transaction Time,
  • - Correct Transaction Time,
  • - Step End Time,
  • - Step Duration (sec),
  • - Correct Step Duration (sec),
  • - Error Step Duration (sec),
  • Correct First Attempt,
  • - Incorrects,
  • - Hints,
  • - Corrects,
  • - KC(...),
  • - Opportunity(...)
  • + set of superskills (either boolean or opportunity value) (superskills = clustered skills)
  • + step success chance (% of successes total for this unique stepname)
  • + student "IQ" (% successful answers by this student)
  • + complexity (number of skills required for this unique stepname)
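
A minimal sketch of how the added (+) features could be derived, assuming each row has already been parsed into a dict. The toy rows, the short key names, the pre-split "skills" list (standing in for the KC(...) column) and the SUPERSKILL_OF mapping are all made up for illustration; the notes do not say how skills get clustered into superskills.

```python
from collections import defaultdict

# Hypothetical toy rows. "skills" stands in for an already-parsed KC(...)
# column and "correct" for Correct First Attempt (both are assumptions).
rows = [
    {"student": "s1", "problem": "p1", "step": "x+1=3", "skills": ["add", "isolate"], "correct": 1},
    {"student": "s1", "problem": "p2", "step": "x+1=3", "skills": ["add"], "correct": 0},
    {"student": "s2", "problem": "p1", "step": "x+1=3", "skills": ["add", "isolate"], "correct": 0},
    {"student": "s2", "problem": "p2", "step": "x+1=3", "skills": ["add"], "correct": 0},
]

# Hypothetical skill -> superskill clustering, hand-written here.
SUPERSKILL_OF = {"add": "arithmetic", "isolate": "equation-solving"}


def unique_step(row):
    """Unique step name = problem name + step name."""
    return row["problem"] + "::" + row["step"]


def rate_by(rows, key_fn):
    """Fraction of correct first attempts per group defined by key_fn."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in rows:
        k = key_fn(r)
        hits[k] += r["correct"]
        totals[k] += 1
    return {k: hits[k] / totals[k] for k in totals}


step_success = rate_by(rows, unique_step)            # step success chance
student_iq = rate_by(rows, lambda r: r["student"])   # student "IQ"


def superskill_flags(row):
    """Boolean variant of the superskill feature set."""
    names = sorted(set(SUPERSKILL_OF.values()))
    return {n: any(SUPERSKILL_OF.get(s) == n for s in row["skills"]) for n in names}


def features(row):
    """Derived features for one row."""
    return {
        "unique_step": unique_step(row),
        "step_success_chance": step_success[unique_step(row)],
        "student_iq": student_iq[row["student"]],
        "complexity": len(row["skills"]),
        "superskills": superskill_flags(row),
    }
```

When used for real, the two rates would have to be computed from training rows only, otherwise the label leaks into the features.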



Submission datasets naming convention:

  • "bridge"
  • "algebra" -> also has KC rules model (need to orthogonalize them as well; TODO: Erin)


Multi-algorithm idea:

  • After we discussed the features above, we had the idea to use multiple algorithms to predict our output variable (the probability that this student gets this step right on the first try): one algorithm predicting student success, one predicting step difficulty, perhaps additional ones. An aggregate function would then learn the overall success probability. Since we have about 3000 steps per student, we should have enough data to train a model for each student.
  • We agreed that the accepted features above generally make sense to us; defining the set of features for the individual multi-algo problems is still TODO.
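
A rough sketch of the aggregate step, under several assumptions we did not settle at the meetup: the two base "algorithms" are reduced to plain success rates (per student and per step), the aggregate function is a logistic regression fitted by batch gradient descent, and the training triples are invented toy numbers.

```python
import math

# Hypothetical training triples:
# (student model output, step model output, correct first attempt)
train = [
    (0.9, 0.8, 1), (0.9, 0.3, 1), (0.8, 0.7, 1), (0.7, 0.2, 0),
    (0.4, 0.9, 1), (0.3, 0.4, 0), (0.2, 0.6, 0), (0.2, 0.1, 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_aggregate(data, lr=0.5, epochs=2000):
    """Fit logistic-regression weights over the two base model outputs."""
    w_student, w_step, bias = 0.0, 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        g_student = g_step = g_bias = 0.0
        for student_p, step_p, y in data:
            # Gradient of the log loss with respect to the linear score.
            err = sigmoid(w_student * student_p + w_step * step_p + bias) - y
            g_student += err * student_p
            g_step += err * step_p
            g_bias += err
        w_student -= lr * g_student / n
        w_step -= lr * g_step / n
        bias -= lr * g_bias / n
    return w_student, w_step, bias


def predict(weights, student_p, step_p):
    """Aggregate probability of a correct first attempt."""
    w_student, w_step, bias = weights
    return sigmoid(w_student * student_p + w_step * step_p + bias)


weights = fit_aggregate(train)
```

Calling predict(weights, 0.9, 0.8) gives the aggregate estimate for a strong student on an easy step. Additional base models would just add more inputs and weights to the aggregate.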


Andy
