Machine Learning Meetup Notes: 2010-05-26

From Noisebridge
Latest revision as of 22:09, 26 May 2010

  • Andy gave an overview of where we are with the KDD data
  • Mike S gave presentation:
    • Gaussian Mixture Models
    • k-means clustering
    • very basic expectation-maximization
  • Brainstorming session on how to reduce skill set column
    • Tom tried to quantify the opportunity per skill in each row as a high-dimensional vector
  • Brainstorming on how to reduce other data and compute new features for the KDD Dataset
    • Tom will apply k-means clustering of skills (or steps), for data reduction
    • Andy will compute new features: unique step/problem id, student IQ (avg. correct), step challenge/difficulty (avg. correct), step complexity (# of skills required)
    • Mike will use self-organizing maps to reduce skills
    • Paul will visualize/summarize the data, to provide understanding and insight
    • Mike will set up an FTP server for people to transfer their enormous datasets
    • Theo will use some Weka classifiers to produce a classification method for the data
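
The derived features listed for Andy could be computed roughly as in the sketch below. It is only an illustration: the field names (student, problem, step, skills, correct) and the sample rows are hypothetical and do not match the actual column names of the KDD data, and "avg. correct" is assumed to mean the fraction of rows marked correct.

```python
from collections import defaultdict


def compute_features(rows):
    """Compute per-row derived features.

    Each row is a dict with hypothetical keys: 'student', 'problem',
    'step', 'skills' (list of skill names) and 'correct' (0 or 1).
    """
    # Running [number correct, number of rows] per student and per step.
    student_totals = defaultdict(lambda: [0, 0])
    step_totals = defaultdict(lambda: [0, 0])

    for r in rows:
        step_id = (r["problem"], r["step"])  # unique step/problem id
        student_totals[r["student"]][0] += r["correct"]
        student_totals[r["student"]][1] += 1
        step_totals[step_id][0] += r["correct"]
        step_totals[step_id][1] += 1

    features = []
    for r in rows:
        step_id = (r["problem"], r["step"])
        s_correct, s_count = student_totals[r["student"]]
        t_correct, t_count = step_totals[step_id]
        features.append({
            "step_id": step_id,
            # "student IQ": fraction of this student's rows that are correct
            "student_avg_correct": s_correct / s_count,
            # "step difficulty": fraction of attempts at this step that are correct
            "step_avg_correct": t_correct / t_count,
            # "step complexity": number of skills the step requires
            "step_complexity": len(r["skills"]),
        })
    return features


# Made-up sample rows, for illustration only.
rows = [
    {"student": "s1", "problem": "p1", "step": "A", "skills": ["add", "carry"], "correct": 1},
    {"student": "s1", "problem": "p1", "step": "B", "skills": ["add"], "correct": 0},
    {"student": "s2", "problem": "p1", "step": "A", "skills": ["add", "carry"], "correct": 1},
    {"student": "s2", "problem": "p1", "step": "B", "skills": ["add"], "correct": 1},
]
features = compute_features(rows)
```

One caveat with this sketch: the averages include the row's own outcome, so if they are used to predict correctness they should probably be computed on training rows only and then looked up for held-out rows.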