[ml] KDD cup submission status
thomas.lotze at gmail.com
Sat Jun 5 13:23:26 PDT 2010
Awesome. I'm going to be looking at getting MOA working today, and will
upload a how-to and code once I have it set up. Mike, thanks for setting up the
repository! Andreas, if you have datasets with IQ/IQ strength available,
I'd love to make use of them (question, though: what is IQ strength as
compared to IQ?) I'm also curious what you used for the submission, as I am
(happily) surprised at the good performance!
On Sat, Jun 5, 2010 at 11:40 AM, Andreas von Hessling <vonhessling at gmail.com> wrote:
> Sweet, Mike. Please note that we need the row -> clusterid mapping
> for both training AND testing sets. Otherwise it will not help the ML
> If I understand correctly, your inputs are the orthogonalized skills.
> So far, the girls only provided these orthogonalizations for the
> training files. I'm computing them for the test sets so you can use
> them. If I don't understand this assumption correctly, please let me
> know so I can use my CPU cycles for other tasks.
> Ideally you can provide these cluster mappings by about Sunday, which
> is when I want to start running classifiers. I will need some time to
> actually run the ML algorithms.
> I now have IQ and IQ strength feature values for all datasets and am
> hoping time permits me to compute chance and chance strength values for
> Computing # of skills required should not be difficult and I will add
> this feature as well. I plan on sharing my datasets as new versions
> become available.
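For reference, a minimal sketch of the "# of skills required" feature. It assumes the skills for a step arrive as one string with entries joined by "~~"; that separator and the function name `count_skills` are assumptions for illustration, not something stated in this thread.

```python
def count_skills(kc_field, separator="~~"):
    """Return how many skills a step lists; empty or missing fields count as 0.

    Assumption: skills are joined by `separator` inside a single text field.
    """
    if kc_field is None or not kc_field.strip():
        return 0
    # Ignore empty pieces so stray separators do not inflate the count.
    return sum(1 for part in kc_field.split(separator) if part.strip())
```

Rows without any skill annotation come out as 0, which may be worth keeping as its own feature value rather than dropping.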
> On Fri, Jun 4, 2010 at 1:42 PM, Mike Schachter <mike at mindmech.com> wrote:
> > So it's taking about 9 hours to create a graph from a 4.4GB file, I'm
> > going to work on improving the code to make it a bit faster, and also
> > am investigating a MapReduce solution.
> > Basically the clustering process can be broken down into two stages:
> > 1) Construct the graph, apply the clustering algorithm to break the graph
> > into clusters
> > 2) Apply the clustered graph to the data again to classify each skill set
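The two stages above can be sketched roughly as follows, assuming each row carries a set of skill names. Connected components stand in for the clustering algorithm, which the thread does not specify, and `build_graph`, `cluster_graph`, and `classify_rows` are hypothetical names, not the code in the repository.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(skill_sets):
    """Stage 1a: link skills that co-occur in the same row."""
    graph = defaultdict(set)
    for skills in skill_sets:
        for skill in skills:
            graph[skill]  # make sure isolated skills become nodes too
        for a, b in combinations(sorted(skills), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def cluster_graph(graph):
    """Stage 1b: assign a cluster id to every skill.

    Assumption: connected components are only a placeholder for the
    real clustering algorithm.
    """
    cluster_of = {}
    next_id = 0
    for start in sorted(graph):
        if start in cluster_of:
            continue
        cluster_of[start] = next_id
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in graph[node]:
                if neighbour not in cluster_of:
                    cluster_of[neighbour] = next_id
                    stack.append(neighbour)
        next_id += 1
    return cluster_of


def classify_rows(skill_sets, cluster_of):
    """Stage 2: map each row to a cluster id (None if no skill is known)."""
    mapping = []
    for skills in skill_sets:
        ids = sorted({cluster_of[s] for s in skills if s in cluster_of})
        mapping.append(ids[0] if ids else None)
    return mapping
```

Building `cluster_of` once from the training rows and then calling `classify_rows` on both the training and the test rows would give the row -> clusterid mapping for both sets; test rows whose skills never appeared in training come back as None.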
> > I'll keep working on it and let everyone know how things are going with
> > as I mentioned in another email, the source code is in our new
> > project's git repository.
> > mike
> > On Thu, Jun 3, 2010 at 7:48 PM, Mike Schachter <mike at mindmech.com> wrote:
> >> Sounds like you're making great progress! I'll be working on the
> >> graph clustering algorithm for the skill set tonight and will keep
> >> you posted on how things are going.
> >> mike
> >> On Thu, Jun 3, 2010 at 6:17 PM, Andreas von Hessling
> >> <vonhessling at gmail.com> wrote:
> >>> Doing a few basic tricks, I catapulted the submission into the 50th
> >>> percentile. That is not even running any ML algorithm.
> >>> I'm planning on running the NaiveBayesUpdateable classifier
> >>> (http://weka.wikispaces.com/Classifying+large+datasets) over
> >>> discretized IQ/IQ strength/Chance/Chance strength from the command
> >>> line to evaluate performance. Another attempt would be to load all
> >>> data into memory (<3GB, even for full Bridge Train) and run SVMlib
> >>> over it.
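For anyone unfamiliar with updateable classifiers: the idea is to keep only per-class counts and update them one row at a time, so the full file never has to fit in memory. Below is a plain-Python sketch of that idea over discretized features with Laplace smoothing. It is not Weka's NaiveBayesUpdateable; the class name, the bin labels in the usage lines, and the smoothing constant are assumptions for illustration.

```python
import math
from collections import defaultdict


class IncrementalNaiveBayes:
    """Categorical Naive Bayes that learns one row at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                     # Laplace smoothing (assumed value)
        self.class_counts = defaultdict(int)   # label -> rows seen
        self.value_counts = defaultdict(int)   # (feature idx, value, label) -> count
        self.values_seen = defaultdict(set)    # feature idx -> distinct values

    def update(self, features, label):
        """Fold a single labelled row into the counts."""
        self.class_counts[label] += 1
        for i, value in enumerate(features):
            self.value_counts[(i, value, label)] += 1
            self.values_seen[i].add(value)

    def predict(self, features):
        """Return the label with the highest log posterior, or None if untrained."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / total)
            for i, value in enumerate(features):
                count = self.value_counts.get((i, value, label), 0)
                n_values = max(len(self.values_seen[i]), 1)
                score += math.log(
                    (count + self.alpha) / (n_label + self.alpha * n_values)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Usage with made-up bins for (IQ, IQ strength) and cfa as the label.
nb = IncrementalNaiveBayes()
for features, cfa in [(("high", "many"), 1), (("high", "few"), 1),
                      (("low", "few"), 0), (("low", "many"), 0)]:
    nb.update(features, cfa)
```

Because `update` touches only a handful of counters per row, the training file can be streamed line by line from disk.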
> >>> If someone wants to try MOA
> >>> (http://www.cs.waikato.ac.nz/~abifet/MOA/index.html), this would be
> >>> helpful also in the long run (at least a tutorial on how to set it up
> >>> and run it).
> >>> The reduced datasets plus the IQ values are linked on the wiki:
> >>> are:
> >>> ...> row INT,
> >>> ...> studentid VARCHAR(30),
> >>> ...> problemhierarchy TEXT,
> >>> ...> problemname TEXT,
> >>> ...> problemview INT,
> >>> ...> problemstepname TEXT,
> >>> ...> cfa INT,
> >>> ...> iq REAL
> >>> IQ strength (number of attempts per student) should be available soon.
> >>> (perhaps add'l features will become available as well)
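A sketch of how the columns listed above could be loaded and queried with Python's built-in sqlite3 module. The table name `steps` and the sample rows are made up for illustration, and the query assumes "number of attempts per student" simply means the number of rows per studentid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE steps (            -- hypothetical table name
        "row" INT,                  -- quoted because ROW is also an SQL keyword
        studentid VARCHAR(30),
        problemhierarchy TEXT,
        problemname TEXT,
        problemview INT,
        problemstepname TEXT,
        cfa INT,
        iq REAL
    )
    """
)

# Made-up sample rows, only to exercise the query below.
sample_rows = [
    (1, "s1", "Unit A", "P1", 1, "step1", 1, 0.8),
    (2, "s1", "Unit A", "P1", 1, "step2", 0, 0.8),
    (3, "s2", "Unit A", "P1", 1, "step1", 1, 0.5),
]
conn.executemany("INSERT INTO steps VALUES (?, ?, ?, ?, ?, ?, ?, ?)", sample_rows)


def attempts_per_student(connection):
    """Return {studentid: number of rows}, one reading of 'IQ strength'."""
    cursor = connection.execute(
        "SELECT studentid, COUNT(*) FROM steps GROUP BY studentid"
    )
    return dict(cursor.fetchall())
```

The resulting dictionary can be joined back onto each row by studentid to add the count as a feature.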
> >>> I'm still hoping somebody could cluster Erin's normalized skills data
> >>> and provide a row -> cluster id mapping for algebra and bridge train
> >>> and test sets (I don't have the data any more).
> >>> Andy
> >>> _______________________________________________
> >>> ml mailing list
> >>> ml at lists.noisebridge.net
> >>> https://www.noisebridge.net/mailman/listinfo/ml