[ml] clustering, weka

Mike Schachter mike at mindmech.com
Tue May 25 21:42:51 PDT 2010


Erin, Theo, any way I could get ahold of a subset of the orthogonalized
dataset before tomorrow's meeting?

  mike


On Tue, May 25, 2010 at 8:51 PM, Andreas von Hessling <vonhessling at gmail.com> wrote:

> Mike,
>
> it would be great if you could apply the clustering not to the raw
> datasets (which contain a lot of meaningless information), but to the
> orthogonalized dataset that Erin & Theo provided (where the
> skill/opportunity columns are split up into many features).  Erin/Theo
> should have the latest version of these datasets.  If these challenge
> datasets are too big for Weka, I suggest sampling some records -- I
> believe Thomas has some code for this.
>
> We *will* need to cluster the skills at some point to make use of the
> orthogonalized datasets.
>
> Looking forward to your results.
>
> Andy
>
> On Tue, May 25, 2010 at 8:24 PM, Mike Schachter <mike at mindmech.com> wrote:
> > Hey everyone,
> >
> > Been super busy since last week's meeting, but started
> > reading up on k-means clustering and expectation-maximization,
> > in the hopes that I can use one of these techniques to start
> > clustering the KDD data.
> >
> > Tonight I'm finally getting around to using Weka's built-in
> > clustering to see if it works with the KDD data:
> >
> > http://weka.wikispaces.com/Using+cluster+algorithms
> >
> > Can't promise anything in terms of results, but tomorrow I'd
> > be happy to give a (very) brief overview of k-means clustering
> > and expectation maximization, and hopefully some preliminary
> > results with a subset of the KDD data.
> >
> > Perhaps some of us could work together to implement a clustering
> > algorithm in map-reduce form to work on an elastic map reduce cluster!
> > Looking forward to seeing everyone tomorrow,
> >
> >   mike
> >
> >
> > _______________________________________________
> > ml mailing list
> > ml at lists.noisebridge.net
> > https://www.noisebridge.net/mailman/listinfo/ml
> >
> >
>

