[ml] This week: KDD, next week: Hadoop!
voberoi at gmail.com
Wed May 19 17:53:24 PDT 2010
I've also thrown some code up on
http://github.com/voberoi/hadoop-mrutils for the workshop tonight.
There are a couple of example Python streaming/Pig
scripts and Pig UDFs in addition to instructions on how to get up and
running with Amazon's Elastic MapReduce.
If you have a moment to poke around the code, that'd be great!
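For anyone who hasn't seen Hadoop streaming before, here is a minimal sketch of the general shape of a Python streaming job. It is not taken from the repository above. It only illustrates the usual contract: the mapper emits key<TAB>value lines, the framework sorts them by key, and the reducer aggregates runs of equal keys. The function names, the run_locally helper, and the key_column index are all hypothetical, and the input is assumed to be tab-separated.

```python
from itertools import groupby


def mapper(lines, key_column=1):
    """Emit 'key<TAB>1' for each tab-separated record.

    key_column is a hypothetical column index; adjust it for the real data.
    Records that are too short to contain that column are skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > key_column:
            yield f"{fields[key_column]}\t1"


def reducer(sorted_lines):
    """Sum the counts for each key. Input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run_locally(lines, key_column=1):
    """Simulate map -> sort -> reduce in one process, without a cluster."""
    return list(reducer(sorted(mapper(lines, key_column))))
```

In a real streaming job, mapper and reducer would each live in a small script that reads sys.stdin and prints one output line per result; run_locally is only a convenience for trying the logic on a few rows first.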
On Wed, May 19, 2010 at 4:25 PM, Andreas von Hessling <vonhessling at gmail.com> wrote:
> Hi all,
> For the discussion tonight it will be helpful if everybody could read
> through the KDD data format; it's fairly technical and not
> trivial, so instead of spending time rehashing it during the meeting
> it would be great if we could all be on the same page.
> Deadline for the challenge is June 8th, so we need to move fast if we
> are to submit an entry.
> Looking forward to tonight.
> On Tue, May 18, 2010 at 8:52 AM, Andreas von Hessling
> <vonhessling at gmail.com> wrote:
> > Mike,
> > we haven't actually gotten far in running algorithms yet. At this
> > point you're the only one working on dimensionality reduction. I say
> > go for it; knock yourself out. It will be good just to get a sense
> > of where we should focus our energy.
> > BTW I'll put up a description of how to set up Weka with this dataset
> > soon. There's some NN algorithms right in there...
> > Andy
> > On Mon, May 17, 2010 at 9:31 PM, Mike Schachter <mike at mindmech.com> wrote:
> >> Hey everyone!
> >> Just got back the other day and looking forward to meeting up Wednesday
> >> and hearing about Hadoop. I just read a bit through the KDD challenge,
> >> was wondering if I could help out by doing something involving neural nets.
> >> Neural nets can be made good at generalization and prediction, and also
> >> reducing problem dimensionality by clustering. For example, we could
> >> cluster the input records into groups, and pass that group data into an
> >> or something. Or we could use some sort of dimensionality reducing
> >> and pass the dimensionally-reduced dataset to a Bayesian learner (which
> >> wouldn't work well if the data were high-dimensional).
> >> If someone was already thinking of doing this I'd be happy to help out,
> >> can't glean much of what happened from the meeting notes.
> >> See you Wednesday!
> >> mike
> >> On Wed, May 12, 2010 at 10:05 PM, Thomas Lotze <thomas.lotze at gmail.com>
> >> wrote:
> >>> Hello, all! There was a good meeting today where we talked about the
> >>> dataset and plans for the next steps. I think it'll be a really good
> >>> opportunity for learning new tools and methods on machine learning,
> >>> knowledge and upping our collective ability! We've got plans to look at R,
> >>> libsvm, Weka, and Hadoop to tackle the problem. I'm excited about
> >>> with it, and anyone else who wants to get involved should email me,
> >>> the data, and take a look at the wiki page I've put our initial plans
> >>> https://www.noisebridge.net/wiki/KDD_Competition_2010
> >>> Next week, Vikarem will be presenting Hadoop, with some scripts and
> >>> to actually use it -- I think we're all aware of how important Hadoop
> >>> already is and will continue to be in the future for analyzing large
> >>> sets, so I'm really glad that we've now got someone who knows about it
> >>> is willing to tell us more! I think this is a really great opportunity, and
> >>> many thanks to Vikarem for presenting!
> >>> Best wishes,
> >>> Thomas
> >>> _______________________________________________
> >>> ml mailing list
> >>> ml at lists.noisebridge.net
> >>> https://www.noisebridge.net/mailman/listinfo/ml