[ml] This week: KDD, next week: Hadoop!

Vikram Oberoi voberoi at gmail.com
Wed May 19 17:53:24 PDT 2010


I've also thrown some code up on
http://github.com/voberoi/hadoop-mrutils for the workshop tonight.
There are a couple of example Python streaming/Pig
scripts and Pig UDFs in addition to instructions on how to get up and
running with Amazon's Elastic MapReduce.
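
For a rough idea of what a Python streaming job looks like, here is a
minimal sketch. It is not taken from the repository above: the function
names (map_lines, reduce_lines) and the count-rows-per-key task are
assumptions chosen for illustration. It relies only on the usual streaming
contract, where the mapper and reducer read lines on stdin, write
tab-separated key/value lines on stdout, and the reducer receives its
input sorted by key.

```python
import sys
from itertools import groupby


def map_lines(lines, key_column=0):
    """Mapper: emit 'key<TAB>1' for the chosen column of each tab-separated row."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > key_column and fields[key_column]:
            yield f"{fields[key_column]}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts for each key, assuming input is sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if "\t" in line)
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


# Hypothetical entry point: "script.py map" runs the mapper step and
# "script.py reduce" runs the reducer step, both over stdin.
if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

Locally, the same flow can be approximated by piping a tab-separated file
through the map step, then sort, then the reduce step, before trying it on
a cluster.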

If you have a moment to poke around the code, that'd be great!

Cheers,
Vikram

On Wed, May 19, 2010 at 4:25 PM, Andreas von Hessling <vonhessling at gmail.com> wrote:

> Hi all,
>
> For the discussion tonight it will be helpful if everybody could read
> through the KDD data format. It's fairly technical and not trivial,
> so instead of spending time rehashing it during the meeting, it would
> be great if we could all be on the same page.
>
> https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp
>
> Deadline for the challenge is June 8th, so we need to move fast if we
> are to submit an entry.
>
> Looking forward to tonight.
>
>
> On Tue, May 18, 2010 at 8:52 AM, Andreas von Hessling
> <vonhessling at gmail.com> wrote:
> > Mike,
> > we haven't actually gotten far in running algorithms yet.  At this
> > point you're the only one working on dimensionality reduction.  I say
> > go for it; knock yourself out.  It will be good just to get a sense
> > of where we should focus our energy.
> >
> > BTW I'll put up a description of how to set up Weka with this dataset
> > soon.  There's some NN algorithms right in there...
> >
> > Andy
> >
> >
> >
> >
> > On Mon, May 17, 2010 at 9:31 PM, Mike Schachter <mike at mindmech.com> wrote:
> >> Hey everyone!
> >>
> >> Just got back the other day and looking forward to meeting up Wednesday
> >> and hearing about Hadoop. I just read a bit through the KDD challenge, and
> >> was wondering if I could help out by doing something involving neural nets?
> >>
> >> Neural nets can be made good at generalization and prediction, and also at
> >> reducing problem dimensionality by clustering. For example, we could
> >> cluster the input records into groups, and pass that group data into an SVM
> >> or something. Or we could use some sort of dimensionality-reducing network
> >> and pass the dimensionally-reduced dataset to a Bayesian learner (which
> >> wouldn't work well if the data was high dimensional).
> >>
> >> If someone was already thinking of doing this I'd be happy to help out;
> >> I can't glean much of what happened from the meeting notes.
> >>
> >> See you Wednesday!
> >>
> >>   mike
> >>
> >>
> >>
> >> On Wed, May 12, 2010 at 10:05 PM, Thomas Lotze <thomas.lotze at gmail.com> wrote:
> >>>
> >>> Hello, all!  There was a good meeting today where we talked about the KDD
> >>> dataset and plans for the next steps.  I think it'll be a really good
> >>> opportunity for learning new tools and methods on machine learning, trading
> >>> knowledge and upping our collective ability!  We've got plans to look at R,
> >>> libsvm, weka, and Hadoop to tackle the problem.  I'm excited about working
> >>> with it, and anyone else who wants to get involved should email me, download
> >>> the data, and take a look at the wiki page I've put our initial plans in:
> >>>
> >>> https://www.noisebridge.net/wiki/KDD_Competition_2010
> >>>
> >>>
> >>> Next week, Vikram will be presenting Hadoop, with some scripts and tools
> >>> to actually use it -- I think we're all aware of how important Hadoop
> >>> already is and will continue to be in the future for analyzing large data
> >>> sets, so I'm really glad that we've now got someone who knows about it and
> >>> is willing to tell us more!  I think this is a really great opportunity, and
> >>> many thanks to Vikram for presenting!
> >>>
> >>>
> >>> Best wishes,
> >>> Thomas
> >>>
> >>> _______________________________________________
> >>> ml mailing list
> >>> ml at lists.noisebridge.net
> >>> https://www.noisebridge.net/mailman/listinfo/ml