Machine Learning Meetup Notes: 2010-05-19

From Noisebridge
Latest revision as of 22:04, 23 May 2010

  • Erin provided a list of unique SubSkills and TracedSkills with frequencies, as well as a Python script to normalize the skill values in the challenge sets.
  • Vikram gave a presentation and demo on Hadoop, EC2, and MapReduce. He created a set of scripts for EC2 MapReduce. Those tools can be found on GitHub: http://github.com/voberoi/hadoop-mrutils

Here are some MapReduce notes:

Word Counts (let line number be the key):

1 hello how are you

2 how is it going

3 are you happy

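def map(key, value):
	# key: line number, value: the text of that line
	words = value.split()
	# e.g. ["hello", "how", "are", "you"]
	for word in words:
		# emit() is supplied by the MapReduce framework
		emit(word, 1)

def reduce(key, values):
	# values: every 1 emitted for this word, e.g. [1, 1] for "how"
	emit(key, len(values))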
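
The map and reduce functions above rely on a framework-provided emit() and on the framework grouping emitted values by key. As a rough local sketch of that flow (not Hadoop's API and not Vikram's scripts), the following simulates it in plain Python. The names map_fn, reduce_fn, and run_word_count are hypothetical, and yield stands in for emit() as an assumption of this sketch.

```python
from collections import defaultdict


def map_fn(key, value):
    # key: line number (unused here), value: the text of that line.
    # yield stands in for the framework's emit() in this sketch.
    for word in value.split():
        yield word, 1


def reduce_fn(key, values):
    # values: the list of 1s collected for this word.
    yield key, len(values)


def run_word_count(lines):
    # "Shuffle" step: group every emitted value under its key.
    grouped = defaultdict(list)
    for line_number, text in lines:
        for word, one in map_fn(line_number, text):
            grouped[word].append(one)

    # Reduce step: one call per distinct key.
    counts = {}
    for word, values in grouped.items():
        for out_key, out_value in reduce_fn(word, values):
            counts[out_key] = out_value
    return dict(grouped), counts


lines = [
    (1, "hello how are you"),
    (2, "how is it going"),
    (3, "are you happy"),
]
grouped, counts = run_word_count(lines)
print(grouped["how"])  # [1, 1]
print(counts["how"])   # 2
```

Here grouped holds the per-word value lists that each reduce call receives, and counts holds the final word counts.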

results (values grouped by key, as passed to reduce):

hello [1]

how [1,1]

are [1,1]