Machine Learning Meetup Notes: 2010-05-19

From Noisebridge
  • Erin provided a list of unique SubSkills and TracedSkills with frequencies, as well as a Python script to normalize the skill values in the challenge sets.
  • Vikram gave a presentation and demo on Hadoop, EC2 and MapReduce. He created a set of scripts for running MapReduce on EC2. Those tools can be found on GitHub.

Here are some MapReduce notes:

Word count (let the line number be the key):

1 hello how are you

2 how is it going

3 are you happy

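def map(key, value):
	# key is the line number, value is the text of that line
	words = value.split()
	# e.g. ["hello", "how", "are", "you"] for line 1
	for word in words:
		emit(word, 1)

def reduce(key, values):
	# values holds one 1 per occurrence of the word, so its length is the count
	emit(key, len(values))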
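
The map and reduce functions above rely on an emit() call supplied by the framework. As a rough local sketch (not how Hadoop itself works), the same word count can be simulated in plain Python by yielding pairs instead of calling emit() and grouping them by key in a dictionary. The names map_fn, reduce_fn and run_mapreduce are made up for this illustration and are not part of any library.

```python
from collections import defaultdict

def map_fn(key, value):
    # key is the line number, value is the line text; yield (word, 1) per word
    for word in value.split():
        yield word, 1

def reduce_fn(key, values):
    # values holds one 1 per occurrence of the word, so its length is the count
    yield key, len(values)

def run_mapreduce(records, mapper, reducer):
    # Hypothetical single-process driver: map, group by key (shuffle), reduce
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)

    results = {}
    for key, values in grouped.items():
        for out_key, out_value in reducer(key, values):
            results[out_key] = out_value
    return grouped, results

# The three input lines from the notes, keyed by line number
lines = [
    (1, "hello how are you"),
    (2, "how is it going"),
    (3, "are you happy"),
]

grouped, counts = run_mapreduce(lines, map_fn, reduce_fn)
for word, ones in grouped.items():
    print(word, ones, "->", counts[word])
```

The grouped dictionary corresponds to the intermediate key/value lists that the reducer receives, and counts holds the final totals per word.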


hello [1]

how [1,1]

are [1,1]