Machine Learning Meetup Notes: 2010-05-19
- Erin provided a list of unique SubSkills and TracedSkills with frequencies, as well as a Python script to normalize the skill values in the challenge sets.
- Vikram gave a presentation and demo on Hadoop, EC2 and MapReduce. He wrote a set of scripts for running MapReduce on EC2; those tools can be found on GitHub.
Here are some MapReduce notes:
Word Counts (let the line number be the key and the line text be the value):
1 hello how are you
2 how is it going
3 are you happy
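def map(key, value):
    words = value.split()  # e.g. ["hello", "how", "are", "you"] for line 1
    for word in words:
        emit(word, 1)  # one (word, 1) pair per occurrence

def reduce(key, values):
    # values holds every 1 emitted for this word, so its length is the count
    emit(key, len(values))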
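The word-count pseudocode above relies on a framework-provided emit, so it cannot be run on its own. The following is a minimal local sketch of the same idea, not the Hadoop API: map_fn, reduce_fn and run_mapreduce are hypothetical names introduced here for illustration, generators stand in for emit, and an in-memory dictionary stands in for the shuffle step that groups values by key.

```python
from collections import defaultdict

def map_fn(key, value):
    # key is the line number (unused), value is the line text.
    for word in value.split():
        yield word, 1  # stands in for emit(word, 1)

def reduce_fn(key, values):
    # values is the list of 1s collected for this word.
    yield key, sum(values)  # stands in for emit(key, count)

def run_mapreduce(records, mapper, reducer):
    # Hypothetical single-process driver: map, group by key, then reduce.
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # the "shuffle" step
    results = {}
    for out_key in sorted(groups):
        for k, v in reducer(out_key, groups[out_key]):
            results[k] = v
    return results

# The three input lines from the notes, keyed by line number.
records = [
    (1, "hello how are you"),
    (2, "how is it going"),
    (3, "are you happy"),
]
counts = run_mapreduce(records, map_fn, reduce_fn)
print(counts)
```

The sketch uses sum(values) rather than len(values). Both give the same counts here because every emitted value is 1, but sum stays correct if partial counts are ever combined before the reduce step.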