[ml] drama prediction - training set

Full Name imsoexcitd at excite.com
Thu May 31 18:45:02 PDT 2012

I am planning on coming to the space tonight. Is anyone else planning on coming in?  I'd like to talk about creating a training set from the mbox file so we can build a drama prediction model.  We can consider all sorts of interesting features, but at the bare minimum we should create a large sparse matrix of word counts for all (or a subset) of the words contained in the message body, the subject line, or both.  Second, we need to develop a protocol for labeling each message as drama or not-drama.  I don't know how diligently the [DRAMA] tag was applied to drama messages, but we can start there, and possibly also mark any message that contains the word "drama" as drama.
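
To make the idea concrete, here is a rough sketch of what the training set could look like, using only the Python standard library. Everything named here is an assumption for illustration: the helper names (read_mbox, tokenize, label_message, build_training_set), the tokenizer regex, and the choice to store each row as a {column: count} dict rather than a real sparse matrix type. The labeling rule is just the starting heuristic described above, not a settled protocol.

```python
import mailbox
import re
from collections import Counter

# Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def read_mbox(path):
    """Yield (subject, body) pairs from an mbox file (hypothetical helper)."""
    for msg in mailbox.mbox(path, create=False):
        subject = str(msg["subject"] or "")
        if msg.is_multipart():
            # Keep only the plain-text parts of multipart messages.
            parts = [p.get_payload(decode=True) for p in msg.walk()
                     if p.get_content_type() == "text/plain"]
        else:
            parts = [msg.get_payload(decode=True)]
        body = " ".join(p.decode("utf-8", errors="replace")
                        for p in parts if p)
        yield subject, body


def label_message(subject, body):
    """Starting heuristic: 1 if the subject carries [DRAMA] or the word
    'drama' appears anywhere in the message, else 0."""
    if "[drama]" in subject.lower():
        return 1
    return 1 if "drama" in tokenize(subject + " " + body) else 0


def build_training_set(messages):
    """Return (vocab, rows, labels).

    vocab maps word -> column index; each row is a sparse {column: count}
    dict covering subject and body together.
    """
    vocab, rows, labels = {}, [], []
    for subject, body in messages:
        counts = Counter(tokenize(subject + " " + body))
        row = {}
        for word, n in counts.items():
            col = vocab.setdefault(word, len(vocab))
            row[col] = n
        rows.append(row)
        labels.append(label_message(subject, body))
    return vocab, rows, labels
```

Usage would be something like vocab, rows, labels = build_training_set(read_mbox("list.mbox")), where "list.mbox" is a placeholder path. One thing to watch: if the label comes from the word "drama" and that word is also a feature column, a model can learn the label trivially, so we would probably want to drop that column (and the tag) from the features before training.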

Anyone want to work on creating the training set?


