[ml] drama prediction - training set
imsoexcitd at excite.com
Thu Jun 7 09:20:50 PDT 2012
I will be there tonight (6/7) to work on creating word counts for a training set. I'm open to ideas that people have about
parsing the mbox file for word counts, so let me know if you have any thoughts on that. See you later.
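
One possible starting point for the word counts, sketched with Python's standard `mailbox` module. The function names (`message_text`, `word_counts`, `mbox_word_counts`) and the tokenizer regex are illustrative assumptions, and the sketch only looks at the subject line and text/plain parts of each message:

```python
import mailbox
import re
from collections import Counter

# Assumption: a "word" is a run of lowercase letters or apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def message_text(msg):
    """Return the subject line plus all text/plain parts of one message."""
    parts = [str(msg.get("Subject", ""))]
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes, or None for containers
        if not payload:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            parts.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset name in the header: fall back to UTF-8.
            parts.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(parts)


def word_counts(text):
    """Count lowercase word tokens in a string."""
    return Counter(TOKEN_RE.findall(text.lower()))


def mbox_word_counts(path):
    """Return one Counter per message in the mbox file at `path`."""
    # create=False so a mistyped path raises instead of creating an empty file.
    box = mailbox.mbox(path, create=False)
    return [word_counts(message_text(msg)) for msg in box]
```

Calling `mbox_word_counts` on the path to the list archive would give one `Counter` per message. Quoted text from earlier messages is counted too, so stripping lines that start with `>` before counting may be worth considering.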
From: "Wladyslaw Zbikowski" [embeddedlinuxguy at gmail.com]
Date: 05/31/2012 09:58 PM
To: "Full Name" <imsoexcitd at excite.com>
CC: ml at lists.noisebridge.net
Subject: Re: [ml] drama prediction - training set
I'm here. Another guy who came for ML is in the library, and Zephyr and
Mischief might come.
On Thu, May 31, 2012 at 9:45 PM, Full Name <imsoexcitd at excite.com> wrote:
> I am planning on coming to the space tonight. Is anyone else planning on coming in? I'd like to talk about creating a training set from the mbox file so we can create a drama prediction model. We can consider all sorts of interesting features, but at the bare minimum, we should create a large sparse matrix of word counts for all (or a subset) of the words contained in the message body, the subject line, or both. Secondly, we need to develop a protocol for labeling each message as drama or not-drama. I don't know how diligently the [DRAMA] tag was applied to drama messages, but we can start there, and possibly also mark any messages that contain the word drama as "drama."
> Anyone want to work on creating the training set?
> ml mailing list
> ml at lists.noisebridge.net
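
A rough sketch of the labeling rule and sparse word-count matrix described in the quoted message, in plain Python. Everything here is an assumption made for illustration: the function names, the tokenizer, the `use_keyword` switch, and the choice to store the matrix as (row, column, value) triples instead of a dense array:

```python
import re
from collections import Counter

# Assumption: a "word" is a run of lowercase letters or apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def label_message(subject, body, use_keyword=True):
    """Return 1 for drama, 0 for not-drama.

    Rule 1: the subject carries the [DRAMA] tag.
    Rule 2 (optional): the word "drama" appears anywhere in the message.
    """
    if "[drama]" in subject.lower():
        return 1
    if use_keyword and "drama" in tokenize(subject + " " + body):
        return 1
    return 0


def build_sparse_counts(messages, use_keyword=True):
    """Build word counts for a list of (subject, body) pairs.

    Returns (vocab, rows, cols, vals, labels), where vocab maps each word to
    a column index and rows/cols/vals hold only the nonzero counts.
    """
    vocab = {}
    rows, cols, vals, labels = [], [], [], []
    for i, (subject, body) in enumerate(messages):
        labels.append(label_message(subject, body, use_keyword))
        counts = Counter(tokenize(subject + " " + body))
        for word, n in counts.items():
            j = vocab.setdefault(word, len(vocab))  # new words get the next column
            rows.append(i)
            cols.append(j)
            vals.append(n)
    return vocab, rows, cols, vals, labels
```

The triples are the coordinate form that sparse-matrix libraries generally accept, so they could be handed to one later. Since the labels are derived from the word "drama" itself, it is probably worth dropping that column from the features before training, or the model may just relearn the labeling rule.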