[ml] Kaggle HIV update
mike at mindmech.com
Tue Jun 22 19:31:44 PDT 2010
I found an explanation on the Kaggle forum of what the non-standard
letters mean. It linked to this:
On Tue, Jun 22, 2010 at 5:27 PM, Mike Schachter <mike at mindmech.com> wrote:
> Hey David,
> Unfortunately I don't think the sequences are amino acid sequences.
> For the PR sequences, most of them have a length of 297. If it's a
> DNA sequence, then this means it codes for 99 amino acids. A quick
> look shows that HIV-1 Protease (the protein whose sequence we're
> dealing with in the first sequence column) has 99 amino acids:
> Does that make sense? If it does, then the sequences from the data are
> just noisy and of poor quality, and we're going to have to throw out some
> of the noisy data before running it through a sequence aligner. I'm in the
> process of doing this now, and will let everyone know how things are coming
> along at the meeting.
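
The length arithmetic and a rough pre-filter can be sketched in Python as below. This is only an illustration: the definition of "noisy" used here (length not a multiple of three, or letters outside A/C/G/T) is an assumption, not necessarily the filter described above, and the function names are hypothetical.

```python
# Hypothetical helpers; the "noisy" criteria below are assumptions.
STANDARD_BASES = set("ACGT")


def codon_count(seq):
    """Return the number of complete codons (nucleotide triplets) in seq."""
    return len(seq) // 3


def is_clean(seq):
    """True if seq is non-empty, uses only A/C/G/T, and has length divisible by 3."""
    seq = seq.upper()
    return len(seq) > 0 and len(seq) % 3 == 0 and set(seq) <= STANDARD_BASES


def filter_clean(seqs):
    """Keep only the sequences that pass is_clean."""
    return [s for s in seqs if is_clean(s)]


# A 297-nucleotide sequence holds 297 / 3 = 99 codons, one per amino acid.
example = "ACG" * 99
print(len(example), codon_count(example))
```

Sequences rejected by a check like this might still be usable once the non-standard letters are handled, so a stricter or looser rule may fit the data better.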
> See everyone tonight!
> On Tue, Jun 22, 2010 at 8:37 AM, David Faden <dfaden at gmail.com> wrote:
>> It looks like the sequences are already coded in terms of amino acids
>> rather than nucleotide triples?
>> On Mon, Jun 21, 2010 at 10:29 PM, Thomas Lotze <thomas.lotze at gmail.com> wrote:
>>> I committed some python for generating base pair triplet count features,
>>> and R code for determining frequency and doing a basic GLM including the
>>> most frequent triplets.
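
For anyone who has not pulled the repository yet, here is a rough sketch of what triplet count features could look like. It is not the committed code: the function names are hypothetical, and reading non-overlapping triplets in frame from the first base is an assumption.

```python
from collections import Counter


def triplet_counts(seq):
    """Count non-overlapping triplets, read in frame from position 0.

    Trailing bases that do not fill a complete triplet are ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def feature_vector(seq, vocabulary):
    """Return the count of each triplet in vocabulary, in vocabulary order."""
    counts = triplet_counts(seq)
    return [counts[t] for t in vocabulary]


# Hypothetical vocabulary; in practice it might be the most frequent triplets.
vocab = ["ATG", "CCT", "GGA"]
print(feature_vector("ATGCCTATGGG", vocab))  # prints [2, 1, 0]
```

Fixing the vocabulary up front keeps every sequence's feature vector the same length, which is what a GLM or similar model expects.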
>>> (The Noisebridge machine learning sourceforge git repository is here:
>>> https://sourceforge.net/scm/?type=git&group_id=326816 To download the
>>> files, run "git clone git://
>>> or, better yet, ask Mike to give you read/write access to this project so
>>> you can upload code as well)
>>> This got me to 53.8462 MCE, 36th out of 49 teams.
>>> See you tomorrow night at 9 for fun with Hadoop!
>>> ml mailing list
>>> ml at lists.noisebridge.net