Machine Learning Meetup Notes: 2010-06-30
Mike's bio overview:
amino acids build proteins
20 amino acids
- a protein is a sequence of amino acids (three bases encode one amino acid)
- DNA is made up of 4 bases: A, T, C, G
- RNA is made up of 4 bases: A, U, C, G
- A pairs with T (with U in RNA)
- C pairs with G
every three bases form a codon; Dave wrote a script that takes the codons and maps them to their amino acids
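A minimal sketch of what such a codon-mapping script could look like (this is not Dave's actual script, which is not reproduced in these notes). It assumes the standard genetic code; the names `CODON_TABLE` and `translate` are made up for illustration, and marking unknown codons as `X` is also an assumption.

```python
from itertools import product

# Standard genetic code, written compactly: codons are enumerated in
# T, C, A, G order and paired with one-letter amino acid codes ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA (or RNA) string into amino acids, three bases at a time."""
    dna = dna.upper().replace("U", "T")  # treat RNA input like DNA
    usable = len(dna) - len(dna) % 3     # ignore a trailing partial codon
    protein = []
    for i in range(0, usable, 3):
        # Codons with unexpected letters (e.g. N) become 'X' (assumption).
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)
```

For example, `translate("ATGGCC")` gives `"MA"` (methionine, alanine), and a 297-base protease sequence yields 99 amino acids.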
- protease: a type of protein (an enzyme) that cleaves other proteins
- reverse transcriptase: takes viral RNA and transcribes it into DNA
- the viral mRNA (bad) is sent into the ribosomes
- the viruses replicate very fast in your immune cells, and that's how they kill them
99 amino acids in protease (99 x 3 = 297 DNA bases)
reverse transcriptase length is not predictable: each sequence is a different length
Possible Features:
- for positions with OR'd (ambiguous) amino acids, perhaps create all possible combinations and weight each by 1/(number of combinations); normal rows get weight = 1
- find most probable sequences (T)
- correlating permutations (T)
- molecular weight/length (E/Th)
- acidity/charge
- edit distance (differences between the sequences), use to cluster (A)
- list of known resistant mvt sites (M)
- find out which sites are most variable
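A rough sketch of the first feature idea above (expanding OR'd positions into weighted rows). The input format is an assumption: each site is a string holding every amino acid allowed at that position, so `"QK"` means "Q or K". The function name `expand_ambiguous` is hypothetical.

```python
from itertools import product


def expand_ambiguous(sites):
    """Expand a sequence with OR'd positions into (sequence, weight) rows.

    `sites` is a list of strings; each string lists the amino acids allowed
    at that position (assumed format). Every combination gets weight
    1 / (number of combinations), so an unambiguous sequence gets weight 1.
    """
    if any(len(site) == 0 for site in sites):
        raise ValueError("every site needs at least one amino acid")
    options = [sorted(set(site)) for site in sites]  # dedupe, stable order
    combos = ["".join(choice) for choice in product(*options)]
    weight = 1.0 / len(combos)
    return [(seq, weight) for seq in combos]
```

With `["P", "QK", "I"]` this returns two rows, `PKI` and `PQI`, each with weight 0.5, so the weights for one original sequence always sum to 1.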
Levenshtein distance?
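The edit distance feature from the list above is commonly computed as Levenshtein distance (insertions, deletions and substitutions each cost 1). A small sketch, assuming sequences are plain strings of one-letter amino acid codes; `edit_distance` is an illustrative name, and whether unit costs are right for amino acids is an open question.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, using two rolling rows."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]
```

Pairwise distances from this function could then be fed into whatever clustering method is chosen; it also handles sequences of different lengths, which matters for reverse transcriptase.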
for each site, look at the frequency of each amino acid
could put into a tree classifier
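A sketch of the per-site frequency idea, assuming equal-length, already aligned sequences (true for the 99-residue protease, not for reverse transcriptase). Using Shannon entropy to rank "most variable" sites is an assumption, not something decided at the meetup; the function names are illustrative.

```python
import math
from collections import Counter


def site_frequencies(sequences):
    """Return, for each site, a dict mapping amino acid -> relative frequency."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must all have the same length")
    freqs = []
    for column in zip(*sequences):  # one column per site
        counts = Counter(column)
        freqs.append({aa: n / len(column) for aa, n in counts.items()})
    return freqs


def site_entropy(freq):
    """Shannon entropy (bits) of one site's frequency table; 0 = fully conserved."""
    return -sum(p * math.log2(p) for p in freq.values())


def most_variable_sites(sequences):
    """Site indices sorted from most to least variable (by entropy)."""
    freqs = site_frequencies(sequences)
    return sorted(range(len(freqs)),
                  key=lambda i: site_entropy(freqs[i]),
                  reverse=True)
```

The per-site frequency tables (or the amino acid observed at each of the most variable sites) could serve as input columns for a tree classifier.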