Machine Learning Meetup Notes: 2010-06-30

From Noisebridge

Mike's bio overview:

amino acids build proteins

20 amino acids

  • a protein is a sequence of amino acids (three bases encode one amino acid)
  • DNA is made up of 4 bases: A, T, C, G
  • RNA is made up of 4 bases: A, U, C, G
  • A pairs with T (with U in RNA)
  • C pairs with G

Every three bases form a codon. Dave wrote a script that takes the codons and maps them to their amino acids.
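The codon-to-amino-acid mapping can be sketched roughly as follows. This is not Dave's script, only a minimal illustration assuming the standard genetic code, one-letter amino acid codes, and a sequence that starts in frame. The names `CODON_TABLE` and `translate` are made up for this example, and `X` for unrecognised codons is an arbitrary choice.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Map each in-frame codon to its one-letter amino acid code.

    RNA input is accepted (U is treated as T). Codons containing
    anything other than A, C, G, T become 'X'. Trailing bases that
    do not fill a whole codon are ignored.
    """
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


print(translate("CCTCAGATC"))
```

A 297-base protease sequence passed to `translate` would give a string of 99 amino acid letters.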

  • protease - a type of protein (an enzyme) that cleaves other proteins
  • reverse transcriptase - takes viral RNA and transcribes it into DNA
  • the (bad) viral mRNA is then sent to the ribosomes
  • the viruses replicate very fast in your immune cells, and that's how they kill them

99 amino acids in protease (99 x 3 = 297 DNA bases)

reverse transcriptase length is not predictable - each sequence is a different length


Possible Features:

  • for sites with alternative ("OR") amino acids, perhaps create all possible combinations and weight each one by 1/(number of combinations); normal rows get weight = 1
  • find most probable sequences (T)
  • correlating permutations (T)
  • molecular weight/length (E/Th)
  • acidity/charge
  • edit distance (differences between the sequences), use to cluster (A)
  • list of known resistance mutation sites (M)
  • find out which sites are most variable
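The "OR" weighting idea from the list above could look something like this. It is only a sketch: the input representation (one string of candidate amino acids per site) and the function name `expand_ambiguous` are assumptions made for illustration, not the format of the actual data.

```python
from itertools import product


def expand_ambiguous(sites):
    """Expand a sequence with ambiguous sites into weighted rows.

    `sites` is a list with one string per site; each string holds every
    amino acid that may occur there (e.g. "QK" means Q or K).
    Returns (sequence, weight) pairs, where weight is
    1 / (number of combinations). An unambiguous sequence yields a
    single row with weight 1.
    """
    combos = ["".join(choice) for choice in product(*sites)]
    weight = 1.0 / len(combos)
    return [(seq, weight) for seq in combos]


for seq, weight in expand_ambiguous(["P", "QK", "I"]):
    print(seq, weight)
```

The number of rows is the product of the choices at each site, so a sequence with many ambiguous sites can blow up quickly; a cap on the number of combinations may be needed.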

Levenshtein? (edit distance)
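For the edit distance feature, a plain Levenshtein distance (insertions, deletions and substitutions each cost 1) might be sketched as below. The function name `edit_distance` is made up here, and the unit costs are an assumption; a biologically motivated substitution matrix would be a different technique.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, unit costs.

    Uses the standard dynamic-programming recurrence, keeping only
    the previous row of the table.
    """
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if char_a == char_b else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


print(edit_distance("PQITLW", "PQVTLW"))
```

Pairwise distances computed this way could be fed to a clustering step. Because insertions and deletions are allowed, it also works on sequences of different lengths.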

for each site, look at the frequency of each amino acid
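A rough sketch of the per-site frequency idea, which also covers finding the most variable sites. It assumes all sequences have the same length (true for the 99-residue protease; reverse transcriptase sequences would need aligning first). Shannon entropy is used as the variability measure, which is an assumption rather than something decided at the meetup, and all function names are made up for illustration.

```python
from collections import Counter
from math import log2


def site_frequencies(sequences):
    """Return one {amino_acid: frequency} dict per site."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must all have the same length")
    freqs = []
    for column in zip(*sequences):  # one column per site
        counts = Counter(column)
        freqs.append({aa: n / len(column) for aa, n in counts.items()})
    return freqs


def site_entropy(freq):
    """Shannon entropy in bits; 0 means the site never varies."""
    return -sum(p * log2(p) for p in freq.values() if p > 0)


def most_variable_sites(sequences, top=5):
    """Indices (0-based) of the sites with the highest entropy."""
    freqs = site_frequencies(sequences)
    ranked = sorted(
        range(len(freqs)),
        key=lambda i: site_entropy(freqs[i]),
        reverse=True,
    )
    return ranked[:top]


toy = ["PQI", "PKI", "PQV", "PQI"]  # toy data, not real sequences
print(site_frequencies(toy))
print(most_variable_sites(toy, top=2))
```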

could put into a tree classifier