Machine Learning Meetup Notes: 2010-06-30

From Noisebridge

Latest revision as of 22:21, 30 June 2010

Mike's bio overview:

amino acids are the building blocks of proteins

there are 20 standard amino acids

  • a protein is a sequence of amino acids (three bases encode one amino acid)
  • DNA is made of 4 bases: A, T, C, G
  • RNA is made of 4 bases: A, U, C, G
  • A pairs with T (with U in RNA)
  • C pairs with G
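
The pairing rule above can be written as a lookup. This is a minimal sketch; the function name `reverse_complement` is ours, not something from the meetup, and it assumes the input contains only the letters A, T, C, G.

```python
# Pairing rule from the notes: A <-> T, C <-> G.
DNA_COMPLEMENT = str.maketrans("ATCG", "TAGC")

def reverse_complement(dna):
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    # Swap each base for its partner, then reverse the string.
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]

print(reverse_complement("ATGC"))
```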

every three bases form a codon; Dave wrote a script that takes the codons and maps them to their amino acids
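
Dave's script is not reproduced in these notes. The following is only a sketch of what a codon-to-amino-acid mapping might look like, using the standard genetic code; the names `CODON_TABLE` and `translate` are assumptions for illustration. Stop codons map to `*` and anything unrecognized maps to `X`.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Map each complete codon in `dna` to a one-letter amino acid code."""
    dna = dna.upper().replace("U", "T")   # accept RNA input as well
    usable = len(dna) - len(dna) % 3      # ignore a trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TABLE.get(c, "X") for c in codons)

print(translate("CCTCAGATC"))
```

A 297-base protease sequence would come out of `translate` as 99 letters.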

  • protease - a type of protein (an enzyme) that cleaves other proteins
  • reverse transcriptase - takes viral RNA and transcribes it into DNA
  • the resulting (bad) mRNA is sent to the ribosomes
  • the virus replicates very fast in your immune cells, and that's how it kills them

99 amino acids in protease (297 DNA bases, at 3 bases per amino acid)

reverse transcriptase length is not predictable - each sequence is a different length


Possible Features:

  • for the OR acids, perhaps create all possible combinations and weight each by 1/(number of combinations); normal rows get weight = 1
  • find most probable sequences (T)
  • correlating permutations (T)
  • molecular weight/length (E/Th)
  • acidity/charge
  • edit distance (differences between the sequences), use to cluster (A)
  • list of known resistant mvt sites (M)
  • find out which sites are most variable
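
The first idea in the list (expanding the OR acids) could be sketched as follows. We are assuming that an "OR" position is one where the data lists several alternative amino acids, and we represent a sequence as a list with one string of alternatives per site; both the representation and the name `expand_ambiguous` are hypothetical.

```python
from itertools import product

def expand_ambiguous(positions):
    """Expand a sequence with OR positions into weighted concrete sequences.

    `positions` holds one string per site; a string longer than one
    character lists the alternatives at that site, e.g. ["P", "QK", "I"].
    """
    combos = ["".join(p) for p in product(*positions)]
    weight = 1.0 / len(combos)   # 1/(number of combinations)
    return [(seq, weight) for seq in combos]

print(expand_ambiguous(["P", "QK", "I"]))
```

A row with no OR positions expands to a single sequence with weight 1, matching the "normal rows" case.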

Levenshtein?
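
If the name above refers to Levenshtein distance (an assumption on our part), the edit-distance feature from the list could be computed with the usual dynamic program. This is a plain sketch that counts insertions, deletions, and substitutions with cost 1 each; the function name is ours.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    # prev[j] = distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

print(edit_distance("PQITL", "PKITL"))
```

Pairwise distances computed this way could feed a clustering step, as the list suggests.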

for each site, look at the frequency of each amino acid
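
A rough sketch of the per-site frequency count, assuming the sequences are already translated to amino acids and all have the same length (as the 99-residue protease sequences would). The variability score used here, one minus the frequency of the most common residue, is just one simple choice and not something decided at the meetup.

```python
from collections import Counter

def site_frequencies(sequences):
    """For each site, return {amino acid: fraction of sequences}."""
    n = len(sequences)
    return [{aa: count / n for aa, count in Counter(column).items()}
            for column in zip(*sequences)]   # zip(*...) walks site by site

def most_variable_sites(sequences, top=5):
    """Indices of the `top` sites whose most common residue is least dominant."""
    freqs = site_frequencies(sequences)
    scores = [(1 - max(f.values()), i) for i, f in enumerate(freqs)]
    ranked = sorted(scores, key=lambda t: (-t[0], t[1]))
    return [i for _, i in ranked[:top]]

seqs = ["PQI", "PKI", "PQV", "PQI"]
print(site_frequencies(seqs))
print(most_variable_sites(seqs, top=2))
```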

could put into a tree classifier
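
To make the tree idea concrete, here is a sketch of the smallest possible tree: a single split (a decision stump) on one site. The labels (for example resistant = 1, not resistant = 0) and all function names are assumptions for illustration; a real decision tree would keep splitting recursively and would normally come from a library.

```python
from collections import Counter

def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(sequences, labels):
    """Find the test `seq[site] == residue` with the fewest training errors.

    Returns (errors, site, residue, label_if_yes, label_if_no),
    or None if no site separates the sequences at all.
    """
    best = None
    for site in range(len(sequences[0])):
        for residue in sorted(set(seq[site] for seq in sequences)):
            yes = [y for s, y in zip(sequences, labels) if s[site] == residue]
            no = [y for s, y in zip(sequences, labels) if s[site] != residue]
            if not yes or not no:
                continue  # this test does not split the data
            yes_label, no_label = majority(yes), majority(no)
            errors = (sum(y != yes_label for y in yes)
                      + sum(y != no_label for y in no))
            if best is None or errors < best[0]:
                best = (errors, site, residue, yes_label, no_label)
    return best

def predict(stump, seq):
    """Apply a fitted stump to one sequence."""
    _, site, residue, yes_label, no_label = stump
    return yes_label if seq[site] == residue else no_label

# Made-up toy data: the label depends only on site 1.
stump = fit_stump(["PQI", "PKI", "PQV", "PKV"], [0, 1, 0, 1])
print(stump, predict(stump, "PKI"))
```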
