Machine Learning Meetup Notes: 2010-07-07

From Noisebridge

Revision as of 22:18, 21 July 2010

  • Col-1: patient ID
  • Col-2: responder status ("1" for patients who improved and "0" otherwise)
  • Col-3: Protease nucleotide sequence (if available)
  • Col-4: Reverse Transcriptase nucleotide sequence (if available)
  • Col-5: viral load at the beginning of therapy (log-10 units)
  • Col-6: CD4 count at the beginning of therapy

Features used: molecular weight and length of "PR Sequence" and "RT Sequence" from the training data (mweight.csv). Preparing the data in Weka:

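  1. start Weka
  2. open mweight.csv
  3. remove the patient attribute
  4. select the resp attribute
  5. choose filter->unsupervised->attribute->NumericToNominal
  6. click the filter and change its attribute indices to first only (resp is now the first attribute)
  7. apply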

Neural network: classify->functions->MultilayerPerceptron

  1. select resp as the class attribute
  2. start
  • 738 correct predictions for a=0 (no improvement)
  • 66 correct predictions for b=1 (improvement)
  • 56 "no improvement" patients classified as improvement
  • 140 "improvement" patients classified as no improvement

How well did it do? 80.4% accuracy: (738 + 66) of 1000 predictions were correct.

  • rows of the confusion matrix tell you what really happened
  • columns tell you what was predicted
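The accuracy figure can be checked by hand from the four counts above. The following is a minimal sketch in plain Python; the variable names are illustrative, and the majority-class baseline and the per-class recall are extra figures derived from the same counts, not numbers reported at the meetup.

```python
# Confusion matrix from the notes: rows = actual class, columns = predicted class.
# Index 0 = "no improvement" (a), index 1 = "improvement" (b).
confusion = [
    [738, 56],   # actually 0: 738 predicted 0, 56 predicted 1
    [140, 66],   # actually 1: 140 predicted 0, 66 predicted 1
]

total = sum(sum(row) for row in confusion)

# Correct predictions sit on the diagonal.
correct = sum(confusion[i][i] for i in range(len(confusion)))
accuracy = correct / total

# Derived figure (not from the notes): accuracy of always predicting
# the most common actual class.
majority_baseline = max(sum(row) for row in confusion) / total

# Derived figure (not from the notes): share of actual improvers
# that the network recognised as improvers.
recall_improved = confusion[1][1] / sum(confusion[1])

print(f"accuracy          = {accuracy:.3f}")
print(f"majority baseline = {majority_baseline:.3f}")
print(f"recall (improved) = {recall_improved:.3f}")
```

Comparing the accuracy with the majority baseline is a quick way to judge how much the classifier adds over always guessing "no improvement".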

Clustering: cluster->SimpleKMeans

  1. change numClusters to 5
  2. ok->start

scipy cluster.hierarchy: the main function is called linkage. ldist takes the Levenshtein distance between each pair of items in the set; the result is a distance matrix, which is the input to hierarchical clustering.
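As a rough illustration of the distance-matrix step, here is a pure-Python sketch. The `levenshtein` and `distance_matrix` functions are hypothetical stand-ins written for this example (the actual `ldist` used at the meetup is not shown in the notes), and the short sequences are made up.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def distance_matrix(items):
    """Symmetric matrix of pairwise Levenshtein distances."""
    n = len(items)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = levenshtein(items[i], items[j])
    return matrix


# Made-up nucleotide fragments, for illustration only.
sequences = ["CCTCAG", "CCTCAA", "ACTCAG", "GGGTTT"]
dist = distance_matrix(sequences)
for row in dist:
    print(row)
```

To pass such a square matrix to `scipy.cluster.hierarchy.linkage`, it would first be converted to condensed form, for example with `scipy.spatial.distance.squareform`.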

Single linkage clustering: start with n clusters (one per point), merge the two clusters with the shortest distance between them, and keep going until you have 1 cluster.

  • when you join two points into a cluster, you check both of their distances to every other point and keep whichever is smaller as the cluster's distance to that point

Complete linkage: you take the largest distance instead.

  • there is also one that takes the average (average linkage)
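The three linkage rules differ only in how the distance between two clusters is computed from the point-to-point distances. The sketch below is a deliberately naive pure-Python version for illustration, not SciPy's implementation; the `agglomerate` function and the four example points are assumptions made for this example. It recomputes cluster distances over all point pairs, which gives the same result as the "take the smaller (or larger) of the two distances" update described above.

```python
def agglomerate(dist, linkage="single"):
    """Naive agglomerative clustering over a square distance matrix.

    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of item indices.
    """
    combine = {
        "single": min,                              # closest pair of points
        "complete": max,                            # farthest pair of points
        "average": lambda ds: sum(ds) / len(ds),    # mean over all pairs
    }[linkage]

    clusters = [(i,) for i in range(len(dist))]  # start with n singletons
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pair_dists = [dist[i][j]
                              for i in clusters[x] for j in clusters[y]]
                d = combine(pair_dists)
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        merged = tuple(sorted(clusters[x] + clusters[y]))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return merges


# Four made-up points on a line; distance = absolute difference.
points = [0, 1, 3, 7]
dist = [[abs(p - q) for q in points] for p in points]

single = agglomerate(dist, "single")
complete = agglomerate(dist, "complete")
average = agglomerate(dist, "average")

for name, merges in [("single", single), ("complete", complete), ("average", average)]:
    print(name, [d for _, _, d in merges])
```

On this toy input the merge order is the same for all three rules, but the distances at which clusters join differ. The method names match the `method` argument accepted by `scipy.cluster.hierarchy.linkage` ("single", "complete", "average").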