Machine Learning Meetup Notes: 2010-07-07

From Noisebridge

Latest revision as of 22:21, 21 July 2010

  • Col-1: patient ID
  • Col-2: responder status ("1" for patients who improved and "0" otherwise)
  • Col-3: Protease nucleotide sequence (if available)
  • Col-4: Reverse Transcriptase nucleotide sequence (if available)
  • Col-5: viral load at the beginning of therapy (log-10 units)
  • Col-6: CD4 count at the beginning of therapy
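
A minimal sketch of reading rows with this six-column layout in Python. It assumes the file is comma-separated with no header row, which the notes do not state; the field names and the two toy rows are made up for illustration and are not real patient data.

```python
import csv
import io

# Hypothetical field names for the six columns listed above.
FIELDS = ["patient_id", "resp", "pr_seq", "rt_seq", "viral_load_log10", "cd4"]


def parse_rows(handle):
    """Yield one dict per patient; a missing sequence becomes None."""
    for raw in csv.DictReader(handle, fieldnames=FIELDS):
        yield {
            "patient_id": raw["patient_id"],
            "resp": int(raw["resp"]),              # 1 = improved, 0 = otherwise
            "pr_seq": raw["pr_seq"] or None,       # sequences are "if available"
            "rt_seq": raw["rt_seq"] or None,
            "viral_load_log10": float(raw["viral_load_log10"]),
            "cd4": int(raw["cd4"]),
        }


# Toy rows invented for this example only.
toy = io.StringIO("1,0,CCTCAG,,4.3,145\n2,1,,CCCATT,3.6,224\n")
rows = list(parse_rows(toy))
```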

Helpful WEKA Videos http://sentimentmining.net/weka/

Features used: molecular weight and length of "PR Sequence" and "RT Sequence" from the training data

  1. start Weka
  2. open mweight.csv
  3. remove the patient attribute
  4. select resp
  5. filter->unsupervised->attribute->NumericToNominal
  6. click the filter name and change its attribute indices to first only, so that only resp is converted
  7. apply

neural network: classify->functions->MultilayerPerceptron

  1. choose resp as the class
  2. start
  • 738 correct predictions for a=0 (no improvement)
  • 66 correct predictions for b=1 (improvement)
  • 56 no-improvement patients classified as improvement
  • 140 improvement patients classified as no improvement

how well did it do? 80.4% accuracy (738 + 66 = 804 correct out of 1000)

  • rows of the confusion matrix tell you what really happened
  • columns tell you what was predicted
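
The accuracy figure can be checked by hand from the four counts above. The sketch below rebuilds the confusion matrix (rows = actual, columns = predicted) and also computes two numbers the notes do not mention but that follow from the same counts: recall on the improved class and the accuracy of always guessing the majority class. Variable names are made up for this example.

```python
# Confusion matrix from the MultilayerPerceptron run above.
# Rows are the actual class, columns are the predicted class.
confusion = [
    [738, 56],   # actual a=0 (no improvement): predicted 0, predicted 1
    [140, 66],   # actual b=1 (improvement):    predicted 0, predicted 1
]

total = sum(sum(row) for row in confusion)
correct = confusion[0][0] + confusion[1][1]      # the diagonal
accuracy = correct / total

# Fraction of patients who really improved that the model caught.
recall_improved = confusion[1][1] / sum(confusion[1])

# Accuracy of a classifier that always predicts the larger actual class.
majority_baseline = max(sum(row) for row in confusion) / total

print(f"accuracy          = {accuracy:.3f}")
print(f"recall (improved) = {recall_improved:.3f}")
print(f"majority baseline = {majority_baseline:.3f}")
```

With these counts the majority baseline is 79.4%, so the 80.4% accuracy is only about one point better than always predicting "no improvement".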

cluster: SimpleKMeans

  1. change the number of clusters to 5
  2. ok->start
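
For intuition about what k-means does, here is a bare-bones pure-Python sketch of the standard assign-then-recompute loop. It is not Weka's SimpleKMeans: the initialization (just the first k points) and the lack of any preprocessing are simplifications chosen for this example, and the toy points are made up.

```python
def kmeans(points, k, iterations=100):
    """Plain k-means; returns (cluster index per point, centroids)."""
    # Simplistic deterministic init: use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the clustering has converged
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Toy 2-D points forming three obvious groups.
points = [(0, 0), (10, 10), (20, 0), (0, 1), (10, 11), (20, 1)]
assignment, centroids = kmeans(points, k=3)
```

The result depends on the starting centroids, so a different point order can give a different (sometimes worse) clustering.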

scipy cluster.hierarchy

  • the main function is called linkage
  • ldist takes the Levenshtein distance between each pair of items in the set
  • the result is a distance matrix, which is the input to hierarchical clustering
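
The ldist code from the meetup is not in these notes, so the sketch below is a stand-in: a textbook dynamic-programming Levenshtein distance plus a helper that fills a symmetric pairwise distance matrix. The function names and the toy sequences are assumptions made for this example.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))      # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                      # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def distance_matrix(items):
    """Symmetric matrix of pairwise Levenshtein distances."""
    n = len(items)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = levenshtein(items[i], items[j])
    return dist


# Toy sequences invented for this example.
seqs = ["ACGT", "ACGA", "TTTT"]
dist = distance_matrix(seqs)
```

scipy's linkage expects the condensed form of such a matrix (the upper triangle flattened row by row), so a square matrix like this one would need converting before being passed in.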

single linkage clustering: start with n clusters (one per point), take the two that have the shortest distance between them and merge them into one cluster. then keep going until you have 1 cluster.

  • once two points are joined, the distance from that cluster to any other point is found by checking the distance from each member and taking whichever is smaller

complete linkage: the same procedure, but you take the largest distance instead

  • there is also average linkage, which takes the average of the distances
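
The three linkage rules differ only in how the distance between two clusters is computed from the pairwise point distances. A small, unoptimized pure-Python sketch of the merge loop described above follows; it is not scipy's implementation, and the function name and toy positions are made up for this example.

```python
def agglomerate(dist, linkage="single"):
    """Merge clusters until one remains; returns a list of (left, right, height)."""
    combine = {
        "single": min,                              # smallest pairwise distance
        "complete": max,                            # largest pairwise distance
        "average": lambda ds: sum(ds) / len(ds),    # mean pairwise distance
    }[linkage]
    clusters = [[i] for i in range(len(dist))]      # start with n singletons
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest linkage distance.
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = combine([dist[i][j]
                             for i in clusters[x] for j in clusters[y]])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return merges


# Toy example: four points on a line at positions 0, 1, 3 and 7.
positions = [0, 1, 3, 7]
dist = [[abs(p - q) for q in positions] for p in positions]

single_heights = [h for _, _, h in agglomerate(dist, "single")]
complete_heights = [h for _, _, h in agglomerate(dist, "complete")]
average_heights = [h for _, _, h in agglomerate(dist, "average")]
```

On this toy input the merge order is the same for all three rules; only the heights at which the merges happen differ.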