Machine Learning Meetup Notes: 2010-07-07

From Noisebridge

Revision as of 22:20, 21 July 2010

  • Col-1: patient ID
  • Col-2: responder status ("1" for patients who improved and "0" otherwise)
  • Col-3: Protease nucleotide sequence (if available)
  • Col-4: Reverse Transcriptase nucleotide sequence (if available)
  • Col-5: viral load at the beginning of therapy (log-10 units)
  • Col-6: CD4 count at the beginning of therapy
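A minimal sketch of reading rows with this six-column layout in Python. The `Patient` class, the field names, and the two sample rows are made up for illustration; the real file name and header are not given in these notes.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:  # hypothetical record type, not part of the data set
    patient_id: str
    responder: bool            # Col-2: "1" improved, "0" otherwise
    pr_seq: Optional[str]      # Col-3: None when the sequence is missing
    rt_seq: Optional[str]      # Col-4: None when the sequence is missing
    viral_load_log10: float    # Col-5
    cd4_count: int             # Col-6


def parse_row(row):
    """Turn one six-column CSV row into a Patient."""
    pid, resp, pr, rt, vl, cd4 = row
    return Patient(pid, resp == "1", pr or None, rt or None, float(vl), int(cd4))


# Made-up rows for illustration only, not real patient data.
SAMPLE = "1,0,CCTCAAATC,CCCATTAGT,4.3,145\n2,1,,CCCATCAGT,3.6,224\n"

patients = [parse_row(r) for r in csv.reader(io.StringIO(SAMPLE))]
for p in patients:
    print(p.patient_id, p.responder, p.pr_seq is not None)
```

Empty sequence fields become `None` so that the "if available" cases are explicit rather than silently treated as zero-length sequences.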

Features used: the molecular weight and length of the "PR Sequence" and "RT Sequence" from the training data.

  1. Start WEKA.
  2. Open mweight.csv.
  3. Remove the patient attribute.
  4. Select resp.
  5. Choose filter -> unsupervised -> attribute -> NumericToNominal.
  6. Click the filter and change it to apply to the first attribute only.
  7. Click Apply.

Neural network: classify -> functions -> MultilayerPerceptron

  1. Select resp as the class.
  2. Click Start.
  • 738 correct predictions of a = 0 (no improvement)
  • 66 correct predictions of b = 1 (improvement)
  • 56 "no improvement" cases classified as improvement
  • 140 "improvement" cases classified as no improvement

How well did it do? 80.4% accuracy: (738 + 66) correct out of 1000 cases.

  • rows tell you what really happened
  • columns tell you what was predicted
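The accuracy figure can be re-derived from the four counts above. This is a small sketch with the confusion matrix typed in by hand; the variable names are illustrative. It also computes two numbers that are not in the notes but follow directly from the same counts: the recall on the "improvement" class, and the accuracy of always guessing the majority class.

```python
# Rows: what really happened. Columns: what was predicted (a = 0, b = 1).
confusion = [
    [738, 56],   # actual 0: 738 predicted 0, 56 predicted 1
    [140, 66],   # actual 1: 140 predicted 0, 66 predicted 1
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(confusion)))
accuracy = correct / total

# Share of real improvers that the network actually found.
recall_improved = confusion[1][1] / sum(confusion[1])

# Accuracy of a classifier that always predicts the most common actual class.
majority_baseline = max(sum(row) for row in confusion) / total

print(f"accuracy          = {accuracy:.3f}")
print(f"recall (improved) = {recall_improved:.3f}")
print(f"majority baseline = {majority_baseline:.3f}")
```

Comparing the accuracy with the majority baseline is one way to judge how much the classifier learned beyond the class imbalance.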

Clustering: cluster -> SimpleKMeans

  1. Change the number of clusters to 5.
  2. Click OK, then Start.
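For intuition about what a k-means run does, here is a plain-Python sketch of the basic assign-and-update loop (Lloyd's algorithm). It is not WEKA's SimpleKMeans implementation: the initialisation (first k points), the fixed iteration count, and the toy 2-D points are assumptions made for illustration, and the example uses 2 clusters rather than the 5 used at the meetup.

```python
import math


def kmeans(points, k, iterations=20):
    """Basic k-means with Euclidean distance and a naive initialisation."""
    centroids = [list(p) for p in points[:k]]   # naive init: first k points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Toy data: two obvious groups, made up for illustration.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1)]
labels, centers = kmeans(points, k=2)
print(labels)
print(centers)
```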

SciPy: in scipy.cluster.hierarchy, the main function is called linkage. ldist takes the Levenshtein distance between each pair of items in the set; the result is a distance matrix, which is the input to hierarchical clustering.
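A minimal pure-Python sketch of the distance-matrix step. The `levenshtein` and `distance_matrix` functions here are illustrative stand-ins written for this example, not the `ldist` code used at the meetup, and the sequences are made up.

```python
def levenshtein(a, b):
    """Edit distance: fewest insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))      # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                      # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def distance_matrix(items):
    """Square, symmetric matrix of pairwise Levenshtein distances."""
    n = len(items)
    return [[levenshtein(items[i], items[j]) for j in range(n)] for i in range(n)]


seqs = ["ACGT", "ACGA", "TTGT"]   # made-up sequences
matrix = distance_matrix(seqs)
for row in matrix:
    print(row)
```

SciPy's linkage expects pairwise distances in condensed (flattened upper-triangle) form, so a square matrix like this one would need converting first, for example with scipy.spatial.distance.squareform.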

Single linkage clustering: start with n clusters (one per point), take the two that have the shortest distance between them, and merge them into one cluster. Then keep going until you have 1 cluster.

  • after you join two points, the cluster's distance to any other point is the smaller of the two members' distances to that point

Complete linkage: you take the largest distance instead.

  • average linkage takes the average of the distances instead
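The three linkage rules differ only in how the distances between the members of two clusters are combined. Below is a naive sketch of that idea over a square distance matrix. It is not how SciPy implements linkage, the function name and return format are made up for illustration, and the brute-force search is only suitable for small inputs.

```python
def agglomerate(dist, linkage="single"):
    """Naive agglomerative clustering over a square distance matrix.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of original point indices.
    """
    combine = {
        "single": min,                              # smallest pairwise distance
        "complete": max,                            # largest pairwise distance
        "average": lambda ds: sum(ds) / len(ds),    # mean pairwise distance
    }[linkage]
    clusters = [(i,) for i in range(len(dist))]     # start with n clusters
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pair_ds = [dist[i][j] for i in clusters[x] for j in clusters[y]]
                d = combine(pair_ds)
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges


# Four points on a line at 0, 1, 3, 7 (made-up example).
positions = [0, 1, 3, 7]
dist = [[abs(p - q) for q in positions] for p in positions]

for method in ("single", "complete", "average"):
    print(method, [m[2] for m in agglomerate(dist, method)])
```

On this example every method makes the same first merge (the points at 0 and 1), but the later merge distances differ, which is where the choice of linkage starts to matter.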

Helpful WEKA videos: http://sentimentmining.net/weka/