Machine Learning Meetup Notes: 2010-07-07

From Noisebridge
Revision as of 22:16, 21 July 2010 by SpammerHellDontDelete

The training data has six columns:
Col-1: patient ID
Col-2: responder status ("1" for patients who improved, "0" otherwise)
Col-3: Protease (PR) nucleotide sequence, if available
Col-4: Reverse Transcriptase (RT) nucleotide sequence, if available
Col-5: viral load at the beginning of therapy (log10 units)
Col-6: CD4 count at the beginning of therapy
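This is a minimal sketch of one record in this layout. It assumes the data arrives as CSV with a single header row and that a missing sequence is an empty field. The class, the field names, the header and the sample row are all hypothetical, made up for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Patient:
    """One training row, in the column order described above (names are hypothetical)."""
    patient_id: str
    responder: int           # 1 = improved, 0 = otherwise
    pr_seq: Optional[str]    # Protease sequence, None if not available
    rt_seq: Optional[str]    # Reverse Transcriptase sequence, None if not available
    viral_load_log10: float  # viral load at start of therapy (log10 units)
    cd4_count: float         # CD4 count at start of therapy


def parse_patients(text: str) -> List[Patient]:
    """Parse CSV text that has one header row, reading columns by position."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    patients = []
    for pid, resp, pr, rt, vl, cd4 in reader:
        patients.append(Patient(
            patient_id=pid,
            responder=int(resp),
            pr_seq=pr or None,  # an empty field means "not available"
            rt_seq=rt or None,
            viral_load_log10=float(vl),
            cd4_count=float(cd4),
        ))
    return patients


# Made-up example row (not real patient data) with the RT sequence missing.
sample = "id,resp,pr,rt,vl,cd4\n1,0,CCTCAAATCACT,,4.7,473\n"
print(parse_patients(sample)[0])
```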

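Features: molecular weight and length of "PR Sequence" and "RT Sequence", computed from the training data.

Start Weka, open mweight.csv, remove the patient ID attribute, and select resp. Then choose Filter -> unsupervised -> attribute -> NumericToNominal. Click on the filter to change its attribute range to "first" only, then Apply. Once the patient ID is removed, resp is the first attribute. The filter therefore converts only the 0/1 responder status into a nominal class, and the other attributes stay numeric.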
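The notes don't record how the molecular weight was computed. The sketch below assumes the common oligonucleotide approximation: sum the average per-base masses, subtract 61.96, and treat each sequence as single-stranded DNA. The function name and the handling of ambiguity codes are assumptions.

```python
# Approximate average masses (g/mol) of each deoxynucleotide within a DNA strand,
# as used in the common oligonucleotide molecular-weight formula (an assumption here).
NUCLEOTIDE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def sequence_features(seq):
    """Return (length, approximate molecular weight) for a nucleotide sequence.

    Hypothetical helper. Non-ACGT symbols (e.g. IUPAC ambiguity codes such as
    R or Y) count toward the length but add nothing to the weight, which is a
    simplifying choice.
    """
    if not seq:
        return None, None  # sequence not available for this patient
    seq = seq.upper()
    weight = sum(NUCLEOTIDE_MASS.get(base, 0.0) for base in seq)
    # End-group correction for a strand without a 5' phosphate.
    return len(seq), round(weight - 61.96, 2)


print(sequence_features("ACGT"))  # (4, 1173.84)
```

Applying this to the PR and RT columns would give four numeric features per patient. The exact columns of mweight.csv are not recorded in the notes.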

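Neural network: Classify -> functions -> MultilayerPerceptron, with resp as the class, then Start.

Results (a = 0, no improvement; b = 1, improvement):
738 no-improvement cases correctly predicted as no improvement
66 improvement cases correctly predicted as improvement
56 no-improvement cases misclassified as improvement
140 improvement cases misclassified as no improvement

How well did it do? 80.4% accuracy: 738 + 66 = 804 correct out of 1000.

Weka shows these counts as a confusion matrix. Rows tell you what really happened, and columns tell you what was predicted:

   a   b   <-- classified as
 738  56 |   a = 0
 140  66 |   b = 1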
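The sketch below checks the arithmetic. It computes accuracy and per-class recall from the confusion matrix, using the counts reported above with rows as the actual class and columns as the predicted class. The function names are illustrative and are not part of Weka's API.

```python
def accuracy(confusion):
    """Fraction of correct predictions: the diagonal sum over the total.

    confusion[i][j] = number of examples whose true class is i (row)
    and whose predicted class is j (column).
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def recall(confusion, cls):
    """Share of examples truly in class `cls` that were predicted as `cls`."""
    return confusion[cls][cls] / sum(confusion[cls])


# MultilayerPerceptron counts: index 0 = a (no improvement), 1 = b (improvement).
mlp_confusion = [
    [738, 56],   # actually no improvement: 738 predicted a, 56 predicted b
    [140, 66],   # actually improved: 140 predicted a, 66 predicted b
]

print(f"{accuracy(mlp_confusion):.1%}")   # 80.4%
print(f"{recall(mlp_confusion, 0):.1%}")  # 92.9%
print(f"{recall(mlp_confusion, 1):.1%}")  # 32.0%
```

Accuracy alone can mislead here. Of the 1000 patients, 794 did not improve, so always predicting "no improvement" would already score 79.4%. Per-class recall shows the network caught only 66 of the 206 responders (about 32%).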

Clustering: Cluster -> SimpleKMeans, with numClusters changed to 5.
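Weka's SimpleKMeans is a k-means clusterer. To show what the numClusters setting controls, here is a minimal pure-Python sketch of Lloyd's algorithm. The random initialisation and the stopping rule are simplifying assumptions, and the sketch does not reproduce Weka's implementation.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (centroids, labels). Initial centroids are k points sampled at
    random, which is a simplification of real initialisation schemes.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
    return centroids, labels


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

With k = 5, as in the notes, the call is the same with k=5. The algorithm can settle in a local optimum, so results may depend on the seed.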

Hierarchical clustering with SciPy: the main function in scipy.cluster.hierarchy is linkage. ldist takes the Levenshtein distance between each pair of elements in the set. The result is a distance matrix, which linkage turns into a hierarchical clustering.
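The notes don't show ldist itself, so its interface is unknown. The following is a minimal stand-in, with hypothetical function names, that computes Levenshtein (edit) distances and lays them out as a condensed distance matrix.

```python
def levenshtein(a, b):
    """Edit distance: the minimum number of insertions, deletions and
    substitutions needed to turn a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def condensed_distances(items):
    """Pairwise Levenshtein distances in condensed (upper-triangle) order:
    d(0,1), d(0,2), ..., d(0,n-1), d(1,2), ..."""
    n = len(items)
    return [float(levenshtein(items[i], items[j]))
            for i in range(n) for j in range(i + 1, n)]


print(levenshtein("kitten", "sitting"))         # 3
print(condensed_distances(["a", "ab", "abc"]))  # [1.0, 2.0, 1.0]
```

The condensed 1-D layout is the form that scipy.cluster.hierarchy.linkage accepts as a precomputed distance matrix, for example linkage(condensed_distances(seqs), method="single"). scipy.spatial.distance.squareform converts the condensed form to the full square matrix and back.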

Single-linkage clustering: start with n clusters, one per point. Merge the two clusters with the shortest distance between them, then keep going until you have one cluster. When you join two points, check the distance from each of them to every other point, and keep whichever is smaller as the new cluster's distance to that point.
Complete linkage: take the largest distance instead.
Average linkage: take the average of the distances.
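This naive sketch applies the three linkage rules described above to a precomputed distance matrix. It recomputes every cluster-to-cluster distance from scratch each round, which makes it slow but keeps the rules visible. In practice SciPy's linkage is the tool to use. The function name and the merge-history format are made up for this example.

```python
def agglomerate(dist, linkage="single"):
    """Naive agglomerative clustering on a full symmetric distance matrix.

    linkage: "single" (min), "complete" (max) or "average" (mean of all
    pairwise distances between members). Returns the merge history as a
    list of (cluster_a, cluster_b, distance) with clusters as frozensets.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]  # n singleton clusters

    def cluster_distance(a, b):
        pair_dists = [dist[i][j] for i in a for j in b]
        if linkage == "single":
            return min(pair_dists)
        if linkage == "complete":
            return max(pair_dists)
        if linkage == "average":
            return sum(pair_dists) / len(pair_dists)
        raise ValueError(f"unknown linkage: {linkage}")

    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage rule.
        d, x, y = min(
            (cluster_distance(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] | clusters[y]
        del clusters[y]  # y > x, so delete y first to keep index x valid
        del clusters[x]
        clusters.append(merged)
    return merges


# Four points on a line at 0, 1, 5 and 7; entries are absolute differences.
D = [[0, 1, 5, 7], [1, 0, 4, 6], [5, 4, 0, 2], [7, 6, 2, 0]]
for method in ("single", "complete", "average"):
    print(method, [d for _, _, d in agglomerate(D, method)])
# single [1, 2, 4]
# complete [1, 2, 7]
# average [1.0, 2.0, 5.5]
```

All three rules merge {0, 1} and then {5, 7} first. They differ only in how far apart the final two clusters are judged to be: single linkage uses the closest pair (5 and 1, distance 4), complete linkage the farthest pair (7 and 0, distance 7), and average linkage the mean of all four cross distances (5.5).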
