Machine Learning Meetup Notes: 2010-07-07

Col-1: patient ID
Col-2: responder status ("1" for patients who improved and "0" otherwise)
Col-3: Protease nucleotide sequence (if available)
Col-4: Reverse Transcriptase nucleotide sequence (if available)
Col-5: viral load at the beginning of therapy (log-10 units)
Col-6: CD4 count at the beginning of therapy

Helpful WEKA Videos http://sentimentmining.net/weka/

features: molecular weight and length of "PR Sequence" and "RT Sequence", computed from the training data

start weka
open mweight.csv
remove the patient attribute
select resp
filter->unsupervised->attribute->numerictonominal
click the filter settings and change the attribute indices to first only
apply

neural network: classify->functions->multilayerperceptron

select resp as the class
start

738 correct predictions for a=0 (no improvement)
66 correct predictions for b=1 (improvement)

56 no improvement classified as improvement
140 improvement classified as no improvement

how well did it do? (738 + 66) / 1000 = 80.4% accuracy

rows tell you what really happened
columns tell you what was predicted
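
The accuracy figure can be re-derived from the four counts above. This is a minimal Python sketch, not WEKA output; the function names accuracy and recall are made up for illustration, and the matrix layout follows the rows/columns rule stated above.

```python
# Confusion matrix from the notes: rows = what really happened,
# columns = what was predicted. Class order: 0 = no improvement, 1 = improvement.
confusion = [
    [738, 56],   # actual 0: 738 predicted as 0, 56 predicted as 1
    [140, 66],   # actual 1: 140 predicted as 0, 66 predicted as 1
]

def accuracy(matrix):
    """Fraction of all cases that sit on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def recall(matrix, cls):
    """Fraction of the cases that really belong to cls that were predicted as cls."""
    return matrix[cls][cls] / sum(matrix[cls])

print(f"accuracy: {accuracy(confusion):.3f}")
print(f"recall, no improvement: {recall(confusion, 0):.3f}")
print(f"recall, improvement: {recall(confusion, 1):.3f}")
```

The per-class numbers are worth a look: only 66 of the 206 patients who really improved were predicted as improved, so most of the 80.4% comes from the larger no-improvement class.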

cluster simplekmeans

change numClusters to 5
ok->start

scipy.cluster.hierarchy: the main function is called linkage. ldist takes the Levenshtein distance between each pair of items in the set. the result is a distance matrix, which is what hierarchical clustering works from.
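
A rough sketch of the distance-matrix step in plain Python, so it runs without scipy. The function names levenshtein and distance_matrix and the toy sequences are made up for illustration; this is not the ldist code used at the meetup.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute each cost 1)."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free if they match
            ))
        previous = current
    return previous[-1]

def distance_matrix(items):
    """Square, symmetric matrix of pairwise Levenshtein distances."""
    n = len(items)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = levenshtein(items[i], items[j])
    return matrix

sequences = ["CCTCAG", "CCTCAA", "ACTCAG", "GGGTTT"]  # made-up toy sequences
for row in distance_matrix(sequences):
    print(row)
```

If scipy is installed, a square matrix like this can be converted with scipy.spatial.distance.squareform into the condensed form that linkage accepts.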

single linkage clustering: start with n clusters (one per point), take the two that have the shortest distance between them and merge them into one cluster. then keep going until you have 1 cluster.

when you join two points into a cluster, the distance from that cluster to any other point is found by checking both members' distances to that point and taking whichever is smaller

complete linkage: you take the largest distance instead

there is also one that takes the average (average linkage)
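
The three linkage rules differ only in how the distance between two clusters is computed from the point-to-point distances. Below is a naive Python sketch of that idea. It is not scipy's implementation, the name agglomerate is made up, and the toy matrix assumes four points on a line at positions 0, 1, 3 and 7.

```python
def agglomerate(dist, linkage="single"):
    """Naive agglomerative clustering on a square distance matrix.

    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of point indices.
    """
    combine = {
        "single": min,    # smallest distance between members
        "complete": max,  # largest distance between members
        "average": lambda values: sum(values) / len(values),
    }[linkage]
    clusters = [(i,) for i in range(len(dist))]  # start with n one-point clusters
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pair_distances = [dist[i][j] for i in clusters[x] for j in clusters[y]]
                d = combine(pair_distances)
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        merged = tuple(sorted(clusters[x] + clusters[y]))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges

# Toy example: points on a line at 0, 1, 3, 7 (assumed data, not from the meetup).
toy = [
    [0, 1, 3, 7],
    [1, 0, 2, 6],
    [3, 2, 0, 4],
    [7, 6, 4, 0],
]
for method in ("single", "complete", "average"):
    print(method, [d for _, _, d in agglomerate(toy, method)])
```

On this toy matrix all three rules merge the points in the same order, but the merge distances differ, which changes the heights in the resulting dendrogram.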
