Machine Learning/moa
Setup Instructions
- Download and extract http://thomaslotze.com/kdd/moa_prep.tgz
OR set the directory up by hand:
- Create a directory to run your MOA programs from; we'll assume it is ~/moa, and all commands below are run from inside it
- Download the MOA release .tar.gz file from http://sourceforge.net/projects/moa-datastream/ and extract it
- Copy moa.jar into ~/moa
- Download the Weka release .zip file from http://sourceforge.net/projects/weka/ and extract it
- Copy weka.jar into ~/moa
- Download http://jroller.com/resources/m/maxim/sizeofag.jar and copy it into ~/moa
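Every command on this page has the same shape: put the jars on the classpath, load sizeofag.jar as a Java agent, and pass a single quoted task string to moa.DoTask. The `:` classpath separator used in those commands is Unix-specific (Windows uses `;`). A minimal Python sketch, assuming Python 3.5+ and a `java` binary on the PATH, that assembles the same argument list portably; `moa_command` and `run_moa` are hypothetical helper names, not part of MOA.

```python
import os
import subprocess


def moa_command(task, jars=("moa.jar", "weka.jar"), agent="sizeofag.jar"):
    """Build the argument list for: java -cp ... -javaagent:... moa.DoTask "<task>".

    Hypothetical helper. The classpath starts with the current directory, as in
    the commands on this page, and is joined with os.pathsep (':' on Unix,
    ';' on Windows).
    """
    classpath = os.pathsep.join((".",) + tuple(jars))
    # The task string is passed as ONE argument, which is what the shell's
    # double quotes achieve in the command lines on this page.
    return ["java", "-cp", classpath, "-javaagent:" + agent, "moa.DoTask", task]


def run_moa(task, workdir=os.path.expanduser("~/moa"), output_path=None, **kwargs):
    """Run a MOA task from workdir, optionally redirecting stdout to output_path.

    output_path is resolved relative to the Python process's current directory,
    not relative to workdir.
    """
    argv = moa_command(task, **kwargs)
    if output_path is None:
        return subprocess.run(argv, cwd=workdir, check=True)
    with open(output_path, "w") as out:
        return subprocess.run(argv, cwd=workdir, stdout=out, check=True)
```

For example, `run_moa("LearnModel -l NaiveBayes -s (ArffFileStream -f atrain.arff -c -1) -O amodel_bayes.moa")` corresponds to the model-building command, and passing `jars=("moa_personal.jar", "weka.jar")` gives the classpath used for generating predictions. Using an argument list instead of a shell string avoids having to quote the parentheses in the task.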
Training MOA models
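- Your training data must be in ARFF format; the `-c -1` option in the commands below tells ArffFileStream that the class is the last attribute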
- To compare models, you can run a prequential evaluation with each candidate classifier and compare their performance; for example:
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask "EvaluatePrequential -l NaiveBayes -s (ArffFileStream -f atrain.arff -c -1) -O amodel_bayes.moa"
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask "EvaluatePrequential -l HoeffdingTree -s (ArffFileStream -f atrain.arff -c -1) -O amodel_hoeffding.moa"
- To actually generate the final model, you can run a command line like the following:
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask "LearnModel -l NaiveBayes -s (ArffFileStream -f atrain.arff -c -1) -O amodel_bayes.moa"
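Prequential ("predictive sequential", or test-then-train) evaluation, which EvaluatePrequential performs, uses each arriving example first to test the current model and only then to train it, so every prediction is scored on an example the model has not yet learned from. The Python sketch below illustrates that loop with a toy majority-class learner standing in for NaiveBayes or HoeffdingTree. It is not MOA's implementation, and `MajorityClass` and `prequential_accuracy` are hypothetical names; MOA reports more statistics than plain accuracy.

```python
from collections import Counter


class MajorityClass:
    """Toy incremental learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, features):
        # Before any training there is no majority, so return None (a miss).
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, features, label):
        self.counts[label] += 1


def prequential_accuracy(stream, learner):
    """Test-then-train: score each example BEFORE the learner sees its label."""
    correct = seen = 0
    for features, label in stream:
        if learner.predict(features) == label:
            correct += 1
        seen += 1
        learner.learn(features, label)
    return correct / seen if seen else 0.0
```

LearnModel, by contrast, only trains: it passes over the stream once and saves the resulting model, which is why it is the step used to produce the final model file.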
Generating MOA model predictions
To generate predictions for a test set, the test set must also be in ARFF format, with the same attributes as the training data, including the class attribute; since the true classes are unknown, I just set that column to all 0s.
To do this, you will also need the moa_personal.jar file, which takes the place of moa.jar on the classpath, in the same directory as your other jar files; you can get all the jar files needed from http://thomaslotze.com/kdd/jarfiles.tgz.
After generating a model using the steps above, you can then run the following:
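ARFF is a plain-text format: a `@relation` line, one `@attribute` declaration per column, then `@data` followed by one comma-separated row per example. A minimal Python sketch that writes such a file with the class as the last attribute (matching `-c -1`) and, for an unlabelled test set, the class column filled with 0s. It assumes purely numeric features, a nominal class `{0,1}`, and names without spaces; `arff_text`, `write_arff` and the attribute names are hypothetical, and real data with strings, dates or missing values would need quoting and `?` handling.

```python
def arff_text(relation, feature_names, rows, labels=None, class_values=("0", "1")):
    """Render numeric rows as ARFF text, with a nominal class as the last attribute.

    If labels is None (an unlabelled test set), every row gets class_values[0],
    i.e. the all-0 placeholder column. Names are assumed to contain no spaces.
    """
    lines = ["@relation " + relation, ""]
    for name in feature_names:
        lines.append("@attribute %s numeric" % name)
    # The class goes last, matching "-c -1" in the MOA commands.
    lines.append("@attribute class {%s}" % ",".join(class_values))
    lines += ["", "@data"]
    for i, row in enumerate(rows):
        if len(row) != len(feature_names):
            raise ValueError("row %d has %d values, expected %d"
                             % (i, len(row), len(feature_names)))
        label = class_values[0] if labels is None else str(labels[i])
        lines.append(",".join(str(v) for v in row) + "," + label)
    return "\n".join(lines) + "\n"


def write_arff(path, *args, **kwargs):
    """Write arff_text(...) to path."""
    with open(path, "w") as f:
        f.write(arff_text(*args, **kwargs))
```

For the training file, pass the real labels; for the test file, omit them. Using the same feature names and `class_values` for both keeps the two headers identical, which is what the model expects.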
java -cp .:moa_personal.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask "EvaluateModel -e BasicLoggingClassificationPerformanceEvaluator -m file:amodel_bayes.moa -s (ArffFileStream -f atest.arff -c -1)" > a_bayes_predicted.txt
This produces a comma-separated file with the item number in the first column and the predicted probability of class 1 (in our case, cfa=1) in the second.
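A minimal Python sketch for reading that file, assuming each prediction line has the form `item,probability` with an integer item number, as described above. Lines that do not parse that way (for example, any status text that ends up in the redirected output) are skipped. The 0.5 threshold and the names `parse_predictions`, `read_predictions` and `to_labels` are illustrative assumptions.

```python
def parse_predictions(lines):
    """Map item number -> P(class 1) from "item,probability" lines.

    Lines that are not exactly two comma-separated numbers are skipped.
    """
    preds = {}
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue  # not an "item,probability" line
        try:
            item, prob = int(parts[0]), float(parts[1])
        except ValueError:
            continue  # e.g. a header or status line
        preds[item] = prob
    return preds


def read_predictions(path):
    """Parse the evaluator's output file, e.g. a_bayes_predicted.txt."""
    with open(path) as f:
        return parse_predictions(f)


def to_labels(preds, threshold=0.5):
    """Convert probabilities to hard 0/1 labels (the threshold is an assumption)."""
    return {item: int(p >= threshold) for item, p in preds.items()}
```

For example, `to_labels(read_predictions("a_bayes_predicted.txt"))` gives a 0/1 prediction per item; keep the raw probabilities instead if they are to be scored directly.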
Thomas plans to make the evaluator more general and robust, and hopefully to submit it for inclusion in the main MOA trunk. For now, it only works for problems with two classes.
Other Resources
- MOA site: http://www.cs.waikato.ac.nz/~abifet/MOA/