Machine Learning Meetup Notes:2011-4-13

From Noisebridge
Latest revision as of 11:38, 19 April 2011

Anthony Goldbloom from Kaggle Visits

  • Link to his talk: PPT presentation (https://www.noisebridge.net/images/e/ed/Goldbloom_-_Predictive_modeling_competitions_-_April_2011.ppt)
  • The HIV competition was won with random forests; the winner taught himself machine learning by watching YouTube videos. The term "Random Forests" is actually trademarked. Random forests are fairly robust to new data.
    • Used the caret package in R (http://cran.r-project.org/web/packages/caret/) to work with random forests.
  • Kaggle splits the test dataset into two halves and uses one for the leaderboard.
  • Often the score difference between the winning model and second place is not statistically significant, so they award prizes to the top few finishers. They might also impose restrictions on a model's execution time.
  • In general, performance plateaus within a few weeks of a competition starting. This seems to be because all the information has been "squeezed" out of the dataset by that point.
  • Chess rating competition: build a new rating system that more accurately predicts game outcomes. Performance still plateaued, but it took longer.
  • Most Kaggle users come from computer science and statistics, followed by economics, math, and biostats.
  • Tools people use:
    • R: lots of American users
    • Matlab
    • SAS
    • Weka
    • SPSS
    • Python: although it's lower on the list, people are successful with it
  • R packages used: Caret, RFE, GLM, NNET, Forecast
  • Heritage Prize
    • Real shit is going down May 4th, with the release of all datasets.
    • Ends in 2 years. No rush.
    • Four prizes in total, given out throughout the next two years.
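The leaderboard mechanics described above (test set split in two, one half driving the public leaderboard) and the point about non-significant score gaps can be sketched in Python. This is a simulation with made-up numbers, not Kaggle's actual code; the accuracies and test-set size are illustrative assumptions.

```python
# Sketch of the leaderboard split described in the notes: the hidden test
# set is divided in two, the "public" half drives the live leaderboard and
# the "private" half decides final standings. With a finite test set, a
# small score gap between top models is within sampling noise.
import math
import random

random.seed(0)

N_TEST = 2000
test_ids = list(range(N_TEST))
random.shuffle(test_ids)
public = test_ids[: N_TEST // 2]    # shown on the live leaderboard
private = test_ids[N_TEST // 2:]    # held back for final scoring

def simulate_predictions(true_accuracy):
    """Mark each test example correct with probability `true_accuracy`."""
    return [random.random() < true_accuracy for _ in range(N_TEST)]

def accuracy(correct, ids):
    return sum(correct[i] for i in ids) / len(ids)

# Two hypothetical leading models, one percentage point apart in true skill.
winner = simulate_predictions(0.80)
runner_up = simulate_predictions(0.79)

for name, preds in [("winner", winner), ("runner-up", runner_up)]:
    pub, priv = accuracy(preds, public), accuracy(preds, private)
    print(f"{name}: public={pub:.3f} private={priv:.3f}")

# Rough binomial standard error of an accuracy estimate on the private half:
p, n = 0.80, N_TEST // 2
se = math.sqrt(p * (1 - p) / n)
print(f"standard error of private score: {se:.3f}")
```

The standard error here comes out around 0.013, larger than the 0.010 gap in true skill between the two simulated models, which is one way to see why awarding prizes to the top few rather than only first place makes sense.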