Machine Learning Meetup Notes:2011-4-13

Anthony Goldbloom from Kaggle Visits

  • A competitor won the HIV competition using random forests. The term "Random Forests" is trademarked. He taught himself machine learning by watching YouTube videos. Random forests are pretty robust to new data.
    • He used the caret package in R to fit the random forests (a rough Python analogue appears after this list).
  • Kaggle splits the test dataset in two and uses one half for the live leaderboard; the other half determines the final standings (see the split sketch after this list).
  • Often the score difference between the winning model and second place is not statistically significant, so they award prizes to the top few entries (a bootstrap sketch of this appears after this list). They might also impose restrictions on a model's execution time.
  • Performance gains in a competition generally bottom out within a few weeks. This seems to be because all the usable information has been "squeezed" out of the dataset by that point.
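
The notes say the winner used R's caret to fit random forests. As a rough analogue, here is a minimal sketch in Python, with scikit-learn standing in for caret; the synthetic dataset and all parameter choices are illustrative assumptions, not details from the talk.

  # Minimal random-forest sketch (scikit-learn as a stand-in for R's caret).
  from sklearn.datasets import make_classification
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import cross_val_score

  # Synthetic stand-in for a competition dataset.
  X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

  # Random forests are fairly robust with default settings, which is part
  # of why they tend to hold up on new data.
  model = RandomForestClassifier(n_estimators=500, random_state=0)
  scores = cross_val_score(model, X, y, cv=5)
  print("5-fold CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))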
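
To make the leaderboard mechanics concrete, here is a sketch of splitting a held-out test set into a public half (scored live on the leaderboard) and a private half (scored at the end). The function name, fraction, and row ids are hypothetical, not Kaggle's actual code.

  # Hypothetical sketch of the public/private test-set split.
  from sklearn.model_selection import train_test_split

  def split_test_set(test_ids, public_fraction=0.5, seed=0):
      """Split test-set row ids into a public half (drives the live
      leaderboard) and a private half (decides final standings)."""
      public, private = train_test_split(
          test_ids, train_size=public_fraction, random_state=seed)
      return public, private

  public_ids, private_ids = split_test_set(list(range(100)))
  print(len(public_ids), "public rows,", len(private_ids), "private rows")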
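
One way to see why a small winner-vs-runner-up gap can be statistically insignificant is a paired bootstrap over per-example correctness. Everything below is simulated for illustration; the resampling check is a standard technique, not Kaggle's actual procedure.

  # Paired bootstrap: is a ~1-point accuracy gap distinguishable from noise?
  import numpy as np

  rng = np.random.default_rng(0)
  n = 2000  # size of the scoring set

  # Simulate two near-identical models' predictions on binary labels.
  truth = rng.integers(0, 2, n)
  model_a = np.where(rng.random(n) < 0.80, truth, 1 - truth)  # ~80% accurate
  model_b = np.where(rng.random(n) < 0.79, truth, 1 - truth)  # ~79% accurate

  correct_a = (model_a == truth).astype(float)
  correct_b = (model_b == truth).astype(float)

  # Resample scoring rows with replacement and record the accuracy gap.
  gaps = []
  for _ in range(5000):
      idx = rng.integers(0, n, n)
      gaps.append(correct_a[idx].mean() - correct_b[idx].mean())
  gaps = np.array(gaps)

  print("observed gap: %.4f" % (correct_a - correct_b).mean())
  print("95%% bootstrap interval: [%.4f, %.4f]"
        % (np.percentile(gaps, 2.5), np.percentile(gaps, 97.5)))
  # If the interval straddles zero, the ranking between the two models
  # is not statistically meaningful on this test set.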