Machine Learning Meetup Notes: 2011-4-13
Anthony Goldbloom from Kaggle Visits
*Link to his talk: [https://www.noisebridge.net/images/e/ed/Goldbloom_-_Predictive_modeling_competitions_-_April_2011.ppt PPT presentation]
*The winner of the HIV competition used random forests (the term "Random Forests" is actually trademarked). He taught himself machine learning by watching YouTube videos. Random forests are pretty robust to new data.
**Used the [http://cran.r-project.org/web/packages/caret/ caret] package in R to train the random forests.
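The talk used R's caret package; as a rough Python analogue (an assumption on my part — the speaker worked in R, and the toy dataset here is made up), scikit-learn's random forest behaves similarly:

```python
# Hypothetical sketch: cross-validating a random forest on a synthetic
# dataset, roughly what caret's train(method = "rf") does in R.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(forest, X, y, cv=5)  # 5-fold cross-validated accuracy
print(round(scores.mean(), 2))
```

Little tuning is needed to get a reasonable baseline, which fits the "robust to new data" point above.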
*Kaggle splits the test dataset in two and uses one half for the public leaderboard.
*Often the score difference between the winning model and second place is not statistically significant, so they award prizes to the top few. They may also impose restrictions on a model's execution time.
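To see why a small score gap can be statistically insignificant, here is an illustrative bootstrap comparison (not Kaggle's procedure; the two models and their accuracies are invented for the example):

```python
# Two hypothetical models differing by ~1% accuracy on 200 test examples.
# Bootstrapping the accuracy gap shows how wide its uncertainty is.
import random

random.seed(0)
n = 200
model_a = [1 if random.random() < 0.81 else 0 for _ in range(n)]  # ~81% correct
model_b = [1 if random.random() < 0.80 else 0 for _ in range(n)]  # ~80% correct

diffs = []
for _ in range(1000):
    idx = [random.randrange(n) for _ in range(n)]  # resample with replacement
    diffs.append(sum(model_a[i] for i in idx) / n
                 - sum(model_b[i] for i in idx) / n)

diffs.sort()
lo, hi = diffs[25], diffs[-25]  # ~95% bootstrap interval for the gap
print(round(lo, 3), round(hi, 3))
```

On a few hundred test examples the interval is several percentage points wide, so a ~1% lead typically can't be distinguished from a tie.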
*Performance generally bottoms out within a few weeks of a competition starting. This seems to be because all the information has been "squeezed" out of the dataset by that point.
*Chess rating competition: build a new rating system that predicts game results more accurately. Performance still plateaued, but it took longer.
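For reference, the kind of rating system competitors were trying to beat can be sketched as a basic Elo-style update (an assumption — the notes don't name the baseline system; the constants 400 and k=32 are conventional choices):

```python
# Minimal Elo-style rating update: compute the expected score from the
# rating gap, then nudge both ratings toward the observed result.
def expected(r_a, r_b):
    # Probability that player A beats player B, per the logistic Elo curve.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(r_a, r_b, score_a, k=32.0):
    # score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for a loss.
    e_a = expected(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b - k * (score_a - e_a)

r1, r2 = update(1600.0, 1400.0, 1.0)  # higher-rated player wins
print(round(r1, 1), round(r2, 1))
```

A competition entry would replace `expected` and `update` with something that forecasts game outcomes more accurately.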
*Most Kaggle users come from computer science and statistics, followed by economics, math, and biostatistics.
*Tools people use:
**R: lots of American users
**MATLAB
**SAS
**Weka
**SPSS
**Python: although it's lower on the list, people are successful with it
*R packages and functions used: caret, rfe, glm, nnet, forecast
*Heritage Prize
**The real action starts May 4th, with the release of all the datasets.
**Ends in two years, so there's no rush.
**Four prizes in total, given out over the next two years.