Machine Learning/Datasets

From Noisebridge
(Difference between revisions)
Line 1: Line 1:
This page describes in detail the datasets used for the [[NBML Course]].
+
Machine learning is a vast field and there are many different types of problems to be solved. If you find a dataset interesting, try to categorize it (or add a new category) and add it to the links below.
  
'''Classification'''
+
===Classification===
 
*[http://yann.lecun.com/exdb/mnist/ MNIST Handwritten Digits]
 
**Classify handwritten digits using this very popular dataset, which provides 60,000 training examples and 10,000 test examples.
 
Line 9: Line 9:
 
**Predict whether a person's income is above or below 50k.
 
'''Regression'''
+
===Regression===
 
*[http://www.sci.usq.edu.au/staff/dunn/Datasets/Books/Hand/Hand-R/alps-R.html Boiling point in the Alps]
 
**The boiling point of water at different barometric pressures.  
 
Line 19: Line 19:
 
**How does smoking affect lung capacity?
 
'''Time Series'''
+
===Time Series===
 
*[http://robjhyndman.com/tsdldata/data/ausgundeaths.dat Gun-related Deaths in Australia]
 
**"Deaths from gun-related homicides and suicides and non-gun-related homicides and suicides. Australia: 1915-2004. Source: Neill and Leigh (2007)."
 
Line 27: Line 27:
 
**"Percent of Men with full beards, 1866 – 1911. Source: Hipel and Mcleod (1994)."
 
'''Clustering'''
+
===Clustering===

Revision as of 23:44, 14 March 2011

Machine learning is a vast field and there are many different types of problems to be solved. If you find a dataset interesting, try to categorize it (or add a new category) and add it to the links below.

Classification

  • MNIST Handwritten Digits
    • Classify handwritten digits using this very popular dataset, which provides 60,000 training examples and 10,000 test examples.
  • Heart Disease
    • Predict whether a person will have heart disease based on a subset of 76 factors.
  • Census Income
    • Predict whether a person's income is above or below 50k.
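
When trying a binary task such as the Census Income one above, it can help to first check how well a trivial classifier would do. The following is a minimal sketch, not part of the course material: it computes a majority-class baseline, i.e. the accuracy of always predicting the most common label. The `majority_baseline` helper and the toy label list are hypothetical and made up for illustration; they are not rows from the dataset.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most common label and the accuracy of always predicting it."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)


# Hypothetical toy labels for illustration only (not real dataset rows).
toy_labels = ["<=50K", "<=50K", ">50K", "<=50K", ">50K", "<=50K", "<=50K", "<=50K"]

baseline_label, baseline_accuracy = majority_baseline(toy_labels)
print(baseline_label, baseline_accuracy)
```

Any classifier trained on the real data should be compared against this kind of baseline, since a skewed label distribution can make raw accuracy look better than it is.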

Regression

Time Series

  • Gun-related Deaths in Australia
    • "Deaths from gun-related homicides and suicides and non-gun-related homicides and suicides. Australia: 1915-2004. Source: Neill and Leigh (2007)."
  • Immigration Rates
    • "Annual immigration into the United States: thousands. 1820 – 1962. From Kendall & Ord (1990), p.13."
  • Percent of Men with Beards 1866-1911
    • "Percent of Men with full beards, 1866 – 1911. Source: Hipel and Mcleod (1994)."

Clustering
