Machine Learning/Datasets

From Noisebridge

Latest revision as of 23:07, 15 March 2011

Machine learning is a vast field and there are many different types of problems to be solved. If you find a dataset interesting, try to categorize it (or add a new category) and add it to the links below.

Classification

  • MNIST Handwritten Digits
    • Classify handwritten digits with this widely used dataset of 60,000 training and 10,000 test images.
  • Heart Disease
    • Predict whether a patient has heart disease using a subset of the dataset's 76 attributes.
  • Census Income
    • Predict whether a person's annual income exceeds $50K.

Regression

Time Series

  • Gun-related Deaths in Australia
    • "Deaths from gun-related homicides and suicides and non-gun-related homicides and suicides. Australia: 1915-2004. Source: Neill and Leigh (2007)."
  • Immigration Rates
    • "Annual immigration into the United States: thousands. 1820 – 1962. From Kendall & Ord (1990), p.13."
  • Percent of Men with Beards 1866-1911
    • "Percent of Men with full beards, 1866 – 1911. Source: Hipel and Mcleod (1994)."
  • Velocity of Money in America 1869-1960 (http://robjhyndman.com/tsdldata/roberts/velmon.dat)
    • The velocity of money (http://en.wikipedia.org/wiki/Velocity_of_money) is the number of times a single unit of money changes hands over a period of time. The equation of exchange states MV = PY, so Velocity = Prices * Economic Output / Quantity of Money.
  • Changes in Global Air Temperature 1880-1985 (http://robjhyndman.com/tsdldata/annual/globtp.dat)
    • "Surface air temperature change for the globe, 1880-1985, Temperature change actually means temperature against an arbitrary zero point. From James Hansen and Sergej Lebedeff, "Global Trends of Measured Surface Air Temperature", `Journal of Geophysical Research`, Vol. 92, No. D11, pages 13,345-13,372, November 20, 1987."
  • Number of Earthquakes per Year 1900-1988 (>= 7.0)
    • "Source: National Earthquake Information Center. Different lists will give different numbers depending on the formula used for calculating the magnitude."
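
The MV=PY relation quoted for the velocity-of-money series can be rearranged to solve for V. The snippet below is a minimal sketch of that arithmetic only; the function name, parameter names, and numbers are made up for illustration and are not taken from the dataset.

```python
def velocity_of_money(price_level: float, real_output: float, money_supply: float) -> float:
    """Solve M * V = P * Y for V, given P, Y and M."""
    if money_supply <= 0:
        raise ValueError("money_supply must be positive")
    return price_level * real_output / money_supply


# Illustrative values only (hypothetical, not from velmon.dat).
v = velocity_of_money(price_level=2.0, real_output=500.0, money_supply=250.0)
print(v)  # 4.0
```

With these values each unit of money changes hands four times over the period.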

Clustering

  • USDA Plants Data (http://archive.ics.uci.edu/ml/datasets/Plants)
    • Automatically cluster plants based on 70 attributes.
  • Nutrients in Meat, Fish and Fowl (http://www.uni-koeln.de/themen/statistik/data/cluster/)
    • Can you cluster the foods by animal type given the data?

Text Data

  • Enron Emails (http://www.cs.cmu.edu/~enron/)
    • Search through Enron's publicly accessible emails.
  • Bag of Words (http://archive.ics.uci.edu/ml/datasets/Bag+of+Words)
    • Collection of word counts for various types of documents, including Enron emails, scientific papers, and New York Times articles.
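
For readers new to the term, a bag of words reduces a document to per-word counts and discards word order. The sketch below illustrates the representation on a made-up sentence. It does not read the Bag of Words dataset's own files, and the tokenizer (lowercase runs of letters, digits, and apostrophes) is an assumption chosen for simplicity.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count how often each lowercase token appears in the text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


# Hypothetical example document, not taken from any dataset.
counts = bag_of_words("The meeting is at noon. The agenda is attached.")
print(counts.most_common(2))  # [('the', 2), ('is', 2)]
```

Two documents can then be compared by their count vectors, which is the form many clustering and classification methods expect.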


Reinforcement Learning
