Machine Learning/Datasets

From Noisebridge

Machine learning is a vast field and there are many different types of problems to be solved. If you find a dataset interesting, try to categorize it (or add a new category) and add it to the links below.

Classification

  • MNIST Handwritten Digits
    • Classify handwritten digits (0-9) using this widely used benchmark dataset, which has 60,000 training images and 10,000 test images.
  • Heart Disease
    • Predict whether a person will have heart disease based on a subset of 76 factors.
  • Census Income
    • Predict whether a person's annual income is above or below $50K.

Regression

Time Series

  • Gun-related Deaths in Australia
    • "Deaths from gun-related homicides and suicides and non-gun-related homicides and suicides. Australia: 1915-2004. Source: Neill and Leigh (2007)."
  • Immigration Rates
    • "Annual immigration into the United States: thousands. 1820 – 1962. From Kendall & Ord (1990), p.13."
  • Percent of Men with Beards 1866-1911
    • "Percent of Men with full beards, 1866 – 1911. Source: Hipel and McLeod (1994)."
  • Velocity of Money in America 1869-1960
    • The velocity of money is the average number of times a single unit of money changes hands over a given period. It comes from the equation of exchange, MV = PY, which rearranges to Velocity = Prices * Economic Output / Quantity of Money.
  • Changes in Global Air Temperature 1880-1985
    • "Surface air temperature change for the globe, 1880-1985, Temperature change actually means temperature against an arbitrary zero point. From James Hansen and Sergej Lebedeff, "Global Trends of Measured Surface Air Temperature", `Journal of Geophysical Research`, Vol. 92, No. D11, pages 13,345-13,372, November 20, 1987."
  • Number of Earthquakes per Year 1900-1988 (>= 7.0)
    • "Source: National Earthquake Information Center. Different lists will give different numbers depending on the formula used for calculating the magnitude."
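
For the velocity-of-money entry above, the rearranged identity V = PY / M can be sketched in a few lines of Python. This is only an illustration: the function name `velocity_of_money` and the input numbers are made up for the example and are not taken from the dataset.

```python
def velocity_of_money(price_level, real_output, money_supply):
    """Solve the equation of exchange M * V = P * Y for V."""
    if money_supply <= 0:
        raise ValueError("money_supply must be positive")
    return price_level * real_output / money_supply


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
v = velocity_of_money(price_level=2.0, real_output=500.0, money_supply=250.0)
print(v)  # 2.0 * 500.0 / 250.0 = 4.0
```

With these numbers, each unit of money would change hands four times over the period.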

Clustering

Text Data

  • Enron Emails
    • Search through Enron's publicly accessible emails.
  • Bag of Words
    • Collection of word counts for various types of documents, including Enron emails, scientific papers, and New York Times articles.
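
As a rough illustration of what a "bag of words" is, the sketch below turns one piece of text into word counts using only the Python standard library. The function name `bag_of_words`, the tokenization rule, and the sample sentence are assumptions made for this example; the sketch does not read the dataset's own file format.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercase word appears, ignoring word order."""
    # Simplistic tokenizer (an assumption): runs of letters, digits, or apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


# Made-up sample text, not drawn from any of the listed corpora.
counts = bag_of_words("The meeting is at noon. The agenda is attached.")
print(counts["the"], counts["agenda"])
```

Because only the counts are kept, two documents with the same words in a different order produce the same bag.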


Reinforcement Learning