# Machine Learning/Datasets


Machine learning is a vast field and there are many different types of problems to be solved. If you find a dataset interesting, try to categorize it (or add a new category) and add it to the links below.

## Contents

### Classification

• MNIST Handwritten Digits
• Classify handwritten digits (0 through 9) using this very popular dataset of 60,000 training and 10,000 test images.
• Heart Disease
• Predict the presence of heart disease in a patient from a subset of the dataset's 76 attributes.
• Census Income
• Predict whether a person's annual income exceeds $50K.

### Time Series

• Gun-related Deaths in Australia
• "Deaths from gun-related homicides and suicides and non-gun-related homicides and suicides. Australia: 1915-2004. Source: Neill and Leigh (2007)."
• Immigration Rates
• "Annual immigration into the United States: thousands. 1820 – 1962. From Kendall & Ord (1990), p.13."
• Percent of Men with Beards 1866-1911
• "Percent of Men with full beards, 1866 – 1911. Source: Hipel and McLeod (1994)."
• Velocity of Money in America 1869-1960
• The velocity of money is the number of times a single unit of money changes hands over a given period. The equation of exchange states MV = PY, so Velocity = Prices * Economic Output / Quantity of Money.
• Changes in Global Air Temperature 1880-1985
• "Surface air temperature change for the globe, 1880-1985, Temperature change actually means temperature against an arbitrary zero point. From James Hansen and Sergej Lebedeff, "Global Trends of Measured Surface Air Temperature", `Journal of Geophysical Research`, Vol. 92, No. D11, pages 13,345-13,372, November 20, 1987."
• Number of Earthquakes per Year 1900-1988 (>= 7.0)
• "Source: National Earthquake Information Center. Different lists will give different numbers depending on the formula used for calculating the magnitude."
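
The velocity-of-money entry above rests on the identity MV = PY. As a minimal sketch of that arithmetic, the function below solves for V. The function name, parameter names, and sample figures are hypothetical and chosen only for illustration; they are not taken from the dataset.

```python
def velocity_of_money(price_level: float, real_output: float, money_supply: float) -> float:
    """Solve the equation of exchange M * V = P * Y for V."""
    if money_supply <= 0:
        raise ValueError("money_supply must be positive")
    return price_level * real_output / money_supply


# Hypothetical figures, for illustration only (not from the dataset).
v = velocity_of_money(price_level=1.5, real_output=2000.0, money_supply=1000.0)
print(v)  # 3.0
```

A value of 3.0 here would mean that each unit of money is spent three times, on average, over the period.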

### Text Data

• Enron Emails
• Search through Enron's publicly accessible emails.
• Bag of Words
• Collection of word counts for various types of documents, including Enron emails, scientific papers, and New York Times articles.
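
A bag-of-words representation reduces each document to its word counts and ignores word order. The sketch below shows one way such counts could be produced from raw text. The tokenization rule (lowercase, runs of letters, digits, and apostrophes) and the sample sentence are assumptions made for illustration, not the preprocessing used for the dataset above.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count word occurrences in text, ignoring case and word order."""
    # Assumed tokenization: lowercase runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


# Hypothetical sample document.
counts = bag_of_words("The cat saw the other cat.")
print(counts["the"], counts["cat"], counts["saw"])  # 2 2 1
```

Two documents can then be compared by their count vectors, which is why this representation is a common starting point for text classification and topic modeling.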