Machine Learning/Datasets: Difference between revisions

Revision as of 00:57, 15 March 2011

Machine learning is a vast field and there are many different types of problems to be solved. If you find a dataset interesting, try to categorize it (or add a new category) and add it to the links below.

Classification

MNIST Handwritten Digits
- Classify handwritten digits using this dataset, a very popular one with lots of training examples.
Heart Disease
- Predict whether a person will have heart disease based on a subset of 76 factors.
Census Income
- Predict whether a person's income is greater than or less than 50k.

Regression

Boiling point in the Alps
- The boiling point of water at different barometric pressures.
Shocking Rats
- How does shocking a rat affect its ability to complete a maze?
Ice Cream Sales
- Predict the quantity of ice cream consumed based on some other variables.
Smoking and Respiratory Function
- How does smoking affect lung capacity?
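
As a starting point for the regression datasets above, here is a minimal sketch of an ordinary least-squares line fit in plain Python. The function name `fit_line` and the sample points are illustrative assumptions: the points are synthetic placeholders lying on y = 2x + 1, not values from any of the linked datasets.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic placeholder points on y = 2x + 1, NOT taken from any dataset above.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

To try it on a real dataset, replace `xs` and `ys` with one predictor column and the target column, for example barometric pressure and boiling point.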

Time Series

Gun-related Deaths in Australia
- "Deaths from gun-related homicides and suicides and non-gun-related homicides and suicides. Australia: 1915-2004. Source: Neill and Leigh (2007)."
Immigration Rates
- "Annual immigration into the United States: thousands. 1820 – 1962. From Kendall & Ord (1990), p.13."
Percent of Men with Beards 1866-1911
- "Percent of Men with full beards, 1866 – 1911. Source: Hipel and Mcleod (1994)."
Velocity of Money in America 1869-1960
- The velocity of money is the average number of times a unit of money changes hands over a year.
Changes in Global Air Temperature 1880-1985
- "Surface air temperature change for the globe, 1880-1985, Temperature change actually means temperature against an arbitrary zero point. From James Hansen and Sergej Lebedeff, "Global Trends of Measured Surface Air Temperature", `Journal of Geophysical Research`, Vol. 92, No. D11, pages 13,345-13,372, November 20, 1987."
Number of Earthquakes per Year 1900-1988 (>= 7.0)
- "Source: National Earthquake Information Center. Different lists will give different numbers depending on the formula used for calculating the magnitude."
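
For the velocity of money series listed above, the textbook definition can be sketched as nominal output divided by the money supply. This is an assumption about the general concept; the linked series may be built from a different measure. The function name and the numbers are hypothetical placeholders, not data from the series.

```python
def velocity_of_money(nominal_output, money_supply):
    """Textbook velocity: nominal output for a period divided by the money stock."""
    if money_supply <= 0:
        raise ValueError("money_supply must be positive")
    return nominal_output / money_supply


# Placeholder figures for illustration only, NOT taken from the dataset.
v = velocity_of_money(nominal_output=200.0, money_supply=50.0)
print(v)  # each unit of money is spent 4 times in the period
```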

Clustering

USDA Plants Data
- Automatically cluster plants based on 70 attributes.
Nutrients in Meat, Fish and Fowl
- Can you cluster the foods by animal type given the data?

Text Data

Enron Emails
- Search through Enron's publicly accessible emails.
Bag of Words
- Collection of word counts for various types of documents, including Enron emails, scientific papers, and New York Times articles.
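
To make "word counts" concrete, here is a minimal sketch of building a bag of words from raw text using only the Python standard library. The tokenization rule and the sample sentence are assumptions for illustration; the linked dataset ships precomputed counts in its own file format, which may differ from this.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


# Made-up sample sentence, NOT taken from any of the datasets above.
doc = "The quick brown fox jumps over the lazy dog. The dog sleeps."
counts = bag_of_words(doc)
print(counts.most_common(2))
```

Word order is discarded, so two documents with the same words in a different order produce the same counts.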

@@ Line 36: / Line 36: @@
 *[http://archive.ics.uci.edu/ml/datasets/Plants USDA Plants Data]
 **Automatically cluster plants based on 70 attributes.
+*[http://www.uni-koeln.de/themen/statistik/data/cluster/ Nutriens in Meat, Fish and Fowl]
+**Can you cluster into animal type given the data?
 ===Text Data===
