Machine Learning/Kaggle Social Network Contest/Problem Representation
* come up with a plan of attack.
== Idea A ==
Construct a huge CSV file containing each possible directed link and a bunch of features associated with it, then do some supervised learning on it.
Say each column took up 7 characters and there were 12 columns (i.e. the node pair plus 10 features): each row would then be about 84 bytes, which makes the whole file about 3,342 gigabytes.
(Note: if I have miscounted the number of unique nodes and there really are only 38k, we'd still be dealing with a 112 GB file.)
This number could be culled by considering just the nodes in some neighbourhood - but I figure that would only provide us with information about nodes which are already connected.
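The size estimate above can be checked with a few lines of arithmetic. This is a minimal sketch assuming the ~42 billion directed pairs implied by Idea B, 12 columns at roughly 7 characters each, and the alternative ~38k node count mentioned in the note; all of these numbers are the rough figures from the text, not measured values:

```python
# Rough size estimate for the "one row per directed pair" CSV of Idea A.
bytes_per_row = 7 * 12            # ~7 characters per column, 12 columns -> ~84 bytes
pairs_full = 42_000_000_000       # ~42 billion candidate directed pairs
pairs_small = 38_000 * 37_999     # ordered pairs if there are really only ~38k nodes

print(f"full graph: {pairs_full * bytes_per_row / 2**30:,.0f} GiB")
print(f"38k nodes:  {pairs_small * bytes_per_row / 2**30:,.0f} GiB")
```

The 38k-node case comes out near the 112 GB quoted above; the full-graph case lands in the low thousands of gigabytes, consistent with the ~3,342 GB estimate.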
== Idea B ==
We could perform some kind of online learning on the network, where we compute features for a pair of nodes and then update the parameters. This would take 42 billion steps - which sounds like a lot.
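One way to picture this is online logistic regression: for each candidate pair, compute a small feature vector and take a single gradient step. The sketch below is hypothetical - the `pair_features` function (bias, common out-neighbours, out-degrees) and the toy graph are made-up illustrations, not the contest's actual features or data:

```python
import math

def pair_features(u, v, out_nbrs):
    # Hypothetical features for a candidate directed edge (u, v):
    # bias term, number of common out-neighbours, out-degree of each endpoint.
    nu, nv = out_nbrs.get(u, set()), out_nbrs.get(v, set())
    return [1.0, float(len(nu & nv)), float(len(nu)), float(len(nv))]

def sgd_update(w, x, y, lr=0.1):
    # One online logistic-regression step for a single labelled pair.
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))          # predicted link probability
    return [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]

# Toy graph: a few known edges (label 1) and absent pairs (label 0).
out_nbrs = {1: {2, 3}, 2: {3}, 4: set()}
train = [((1, 2), 1), ((1, 3), 1), ((2, 3), 1), ((4, 1), 0), ((4, 2), 0)]

w = [0.0] * 4
for _ in range(50):                          # a few passes over the pairs
    for (u, v), y in train:
        w = sgd_update(w, pair_features(u, v, out_nbrs), y)
```

Each step only touches one row's worth of data, so nothing like the 3 TB file from Idea A ever needs to exist; the cost moves from disk space to the 42 billion updates.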