Editing Machine Learning/Kaggle Social Network Contest/Problem Representation
== TODO ==
* Someone with a lot of memory (>5.5 GB) should double-check the number of unique nodes by loading the edge list in networkx.
* Come up with a plan of attack.
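The unique-node count can also be checked without building a full graph, by streaming the edge list and keeping only a set of node ids. The sketch below is a rough illustration, not the contest's actual loader: it assumes each row of the edge file is a comma-separated `source,target` pair, and the function name `count_unique_nodes` and its `max_rows` parameter are made up for this example.

```python
import csv
import io


def count_unique_nodes(lines, max_rows=None):
    """Count distinct node ids in an iterable of CSV edge-list lines.

    Assumes (hypothetically) that each row looks like "source,target".
    """
    nodes = set()
    for row_number, row in enumerate(csv.reader(lines)):
        if max_rows is not None and row_number >= max_rows:
            break
        if len(row) < 2:
            continue  # skip blank or malformed rows
        nodes.add(row[0].strip())
        nodes.add(row[1].strip())
    return len(nodes)


# Tiny in-memory stand-in for the real edge file.
sample = io.StringIO("1,2\n2,3\n3,1\n4,1\n")
unique_node_count = count_unique_nodes(sample)
```

For the real file, the same function could be handed an open file object, with `max_rows=3_000_000` to mirror the 3M-row sample discussed below.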
== Idea A ==
Construct a huge CSV file containing each possible directed link and a bunch of features associated with it, then do some supervised learning on it.
node_i, node_j, feature_ij_1, feature_ij_2, ...
This file would be very long. When loading the first 3M rows of the edge list file I get 732 166 nodes, and with one row per ordered pair of distinct nodes the file would need (732 166^2) - 732 166 = 536 066 319 390 rows.
Say each column took up 7 characters and there were 12 columns (i.e. 10 features); each row would then be 84 bytes. That makes about 4.5 x 10^13 bytes = 41 937 gigabytes.
And that is using only the first 3 million rows.
(Note: if I have miscounted the number of unique nodes and there really are only 38k, we'd still be dealing with a file of about 113 GB.)
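The arithmetic above can be re-checked with a few lines of Python. This is only a back-of-the-envelope sketch using the same assumptions as the text (12 columns of 7 characters each, gigabytes counted as 2^30 bytes); the helper names are invented for the example.

```python
def ordered_pairs(n_nodes):
    """Number of possible directed links between distinct nodes."""
    return n_nodes * (n_nodes - 1)


def file_size_bytes(n_nodes, n_columns=12, chars_per_column=7):
    """Estimated file size with one fixed-width row per ordered pair."""
    return ordered_pairs(n_nodes) * n_columns * chars_per_column


GIB = 2 ** 30  # the gigabyte figures in the text are in units of 2^30 bytes

rows_large = ordered_pairs(732_166)               # 536066319390 rows
size_large_gib = file_size_bytes(732_166) / GIB   # just under 41 938
size_small_gib = file_size_bytes(38_000) / GIB    # just under 113
```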
This number could be culled by considering only the nodes in some neighbourhood - but I figure that would only give us information about nodes which are connected.
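One possible reading of "some neighbourhood" is to keep only pairs that are exactly two hops apart and not yet linked. The sketch below illustrates that reading on a toy graph; the two-hop rule and the name `two_hop_candidates` are assumptions made for this example, not something settled on this page. It also shows the limitation noted above: pairs further apart never appear.

```python
from collections import defaultdict


def two_hop_candidates(edges):
    """Return (source, target) pairs reachable in two hops but not linked directly."""
    successors = defaultdict(set)
    for source, target in edges:
        successors[source].add(target)

    candidates = set()
    for source, direct in list(successors.items()):
        for middle in direct:
            # .get avoids creating empty entries for nodes with no out-links
            for target in successors.get(middle, ()):
                if target != source and target not in direct:
                    candidates.add((source, target))
    return candidates


example_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
example_candidates = two_hop_candidates(example_edges)
```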
== Idea B ==
We could perform some kind of online learning on the network, where we compute features for one pair of nodes at a time and then update the parameters. This would take about 500 billion steps (one per ordered pair of nodes) - which sounds like a lot, and again that is based on just the first 3M rows of the edge file.
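To make the idea concrete, here is a minimal sketch of such an online loop, assuming a logistic-regression model updated by stochastic gradient descent. The model choice, the three toy features in `pair_features`, and the learning rate are all placeholders chosen for illustration; nothing here has been decided for the contest.

```python
import math


def pair_features(source, target, successors):
    """Hypothetical features for the directed pair (source, target)."""
    out_source = successors.get(source, set())
    out_target = successors.get(target, set())
    return [
        1.0,                                    # bias term
        float(len(out_source & out_target)),    # shared out-neighbours
        1.0 if source in out_target else 0.0,   # does target already link back?
    ]


def sgd_step(weights, features, label, learning_rate=0.1):
    """One online logistic-regression update; returns (new_weights, prediction)."""
    score = sum(w * x for w, x in zip(weights, features))
    score = max(-30.0, min(30.0, score))  # clamp to keep exp() well behaved
    prediction = 1.0 / (1.0 + math.exp(-score))
    error = prediction - label
    new_weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
    return new_weights, prediction


# Toy graph: each node maps to the set of nodes it links to.
successors = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}}
nodes = sorted(successors)

weights = [0.0, 0.0, 0.0]
steps = 0
for source in nodes:
    for target in nodes:
        if source == target:
            continue
        label = 1.0 if target in successors[source] else 0.0
        features = pair_features(source, target, successors)
        weights, _ = sgd_step(weights, features, label)
        steps += 1
```

Only the weight vector is kept in memory, so nothing like the Idea A file needs to be stored, but the loop still visits every ordered pair, which is where the roughly 500 billion steps come from.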