Machine Learning/Kaggle Social Network Contest/Features

From Noisebridge

Revision as of 22:54, 19 November 2010

TODO

  • Precisely define the listed features

Possible Features

  • Node Features
    • nodeid
    • outdegree (number of edges leaving the node)
    • indegree (number of edges coming into the node)
    • local clustering coefficient
    • inbound reciprocation probability (number of inbound edges the node returns / number of inbound edges)
    • outbound reciprocation probability (number of outbound edges that are returned / number of outbound edges)
  • Edge Features
    • nodetofollowid
    • shortest distance from nodeid to nodetofollowid
    • density? (replaces median path length)
    • is nodetofollowid following nodeid?
    • number of common friends
    • indegree and outdegree of nodetofollowid
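  • Network Features (from Backstrom and Leskovec, for a node s and a potential target c)
    • unweighted random walk score
    • global clustering coefficient
    • Adamic-Adar score
      • see original paper (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.108.1370&rep=rep1&type=pdf)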
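The TODO asks for precise definitions. One possible reading of the node and edge features is sketched below in Python. It rests on several assumptions that are not part of the contest data format:

* The graph is a dict in which `graph[u]` is the set of nodes that u follows.
* The local clustering coefficient is computed on the undirected version of the graph.
* "Common friends" means nodes connected to both endpoints in either direction.
* The shortest distance follows the follow direction and is `None` when unreachable.

The helper names (`reverse`, `node_features`, `shortest_distance`, `edge_features`) are hypothetical. The open-ended "density?" item is left out.

```python
from collections import deque
from typing import Dict, Optional, Set

# Hypothetical representation: graph[u] is the set of nodes that u follows (edges u -> v).
Graph = Dict[int, Set[int]]


def reverse(graph: Graph) -> Graph:
    """Build the inbound map: rev[v] is the set of nodes that follow v."""
    rev: Graph = {u: set() for u in graph}
    for u, outs in graph.items():
        for v in outs:
            rev.setdefault(v, set()).add(u)
    return rev


def node_features(graph: Graph, rev: Graph, v: int) -> Dict[str, float]:
    outs, ins = graph.get(v, set()), rev.get(v, set())
    mutual = len(outs & ins)  # edges "returned": both v -> u and u -> v exist
    # Local clustering coefficient on the undirected version of the graph (assumption).
    nbrs = sorted((outs | ins) - {v})
    k = len(nbrs)
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in graph.get(nbrs[i], set()) or nbrs[i] in graph.get(nbrs[j], set())
    )
    return {
        "outdegree": len(outs),
        "indegree": len(ins),
        "local_clustering": 2 * links / (k * (k - 1)) if k > 1 else 0.0,
        "inbound_reciprocation": mutual / len(ins) if ins else 0.0,
        "outbound_reciprocation": mutual / len(outs) if outs else 0.0,
    }


def shortest_distance(graph: Graph, s: int, t: int) -> Optional[int]:
    """Breadth-first search along follow edges; None if t is unreachable from s."""
    seen, queue = {s}, deque([(s, 0)])
    while queue:
        u, d = queue.popleft()
        if u == t:
            return d
        for w in graph.get(u, set()):
            if w not in seen:
                seen.add(w)
                queue.append((w, d + 1))
    return None


def edge_features(graph: Graph, rev: Graph, s: int, t: int) -> Dict[str, object]:
    # "Friends" = nodes linked to the endpoint in either direction (assumption).
    nbrs_s = graph.get(s, set()) | rev.get(s, set())
    nbrs_t = graph.get(t, set()) | rev.get(t, set())
    return {
        "shortest_distance": shortest_distance(graph, s, t),
        "target_follows_source": s in graph.get(t, set()),
        "common_friends": len((nbrs_s & nbrs_t) - {s, t}),
        "target_indegree": len(rev.get(t, set())),
        "target_outdegree": len(graph.get(t, set())),
    }


# Toy example (made-up graph): 1 -> 2, 1 -> 3, 2 -> 1, 2 -> 3, 3 -> 4
g = {1: {2, 3}, 2: {1, 3}, 3: {4}, 4: set()}
rev = reverse(g)
print(node_features(g, rev, 1))
# {'outdegree': 2, 'indegree': 1, 'local_clustering': 1.0, 'inbound_reciprocation': 1.0, 'outbound_reciprocation': 0.5}
print(edge_features(g, rev, 1, 4))
# {'shortest_distance': 2, 'target_follows_source': False, 'common_friends': 1, 'target_indegree': 1, 'target_outdegree': 0}
```

Both reciprocation probabilities share the same numerator: the number of mutual follows. They differ only in whether the count is divided by indegree or by outdegree.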

The response variable is the probability that the edge from nodeid to nodetofollowid will be created in the future.
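The network features can be computed for a source node s and a candidate target c. Below is a minimal sketch under these assumptions, none of which come from the contest or the cited paper:

* The Adamic-Adar score sums 1/log(degree(z)) over the common neighbours z of s and c, using neighbours in either direction.
* The global clustering coefficient is taken as transitivity: closed triples divided by connected triples on the undirected graph. Another common definition averages the local coefficients.
* The unweighted random walk score is read as a random walk with restart from s along follow edges, with uniform transition probabilities.
* The walk uses a hypothetical restart probability of 0.15 and 50 iterations.
* A walk stuck at a node with no out-edges jumps back to s.

```python
import math
from typing import Dict, Set

# Hypothetical representation: graph[u] is the set of nodes that u follows.
Graph = Dict[int, Set[int]]


def undirected(graph: Graph) -> Graph:
    """Symmetrise: u and v are neighbours if either follows the other."""
    und: Graph = {}
    for u, outs in graph.items():
        und.setdefault(u, set())
        for v in outs:
            if v != u:
                und[u].add(v)
                und.setdefault(v, set()).add(u)
    return und


def adamic_adar(und: Graph, s: int, c: int) -> float:
    """Sum of 1 / log(degree(z)) over common neighbours z of s and c."""
    if s == c:
        return 0.0
    common = (und.get(s, set()) & und.get(c, set())) - {s, c}
    # Each common neighbour touches both s and c, so its degree is >= 2 and log > 0.
    return sum(1.0 / math.log(len(und[z])) for z in common)


def global_clustering(und: Graph) -> float:
    """Transitivity: fraction of connected triples (paths of length 2) that are closed."""
    closed = triples = 0
    for nbrs in und.values():
        ns = sorted(nbrs)
        k = len(ns)
        triples += k * (k - 1) // 2
        closed += sum(1 for i in range(k) for j in range(i + 1, k) if ns[j] in und[ns[i]])
    return closed / triples if triples else 0.0


def random_walk_scores(graph: Graph, s: int, restart: float = 0.15, iters: int = 50) -> Dict[int, float]:
    """Random walk with restart from s; each out-edge is taken with equal probability."""
    p: Dict[int, float] = {s: 1.0}
    for _ in range(iters):
        nxt: Dict[int, float] = {s: restart}  # restart mass always returns to s
        for u, mass in p.items():
            outs = graph.get(u, set())
            if not outs:
                # Dangling node: send its remaining mass back to s (assumption).
                nxt[s] += (1 - restart) * mass
                continue
            share = (1 - restart) * mass / len(outs)
            for v in outs:
                nxt[v] = nxt.get(v, 0.0) + share
        p = nxt
    return p


# Toy example (made-up graph): 1 -> 2, 1 -> 3, 2 -> 1, 2 -> 3, 3 -> 4
g = {1: {2, 3}, 2: {1, 3}, 3: {4}, 4: set()}
und = undirected(g)
print(adamic_adar(und, 1, 4))  # 1 / log(3), about 0.910
print(global_clustering(und))  # 0.6
scores = random_walk_scores(g, 1)
print(scores.get(4, 0.0))  # score of candidate target 4 for source 1
```

The random walk scores over all nodes sum to 1, so the score for c is already a probability. It is not necessarily a calibrated estimate of the response variable, and would more likely serve as one input feature among the others.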
