Machine Learning/Kaggle Social Network Contest/Features

From Noisebridge

Revision as of 22:30, 19 November 2010

TODO

  • Precisely define the listed features

Possible Features

  • nodeid
  • nodetofollowid
  • median path length
  • shortest distance from nodeid to nodetofollowid
  • inbound edges
  • outbound edges
  • clustering coefficient
  • reciprocation probability (number of outbound edges that are returned / number of outbound edges)
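
As a rough illustration of how a few of these features could be computed, here is a minimal sketch. It assumes the graph is available as a list of directed (source, destination) pairs; the helper names (build_graph, reciprocation_probability, shortest_distance) and the toy edge list are hypothetical and not part of the contest data or any library. Median path length and clustering coefficient are left out because their definitions for this directed graph have not been pinned down yet.

```python
from collections import deque

def build_graph(edges):
    """Return (out_nbrs, in_nbrs) adjacency dicts for directed (src, dst) pairs."""
    out_nbrs, in_nbrs = {}, {}
    for src, dst in edges:
        out_nbrs.setdefault(src, set()).add(dst)
        out_nbrs.setdefault(dst, set())
        in_nbrs.setdefault(dst, set()).add(src)
        in_nbrs.setdefault(src, set())
    return out_nbrs, in_nbrs

def reciprocation_probability(node, out_nbrs, in_nbrs):
    """Fraction of node's outbound edges that are returned (None if it has none)."""
    out = out_nbrs.get(node, set())
    if not out:
        return None
    returned = out & in_nbrs.get(node, set())
    return len(returned) / len(out)

def shortest_distance(src, dst, out_nbrs):
    """Directed breadth-first hop count from src to dst (None if unreachable)."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in out_nbrs.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Toy graph (made up for illustration): 1 and 2 follow each other, 1 -> 3 -> 4.
edges = [(1, 2), (2, 1), (1, 3), (3, 4)]
out_nbrs, in_nbrs = build_graph(edges)
features = {
    "inbound_edges": len(in_nbrs[1]),
    "outbound_edges": len(out_nbrs[1]),
    "reciprocation": reciprocation_probability(1, out_nbrs, in_nbrs),
    "shortest_distance": shortest_distance(1, 4, out_nbrs),
}
```

Returning None for a node with no outbound edges (or an unreachable target) is one possible convention; a model may need those cases encoded differently, for example as a separate indicator column.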

The response variable is the probability that the edge from nodeid to nodetofollowid will be created in the future.

From Backstrom and Leskovec, for a node s and a potential target c:

  • Network features
    • unweighted random walk score
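    • Adamic-Adar score
      • see the original paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.108.1370&rep=rep1&type=pdf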
    • number of common friends
    • indegrees and outdegrees of s
      • the indegree is the number of edges coming into node s
      • the outdegree is the number of edges leaving node s
    • indegrees and outdegrees of c
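
The pairwise network features above could be sketched roughly as follows. This is only an illustration under stated assumptions: "friends" are taken to be neighbors in either direction (the graph is treated as undirected), the Adamic-Adar score is computed as the sum of 1 / log(degree) over common friends, and the unweighted random walk score is approximated by a random walk with restart from s using uniform edge weights. The function names, the restart value of 0.15, the iteration count, and the toy edge list are all hypothetical choices.

```python
import math

def undirected_neighbors(edges):
    """Map each node to the set of nodes it shares an edge with, in either direction."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return nbrs

def common_friends(s, c, nbrs):
    """Nodes adjacent to both s and c."""
    return (nbrs.get(s, set()) & nbrs.get(c, set())) - {s, c}

def adamic_adar(s, c, nbrs):
    """Sum of 1 / log(degree(z)) over common friends z; rarer friends count more."""
    total = 0.0
    for z in common_friends(s, c, nbrs):
        degree = len(nbrs[z])
        if degree > 1:  # guard against log(1) == 0
            total += 1.0 / math.log(degree)
    return total

def random_walk_scores(s, nbrs, restart=0.15, iterations=50):
    """Approximate visiting probabilities of a walk that restarts at s (power iteration)."""
    scores = {node: 0.0 for node in nbrs}
    scores[s] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in nbrs}
        for node, mass in scores.items():
            if mass == 0.0:
                continue
            nxt[s] += restart * mass  # jump back to the start node
            targets = nbrs[node]
            if targets:
                share = (1.0 - restart) * mass / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                nxt[s] += (1.0 - restart) * mass  # dead end: return to s
        scores = nxt
    return scores

# Toy graph (made up for illustration).
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
nbrs = undirected_neighbors(edges)
s, c = 1, 4
num_common = len(common_friends(s, c, nbrs))
aa_score = adamic_adar(s, c, nbrs)
walk = random_walk_scores(s, nbrs)
walk_score = walk[c]
```

To respect edge direction instead, the same functions could be fed a dict of outbound neighbors only, as long as every node appears as a key; which variant matches the contest graph better is an open question.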