Machine Learning/Kaggle Social Network Contest/Features: Difference between revisions

From Noisebridge
** Adamic-Adar score
*** see [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.108.1370&rep=rep1&type=pdf original paper]
*** R igraph: [http://cneurocvs.rmki.kfki.hu/igraph/doc/R/similarity.html similarity.invlogweighted]
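
As a rough illustration of the Adamic-Adar score, the sketch below sums 1/log(degree) over the common neighbors of two nodes. It assumes an undirected graph stored as a dict of neighbor sets; the helper name `adamic_adar` and the toy graph are made up for this example, and the contest graph is directed, so which degree to use there (in, out, or total) is still an open choice.

```python
import math

def adamic_adar(neighbors, u, v):
    """Sum 1/log(degree(z)) over the common neighbors z of u and v.

    `neighbors` maps each node to the set of its neighbors
    (undirected graph, assumed symmetric).
    """
    common = neighbors.get(u, set()) & neighbors.get(v, set())
    score = 0.0
    for z in common:
        degree = len(neighbors.get(z, ()))
        if degree > 1:  # skip degree 1, since log(1) == 0
            score += 1.0 / math.log(degree)
    return score

# Hypothetical toy graph: a and b share the neighbors c and d.
neighbors = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}
score_ab = adamic_adar(neighbors, "a", "b")
print(score_ab)
```

Common neighbors with few connections contribute more to the score than highly connected ones, which is the point of the inverse-log weighting.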

Revision as of 19:32, 22 November 2010

TODO

  • Precisely define the listed features

Possible Features

  • Node Features
    • nodeid
    • outdegree
    • indegree
    • local clustering coefficient
    • inbound reciprocation probability (number of inbound edges returned / number of inbound edges)
    • outbound reciprocation probability (number of outbound edges returned / number of outbound edges)
  • Edge Features
    • nodetofollowid
    • shortest distance from nodeid to nodetofollowid
    • density? (median path length)
    • does the reverse edge exist? (i.e., is nodetofollowid already following nodeid?)
    • number of common friends
    • indegree and outdegree of nodetofollowid
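
Several of the features listed above can be sketched from a plain list of directed (follower, followed) pairs. The code below is one possible reading, not a settled definition: the function names, the feature keys, and the toy edge list are hypothetical, "common friends" is assumed to mean nodes that both endpoints follow, and the local clustering coefficient and density features are left out.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Return (out_nbrs, in_nbrs) neighbor-set maps for directed edges."""
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        out_nbrs[src].add(dst)
        in_nbrs[dst].add(src)
    return out_nbrs, in_nbrs

def node_features(node, out_nbrs, in_nbrs):
    outs = out_nbrs.get(node, set())
    ins = in_nbrs.get(node, set())
    mutual = len(outs & ins)  # edges that are returned in both directions
    return {
        "outdegree": len(outs),
        "indegree": len(ins),
        "recip_inbound": mutual / len(ins) if ins else 0.0,
        "recip_outbound": mutual / len(outs) if outs else 0.0,
    }

def shortest_distance(src, dst, out_nbrs):
    """Breadth-first search along edge direction; None if unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in out_nbrs.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def edge_features(nodeid, nodetofollowid, out_nbrs, in_nbrs):
    follows_src = out_nbrs.get(nodeid, set())
    follows_dst = out_nbrs.get(nodetofollowid, set())
    return {
        "shortest_distance": shortest_distance(nodeid, nodetofollowid, out_nbrs),
        "reverse_edge_exists": nodeid in follows_dst,
        # Assumption: a "common friend" is a node both endpoints follow.
        "common_friends": len(follows_src & follows_dst),
        "target_indegree": len(in_nbrs.get(nodetofollowid, set())),
        "target_outdegree": len(follows_dst),
    }

# Hypothetical toy edge list of (follower, followed) pairs.
edges = [(1, 2), (2, 1), (1, 3), (3, 4), (2, 4), (4, 1)]
out_nbrs, in_nbrs = build_graph(edges)
node_1 = node_features(1, out_nbrs, in_nbrs)
edge_1_4 = edge_features(1, 4, out_nbrs, in_nbrs)
print(node_1)
print(edge_1_4)
```

When building training rows, the shortest distance would presumably be computed with the candidate edge itself absent from the graph, otherwise it is trivially 1.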

The response variable is the probability that the edge from nodeid to nodetofollowid will be created in the future.