# Machine Learning/Kaggle Social Network Contest/Features

From Noisebridge


## Revision as of 22:54, 19 November 2010

## TODO

- Precisely define the listed features

## Possible Features

- Node Features
  - nodeid
  - outdegree
  - indegree
  - local clustering coefficient
  - reciprocation of inbound probability (num of edges returned / num of inbound edges)
  - reciprocation of outbound probability (num of edges returned / num of outbound edges)
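The node features above are not yet precisely defined, so the following is only a minimal sketch under stated assumptions. It assumes the graph arrives as `(nodeid, nodetofollowid)` pairs, that an edge counts as "returned" when the reverse edge also exists, and that the local clustering coefficient is the fraction of ordered pairs of a node's neighbours (inbound or outbound) joined by a directed edge. The helper names `build_adjacency` and `node_features` are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build successor and predecessor sets from (source, target) pairs."""
    succ, pred = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)
        pred[dst].add(src)
    return succ, pred


def node_features(node, succ, pred):
    """Per-node features; every definition here is an assumption."""
    out_nbrs = succ.get(node, set())
    in_nbrs = pred.get(node, set())
    mutual = out_nbrs & in_nbrs  # edges that exist in both directions

    # Assumed local clustering: directed links among neighbours divided
    # by the number of ordered neighbour pairs, k * (k - 1).
    nbrs = (out_nbrs | in_nbrs) - {node}
    k = len(nbrs)
    links = sum(
        1
        for u in nbrs
        for v in succ.get(u, ())
        if v in nbrs and v != u
    )
    clustering = links / (k * (k - 1)) if k > 1 else 0.0

    return {
        "outdegree": len(out_nbrs),
        "indegree": len(in_nbrs),
        "local_clustering": clustering,
        "recip_inbound": len(mutual) / len(in_nbrs) if in_nbrs else 0.0,
        "recip_outbound": len(mutual) / len(out_nbrs) if out_nbrs else 0.0,
    }


if __name__ == "__main__":
    toy_edges = [(1, 2), (2, 1), (1, 3), (3, 2), (4, 1)]  # made-up data
    succ, pred = build_adjacency(toy_edges)
    print(node_features(1, succ, pred))
```

Nodes with no inbound or outbound edges get a reciprocation value of 0.0 here; treating that case as missing instead may suit some models better.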

- Edge Features
  - nodetofollowid
  - shortest distance nodeid to nodetofollowid
  - density? (~~median path length~~)
  - is nodetofollowid following nodeid?
  - number of common friends
  - indegrees & outdegrees of nodetofollowid
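Most of the edge features above can be sketched for a candidate pair as follows. This is an illustration under assumptions, not a settled definition: the graph is given as dicts of successor and predecessor sets, shortest distance is the hop count along directed edges found by breadth-first search, and a "friend" is any node connected in either direction. The density feature is left out because the page does not define it. The names `shortest_distance` and `edge_features` are hypothetical.

```python
from collections import deque


def shortest_distance(succ, source, target):
    """Hop count along directed edges via BFS; None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in succ.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def edge_features(succ, pred, nodeid, nodetofollowid):
    """Features for the candidate edge nodeid -> nodetofollowid."""

    def friends(n):
        # Assumption: a friend is any in- or out-neighbour,
        # excluding the two endpoints of the candidate edge.
        linked = succ.get(n, set()) | pred.get(n, set())
        return linked - {nodeid, nodetofollowid}

    return {
        "shortest_distance": shortest_distance(succ, nodeid, nodetofollowid),
        "follows_back": nodeid in succ.get(nodetofollowid, set()),
        "common_friends": len(friends(nodeid) & friends(nodetofollowid)),
        "target_indegree": len(pred.get(nodetofollowid, set())),
        "target_outdegree": len(succ.get(nodetofollowid, set())),
    }


if __name__ == "__main__":
    # Made-up toy graph: 1->2, 2->3, 3->1, 4->1, 4->3
    succ = {1: {2}, 2: {3}, 3: {1}, 4: {1, 3}}
    pred = {1: {3, 4}, 2: {1}, 3: {2, 4}}
    print(edge_features(succ, pred, 1, 3))
```

An unreachable target yields `None` for the distance, so a downstream model would need an explicit encoding for that case, such as a large sentinel value or a separate indicator column.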

- Network features
  - unweighted random walk score
  - global clustering coefficient
  - Adamic-Adar score
    - see [original paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.108.1370&rep=rep1&type=pdf)
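Two of the network features above can be sketched as below. The Adamic-Adar score is taken in its common link-prediction form, the sum of 1 / log(degree) over the neighbours shared by two nodes, and the global clustering coefficient is computed as closed triplets divided by all triplets. Both treat the graph as undirected (a dict mapping each node to its neighbour set), which is an assumption, since the contest graph is directed. The random walk score is not sketched because the page does not define it. The function names are hypothetical.

```python
import math
from itertools import combinations


def adamic_adar(neighbors, x, y):
    """Sum of 1 / log(degree) over neighbours shared by x and y."""
    common = (neighbors.get(x, set()) & neighbors.get(y, set())) - {x, y}
    # A shared neighbour is adjacent to both x and y, so its degree
    # is at least 2 and the logarithm is strictly positive.
    return sum(1.0 / math.log(len(neighbors[z])) for z in common)


def global_clustering(neighbors):
    """Closed triplets divided by all triplets (transitivity)."""
    closed = 0
    triplets = 0
    for nbrs in neighbors.values():
        # Each unordered pair of neighbours forms a triplet centred here.
        for u, w in combinations(nbrs, 2):
            triplets += 1
            if w in neighbors.get(u, set()):
                closed += 1
    return closed / triplets if triplets else 0.0


if __name__ == "__main__":
    # Made-up toy graph: triangle 1-2-3 with a pendant node 4 on node 3.
    neighbors = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    print(adamic_adar(neighbors, 1, 2))
    print(global_clustering(neighbors))
```

The global clustering coefficient is a single number for the whole graph, so it is constant across candidate edges within one graph and mainly useful for comparing graphs or subsamples.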

The response variable is the probability that the edge from nodeid to nodetofollowid will be created in the future.