Machine Learning/Kaggle Social Network Contest/Network Description

From Noisebridge
< Machine Learning | Kaggle Social Network Contest(Difference between revisions)
Jump to: navigation, search
Line 6: Line 6:
 
* number of edges: 7,237,983
 
* number of edges: 7,237,983
  
* The Graph is '''not''' weakly connected! This means that it can be broken down into at least two discrete subgraphs.
+
== Conectivity ==
 +
"A digraph is strongly connected if every vertex is reachable from every other following the directions of the arcs. On the contrary, a digraph is weakly connected if its underlying undirected graph is connected. A weakly connected graph can be thought of as a digraph in which every vertex is "reachable" from every other but not necessarily following the directions of the arcs. A strong orientation is an orientation that produces a strongly connected digraph." [http://en.wikipedia.org/wiki/Glossary_of_graph_theory wikipedia]
  
 +
* The Graph is '''not''' weakly connected
 +
* It contains 27 subgraphs This means that it can be broken down into at least two discrete subgraphs.
 +
** c.f. [http://cneurocvs.rmki.kfki.hu/igraph/doc/R/clusters.html igraph clustering]
 +
** There is one very large cluster containing all but 154 verticies, then 4 with size 10 - 37, 8 sized 3 - 7, 13 size 2 and one lonely dude.
 +
 +
* I also grabbed the number of strongly connected subgraphs
 +
{| border="1"
 +
|-
 +
!| Cluster Size
 +
| 1 
 +
|  2 
 +
|  3 
 +
|  4 
 +
|  5   
 +
|  9 
 +
| 10
 +
| 32464
 +
|-
 +
!| freq
 +
|1100647   
 +
| 162   
 +
| 18   
 +
|5 
 +
|4   
 +
|  1 
 +
|    1 
 +
|    1
 +
|}
 
* Diameter of the directed graph
 
* Diameter of the directed graph
 
** This is the longest of the shortest directed paths between two nodes
 
** This is the longest of the shortest directed paths between two nodes

Revision as of 20:59, 22 November 2010

Here we can put the descriptive statistics of the network:

  • Number of fully sampled nodes: 37,689
    • ie the unique "outnodes" in the edge list
  • Total number of nodes: 1,133,547
  • number of edges: 7,237,983

Conectivity

"A digraph is strongly connected if every vertex is reachable from every other following the directions of the arcs. On the contrary, a digraph is weakly connected if its underlying undirected graph is connected. A weakly connected graph can be thought of as a digraph in which every vertex is "reachable" from every other but not necessarily following the directions of the arcs. A strong orientation is an orientation that produces a strongly connected digraph." wikipedia

  • The Graph is not weakly connected
  • It contains 27 subgraphs This means that it can be broken down into at least two discrete subgraphs.
    • c.f. igraph clustering
    • There is one very large cluster containing all but 154 verticies, then 4 with size 10 - 37, 8 sized 3 - 7, 13 size 2 and one lonely dude.
  • I also grabbed the number of strongly connected subgraphs
Cluster Size 1 2 3 4 5 9 10 32464
freq 1100647 162 18 5 4 1 1 1
  • Diameter of the directed graph
    • This is the longest of the shortest directed paths between two nodes
    • R igraph
      • diameter (dg, directed = TRUE, unconnected = TRUE)
      • Was taking forever so I aborted (after 34 minutes...)
  • Total number of direct neighbours out: 7 275 672, in: 508 688, all: 7 473 273
    • For each of our 38k I calculated the number of outbound neighbours and summed it
    • R igraph:
      • sum(neighborhood.size(dg, 1, nodes=myGuys, mode="out"))
      • mode = "in", "out" or "all"
Personal tools