Machine Learning/Kaggle Social Network Contest/load data

From Noisebridge
Revision as of 22:40, 18 November 2010 by Jjhale (Talk | contribs)


How to load the network into networkx

networkx is a network analysis package for Python. It can be installed using easy_install.

The network can be loaded using the read_edgelist function in networkx, e.g.:

import networkx as nx
DG = nx.read_edgelist('social_train.csv', create_using=nx.DiGraph(), nodetype=int, delimiter=',')
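To make the arguments concrete, here is a minimal sketch on a tiny made-up edge list. The three edges and the temporary file are illustrative assumptions, not contest data. It should show that create_using=nx.DiGraph() keeps edge direction and that nodetype=int converts node labels from strings to integers.

```python
import os
import tempfile

import networkx as nx

# Hypothetical three-edge sample in the same "source,target" layout as social_train.csv.
sample = "1,2\n2,3\n3,1\n"

# Write the sample to a temporary file so read_edgelist has a path to open.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write(sample)
    path = f.name

try:
    DG = nx.read_edgelist(path, create_using=nx.DiGraph(), nodetype=int, delimiter=',')
finally:
    os.remove(path)

# Node and edge counts, then a direction check: 1 -> 2 exists, 2 -> 1 does not.
print(DG.number_of_nodes(), DG.number_of_edges())
print(DG.has_edge(1, 2), DG.has_edge(2, 1))
```

Without nodetype=int the node labels would stay as strings such as '1', so lookups by integer ID would not find them.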

Loading 1M rows of the edge list with read_edgelist took 21s on a Mac Pro with 3 GB of memory and a 2.8 GHz quad-core processor.

I can do this in 15s with the alternative method below, which seems to run quicker for me (Joe). It reads the CSV file directly and adds the edges one at a time.

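import csv
import time

import networkx as nx

# time.clock() was removed in Python 3.8; perf_counter() measures elapsed time.
t0 = time.perf_counter()
DG = nx.DiGraph()

# Each row is "source,target"; add it as a directed edge between integer node IDs.
with open('social_train.csv', newline='') as f:
    for row in csv.reader(f, delimiter=','):
        DG.add_edge(int(row[0]), int(row[1]))

print("Loaded in", time.perf_counter() - t0, "s")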
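
Before relying on the faster loader, it may be worth checking that both methods build the same graph. The following is a rough sketch under stated assumptions: the helper names load_with_read_edgelist and load_with_csv are hypothetical, and the 2000-edge synthetic file is a stand-in for social_train.csv, so the timings it prints say little about the full data set.

```python
import csv
import os
import tempfile
import time

import networkx as nx


def load_with_read_edgelist(path):
    # First method from this page: let networkx parse the file.
    return nx.read_edgelist(path, create_using=nx.DiGraph(), nodetype=int, delimiter=',')


def load_with_csv(path):
    # Second method from this page: parse with csv and add edges one by one.
    graph = nx.DiGraph()
    with open(path, newline='') as f:
        for row in csv.reader(f, delimiter=','):
            graph.add_edge(int(row[0]), int(row[1]))
    return graph


# Synthetic stand-in for the contest file: a 1000-node directed ring plus chords.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    for i in range(1000):
        f.write(f"{i},{(i + 1) % 1000}\n")
        f.write(f"{i},{(i + 7) % 1000}\n")
    path = f.name

try:
    graphs = {}
    timings = {}
    for name, loader in [("read_edgelist", load_with_read_edgelist), ("csv", load_with_csv)]:
        t0 = time.perf_counter()
        graphs[name] = loader(path)
        timings[name] = time.perf_counter() - t0
finally:
    os.remove(path)

# Compare the directed edge sets produced by the two loaders.
same = set(graphs["read_edgelist"].edges()) == set(graphs["csv"].edges())
print("Same edge set:", same)
for name, seconds in timings.items():
    print(name, "loaded in", seconds, "s")
```

To compare on the real data, point both helpers at social_train.csv instead of the temporary file.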