Loading and Exporting Data With Pandas

One of the features that makes networkx so useful is its ability to import and export data from a variety of sources. This matters because data comes in all shapes and sizes and is not always consistent. The purpose of this blog post is to walk through some of the standard techniques for reading and writing graphs using networkx and pandas.

To allow for more flexibility and control, networkx can also convert graphs to and from pandas data frames. Combining the two opens up many more options for reading and writing data. Using pandas alone produces a total of 399 distinct combinations.

Network Representations

Before we get into how to import and export data, it's worth going through some of the ways in which graphs can be represented as data. As mentioned in previous blog posts, networks (also known as graphs) are a collection of nodes and edges. We are essentially representing two things: an entity (a node) and a relationship (an edge).

Edge list

An edge list is exactly what it says: a list of edges. It usually comes in the form of a table with two columns, one for the source node and one for the target node. Depending on the type of graph, it may also feature additional columns containing edge attributes, such as a timestamp.

| Source | Target |
|--------|--------|
| A      | B      |
| A      | C      |
| C      | D      |
| D      | C      |
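In Python, an edge list maps naturally onto a list of (source, target) tuples. The sketch below builds the graph from the table above; the timestamp attribute and its value are hypothetical, included only to show how an extra attribute column would be carried along as a per-edge dict.

```python
import networkx as nx

# The edge list from the table above, as (source, target) pairs
edges = [("A", "B"), ("A", "C"), ("C", "D"), ("D", "C")]
G = nx.DiGraph()
G.add_edges_from(edges)

# An attribute column becomes a dict per edge (u, v, attrs).
# The timestamp value here is hypothetical, purely for illustration.
G.add_edges_from([("A", "B", {"timestamp": "2021-01-01"})])

print(list(G.edges(data=True)))
```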

Adjacency matrix

An adjacency matrix is an n-by-n square matrix, where n is the number of nodes, used to indicate the presence of an edge between nodes. Reading it row by row, a 1 in row i and column j indicates an edge from node i to node j, while a 0 indicates no edge. The matrix below represents the same graph as the edge list above.

|   | A | B | C | D |
|---|---|---|---|---|
| A | 0 | 1 | 1 | 0 |
| B | 0 | 0 | 0 | 0 |
| C | 0 | 0 | 0 | 1 |
| D | 0 | 0 | 1 | 0 |
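networkx can produce this matrix directly as a pandas data frame with to_pandas_adjacency, and rebuild a graph from one with from_pandas_adjacency. A minimal sketch; passing nodelist fixes the row and column order, and dtype=int is chosen here only so the output shows 0/1 instead of floats.

```python
import networkx as nx

G = nx.DiGraph([("A", "B"), ("A", "C"), ("C", "D"), ("D", "C")])

# Rows are source nodes, columns are target nodes.
# Unweighted edges count as 1, missing edges as 0.
adj = nx.to_pandas_adjacency(G, nodelist=["A", "B", "C", "D"], dtype=int)
print(adj)

# Going the other way: every non-zero cell becomes a directed edge
H = nx.from_pandas_adjacency(adj, create_using=nx.DiGraph)
print(H.edges())
```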

Using NetworkX

As shown above, there are quite a few ways to represent networks, and correspondingly many ways to import and export them. To keep things simple, we will go through some of the most popular functions, starting with those built into networkx.

Example 1: Reading and Writing Edge Lists

By far the simplest approach is to store the edges in a plain text file. This can be done with the read_edgelist and write_edgelist functions in networkx. To save an edge list to file, write_edgelist takes a graph and the path of the output file ('example.edgelist'). Here's a simple example using the graph above.

import networkx as nx

G = nx.DiGraph()

G.add_edge('A', 'B')
G.add_edge('A', 'C')
G.add_edge('C', 'D')
G.add_edge('D', 'C')

nx.write_edgelist(G, 'example.edgelist', data=False)

Note: This function also takes other parameters, for example to control which edge attributes are written. Here we set data=False because our edges don't have any attributes to save. You can also adjust things such as the delimiter; by default, columns are separated by a space.

The resulting file looks like this:

A B
A C
C D
D C

Now that the data has been saved, we can read it back with the read_edgelist function:

>>> H = nx.read_edgelist('example.edgelist', create_using=nx.DiGraph)
>>> H.edges()
OutEdgeView([('A', 'B'), ('A', 'C'), ('C', 'D'), ('D', 'C')])

Note: When reading in a graph, it's important to make sure the right graph type is defined. By default, networkx creates a simple undirected graph (nx.Graph), so here we explicitly request a directed graph by setting create_using=nx.DiGraph.
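To make the note about delimiters and edge attributes concrete, here is a sketch that writes a comma-separated edge list including a weight column, then reads it back. The weights and the file name weighted.edgelist are hypothetical. On the writing side, data=["weight"] selects which attributes to save; on the reading side, data=[("weight", float)] tells read_edgelist how to name and convert the extra column.

```python
import os
import tempfile

import networkx as nx

G = nx.DiGraph()
# Hypothetical weights, just so there is an edge attribute to save
G.add_edge("A", "B", weight=1.5)
G.add_edge("A", "C", weight=0.5)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weighted.edgelist")
    # Comma-separated, keeping only the 'weight' attribute: lines like "A,B,1.5"
    nx.write_edgelist(G, path, delimiter=",", data=["weight"])

    with open(path) as f:
        contents = f.read()

    # Read it back, naming the third column and converting it to float
    H = nx.read_edgelist(
        path,
        delimiter=",",
        create_using=nx.DiGraph,
        data=[("weight", float)],
    )

print(contents)
print(H.edges(data=True))
```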

Example 2: GEXF

In some cases, you export a graph with the intention of analysing it in other software. For example, many users visualise their networks with Gephi, which provides a whole suite of tools for creating presentable graphs quickly and easily. Gephi has also made an appearance on this blog before.

Using the read_gexf and write_gexf functions, networkx can import and export graphs directly in GEXF, a file format that Gephi can open.
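A minimal round trip might look like the sketch below. The file name example.gexf is only an illustration, and the file is written to a temporary directory so the snippet cleans up after itself. The GEXF file records that the graph is directed, so the graph read back should be directed too.

```python
import os
import tempfile

import networkx as nx

G = nx.DiGraph([("A", "B"), ("A", "C"), ("C", "D"), ("D", "C")])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.gexf")
    # Write a GEXF file that can be opened in Gephi
    nx.write_gexf(G, path)
    # Read it back into a networkx graph
    H = nx.read_gexf(path)

print(H.is_directed(), sorted(H.edges()))
```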

Example 3: JSON

A more involved option is to serialise a network as JSON. This approach is typically used by those who want to use graphs on the web, either through an API or with an interactive visualisation library such as D3.js.
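One way to do this is with the json_graph helpers in networkx. The sketch below uses the "node-link" layout (a list of nodes plus a list of source/target links), which is similar to what D3.js force-directed examples expect. Depending on your networkx version, node_link_data may emit a warning about the default key name used for the edge list; the round trip below works either way.

```python
import json

import networkx as nx
from networkx.readwrite import json_graph

G = nx.DiGraph([("A", "B"), ("A", "C"), ("C", "D"), ("D", "C")])

# Convert the graph to a plain dict in node-link form, then to a JSON string
data = json_graph.node_link_data(G)
text = json.dumps(data)

# Round trip: parse the JSON and rebuild the graph.
# The 'directed' flag stored in the data restores a DiGraph.
H = json_graph.node_link_graph(json.loads(text))
print(text)
```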

Using pandas

As mentioned previously, pandas provides multiple ways to import and export data. Pandas is primarily used to work with data frames interactively within a Python environment. These data frames hold tabular data, which makes them a natural fit for edge lists. To export a graph to a pandas data frame, simply use to_pandas_edgelist.

>>> import networkx as nx
>>> G = nx.DiGraph()
>>> G.add_edge('A', 'B')
>>> G.add_edge('A', 'C')
>>> G.add_edge('C', 'D')
>>> G.add_edge('D', 'C')
>>> df = nx.to_pandas_edgelist(G)
>>> df
  source target
0      A      B
1      A      C
2      C      D
3      D      C

Why use pandas?

Now that we have a pandas data frame, we can do additional processing such as filtering and querying our edge list. For example, to examine the edges where 'A' is the source:

>>> df[df['source'] == 'A']
  source target
0      A      B
1      A      C

Pandas lets you perform much more complex operations, but for the purpose of this example we will keep things simple. To read the filtered edge list back into a networkx graph, all we need is from_pandas_edgelist. Note: As mentioned before, it's important to get the graph type right, which is why we're using create_using=nx.DiGraph.

>>> df_new = df[df['source'] == 'A']
>>> G = nx.from_pandas_edgelist(df_new, create_using=nx.DiGraph)
>>> G.edges()
OutEdgeView([('A', 'B'), ('A', 'C')])

As we can see, we now have a new graph that we modified using pandas data frames. It's also worth pointing out that going through pandas opens up many more export formats, since a data frame can be written out with methods such as to_csv, to_excel, to_parquet and to_sql.
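As one example, here is a sketch of a CSV round trip through pandas. To keep it self-contained, to_csv is called without a path so it returns the CSV as a string; in practice you would pass a file name, or swap in another writer such as to_excel or to_parquet (some of these need extra packages installed).

```python
import io

import networkx as nx
import pandas as pd

G = nx.DiGraph([("A", "B"), ("A", "C"), ("C", "D"), ("D", "C")])
df = nx.to_pandas_edgelist(G)

# Export: with no path, to_csv returns the CSV text as a string
csv_text = df.to_csv(index=False)

# Import: parse the CSV into a data frame, then build the graph from it
df_back = pd.read_csv(io.StringIO(csv_text))
H = nx.from_pandas_edgelist(df_back, create_using=nx.DiGraph)
print(csv_text)
```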

Final Thoughts and Conclusions

In this blog post, we explored a few ways in which graphs can be imported and exported to different formats. We also covered some of the ways in which graphs can be represented using edge lists and adjacency matrices.

This blog post provides a very basic overview of how to import and export data, with a few simple transformations done with the aid of pandas. Using this approach, there are many more operations we can perform.