
neo4j performance compared to graphagus

by Jörg Baach last modified Jul 23, 2015 03:16 PM
A performance comparison between neo4j and graphagus, a small property graph database written in python as a thin layer on top of ZODB.

This is a followup to


Last summer I played around with neo4j and did some performance measurements from a python web developer's point of view. This means my requirements are:

  • I want to use the graph database from python, in a way that supports transactions
  • The database needs to run in a networked (client/server) setup

I came across some performance claims in the "graph databases" book, which I could not replicate at all. (see the link above)

That got me playing around with a very small pure python layer on top of ZODB. Over the last few weeks I resumed that project, and the outcome is graphagus, a little property graph database for python (see

Running the performance tests

I assume you have a virtualenv set up with graphagus easy_installed, and that you use the python of this environment. The source, which is also available on github, contains an example directory. In there:

#create the data file for 100000 people
python 100000

#import the data file into a graphagus database, called Friends.fs

#in the first console run zeoserver
runzeo -f Friends.fs -a 1234

#in the second console run the tests, depth 5, 10 runs
python 100000 5 10 zeo client1


The tables extend the tables on the neo4j vs mysql page and integrate the numbers from the section about 'improving results', so they compare graphagus to the best results I could get from neo4j. Each number for graphagus is the average of 10 runs.
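To illustrate what "average of 10 runs" means, here is a minimal sketch of a timing helper. It is not the code from the graphagus example directory; the name `average_runtime` and its parameters are made up for this illustration, and it assumes the query can be wrapped in a callable without arguments.

```python
import time


def average_runtime(query, runs=10):
    """Call query() `runs` times and return the mean wall-clock time in seconds.

    Hypothetical helper for illustration only, not part of graphagus.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        query()  # the traversal being measured
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


if __name__ == "__main__":
    # Stand-in workload; a real test would run a graph query here.
    print(average_runtime(lambda: sum(range(100000)), runs=10))
```

Averaging over several runs smooths out cache warm-up and other noise, but the first run against a cold cache can still dominate the mean for short queries.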

Querying 100k nodes

depth    neo4j        mysql       python    graphagus   graphagus
                                                           1GB cc

1        0.010        0.000        0.000        0.007       0.006
2        0.028        0.001        0.000        0.052       0.012
3        0.376        0.072        0.009        0.351       0.077
4        7.278        3.600        0.330        0.897       0.571
5       18.225      180.143        0.758        1.568       1.251

Querying 1 million nodes

depth    neo4j        mysql       python    graphagus   graphagus
                                                           4GB cc

1        0.010        0.000        0.000        0.052       0.022
2        0.017        0.002        0.000        0.194       0.090
3        0.484        0.082        0.012        2.018       1.835
4       18.950        5.598        1.079        4.610       3.712
5      462.466      300.000        9.791       15.440      12.385

In both tables the column headings mean:

  • neo4j: using the transactional rest endpoint, with the query for improved results
  • mysql: using the mysql-python connector
  • python: pure python scripts running on the internal data structure
  • graphagus: using the standard client cache size
  • graphagus XX cc: graphagus using a XX GB client disk cache
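For readers who want a feeling for the kind of query behind the "depth" column, here is a small sketch in the spirit of the "python" column: a traversal over a plain in-memory dictionary. It assumes that depth means the number of friendship hops from the starting person, and that the graph is stored as a dict mapping a person id to a list of friend ids. Both the function name `friends_within_depth` and the data layout are assumptions for this illustration, not the actual test scripts.

```python
from collections import deque


def friends_within_depth(friends, start, depth):
    """Return the set of people reachable from `start` in at most `depth` hops.

    `friends` maps a person id to an iterable of friend ids (assumed layout).
    The starting person is not included in the result.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        person, dist = frontier.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested depth
        for friend in friends.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, dist + 1))
    seen.discard(start)
    return seen


if __name__ == "__main__":
    graph = {1: [2, 3], 2: [4], 3: [], 4: [5], 5: []}
    print(sorted(friends_within_depth(graph, 1, 2)))
```

A breadth-first walk with a `seen` set visits each person only once, which is why the number of people reached, and with it the runtime, grows so quickly with depth in a well connected graph.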

One could argue that I am comparing apples and oranges - the neo4j test uses cypher, which needs to be parsed, while the graphagus tests use something much closer to the internal neo4j api. This obviously could cause quite a difference in performance.

One could also argue that these are realistic numbers - for my use case, and my requirements. I don't see a way to use the internal neo4j api within the scope of my requirements. So from the python web developer's point of view those numbers might show what to expect from the different approaches.

I guess I am quite happy with the results for the 426 lines of python code in graphagus :-)
