This is a follow-up to https://baach.de/Members/jhb/neo4j-performance-compared-to-mysql
History
Last summer I played around with neo4j and did some performance measurements from a python web developer's point of view. This means my requirements are:
- I want to use the graph database from python, in a way that supports transactions
- The database needs to run in a networked (client/server) setup
I came across some performance claims in the "Graph Databases" book which I could not replicate at all (see the link above).
That got me playing around with a very small pure-python layer on top of ZODB. In the last weeks I resumed that project, and the outcome is graphagus, a little property graph database for python (see https://pypi.python.org/pypi/graphagus).
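To give an idea of the approach, here is a simplified sketch I wrote for this post - not graphagus's actual data model: a property graph can be kept in ZODB as a couple of BTrees, with all changes covered by ZODB's normal transactions.

# Minimal sketch of a property graph on ZODB (not graphagus's real code)
from BTrees.IOBTree import IOBTree
from persistent.mapping import PersistentMapping
from ZODB import DB
import transaction

db = DB(None)                  # in-memory storage, just for the sketch
root = db.open().root()

root['nodes'] = IOBTree()      # node id -> property mapping
root['edges'] = IOBTree()      # node id -> tuple of neighbour ids

def add_node(nodeid, **props):
    root['nodes'][nodeid] = PersistentMapping(props)
    root['edges'][nodeid] = ()

def add_edge(source, target):
    # replace the tuple wholesale, so ZODB registers the change
    root['edges'][source] = root['edges'][source] + (target,)

add_node(1, name='Alice')
add_node(2, name='Bob')
add_edge(1, 2)
transaction.commit()           # everything above is one transaction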
Running the performance tests
I assume you have a virtualenv with graphagus easy_installed, and that you use the python of this environment. The source (also available on github: https://github.com/jhb/graphagus) contains an example directory. In there:
#create the data file for 100000 people
python friendsdata.py 100000
#import the data file into a graphagus database, called Friends.fs
python import_friends.py
#in the first console, run the ZEO server
runzeo -f Friends.fs -a 1234
#in the second console, run the tests (depth 5, 10 runs)
python query_friends.py 100000 5 10 zeo client1
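For reference, this is roughly what the client side does: connect to the ZEO server started above and walk the friends graph to the requested depth. The connection code is the standard ZEO api; the traversal function is my own illustration (building on the sketch above), not graphagus's actual query code.

# Connect to the ZEO server from the second console (standard ZEO api)
from ZEO.ClientStorage import ClientStorage
from ZODB import DB

storage = ClientStorage(('localhost', 1234))
db = DB(storage)
root = db.open().root()

def friends_to_depth(edges, start, depth):
    # breadth-first expansion: everyone reachable in <= depth hops
    seen = {start}
    frontier = {start}
    for _ in range(depth):
        frontier = {t for n in frontier for t in edges[n]} - seen
        seen |= frontier
    return seen - {start}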
Results
The tables extend the tables on the neo4j vs mysql page and integrate the numbers from the section about 'improving results': https://baach.de/Members/jhb/neo4j-performance-compared-to-mysql/view#section-16. So graphagus is compared to the best results I could get out of neo4j. Each graphagus number is the average of 10 runs, and all times are in seconds.
Querying 100k nodes
depth    neo4j    mysql  python  graphagus  graphagus (1GB cc)
    1    0.010    0.000   0.000      0.007               0.006
    2    0.028    0.001   0.000      0.052               0.012
    3    0.376    0.072   0.009      0.351               0.077
    4    7.278    3.600   0.330      0.897               0.571
    5   18.225  180.143   0.758      1.568               1.251
Querying 1 million nodes
depth    neo4j    mysql  python  graphagus  graphagus (4GB cc)
    1    0.010    0.000   0.000      0.052               0.022
    2    0.017    0.002   0.000      0.194               0.090
    3    0.484    0.082   0.012      2.018               1.835
    4   18.950    5.598   1.079      4.610               3.712
    5  462.466  300.000   9.791     15.440              12.385
In both tables the column headings mean:
- neo4j: using the transactional REST endpoint, with the query for improved results (see the sketch below)
- mysql: using the mysql-python connector
- python: pure python scripts running on the internal data structure
- graphagus: using the standard client cache size
- graphagus (XX cc): graphagus using an XX GB client disk cache
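To illustrate the first column: the neo4j numbers go through the transactional REST endpoint, i.e. an HTTP POST per query. The sketch below uses the neo4j 2.x endpoint path and an illustrative cypher statement with made-up label and relationship names, not the exact benchmark query.

# One request against neo4j's transactional REST endpoint (neo4j 2.x)
# The cypher text is illustrative, not the exact benchmark query.
import requests

payload = {"statements": [{
    "statement": ("MATCH (p:person {name: {name}})-[:knows*1..5]->(f) "
                  "RETURN count(distinct f)"),
    "parameters": {"name": "person0"},
}]}
r = requests.post("http://localhost:7474/db/data/transaction/commit",
                  json=payload)
print(r.json())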
One could argue that I am comparing apples and oranges: the neo4j test uses cypher, which needs to be parsed, while the graphagus tests use something much closer to the internal neo4j api. This could obviously make quite a difference in performance.
One could also argue that these are realistic numbers for my use case and my requirements. I don't see a way to use the internal neo4j api within the scope of my requirements, so from a python web developer's point of view these numbers might show what to expect from the different approaches.
I guess I am quite happy with the results for the 426 lines of python code in graphagus :-)