This is a follow-up to https://baach.de/Members/jhb/neo4j-performance-compared-to-mysql
History
Last summer I played around with neo4j and did some performance measurements from a python web developer's point of view. This means my requirements are:
- I want to use the graph database from python, in a way that supports transactions
- The database needs to run in a networked (client/server) setup
I came across some performance claims in the "Graph Databases" book which I could not replicate at all (see the link above).
That got me playing around with a very small pure-python layer on top of ZODB. In the last weeks I resumed that project, and the outcome is graphagus, a little property graph database for python (see https://pypi.python.org/pypi/graphagus).
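To give an idea of the approach, here is a simplified sketch I wrote for this post - not graphagus's actual data model: a property graph can be kept in ZODB as a couple of BTrees, with all changes covered by ZODB's normal transactions.

# Minimal sketch of a property graph on ZODB (not graphagus's real code)
from BTrees.IOBTree import IOBTree
from persistent.mapping import PersistentMapping
from ZODB import DB
import transaction

db = DB(None)                  # in-memory storage, just for the sketch
root = db.open().root()

root['nodes'] = IOBTree()      # node id -> property mapping
root['edges'] = IOBTree()      # node id -> tuple of neighbour ids

def add_node(nodeid, **props):
    root['nodes'][nodeid] = PersistentMapping(props)
    root['edges'][nodeid] = ()

def add_edge(source, target):
    # replace the tuple wholesale, so ZODB registers the change
    root['edges'][source] = root['edges'][source] + (target,)

add_node(1, name='Alice')
add_node(2, name='Bob')
add_edge(1, 2)
transaction.commit()           # everything above is one transaction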
Running the performance tests
I assume you have a virtualenv with graphagus easy_installed, and that you use the python of this environment. The source (also available on github: https://github.com/jhb/graphagus) contains an example directory. In there:
#create the data file for 100000 people
python friendsdata.py 100000
#import the data file into a graphagus database, called Friends.fs
python import_friends.py
#in the first console, run the ZEO server
runzeo -f Friends.fs -a 1234
#in the second console, run the tests (depth 5, 10 runs)
python query_friends.py 100000 5 10 zeo client1
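For reference, this is roughly what the client side does: connect to the ZEO server started above and walk the friends graph to the requested depth. The connection code is the standard ZEO api; the traversal function is my own illustration (building on the sketch above), not graphagus's actual query code.

# Connect to the ZEO server from the second console (standard ZEO api)
from ZEO.ClientStorage import ClientStorage
from ZODB import DB

storage = ClientStorage(('localhost', 1234))
db = DB(storage)
root = db.open().root()

def friends_to_depth(edges, start, depth):
    # breadth-first expansion: everyone reachable in <= depth hops
    seen = {start}
    frontier = {start}
    for _ in range(depth):
        frontier = {t for n in frontier for t in edges[n]} - seen
        seen |= frontier
    return seen - {start}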
Results
The tables extend the tables on the neo4j vs mysql page and integrate the numbers from the section about 'improving results': https://baach.de/Members/jhb/neo4j-performance-compared-to-mysql/view#section-16. So graphagus is compared to the best results I could get out of neo4j. Each graphagus number is the average of 10 runs, and all times are in seconds.
Querying 100k nodes
depth    neo4j    mysql  python  graphagus  graphagus (1GB cc)
    1    0.010    0.000   0.000      0.007               0.006
    2    0.028    0.001   0.000      0.052               0.012
    3    0.376    0.072   0.009      0.351               0.077
    4    7.278    3.600   0.330      0.897               0.571
    5   18.225  180.143   0.758      1.568               1.251
Querying 1 million nodes
depth    neo4j    mysql  python  graphagus  graphagus (4GB cc)
    1    0.010    0.000   0.000      0.052               0.022
    2    0.017    0.002   0.000      0.194               0.090
    3    0.484    0.082   0.012      2.018               1.835
    4   18.950    5.598   1.079      4.610               3.712
    5  462.466  300.000   9.791     15.440              12.385
In both tables the column headings mean:
- neo4j: using the transactional REST endpoint, with the query for improved results (see the sketch below)
- mysql: using the mysql-python connector
- python: pure python scripts running on the internal data structure
- graphagus: using the standard client cache size
- graphagus (XX cc): graphagus using an XX GB client disk cache
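To illustrate the first column: the neo4j numbers go through the transactional REST endpoint, i.e. an HTTP POST per query. The sketch below uses the neo4j 2.x endpoint path and an illustrative cypher statement with made-up label and relationship names, not the exact benchmark query.

# One request against neo4j's transactional REST endpoint (neo4j 2.x)
# The cypher text is illustrative, not the exact benchmark query.
import requests

payload = {"statements": [{
    "statement": ("MATCH (p:person {name: {name}})-[:knows*1..5]->(f) "
                  "RETURN count(distinct f)"),
    "parameters": {"name": "person0"},
}]}
r = requests.post("http://localhost:7474/db/data/transaction/commit",
                  json=payload)
print(r.json())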
One could argue that I am comparing apples and oranges: the neo4j test uses cypher, which needs to be parsed, while the graphagus tests use something much closer to the internal neo4j api. This could obviously make quite a difference in performance.
One could also argue that these are realistic numbers for my use case and my requirements. I don't see a way to use the internal neo4j api within the scope of my requirements, so from a python web developer's point of view these numbers might show what to expect from the different approaches.
I guess I am quite happy with the results for the 426 lines of python code in graphagus :-)