First steps with memgraph

My personal notes on my first steps with memgraph.

It is quite exciting to play with a new graph database that supports cypher and transactions, and that is open source.

Running it

I tend to run memgraph-platform like this:

docker run --name mg -it -p 7687:7687 -p 3000:3000 \
 -v mg_lib:/var/lib/memgraph \
 -v mg_etc:/etc/memgraph \
 -e MEMGRAPH="--bolt-server-name-for-init=Neo4j/" \
 memgraph/memgraph-platform

Compatibility

The “--bolt-server-name-for-init” flag in the above run command allows me two things. First, I can connect with a neo4j browser, e.g. from “http://browser-canary.graphapp.io/”, to the local memgraph. Second, I can use the bolt python driver to connect to the memgraph database. I needed to use a single space “ ” as login and password, btw.
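A minimal sketch of the second point, assuming the `neo4j` Python package is installed and memgraph listens on the port published in the run command above. The helper names `connection_settings` and `run_query` are made up for illustration; the single-space credentials are what worked for me and may differ in your setup.

```python
def connection_settings(host="localhost", port=7687):
    """Bolt URI and the single-space credentials described above."""
    return {"uri": f"bolt://{host}:{port}", "auth": (" ", " ")}


def run_query(query, settings=None):
    """Run one Cypher query and return the records as plain dicts."""
    # Imported lazily so the settings helper works without the driver.
    from neo4j import GraphDatabase

    settings = settings or connection_settings()
    with GraphDatabase.driver(settings["uri"], auth=settings["auth"]) as driver:
        with driver.session() as session:
            return [record.data() for record in session.run(query)]
```

With the container running, `run_query("RETURN 1 AS one")` should come back with a single record.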

To be fair it also works the other way around: you can ‘connect manually’ memgraph lab to a running neo4j instance.

To me, as a user, this is a good thing™: I can now switch databases and tools as I like. To me, these are the first steps towards a true ecosystem. It is a bit what SQL feels like: you can choose your own tools and recombine elements of your workflow. Standardising the language as opencypher goes in the same direction.

MAGE on speed

Coming from a python background, I find the ability to run python code on the server side quite appealing. To my excitement one can also speed up calculations quite a bit using numba.jit. For this to work one has to install numba in the docker container first:

docker exec -it mg bash

root@d7f0454a9085:/# pip3 install numba

(Creating a docker image that does that for you is left as an exercise to the reader)

I use a modified version of real python’s code to calculate fibonacci numbers for this. I create a new query module named speedup:

import random
import time

import mgp
from numba import jit


@jit()  # speedup of factor 400
def fibonacci_of(n):
  if not (isinstance(n, int) and n >= 0):
    raise ValueError(f'Positive integer number expected, got "{n}"')

  if n in {0, 1}:
    return n

  previous, fib_number = 0, 1

  for _ in range(2, n + 1):
    # Compute the next Fibonacci number, remember the previous one
    previous, fib_number = fib_number, previous + fib_number
  return fib_number


@mgp.function
def doit(ctx: mgp.FuncCtx):
  # Time 500 calls of the jitted function with random inputs
  start = time.time()
  out = 0
  for i in range(500):
    out += fibonacci_of(random.randint(20000, 30000))
  end = time.time()
  taken = end - start  # elapsed seconds
  return f"{taken} {str(out)[:20]} {end}"

Once numba is installed and the query module is in place it can be called as:

RETURN speedup.doit();

The @jit line gives a speedup factor of 400!

The functions doit and fibonacci_of had to be separated, because numba.jit doesn’t seem to handle the mgp specific code. Even though this example is probably of no use for graph use cases, it shows me:

- Calculations in python can be made much faster.
- I can even use compiled python modules to extend functionality, which feels like using python the way it was meant to be.
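To check a speedup like this outside of memgraph, a small timing harness along these lines could be used. It is a sketch under assumptions: `fib_plain`, `time_calls` and `compare` are names I made up, and numba is treated as optional. As far as I understand, numba-compiled code works with fixed-width 64-bit integers, so I keep n small enough to avoid overflow; for very large n the plain and the compiled version would no longer do the same work.

```python
import time

try:
    from numba import jit  # optional third-party dependency
except ImportError:
    jit = None


def fib_plain(n):
    """Iterative Fibonacci: returns F(n) for n >= 0."""
    previous, fib_number = 0, 1
    for _ in range(n):
        previous, fib_number = fib_number, previous + fib_number
    return previous


def time_calls(func, n, repeats=1000):
    """Seconds spent on `repeats` calls, after one untimed warm-up call."""
    func(n)  # warm-up, so JIT compilation time is not measured
    start = time.perf_counter()
    for _ in range(repeats):
        func(n)
    return time.perf_counter() - start


def compare(n=90, repeats=1000):
    """Return (plain_seconds, jit_seconds); the latter is None without numba."""
    plain = time_calls(fib_plain, n, repeats)
    if jit is None:
        return plain, None
    fast = jit(nopython=True)(fib_plain)
    return plain, time_calls(fast, n, repeats)
```

The ratio of the two returned values gives the speedup factor on your machine; it will depend heavily on n and on the hardware.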

Other observations

- Even though it has persistence with transactions, the database can only handle data that fits into RAM.
- At the time of writing, memgraph doesn’t do indexes on edges. This might not be an issue, depending on your use. For me, it is somewhat relevant.
- The indexing functionality is basic: there are no special indexes for e.g. geometric calculations, fulltext indexes etc.
- Memgraph lab (which sadly is not opensource)
  - Good:
    - Good performance of the graph visualization, which is powered by orb, which is opensource. No WebGL, though.
    - Map underlay: you can project nodes onto a map, which is extremely helpful if you are handling geo-related data.
    - Query history is nice.
    - A script language to style graph results. The language has lots of functions, including conditionals. This is fantastic, because you don’t need to adapt your data to styling.
    - Inclusion of networkx: this means that a rather large collection of graph algorithms is available (even though not as fast as their C-implemented counterparts). But still.
  - Room for improvement:
    - One can’t easily use the arrow keys in the query editor to go to old queries.
    - One can’t style the graph schema. You can’t do it in lab directly, and there is no way to return the schema as a projection or virtual nodes, like apoc.meta.schema allows for neo4j, so you also can’t style it like other results from normal queries. In the past, I spent weeks styling schemas, over and over. So seeing the graph styling language mentioned above, but not being able to use it on the schema, is a bit frustrating.
- From what I can see the community version lacks certain enterprise functionality, but doesn’t have an arbitrary performance limit.

Summary

If labeled property graphs and cypher are your thing, give memgraph a go. I think it will be interesting to see how it holds up in real projects.