Working examples for the 'Graph Databases' book

by Jörg Baach last modified Jul 25, 2013 12:12 PM
The examples in the 'Graph Databases' book don't work out of the box. I've modified them so that they do work (for chapter 3, that is).

Graphgist version available

I've created a graphgist version of this blog post. It's the same text, but the examples run right in the browser: http://gist.neo4j.org/?6078256

The Graph Databases book and its examples

I downloaded the 'Graph Databases' book from http://graphdatabases.com/, and even got a printed copy for free at a neo4j meetup on Tuesday. I like neo4j, and the book, and I am really grateful for both.

The book says on page 27 that it uses the 2.0 version of cypher. Great. I'm using neo4j-community-2.0.0-M03 anyhow, because I need the transactional HTTP endpoint. It exists only in 2.0, and it only speaks cypher.

The problem: the examples (starting from page 44) don't work. You can run the create statement from page 44, but when you try the read query from page 47:

START   theater=node:venue(name='Theatre Royal'),
        newcastle=node:city(name='Newcastle'),
        bard=node:author(lastname='Shakespeare')
MATCH   (newcastle)<-[:STREET|CITY*1..2]-(theater)
        <-[:VENUE]-()-[:PERFORMANCE_OF]->()-[:PRODUCTION_OF]->
        (play)<-[:WROTE_PLAY]-(bard)
RETURN  DISTINCT play.title AS play

you get the following result:

MissingIndexException: Index 'author' does not exist

Why?

Indexing using cypher

Let's look at the first line:

START   theater=node:venue(name='Theatre Royal'),

This line tries to look up a node in the venue index, stored under the key name with the value 'Theatre Royal'. In other words, it's using a legacy index. This index needs to be set up first. You can't do that from cypher, but that's not even the main problem. With legacy indexes, you have to add, update and delete the index entries for nodes and relationships yourself, and you can't do that from cypher either. That's the real problem. So even though we can put the Shakespeare data into our graph, we can't get it into the indexes, and hence we can't search them. We could use the command line interface or the REST API instead, but we won't, because I need to use the transactional HTTP endpoint (with separate rollback commands etc.) :-).

Rescue comes in the form of schema indexes and labels. You can attach as many labels to a node as you like, and you can create indexes that update automatically, using cypher only. Those indexes not only update automatically, they are also used behind the scenes without being mentioned explicitly in the query. Isn't this great? Thought so...
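To make the label part concrete, here is a minimal sketch, assuming Neo4j 2.0 or later (I haven't checked it against every milestone). It gives one node two labels and lists them with labels(). The :Theatre label and the 'Example Playhouse' node are made up for illustration, which is why the last statement removes the node again.

```cypher
// One node, two labels: :Venue plus a made-up :Theatre label
CREATE (:Venue:Theatre { name: 'Example Playhouse' });

// labels() returns every label attached to the node
MATCH (v:Theatre)
WHERE v.name = 'Example Playhouse'
RETURN v.name AS venue, labels(v) AS labels;

// Remove the example node again (it has no relationships, so DELETE is enough)
MATCH (v:Theatre)
WHERE v.name = 'Example Playhouse'
DELETE v;
```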

I've prepared modified versions of the examples below (for chapter 3). They actually run, using cypher only. Before you use them, clean the book's example data out of your database, if needed:

match (n)-[r]->(m) delete r, n, m;

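(This deletes every relationship and every node attached to one, which covers all of the book's example data, so know what you are doing. Nodes without any relationships are left in place.)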
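If you also want to get rid of nodes that have no relationships, here is a sketch, assuming Neo4j 2.0 final or later (I'm not sure OPTIONAL MATCH exists in the 2.0 milestones). Like the statement above, it wipes the whole database.

```cypher
// Match every node; OPTIONAL MATCH keeps nodes without relationships,
// binding r to null for them, and DELETE skips null values
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE r, n;
```

As far as I know, later Neo4j releases also offer MATCH (n) DETACH DELETE n; which deletes each node together with its relationships in one step.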

Modified examples (chapter 3)

Besides updating the examples, I've also added semicolons at the end of statements, so that you don't stumble upon errors every time you copy and paste (like I do). I also changed the formatting a bit to my preferred style.

Creating the Shakespeare Graph

Page 44:

create
    (shakespeare:Author { firstname: 'William', lastname: 'Shakespeare' }),
    (juliusCaesar:Play { title: 'Julius Caesar' }),
    (shakespeare)-[:WROTE_PLAY { year: 1599 }]->(juliusCaesar),
    (theTempest:Play { title: 'The Tempest' }),
    (shakespeare)-[:WROTE_PLAY { year: 1610}]->(theTempest),
    (rsc:Company { name: 'RSC' }),
    (production1:Production { name: 'Julius Caesar' }),
    (rsc)-[:PRODUCED]->(production1),
    (production1)-[:PRODUCTION_OF]->(juliusCaesar),
    (performance1:Performance { date: 20120729 }),
    (performance1)-[:PERFORMANCE_OF]->(production1),
    (production2:Production { name: 'The Tempest' }),
    (rsc)-[:PRODUCED]->(production2),
    (production2)-[:PRODUCTION_OF]->(theTempest),
    (performance2:Performance { date: 20061121 }),
    (performance2)-[:PERFORMANCE_OF]->(production2),
    (performance3:Performance { date: 20120730 }),
    (performance3)-[:PERFORMANCE_OF]->(production1),
    (billy:Person { name: 'Billy' }),
    (review:Review { rating: 5, review: 'This was awesome!' }),
    (billy)-[:WROTE_REVIEW]->(review),
    (review)-[:RATED]->(performance1),
    (theatreRoyal:Venue { name: 'Theatre Royal' }),
    (performance1)-[:VENUE]->(theatreRoyal),
    (performance2)-[:VENUE]->(theatreRoyal),
    (performance3)-[:VENUE]->(theatreRoyal),
    (greyStreet:Street { name: 'Grey Street' }),
    (theatreRoyal)-[:STREET]->(greyStreet),
    (newcastle:City { name: 'Newcastle' }),
    (greyStreet)-[:CITY]->(newcastle),
    (tyneAndWear:County { name: 'Tyne and Wear' }),
    (newcastle)-[:COUNTY]->(tyneAndWear),
    (england:Country { name: 'England' }),
    (tyneAndWear)-[:COUNTRY]->(england),
    (stratford:City { name: 'Stratford upon Avon' }),
    (stratford)-[:COUNTRY]->(england),
    (rsc)-[:BASED_IN]->(stratford),
    (shakespeare)-[:BORN_IN]->(stratford);

I've now assigned labels to all nodes. That wasn't strictly necessary, but it felt a bit clearer to me. The labels are :Author, :Play and so forth.
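A quick way to double-check the labels after running the create statement is to count nodes per label combination. This is a minimal sketch for Neo4j 2.0 or later; a mistyped label (say :performance instead of :Performance) would show up as a row of its own.

```cypher
// One row per distinct list of labels, with the number of nodes carrying it
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes
ORDER BY nodes DESC;
```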

Let's also create indexes on some of the labels:

create index on :Author(firstname);
create index on :Author(lastname);
create index on :City(name);
create index on :Venue(name);

Beginning a Query

The book's text talks about the START clause, which isn't used the same way with label indexes, so it's a bit hard to translate. But let's try.
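Here is the smallest case side by side: the book's legacy index lookup for the theatre, and a label-based equivalent. This is a sketch assuming Neo4j 2.0 or later and the :Venue(name) index created above; whether the index is actually used is decided by Neo4j's query planner, not stated in the query.

```cypher
// Legacy style from the book: names the 'venue' index explicitly
// and fails with MissingIndexException unless that index exists
START theater=node:venue(name='Theatre Royal')
RETURN theater;

// Label style: no index is named; with an index on :Venue(name)
// in place, the planner can use it for this lookup
MATCH (theater:Venue)
WHERE theater.name = 'Theatre Royal'
RETURN theater;
```

Keep in mind that create index returns right away and the index is populated in the background, so a freshly created index may not be used until it has come online.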

Page 46:

match
    (theater:Venue),
    (newcastle:City),
    (bard:Author)
where
    theater.name='Theatre Royal' and
    newcastle.name='Newcastle' and
    bard.lastname='Shakespeare'

(Just like in the book, this is only the opening part of a query; it doesn't do anything by itself.)
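In Neo4j 2.0 (final) the same starting points can also be written with inline property maps in the MATCH pattern instead of a WHERE clause. I haven't checked whether 2.0.0-M03 already accepts this form, and the RETURN is my own addition so that the statement runs on its own.

```cypher
// The three starting points, using inline property maps instead of WHERE
MATCH (theater:Venue { name: 'Theatre Royal' }),
      (newcastle:City { name: 'Newcastle' }),
      (bard:Author { lastname: 'Shakespeare' })
// Added only so the statement is complete and can be run
RETURN theater.name, newcastle.name, bard.lastname;
```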

Declaring Information Patterns to Find

Page 46:

match
    (newcastle)<-[:STREET|CITY*1..2]-(theater)
    <-[:VENUE]-()-[:PERFORMANCE_OF]->()-[:PRODUCTION_OF]->
    (play)<-[:WROTE_PLAY]-(bard)

This part is exactly the same as in the book.

Page 47:

match
    (theater:Venue),
    (newcastle:City),
    (bard:Author),
    (newcastle)<-[:STREET|CITY*1..2]-(theater)
    <-[:VENUE]-()-[:PERFORMANCE_OF]->()-[:PRODUCTION_OF]->
    (play)<-[:WROTE_PLAY]-(bard)
where
    theater.name='Theatre Royal' and
    newcastle.name='Newcastle' and
    bard.lastname='Shakespeare'
return
    distinct play.title as play;

Constraining Matches

Page 48:

match
    (theater:Venue),
    (newcastle:City),
    (bard:Author),
    (newcastle)<-[:STREET|CITY*1..2]-(theater)
    <-[:VENUE]-()-[:PERFORMANCE_OF]->()-[:PRODUCTION_OF]->
    (play)<-[w:WROTE_PLAY]-(bard)
where
    theater.name='Theatre Royal' and
    newcastle.name='Newcastle' and
    bard.lastname='Shakespeare' and
    w.year > 1608
return
    distinct play.title as play;

Processing Results

Page 49:

match
    (theater:Venue),
    (newcastle:City),
    (bard:Author),
    (newcastle)<-[:STREET|CITY*1..2]-(theater)
    <-[:VENUE]-()-[p:PERFORMANCE_OF]->()-[:PRODUCTION_OF]->
    (play)<-[:WROTE_PLAY]-(bard)
where
    theater.name='Theatre Royal' and
    newcastle.name='Newcastle' and
    bard.lastname='Shakespeare'
return
    play.title as play, count(p) as performance_count
order by
    performance_count desc;

Query Chaining

Page 50:

match
    (bard:Author),
    (bard)-[w:WROTE_PLAY]->(play)
where
    bard.lastname='Shakespeare'
with
    play, w
order by
    w.year desc
return
    collect(play.title) as plays;

A Sensible First Iteration?

Create another index:

create index on :User(username);

Page 51:

create
    (alice:User {username: 'Alice'}),
    (bob:User {username: 'Bob'}),
    (charlie:User {username: 'Charlie'}),
    (davina:User {username: 'Davina'}),
    (edward:User {username: 'Edward'}),
    (alice)-[:ALIAS_OF]->(bob);

Page 51, 2nd:

match
    (bob:User),
    (charlie:User),
    (davina:User),
    (edward:User)
where
    bob.username='Bob' and
    charlie.username='Charlie' and
    davina.username='Davina' and
    edward.username='Edward'
create
    (bob)-[:EMAILED]->(charlie),
    (bob)-[:CC]->(davina),
    (bob)-[:BCC]->(edward);

Page 52:

match
    (bob:User),
    (charlie:User),
    (bob)-[e:EMAILED]->(charlie)
where
    bob.username='Bob' and
    charlie.username='Charlie'
return
    e;

Second Time's the Charm

Page 53:

create
    (email_1:Email {id: '1', content: 'Hi Charlie, ... Kind regards, Bob'}),
    (bob)-[:SENT]->(email_1),
    (email_1)-[:TO]->(charlie),
    (email_1)-[:CC]->(davina),
    (email_1)-[:CC]->(alice),
    (email_1)-[:BCC]->(edward);

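Don't use this example yet; it's incomplete. Nothing binds bob, charlie, davina, alice and edward to the existing users, so CREATE would create new, empty nodes for them. Instead, create some indexes: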

create index on :Email(id);
create index on :Email(content);

Page 54:

match
    (alice:User),
    (bob:User),
    (charlie:User),
    (davina:User),
    (edward:User)
where
    alice.username='Alice' and
    bob.username='Bob' and
    charlie.username='Charlie' and
    davina.username='Davina' and
    edward.username='Edward'
create
    (email_1:Email {id: '1', content: 'email contents'}),
    (bob)-[:SENT]->(email_1),
    (email_1)-[:TO]->(charlie),
    (email_1)-[:CC]->(davina),
    (email_1)-[:CC]->(alice),
    (email_1)-[:BCC]->(edward),
    (email_2:Email {id: '2', content: 'email contents'}),
    (bob)-[:SENT]->(email_2),
    (email_2)-[:TO]->(davina),
    (email_2)-[:BCC]->(edward),
    (email_3:Email {id: '3', content: 'email contents'}),
    (davina)-[:SENT]->(email_3),
    (email_3)-[:TO]->(bob),
    (email_3)-[:CC]->(edward),
    (email_4:Email {id: '4', content: 'email contents'}),
    (charlie)-[:SENT]->(email_4),
    (email_4)-[:TO]->(bob),
    (email_4)-[:TO]->(davina),
    (email_4)-[:TO]->(edward),
    (email_5:Email {id: '5', content: 'email contents'}),
    (davina)-[:SENT]->(email_5),
    (email_5)-[:TO]->(alice),
    (email_5)-[:BCC]->(bob),
    (email_5)-[:BCC]->(edward);

I added the missing START (now MATCH/WHERE) at the top, so that the identifiers refer to the existing users, and merged all the CREATE statements into one to shorten the code a bit.

Page 55:

match
    (bob:User),
    (bob)-[:SENT]->(email)-[:CC]->(alias),
    (alias)-[:ALIAS_OF]->(bob)
where
    bob.username='Bob'
return
    email;

Evolving the Domain

Another theoretical example from page 57; don't run it:

match (email:Email)
where email.id='1234'
create (alice)-[:REPLIED_TO]->(email)
create (davina)-[:FORWARDED]->(email)-[:TO]->(charlie);

Page 57, bottom:

match
    (alice:User),
    (bob:User),
    (charlie:User),
    (davina:User),
    (edward:User)
where
    alice.username='Alice' and
    bob.username='Bob' and
    charlie.username='Charlie' and
    davina.username='Davina' and
    edward.username='Edward'
create
    (email_6:Email {id: '6', content: 'email'}),
    (bob)-[:SENT]->(email_6),
    (email_6)-[:TO]->(charlie),
    (email_6)-[:TO]->(davina),
    (reply_1:Email {id: '7', content: 'response'}),
    (reply_1)-[:REPLY_TO]->(email_6),
    (davina)-[:SENT]->(reply_1),
    (reply_1)-[:TO]->(bob),
    (reply_1)-[:TO]->(charlie),
    (reply_2:Email {id: '8', content: 'response'}),
    (reply_2)-[:REPLY_TO]->(email_6),
    (bob)-[:SENT]->(reply_2),
    (reply_2)-[:TO]->(davina),
    (reply_2)-[:TO]->(charlie),
    (reply_2)-[:CC]->(alice),
    (reply_3:Email {id: '9', content: 'response'}),
    (reply_3)-[:REPLY_TO]->(reply_1),
    (charlie)-[:SENT]->(reply_3),
    (reply_3)-[:TO]->(bob),
    (reply_3)-[:TO]->(davina),
    (reply_4:Email {id: '10', content: 'response'}),
    (reply_4)-[:REPLY_TO]->(reply_3),
    (bob)-[:SENT]->(reply_4),
    (reply_4)-[:TO]->(charlie),
    (reply_4)-[:TO]->(davina);

Page 58, bottom:

match
    (email:Email),
    p=(email)<-[:REPLY_TO*1..4]-()<-[:SENT]-(replier)
where
    email.id='6'
return
    replier.username AS replier, length(p) - 1 AS depth
order by
    depth;

Page 60:

match
    (alice:User),
    (bob:User),
    (charlie:User),
    (davina:User)
where
    alice.username='Alice' and
    bob.username='Bob' and
    charlie.username='Charlie' and
    davina.username='Davina'
create
    (email_11:Email {id: '11', content: 'email'}),
    (alice)-[:SENT]->(email_11)-[:TO]->(bob),
    (email_12:Email {id: '12', content: 'email'}),
    (email_12)-[:FORWARD_OF]->(email_11),
    (bob)-[:SENT]->(email_12)-[:TO]->(charlie),
    (email_13:Email {id: '13', content: 'email'}),
    (email_13)-[:FORWARD_OF]->(email_12),
    (charlie)-[:SENT]->(email_13)-[:TO]->(davina);

Page 61:

match
    (email:Email),
    (email)<-[f:FORWARD_OF*]-()
where
    email.id='11'
return
    count(f);

Other approaches

node_auto_index

One other possibility would be to use the node_auto_index instead (by uncommenting the related settings in the neo4j.properties file and listing the properties to be indexed).

This would then turn the query:

START   theater=node:venue(name='Theatre Royal') return theater;

into:

START   theater=node:node_auto_index(name='Theatre Royal') return theater;

This would be doable, I guess. One could index not only name but also a property called label, to avoid namespace issues. But I guess this would

  1. contradict the efforts of labels in the 2.0 version, and
  2. lead to one gigantic index for all of the properties of all of the nodes.

So even though it works for the book, I don't see it as a good way forward.

Jerome says:
Jan 27, 2014 03:22 PM

Thanks for this. The confirmed errata references the author's website of updated examples which use the node_auto_index. However, they are not label bound, so they could end up indexing nodes that do not need to be indexed, because they have the same property name as nodes that do need indexing. For example, the author's site has node_keys_indexable=name because we want to index venue and city. But any future node types we add that happen to have a name property would also get indexed, which would be wasteful if we had no need to index them.

Yannis Haralambous says:
Jan 30, 2014 11:52 AM

Thanks for writing that down, it is a big help for those starting to use Cypher.
A small correction: in the code you give for the bottom of page 58 you need parentheses around the node email:Email. Hence the correct code would be:
match
    (email:Email),
    p=(email)<-[:REPLY_TO*1..4]-()<-[:SENT]-(replier)
where
    email.id='6'
return
    replier.username AS replier, length(p) - 1 AS depth
order by
    depth;

Another remark. It is interesting to create indexes on labels, **but** there is a serious drawback: these indexes are not created immediately, but whenever neo4j decides so. In my case, I created the index for the small example of people and emails (15 rows, 40 relationships) and half an hour later the index still has not been created...
I'm preparing a class on graph databases, and will use neo4j and Cypher. But seeing that indexes cannot be created on demand, I will not at all talk about START, and will use only MATCH... What a pity

Thomas says:
Jul 25, 2014 03:22 PM

I had/have exactly the same problems and thankfully found your modified CYPHER.
One problem I found was that you declare performance1:Performance twice
(performance1:Performance { date: 20120729 }),
(performance1:Performance)-[:PERFORMANCE_OF]->(production1),...
which leads to an error when loading the database.

jack says:
Aug 23, 2014 07:27 AM

Is something different in neo4j 2.1?
With the first example on page 47 I get:
'Parentheses are required to identify nodes in patterns'
changed to:
match
    (theater:Venue),
    (newcastle:City),
    (bard:Author),
    (newcastle)<-[:STREET|CITY*1..2]-(theater)
    <-[:VENUE]-()-[:PERFORMANCE_OF]->()-[:PRODUCTION_OF]->
    (play)<-[:WROTE_PLAY]-(bard)
where
    theater.name='Theatre Royal' and
    newcastle.name='Newcastle' and
    bard.lastname='Shakespeare'
return
    distinct play.title as play;

Works

Anees Mehdi says:
May 12, 2015 09:29 AM

I think there is a small bug in the CREATE query:
...
(production1)-[:PRODUCTION_OF]->(juliusCaesar),
(performance1:Performance { date: 20120729 }),
(performance1:Performance)-[:PERFORMANCE_OF]->(production1)
...
This will generate an error that 'performance1' already exists in the context or something similar. The solution is to remove ':Performance' from the second line i.e., the corrected code should look like
(production1)-[:PRODUCTION_OF]->(juliusCaesar),
(performance1:Performance { date: 20120729 }),
(performance1)-[:PERFORMANCE_OF]->(production1)

At least on my server, this is an error. Maybe I have enabled a uniqueness constraint on nodes, or something like that.
