Neo4j fulltext search ideas


Neo4j’s fulltext search is a never-ending story. What I would like:

Sounds simple, doesn’t it?

Test data

I use the following sample node:

         float: 2.3,
         string: "foobar",
         boolean: true,
         date: date(),
         datetime: datetime(),
         time: time(),
         duration: duration('P14DT16H12M'),
         point2d: point({latitude:13.43, longitude:56.21}),
         point3d: point({latitude:13.43, longitude:56.21, height: 2}),
         pointC2d: point({x:13.43, y:56.21}),
         pointC3d: point({x:13.43, y:56.21, z: 2}),
         list_integer: [1,2,3],
         list_float: [1.1,1.2,3.3],
         list_string: ["foo","bar"],
         list_boolean: [True, False],
         list_date: [date(),date()],
         list_datetime: [datetime(),datetime()],
         list_time: [time(), time()],
         list_duration: [duration('P1DT1H1M'),duration('P2DT2H2M')],
         list_point2d: [point({latitude:13.43, longitude:56.21}),
                        point({latitude:14.43, longitude:57.21})],
         list_point3d: [point({latitude:13.43, longitude:56.21, height: 2}),
                        point({latitude:14.43, longitude:57.21, height: 3})],       
         list_pointC2d: [point({x:13.43, y:56.21}),point({x:14.43, y:57.21})],
         list_pointC3d: [point({x:13.43, y:56.21, z: 2}),point({x:14.43, y:57.21, z: 3})]
         }) return id(n)

This covers, to my knowledge, all possible data types for property values.

Use fulltext search on all fields. Fail

As per the Neo4j fulltext search chapter, one can define a fulltext index on a node label, across multiple properties. So why not have a fulltext index on all labels, using all properties? A little helper to try this might look like the following (it could be done in Cypher as well, but I am faster in Python):


from settings import config
from neo4j_db import Connection

connection = Connection(config.neo4j, config.user, config.password)
graph = connection.graph(debug=config.debug)

# Start from a clean slate, then collect every label and property key in the database.
graph.run('DROP INDEX _searchableText IF EXISTS')
labels = [r['label'] for r in graph.run('CALL db.labels()')]
propnames = [r['propertyKey'] for r in graph.run('CALL db.propertyKeys()')]

# Build one fulltext index that spans all labels and all properties.
labelstring = '|'.join(labels)
propstring = ','.join([f'n.`{p}`' for p in propnames])
statement = f"CREATE FULLTEXT INDEX _searchableText FOR (n:{labelstring}) ON EACH [{propstring}]"

# The statement above is only a string until it is actually executed.
graph.run(statement)

This approach sounds like a nice idea, but it turns out that Neo4j doesn’t convert all its supported data types to strings for the index, so you end up with an incomplete search.
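To see what statement the helper above actually sends, the string building can be pulled into a small pure function. This is only a sketch: the function names and the example labels `Person` and `Movie` are made up for illustration, and quoting the labels with backticks is my addition, not part of the original helper.

```python
def quote(name: str) -> str:
    """Backtick-quote a Cypher identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_fulltext_statement(index_name, labels, propnames):
    """Build a CREATE FULLTEXT INDEX statement over all given labels and properties."""
    if not labels or not propnames:
        raise ValueError("need at least one label and one property key")
    labelstring = "|".join(quote(label) for label in labels)
    propstring = ", ".join(f"n.{quote(prop)}" for prop in propnames)
    return (
        f"CREATE FULLTEXT INDEX {index_name} "
        f"FOR (n:{labelstring}) ON EACH [{propstring}]"
    )


# Hypothetical labels and property keys, just to show the shape of the result.
statement = build_fulltext_statement(
    "_searchableText", ["Person", "Movie"], ["name", "title"]
)
print(statement)
```

Keeping the string building separate from the database call makes it easy to inspect the statement before running it.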

Use special fulltext property

This approach needs to modify the content (which I don’t like doing).

match (n) 
with apoc.map.flatten(n) as flat,n  
set n.`_searchableText`= apoc.text.join(
    [k in keys(flat) where k <> '_searchableText'| k+' '+apoc.text.join(apoc.convert.toStringList(flat[k]),' ')],'  ')

This gives a *_searchableText* property (the name is inspired by Zope/Plone) containing this string:

date 2022-05-23  list_time 11:51:47.772Z 11:51:47.772Z  string foobar  list_duration P1DT1H1M P2DT2H2M  list_float 1.1 1.2 3.3  list_datetime 2022-05-23T11:51:47.772Z[UTC] 2022-05-23T11:51:47.772Z[UTC]  integer 23  pointC3d point({x: 13.43, y: 56.21, z: 2.0, crs: 'cartesian-3d'})  float 2.3  list_point2d point({x: 56.21, y: 13.43, crs: 'wgs-84'}) point({x: 57.21, y: 14.43, crs: 'wgs-84'})  duration P14DT16H12M  list_pointC3d point({x: 13.43, y: 56.21, z: 2.0, crs: 'cartesian-3d'}) point({x: 14.43, y: 57.21, z: 3.0, crs: 'cartesian-3d'})  datetime 2022-05-23T11:51:47.772Z[UTC]  list_integer 1 2 3  point2d point({x: 56.21, y: 13.43, crs: 'wgs-84'})  foobar3 123456789012345435345345345342  foobar2 2022-08-03  pointC2d point({x: 13.43, y: 56.21, crs: 'cartesian'})  list_date 2022-05-23 2022-05-23  list_boolean true false  list_point3d point({x: 56.21, y: 13.43, z: 2.0, crs: 'wgs-84-3d'}) point({x: 57.21, y: 14.43, z: 3.0, crs: 'wgs-84-3d'})  list_pointC2d point({x: 13.43, y: 56.21, crs: 'cartesian'}) point({x: 14.43, y: 57.21, crs: 'cartesian'})  boolean true  list_string foo bar  name propnode  time 11:51:47.772Z  point3d point({x: 56.21, y: 13.43, z: 2.0, crs: 'wgs-84-3d'})
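For readers less familiar with APOC, the logic of the Cypher statement above can be sketched in plain Python. This is a rough equivalent that works on an ordinary dict of properties, not the code that produced the string shown. The function names are hypothetical, and Python's `str()` will render temporal and spatial values differently from `apoc.convert.toStringList`.

```python
def to_text(value) -> str:
    """Render a single value as text; booleans are lowercased as Cypher does."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def searchable_text(props: dict, skip: str = "_searchableText") -> str:
    """Join every property as 'key value value ...', separated by two spaces."""
    parts = []
    for key, value in props.items():
        if key == skip:
            # Never feed the previous search text back into itself.
            continue
        values = value if isinstance(value, (list, tuple)) else [value]
        parts.append(key + " " + " ".join(to_text(v) for v in values))
    return "  ".join(parts)


example = {
    "string": "foobar",
    "list_integer": [1, 2, 3],
    "boolean": True,
    "_searchableText": "stale text from an earlier run",
}
print(searchable_text(example))
```

Including the property name next to its values means a search for a key such as `list_integer` also finds the node, which matches the string shown above.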

Next, we need to create the fulltext index. Again, this can be done either across all node labels, or by assigning one common label to all nodes and indexing on that label.

match (n) set n:_;
create fulltext index _searchableText for (n:_) on each [n._searchableText];

With this in place, we can search:

call db.index.fulltext.queryNodes('_searchableText','P2DT2H2M') yield node, score return node

This works, hurray.
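From Python, the same query can be wrapped in a small function that passes the search term as a parameter instead of pasting it into the Cypher string. This is a sketch under assumptions: `run` stands for any callable that takes a query string and a parameter dict and returns records that can be indexed by key. Whether my own `graph.run` helper accepts parameters in exactly this form is not shown here, so the example uses a stand-in.

```python
QUERY = (
    "CALL db.index.fulltext.queryNodes($index, $term) "
    "YIELD node, score RETURN node, score"
)


def search_nodes(run, term, index="_searchableText"):
    """Query the fulltext index and return (node, score) pairs."""
    records = run(QUERY, {"index": index, "term": term})
    return [(record["node"], record["score"]) for record in records]


# Stand-in for a real database call, so the sketch runs without a server.
def fake_run(query, params):
    fake_run.calls.append((query, params))
    return [{"node": {"name": "propnode"}, "score": 1.0}]


fake_run.calls = []
hits = search_nodes(fake_run, "P2DT2H2M")
print(hits)
```

Note that the term is still interpreted by the index as a Lucene query, so characters with special meaning in that syntax may need escaping.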

Neo4j does not recommend this, though:

“Neo4j is not the db to use if full text searches across the entire db is an important use case. Use Neo4j for graph use cases. Use Elasticsearch or similar technology if you need full text search over your entire data set.”

This raises the question: what exactly is the intended use case for fulltext search in Neo4j?

One can integrate neo4j with elasticsearch, but this doesn’t play nicely with transactions.

So for now, I think this might be a good enough solution for use cases where not too many nodes with not too many values are involved.