Neo4j’s fulltext search is a never-ending story. What I would like:
- search across all nodes and edges
- search all property values, and ideally names
Sounds simple, doesn’t it?
Test data
I use the following sample node:
MERGE (n:PROPNODE
{name:'propnode',
integer:23,
float: 2.3,
string: "foobar",
boolean: true,
date: date(),
datetime: datetime(),
time: time(),
duration: duration('P14DT16H12M'),
point2d: point({latitude:13.43, longitude:56.21}),
point3d: point({latitude:13.43, longitude:56.21, height: 2}),
pointC2d: point({x:13.43, y:56.21}),
pointC3d: point({x:13.43, y:56.21, z: 2}),
list_integer: [1,2,3],
list_float: [1.1,1.2,3.3],
list_string: ["foo","bar"],
list_boolean: [True, False],
list_date: [date(),date()],
list_datetime: [datetime(),datetime()],
list_time: [time(), time()],
list_duration: [duration('P1DT1H1M'),duration('P2DT2H2M')],
list_point2d: [point({latitude:13.43, longitude:56.21}),
point({latitude:14.43, longitude:57.21})],
list_point3d: [point({latitude:13.43, longitude:56.21, height: 2}),
point({latitude:14.43, longitude:57.21, height: 3})],
list_pointC2d: [point({x:13.43, y:56.21}),point({x:14.43, y:57.21})],
list_pointC3d: [point({x:13.43, y:56.21, z: 2}),point({x:14.43, y:57.21, z: 3})]
}) return id(n)
This covers, to my knowledge, all possible data types for property values.
Use fulltext search on all fields: fail
As per the Neo4j fulltext search chapter, one can define a fulltext index on a node
label across multiple properties. So why not have a fulltext index on
all labels, using all properties? A little helper to try this might look like this
(this could be done in Cypher as well, but I am faster in Python):
# settings and neo4j_db are local helper modules (not shown here)
from settings import config
from neo4j_db import Connection
connection = Connection(config.neo4j, config.user, config.password)
graph = connection.graph(debug=config.debug)
# start from scratch: remove a previous version of the index
graph.run('DROP INDEX _searchableText IF EXISTS')
# collect every label and every property key in the database
labels = [r['label'] for r in graph.run('CALL db.labels()')]
propnames = [r['propertyKey'] for r in graph.run('CALL db.propertyKeys()')]
# backticks protect names containing spaces or other special characters
labelstring = '|'.join(f'`{l}`' for l in labels)
propstring = ','.join(f'n.`{p}`' for p in propnames)
statement = f"CREATE FULLTEXT INDEX _searchableText FOR (n:{labelstring}) ON EACH [{propstring}]"
graph.run(statement)
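Since the statement is plain string building, the quoting can be pulled out into a small function and checked without a database. This is a minimal sketch: the names `quote_identifier` and `build_fulltext_index_statement` are hypothetical, and it assumes the `FOR (n:A|B) ON EACH [...]` syntax used above plus Cypher's rule that a backtick inside an escaped name is written as two backticks.

```python
from typing import Iterable


def quote_identifier(name: str) -> str:
    """Wrap a label or property name in backticks for use in Cypher.

    A backtick inside the name is escaped by doubling it.
    """
    return "`" + name.replace("`", "``") + "`"


def build_fulltext_index_statement(
    index_name: str, labels: Iterable[str], property_keys: Iterable[str]
) -> str:
    """Build one CREATE FULLTEXT INDEX statement over many labels and properties.

    index_name is inserted as-is, so it should be a plain identifier
    such as _searchableText.
    """
    # labels are alternatives: (n:`A`|`B`)
    label_part = "|".join(quote_identifier(label) for label in labels)
    # properties become a list of n.`key` expressions
    prop_part = ", ".join(f"n.{quote_identifier(key)}" for key in property_keys)
    if not label_part or not prop_part:
        raise ValueError("need at least one label and one property key")
    return (
        f"CREATE FULLTEXT INDEX {index_name} "
        f"FOR (n:{label_part}) ON EACH [{prop_part}]"
    )


if __name__ == "__main__":
    # prints: CREATE FULLTEXT INDEX _searchableText FOR (n:`PROPNODE`) ON EACH [n.`name`, n.`string`]
    print(build_fulltext_index_statement("_searchableText", ["PROPNODE"], ["name", "string"]))
```

In the helper above, `statement` could then be built as `build_fulltext_index_statement('_searchableText', labels, propnames)`.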
This approach sounds like a nice idea, but it turns out that Neo4j doesn’t convert all its supported data types to strings, hence you end up with an incomplete search.
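To make the gap visible without a database, here is a rough Python model of which values such an index would see. It assumes that only string values are indexed, with lists of strings as an option because whether those are picked up depends on the Neo4j version (check the docs for yours). `indexable_values` is a hypothetical name, and `SAMPLE` is a hand-written subset of the test node, with a Python `date` standing in for Neo4j's date type.

```python
from datetime import date
from typing import Any, Dict


def indexable_values(props: Dict[str, Any], include_string_lists: bool = False) -> Dict[str, Any]:
    """Return the properties a string-only fulltext index would pick up.

    Assumption: plain strings are indexed; lists of strings only when
    include_string_lists is True. Everything else is skipped.
    """
    kept: Dict[str, Any] = {}
    for key, value in props.items():
        if isinstance(value, str):
            kept[key] = value
        elif (
            include_string_lists
            and isinstance(value, list)
            and value
            and all(isinstance(v, str) for v in value)
        ):
            kept[key] = value
    return kept


# a subset of the PROPNODE test node, with Python stand-ins for Neo4j types
SAMPLE = {
    "name": "propnode",
    "integer": 23,
    "float": 2.3,
    "string": "foobar",
    "boolean": True,
    "date": date(2022, 5, 23),
    "list_integer": [1, 2, 3],
    "list_string": ["foo", "bar"],
}

if __name__ == "__main__":
    print(sorted(indexable_values(SAMPLE)))  # prints ['name', 'string']
```

Under that assumption, only two of the eight sample properties would be searchable.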
Use a special fulltext property
This approach requires modifying the content (which I don’t like doing).
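match (n)
// property values are never nested maps, so flatten leaves them as they are
with apoc.map.flatten(properties(n)) as flat, n
// one "key value value ..." chunk per property, skipping the text property itself
set n.`_searchableText` = apoc.text.join(
  [k in keys(flat) where k <> '_searchableText' | k + ' ' + apoc.text.join(apoc.convert.toStringList(flat[k]), ' ')],
  ' ')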
So this gives a *_searchableText* property (the name is inspired by Zope/Plone)
containing this string:
date 2022-05-23 list_time 11:51:47.772Z 11:51:47.772Z string foobar list_duration P1DT1H1M P2DT2H2M list_float 1.1 1.2 3.3 list_datetime 2022-05-23T11:51:47.772Z[UTC] 2022-05-23T11:51:47.772Z[UTC] integer 23 pointC3d point({x: 13.43, y: 56.21, z: 2.0, crs: 'cartesian-3d'}) float 2.3 list_point2d point({x: 56.21, y: 13.43, crs: 'wgs-84'}) point({x: 57.21, y: 14.43, crs: 'wgs-84'}) duration P14DT16H12M list_pointC3d point({x: 13.43, y: 56.21, z: 2.0, crs: 'cartesian-3d'}) point({x: 14.43, y: 57.21, z: 3.0, crs: 'cartesian-3d'}) datetime 2022-05-23T11:51:47.772Z[UTC] list_integer 1 2 3 point2d point({x: 56.21, y: 13.43, crs: 'wgs-84'}) foobar3 123456789012345435345345345342 foobar2 2022-08-03 pointC2d point({x: 13.43, y: 56.21, crs: 'cartesian'}) list_date 2022-05-23 2022-05-23 list_boolean true false list_point3d point({x: 56.21, y: 13.43, z: 2.0, crs: 'wgs-84-3d'}) point({x: 57.21, y: 14.43, z: 3.0, crs: 'wgs-84-3d'}) list_pointC2d point({x: 13.43, y: 56.21, crs: 'cartesian'}) point({x: 14.43, y: 57.21, crs: 'cartesian'}) boolean true list_string foo bar name propnode time 11:51:47.772Z point3d point({x: 56.21, y: 13.43, z: 2.0, crs: 'wgs-84-3d'})
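The flattening step is easy to mirror in Python, for example to check offline what a node's text will look like. This is a rough approximation of the Cypher above, not APOC itself: the hypothetical `to_string_list` only imitates `apoc.convert.toStringList` for simple values (booleans are lowercased as in the output above, everything else goes through `str()`), so temporal and spatial values will not be formatted exactly as Neo4j formats them. Key order also differs: Python keeps insertion order, whereas the order of `keys()` in Cypher is not something to rely on, as the shuffled output above shows.

```python
from typing import Any, Dict, List


def _to_str(value: Any) -> str:
    # Cypher renders booleans as true/false, Python's str() gives True/False
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def to_string_list(value: Any) -> List[str]:
    """Rough stand-in for apoc.convert.toStringList: scalars become one-item lists."""
    if isinstance(value, (list, tuple)):
        return [_to_str(v) for v in value]
    return [_to_str(value)]


def searchable_text(props: Dict[str, Any], field: str = "_searchableText") -> str:
    """Join 'key value value ...' chunks for every property except `field`."""
    chunks = [
        key + " " + " ".join(to_string_list(value))
        for key, value in props.items()
        if key != field  # skip the text property itself, so reruns do not nest it
    ]
    return " ".join(chunks)


if __name__ == "__main__":
    node = {"name": "propnode", "integer": 23, "list_string": ["foo", "bar"], "boolean": True}
    print(searchable_text(node))  # prints: name propnode integer 23 list_string foo bar boolean true
```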
Next, we need to create the fulltext index. Again, this could either be done on all node labels, or we assign one node label to all nodes and index on that label.
match (n) set n:_
create fulltext index _searchableText for (n:_) on each [n._searchableText]
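When the setup is scripted, the two statements can be wrapped in a function that takes whatever executes Cypher, which also makes the order easy to test. This is a sketch: `setup_catch_all_index` is a hypothetical name, the `run` argument could be `graph.run` from the helper above or `session.run` from the official neo4j Python driver, and I added `IF NOT EXISTS` (available with the `CREATE FULLTEXT INDEX` syntax) so that reruns do not fail.

```python
from typing import Callable, List

CATCH_ALL_LABEL = "_"           # the single label every node gets
INDEX_NAME = "_searchableText"  # used as index name and property name, as above


def setup_catch_all_index(run: Callable[[str], object]) -> List[str]:
    """Label every node with CATCH_ALL_LABEL and index the text property on it.

    `run` is any callable that executes one Cypher string.
    Returns the statements in the order they were executed.
    """
    statements = [
        f"MATCH (n) SET n:`{CATCH_ALL_LABEL}`",
        f"CREATE FULLTEXT INDEX {INDEX_NAME} IF NOT EXISTS "
        f"FOR (n:`{CATCH_ALL_LABEL}`) ON EACH [n.`{INDEX_NAME}`]",
    ]
    for statement in statements:
        run(statement)
    return statements


if __name__ == "__main__":
    # dry run: print the statements instead of executing them
    setup_catch_all_index(print)
```

Nodes created afterwards get neither the label nor the text property until these statements (and the flattening query) are run again.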
With this in place, we can search:
call db.index.fulltext.queryNodes('_searchableText','P2DT2H2M') yield node, score return node
This works, hurray.
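One thing to watch when searching for the values themselves: the query string is parsed with Lucene's query syntax, where characters such as `:` and `-` have a meaning, so searching for a time like 11:51:47.772Z or a date like 2022-05-23 is safer with those characters escaped. The sketch below assumes the set of special characters handled by Lucene's classic `QueryParser.escape`; `escape_lucene` and `FULLTEXT_QUERY` are hypothetical names, and passing the index name and search term as parameters is my choice rather than something the call above requires.

```python
# Characters escaped by Lucene's classic QueryParser.escape
LUCENE_SPECIAL = frozenset('\\+-!():^[]"{}~*?|&/')

# parameterised version of the query above; $index and $term are bound at run time
FULLTEXT_QUERY = (
    "CALL db.index.fulltext.queryNodes($index, $term) "
    "YIELD node, score RETURN node, score"
)


def escape_lucene(term: str) -> str:
    """Prefix every Lucene special character in `term` with a backslash."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


if __name__ == "__main__":
    print(escape_lucene("11:51:47.772Z"))  # prints 11\:51\:47.772Z
```

With the official neo4j Python driver this would be something like `session.run(FULLTEXT_QUERY, {"index": "_searchableText", "term": escape_lucene("2022-05-23")})`.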
Neo4j does not recommend it, though:
“Neo4j is not the db to use if full text searches across the entire db is an important use case. Use Neo4j for graph use cases. Use Elasticsearch or similar technology if you need full text search over your entire data set.”
This raises the question: what exactly is the intended use case for fulltext search in Neo4j?
One can integrate Neo4j with Elasticsearch, but this doesn’t play nicely with transactions.
So for now, I think this might be a good enough solution for use cases where not too many nodes with not too many values are involved.