Anyone can say Anything about Anything - in LPG

AAA

I quite like the idea of AAA: Anyone can say Anything about Anything. It comes from the RDF world, and I learned about it in the Working Ontologist book. I have the feeling that it is not really possible to store AAA in an RDF database (see RDF Confusion).

Having AAA in our database means that we need to store contradictory and/or redundant information about things. Let’s have a look at an idea for storing it in an LPG (labeled property graph), e.g. using Neo4j or Memgraph.

Storing the information

Let’s say we have two persons, Alice and Bob, and that Alice likes Bob. Or so we think:

The stored information

The green boxes are nodes, the strings on top of the box are the labels, followed by relevant properties. The first string on relations describes the type of the relation, again followed by properties. _id is the internal id of an object, so that we can talk about it.

Does the Person called Alice maybe also go by the name Ally? Are they the same person? Is she poly and does she like both Bob and Charlie, or not, and since when? Lots of questions…
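To make the storage idea concrete, here is a minimal sketch in plain Python rather than in a real graph database. The ids, labels, names and the `since` value are made up for illustration and are not taken from the figure, and `add_statement` is a hypothetical helper. The point is only that every statement is stored as given, with no attempt to merge or reject it at write time.

```python
# Hypothetical data: ids, labels, names and the "since" value are made up.
nodes = [
    {"_id": "n1", "labels": {"Person"}, "props": {"name": "Alice"}},
    {"_id": "n2", "labels": {"Human"}, "props": {"name": "Ally"}},
    {"_id": "n3", "labels": {"Person"}, "props": {"name": "Bob"}},
    {"_id": "n4", "labels": {"Person"}, "props": {"name": "Charlie"}},
]
relations = [
    {"_id": "r1", "type": "LIKES", "start": "n1", "end": "n3", "props": {"since": 2020}},
    {"_id": "r2", "type": "LIKES", "start": "n2", "end": "n4", "props": {}},
]


def add_statement(store, statement):
    """Append a statement as-is; nothing is merged, checked or rejected."""
    store.append(statement)
    return statement["_id"]


# A second, redundant claim about Alice is simply stored next to the first.
add_statement(nodes, {"_id": "n5", "labels": {"Person"}, "props": {"name": "Alice"}})
```

Whether n1, n2 and n5 describe the same person is deliberately left open here.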

Aggregation creates an image of reality

In order to create a picture of reality, we need to pick nodes and aggregate the information in them. The database holds contradicting information. We need to select which nodes to rely on, and that selection paints our picture of the world.

Aggregating information to create a perspective

The questions mentioned above need to be handled by the aggregation mechanism - giving a clean answer is outside the scope of the storage system itself. Having meta information around could be helpful - if we knew that ‘Person’ and ‘Human’ are compatible terms, both could be true.
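One possible aggregation, sketched in plain Python under my own assumptions: the picked data nodes are merged into one ‘Agg’ view, labels are unioned, and every distinct value claimed for a property is kept rather than silently choosing one. The function name `aggregate`, the `sources` field and the example nodes are hypothetical.

```python
def aggregate(picked):
    """Merge the picked data nodes into a single 'Agg' view.

    Labels are unioned. For each property, all distinct claimed values are
    kept in pick order. Properties starting with '_' are not aggregated.
    """
    labels = {"Agg"}
    props = {}
    for node in picked:
        labels |= node["labels"]
        for key, value in node["props"].items():
            if key.startswith("_"):
                continue  # internal / provenance properties stay on the data node
            values = props.setdefault(key, [])
            if value not in values:
                values.append(value)
    return {"labels": labels, "props": props, "sources": [n["_id"] for n in picked]}


# Hypothetical data nodes, illustrative only.
n1 = {"_id": "n1", "labels": {"Person"}, "props": {"name": "Alice"}}
n2 = {"_id": "n2", "labels": {"Human"}, "props": {"name": "Ally"}}

only_alice = aggregate([n1])        # one perspective
alice_is_ally = aggregate([n1, n2])  # another perspective
```

Picking only n1 yields a Person named Alice; picking n1 and n2 yields a node that carries both labels and both names. Which of the two is "right" is decided by the selection, not by the stored data.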

Aggregated relations

Depending on the aggregation one also sees different relations between the objects:

Adding relations to the aggregation

The only thing that is obvious is that there is more than one truth, and the perspective on the world depends on the facts that you pick.

Note: we know that r12 - r16 are aggregated relations because they are between ‘Agg’ nodes.
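How the relations follow from the chosen aggregation can be sketched like this, again in plain Python and with made-up ids. The hypothetical `membership` mapping says which ‘Agg’ node each data node was picked into; a relation is lifted to the aggregated level only if both of its endpoints were picked.

```python
def lift_relations(relations, membership):
    """Lift relations between data nodes to relations between 'Agg' nodes.

    membership maps a data node id to the id of the 'Agg' node it was picked
    into. Relations with an endpoint outside the perspective are skipped.
    The result maps (agg_start, type, agg_end) to the supporting relation ids.
    """
    lifted = {}
    for rel in relations:
        start = membership.get(rel["start"])
        end = membership.get(rel["end"])
        if start is None or end is None:
            continue  # an endpoint is not part of this perspective
        lifted.setdefault((start, rel["type"], end), []).append(rel["_id"])
    return lifted


# Hypothetical data relations, illustrative only.
relations = [
    {"_id": "r1", "type": "LIKES", "start": "n1", "end": "n3"},
    {"_id": "r2", "type": "LIKES", "start": "n2", "end": "n4"},
]

# Perspective A: n1 and n2 are the same person.
view_a = lift_relations(relations, {"n1": "a1", "n2": "a1", "n3": "a2", "n4": "a3"})
# Perspective B: n2 is not trusted at all.
view_b = lift_relations(relations, {"n1": "a1", "n3": "a2", "n4": "a3"})
```

In perspective A the aggregated person likes two people; in perspective B she likes only one. The stored relations are identical in both cases.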

Provenance

What’s left out of the picture is the provenance - who said it, when, and with what certainty. The idea is to store this information on the original data nodes (n1-n5), in properties that won’t get aggregated (e.g. properties starting with _). This helps us work around a limitation of LPG: we can’t really talk about the assignment of labels, types and properties. If we bundle the information that shares the same context/provenance together in the same node, we have a way to store it and differentiate it from other contexts.
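A small sketch of how such provenance properties could drive the selection, in plain Python. The property names `_source`, `_said_at` and `_certainty`, the helper names and all values are assumptions for illustration; the only convention taken from the text is that properties starting with `_` are not aggregated.

```python
def select_by_provenance(nodes, min_certainty=0.0, sources=None):
    """Pick data nodes whose provenance matches the given criteria.

    Nodes without a '_certainty' property are treated as certainty 0.0.
    If sources is None, any source is accepted.
    """
    picked = []
    for node in nodes:
        props = node["props"]
        if props.get("_certainty", 0.0) < min_certainty:
            continue
        if sources is not None and props.get("_source") not in sources:
            continue
        picked.append(node)
    return picked


def visible_props(node):
    """Return only the properties that would take part in an aggregation."""
    return {k: v for k, v in node["props"].items() if not k.startswith("_")}


# Hypothetical data nodes with provenance bundled into '_' properties.
nodes = [
    {"_id": "n1", "labels": {"Person"},
     "props": {"name": "Alice", "_source": "bob", "_said_at": "2024-01-05", "_certainty": 0.9}},
    {"_id": "n2", "labels": {"Human"},
     "props": {"name": "Ally", "_source": "rumour", "_said_at": "2024-02-11", "_certainty": 0.3}},
]

trusted = select_by_provenance(nodes, min_certainty=0.5)
from_rumour = select_by_provenance(nodes, sources={"rumour"})
```

Because each node carries one source's claims together with that source's provenance, filtering on the `_` properties selects whole contexts at once.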

Conclusion

Using this approach we can actually do AAA in LPG. It doesn’t bend the system too much, and it allows “truth decision” at read time. There is no need to decide what is true beforehand.