This is an AI translation of Benutzeroberfläche für Graphen
This page is still under development
Since I’ve been dealing with graphs, I’ve been missing a decent interface for viewing and editing graphs. Here are my thoughts on what I’d like to have.
As an introduction to the topic of graphs, Christoph Pingel wrote a nice article.
Update: GraphEditor
There is now an implementation, to be found at https://github.com/dbsystel/grapheditor
LPG and RDF
Description
First, the question of which graph model I’m thinking about here. Essentially, two models had established themselves by the end of 2022: RDF and the Labeled Property Graph (LPG).
Source:
https://arxiv.org/abs/1910.09017
In RDF (right), identifiers are at the core, together with subject-predicate-object statements that work with these identifiers:
@base <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rel: <http://www.perceive.net/schemas/relationship/> .
<#123>
rel:friendOf <#456> ;
foaf:name "Alice" .
In the example, there are two statements about the identifier http://example.org/#123:
- It stands in the relationship http://www.perceive.net/schemas/relationship/friendOf to http://example.org/#456
- It has the string Alice as the value for http://xmlns.com/foaf/0.1/name
In this way, complex statements about identifiers / nodes and their relationships to each other can be described very well. It should be noted that almost everything except direct values are identifiers, which are usually described at the appropriate place, e.g. on http://xmlns.com/foaf/0.1/.
If one wants to make statements about relationships in conventional RDF, one takes the path via “reification”, i.e., one turns a relationship itself into a kind of node about which further statements can then be made. To simplify this, the further development RDF* (RDF Star) allows simpler statements about relationships (it is already used in the image above).
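Reification can be sketched in a few lines of Python. This is only an illustration: plain tuples stand in for triples, no RDF library is involved, and the statement identifier `stmt1` and the property `since` are invented for the example. The terms rdf:Statement, rdf:subject, rdf:predicate and rdf:object come from the standard RDF vocabulary.

```python
# Plain tuples stand in for RDF triples; no RDF library is used here.
EX = "http://example.org/#"
REL = "http://www.perceive.net/schemas/relationship/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The two statements from the Turtle example.
triples = {
    (EX + "123", REL + "friendOf", EX + "456"),
    (EX + "123", FOAF + "name", "Alice"),
}


def reify(triple, statement_id):
    """Return the triples that describe `triple` as a node of its own."""
    subject, predicate, obj = triple
    return {
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    }


friendship = (EX + "123", REL + "friendOf", EX + "456")
stmt = EX + "stmt1"  # hypothetical identifier for the reified statement
triples |= reify(friendship, stmt)

# Now a statement about the relationship itself is possible.
triples.add((stmt, EX + "since", "2020"))  # "since" is an invented property
```

One relationship plus one statement about it already takes six triples, which shows why the shorter RDF* notation is attractive.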
Advantages and Disadvantages
Modeling
As a model, I initially prefer LPG over RDF because it looks intuitively simple. This simplicity allows easy access for beginners or for people whose role does not require understanding graphs in all their abstraction.
Of course, in LPG one can also come up with the idea of using “relationship nodes” instead of edges.
With this, however, the path of clarity is already left, and both the queries become ugly to write, and the model really difficult to understand.
If one sticks to the clear separation of nodes (with properties) and edges between them (also with properties), one has a clear structure to which one can easily apply all the mathematical knowledge about graphs. This fact may also be responsible for the large number of available algorithms and functions in the area of LPG graph databases.
Another fundamental problem in RDF is that the subject-predicate-object structure is too short. For each statement, one might want to store where it comes from, or at least give it an ID.
Therefore, I proceed from LPG graphs in the following.
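To make the LPG structure concrete, here is a minimal sketch in Python. The class names, field names and the `FRIEND_OF` type are assumptions chosen for illustration and do not correspond to any particular graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    source: int
    target: int
    type: str
    properties: dict = field(default_factory=dict)


alice = Node(1, {"Person"}, {"name": "Alice"})
bob = Node(2, {"Person"}, {"name": "Bob"})

# The edge has its own ID and its own properties, so provenance
# ("where does this statement come from?") can be stored directly on it.
friend = Edge(10, alice.id, bob.id, "FRIEND_OF",
              {"since": 2020, "origin": "manual entry"})


def neighbours(node_id, edges):
    """Return the IDs of all nodes reachable via one outgoing edge."""
    return {e.target for e in edges if e.source == node_id}
```

Because nodes and edges stay separate, functions such as `neighbours` map directly onto the usual graph-theoretic notions.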
Schemas
In LPG, one can simply assign names to properties. It is equally simple that values are ultimately always scalars, i.e., strings, integers, etc. One doesn’t need to worry about anything else.
But one can also basically not worry about anything: in LPG databases, there is initially no concept of namespaces, clearly defined properties, relations, or their relationship to each other (ontologies). Because nothing is defined here, one usually cannot work with the data in the way that is possible in the RDF world with, for example, OWL. This is not “worse” than in relational databases, or in large parts of programming in general, but it ultimately leads to chaos, as beautifully depicted by Dave McComb in the book The Data Centric Revolution. The chaos arises because every application with its respective database becomes a small island: the identifiers for objects and their properties are not organized, and thus the data cannot easily be related across applications. (A current trend is the development of metadata catalogs, which would then have to be driven to the expansion stage of ontologies.)
I.e., in the LPG area, it would be desirable to work with more precisely specified relations and properties, which are ideally defined within an ontology.
Regarding schemas (like e.g. SHACL in RDF), there are “constraints” in the LPG area, with which one can specify which properties nodes with a certain label must have. However, I don’t like that this gives up the great advantage of flexibility in modeling; basically, a rigid model is developed again, as in relational databases. A node is given a label, and the node is then either correct or not. Thus, a node cannot first stand on its own, with the freedom to check afterwards which schemas it corresponds to.
I believe, however, that one can do this better, and I present the idea in the next section.
Wish List
Semantic Support
Python
In Python, “duck typing” is implemented: instead of asking whether the class of an object is derived from a suitable parent class, one can simply check whether suitable attributes and methods are present on an object and then work with it accordingly. For example, if an object has the methods that make it “iterable”, one can use it as such. This idea is continued in Python with typing.Protocol: a Protocol is basically a schema that describes the structure of an object, and it can also be checked at runtime (runtime_checkable).
See also:
- https://www.informatik-aktuell.de/entwicklung/programmiersprachen/python-static-duck-typing.html
- https://peps.python.org/pep-0544/
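A small sketch of how typing.Protocol with runtime_checkable behaves. The protocol and class names (`HasName`, `SupportsArea`, `Square`, `Person`) are made up for this example; only `Protocol` and `runtime_checkable` come from the standard library.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasName(Protocol):
    """Schema: anything that carries a `name` attribute."""
    name: str


@runtime_checkable
class SupportsArea(Protocol):
    """Schema: anything that offers an `area()` method."""
    def area(self) -> float: ...


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


class Person:
    def __init__(self, name):
        self.name = name


# Neither class inherits from a protocol; only the structure counts.
print(isinstance(Square(2), SupportsArea))      # True
print(isinstance(Person("Alice"), HasName))     # True
print(isinstance(Person("Alice"), SupportsArea))  # False
```

Note that the runtime check only looks at whether the members exist, not at their types or signatures.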
Own Approach
What I would like would be a system in which
- Properties
- Relations
- Schemas/Labels
are described. Christoph Pingel and I had a first approach to this as a self-referential schema. With this, an ontology can be described in an LPG.
If one has a description of the components (properties, relations, schemas) of the graph, one can naturally use them to connect different graphs, and also to draw logical conclusions within the graph (which still needs to be developed). But much more important is the support in GUIs. When editing a single node, for example, one can display an explanation for each (possible) property and each possible relation as to what is meant by it. Likewise, one can relatively easily check which schemas a node corresponds to.
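Checking which schemas a node corresponds to could look roughly like this. It is a sketch under assumptions: the schema names and the dictionary layout of `SCHEMAS` are invented here and are not the self-referential schema mentioned above.

```python
# Hypothetical schema descriptions: each schema lists the properties
# it requires and the expected Python type of each value.
SCHEMAS = {
    "Person": {"name": str},
    "Employee": {"name": str, "employer": str},
    "GeoLocated": {"lat": float, "lon": float},
}


def matching_schemas(properties, schemas=SCHEMAS):
    """Return the names of all schemas the given node properties satisfy."""
    return sorted(
        name
        for name, required in schemas.items()
        if all(isinstance(properties.get(key), typ)
               for key, typ in required.items())
    )


# The node stands on its own; the schemas are determined afterwards.
node = {"name": "Alice", "employer": "ACME", "lat": 50.1, "lon": 8.7}
print(matching_schemas(node))             # ['Employee', 'GeoLocated', 'Person']
print(matching_schemas({"name": "Bob"}))  # ['Person']
```

The node is never declared to be of one type; a GUI could simply show the list of matching schemas and, for the others, which properties are still missing.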
Editing
Obviously, I finally want an editor with which I can edit nodes, edges, and the properties. I envision that a (knowledge) graph is built up step by step - somehow the information has to get into the graph first. Of course, this is not so necessary if “only” data from other data sources are merged in a graph.
And when editing, there should be the above-described semantic support.
Query
Of course, I want to be able to query the graph in an interface. Primarily, I have in mind the respective query language of the graph database, e.g. Cypher. For the input component, I imagine at least syntax highlighting and a history navigable with arrow keys.
A Query Builder GUI would also be conceivable.
Result Set
The result of the query should be displayed. The tools I know so far go directly from input to display. However, I envision an intermediate step:
The intermediate step of the result set makes it possible to decide at input time whether a possibly already existing result set should be:
- replaced
- extended
- reduced
This way, one can gradually approach a desired display. Furthermore, the result sets can also be cached (i.e., the IDs of nodes and edges). This gives views that can be called up later. If one has result sets, one also has ideal entry points for set-based navigation, as presented by David Huynh in 2008 in Freebase Parallax.
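The result set as an intermediate step can be sketched as follows. The class and method names are assumptions, and “reduced” is interpreted here as removing the newly found IDs from the existing set; keeping only the intersection would be another plausible reading.

```python
from dataclasses import dataclass, field


@dataclass
class ResultSet:
    """Holds only IDs, so it is cheap to cache and to recall later."""
    node_ids: set = field(default_factory=set)
    edge_ids: set = field(default_factory=set)

    def replace(self, new):
        return ResultSet(set(new.node_ids), set(new.edge_ids))

    def extend(self, new):
        return ResultSet(self.node_ids | new.node_ids,
                         self.edge_ids | new.edge_ids)

    def reduce(self, new):
        return ResultSet(self.node_ids - new.node_ids,
                         self.edge_ids - new.edge_ids)


current = ResultSet({1, 2, 3}, {10, 11})
query_result = ResultSet({3, 4}, {11, 12})

extended = current.extend(query_result)   # nodes {1, 2, 3, 4}
reduced = current.reduce(query_result)    # nodes {1, 2}

# Cached result sets act as named views that can be called up later.
saved_views = {"after second query": extended}
```

Each operation returns a new object, so earlier result sets stay available as entry points for further navigation.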
Views
Set Views
To display a result set, several display forms are interesting:
- Graph display 2D
- Graph display 3D
- Display on map
- Force Graph and other layout algorithms
- Timelines
Ideally, the user can switch between these different views without having to reload the result set.
At least in the graph display 2D & 3D, it should be possible to change the position of the nodes. The position should be savable so that a view can be called up again later; for this, the result set (IDs) would have to be stored together with the respective positions. The storage could of course take place in a dedicated “View” node, which either has the IDs and positions as a property, or we have View-[contains {pos: 1,2}]->Node connections.
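Both storage variants can be sketched with plain dictionaries. The function names and the dictionary layout are assumptions for illustration. Because property values in an LPG are scalars, the first variant serializes the positions into a JSON string.

```python
import json


def view_as_property(name, positions):
    """Variant 1: one View node holding IDs and positions as a JSON string."""
    layout = {str(node_id): list(pos) for node_id, pos in positions.items()}
    return {"labels": ["View"],
            "properties": {"name": name, "layout": json.dumps(layout)}}


def view_as_edges(view_id, positions):
    """Variant 2: one 'contains' edge per node, with the position on the edge."""
    return [
        {"source": view_id, "target": node_id, "type": "contains",
         "properties": {"x": x, "y": y}}
        for node_id, (x, y) in sorted(positions.items())
    ]


def positions_from_property(view_node):
    """Restore the {node_id: (x, y)} mapping from a variant-1 View node."""
    layout = json.loads(view_node["properties"]["layout"])
    return {int(node_id): tuple(pos) for node_id, pos in layout.items()}


positions = {1: (0.0, 0.0), 2: (120.0, 40.0)}
view = view_as_property("my first view", positions)
edges = view_as_edges(100, positions)  # 100 is a made-up ID for the View node
```

The edge variant keeps the view queryable with ordinary graph queries (for example, “which views contain this node?”), while the property variant needs only a single node.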
Single View
A graph display of a single node is needed. This should make both the properties of a node accessible, as well as display the local environment of the node, i.e., the connected nodes, one or more hops away. If one clicks on one of the nodes, it switches to the corresponding local view.
One idea is that when a new node is selected, the old nodes remain in the image, but perhaps become fainter. I.e., each node gets a kind of ttl, which is decremented with each click.
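The ttl idea could work roughly like this. The function names, the default ttl of 3, and the linear mapping to opacity are assumptions made for the sketch.

```python
def select(visible, node_id, neighbours, ttl=3):
    """Return the new {node_id: ttl} map after clicking on `node_id`.

    Every node already shown loses one tick and disappears at zero;
    the clicked node and its neighbours start again with a full ttl.
    """
    aged = {n: t - 1 for n, t in visible.items() if t - 1 > 0}
    for n in {node_id, *neighbours}:
        aged[n] = ttl
    return aged


def opacity(remaining, ttl=3):
    """Fainter display for older nodes: 1.0 is fresh, close to 0 is about to vanish."""
    return remaining / ttl


visible = select({}, 1, {2, 3})       # {1: 3, 2: 3, 3: 3}
visible = select(visible, 2, {1, 4})  # node 3 fades to 2
visible = select(visible, 4, {2})     # node 3 fades to 1, node 1 to 2
visible = select(visible, 5, {4})     # node 3 disappears
```

After the four clicks, `visible` is `{1: 1, 2: 2, 4: 3, 5: 3}`: the path the user has taken stays on screen for a while and then fades out.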
Storage
To be able to work with multiple nodes, e.g., to compare nodes or copy values back and forth, a kind of storage would be useful, in which one can store multiple nodes, and from which one can also open a node again.
Open Source
I cannot imagine seriously recommending closed-source software to a customer for editing and storing important data, especially not if the data is security-relevant. Accordingly, the user interface would also have to be Open Source. Of course, this raises the question of the business model, but perhaps something can be done with trademarks here.
Implementation
- The whole thing must be able to run server-based, to support own security models in further expansion stages. If the application is a pure JS application, all security would have to be in the database - an assumption that suits me.
- If most of the work, incl. layout, happens on the server, one can also use appropriate libraries, e.g. networkx.
- In addition, I would imagine a very slim interaction framework; htmx comes to mind first.
- If possible, WebGL would be nice for the graph display, because of the performance.
Status Quo and Inspirations
Neo4j Browser
https://neo4j.com/developer/neo4j-browser/
- Input area for queries works well
- Graph display is clearly too slow with many nodes
- No editing capability
Neo4j Bloom
https://neo4j.com/developer/neo4j-bloom/
- Better performance
- Limited editing capability
- Not all data types are fully supported
- No freely editable properties on edges
- No free input of queries
- No semantic support
- (Still) no map display
- Not Open Source
NeoDash
https://neo4j.com/labs/neodash/
- Practical for own dashboards
- Doesn’t perform particularly well with larger data sets
- No editing capabilities
Parallax
- Super inspiring way of navigating in sets. The concept is also implemented, for example, at semspect.
- Needs the connection of an editor
- No longer works
memgraph lab
- Better performance with larger graphs than neo4j browser
- Interesting display language GSS
- Not so good input of queries
- No arrow key history
- Syntax highlighting sometimes glitches
- Closed Source
The visualization component orb.js is available as Open Source, which is interesting for further use.
yworks
- Very beautiful layout algorithms
- Closed Source
- No semantic support when editing
- Is intended for editing displays, not the graph database itself