User Interface for Graphs

This is an AI translation of Benutzeroberfläche für Graphen

This page is still under development

Ever since I started working with graphs, I have missed a decent interface for viewing and editing them. Here are my thoughts on what I would like to have.

For an introduction to the topic of graphs, Christoph Pingel has written a nice article.

Update: GraphEditor

There is now an implementation, which can be found at https://github.com/dbsystel/grapheditor

LPG and RDF

Description

First, the question of which graph model I have in mind here. Essentially, two models had established themselves by the end of 2022: RDF and the Labeled Property Graph (LPG).

Figure: RDF Star. Source: https://arxiv.org/abs/1910.09017

At its core, RDF (right) consists of identifiers and of subject-predicate-object statements about these identifiers:

@base <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rel: <http://www.perceive.net/schemas/relationship/> .

<#123>
  rel:friendOf <#456> ;
  foaf:name "Alice" .

In the example, there are two statements about the identifier http://example.org/#123 (the relative reference <#123> is resolved against the @base):

  1. It is in the relationship http://www.perceive.net/schemas/relationship/friendOf to http://example.org/#456
  2. It has the string Alice as the value for http://xmlns.com/foaf/0.1/name
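
How the short forms in the Turtle snippet expand to full identifiers can be sketched in a few lines of Python. This is only an illustration, not a Turtle parser: the helper `expand` and the names `BASE` and `PREFIXES` are made up for this example, and only the standard-library function `urljoin` is used to resolve the relative reference against the base.

```python
from urllib.parse import urljoin

# Values copied from the @base and @prefix lines of the Turtle example.
BASE = "http://example.org/"
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rel": "http://www.perceive.net/schemas/relationship/",
}


def expand(term: str) -> str:
    """Expand '<#123>' (relative IRI) or 'foaf:name' (prefixed name).

    Hypothetical helper for illustration; it handles only these two forms.
    """
    if term.startswith("<") and term.endswith(">"):
        return urljoin(BASE, term[1:-1])
    prefix, local = term.split(":", 1)
    return PREFIXES[prefix] + local


# The two statements as (subject, predicate, object) tuples.
triples = [
    (expand("<#123>"), expand("rel:friendOf"), expand("<#456>")),
    (expand("<#123>"), expand("foaf:name"), "Alice"),  # literal, not an identifier
]

for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

The subject of both statements comes out as `http://example.org/#123`, and everything except the literal `Alice` is a full identifier.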

In this way, complex statements about identifiers (nodes) and their relationships to each other can be described very well. Note that almost everything except literal values is an identifier, and these identifiers are usually documented at the corresponding location, e.g. at http://xmlns.com/foaf/0.1/.

If one wants to make statements about relationships in conventional RDF, one goes via “reification”, i.e., one turns the relationship itself into a kind of node about which further statements can then be made. To simplify this, the extension RDF* (RDF Star) allows statements about relationships to be made directly; it is already used in the image above.

Advantages and Disadvantages

Modeling

As a model, I initially prefer LPG to RDF because it is intuitively simple. This simplicity gives easy access to beginners and to people whose role does not require understanding graphs in all their abstraction.

Of course, in LPG one can also come up with the idea of using “relationship nodes” instead of edges.

flowchart LR; Alice -- source --> knows{{knows}}; knows -- target --> Bob

With this, however, one has already left the path of clarity: the queries become ugly to write, and the model becomes really difficult to understand.

If one sticks to the clear separation of nodes (with properties) and edges between them (also with properties), one has a clear structure to which one can easily apply all the mathematical knowledge about graphs. This fact may also be responsible for the large number of available algorithms and functions in the area of LPG graph databases.
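
The separation of nodes and edges, each with their own properties, can be written down as a small data structure. The following is a minimal sketch in Python; the class names `Node`, `Edge`, and `Graph` and all field names are chosen for this illustration and do not correspond to any particular graph database.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    id: str
    labels: set[str] = field(default_factory=set)
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: str
    type: str
    source: str  # id of the start node
    target: str  # id of the end node
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Graph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: dict[str, Edge] = field(default_factory=dict)

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, edge: Edge) -> None:
        # An edge may only connect nodes that already exist.
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must be nodes of the graph")
        self.edges[edge.id] = edge

    def neighbors(self, node_id: str) -> set[str]:
        """Ids of all nodes reachable via one outgoing edge."""
        return {e.target for e in self.edges.values() if e.source == node_id}


g = Graph()
g.add_node(Node("alice", {"Person"}, {"name": "Alice"}))
g.add_node(Node("bob", {"Person"}, {"name": "Bob"}))
g.add_edge(Edge("e1", "knows", "alice", "bob", {"since": 2020}))
print(g.neighbors("alice"))
```

Because the edge is a first-class object with its own `properties`, a statement such as "since 2020" needs no additional relationship node.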

Another fundamental problem in RDF is that the subject-predicate-object structure is too short: for each statement, one might want to store where it comes from, or at least give it an ID, and a plain triple has no place for that.

Therefore, I proceed from LPG graphs in the following.

Schemas

In LPG, one can simply assign names to properties. Equally simple, values are ultimately always scalars, i.e., strings, integers, etc. One does not need to worry about anything else.

But there is also basically nothing else one could take care of: LPG databases initially have no concept of namespaces, clearly defined properties, relations, or their relationships to each other (ontologies). Because nothing is defined, one usually cannot work with the data the way OWL, for example, makes possible in the RDF world. This is no “worse” than in relational databases, or in large parts of programming in general, but it ultimately leads to chaos, as beautifully depicted by Dave McComb in the book The Data Centric Revolution. Chaos, because every application with its own database becomes a small island: the identifiers for objects and their properties are not coordinated, so the data cannot easily be related across applications. (A current trend is the development of metadata catalogs, which would then have to be developed further to the level of ontologies.)

That is, in the LPG world it would be desirable to work with more precisely specified relations and properties, ideally defined within an ontology.

Regarding schemas (such as SHACL in RDF), the LPG world has “constraints”, with which one can specify which properties nodes with a certain label must have. However, I do not like that this gives up the great advantage of flexible modeling; basically, a rigid model is developed again, as in relational databases. A node is given a label, and the node is then either valid or not. A node therefore cannot first stand on its own, with the freedom to check afterwards which schemas it corresponds to.

I believe, however, that one can do this better, and I present the idea in the next section.

Wish List

Semantic Support

Python

Python uses “duck typing”. That is, instead of asking whether the class of an object is derived from a suitable parent class, one simply checks whether suitable attributes and methods are present on the object, and then works with it accordingly. For example, if an object has the methods that make it “iterable”, one can iterate over it. Python takes this idea further with typing.Protocol: a Protocol is basically a schema that describes the structure of an object, and it can also be checked at runtime (runtime_checkable).
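
A minimal sketch of such a runtime check with `typing.Protocol` and `runtime_checkable` follows. The classes `SupportsIter`, `Countdown`, and `NotIterable` are invented for this example. Note that the `isinstance` check only tests whether the named members exist, not whether their signatures match.

```python
from typing import Iterator, Protocol, runtime_checkable


@runtime_checkable
class SupportsIter(Protocol):
    """Structural 'schema': anything that has an __iter__ method."""

    def __iter__(self) -> Iterator: ...


class Countdown:
    """Does not inherit from SupportsIter, but has the right shape."""

    def __init__(self, start: int) -> None:
        self.start = start

    def __iter__(self) -> Iterator[int]:
        return iter(range(self.start, 0, -1))


class NotIterable:
    pass


for candidate in (Countdown(3), NotIterable()):
    if isinstance(candidate, SupportsIter):
        print(type(candidate).__name__, "matches:", list(candidate))
    else:
        print(type(candidate).__name__, "does not match")
```

The object is checked against the protocol after the fact; its class never had to declare which protocols it fulfills.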

Own Approach

What I would like would be a system in which

  1. Properties
  2. Relations
  3. Schemas/Labels

are described. Christoph Pingel and I made a first attempt at this in the form of a self-referential schema. With this, an ontology can be described in an LPG.

If one has a description of the components (properties, relations, schemas) of the graph, one can naturally use them to connect different graphs, and also to draw logical conclusions within the graph (which still needs to be developed). But much more important is the support in GUIs. When editing a single node, for example, one can display an explanation for each (possible) property and each possible relation as to what is meant by it. Likewise, one can relatively easily check which schemas a node corresponds to.
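
The check of which schemas a node corresponds to can be sketched in the duck-typing spirit: the node stands on its own, and schemas are matched against it afterwards. This is only an illustration under simplifying assumptions. The `Schema` class, its fields, and the example schemas are hypothetical, and a schema here is nothing more than a set of required property names with expected Python types.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Schema:
    name: str
    description: str            # explanation a GUI could show next to the schema
    required: dict[str, type]   # property name -> expected Python type


def matching_schemas(properties: dict[str, Any], schemas: list[Schema]) -> list[str]:
    """Return the names of all schemas whose required properties are present
    on the node with a value of the expected type."""
    return [
        schema.name
        for schema in schemas
        if all(isinstance(properties.get(key), expected)
               for key, expected in schema.required.items())
    ]


schemas = [
    Schema("Person", "Something with a name", {"name": str}),
    Schema("Employee", "A person with an employer", {"name": str, "employer": str}),
    Schema("Dated", "Something with a year", {"year": int}),
]

node_properties = {"name": "Alice", "employer": "ACME"}
print(matching_schemas(node_properties, schemas))
```

The node is not declared valid or invalid by a single label; it simply matches zero or more schemas, and the match can change as properties are added.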

Editing

Obviously, I want an editor with which I can edit nodes, edges, and their properties. I envision a (knowledge) graph being built up step by step; somehow the information has to get into the graph in the first place. Of course, this matters less if a graph “only” merges data from other data sources.

Editing should come with the semantic support described above.

Query

Of course, I want to be able to query the graph in an interface. Primarily, I have in mind the native query language of the respective graph database, e.g. Cypher. For the input component, I imagine at least syntax highlighting and a history navigable with the arrow keys.

A Query Builder GUI would also be conceivable.

Result Set

The result of the query should be displayed. The tools I know so far go directly from input to display. However, I envision an intermediate step:

flowchart LR; Input --> ResultSet --> Display

The intermediate step of the result set makes it possible to decide at input time what should happen to a possibly already existing result set.

This way, one can gradually approach a desired display. Furthermore, the result sets can also be cached (i.e., the IDs of nodes and edges). This gives views that can be called up later. If one has result sets, one also has ideal entry points for set-based navigation, as presented by David Huynh in 2008 in Freebase Parallax.
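
A result set that holds only IDs can be sketched as follows. The class `ResultSet`, the two combination methods `extended_by` and `narrowed_to`, and the `saved_views` cache are assumptions made for this illustration; they show one possible way to combine a new query result with an existing result set, not a fixed design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultSet:
    """Only IDs are kept, so a result set is cheap to cache and to combine."""

    node_ids: frozenset = frozenset()
    edge_ids: frozenset = frozenset()

    def extended_by(self, other: "ResultSet") -> "ResultSet":
        """Union: add the new result to the existing one."""
        return ResultSet(self.node_ids | other.node_ids,
                         self.edge_ids | other.edge_ids)

    def narrowed_to(self, other: "ResultSet") -> "ResultSet":
        """Intersection: keep only what both results have in common."""
        return ResultSet(self.node_ids & other.node_ids,
                         self.edge_ids & other.edge_ids)


# Hypothetical cache of named result sets that can be called up later.
saved_views: dict[str, ResultSet] = {}

people = ResultSet(frozenset({"n1", "n2"}), frozenset({"e1"}))
friends = ResultSet(frozenset({"n2", "n3"}), frozenset({"e1", "e2"}))

combined = people.extended_by(friends)
common = people.narrowed_to(friends)
saved_views["people and friends"] = combined

print(sorted(combined.node_ids), sorted(common.node_ids))
```

Because the display step only receives a `ResultSet`, the same set can be shown in different views without running the query again.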

Views

Set Views

To display a result set, several display forms are interesting:

Ideally, the user can switch between these different views without having to reload the result set.

At least in the 2D and 3D graph displays, it should be possible to change the position of the nodes. The positions should be savable so that a view can be called up again later; for this, the result set (IDs) would have to be stored together with the respective positions. This could be stored in a dedicated “View” node, which either holds the IDs and positions as a property, or is connected to the nodes via (View)-[:contains {pos: [1, 2]}]->(Node) relationships.
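
The first storage variant, a view that carries IDs and positions as a single property, can be sketched like this. Serializing the positions as a JSON string is an assumption made here so that the value stays a scalar; the class `View` and its methods are hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class View:
    name: str
    # node id -> (x, y) position in the 2D display
    positions: dict[str, tuple[float, float]] = field(default_factory=dict)

    def to_property(self) -> str:
        """Serialize IDs and positions into one string property."""
        return json.dumps(self.positions, sort_keys=True)

    @classmethod
    def from_property(cls, name: str, raw: str) -> "View":
        """Restore a view from the stored string property."""
        data = json.loads(raw)
        # JSON has no tuples, so the lists are converted back.
        return cls(name, {node_id: (x, y) for node_id, (x, y) in data.items()})


view = View("overview", {"n1": (1.0, 2.0), "n2": (3.5, -4.0)})
stored = view.to_property()
restored = View.from_property("overview", stored)
print(stored)
print(restored == view)
```

The second variant, one relationship per contained node with the position as an edge property, keeps the positions queryable in the graph itself, at the cost of more edges.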

Single View

A graph display of a single node is needed. It should make the properties of the node accessible and also display its local environment, i.e., the connected nodes one or more hops away. If one clicks on one of these nodes, the display switches to the local view of that node.

One idea is that when a new node is selected, the old nodes remain in the image but become fainter. That is, each node gets a kind of TTL (time to live), which is decremented with each click.
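
The fading idea can be sketched with a counter per visible node. The starting value `MAX_TTL = 3`, the function names, and the linear mapping from TTL to opacity are arbitrary choices for this illustration.

```python
MAX_TTL = 3  # assumed number of clicks a node stays visible


def select(visible: dict[str, int], neighborhood: set[str]) -> dict[str, int]:
    """Handle one click: age all currently visible nodes by one step, drop
    those whose TTL reaches zero, and (re)set the nodes of the newly
    selected local view to the full TTL."""
    aged = {node: ttl - 1 for node, ttl in visible.items() if ttl - 1 > 0}
    for node in neighborhood:
        aged[node] = MAX_TTL
    return aged


def opacity(ttl: int) -> float:
    """Older nodes are drawn fainter."""
    return ttl / MAX_TTL


visible: dict[str, int] = {}
for clicked_view in ({"a", "b"}, {"b", "c"}, {"c", "d"}, {"d"}):
    visible = select(visible, clicked_view)

for node, ttl in sorted(visible.items()):
    print(node, ttl, round(opacity(ttl), 2))
```

After the four clicks, node `a` has disappeared, while `b` and `c` are still shown at reduced opacity next to the freshly selected `d`.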

Storage

To be able to work with multiple nodes, e.g., to compare nodes or copy values back and forth, a kind of storage would be useful, in which one can store multiple nodes, and from which one can also open a node again.

Open Source

I cannot imagine seriously recommending closed-source software to a customer for editing and storing important data, especially not if the data is security-relevant. Accordingly, the user interface would also have to be Open Source. Of course, this raises the question of the business model, but perhaps something can be done with trademarks here.

Implementation

Status Quo and Inspirations

Neo4j Browser

https://neo4j.com/developer/neo4j-browser/

Neo4j Bloom

https://neo4j.com/developer/neo4j-bloom/

NeoDash

https://neo4j.com/labs/neodash/

Parallax

https://vimeo.com/1513562

memgraph lab

https://memgraph.com/lab

The visualization component orb.js is available as Open Source, which is interesting for further use.

yworks

https://www.yworks.com/