Advanced Modeling in HANA XSA Environment: Graph Pattern Matching
Pattern matching is a type of graph query that helps to understand relationships, detect patterns, and find subgraphs that match a given pattern. A pattern is a combination of vertices and edges, where each edge connects a source vertex to a target vertex. An example is traveling from Bern (vertex) to Zürich (vertex) on an SBB train (attribute of the edge). Scaled up, there are many destinations, many routes, and different modes of transportation, and complex calculations are often needed to find all the patterns that answer a given question. This blog gives an overview of how to perform pattern matching with native HANA means, namely a Calculation View. Both a visual method and a Cypher code method are described in detail.
In the HANA XS Advanced modeling environment, graph nodes can be used to take advantage of advanced graph algorithms. Used as the initial step of a Calculation View, this type of node extends the view's basic capabilities. For this purpose, integrated functions such as Shortest Path, Neighborhood, Strongly Connected Components, and Pattern Matching are available.
Pattern matching can be done either with the visual editor or with Cypher code. In both cases, a graph workspace must be available in the specified HDI container as a prerequisite.
Figure 1 Pattern Matching in a Calculation View
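As a minimal sketch of the graph workspace prerequisite, the travel example could be backed by one vertex table and one edge table, with a workspace defined on top of them. The schema TRAVEL, the tables CITIES and CONNECTIONS, the workspace name ROUTES, and all column names are assumptions made for illustration. In an HDI container the equivalent definition would typically be deployed as a design-time artifact rather than executed as plain SQL.

```sql
-- Hypothetical vertex table: one row per city.
CREATE COLUMN TABLE "TRAVEL"."CITIES" (
  "NAME" NVARCHAR(100) PRIMARY KEY
);

-- Hypothetical edge table: one row per direct connection.
-- OPERATOR holds the edge attribute, for example 'SBB' or 'SWISS'.
CREATE COLUMN TABLE "TRAVEL"."CONNECTIONS" (
  "ID"       INTEGER PRIMARY KEY,
  "SOURCE"   NVARCHAR(100) NOT NULL,
  "TARGET"   NVARCHAR(100) NOT NULL,
  "OPERATOR" NVARCHAR(50)
);

-- The workspace tells the graph engine which columns identify
-- vertices, edges, and the source and target of each edge.
CREATE GRAPH WORKSPACE "TRAVEL"."ROUTES"
  EDGE TABLE "TRAVEL"."CONNECTIONS"
    SOURCE COLUMN "SOURCE"
    TARGET COLUMN "TARGET"
    KEY COLUMN "ID"
  VERTEX TABLE "TRAVEL"."CITIES"
    KEY COLUMN "NAME";
```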
Graph Nodes in Calculation Views for Pattern Matching
The graphical editor offers a straightforward way to design graph patterns. All vertices and edges can be defined in the graphical environment, and vertex or edge filters can be specified in the same way, using either an input parameter or a fixed value. Lastly, all the necessary output columns can be chosen and mapped in this environment.
When a graph node is added to a Calculation View, it must be the initial node: no operations such as aggregations or projections can precede it. You can add as many vertices and edges as necessary and then define filters on them. In the example below, an additional filter is defined on the attributes of edge E1 or vertex V2, using an input parameter supplied by the end user or fixed values. This way, specific entities can be filtered out. All the necessary output columns can then be chosen in the mapping tab.
The example below illustrates searching for a subgraph of three vertices and two edges that meet the following criteria.
1. One of the vertices is the target of the two edges (E1 and E2)
2. One of the edges has the attribute value SBB and the other Swiss Air Lines (SWISS)
In other words, the pattern looks for a target that can be reached by both SWISS and SBB. Once the basic logic is defined, it can be applied to massive data volumes to find all the subgraphs that meet the criteria.
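To make the two criteria concrete, the same question can be written as a plain relational self-join. This is only a sketch of what the pattern asks for, not what the graph node generates. The edge table CONNECTIONS and its columns SOURCE, TARGET, and OPERATOR are hypothetical names chosen for illustration.

```sql
-- Assumed edge table: "CONNECTIONS"("ID", "SOURCE", "TARGET", "OPERATOR").
SELECT e1."SOURCE" AS "ORIGIN_BY_TRAIN",
       e2."SOURCE" AS "ORIGIN_BY_AIR",
       e1."TARGET" AS "DESTINATION"
  FROM "CONNECTIONS" AS e1
  JOIN "CONNECTIONS" AS e2
    ON e1."TARGET" = e2."TARGET"   -- criterion 1: both edges share one target vertex
 WHERE e1."OPERATOR" = 'SBB'       -- criterion 2: one edge is operated by SBB ...
   AND e2."OPERATOR" = 'SWISS';    -- ... and the other by SWISS
```

With only two edges the join is still readable, but every additional edge in the pattern adds another self-join, which is where a graph pattern becomes the more compact notation.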
Figure 2 Graphical Pattern Matching
Cypher Code for Pattern Matching
Graphs can also be queried and updated using the Cypher language. Cypher is a declarative language, so its major advantage is simplicity. It allows complex database queries without any additional backend configuration. Another advantage is its openness: since late 2015 it has been adopted by many database vendors, among them SAP with its well-known HANA database. The integrated graph node for Calculation Views can be used for this purpose, as it offers both a graphical pattern editor and a Cypher editor.
The syntax is built upon two clauses: MATCH and RETURN. The MATCH clause first identifies all vertices and paths. It can be refined with an optional, SQL-like WHERE condition, in which the operators =, <>, <, >, <=, and >= perform comparisons and the keywords AND, OR, and NOT serve as Boolean operators.
Lastly, the RETURN clause lists all the output columns that are relevant for the result. The example below illustrates the same pattern matching problem as the one solved with the graphical pattern editor.
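A minimal sketch of such a statement for the SBB and SWISS pattern could look as follows. The attribute names OPERATOR and NAME and the output aliases are assumptions for illustration and depend on the columns of the underlying graph workspace. The comment lines use standard openCypher comment syntax and are meant for the reader, so they may need to be removed before the statement is pasted into the editor.

```cypher
// Two path patterns, separated by a comma, that end in the same vertex c.
// OPERATOR and NAME are hypothetical attribute names.
MATCH (a)-[e1]->(c), (b)-[e2]->(c)
WHERE e1.OPERATOR = 'SBB' AND e2.OPERATOR = 'SWISS'
RETURN a.NAME AS ORIGIN_BY_TRAIN, b.NAME AS ORIGIN_BY_AIR, c.NAME AS DESTINATION
```

Reusing the variable c in both path patterns is what expresses the first criterion: both edges must point to the same target vertex.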
It is worth noting that Cypher enforces uniqueness of edges only: two edge variables in a pattern never match the same edge, whereas different vertex variables may match the same vertex. This allows more flexibility than standard pattern matching. The syntax follows a vertex-edge-vertex structure: path patterns are separated by commas, vertices are written in “()” and edges in “[]”, as shown in the example above. The length of a path is, however, limited to 15.
Once all the necessary components of the Cypher statement are in place, the output columns can be generated, as shown in the figure below:
Figure 3 Pattern Matching with Cypher Code
Since SPS04, the built-in “OPENCYPHER_TABLE” function can be used directly in an SQL expression for pattern matching. In this case the following syntax applies:
SELECT * FROM OPENCYPHER_TABLE(GRAPH WORKSPACE "SCHEMA"."WORKSPACE_NAME" QUERY 'Custom Cypher expression');
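Filled in for the SBB and SWISS pattern, a call could look like the sketch below. The workspace "TRAVEL"."ROUTES", the attribute names OPERATOR and NAME, and the output aliases are hypothetical and must be replaced with the names used in the actual graph workspace.

```sql
-- Hypothetical workspace "TRAVEL"."ROUTES"; OPERATOR and NAME are assumed attributes.
-- The Cypher statement is passed as an SQL string literal, so single quotes
-- inside it are escaped by doubling them.
SELECT *
  FROM OPENCYPHER_TABLE(
    GRAPH WORKSPACE "TRAVEL"."ROUTES"
    QUERY '
      MATCH (a)-[e1]->(c), (b)-[e2]->(c)
      WHERE e1.OPERATOR = ''SBB'' AND e2.OPERATOR = ''SWISS''
      RETURN a.NAME AS ORIGIN_BY_TRAIN, b.NAME AS ORIGIN_BY_AIR, c.NAME AS DESTINATION
    '
  );
```

Because the function returns a table, its result can be filtered, joined, or aggregated like any other table expression in the surrounding SQL.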
Additionally, the SAP HANA Graph Reference Guide is recommended reading when pursuing this method.