Digital Essence

Digitalization is essentially about a common understanding of data. We have a long way to go until there is no more copying and pasting to fill out the tax declaration. How can we have a common understanding, while being as flexible as possible? Can we learn from the success of the Internet, or the x86 CPUs?

When I finished my tax declaration, I didn’t have the impression that digitalization was progressing rapidly. I had to copy numerous figures from one document to the online form. In another domain, my doctor refers me to the specialist by writing and mailing a letter.

The necessary IT infrastructure is in place – but progress on digitalization of everyday business processes involving multiple parties is slow. One organization can go digital, but if interaction between multiple organizations is required, things get complicated.

The essential problem is the lack of a common understanding of the data involved. Although there are standard ways to represent data syntactically (XML, JSON), many domains lack standards defining the meaning of data. Common situations are:

  • Standards may exist. Creating standards amongst competitors is hard, and resulting standards are often designed by large committees, leading to arcane formats.
    • SWIFT, for example, is the standard for inter-bank communication – designed in the days where COBOL was the prevalent programming language for commercial systems.
    • But many industry standards like HTML and HTTP are quite good.

ONe standard box, arrows go into it

  • There is a dominant player, who may enforce standards. Although the tax office seems to be pretty powerful to me, it is apparently not powerful (or willing) enough to establish standards.
  • Transformation engines translate between formats created by different organizations. If the formats are not simple and crystal clear, essential semantic is often lost in translation. A transformation engine, and the related Enterprise Service Bus are helpful to connect systems, but semantic transformations have to be addressed without creating a tangled web of point-to-point translations.

3 Boxes, connected by wires and a cogwheels

  • Federated formats divide a domain into subdomains, and establish a standard or a dominant player’s scheme for each subdomain. This is often the only feasible approach – but divergent definitions of core concepts are problematic: Subdomains overlap; and non-standardized areas exist:

3 partly overlapping boxes, arrows pointing to ? in middle

  • Common core, federated extensions aim to simplify the federated formats, by covering the essentials of the domain in a common core. Flexibility is provided by extensions to the core, which may overlap. However, extensions must always map to the common core, allowing scheme users to reach into the core, ignoring extensions. Phrased differently, each extension must fully understand the core; whereas the core is not required to know about or understand extensions.

Boxes around ccenter core, arrows pointing to core

The common core, federated extension schema (Common Core, for short) seems to be the most promising way to address this essential problem of digitalization between organizations, so I will explain it with some examples.

Simple examples of Common Core

While this scheme sounds complicated, it is actually used in many domains:

  • A color can be described as an RGB value, a CMYK value, a HSV value, a Pantone number, a common name like “blue”, “mauve”, or “peach”, and undoubtedly a few other formats. The core may mandate an RGB value; it suffices that the other values can be expressed as RGB (even if some subtleties might be lost – extensions provide more detail).
  • The x86 CPU started as a small core, with numerous extensions added over time. Many former extensions became core features.
  • HTTP and HTML emerged as a standard way of interacting over a network and defined a platform. Unlike other similar undertakings of the time, they defined a simple core and allowed extensions (like extension headers and verbs for HTTP; custom tags and scripting for HTML).

Extensions by various parties came and went; the core remained quite stable. I can still run an early x86 program on today’s CPU; the latest browser can still access a site designed in early HTML and running on a very old web server supporting the first HTTP version.

This, you may say, are technical standards, and have nothing to do with digital business. Let’s do an example with something very common to many kinds of business transactions, an order.

 

Business Example: Order State

Whether I order socks from Siroop, a mobile subscription from Swisscom, buy some shares on a stock exchange, or a ticket from Swiss, I place an order, which gets fulfilled. I get something, and I pay for it. There is a set of core data, identifying me, the counterparty (seller), the good, the price, and the state the order is in (just came in, delivered, paid).

But details vary: A flight is not designated in the same way as the socks. So we have extension describing orders in particular domains; these extensions may be defined by different parties: Swiss might not use the same scheme as a competitor, Siroop and Amazon may designate goods differently. But when all extensions map to a common core, I can have an API (and an App) listing all orders I have currently outstanding, listing the good (probably just mapped to a string in the Core).

The state space of orders may vary considerably – I may be able to cancel a subscription; but not cancel a flight once the plane’s door is closed. The core state diagram shown below represents a generic order; extensions allow for specifics.

UML State Diagram of Order

Extensions always map each state to a core state; they also can register to be invoked when a core attribute (or state) is set directly, so they can modify an action or record additional detail.

This exemplifies the goal and mantra of a common core scheme: As common as necessary, as flexible as possible.

 

Implementation

In IT history, complicated and heavyweight technologies often lead to closed (and sometimes very professional) communities. Relatively simple standards such as XML, JSON, HTTP, HTML overtook more complex competitors and lead to open communities with wide adoption.

If the Internet would have been designed back in the seventies with its nowadays capacity in mind, it probably would have been so complicated that it would never have taken off. Instead, the set of protocols was very simple and readable.

For digital business information, data must be validated against the common core, and the federated extensions it claims to adhere to. However, dynamic loading of evolving federated extensions precludes static schemas, which are typical for relational databases. Fortunately, modern databases like MongoDB have become mainstream; they are designed from ground up for storing flexible formats like JSON.  Data modelling and validation remains important, but should be decoupled from the rigid requirements of storage.

Exploration, prototyping and quick end-to-end testing helps to find and understand the core abstractions and extensions.  As a Python programmer, I have the advantage that I can explore APIs easily and build production grade software exploiting them with the same set of tools. The application domain is complicated enough; so I prefer tools with low conceptional overhead.

 

Digital Essence

A common understanding of shared data is the essence of digitalization:

  • Define a simple, small, but powerful core.
  • Embrace extensions, always mapping them back to the core.
  • Implement light-weight.
  • Test and iterate, until your understanding is as common as necessary and as flexible as possible.