Many times I have been asked to publish my blog posts in English… I’m not sure whether my translation is as good as Google’s, but I’ll give it a try! ;)
In this post we would like to introduce a proof of concept that our team has been working on over the last few weeks. As mentioned in some other blog posts, we are trying out new technologies such as Apache Hadoop and combining them with business intelligence tools from SAP. The result is an end-to-end BI & big data solution. I will explain the case in detail after a short introduction to big data…
Big Data – Behind the buzzword
Big data is considered a business-intelligence-related discipline, or even a part of it. While traditional BI focuses on transactional data from business systems such as ERP or CRM, big data pursues the goal of analyzing and understanding unstructured, high-volume, high-velocity data coming from rather unconventional sources like sensors, video streams and all kinds of so-called «Internet of Things» solutions. Unlike in classical business intelligence, it is not so important to view a single transaction from a specific source system. Big data is more about understanding the hidden patterns and relations in the huge and growing amount of data on our planet.
With this goal in mind, we long faced the problem that existing technology was not capable of performing data analysis this way, especially because computationally intensive algorithms, such as advanced data mining models, are necessary to gain these kinds of insights. But this has changed. New technologies like Apache Hadoop and so-called NoSQL systems have appeared. Furthermore, in-memory and column-store database technology, combined in products like SAP HANA, empowers core business systems to become more capable of real-time analysis. But should an organization really build two analytical streams? One big data and one BI architecture? Hence the most interesting question is:
«How can we combine proven business intelligence tools with this new, powerful big data technology to get the most integrated (real-time) view of our organization and its processes (and beyond)?»
Temperature and humidity centrally analyzed
Back to our case. What if we could combine the heating costs of one or many buildings with their temperature and humidity to gain further insights into future expenses? Combined with real estate master data (e.g. usage, room type, etc.), interesting patterns and findings can be discovered! So what did we do? We placed a mobile sensor (with a 3G connection) in our office in Zurich. Every second this sensor measures temperature and humidity and transfers that information through a web service to an Apache Hadoop cluster in the cloud. Imagine this with thousands of sensors in thousands of rooms, multiplied by the seconds in days, weeks and months. Over time we will have lots of data – let’s call it big data 😉 On Hadoop this mass of information can be stored more cheaply than on any specialized hardware appliance, because Hadoop runs on commodity hardware.
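To make the ingestion step a bit more concrete, here is a minimal sketch of what such a sensor client could look like. The endpoint URL, room identifier and JSON field names are my own assumptions for illustration; the actual web service in our POC may differ.

```python
import json
import time
import urllib.request

# Hypothetical ingestion endpoint in front of the Hadoop cluster
INGEST_URL = "https://example.com/sensor/ingest"


def make_reading(temperature_c, humidity_pct, room="ZRH-Office-1"):
    """Build one sensor reading as a JSON document (field names assumed)."""
    return json.dumps({
        "room": room,
        "temperature_c": temperature_c,
        "humidity_pct": humidity_pct,
        "timestamp": int(time.time()),
    })


def send_reading(payload):
    """POST one reading to the ingestion web service.

    Not called here, since it requires a live endpoint; a real sensor
    would call this once per second in a loop.
    """
    req = urllib.request.Request(
        INGEST_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

On the server side, each of these small JSON documents would simply be appended to files on HDFS, where cheap storage makes it unproblematic to keep every single reading.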
On top of Hadoop we configured a new SAP Business Warehouse (BW) system running on SAP HANA. The master data mentioned above is stored in that system. To combine it with our big data results we used the so-called Smart Data Access (SDA) feature of HANA, which virtually accesses the measures on Apache Hadoop at runtime, without physically copying them as aggregates into the BW on HANA system.
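Roughly speaking, the SDA setup boils down to two SQL statements on the HANA side: registering the Hadoop/Hive cluster as a remote source, and exposing a Hive table as a virtual table. The sketch below only builds those statements as strings; the source name, DSN, schema and table names are assumptions for illustration, not the exact configuration of our POC.

```python
# Sketch of the HANA SQL behind Smart Data Access (all names hypothetical).
# In practice these statements are executed in HANA Studio or via a SQL client.

REMOTE_SOURCE = "HADOOP_SRC"       # assumed name for the remote source
HIVE_TABLE = "sensor_readings"     # assumed Hive table holding the raw measures

# Register the Hadoop cluster as a remote source (Hive ODBC adapter assumed):
create_source = (
    f'CREATE REMOTE SOURCE "{REMOTE_SOURCE}" ADAPTER "hiveodbc" '
    "CONFIGURATION 'DSN=HIVE_DSN' "
    "WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=hive;password=secret'"
)

# Expose the Hive table as a virtual table in the BW schema. Queries against
# the virtual table are federated to Hadoop at runtime; nothing is copied
# into HANA:
create_virtual = (
    f'CREATE VIRTUAL TABLE "BW_SCHEMA"."VT_SENSOR" '
    f'AT "{REMOTE_SOURCE}"."<NULL>"."default"."{HIVE_TABLE}"'
)
```

Once the virtual table exists, BW can join it with the real estate master data like any local table, which is exactly what makes the combined view possible.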
This is what our architecture looks like:
The goal of our POC was to integrate all these components into one SAP BI & big data solution. And we can proudly say that it works! So technology is not the showstopper. In our opinion, the most challenging and important step is defining a good business case before starting such a project.
What do you think about this case?