Container as a Service (CaaS) – Part I

As announced in late 2017, the Swisscom Application Cloud team is evaluating the latest container orchestration frameworks to meet the growing customer demand for Kubernetes-based container management services. The first proofs of concept with interested customers have started based on Cloud Foundry Container Runtime (CFCR). In addition, the commercial product built on CFCR, Pivotal Container Service (PKS), has been tested during its beta phase. Further platforms that were tested in this regard will be covered in the following blog posts. This is a sneak preview of our evaluation work and does not imply any commitment to an offered service.

Comparing Cloud Foundry Container Runtime with Pivotal Container Service (PKS)

Preconditions and Requirements

Swisscom wants to offer all services in its portfolio in a uniform and homogeneous way.

Swisscom One Cloud Vision: combining all cloud offerings for our customers

Swisscom wants to provide Kubernetes clusters on demand as a managed service (Press Release). As a service provider we have additional requirements, particularly in terms of multi-tenancy, network isolation and security. We also have pure IaaS cloud offerings and are evaluating if and how we can offer CaaS at scale to our customers in a sustainable way and at competitive prices, either managed by us or as self-service offerings.

Multi-tenancy roles and scenarios (k8s API is available to all master nodes!)

This diagram depicts possible multi-tenancy scenarios requested by our customers: shared clusters, dedicated clusters and mixed models, all requiring a fine-granular, hierarchical permission model driven by the customer through IdP federation, as well as sophisticated network security capabilities.

We are working on integrating all our services under a single pane of glass: homogeneous access across all distribution channels, meaning we produce a service once, expose it via the Open Service Broker API (OSBAPI) and consume it anywhere.

This is our long-term strategy and it will require a lot of work to harmonize connectivity, APIs, identities, billing and monitoring systems. As Cloud Foundry already uses the OSB API, some of our services are already exposed this way, and we are working on others such as Oracle Exadata.

The integration of CaaS (on demand Kubernetes clusters) should ideally be done in the very same way:

 

Swisscom Market Place – homogeneous offering over all services

Having already made considerable investments in our VMware infrastructure, we are determined to stick with the already productive installation consisting of vSphere and NSX-V in three availability zones. NSX-T could offer a lot of additional possibilities but is out of scope for now. I’ll cover the additional features provided by NSX-T later in my CaaS blog post series.

Zoning in our production setup: vSphere needs to be accessible from the workload zone

One particularity of our base setup was a problem for all tested frameworks: we separate management and workload zones into dedicated vCenter/NSX domains. As BOSH and the Kubernetes vSphere cloud provider (CPI) need direct vSphere access, we had to find a way to provide this access over a proxy. Apart from patching the frameworks and contributing these patches back to the community, we built a gateway proxy as a convenient workaround for the Kubernetes-based products, which all had the same problem with honoring a rather common https proxy setting.

Cloud Foundry Container Runtime

CFCR is the open source Kubernetes foundation and the base for all the products we are currently testing.
BOSH is the orchestrator for Cloud Foundry and Pivotal; it enables cloud-agnostic deployment and management of the complete lifecycle of k8s (Kubernetes) clusters by providing abstraction through the CPI (Cloud Provider Interface). Kubernetes deployed with BOSH is also known as KUBO.
The installation went quite smoothly once we had figured out how to set the proxy needed for our vCenter access. We submitted a merge request back to the community; it has already been merged and might become part of CFCR v0.13.
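To give an idea of how BOSH abstracts the infrastructure through the CPI, a heavily reduced vSphere cloud-config could look roughly like the sketch below. All names (datacenter, cluster, resource pool, port group) are illustrative placeholders, not our actual configuration:

    # Minimal BOSH cloud-config sketch for a vSphere-backed CFCR/KUBO deployment.
    # Datacenter, cluster, resource pool and network names are placeholders.
    azs:
    - name: z1
      cloud_properties:
        datacenters:
        - name: dc-01
          clusters:
          - cluster-az1: {resource_pool: kubo-rp}

    networks:
    - name: k8s-workload
      type: manual
      subnets:
      - range: 10.0.10.0/24
        gateway: 10.0.10.1
        azs: [z1]
        cloud_properties:
          name: vxw-k8s-workload    # NSX-V backed port group

    vm_types:
    - name: worker
      cloud_properties: {cpu: 4, ram: 8192, disk: 51200}

Swapping out the cloud_properties is essentially all it takes to target a different IaaS, which is what makes the BOSH-based approach cloud agnostic.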

With Flannel (on top of NSX-V) and only very limited ABAC, we saw no option other than to deploy one cluster per tenant. We had to set up a complete BOSH/Kubernetes cluster per customer and exposed the Kubernetes API endpoint (together with a predefined range of ports) over an external load balancer (F5). ABAC was the preferred security model for vanilla KUBO when we did the first installation, so we kept it very simple and started with this out-of-the-box setup.
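For reference, ABAC in Kubernetes is driven by a static policy file (one JSON object per line) handed to the API server, which is exactly why it is so limited: every change means touching the master configuration. A purely illustrative policy line granting one tenant user full access could look like this ("tenant-admin" is a placeholder):

    # API server flags: --authorization-mode=ABAC --authorization-policy-file=<policy file>
    # One JSON object per line; "tenant-admin" is an illustrative user name.
    {"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user": "tenant-admin", "namespace": "*", "resource": "*", "apiGroup": "*"}}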

Apart from the installation for our customers, we run an additional installation where we try out and continuously improve this offering internally. As a next step we will perform the cluster upgrade process and switch to RBAC with a proper UAA integration, if possible even with the integration of our target identity broker, OpenAM. Another important topic will be the automated orchestration of the external load balancer as well as of the NSX-V security groups and firewall rules.

The first impression of the CFCR releases reveals a powerful framework that allows running containers at scale. The tooling around administration and automation of such an installation is not yet at service provider level and will demand considerable development effort. As Kubernetes has gained a lot of momentum and is moving fast, it will be a challenge to maintain service provider additions. We are in close collaboration with Cloud Foundry, VMware and Pivotal and will have to decide whether we are willing and able to make this development investment now.

Pivotal Container Service

We were already able to test PKS as the 0.8 beta version and found an additional layer (the PKS API, based on the on-demand broker and the OSB API) that adds valuable administrative functionality to ease the management of the Kubernetes cluster lifecycle. Pivotal offers its Operations Manager component (an OVA for vSphere) that provides a central BOSH installation as the base. Many other components can easily be added to this installation as so-called tiles (compressed tarballs with additional configuration screens). In our proof of concept, we only installed PKS and Harbor:

 

Pivotal Operations Manager with PKS and Harbor Tiles

An excellent introduction to the overall architecture can be found in these brand-new VMware videos:

Official Press Release from Pivotal: Secure, Multitenant Kubernetes in Minutes: Pivotal Container Service Goes GA

Key findings

Regarding multi-tenancy, we found that only one PKS component can be installed and that it will use exactly one workload network and one service network.

Assign AZ and Networks in PKS Tile

If you are forced to use NSX-V, you will have to establish micro-segmentation using NSX distributed firewalls. The other option would be multiple instances of all components, from Operations Manager down to PKS and Harbor, which would considerably increase the operation and maintenance effort.

A couple of improvements delivered with the GA version 1.0 of PKS deserve a mention here:

  • Three plans are now definable (a plan is like a t-shirt size for a Kubernetes cluster)
  • Running privileged containers can be allowed per plan
  • UAA integration is improved, but using the internal PKS UAA seems to be the only option

https://docs.pivotal.io/runtimes/pks/1-0/release-notes.html

Harbor 1.4.1 comes with the possibility to integrate directly with the PKS UAA.

https://docs.pivotal.io/partners/vmware-harbor/release-notes.html#v1.4.1

Simple Reference Example with vSphere Persistent Storage and NGINX ingress

Setup NGINX with RBAC:

https://github.com/Kubernetes/ingress-nginx/blob/master/deploy/README.md

Find our deployment manifests in our repository:

https://github.com/swisscom/ac-caas-wordpress-sample
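To illustrate the building blocks behind this reference example (a sketch, not a copy of the manifests in the repository above), dynamic provisioning on vSphere boils down to a StorageClass using the vSphere volume plugin, a PersistentVolumeClaim referencing it and an Ingress served by the NGINX controller; datastore and host names are placeholders:

    # Illustrative sketch only; names and sizes are placeholders.
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: vsphere-thin
    provisioner: kubernetes.io/vsphere-volume
    parameters:
      diskformat: thin
      datastore: WorkloadDatastore      # placeholder datastore
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: wordpress-data
    spec:
      accessModes: [ReadWriteOnce]
      storageClassName: vsphere-thin
      resources:
        requests:
          storage: 10Gi
    ---
    # Routed by the NGINX ingress controller deployed as per the RBAC guide linked above.
    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: wordpress
    spec:
      rules:
      - host: wordpress.example.com     # placeholder host name
        http:
          paths:
          - path: /
            backend:
              serviceName: wordpress
              servicePort: 80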

RBAC Support

With PKS 1.0 the default authorization mode has changed to RBAC, but it can still be defined per plan.

Authentication Mode can be set per plan in PKS tile

With the tighter integration into the PKS UAA, the login mechanism has changed:

https://docs.pivotal.io/runtimes/pks/1-0/manage-users.html#uaa-admin
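With RBAC, authorization is expressed as roles and bindings instead of a static policy file. A minimal sketch granting a single UAA-managed user edit rights in one namespace might look like this (user and namespace names are placeholders):

    # Illustrative only: bind the built-in "edit" ClusterRole to one user in one namespace.
    kind: RoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: tenant-a-edit
      namespace: tenant-a               # placeholder namespace
    subjects:
    - kind: User
      name: alice@example.com           # placeholder UAA user
      apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: ClusterRole
      name: edit
      apiGroup: rbac.authorization.k8s.io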

RBAC considerations for the NGINX service:

https://github.com/Kubernetes/ingress-nginx/blob/master/deploy/rbac.md

We’ll present the results of our tests regarding Kubernetes RBAC in one of the next parts.

NSX-V Support

BOSH creates multiple security groups per deployment and even per tile in Operations Manager (e.g. the Harbor tile).

Bosh-created security groups shown in vCenter Web Client

The distributed firewall rules, however, are not yet created, and we will investigate how to automatically orchestrate proper tenant isolation based on NSX micro-segmentation.

 

Distributed Firewall Rules that have to be created

Open Issues and Feature Requests

Authorization Mode and IDP Federation

We need SAML federation with our identity broker (OpenAM) and want users and groups to be controlled directly by our customers. We will explore how these external artefacts can be mapped in a fine-granular way onto the RBAC permission model in Kubernetes.
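Conceptually, the mapping we are after would bind groups asserted by the external identity provider to Kubernetes roles, along the lines of this hypothetical sketch (group and namespace names are invented for illustration):

    # Sketch: an externally managed group mapped to namespace-scoped admin rights.
    kind: RoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: customer-x-developers-admin
      namespace: customer-x             # placeholder namespace
    subjects:
    - kind: Group
      name: customer-x-developers       # group name asserted by the IdP
      apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: ClusterRole
      name: admin
      apiGroup: rbac.authorization.k8s.io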

Network Segregation/Segmentation, Firewalling

Our customers demand a very high level of control at the network level, even between pods and containers. We will investigate the possibilities of Flannel, Calico and Istio and will have to work on automation in NSX-V to replace the out-of-the-box features of NSX-T. Our cloud labs team in Palo Alto is investigating the built-in support for NSX-T in PKS/CFCR. We hope that we can swiftly find ways to use NSX-T in our production environments as well.
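The Kubernetes-native building block for this kind of control is the NetworkPolicy resource, which Calico enforces (Flannel on its own does not). A minimal sketch restricting a namespace so that its pods only accept traffic from pods in the same namespace could look like this (the namespace name is a placeholder):

    # Illustrative policy: pods in "customer-x" accept ingress only from the same namespace.
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-same-namespace-only
      namespace: customer-x
    spec:
      podSelector: {}          # applies to all pods in the namespace
      policyTypes:
      - Ingress
      ingress:
      - from:
        - podSelector: {}      # any pod in the same namespace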

Open Service Broker API

We would like to use the OSBAPI of PKS directly to integrate this service for providing on-demand Kubernetes clusters into our own service ecosystem. We are discussing this means of integration, which will surely also be used to integrate PKS into the marketplace of Cloud Foundry itself.

Proxy Support for Ops Manager, PKS and Harbor

Ops Manager currently does not support an http proxy with authentication; we have proposed a patch to Pivotal. Our gateway proxy is a quick fix for the Kubernetes components using the vSphere dynamic storage features, but BOSH also requires direct access to the ESXi hosts. Of course, we could extend our gateway proxy, but it would be preferable to have the proxy settings respected properly throughout all components.
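For reference, what we would like every component to honor consistently is the usual set of proxy variables, roughly like this sketch (host name and credentials are placeholders, and the exact place to configure them differs per component):

    # Illustrative proxy settings only.
    env:
      http_proxy:  "http://svc-user:secret@proxy.mgmt.example.com:3128"
      https_proxy: "http://svc-user:secret@proxy.mgmt.example.com:3128"
      no_proxy:    "localhost,127.0.0.1,.cluster.local"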

Credits

I’d like to thank my colleagues for the very cool collaboration on and review of this blog post. Special thanks go to Patrik Nagel from Pivotal and his team for their very friendly and persistent support. And finally, thanks to all involved parties and our customers for all the valuable feedback helping to shape an attractive service for all of us.