PKS and OSB for Kubernetes on vRealize Automation

Swisscom’s portfolio of cloud services is continuously growing, and we want to centralize and reuse service definition, provisioning and management across all our distribution channels (large enterprises, SMEs and even residential customers) by offering services in a homogeneous and easy-to-consume way on a central portal.

Motivation and Requirements

Identity & Access Management with VMware’s vRealize Automation and ForgeRock’s OpenAM provides a solid foundation for our OneIDB platform, enabling our customers to bring their own identities (federated identity providers); it is already in use for the large-enterprise segment on our Swisscom Enterprise Service Cloud. Our other cloud platforms are managed by Cloud Foundry (Application Cloud), Abiquo (Enterprise Cloud for SAP Solutions) or vCloud Director (DCS+), but all are based on VMware’s ESXi/vCenter/NSX-V as a common SDDC foundation.

Additional services like managed OS, Oracle Exadata and MS SQL Server are provided by our specialized teams, partly on dedicated new platforms and partly in the traditional IT landscape that is continuously being migrated to our new cloud platforms. This transformation is not only a technical challenge but an important organizational topic as well. Shifting to the DevOps paradigm and towards agile, lean management, where decisions are taken mostly bottom-up, creates very interesting architectural tasks when it comes to integrating an organically grown IT landscape into the vision of providing a single pane of glass over all systems for our customers, automating the provisioning of network connectivity and security, and harmonizing identity management, billing and monitoring.

Open Service Broker for Service Integration Layer

We are building a service integration layer using the Open Service Broker API as a first step towards a common service catalog that can be consumed from all our platforms.

During our POC with PKS, I developed an OSB integration for PKS:

[Figure: OSB integration for PKS]

This was the starting point for my research into showcases and blueprints on how we could integrate the services of the different platforms at Swisscom. The OSB API can be implemented easily using the Spring Cloud Open Service Broker library; I had to implement only one service class to make it work. The other pieces in my toolbox are the PKS API and the NSX-V API. I have not had to touch BOSH directly so far.
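
As a hint of how little scaffolding that takes: with the spring-cloud-open-service-broker starter on the classpath, the broker endpoints are auto-configured and a plain Spring Boot entry point is enough (the class name here is mine):

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

// Minimal broker entry point: the Spring OSB library auto-configures the
// OSB REST endpoints (/v2/catalog, /v2/service_instances/...); the broker
// author only contributes the service beans shown later in this post.
@SpringBootApplication
public class PksOsbApplication {

	public static void main(String[] args) {
		SpringApplication.run(PksOsbApplication.class, args);
	}
}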

PKS CLI Functionality

daniele@jumphost:~$ ./login.sh

API Endpoint: pks-api.swisscloud.io
User: daniele

daniele@jumphost:~$ pks

The Pivotal Container Service (PKS) CLI is used to create, manage, and delete Kubernetes clusters. To deploy workloads to a Kubernetes cluster created using the PKS CLI, use the Kubernetes CLI, kubectl.

Version: 1.0.0-build.3

Note: The PKS CLI is under development, and is subject to change at any time.

Usage:
  pks [command]

Available Commands:
  cluster         View the details of the cluster
  clusters        Show all clusters created with PKS
  create-cluster  Creates a kubernetes cluster, requires cluster name and an external host name
  delete-cluster  Deletes a kubernetes cluster, requires cluster name
  get-credentials Allows you to connect to a cluster and use kubectl
  help            Help about any command
  login           Login to PKS
  logout          Logs user out of the PKS API
  plans           View the preconfigured plans available
  resize          Increases the number of worker nodes for a cluster

Flags:
  -h, --help      help for pks
      --version   version for pks
Use "pks [command] --help" for more information about a command.

As you can see, PKS facilitates cluster creation and management by providing a very easy-to-use CLI.

During the POC phase, we hit some shortcomings of the current product offering, such as:

  • Only one network is supported for all clusters
  • Only one AZ is supported per plan (plans define cluster sizes like T-shirt sizes)
  • No network security on NSX-V (BOSH creates security groups but does not orchestrate micro-segmentation)

The first two open points are being addressed with Pivotal R&D, and we hope that multi-AZ support will be delivered with the next release. Multiple networks require more design work and discussions with the product managers, because the PKS API should be kept as clean and as easy to use as possible.

PKS API for vRealize Automation Vision

After having successfully extended the Kubernetes blueprint for vRA with a load balancer and the vSphere cloud provider interface for k8s, I wanted to apply the principles of the OSB and PKS APIs to the vRA blueprint as well, because this gives us the technical possibility to use vRA, the orchestrator already introduced on the ESC and DCS+ platforms (the latter via the vCloud Director endpoint in vRA), to produce k8s clusters in a uniform way – at least at the API level. PKS comes with already hardened CFCR (Cloud Foundry Container Runtime) releases, whereas the vRA blueprint relies on the open-source, license-free community builds. At the same time, we are working on a Red Hat OpenShift Container Platform (OCP) blueprint for vRA as well and are implementing all requirements for offline installations (Red Hat Satellite, Harbor Docker registry). Once we have this blueprint, it would be possible to deliver even OCP in the same way:

[Figure: OSB integration of k8s and OCP clusters across ESC, DCS+ and PKS/APC; current PKS shortcomings highlighted in red]

This is my vision of a first step towards an OSB integration of k8s and OCP clusters on three different cloud platforms. The boxes in red on the PKS/APC side highlight the shortcomings of PKS in the current release: the strong tenant isolation we need for service-provider-grade multi-tenancy has to be built by Swisscom, and multiple networks are currently only supported if we deploy one Ops Manager/PKS installation per network (at least 3 VMs per installation without HA). On the vRA side, we already have multiple uplink topologies for physical network separation per customer in place: on ESC provided fully by vRA and our custom “NSX Proxy” extension, and on DCS+ by vCloud Director, which can set up separate networks per customer/tenant/vOrg.

Open issue: since the vSphere cloud provider interface requires a config file containing credentials, it is impossible to offer this as an unmanaged service where the customer has root access to any of these VMs. We are currently looking for a solution to this problem and/or other k8s-compatible (block) storage providers we could deliver safely.

PKS API for vRealize Automation Concept

PKS uses the cluster name (the short name, not the FQDN) as a primary key; it must be unique. In addition, the UUID of the BOSH deployment is returned as the PKS cluster UUID. On the vRA side, I decided to use the deployment name and the resource UUID as the equivalents.

cluster – View the details of the cluster

Fetch the deployment by its name and convert the vRA catalog resource to a PKS cluster.
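
As a minimal sketch of this conversion – the Cluster DTO and the resource-data keys are my assumptions for illustration, not official vRA API names:

import java.util.Map;
import java.util.UUID;

// Hypothetical DTO mirroring the fields `pks cluster` reports.
class Cluster {
	UUID uuid;
	String name;
	String externalHostname;
	String status;
}

class ClusterMapper {

	// Map a vRA catalog resource (the k8s deployment) to a PKS-style cluster:
	// deployment name -> cluster name, vRA resource UUID -> cluster UUID.
	Cluster toCluster(String deploymentName, UUID resourceId, Map<String, Object> resourceData) {
		Cluster cluster = new Cluster();
		cluster.name = deploymentName;
		cluster.uuid = resourceId;
		// "externalHostname" is an assumed custom property set by the blueprint.
		cluster.externalHostname = (String) resourceData.get("externalHostname");
		cluster.status = String.valueOf(resourceData.getOrDefault("status", "UNKNOWN"));
		return cluster;
	}
}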

clusters – Show all clusters created with PKS

Fetch all deployments by the k8s blueprint and convert the vRA catalog resources to PKS clusters.

create-cluster – Creates a kubernetes cluster, requires cluster name and an external host name

Fetch the blueprint and create a request according to the parameters passed. OK, there is more to consider for this one if you take charge of the network security as well… please check out the paragraph on micro-segmentation.
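
Setting the network-security part aside for a moment, the request assembly could look like this minimal sketch. VraClient is a hypothetical wrapper around the vRA catalog REST API, and apart from `_deploymentName` (a vRA custom property) the property names are blueprint-specific assumptions:

import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper around the vRA catalog REST API.
interface VraClient {
	String requestCatalogItem(String blueprintName, Map<String, Object> requestData);
}

class CreateClusterHandler {

	private final VraClient vra;

	CreateClusterHandler(VraClient vra) {
		this.vra = vra;
	}

	// create-cluster: translate the PKS parameters into a vRA catalog request
	// against the k8s blueprint; returns the vRA request id for polling.
	String createCluster(String name, String externalHostname, String plan) {
		Map<String, Object> data = new HashMap<>();
		data.put("_deploymentName", name);              // deployment name = cluster name
		data.put("externalHostname", externalHostname); // assumed blueprint property
		data.put("componentProfile", plan);             // plan maps to a component profile (T-shirt size)
		return vra.requestCatalogItem("k8s-blueprint", data);
	}
}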

delete-cluster – Deletes a kubernetes cluster, requires cluster name

Fetch the deployment by its name and delete it on vRA.

get-credentials – Allows you to connect to a cluster and use kubectl

Fetch the deployment by its name, get the provisioning request, and pick the kubeconfig from the software component’s deployment variable in the request result.
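
A sketch of this lookup, again with a hypothetical vRA wrapper; the deployment variable name `kubeconfig` is an assumption about how the software component exposes its output:

import java.util.Map;

class CredentialsHandler {

	// Hypothetical wrapper; returns the result variables of the provisioning
	// request that created the deployment.
	interface VraClient {
		Map<String, Object> getProvisioningRequestResult(String deploymentName);
	}

	private final VraClient vra;

	CredentialsHandler(VraClient vra) {
		this.vra = vra;
	}

	// get-credentials: pick the kubeconfig produced by the software component
	// out of the provisioning request result.
	String getKubeconfig(String clusterName) {
		Map<String, Object> result = vra.getProvisioningRequestResult(clusterName);
		return (String) result.get("kubeconfig"); // assumed deployment variable name
	}
}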

help – Help about any command

Provide a link to this blog 😉

login – Login to PKS

Not implemented yet; we need clear guidance on how to harmonize IAM across the different silos.
My implementation uses Spring Boot profiles and can be started either as PKS – usable with a valid UAA token – or as VRA – usable with a valid vRA token. It would be nice to use only the OneIDB token from OpenAM, or a more sophisticated approach that selects the implementation according to the provider selected in the OSB catalog.
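
The profile switch itself is plain Spring Boot; here is a minimal sketch (the ClusterBackend abstraction and its implementations are mine):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;

// Hypothetical abstraction over the two backends.
interface ClusterBackend { /* cluster CRUD operations */ }
class PksBackend implements ClusterBackend { /* talks to the PKS API with a UAA token */ }
class VraBackend implements ClusterBackend { /* talks to the vRA API with a vRA token */ }

// Start the broker with -Dspring.profiles.active=pks or =vra to select the backend.
@Configuration
public class BackendConfiguration {

	@Bean
	@Profile("pks")
	ClusterBackend pksBackend() {
		return new PksBackend();
	}

	@Bean
	@Profile("vra")
	ClusterBackend vraBackend() {
		return new VraBackend();
	}
}

With this in place, the rest of the broker code can work against either platform through the same interface.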

logout – Logs user out of the PKS API

Not implemented yet; we need clear guidance on how to harmonize IAM across the different silos.

plans – View the preconfigured plans available

Fetch the blueprint by its name, and convert the component profiles defined for this blueprint (see Component Profiles in virtualjad’s blog).
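
A sketch of this conversion with the Plan builder from the Spring OSB library; ComponentProfile is a hypothetical stand-in for the vRA component profile data:

import java.util.List;
import java.util.stream.Collectors;
import org.springframework.cloud.servicebroker.model.catalog.Plan;

class PlanMapper {

	// Hypothetical stand-in for a vRA component profile (a T-shirt size).
	static class ComponentProfile {
		String id, name, description;
	}

	// Map each component profile of the k8s blueprint to an OSB plan.
	List<Plan> toPlans(List<ComponentProfile> profiles) {
		return profiles.stream()
				.map(p -> Plan.builder()
						.id(p.id)
						.name(p.name)               // e.g. "small", "medium", "large"
						.description(p.description)
						.build())
				.collect(Collectors.toList());
	}
}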

resize – Increases the number of worker nodes for a cluster

Fetch the deployment by its name, execute scale-in/scale-out on the nodes, and join them to the cluster with the token provided in the cluster’s provisioning request.

Not yet implemented, because it is a lot of work: subscribe to the event broker, locate the newly created VMs, and execute the provisioning and join process.

Micro-Segmentation on NSX-V for PKS and vRA blueprints

Micro-segmentation for BOSH deployments

For the vRA implementation, I decided to use security tags that are created on demand and passed to the vRA provisioning request via custom properties. These tags are then used in security groups to set up the same micro-segmentation I had already built for PKS.
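
As a sketch of the on-demand part, a tag can be created through the NSX-V REST API and its name passed on to the provisioning request (endpoint per the NSX-V 6.x API documentation; authentication setup, error handling and the tag-to-VM attachment are omitted):

import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.web.client.RestTemplate;

class NsxSecurityTags {

	private final RestTemplate rest;  // preconfigured with basic auth for the NSX manager
	private final String nsxManager;  // e.g. "https://nsx-manager.example.com"

	NsxSecurityTags(RestTemplate rest, String nsxManager) {
		this.rest = rest;
		this.nsxManager = nsxManager;
	}

	// Create a security tag for a new cluster; NSX returns the id of the
	// created tag (e.g. "securitytag-123"), which the security groups and
	// DFW rules can then reference.
	String createTag(String clusterName) {
		String body = "<securityTag>"
				+ "<objectTypeName>SecurityTag</objectTypeName>"
				+ "<name>ST-K8S-" + clusterName + "</name>"
				+ "</securityTag>";
		HttpHeaders headers = new HttpHeaders();
		headers.setContentType(MediaType.APPLICATION_XML);
		return rest.postForObject(nsxManager + "/api/2.0/services/securitytags/tag",
				new HttpEntity<>(body, headers), String.class);
	}
}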

Kubernetes for vRealize Automation Extensions

Watch API

Kubernetes offers the watch API to efficiently detect changes in deployed components and configurations. For my prototype, I used watches on services and ingresses to automate load balancer service definitions on the on-demand load balancers provided by vRA.
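
For illustration, here is a minimal watch on services with the fabric8 Kubernetes client – a sketch, not necessarily the client library used in the prototype:

import io.fabric8.kubernetes.api.model.Service;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientException;
import io.fabric8.kubernetes.client.Watcher;

public class ServiceWatch {

	public static void main(String[] args) throws InterruptedException {
		// Reads the kubeconfig from the default location / environment.
		KubernetesClient client = new DefaultKubernetesClient();
		client.services().inAnyNamespace().watch(new Watcher<Service>() {
			@Override
			public void eventReceived(Action action, Service service) {
				// ADDED / MODIFIED / DELETED events arrive here; the prototype
				// publishes them as Spring application events (see below).
				System.out.println(action + " " + service.getMetadata().getName());
			}

			@Override
			public void onClose(KubernetesClientException cause) {
				// A real implementation would re-establish the watch here.
			}
		});
		Thread.currentThread().join(); // keep the process alive for the watch
	}
}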

Ingress watch – automation of load balancer service

For any change to services and ingresses, a Spring application event is published. If the application detects that the service an ingress points to is available, the parameters for the new load balancer service are collected and a vRA “reconfigure load balancer” resource action is performed.
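
The event side is standard Spring; in this sketch the event type and the vRA action wrapper are hypothetical:

import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

// Hypothetical event published by the watch when an ingress or its backing
// service changes.
class IngressChangedEvent {
	final String deploymentName;
	final String serviceName;
	final int port;

	IngressChangedEvent(String deploymentName, String serviceName, int port) {
		this.deploymentName = deploymentName;
		this.serviceName = serviceName;
		this.port = port;
	}
}

@Component
class LoadBalancerReconfigurer {

	// Hypothetical wrapper that executes a vRA resource action on the
	// on-demand load balancer of a deployment.
	interface VraActions {
		void reconfigureLoadBalancer(String deploymentName, String serviceName, int port);
	}

	private final VraActions vra;

	LoadBalancerReconfigurer(VraActions vra) {
		this.vra = vra;
	}

	// Collect the parameters from the event and trigger the vRA
	// "reconfigure load balancer" resource action.
	@EventListener
	public void onIngressChanged(IngressChangedEvent event) {
		vra.reconfigureLoadBalancer(event.deploymentName, event.serviceName, event.port);
	}
}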

Hint: unfortunately, the vRA load balancer is limited in defining worker pools; in the current version you have only one, containing all worker nodes, at your disposal. For PKS, where I use the native NSX load balancer API, I can use more fine-grained definitions and create multiple server pools and application profiles according to the needs of the deployed components. But as I didn’t want to give up the proper visual representation of the k8s clusters that comes for free with vRA, I accepted this disadvantage. As an alternative, you could use a load balancer that is not represented in vRA, or take the risk that vRA can no longer handle the load balancer after you have used the native NSX API.

OSB API for PKS

Service Definition

/**
 * This interface is implemented by service brokers to process requests to retrieve the service catalog.
 *
 * @author sgreenberg@pivotal.io
 */
public interface CatalogService {

	/**
	 * Return the catalog of services provided by the service broker.
	 *
	 * @return the catalog of services
	 */
	Catalog getCatalog();

	/**
	 * Get a service definition from the catalog by ID.
	 *
	 * @param serviceId  The ID of the service definition in the catalog
	 * @return the service definition, or null if it doesn't exist
	 */
	ServiceDefinition getServiceDefinition(String serviceId);

}

I have implemented this interface by calling PKS.plans() and aggregating additional data on the provider (PKS/VRA) and organization/space or tenant/subtenant, according to the target cloud platform. This is done by looking up which namespaces the current user has been granted access to.
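
Condensed to a self-contained sketch (the service and plan data are hard-coded here; in the real implementation they come from PKS.plans() and the namespace lookup):

import org.springframework.cloud.servicebroker.model.catalog.Catalog;
import org.springframework.cloud.servicebroker.model.catalog.Plan;
import org.springframework.cloud.servicebroker.model.catalog.ServiceDefinition;
import org.springframework.cloud.servicebroker.service.CatalogService;
import org.springframework.stereotype.Service;

@Service
public class PksCatalogService implements CatalogService {

	@Override
	public Catalog getCatalog() {
		ServiceDefinition k8s = ServiceDefinition.builder()
				.id("pks-k8s")                          // ids and names are examples
				.name("k8s-cluster")
				.description("Kubernetes clusters provisioned via PKS or vRA")
				.bindable(true)
				.plans(Plan.builder()
						.id("small")
						.name("small")
						.description("1 master, 3 workers") // would come from PKS.plans()
						.build())
				.build();
		return Catalog.builder().serviceDefinitions(k8s).build();
	}

	@Override
	public ServiceDefinition getServiceDefinition(String serviceId) {
		return getCatalog().getServiceDefinitions().stream()
				.filter(def -> def.getId().equals(serviceId))
				.findFirst()
				.orElse(null);
	}
}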

ServiceInstanceService

/**
 * This interface is implemented by service brokers to process requests related to provisioning, updating,
 * and deprovisioning service instances.
 *
 * @author sgreenberg@pivotal.io
 * @author Scott Frederick
 */
public interface ServiceInstanceService {

	/**
	 * Create (provision) a new service instance.
	 *
	 * @param request containing the details of the request
	 * @return a {@link CreateServiceInstanceResponse} on successful processing of the request
	 * @throws ServiceInstanceExistsException if a service instance with the given ID is already known to the broker
	 * @throws ServiceBrokerAsyncRequiredException if the broker requires asynchronous processing of the request
	 * @throws ServiceBrokerInvalidParametersException if any parameters passed in the request are invalid
	 */
	CreateServiceInstanceResponse createServiceInstance(CreateServiceInstanceRequest request);

	/**
	 * Get the details of a service instance.
	 *
	 * @param request containing the details of the request
	 * @return a {@link GetServiceInstanceResponse} on successful processing of the request
	 * @throws ServiceInstanceDoesNotExistException if a service instance with the given ID is not known to the broker
	 * @throws ServiceBrokerOperationInProgressException if an operation is in progress for the service instance
	 */
	default GetServiceInstanceResponse getServiceInstance(GetServiceInstanceRequest request) {
		throw new UnsupportedOperationException("This service broker does not support retrieving service instances. " +
				"The service broker should set 'instances_retrievable:false' in the service catalog, " +
				"or provide an implementation of the fetch instance API.");
	}
	
	/**
	 * Get the status of the last requested operation for a service instance.
	 *
	 * @param request containing the details of the request
	 * @return a {@link GetLastServiceOperationResponse} on successful processing of the request
	 * @throws ServiceInstanceDoesNotExistException if a service instance with the given ID is not known to the broker
	 */
	default GetLastServiceOperationResponse getLastOperation(GetLastServiceOperationRequest request) {
		throw new UnsupportedOperationException("This service broker does not support getting the status of " +
				"an asynchronous operation. " +
				"If the service broker returns '202 Accepted' in response to a provision, update, or deprovision" +
				"request, it must also provide an implementation of the get last operation API.");
	}

	/**
	 * Delete (deprovision) a service instance.
	 *
	 * @param request containing the details of the request
	 * @return a {@link DeleteServiceInstanceResponse} on successful processing of the request
	 * @throws ServiceInstanceDoesNotExistException if a service instance with the given ID is not known to the broker
	 * @throws ServiceBrokerAsyncRequiredException if the broker requires asynchronous processing of the request
	 */
	DeleteServiceInstanceResponse deleteServiceInstance(DeleteServiceInstanceRequest request);

	/**
	 * Update a service instance.
	 *
	 * @param request containing the details of the request
	 * @return an {@link UpdateServiceInstanceResponse} on successful processing of the request
	 * @throws ServiceInstanceUpdateNotSupportedException if particular change is not supported
	 *         or if the request can not currently be fulfilled due to the state of the instance
	 * @throws ServiceInstanceDoesNotExistException if a service instance with the given ID is not known to the broker
	 * @throws ServiceBrokerAsyncRequiredException if the broker requires asynchronous processing of the request
	 * @throws ServiceBrokerInvalidParametersException if any parameters passed in the request are invalid
	 */
	default UpdateServiceInstanceResponse updateServiceInstance(UpdateServiceInstanceRequest request) {
		throw new UnsupportedOperationException("This service broker does not support updating service instances. " +
				"The service broker should set 'plan_updateable:false' in the service catalog, " +
				"or provide an implementation of the update instance API.");
	}
}

I have implemented this interface by calling PKS.create-cluster(name), PKS.clusters(), PKS.cluster(id), PKS.delete-cluster(name), PKS.resize(name).
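
A condensed sketch of this delegation; PksClient is a hypothetical wrapper around the PKS REST API (what the CLI calls under the hood), and the response builders are from the Spring OSB library. Because PKS provisions through BOSH asynchronously, getLastOperation has to be implemented as well:

import org.springframework.cloud.servicebroker.model.instance.CreateServiceInstanceRequest;
import org.springframework.cloud.servicebroker.model.instance.CreateServiceInstanceResponse;
import org.springframework.cloud.servicebroker.model.instance.DeleteServiceInstanceRequest;
import org.springframework.cloud.servicebroker.model.instance.DeleteServiceInstanceResponse;
import org.springframework.cloud.servicebroker.model.instance.GetLastServiceOperationRequest;
import org.springframework.cloud.servicebroker.model.instance.GetLastServiceOperationResponse;
import org.springframework.cloud.servicebroker.model.instance.OperationState;
import org.springframework.cloud.servicebroker.service.ServiceInstanceService;
import org.springframework.stereotype.Service;

// Hypothetical wrapper around the PKS REST API.
interface PksClient {
	void createCluster(String name, String externalHostname, String planId);
	void deleteCluster(String name);
	String clusterStatus(String name); // e.g. "succeeded" or "in progress"
}

@Service
public class PksServiceInstanceService implements ServiceInstanceService {

	private final PksClient pks;

	public PksServiceInstanceService(PksClient pks) {
		this.pks = pks;
	}

	@Override
	public CreateServiceInstanceResponse createServiceInstance(CreateServiceInstanceRequest request) {
		String hostname = (String) request.getParameters().get("externalHostname"); // assumed parameter
		pks.createCluster(request.getServiceInstanceId(), hostname, request.getPlanId());
		return CreateServiceInstanceResponse.builder().async(true).build(); // BOSH deploys asynchronously
	}

	@Override
	public DeleteServiceInstanceResponse deleteServiceInstance(DeleteServiceInstanceRequest request) {
		pks.deleteCluster(request.getServiceInstanceId());
		return DeleteServiceInstanceResponse.builder().async(true).build();
	}

	// Required because the broker answers asynchronously: poll PKS for the
	// state of the last operation.
	@Override
	public GetLastServiceOperationResponse getLastOperation(GetLastServiceOperationRequest request) {
		boolean done = "succeeded".equals(pks.clusterStatus(request.getServiceInstanceId()));
		return GetLastServiceOperationResponse.builder()
				.operationState(done ? OperationState.SUCCEEDED : OperationState.IN_PROGRESS)
				.build();
	}
}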

Service Binding

/**
 * This interface is implemented by service brokers to process requests to create and delete service instance bindings.
 *
 * @author sgreenberg@pivotal.io
 * @author Scott Frederick
 */
public interface ServiceInstanceBindingService {

	/**
	 * Create a new binding to a service instance.
	 *
	 * @param request containing the details of the request
	 * @return a {@link CreateServiceInstanceBindingResponse} on successful processing of the request
	 * @throws ServiceInstanceBindingExistsException if a binding with the given ID is already known to the broker
	 * @throws ServiceInstanceDoesNotExistException if a service instance with the given ID is not known to the broker
	 * @throws ServiceBrokerBindingRequiresAppException if the broker only supports application binding but an
	 *                                                  app GUID is not provided in the request
	 */
	CreateServiceInstanceBindingResponse createServiceInstanceBinding(CreateServiceInstanceBindingRequest request);

	/**
	 * Get the details of a binding to a service instance.
	 *
	 * @param request containing the details of the request
	 * @return a {@link GetServiceInstanceBindingResponse} on successful processing of the request
	 * @throws ServiceInstanceDoesNotExistException if a service instance with the given ID is not known to the broker
	 * @throws ServiceInstanceBindingDoesNotExistException if a binding with the given ID is not known to the broker
	 * @throws ServiceBrokerOperationInProgressException if an operation is in progress for the service binding
	 */
	default GetServiceInstanceBindingResponse getServiceInstanceBinding(GetServiceInstanceBindingRequest request) {
		throw new UnsupportedOperationException("This service broker does not support retrieving service bindings. " +
				"The service broker should set 'bindings_retrievable:false' in the service catalog, " +
				"or provide an implementation of the fetch binding API.");
	}

	/**
	 * Delete a service instance binding.
	 *
	 * @param request containing the details of the request
	 * @throws ServiceInstanceDoesNotExistException if a service instance with the given ID is not known to the broker
	 * @throws ServiceInstanceBindingDoesNotExistException if a binding with the given ID is not known to the broker
	 */
	void deleteServiceInstanceBinding(DeleteServiceInstanceBindingRequest request);
}

I haven’t gone too deep into service binding so far. I just used it to fetch the credentials via PKS.get-credentials(name) in getServiceInstanceBinding(). This implementation has to be revisited in a later stage of my project.
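
A sketch of this minimal binding; for brevity it returns the kubeconfig on the create path, while my prototype wires it into getServiceInstanceBinding(). The PksCredentials wrapper is hypothetical:

import java.util.HashMap;
import java.util.Map;
import org.springframework.cloud.servicebroker.model.binding.CreateServiceInstanceAppBindingResponse;
import org.springframework.cloud.servicebroker.model.binding.CreateServiceInstanceBindingRequest;
import org.springframework.cloud.servicebroker.model.binding.CreateServiceInstanceBindingResponse;
import org.springframework.cloud.servicebroker.model.binding.DeleteServiceInstanceBindingRequest;
import org.springframework.cloud.servicebroker.service.ServiceInstanceBindingService;
import org.springframework.stereotype.Service;

@Service
public class PksBindingService implements ServiceInstanceBindingService {

	// Hypothetical wrapper; returns the kubeconfig like PKS.get-credentials(name).
	interface PksCredentials {
		String getCredentials(String clusterName);
	}

	private final PksCredentials pks;

	public PksBindingService(PksCredentials pks) {
		this.pks = pks;
	}

	@Override
	public CreateServiceInstanceBindingResponse createServiceInstanceBinding(CreateServiceInstanceBindingRequest request) {
		// Hand the kubeconfig to the consumer as binding credentials.
		Map<String, Object> credentials = new HashMap<>();
		credentials.put("kubeconfig", pks.getCredentials(request.getServiceInstanceId()));
		return CreateServiceInstanceAppBindingResponse.builder()
				.credentials(credentials)
				.build();
	}

	@Override
	public void deleteServiceInstanceBinding(DeleteServiceInstanceBindingRequest request) {
		// Nothing is revoked in this prototype; to be revisited.
	}
}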

PKS API for vRealize Automation Next Steps

Implementation

The current implementation of the service described in this document was more of an exploration of the challenges that this level of automation will bring. There is still a lot of work to do:

  • Parse all k8s deployment manifests properly where further automation is required
  • React properly to redeployments and scaling (changing IPs, DFW rules and LB services)
  • React properly to deployment/VM deletion via the vRA frontend

For the full integration within our vision, we need to:

  • Find solutions for the different login token providers
  • Take care of the different billing and monitoring systems
  • Integrate into ITSM systems.

My application is already containerized and can be deployed on vRA containers as well as on k8s clusters.

Open Source

Swisscom is a committer in various open source projects. I’d like to open source this implementation and find a community that is interested in pushing this idea forward.

Next Blog Post

In the next blog post, I will show you how we are deploying OpenShift Container Platform on vRA at scale and I hope I can tell you some great news from the Cloud Foundry Summit in Boston.