Introduction
This wiki will outline how we plan to orchestrate the execution of CDAP programs on top of Kubernetes.
...
- Docker Registry - a stateless server-side application used for storing and distributing Docker images.
- Docker Hub - might be too heavyweight and reliant on external services for our use case.
- Quay (from CoreOS) - not free or open source, so not high on the list.
Miscellaneous
- There is an experimental project which supports running Spark programs on Kubernetes. "The feature set is currently limited and not well-tested. This should not be used in production environments." https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes-cloud.html
- MR on Kubernetes seems to be a project with very little usage. "This is not robust code. Do not use in production." https://github.com/turbobytes/kubemr
- To get familiar with how Docker works:
TODO:
- Collect some numbers around building a Docker image.
- How can Kubernetes be the runtime under the Twill API, instead of YARN? What are the issues with this integration? What in the Twill API can't be supported?
- Is there a programmatic API (or at least a RESTful one) around the Kubernetes command line?
- How can CDAP master talk to the Kubernetes master to get program status (or any of the Kubernetes interactions)?
- How long will a Docker image take to run a CDAP program, with and without a base image that contains as much of the common stuff as possible?
- How can we leverage functionality in Kubernetes to avoid a dependency on ZooKeeper? Or should we just use etcd regardless of whether we're using Kubernetes or not?
- Do we need provisioner hooks? For instance, to kick off an instance of Docker Registry after provisioning a Kubernetes cluster?
- Research the relative difficulty of use of YARN vs Kubernetes, and of ZooKeeper vs etcd.
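
A possible starting point for the REST API and program status questions above: the Kubernetes API server exposes a REST API, and a single pod can be read at /api/v1/namespaces/{namespace}/pods/{name}, with its lifecycle phase in status.phase. The sketch below is only an illustration under assumptions, not a decided design. The function names, the bearer-token authentication, and the mapping from pod phase to a program status label are all hypothetical; in particular, the status labels are placeholders and not CDAP's actual status values.

```python
import json
import urllib.request
from urllib.parse import quote

# Pod phases reported by the Kubernetes API in status.phase.
POD_PHASES = {"Pending", "Running", "Succeeded", "Failed", "Unknown"}

# Hypothetical mapping from pod phase to a program status label.
# These labels are placeholders, not CDAP's actual status values.
PHASE_TO_STATUS = {
    "Pending": "starting",
    "Running": "running",
    "Succeeded": "completed",
    "Failed": "failed",
    "Unknown": "unknown",
}


def pod_status_url(api_server, namespace, pod_name):
    """Build the core v1 REST URL for reading a single pod."""
    return "{}/api/v1/namespaces/{}/pods/{}".format(
        api_server.rstrip("/"),
        quote(namespace, safe=""),
        quote(pod_name, safe=""),
    )


def pod_phase(pod_json):
    """Extract status.phase from a Pod object serialized as JSON."""
    pod = json.loads(pod_json)
    phase = pod.get("status", {}).get("phase", "Unknown")
    # Treat any unrecognized value as Unknown rather than failing.
    return phase if phase in POD_PHASES else "Unknown"


def program_status(pod_json):
    """Translate a Pod object into the hypothetical program status label."""
    return PHASE_TO_STATUS[pod_phase(pod_json)]


def fetch_pod_json(api_server, namespace, pod_name, token):
    """GET the pod from the API server using a bearer token.

    Assumes the API server certificate is trusted by the default TLS
    context; a real cluster usually needs its CA configured explicitly.
    """
    request = urllib.request.Request(
        pod_status_url(api_server, namespace, pod_name),
        headers={"Authorization": "Bearer " + token},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With something like this, the CDAP master could poll program_status(fetch_pod_json(...)) for the pod backing a program run. Whether polling is acceptable, or whether a watch-based approach is needed, is still an open question.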