Currently, the traffic between the router and app-fabric-server is unencrypted. Sensitive data flows between these nodes, for example passwords and access IDs that are stored in the secure store.
This is a proposal to enable SSL between these nodes so that this data is encrypted in transit.
The broader goal is to have the option to authenticate and secure all communication between various components. TLS/SSL provides a way to do that. The way it provides authentication is through signed X.509 certificates. The certificates are signed by a trusted third party, generally a certificate authority. When a new connection is established one or both parties send their certificates to the other party. The other party then verifies the certificate it received as belonging to their peer.
The problem with using CA-signed certificates is that we would need one for each component that we want to authenticate. Because we don't know in advance which node most of these components will run on, the certificates would also need to be distributed to all nodes in the cluster. Getting many certificates signed may not be cost effective.
Security considerations for keystores and truststores
Truststores do not contain sensitive information, so it is reasonable to create a single truststore for an entire cluster. On a production cluster, such a truststore would often contain a single CA certificate (or certificate chain), since you would typically choose to have all certificates issued by a single CA.
Keystores, on the other hand, contain private keys and need to be secured. It might not be a good idea to distribute them to all nodes of an unsecured cluster. To reduce the risk, the keystore on a node generally contains only the keys for the components running on that node. However, because we cannot pre-determine where the various CDAP components will run, the keystore on every node would need to contain the keys for all components. This increases the risk.
One way to mitigate this risk is to store all the private keys in secured storage and provide, as configuration, the keys that the various components can use to access their private data. This way, when a server component comes up, it can get its private keys from the storage, if configured to do so. This avoids having to keep keystores with private data on unsecured nodes.
Another way to do authentication is to utilize the fact that our components talk to ZooKeeper to register and discover services. We can store shared secrets in ACL-controlled locations that only
TLS/SSL
Transport Layer Security (TLS) and its predecessor, Secure Sockets Layer (SSL), both frequently referred to as "SSL", are cryptographic protocols that provide communications security over a computer network.
When secured by TLS, connections between the client and server have one or more of the following properties.
- Security: The connection is secured by symmetric key cryptography. The key for this symmetric encryption is generated at the beginning of a connection and is based on a shared secret between the client and the server.
- Authentication: The communicating parties can optionally be authenticated.
- Integrity: Each message transmitted includes a message integrity check using a message authentication code.
The router supports SSL in server mode (external entities can enable SSL for their connection to the router), but it currently has no option to enable SSL in client mode.
We need the following to enable SSL between router and app-fabric-server:
- Enable SSL in client mode on the router:
  - Needs a key store
  - Needs a certificate
- Enable app-fabric-server to accept SSL connection requests:
  - Needs a key store
  - Needs a certificate
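The two requirements above can be sketched with the standard JSSE classes that ship with the JDK. This is a minimal sketch, not the proposed implementation: the class name `RouterClientSsl` and its method names are hypothetical, and how the key store and trust store are obtained is left to the caller.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper; names are illustrative only.
final class RouterClientSsl {

  /** Builds an SSLContext from a key store (own identity) and a trust store (peers to trust). */
  static SSLContext createContext(KeyStore keyStore, char[] keyPassword, KeyStore trustStore)
      throws GeneralSecurityException {
    KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
    kmf.init(keyStore, keyPassword);
    TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
    tmf.init(trustStore);
    SSLContext context = SSLContext.getInstance("TLS");
    context.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);
    return context;
  }

  /** Engine for the router when it connects to app-fabric-server (client mode). */
  static SSLEngine createClientEngine(SSLContext context, String host, int port) {
    SSLEngine engine = context.createSSLEngine(host, port);
    engine.setUseClientMode(true);
    return engine;
  }

  /** Engine for app-fabric-server when it accepts connections (server mode). */
  static SSLEngine createServerEngine(SSLContext context) {
    SSLEngine engine = context.createSSLEngine();
    engine.setUseClientMode(false);
    return engine;
  }
}
```

With Netty 3.x, an engine created this way would typically be wrapped in an `org.jboss.netty.handler.ssl.SslHandler` placed at the front of the channel pipeline.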
Certificates
Certificates are needed for each entity that needs to be uniquely identified. These are generated by the client and provided through configuration. In this case, we need certificates for the router and the app-fabric-server. The same certificate could be used by the client and the server on the router.
Access to the certificate needs to be secure. Right now, we put the certificate on disk, which is not safe to do on an insecure node. Since we assume that the app-fabric-server will be running on an insecure node, we would like to provide a safer option.
One way to access the certificate in a safe manner would be to put it in a KMS. The access to a KMS is already over SSL (customer configured). We can then fetch the certificate on server initialization and put it in an in-memory key store. If the customer does not use KMS, we can fall back to the current file-based implementation. We can extend this to include other methods to access the certificate securely in the future.
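The fetch-then-fall-back flow described above might look roughly like the following. This is a sketch under stated assumptions: `KmsClient` is a hypothetical interface standing in for whatever KMS client is used, and the value stored in the KMS is assumed to be a serialized key store (for example JKS or PKCS12 bytes). Only the `java.security.KeyStore` calls are real JDK API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;

// Hypothetical helper; names are illustrative only.
final class KeyStoreLoader {

  /** Hypothetical abstraction over a KMS client; returns serialized key store bytes. */
  interface KmsClient {
    byte[] fetch(String keyName) throws IOException;
  }

  /** Loads a key store purely in memory, without writing it to local disk. */
  static KeyStore loadFromBytes(byte[] data, String type, char[] password)
      throws IOException, GeneralSecurityException {
    KeyStore keyStore = KeyStore.getInstance(type);
    try (InputStream in = new ByteArrayInputStream(data)) {
      keyStore.load(in, password);
    }
    return keyStore;
  }

  /** Uses the KMS when it is configured; otherwise falls back to the key store file. */
  static KeyStore load(KmsClient kms, String kmsKeyName, Path file, String type, char[] password)
      throws IOException, GeneralSecurityException {
    byte[] data = (kms != null && kmsKeyName != null)
        ? kms.fetch(kmsKeyName)
        : Files.readAllBytes(file);
    return loadFromBytes(data, type, password);
  }
}
```

Keeping the KMS behind a small interface is one way to leave room for the other secure access methods mentioned above.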
Configuration
- SSL.enabled would enable SSL everywhere.
- SSL port for the app-fabric-server.
- SSL port for the router.
- Key store type: KMS or key store file type.
- Key store path: could be a file or KMS URI.
- Key store password: if using a file, the password for the key store file.
- Key store key password: if using a file, the password for the key in the key store.
- Key store router key: if using KMS, the key under which the router's certificate is stored.
- Key store app-fabric key: if using KMS, the key under which the app-fabric-server's certificate is stored.
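To make the dependencies between these settings concrete, the sketch below reads them from `java.util.Properties` and checks that the KMS-specific or file-specific values are present. Every property name here is an assumption made for illustration, since the proposal does not fix the names, and the two port settings are omitted for brevity.

```java
import java.util.Properties;

// Hypothetical settings holder; all property names are assumptions.
final class SslSettings {
  static final String ENABLED = "ssl.enabled";
  static final String STORE_TYPE = "ssl.keystore.type";
  static final String STORE_PATH = "ssl.keystore.path";
  static final String STORE_PASSWORD = "ssl.keystore.password";
  static final String KEY_PASSWORD = "ssl.keystore.key.password";
  static final String ROUTER_KEY = "ssl.keystore.router.key";
  static final String APP_FABRIC_KEY = "ssl.keystore.appfabric.key";

  final boolean enabled;
  final boolean useKms;
  final String storeType;
  final String storePath;

  private SslSettings(boolean enabled, boolean useKms, String storeType, String storePath) {
    this.enabled = enabled;
    this.useKms = useKms;
    this.storeType = storeType;
    this.storePath = storePath;
  }

  /** Parses the settings and fails fast when a required value is missing. */
  static SslSettings from(Properties props) {
    boolean enabled = Boolean.parseBoolean(props.getProperty(ENABLED, "false"));
    // The store type is either "KMS" or a key store file type such as "JKS".
    String type = props.getProperty(STORE_TYPE, "JKS");
    boolean useKms = "KMS".equalsIgnoreCase(type);
    if (enabled) {
      require(props, STORE_PATH);
      if (useKms) {
        require(props, ROUTER_KEY);
        require(props, APP_FABRIC_KEY);
      } else {
        require(props, STORE_PASSWORD);
        require(props, KEY_PASSWORD);
      }
    }
    return new SslSettings(enabled, useKms, type, props.getProperty(STORE_PATH));
  }

  private static void require(Properties props, String name) {
    String value = props.getProperty(name);
    if (value == null || value.isEmpty()) {
      throw new IllegalArgumentException("Missing required setting: " + name);
    }
  }
}
```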
Design
The user would need to provide the key under which the certificate for app-fabric-server is stored in KMS. During server initialization, if SSL is enabled and the KMS-related properties are set, we would fetch the certificate from KMS and create a local keystore. This is to avoid the user from having to d
Performance Impact
We would need to run performance tests to determine the impact of enabling SSL. Based on current research, the cost should be manageable, adding roughly 2-5% CPU overhead. This would need to be verified.
If we deem the impact to be significant, we can keep the existing SSL flag for the router's server side as it is today and introduce a separate flag for the traffic between the router and app-fabric-server.
Alternative Approach #1
Router and App Fabric can both generate key pairs when they come up and write their respective public keys to an ACL-controlled znode in ZooKeeper. The router encrypts a registration request to app-fabric using app-fabric's public key. App-fabric decrypts it using its own private key and the router's public key, and authorizes the router as a client. The router and app-fabric server can then exchange a symmetric key. Once this handshake is complete, any messages exchanged can be encrypted using the shared symmetric key.
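The cryptographic steps of such a handshake could be sketched with the JDK's own `java.security` and `javax.crypto` APIs. This is one possible reading of the description above, not a vetted protocol: the use of the router's key pair is interpreted here as a signature over the wrapped session key, the algorithm choices (RSA-2048, OAEP, AES-128-GCM) are assumptions, and publishing the public keys to ZooKeeper is omitted. The class and method names are hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.security.Signature;
import java.security.SignatureException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical sketch; names and algorithm choices are assumptions.
final class HandshakeSketch {
  private static final String RSA_WRAP = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
  private static final int IV_BYTES = 12;
  private static final int TAG_BITS = 128;

  /** Each component generates a key pair on startup; the public half would be published. */
  static KeyPair newKeyPair() throws GeneralSecurityException {
    KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
    generator.initialize(2048);
    return generator.generateKeyPair();
  }

  static SecretKey newSessionKey() throws GeneralSecurityException {
    KeyGenerator generator = KeyGenerator.getInstance("AES");
    generator.init(128);
    return generator.generateKey();
  }

  /** Router side: encrypt the session key so that only app-fabric can read it. */
  static byte[] wrapSessionKey(SecretKey sessionKey, PublicKey appFabricPublic)
      throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance(RSA_WRAP);
    cipher.init(Cipher.WRAP_MODE, appFabricPublic);
    return cipher.wrap(sessionKey);
  }

  /** Router side: sign the wrapped key so that app-fabric can check who sent it. */
  static byte[] sign(byte[] data, PrivateKey routerPrivate) throws GeneralSecurityException {
    Signature signature = Signature.getInstance("SHA256withRSA");
    signature.initSign(routerPrivate);
    signature.update(data);
    return signature.sign();
  }

  /** App-fabric side: verify the router's signature, then recover the session key. */
  static SecretKey verifyAndUnwrap(byte[] wrapped, byte[] sig, PublicKey routerPublic,
                                   PrivateKey appFabricPrivate) throws GeneralSecurityException {
    Signature signature = Signature.getInstance("SHA256withRSA");
    signature.initVerify(routerPublic);
    signature.update(wrapped);
    if (!signature.verify(sig)) {
      throw new SignatureException("Registration request was not signed by the router");
    }
    Cipher cipher = Cipher.getInstance(RSA_WRAP);
    cipher.init(Cipher.UNWRAP_MODE, appFabricPrivate);
    return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
  }

  /** Encrypts a message with the shared key; the random IV is prepended to the output. */
  static byte[] encrypt(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
    byte[] iv = new byte[IV_BYTES];
    new SecureRandom().nextBytes(iv);
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
    byte[] ciphertext = cipher.doFinal(plaintext);
    byte[] message = new byte[iv.length + ciphertext.length];
    System.arraycopy(iv, 0, message, 0, iv.length);
    System.arraycopy(ciphertext, 0, message, iv.length, ciphertext.length);
    return message;
  }

  static byte[] decrypt(SecretKey key, byte[] message) throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
    return cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
  }
}
```

Even this small sketch leaves out replay protection, key rotation, and error handling, all of which would have to be designed and maintained by us.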
Pros:
- Does not depend on the customer having KMS.
- The customer does not need to generate and distribute certificates for various components.
Cons:
- We need to handle the handshake.
- We need to handle the encryption.
- Since the public key storage is dependent on ZooKeeper, any changes there could require changes in our handshake code.
- Customers would probably want some guarantees about the security. This is easier if we are using an already proven library.
Alternative Approach #2
Newer versions of Netty (>4.0) have richer SSL handling APIs than the version that we are using (3.6). We could upgrade Netty, although this would require some work. It would then be easier to add SSL between various components. The certificate handling would still be similar to the original proposal.
Alternative Approach #3
Use ZooKeeper to share a secret between the client and the server. Use SASL for authentication and establishment of a security layer between client and server applications.
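As an illustration of this approach, the sketch below runs a SASL exchange in-process using the `javax.security.sasl` API from the JDK. It assumes the shared secret has already been read from ZooKeeper, and it picks the DIGEST-MD5 mechanism only because it ships with the JDK and supports a confidentiality layer (`auth-conf`); DIGEST-MD5 was moved to historic status by RFC 6331, so a real design would need to choose the mechanism carefully. The protocol name, server name, and class name are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.callback.UnsupportedCallbackException;
import javax.security.sasl.AuthorizeCallback;
import javax.security.sasl.RealmCallback;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;
import javax.security.sasl.SaslServer;

// Hypothetical sketch; mechanism, protocol and server names are assumptions.
final class SaslSketch {
  static final String MECHANISM = "DIGEST-MD5";
  static final String PROTOCOL = "cdap";
  static final String SERVER_NAME = "app-fabric";

  private static Map<String, String> props() {
    Map<String, String> props = new HashMap<>();
    props.put(Sasl.QOP, "auth-conf"); // authentication plus encryption of later messages
    return props;
  }

  /** Client side (for example the router), authenticating with the shared secret. */
  static SaslClient newClient(String user, char[] secret) throws SaslException {
    CallbackHandler handler = callbacks -> {
      for (Callback cb : callbacks) {
        if (cb instanceof NameCallback) {
          ((NameCallback) cb).setName(user);
        } else if (cb instanceof PasswordCallback) {
          ((PasswordCallback) cb).setPassword(secret);
        } else if (cb instanceof RealmCallback) {
          RealmCallback rc = (RealmCallback) cb;
          rc.setText(rc.getDefaultText());
        } else {
          throw new UnsupportedCallbackException(cb);
        }
      }
    };
    return Sasl.createSaslClient(new String[] {MECHANISM}, null, PROTOCOL, SERVER_NAME, props(), handler);
  }

  /** Server side (for example app-fabric-server), checking against the same secret. */
  static SaslServer newServer(char[] secret) throws SaslException {
    CallbackHandler handler = callbacks -> {
      for (Callback cb : callbacks) {
        if (cb instanceof PasswordCallback) {
          ((PasswordCallback) cb).setPassword(secret);
        } else if (cb instanceof AuthorizeCallback) {
          AuthorizeCallback ac = (AuthorizeCallback) cb;
          ac.setAuthorized(ac.getAuthenticationID().equals(ac.getAuthorizationID()));
        } else if (!(cb instanceof NameCallback || cb instanceof RealmCallback)) {
          throw new UnsupportedCallbackException(cb);
        }
      }
    };
    return Sasl.createSaslServer(MECHANISM, PROTOCOL, SERVER_NAME, props(), handler);
  }

  /** Runs the challenge/response exchange; in practice these bytes travel over the network. */
  static void authenticate(SaslClient client, SaslServer server) throws SaslException {
    byte[] challenge = server.evaluateResponse(new byte[0]);
    byte[] response = client.evaluateChallenge(challenge);
    challenge = server.evaluateResponse(response);
    client.evaluateChallenge(challenge);
  }
}
```

Once both sides report `isComplete()`, messages can be protected with `wrap` on the sender and `unwrap` on the receiver.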