An Introduction to Kerberos
Kerberos Lives in Hades
When I joined Datastax, I was immediately cast into Hades to come to terms with its guardian, Kerberos. I was given an arcane document instructing me what secret phrases I would need to invoke to conjure this creature and persuade him to guard the information I was confiding in Cassandra. It took a few tries to get it right. I have the scars to prove it. But I survived to pass my experience on to new people delving into Kerberos-based security for the first time.
Kerberos is widely regarded as the gold standard in network authentication, designed to give users on a network access to the network's services with a single sign-on. DSE 3.0 has been kerberized so that it can be integrated into a Kerberos-protected network.
I will be your guide for your first steps into this secure environment. First, rest assured, the arcane phrases we were using when I first started testing our Kerberos security have been vastly simplified. You should be able to digest them fairly easily from our documentation. The following is a little background that might illuminate why we do what we do.
User Principals
In Kerberos, each system user is given a user principal that represents that user's identity on the system. A principal is made up of three parts constructed as follows: primary/instance@realm. A realm is essentially a database of identities and their credentials. By convention, a realm is usually an all-uppercase version of your network's domain, e.g. DATASTAX.COM. The instance is (usually) an optional clause modifying the primary name of the principal, e.g. admin. This allows a single user to take on several roles. Most users will have a principal that looks like this: mark@DATASTAX.COM. Some users may also have principals that look like this: mark/admin@DATASTAX.COM.
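To make the primary/instance@realm layout concrete, here is a minimal Python sketch that splits a principal name into its three parts. The names Principal and parse_principal are hypothetical helpers invented for this illustration, not part of Kerberos or DSE, and the sketch assumes simple names only: it ignores the escaping rules that real Kerberos libraries apply to special characters.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]  # None when the principal has no instance part
    realm: str


def parse_principal(name: str) -> Principal:
    """Split 'primary/instance@REALM' or 'primary@REALM' into its parts.

    Illustrative only: assumes no escaped '/' or '@' characters.
    """
    local, at, realm = name.rpartition("@")
    if not at or not local or not realm:
        raise ValueError(f"expected primary[/instance]@REALM, got {name!r}")
    primary, slash, instance = local.partition("/")
    if not primary or (slash and not instance):
        raise ValueError(f"malformed principal: {name!r}")
    return Principal(primary, instance or None, realm)


if __name__ == "__main__":
    print(parse_principal("mark@DATASTAX.COM"))
    print(parse_principal("mark/admin@DATASTAX.COM"))
```

Both of Mark's principals share the same primary and realm; only the instance differs, which is what lets one person hold an everyday identity and an admin identity side by side.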
Cassandra also needs to know about the principals you create. For each user principal, you need to add that principal to Cassandra using cqlsh 3. If you are also using Cassandra's authorization system, you'll need to grant these principals permissions as well. When creating and referring to user principals in Cassandra, you will need to use the full principal name, including the realm. To create your first user, you'll need to set up a principal named cassandra in Kerberos and use that to log in. This is the default superuser. We recommend that after doing so you immediately establish a new superuser and remove cassandra as a superuser from both Kerberos and Cassandra.
Service Principals
Kerberos doesn't just protect your network services from intruders. It also protects users from false services. Each network service is given a service principal. Service principals have a special convention applied to them, specifically that each instance of a service (running on different nodes/hosts) has its own principal, generally in the format of service/fully qualified domain name@realm. The fully qualified domain name is that of the host that the service is running on.
Hadoop requires that we set up two service principals. The first is named after the Unix user that starts the service. When you install from packages, you will run the service as dse. If you install from a tarball, you may adjust the name of this principal as appropriate. Cassandra uses this service principal as well. The second service principal required by Hadoop is named HTTP. Solr also depends on the HTTP principal. Taken together, these requirements mean that you must create two service principals per node in Kerberos, and their names are fixed. For example: dse/austin01@DATASTAX.COM, HTTP/austin01@DATASTAX.COM, dse/austin02@DATASTAX.COM, HTTP/austin02@DATASTAX.COM, and so on.
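Since the naming rule is mechanical, the list of principals you need can be generated from your host list. The following is a small Python sketch of that rule; service_principals is a hypothetical helper written for this post, and the host names and realm are just the examples from above. It only builds the names. Creating the principals in Kerberos is still done with your KDC's own admin tools.

```python
def service_principals(hosts, realm, service_user="dse"):
    """Return the two fixed service principals required for every node.

    service_user is the Unix user that starts the service: 'dse' for
    package installs, possibly something else for tarball installs.
    """
    names = []
    for host in hosts:
        # One principal for the service user, one for HTTP, per host.
        for service in (service_user, "HTTP"):
            names.append(f"{service}/{host}@{realm}")
    return names


if __name__ == "__main__":
    for name in service_principals(["austin01", "austin02"], "DATASTAX.COM"):
        print(name)
```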
Keytabs
When a user authenticates with Kerberos, they request a ticket from a central server called the Key Distribution Center, or KDC. The KDC's response can only be decrypted with a key derived from their password, and the resulting ticket can be used to authenticate them with the services they are trying to access. But as I mentioned before, Kerberos is also responsible for authenticating the services to you. How does a service provide the equivalent of a password? If you store the password in a script, it's not particularly secure. The answer to this dilemma is a keytab file. You ask the KDC to create a permanent file version of specific credentials. You can, in fact, put several credentials into the same keytab (the "tab" stands for table, since it stores a table of credentials). Scripts and programs can use this file to authenticate as a particular service.
Datastax security relies on using keytabs. On each node, you should create a keytab that stores that node's dse and HTTP principals. You must exercise maximum security on this file. It should be readable only by the user that is running the Datastax service. Only Datastax should be using this file, so its principals don't need a human-memorable password. When you create your service principals, you can ask Kerberos to give them a random password. This provides an additional layer of security.
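One way to sanity-check the "readable only by the service user" rule is to inspect the keytab's ownership and permission bits. The sketch below is a hedged illustration in Python for Unix-like systems; keytab_is_private is a hypothetical helper, not something DSE ships, and the path used in the example is an assumption you would replace with wherever your keytab actually lives.

```python
import os
import stat


def keytab_is_private(path: str, owner_uid: int) -> bool:
    """Return True if the file belongs to owner_uid and nobody else can touch it."""
    info = os.stat(path)
    # Any group or "other" permission bit means another account could
    # read, write, or execute the file.
    open_to_others = info.st_mode & (stat.S_IRWXG | stat.S_IRWXO)
    return info.st_uid == owner_uid and not open_to_others


if __name__ == "__main__":
    # Hypothetical location; substitute your own keytab path.
    keytab = "/etc/dse/dse.keytab"
    if os.path.exists(keytab):
        print(keytab_is_private(keytab, os.getuid()))
```

A file with mode 600 owned by the service user passes this check; a file with mode 644 does not, because group and other accounts could read the stored credentials.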
Quality of Protection (QOP)
The primary purpose of Kerberos is to authenticate users and network services to each other. However, it can also provide the ability to sign and encrypt data passing between network services and users if you desire it (similar to SSL). The Quality of Protection refers to how much protection your data has as it moves across the network.
If you plan on using Kerberos, be sure to set the QOP setting in Datastax to the appropriate level. DSE will use whatever the standard encryption setting is in your Kerberos setup. The QOP setting in our configuration files refers to whether Kerberos will do authentication only (auth); authentication and data signing (auth-int); or authentication, data signing, and encryption (auth-conf).
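The three QOP values build on each other, which is easy to see when they are written out as a table. This Python sketch simply encodes the levels described above; QOP_LEVELS, qop_protections, and encrypts_traffic are hypothetical names made up for the illustration and do not correspond to any DSE API or configuration key.

```python
# Each level includes everything the previous level provides.
QOP_LEVELS = {
    "auth": ("authentication",),
    "auth-int": ("authentication", "signing"),
    "auth-conf": ("authentication", "signing", "encryption"),
}


def qop_protections(level: str):
    """Return the protections a QOP level provides, or raise on a typo."""
    try:
        return QOP_LEVELS[level]
    except KeyError:
        valid = ", ".join(QOP_LEVELS)
        raise ValueError(f"unknown QOP {level!r}; expected one of: {valid}") from None


def encrypts_traffic(level: str) -> bool:
    """True only for the level that adds encryption on top of signing."""
    return "encryption" in qop_protections(level)


if __name__ == "__main__":
    for level in QOP_LEVELS:
        print(level, "->", ", ".join(qop_protections(level)))
```

If your data must stay confidential on the wire and you are not relying on another layer for encryption, auth-conf is the only one of the three levels that encrypts it.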
Hades, revisited
When I started my journey in Hades, I had to use these arcane phrases in half a dozen different Hadoop and Solr configuration files, sometimes repeating the information several times. Our engineers have simplified that for you to the point that you can provide this information once in your dse.yaml file. The rest is automatically taken care of.