Security
Authentication
Currently the only supported authentication mechanism is Kerberos, which is disabled by default. For Kerberos to work a Kerberos KDC is needed, which the users needs to provide. The secret-operator documentation states which kind of Kerberos servers are supported and how they can be configured.
Kerberos is supported starting from HDFS version 3.3.x |
1. Prepare Kerberos server
To configure HDFS to use Kerberos you first need to collect information about your Kerberos server, e.g. hostname and port. Additionally you need a service-user, which the secret-operator uses to create create principals for the HDFS services.
2. Create Kerberos SecretClass
Afterwards you need to enter all the needed information into a SecretClass, as described in secret-operator documentation.
The following guide assumes you have named your SecretClass kerberos-hdfs
.
3. Configure HDFS to use SecretClass
The last step is to configure your HdfsCluster to use the newly created SecretClass.
spec:
clusterConfig:
authentication:
tlsSecretClass: tls # Optional, defaults to "tls"
kerberos:
secretClass: kerberos-hdfs # Put your SecretClass name in here
The kerberos.secretClass
is used to give HDFS the possibility to request keytabs from the secret-operator.
The tlsSecretClass
is needed to request TLS certificates, used e.g. for the Web UIs.
4. Verify that Kerberos authentication is required
Use stackablectl stacklet list
to get the endpoints where the HDFS namenodes are reachable.
Open the link (note that the namenode is now using https).
You should see a Web UI similar to the following:
The important part is
Security is on.
You can also shell into the namenode and try to access the file system:
kubectl exec -it hdfs-namenode-default-0 -c namenode — bash -c 'kdestroy && bin/hdfs dfs -ls /'
You should get the error message org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
.
5. Access HDFS
In case you want to access your HDFS it is recommended to start up a client Pod that connects to HDFS, rather than shelling into the namenode. We have an integration test for this exact purpose, where you can see how to connect and get a valid keytab.
Authorization
For authorization we developed hdfs-utils, which contains an OPA authorizer and group mapper. This matches our general OPA authorization mechanisms.
It is recommended to enable Kerberos when doing Authorization, as otherwise you don’t have any security measures at all. There still might be cases where you want authorization on top of a cluster without authentication, as you don’t want to accidentally drop files and therefore use different users for different use-cases. |
In order to use the authorizer you need a ConfigMap containing a rego rule and reference that in your HDFS cluster.
In addition to this you need a OpaCluster that serves the rego rules - this guide assumes it’s called opa
.
---
apiVersion: v1
kind: ConfigMap
metadata:
name: hdfs-regorules
labels:
opa.stackable.tech/bundle: "true"
data:
hdfs.rego: |
package hdfs
import rego.v1
default allow = true
This rego rule is intended for demonstration purposes and allows every operation. For a production setup you probably want to take a look at our integration tests for a more secure set of rego rules. Reference the rego rule as follows in your HdfsCluster:
spec:
clusterConfig:
authorization:
opa:
configMapName: opa
package: hdfs
How it works
Take all your knowledge about HDFS authorization and throw it in the bin. The approach we are taking for our authorization paradigm departs significantly from traditional Hadoop patterns and POSIX-style permissions. |
In short, the current rego rules ignore the file ownership, permissions, ACLs and all other attributes files can have. All of this is state in HDFS and clashes with the infrastructure-as-code approach (IaC).
Instead, HDFS will send a request detailing who (e.g. alice/test-hdfs-permissions.default.svc.cluster.local@CLUSTER.LOCAL
) is trying to do what (e.g. open
, create
, delete
or append
) on what file (e.g. /foo/bar
).
OPA then makes a decision if this action is allowed or not.
Instead of chown
-ing a directory to a different user to assign write permissions, you should go to your IaC Git repository and add a rego rule entry specifying that the user is allowed to read and write to that directory.
Group memberships
We encountered several challenges while implementing the group mapper, the most serious of which being that the GroupMappingServiceProvider
interface only passes the shortUsername
when asking for group memberships.
This does not allow us to differentiate between e.g. hbase/hbase-prod.prod-namespace.svc.cluster.local@CLUSTER.LOCAL
and hbase/hbase-dev.dev-namespace.svc.cluster.local@CLUSTER.LOCAL
, as the GroupMapper will get asked for hbase
group memberships in both cases.
Users could work around this to assign unique shortNames by using hadoop.security.auth_to_local
.
This is however a potentially complex and error-prone process.
We also tried mapping the principals from Kerberos 1:1 to the HDFS shortUserName
, so the GroupMapper has access to the full userName
.
However, this did not work, as HDFS only allows "simple usernames", which are not allowed to contain a /
or @
.
Because of these issues we do not use a custom GroupMapper and only rely on the authorizer, which in turn receives a complete UserGroupInformation
object, including the shortUserName
and the precious full userName
.
This has the downside that the group memberships used in OPA for authorization are not known to HDFS.
The implication is thus that you cannot add users to the superuser
group, which is needed for certain administrative actions in HDFS.
We have decided that this is an acceptable approach as normal operations will not be affected.
In case you really need users to be part of the superusers
group, you can use a configOverride on hadoop.user.group.static.mapping.overrides
for that.
Wire encryption
In case Kerberos is enabled, Privacy
mode is used for best security.
Wire encryption without Kerberos as well as other wire encryption modes are not supported.