Let's discuss Hadoop security implementation. Setting up a secure Hadoop cluster and maintaining it properly is a daily task for the Hadoop admin, but it can feel like a hectic and time-consuming process. Let's make it simple with the following steps.
1. Installing the Key Distribution Center in the NameNode
2. Setting up the Kerberos client on all the Hadoop nodes
3. Selecting the Domain Name and Realm
4. Configure Kerberos
5. Create the KDC database using kdb5_util
6. Setting up the administrator principal for the KDC
7. Add administrators to the ACL file
8. Start the Kerberos Daemons
9. Add administrators to the Kerberos Database
10. Change the permissions on the log files in /var/log
11. Add host principal
12. Add root principal
13. Setting up Hadoop service principals
14. Creating a keytab file for the Hadoop services
15. Create the Cloudera Manager Server principal and keytab
16. Deploy the CM Server keytab with proper permissions
17. Create the cmf.principal
18. Configure the Kerberos Default Realm in CM
19. Create the hdfs Super User Principal
20. Distributing the keytab file for all the slaves
21. Create Kerberos Principal for Each User Account
22. Prepare the Cluster for Each User
23. Verify that Kerberos Security is working
24. Let's do a simple Map-Reduce job as a secured user on the cluster
Let's go through each step in detail, starting with the cluster details. For this demonstration, I'm using the following nodes.
Cluster details (service name: alias, FQDN):
NameNode: nn, nn.example.com
SecondaryNameNode: snn, snn.example.com
DataNode(s): dnX, dnX.example.com (X = 1, 2, 3, 4, 5, ...)
1. Installing the Key Distribution Center in the NameNode
We need to install the KDC server on the NameNode. The following packages are required on the KDC server.
[root@nn ~]# yum install openldap-clients
[root@nn ~]# yum install krb5-server krb5-libs krb5-workstation
[root@nn ~]# yum install pam_krb5-2.3.11-9.el6.x86_64
To verify the packages:
[root@nn ~]# rpm -qa | grep -i krb
krb5-workstation-1.10.3-10.el6_4.6.x86_64
krb5-server-1.10.3-10.el6_4.6.x86_64
krb5-libs-1.10.3-10.el6_4.6.i686
pam_krb5-2.3.11-9.el6.x86_64
krb5-libs-1.10.3-10.el6_4.6.x86_64
[root@nn ~]#
2. Setting up the Kerberos client on all the Hadoop nodes
On each Hadoop node (master and slave), we need to install the Kerberos client. This is done by installing the client packages and libraries on every Hadoop node, including the NameNode.
[root@dn1 ~]# yum install krb5-libs krb5-workstation
[root@dn1 ~]# yum install pam_krb5-2.3.11-9.el6.x86_64
To verify the packages:
[root@dn1 ~]# rpm -qa | grep -i krb
pam_krb5-2.3.11-9.el6.x86_64
krb5-workstation-1.10.3-10.el6_4.6.x86_64
krb5-libs-1.10.3-10.el6_4.6.x86_64
krb5-libs-1.10.3-10.el6_4.6.i686
[root@dn1 ~]#
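One thing worth checking at this stage: Kerberos is very sensitive to clock skew (by default the KDC rejects requests from clients whose clocks differ by more than about five minutes), so every node should be time-synchronized before we continue. A minimal sketch, assuming the stock ntp package on these RHEL/CentOS 6 machines:
[root@dn1 ~]# yum install ntp   # assumption: ntpd is the time-sync daemon in use on this cluster
[root@dn1 ~]# service ntpd start
[root@dn1 ~]# chkconfig ntpd on
Repeat this on every node, including the NameNode.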
3. Selecting the Domain Name and Realm
For this demonstration, I'm selecting the following domain and realm.
Domain name: TEST.HADOOP.COM
Realm name: TEST.HADOOP.COM
4. Configure Kerberos
Once Kerberos is installed on the required machines, we can go ahead and configure the Kerberos server. There are three main configuration files:
krb5.conf in /etc
kdc.conf in /var/kerberos/krb5kdc
kadm5.acl in /var/kerberos/krb5kdc
[root@nn ~]# vim /etc/krb5.conf
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
[libdefaults]
default_realm = TEST.HADOOP.COM
dns_lookup_realm = false
dns_lookup_kdc = false
max_life = 1d
max_renewable_life = 7d
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
default_tgs_enctypes = aes256-cts aes128-cts arcfour-hmac des3-hmac-sha1 des-hmac-sha1 des-cbc-md5 des-cbc-crc
default_tkt_enctypes = aes256-cts aes128-cts arcfour-hmac des3-hmac-sha1 des-hmac-sha1 des-cbc-md5 des-cbc-crc
[realms]
TEST.HADOOP.COM = {
  kdc = nn.example.com:88
  admin_server = nn.example.com:749
  default_domain = TEST.HADOOP.COM
}
[domain_realm]
.example.com = TEST.HADOOP.COM
example.com = TEST.HADOOP.COM
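The same /etc/krb5.conf must also be present on every Hadoop node, so that the Kerberos clients installed in step 2 know which realm and KDC to use. A minimal sketch, assuming passwordless SSH from the NameNode and the host aliases from the cluster table above (the host list is only an example):
# NOTE: example host list - replace with your own nodes
[root@nn ~]# for host in snn dn1 dn2 dn3; do scp /etc/krb5.conf root@${host}.example.com:/etc/; done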
[root@nn ~]# vim /var/kerberos/krb5kdc/kdc.conf
[kdcdefaults]
kdc_ports = 88
kdc_tcp_ports = 88
[realms]
TEST.HADOOP.COM = {
  #master_key_type = aes256-cts
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  max_life = 1d
  max_renewable_life = 7d
  supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
}
5. Create the KDC database using kdb5_util
The Kerberos server is installed and configured. Let's go ahead and create the Kerberos database.
[root@nn ~]# kdb5_util create -r TEST.HADOOP.COM -s
Loading random data
Initializing database '/var/kerberos/krb5kdc/principal' for realm 'TEST.HADOOP.COM',
master key name 'K/M@TEST.HADOOP.COM'
You will be prompted for the database Master Password.
It is important that you NOT FORGET this password.
Enter KDC database master key:
Re-enter KDC database master key to verify:
[root@nn ~]#
Enter the database password. This is the master password; keep it safe, as we will need it in the future for any admin-related activities.
This will create four files in the directory specified in your kdc.conf file: two Kerberos database files, principal.db and principal.ok; the Kerberos administrative database file, principal.kadm5; and the administrative database lock file, principal.kadm5.lock.
[root@nn ~]# ls /var/kerberos/krb5kdc/principal*
/var/kerberos/krb5kdc/principal
/var/kerberos/krb5kdc/principal.kadm5
/var/kerberos/krb5kdc/principal.kadm5.lock
/var/kerberos/krb5kdc/principal.ok
[root@nn ~]#
6. Setting up the administrator principal for the KDC
Once the KDC database is created, the administrator principal should be configured in the database. To do this, first add the administrator principal to the /var/kerberos/krb5kdc/kadm5.acl file, which contains the access control list (ACL) used by the kadmind daemon to manage access to the Kerberos database.
7. Add administrators to the ACL file
You need to create an access control list (ACL) file and put the Kerberos principal of at least one administrator into it. The file name should match the value you have set for acl_file in your kdc.conf file.
[root@nn krb5kdc]# vi kadm5.acl
*/admin@TEST.HADOOP.COM *
[root@nn krb5kdc]#
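The */admin@TEST.HADOOP.COM * entry grants full administrative rights to every principal whose instance is admin. If you later want to hand out narrower rights, kadm5.acl also accepts per-principal permission letters (a = add, d = delete, m = modify, c = change password, i = inquire, l = list). For example, a hypothetical read-only operator entry could look like this:
# hypothetical example entry - this principal is not created anywhere in this post
ops/admin@TEST.HADOOP.COM il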
8. Start the Kerberos Daemons
At this point, you are ready to start the Kerberos daemons.
[root@nn /]# service krb5kdc start
[root@nn /]# service kadmin start
[root@nn /]# chkconfig krb5kdc on
[root@nn /]# chkconfig kadmin on
And to verify:
[root@nn /]# chkconfig --list | grep -i krb5kdc
krb5kdc    0:off  1:off  2:on  3:on  4:on  5:on  6:off
[root@nn /]# chkconfig --list | grep -i kadmin
kadmin     0:off  1:off  2:on  3:on  4:on  5:on  6:off
[root@nn /]#
krb5kdc is the KDC server, while the kadmin daemon enables administrators to connect from remote machines and perform
Kerberos (KDC) administration using the kadmin client.
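To confirm that both daemons are actually up and listening on the ports configured earlier (88 for the KDC and 749 for kadmind), a quick check like the following can be used (this assumes net-tools/netstat is installed):
[root@nn /]# netstat -tulpn | grep -e krb5kdc -e kadmin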
9. Add administrators to the Kerberos Database
You need to add administrative principals to the Kerberos database. To do this, use kadmin.local on the KDC. The administrative principals you create should be the ones you added to the ACL file. There are two ways to access the kadmin shell: kadmin.local, which does not prompt for a password but must be executed on the KDC server itself, and kadmin, which prompts for a password and can be run from any machine in the cluster. Both are shown below.
[root@nn]# kadmin.local
Authenticating as principal root/admin@TEST.HADOOP.COM with password.
kadmin.local: quit
[root@nn]#
OR
[root@snn]# kadmin
Authenticating as principal root/admin@TEST.HADOOP.COM with password.
Password for root/admin@TEST.HADOOP.COM:
kadmin: q
[root@snn]#
Now let's create the admin principal.
[root@nn]# kadmin.local
Authenticating as principal root/admin@TEST.HADOOP.COM with password.
kadmin.local: addprinc root/admin@TEST.HADOOP.COM
WARNING: no policy specified for root/admin@TEST.HADOOP.COM; defaulting to no policy
Enter password for principal "root/admin@TEST.HADOOP.COM":
Re-enter password for principal "root/admin@TEST.HADOOP.COM":
Principal "root/admin@TEST.HADOOP.COM" created.
kadmin.local:
10. Change the permissions on the log files in /var/log
[root@nn /]# cd /var/log
[root@nn /]# chmod o+w krb5kdc.log
[root@nn /]# chmod o+w kadmind.log
11. Add host principal
[root@nn /]# kadmin.local -r TEST.HADOOP.COM
kadmin.local: add_principal -randkey host/nn.example.com
kadmin.local: ktadd host/nn.example.com
kadmin.local: q
[root@nn /]#
12. Add root principal
[root@nn /]# kadmin -p root/admin@TEST.HADOOP.COM
kadmin: add_principal -randkey host/nn.example.com
kadmin: ktadd host/nn.example.com@TEST.HADOOP.COM
kadmin: quit
[root@nn /]#
Note:
For those who are using Cloudera parcels, this is enough, provided that Kerberos is enabled on the cluster for the required services, and that the DataNode transceiver port and web UI port are set to values below 1024. LDAP and NT domain should be configured for those who need to access the cluster through centralized login. The DataNode directory permissions should be set to 700 in a Kerberos-enabled environment.
Those who are using a package-based (rpm/tarball) installation need to follow the remaining steps.
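For reference, the privileged-port and directory-permission requirements mentioned in the note above usually translate into hdfs-site.xml settings along the following lines. This is only a sketch: 1004 and 1006 are commonly used values rather than mandatory ones, and in a Cloudera Manager deployment the equivalent fields are set through the HDFS service configuration rather than edited by hand.
<!-- sketch only: typical secure DataNode settings, adjust the ports to your environment -->
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:1004</value>
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:1006</value>
</property>
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>700</value>
</property>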
13. Setting up Hadoop service principals
From CDH4 onwards, there are three users (hdfs, mapred, and yarn) that are used to run the various Hadoop daemons. All the Hadoop Distributed File System (HDFS) related daemons such as the NameNode, DataNode, and Secondary NameNode run under the hdfs user, while for MRv1 the MapReduce-related daemons such as the JobTracker and TaskTracker run as the mapred user. For MRv2, the yarn user runs the ResourceManager and NodeManager, while the mapred user runs the JobHistory Server and the MapReduce applications.
We need to create the hdfs, mapred, and yarn principals in the KDC to enable Kerberos authentication for the Hadoop daemons. These services also expose HTTP endpoints, so we need to create an HTTP service principal as well. We use the following kadmin commands to create these principals:
[root@nn]# kadmin
Authenticating as principal root/admin@TEST.HADOOP.COM with password.
Password for root/admin@TEST.HADOOP.COM:
kadmin: addprinc -randkey hdfs/nn.example.com@TEST.HADOOP.COM
kadmin: addprinc -randkey mapred/nn.example.com@TEST.HADOOP.COM
kadmin: addprinc -randkey HTTP/nn.example.com@TEST.HADOOP.COM
kadmin: addprinc -randkey yarn/nn.example.com@TEST.HADOOP.COM
kadmin:
As part of the Hadoop cluster setup, all the HDFS-related directories used exclusively by the hdfs daemons, such as the NameNode directory, the DataNode directories, and their log directories, should be owned by the hdfs user and group. Similarly, all directories in HDFS and on the local filesystem used exclusively by the MapReduce daemons, such as the MapReduce local directory and its log directories, should be owned by the mapred user and group. Any directories shared between the hdfs and mapred daemons should have hadoop as the group.
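As a sketch of those ownership rules (the paths below are placeholders; substitute the dfs.name.dir, dfs.data.dir, and mapred.local.dir values from your own configuration):
# NOTE: example paths only - use the directories from your hdfs-site.xml and mapred-site.xml
[root@nn /]# chown -R hdfs:hdfs /data/dfs/nn /data/dfs/dn /var/log/hadoop-hdfs
[root@nn /]# chown -R mapred:mapred /data/mapred/local /var/log/hadoop-0.20-mapreduce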
14. Creating a keytab file for the Hadoop services
A keytab is a file containing pairs of Kerberos principals and encrypted keys derived from the Kerberos password. This file is used for headless authentication with the KDC when the services run in the background without human intervention. The keytab files are created using the kadmin commands below.
The hdfs and mapred users run multiple Hadoop daemons in the background, so we need to create keytab files for the hdfs and mapred users. We also need to add the HTTP principal to these keytabs, so that the web UIs associated with Hadoop are authenticated using Kerberos.
[root@nn /]# kadmin
Authenticating as principal root/admin@TEST.HADOOP.COM with password.
Password for root/admin@TEST.HADOOP.COM:
kadmin: xst -norandkey -k hdfs.keytab hdfs/nn.example.com@TEST.HADOOP.COM HTTP/nn.example.com@TEST.HADOOP.COM
kadmin: xst -norandkey -k mapred.keytab mapred/nn.example.com@TEST.HADOOP.COM HTTP/nn.example.com@TEST.HADOOP.COM
kadmin: xst -norandkey -k yarn.keytab yarn/nn.example.com@TEST.HADOOP.COM
kadmin: xst -norandkey -k http.keytab HTTP/nn.example.com@TEST.HADOOP.COM
kadmin: xst -norandkey -k impala.keytab impala/nn.example.com@TEST.HADOOP.COM
kadmin: xst -norandkey -k hue.keytab hue/nn.example.com@TEST.HADOOP.COM
kadmin: xst -norandkey -k oozie.keytab oozie/nn.example.com@TEST.HADOOP.COM
kadmin: xst -norandkey -k flume.keytab flume/nn.example.com@TEST.HADOOP.COM
kadmin: xst -norandkey -k sqoop.keytab sqoop/nn.example.com@TEST.HADOOP.COM
kadmin: xst -norandkey -k hive.keytab hive/nn.example.com@TEST.HADOOP.COM
kadmin: quit
[root@nn /]#
Each keytab is exported for its own service principal. The impala, hue, oozie, flume, sqoop, and hive keytabs are only needed if you run those services; their principals must first be created with addprinc -randkey in the same way as in step 13.
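Before moving on, it is worth confirming that each exported keytab actually contains the principals you expect; klist can read a keytab file directly:
[root@nn /]# klist -kt hdfs.keytab
[root@nn /]# klist -kt mapred.keytab
Each command should list the service principal (and, for hdfs.keytab and mapred.keytab, the HTTP principal), with one entry per supported encryption type.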
15. Create the Cloudera Manager Server principal and keytab
Create the Cloudera Manager Server principal and a keytab for the CM Server.
kadmin: addprinc -randkey cloudera-scm/admin@TEST.HADOOP.COM
kadmin: xst -k cmf.keytab cloudera-scm/admin@TEST.HADOOP.COM
16. Deploy the CM Server keytab with proper permissions
# mv cmf.keytab /etc/cloudera-scm-server
# chown cloudera-scm:cloudera-scm /etc/cloudera-scm-server/cmf.keytab
# chmod 600 /etc/cloudera-scm-server/cmf.keytab
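As a quick sanity check, you can verify that the deployed keytab is owned by cloudera-scm and contains the cloudera-scm/admin principal:
# ls -l /etc/cloudera-scm-server/cmf.keytab
# klist -kt /etc/cloudera-scm-server/cmf.keytab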
17. Create the cmf.principal
Create a file called cmf.principal and add the following content to it:
# vi cmf.principal
cloudera-scm/admin@TEST.HADOOP.COM
Move cmf.principal to /etc/cloudera-scm-server and change its permissions:
# mv cmf.principal /etc/cloudera-scm-server/cmf.principal
# chown cloudera-scm:cloudera-scm /etc/cloudera-scm-server/cmf.principal
# chmod 600 /etc/cloudera-scm-server/cmf.principal
18. Configure the Kerberos Default Realm in CM
Here the default realm is TEST.HADOOP.COM.
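In Cloudera Manager this realm is entered in the Kerberos security settings on the Administration page (the exact menu wording varies between CM versions). Once it is saved and the client configuration is redeployed, the realm should show up in /etc/krb5.conf on the managed hosts, which you can spot-check like this:
[root@dn1 ~]# grep default_realm /etc/krb5.conf
default_realm = TEST.HADOOP.COM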
19. Create the hdfs Super User Principal
kadmin: addprinc hdfs@TEST.HADOOP.COM
20. Distributing the keytab file for all the slaves
Once the keytab files are created, they have to be moved to the /etc/hadoop/conf folder. Each keytab file has to be secured so that only its owner can read it. For this, the ownership of the keytab files is changed to the hdfs and mapred users respectively, and the file permissions are changed to 400. The service principals for hdfs, mapred, and HTTP have a fully qualified domain name associated with the username, so each service principal is host-specific and unique for each node in the cluster. Move the keytab files to the conf folder and secure them:
# mv hdfs.keytab mapred.keytab /etc/hadoop/conf/
# chown hdfs:hadoop /etc/hadoop/conf/hdfs.keytab
# chown mapred:hadoop /etc/hadoop/conf/mapred.keytab
# chmod 400 /etc/hadoop/conf/hdfs.keytab
# chmod 400 /etc/hadoop/conf/mapred.keytab
The keytab files should be created specifically for each node in the cluster. Distributing and managing keytab files in a large cluster is time consuming and error prone, so it is better to use deployment tools and automate this distribution.
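A minimal sketch of such automation, assuming passwordless SSH from the NameNode and per-host keytab files named after each host (the hdfs-dnX.keytab naming and the host list are only illustrations, not something created earlier in this post):
# NOTE: hypothetical per-host keytab names and host list - adapt to your cluster
[root@nn /]# for host in dn1 dn2 dn3; do
>   scp hdfs-${host}.keytab root@${host}.example.com:/etc/hadoop/conf/hdfs.keytab
>   ssh root@${host}.example.com "chown hdfs:hadoop /etc/hadoop/conf/hdfs.keytab && chmod 400 /etc/hadoop/conf/hdfs.keytab"
> done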
21. Create Kerberos Principal for Each User Account
In the kadmin.local or kadmin shell, use the following command to create a principal for your account, replacing YOUR-LOCAL-REALM.COM with the name of your realm and USERNAME with a username:
Syntax:
kadmin: addprinc USERNAME@YOUR-LOCAL-REALM.COM
kadmin: addprinc user1@TEST.HADOOP.COM
When prompted, enter a password twice and keep the password safe; you will need it later.
22. Prepare the Cluster for Each User
Before you and other users can access the cluster, there are a few tasks you must do to prepare the hosts for each user.
Make sure all hosts in the cluster have a Linux user account with the same name as the first component of that user's principal name. For example, the Linux account user1 should exist on every box if the user's principal name is user1@TEST.HADOOP.COM. You can use LDAP instead of local accounts.
Note:
Each account must have a user ID that is greater than or equal to 1000. In the /etc/hadoop/conf/taskcontroller.cfg file, the default setting for the banned.users property is mapred, hdfs, and bin, to prevent jobs from being submitted via those user accounts. The default setting for the min.user.id property is 1000, to prevent jobs from being submitted with a user ID less than 1000, since those IDs are conventionally reserved for Unix system accounts.
Create a subdirectory under /user on HDFS for each user account (for example, /user/user1), and change the owner and group of that directory to the user.
# hadoop fs -mkdir /user/user1
# hadoop fs -chown user1 /user/user1
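On a Kerberized cluster these hadoop fs commands themselves need valid Kerberos credentials; the hdfs superuser principal created in step 19 is the natural choice, so authenticate as it first:
# kinit hdfs@TEST.HADOOP.COM
# hadoop fs -mkdir /user/user1
# hadoop fs -chown user1 /user/user1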
23. Verify that Kerberos Security is working
After you have Kerberos credentials, you can verify that Kerberos security is working on your cluster by trying to run MapReduce jobs. To get the Kerberos credentials for your user account, log in as the user and run kinit from the command prompt:
$ kinit user1@TEST.HADOOP.COM
$
Enter the password when prompted.
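You can confirm that a ticket was actually granted by running klist before submitting any job; it should show a krbtgt/TEST.HADOOP.COM ticket for user1@TEST.HADOOP.COM:
$ klist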
24. Let's do a simple Map-Reduce job as a secured user on the cluster.
Submit a sample pi calculation as a test MapReduce job. Use the following command if you use a parcel-based setup for Cloudera Manager:
$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 10 10000
Number of Maps = 10
Samples per Map = 10000
...
Job Finished in 30.958 seconds
Estimated value of Pi is 3.14120000000000000000
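As an optional final check, destroy the ticket and confirm that access is now refused; this proves it is Kerberos, and not just file permissions, that is gating access to the cluster:
$ kdestroy
$ hadoop fs -ls /user/user1
Without a valid ticket the second command should fail with an authentication error instead of listing the directory; run kinit again to restore access.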