Sunday 16 August 2015

Detailed Procedure On Securing Hadoop Cluster Using Kerberos


Let's discuss the Hadoop security implementation. Setting up a secure Hadoop cluster and maintaining it properly is a daily task for the Hadoop admin, but it can feel like a hectic and time-consuming process. Let's make it simple with the following steps.

1.Installing the Key Distribution Center in the NameNode

2.Setting up the Kerberos client on all the Hadoop nodes

3.Selecting the Domain Name and Realm

4.Configure Kerberos

5.Create the KDC database Using “kdb5_util”

6.Setting up the administrator principal for KDC

7.Add administrators to the ACL file

8.Start the Kerberos Daemons

9.Add administrators to Kerberos Database

10.Change the permissions on the log files in /var/log.

11.Add host principal

12.Add root principal

13.Setting up Hadoop service principals

14.Creating a keytab file for the Hadoop services

15.Create the Cloudera Manager Server principal and keytab

16.Deploy the CM Server keytab with proper permission

17.Create the cmf.principal

18.Configure the Kerberos Default Realm in CM

19.Create the hdfs Super User Principal

20.Distributing the keytab file for all the slaves

21.Create Kerberos Principal for Each User Account

22.Prepare the Cluster for Each User

23.Verify that Kerberos Security is working

24. Let's do a simple Map-Reduce job as a secured user on the cluster.

Let's go through each step in detail. First, let's look at the cluster details. For this demonstration, I'm using the following nodes.

Cluster Details (Service: Alias Name, FQDN):
NameNode: nn, nn.example.com
SecondaryNameNode: snn, snn.example.com
DataNode(s): dnX, dnX.example.com
(X = 1, 2, 3, 4, 5, etc.)
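
Kerberos is very sensitive to hostname resolution, so every node should resolve these FQDNs consistently (forward and reverse). If you are not using DNS, a minimal /etc/hosts sketch would look like the following; the 10.0.0.x addresses are placeholders, so replace them with your real IPs and replicate the file on every node.

# /etc/hosts (sketch) - consistent FQDN resolution on every node
10.0.0.1    nn.example.com    nn
10.0.0.2    snn.example.com   snn
10.0.0.11   dn1.example.com   dn1
10.0.0.12   dn2.example.com   dn2
10.0.0.13   dn3.example.com   dn3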

1.Installing the Key Distribution Center in the NameNode

We need to install the KDC server on the NameNode. The following packages are required on the KDC server.

[root@nn ~]# yum install openldap-clients

[root@nn ~]# yum install krb5-server krb5-libs krb5-workstation

[root@nn ~]# yum install pam_krb5-2.3.11-9.el6.x86_64
 
To verify the packages


[root@nn ~]# rpm -qa | grep -i krb*
krb5-workstation-1.10.3-10.el6_4.6.x86_64
krb5-server-1.10.3-10.el6_4.6.x86_64
krb5-libs-1.10.3-10.el6_4.6.i686
pam_krb5-2.3.11-9.el6.x86_64
krb5-libs-1.10.3-10.el6_4.6.x86_64
[root@nn ~]#

2.Setting up the Kerberos client on all the Hadoop nodes

On each of the Hadoop nodes (master and slave nodes), we need to install the Kerberos client. This is done by installing the client packages and libraries on the Hadoop nodes. On all the Hadoop nodes, including the NameNode, we need to install the following packages.

[root@dn1~]# yum install krb5-libs krb5-workstation
[root@dn1~]# yum install pam_krb5-2.3.11-9.el6.x86_64
To verify the packages
[root@dn1~]# rpm -qa | grep -i krb*
pam_krb5-2.3.11-9.el6.x86_64
krb5-workstation-1.10.3-10.el6_4.6.x86_64
krb5-libs-1.10.3-10.el6_4.6.x86_64
krb5-libs-1.10.3-10.el6_4.6.i686
[root@dn1 ~]#
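
If the cluster has many nodes, the client installation can be scripted instead of logging in to each box. A minimal sketch, assuming passwordless SSH as root from the NameNode and the node names used in this demo (adjust the list to your cluster):

# install the Kerberos client packages on every Hadoop node
for node in snn dn1 dn2 dn3 dn4 dn5; do
  ssh root@${node}.example.com 'yum -y install krb5-libs krb5-workstation pam_krb5'
done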

3.Selecting the Domain Name and Realm

For demonstrating this section, I'm using the following domain and realm.

Domain Name: TEST.HADOOP.COM

Realm Name: TEST.HADOOP.COM

4.Configure Kerberos

Once Kerberos is installed on the required machines, we can go ahead and configure the Kerberos server. There are mainly three configuration files:
krb5.conf in the /etc directory
kdc.conf in the /var/kerberos/krb5kdc directory
kadm5.acl in the /var/kerberos/krb5kdc directory



[root@nn ]#vim /etc/krb5.conf
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = TEST.HADOOP.COM
 dns_lookup_realm = false
 dns_lookup_kdc = false
 max_life = 1d
 max_renewable_life = 7d
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 default_tgs_enctypes = aes256-cts aes128-cts arcfour-hmac des3-hmac-sha1 des-hmac-sha1 des-cbc-md5 des-cbc-crc
 default_tkt_enctypes = aes256-cts aes128-cts arcfour-hmac des3-hmac-sha1 des-hmac-sha1 des-cbc-md5 des-cbc-crc

[realms]
 TEST.HADOOP.COM = {
  kdc = nn.example.com:88
  admin_server = nn.example.com:749
  default_domain = TEST.HADOOP.COM
 }

[domain_realm]
 .example.com = TEST.HADOOP.COM
 example.com = TEST.HADOOP.COM
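
The same /etc/krb5.conf has to be present on every Hadoop node, not only on the KDC. A minimal sketch for pushing it out, assuming passwordless SSH as root and the node names used in this demo:

# copy the Kerberos client configuration to all the other nodes
for node in snn dn1 dn2 dn3 dn4 dn5; do
  scp /etc/krb5.conf root@${node}.example.com:/etc/krb5.conf
done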



[root@nn ]#vim /var/kerberos/krb5kdc/kdc.conf

[kdcdefaults]
 kdc_ports = 88
 kdc_tcp_ports = 88

[realms]
 TEST.HADOOP.COM = {
  #master_key_type = aes256-cts
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  max_life = 1d
  max_renewable_life = 7d
  supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
 }

5.Create the KDC database Using “kdb5_util”

The Kerberos server is installed and configured. Let's go ahead and create the Kerberos database.

[root@nn]# kdb5_util create -r TEST.HADOOP.COM -s
Loading random data
Initializing database '/var/kerberos/krb5kdc/principal' for realm 'TEST.HADOOP.COM',
master key name 'K/M@TEST.HADOOP.COM'
You will be prompted for the database Master Password.
It is important that you NOT FORGET this password.
Enter KDC database master key:
Re-enter KDC database master key to verify:
[root@nn ]#
 
Enter the database password; this will be the master password. Keep it safe, as we will need it later for any admin-related activities.
This will create four files in the directory specified in your kdc.conf file:
the two Kerberos database files, principal and principal.ok;
the Kerberos administrative database file, principal.kadm5;
and the administrative database lock file, principal.kadm5.lock.

[root@nn]# ls /var/kerberos/krb5kdc/principal*
/var/kerberos/krb5kdc/principal
/var/kerberos/krb5kdc/principal.kadm5
/var/kerberos/krb5kdc/principal.kadm5.lock
/var/kerberos/krb5kdc/principal.ok
[root@nn]#

6.Setting up the administrator principal for KDC

Once the KDC database is created, the administrator principal should be configured in the database. To do this, first add the administrator principal to the /var/kerberos/krb5kdc/kadm5.acl file, which contains the access control list (ACL) used by the kadmind daemon to manage access to the Kerberos database.

7.Add administrators to the ACL file

You need to create an Access Control List (ACL) file and put the Kerberos principal of at least one administrator into it. The file name should match the value you have set for "acl_file" in your kdc.conf file.
[root@nn krb5kdc]# vi kadm5.acl
*/admin@TEST.HADOOP.COM    *
[root@nn krb5kdc]#

8.Start the Kerberos Daemons

At this point, you are ready to start the Kerberos daemons.


[root@nn /]# service krb5kdc start
[root@nn /]# service kadmin start
[root@nn /]# chkconfig krb5kdc on
[root@nn /]# chkconfig kadmin on
And to verify
[root@nn /]# chkconfig --list | grep -i krb5kdc
krb5kdc  0:off   1:off   2:on    3:on    4:on    5:on    6:off
[root@nn /]# chkconfig --list | grep -i kadmin
kadmin   0:off   1:off   2:on    3:on    4:on    5:on    6:off
[root@nn /]#
 
krb5kdc is the KDC server, while the kadmin daemon enables administrators to connect from remote machines and perform Kerberos (KDC) administration using the kadmin client.
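
As a quick sanity check, you can confirm that both daemons are actually listening (the KDC on port 88 and kadmind on port 749, as configured above):

# verify that krb5kdc and kadmind are up and listening
netstat -tulnp | grep -E 'krb5kdc|kadmind'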

9.Add administrators to Kerberos Database

You need to add administrative principals to the Kerberos database. To do this, use kadmin.local on the KDC. The administrative principals you create should be the ones you added to the ACL file. There are two ways to access the kadmin shell, one without a password and one with a password; both are shown below. Note that kadmin.local has to be executed on the KDC server itself, whereas kadmin can be run from any machine in the cluster.
[root@nn]# kadmin.local
Authenticating as principal root/admin@TEST.HADOOP.COM with password.

kadmin.local:  quit
[root@nn]#

OR

[root@snn]# kadmin
Authenticating as principal root/admin@TEST.HADOOP.COM with password.
Password for root/admin@TEST.HADOOP.COM:
kadmin:  q
[root@snn]#
Now let’s create the admin principal.


[root@nn]# kadmin.local
Authenticating as principal root/admin@TEST.HADOOP.COM with password.
kadmin.local:  addprinc root/admin@TEST.HADOOP.COM
WARNING: no policy specified for root/admin@TEST.HADOOP.COM; defaulting to no policy
Enter password for principal "root/admin@TEST.HADOOP.COM":

Re-enter password for principal "root/admin@TEST.HADOOP.COM":
Principal "root/admin@TEST.HADOOP.COM" created.
kadmin.local:
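
To double-check that the admin principal was created, you can list the principals currently stored in the KDC database:

# list all principals in the Kerberos database (run on the KDC)
kadmin.local -q "listprincs"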

10.Change the permissions on the log files in /var/log.


[root@nn /]# cd /var/log
[root@nn /]# chmod o+w krb5kdc.log
[root@nn /]# chmod o+w kadmind.log

11.Add host principal


[root@nn /]# kadmin.local  -r TEST.HADOOP.COM
kadmin:  add_principal  -randkey  host/nn.example.com
kadmin:  ktadd  host/nn.example.com
kadmin:  quit
[root@nn /]#

12.Add root principal


[root@nn /]# kadmin  -p  root/admin@TEST.HADOOP.COM
kadmin:  add_principal  -randkey  host/nn.example.com
kadmin:  ktadd  host/nn.example.com@TEST.HADOOP.COM
kadmin:  quit
[root@nn /]#

 

Note:

For those who are using Cloudera parcels, this is enough, provided Kerberos is enabled for the required services on the cluster, and the DataNode transceiver port and web UI port are set to values below 1024 (see the hdfs-site.xml sketch below). LDAP and NT domain should be configured for those who need to access the cluster through a centralized login. The DataNode data directory permissions should be set to 700 in a Kerberos-enabled environment.

For those who are using a package-based or rpm/tarball-based installation, the remaining steps need to be followed as well.
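
For reference, the following is a minimal sketch of the relevant hdfs-site.xml properties for a secure DataNode; 1004 and 1006 are the conventional privileged ports, so adjust them to your environment.

<!-- hdfs-site.xml (sketch): privileged DataNode ports and data directory
     permissions for a Kerberos-enabled cluster -->
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:1004</value>
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:1006</value>
</property>
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>700</value>
</property>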

 

13.Setting up Hadoop service principals

From CDH4 onwards, there are three users (hdfs, mapred, and yarn) that are used to run the various Hadoop daemons. All the Hadoop Distributed File System (HDFS) related daemons, such as the NameNode, DataNode, and Secondary NameNode, run as the hdfs user, while for MRv1 the MapReduce-related daemons, such as the JobTracker and TaskTracker, run as the mapred user. For MRv2 (YARN), the yarn user runs the ResourceManager and NodeManager, while the mapred user runs the JobHistory Server and the MapReduce applications.
We need to create hdfs, mapred, and yarn principals in the KDC so that the Hadoop daemons can authenticate with Kerberos. These daemons also expose HTTP endpoints, so we need to create an http service principal as well. We use the following kadmin commands to create these principals on the NameNode host; a loop for the remaining nodes is sketched after the session.


[root@nn]# kadmin
Authenticating as principal root/admin@TEST.HADOOP.COM with password.
Password for root/admin@TEST.HADOOP.COM:
kadmin: addprinc -randkey hdfs/nn.example.com@TEST.HADOOP.COM
kadmin: addprinc -randkey mapred/nn.example.com@TEST.HADOOP.COM
kadmin: addprinc -randkey http/nn.example.com@TEST.HADOOP.COM
kadmin: addprinc -randkey yarn/nn.example.com@TEST.HADOOP.COM
kadmin:
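
The service principals are host-specific, so the same set has to be created for every node in the cluster, not only for the NameNode. A minimal sketch using kadmin.local on the KDC for the remaining nodes (adjust the node list to your cluster):

# create the host-specific service principals for the rest of the cluster
for node in snn dn1 dn2 dn3 dn4 dn5; do
  for svc in hdfs mapred yarn http; do
    kadmin.local -q "addprinc -randkey ${svc}/${node}.example.com@TEST.HADOOP.COM"
  done
done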

As a part of the Hadoop cluster setup, all the HDFS-related directories that are used exclusively by the hdfs daemons, such as the NameNode directory, the DataNode directories, and their log directories, should be owned by the hdfs user and group. Similarly, all directories in HDFS and on the local filesystem that are used exclusively by the MapReduce daemons, such as the MapReduce local directory and its log directories, should be owned by the mapred user and group. All directories that are shared between the hdfs and mapred daemons should have hadoop as the group.

14.Creating a keytab file for the Hadoop services

A keytab is a file containing pairs of Kerberos principals and encrypted keys derived from the Kerberos password. It is used for headless authentication with the KDC when services run in the background without human intervention. The keytab files are created using kadmin commands.
The hdfs and mapred users run multiple Hadoop daemons in the background, so we need to create keytab files for the hdfs and mapred users. We also need to add the http principal to these keytabs so that the web UIs associated with Hadoop are authenticated using Kerberos.



[root@nn /]# kadmin
Authenticating as principal root/admin@TEST.HADOOP.COM with password.
Password for root/admin@TEST.HADOOP.COM:
kadmin: xst -norandkey -k hdfs.keytab hdfs/nn.example.com@TEST.HADOOP.COM http/nn.example.com@TEST.HADOOP.COM
kadmin: xst -norandkey -k mapred.keytab mapred/nn.example.com@TEST.HADOOP.COM http/nn.example.com@TEST.HADOOP.COM
kadmin: xst -norandkey -k yarn.keytab yarn/nn.example.com@TEST.HADOOP.COM http/nn.example.com@TEST.HADOOP.COM
kadmin: xst -norandkey -k http.keytab http/nn.example.com@TEST.HADOOP.COM
kadmin: xst -norandkey -k impala.keytab impala/nn.example.com@TEST.HADOOP.COM
kadmin: xst -norandkey -k hue.keytab hue/nn.example.com@TEST.HADOOP.COM
kadmin: xst -norandkey -k oozie.keytab oozie/nn.example.com@TEST.HADOOP.COM
kadmin: xst -norandkey -k flume.keytab flume/nn.example.com@TEST.HADOOP.COM
kadmin: xst -norandkey -k sqoop.keytab sqoop/nn.example.com@TEST.HADOOP.COM
kadmin: xst -norandkey -k hive.keytab hive/nn.example.com@TEST.HADOOP.COM
kadmin:  quit
[root@nn /]#

Note that each xst command exports the keytab for the matching service principal; the impala, hue, oozie, flume, sqoop, and hive keytabs assume that those service principals have already been created with addprinc -randkey, in the same way as the hdfs, mapred, yarn, and http principals in the previous step.
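
To confirm what actually ended up in a keytab, list its contents; for example, for hdfs.keytab:

# show the principals and key version numbers stored in the keytab
klist -kt hdfs.keytab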

15.Create the Cloudera Manager Server principal and keytab

Create Cloudera Manager Server principal and keytab for the CM Server.


kadmin:  addprinc  -randkey  cloudera-scm/admin@TEST.HADOOP.COM

kadmin:  xst  -k  cmf.keytab  cloudera-scm/admin@TEST.HADOOP.COM

16.Deploy the CM Server keytab with proper permission


#mv  cmf.keytab  /etc/cloudera-scm-server

#chown cloudera-scm:cloudera-scm  /etc/cloudera-scm-server/cmf.keytab

#chmod  600  /etc/cloudera-scm-server/cmf.keytab

17.Create the cmf.principal

Create a file called cmf.principal and add the following content to it:


#vi cmf.principal
cloudera-scm/admin@TEST.HADOOP.COM 

Move cmf.principal to /etc/cloudera-scm-server and change its ownership and permissions


# mv cmf.principal /etc/cloudera-scm-server/cmf.principal

# chown  cloudera-scm:cloudera-scm  /etc/cloudera-scm-server/cmf.principal

# chmod  600  /etc/cloudera-scm-server/cmf.principal

18.Configure the Kerberos Default Realm in CM

Here the default realm is TEST.HADOOP.COM, so set the Kerberos security realm in the Cloudera Manager Admin Console to TEST.HADOOP.COM.

19.Create the hdfs Super User Principal


kadmin:  addprinc  hdfs@TEST.HADOOP.COM

20.Distributing the keytab file for all the slaves

Once the keytab files are created, they have to be moved to the /etc/hadoop/conf folder. The keytab files have to be secured so that only the owner of the keytab can read them; for this, the ownership of the keytab files is changed to hdfs and mapred respectively, and the file permissions are changed to 400. The service principals for hdfs, mapred, and http have a fully qualified domain name associated with the username, so each service principal is host-specific and unique for each node in the cluster. Move the keytab files to the conf folder and secure them:


# mv hdfs.keytab mapred.keytab /etc/hadoop/conf/

# chown hdfs:hadoop /etc/hadoop/conf/hdfs.keytab

# chown mapred:hadoop /etc/hadoop/conf/mapred.keytab

# chmod 400 /etc/hadoop/conf/hdfs.keytab

# chmod 400 /etc/hadoop/conf/mapred.keytab
The keytab files have to be created specifically for each node in the cluster. Distributing and managing the keytab files in a large cluster is time consuming and error prone, so it is better to use deployment tools and automate the distribution; a minimal sketch follows.
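
A sketch of such an automated distribution, assuming the per-host keytabs have been generated into local directories named keytabs/<host>/ (that layout is just an assumption for this sketch) and that passwordless SSH as root is available:

# push each node's keytabs to /etc/hadoop/conf and lock down the permissions
for node in dn1 dn2 dn3 dn4 dn5; do
  scp keytabs/${node}/hdfs.keytab keytabs/${node}/mapred.keytab \
      root@${node}.example.com:/etc/hadoop/conf/
  ssh root@${node}.example.com '
    chown hdfs:hadoop   /etc/hadoop/conf/hdfs.keytab
    chown mapred:hadoop /etc/hadoop/conf/mapred.keytab
    chmod 400 /etc/hadoop/conf/hdfs.keytab /etc/hadoop/conf/mapred.keytab
  '
done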

21.Create Kerberos Principal for Each User Account

In the kadmin.local or kadmin shell, use the following command to create a principal for your account by replacing YOUR-LOCAL-REALM.COM with the name of your realm, and replacing USERNAME with a username:
Syntax:
kadmin:  addprinc USERNAME@YOUR-LOCAL-REALM.COM




kadmin:  addprinc user1@TEST.HADOOP.COM

When prompted, enter a password twice and keep it safe; it will be required later.

 22.Prepare the Cluster for Each User


Before you and other users can access the cluster, there are a few tasks you must do to prepare the hosts for each user.
Make sure all hosts in the cluster have a Linux user account with the same name as the first component of that user's principal name. For example, the Linux account user1 should exist on every box if the user's principal name is user1@TEST.HADOOP.COM. Alternatively, you can manage the accounts centrally with LDAP. A sketch for creating the local account on every node is shown after the HDFS commands below.
Note:

Each account must have a user ID that is greater than or equal to 1000. In the /etc/hadoop/conf/taskcontroller.cfg file, the default setting for the banned.users property is mapred, hdfs, and bin, to prevent jobs from being submitted via those user accounts. The default setting for the min.user.id property is 1000, to prevent jobs from being submitted with a user ID below 1000, which conventionally belongs to system accounts.
   
Create a subdirectory under /user on HDFS for each user account (for example, /user/user1). Change the owner and group of that directory to be the user.

# hadoop fs -mkdir /user/user1
# hadoop fs -chown user1:user1 /user/user1
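
A minimal sketch for creating the matching Linux account on every node, assuming passwordless SSH as root; the UID 1100 is just an example value, but it must be the same (and at least 1000) on all nodes. With LDAP-based accounts this step is not needed.

# create the local user account on every node in the cluster
for node in nn snn dn1 dn2 dn3 dn4 dn5; do
  ssh root@${node}.example.com 'useradd -u 1100 user1'
done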


23.Verify that Kerberos Security is working

After you have Kerberos credentials, you can verify that Kerberos security is working on your cluster by trying to run MapReduce jobs. To get the Kerberos credentials for your user account, log in as that user and run kinit from the command prompt:


$ kinit user1@TEST.HADOOP.COM
$

 Enter a password when prompted.
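
You can confirm that a ticket-granting ticket was issued by inspecting the credential cache; klist should show a krbtgt/TEST.HADOOP.COM@TEST.HADOOP.COM entry.

$ klist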

24. Let's do a simple Map-Reduce job as a secured user on the cluster. 

Submit a sample pi calculation as a test MapReduce job. Use the following command if you use a parcel-based setup for Cloudera Manager:

$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 10 10000
Number of Maps = 10
Samples per Map = 10000
...
Job Finished in 30.958 seconds
Estimated value of Pi is 3.14120000000000000000
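
As an additional sanity check, you can destroy the ticket and confirm that access is now refused; with Kerberos enabled, HDFS commands without a valid ticket typically fail with a GSSException complaining about missing credentials.

$ kdestroy
$ hadoop fs -ls /user/user1      # expected to fail without a Kerberos ticket
$ kinit user1@TEST.HADOOP.COM    # re-authenticate to continue working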
