Hadoop Admin Tips

Friday, 14 November 2014

Changing MySQL DB Data Directory Gracefully

Recently I came across an "insufficient space to process...." error in MySQL. It usually happens when the root (/) partition, which holds the default MySQL data directory, runs out of space, and the fix is to move the data directory to a partition with more free space. Simply changing the location in the configuration file, without moving the data and adjusting the related settings, may cause issues.

The default MySQL data directory is /var/lib/mysql. A few things need attention before changing it: the data directory path is referenced by other MySQL settings (for example, the socket file location), so plan carefully before moving it from the default location to a custom one. Below are the steps to follow to change the default location gracefully.

Step1:
Switch user to root. You need root privilege to do this operation.

[anu@c3 ~]$ su -
Password: <ENTER ROOT PASSWORD>
[root@c3 ~]#

Step2:
Stop the MySQL server. Make sure no service is accessing the MySQL server during the migration. Execute the following command in a terminal:

[root@c3 ~]# service mysqld stop
Stopping mysqld:                        [ OK ]
[root@c3 ~]#

Step3:
Create a new data directory for MySQL and give ownership of it to the mysql user.

[root@c3 ~]# mkdir /home/mysql
[root@c3 ~]# chown -R mysql:mysql /home/mysql
[root@c3 ~]#
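Before moving the data in Step4, it can help to confirm that the new location really has room for it. The sketch below is a minimal check, assuming GNU coreutils `du` and POSIX `df -P`; the function name check_space is hypothetical and not part of any MySQL tooling.

```shell
#!/bin/sh
# check_space SRC DEST: succeed (exit 0) if the filesystem holding DEST
# has more free space than SRC currently occupies.
# check_space is a hypothetical helper name used only in this sketch.
check_space() {
    src=$1
    dest=$2
    # Size of the source tree in 1K blocks (first field of du -sk output).
    need=$(du -sk "$src" | awk '{print $1}')
    # Available 1K blocks on DEST's filesystem; -P keeps one line per filesystem.
    avail=$(df -Pk "$dest" | awk 'NR==2 {print $4}')
    echo "needed: ${need}K, available on $dest: ${avail}K"
    [ "$need" -lt "$avail" ]
}

# Example (as root, with mysqld stopped):
# check_space /var/lib/mysql /home/mysql || echo "not enough space on /home"
```

The check compares only current sizes, so leave some headroom for the database to keep growing after the move.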

Step4:
Move the entire contents of the data directory to the new location. In my case there is enough space in /home, so I am using /home/mysql as the MySQL data directory. This step can take some time, because the whole database has to be moved. Afterwards, check with ls -A /var/lib/mysql that nothing (including hidden files, which the * pattern does not match) was left behind.

[root@c3 ~]# mv /var/lib/mysql/* /home/mysql
[root@c3 ~]#

Step5:
Remove the original, now empty, mysql directory from /var/lib.

[root@c3 ~]# rm -rf /var/lib/mysql
[root@c3 ~]#

Step6:
Edit the MySQL configuration file and point it to the new data location. In the [mysqld] section, change the default entries (shown commented out below) to the new location, as follows.

[root@c3 ~]# vim /etc/my.cnf

    [mysqld]
    #datadir=/var/lib/mysql
    #socket=/var/lib/mysql/mysql.sock
    #user=mysql
Change to
    datadir=/home/mysql
    socket=/home/mysql/mysql.sock
    user=mysql
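Before restarting, you can double-check what the [mysqld] section now says. MySQL ships `my_print_defaults mysqld` for this; the awk sketch below is an alternative that needs no MySQL binaries and assumes simple `key=value` lines. The helper name mysqld_option is hypothetical.

```shell
#!/bin/sh
# mysqld_option FILE KEY: print the last uncommented value of KEY in the
# [mysqld] section of FILE (later lines override earlier ones).
# mysqld_option is a hypothetical helper name used only in this sketch.
mysqld_option() {
    awk -v key="$2" '
        /^[ \t]*\[/ {                       # a section header starts here
            in_mysqld = ($0 ~ /^[ \t]*\[mysqld\]/)
            next
        }
        in_mysqld && $0 !~ /^[ \t]*[#;]/ && index($0, "=") > 0 {
            line = $0
            sub(/^[ \t]+/, "", line)        # drop leading indentation
            k = substr(line, 1, index(line, "=") - 1)
            sub(/[ \t]+$/, "", k)           # allow "key = value"
            if (k == key) {
                v = substr(line, index(line, "=") + 1)
                sub(/^[ \t]+/, "", v)
                sub(/[ \t]+$/, "", v)
                val = v
            }
        }
        END { if (val != "") print val }
    ' "$1"
}

# Example: mysqld_option /etc/my.cnf datadir
```

Commented-out lines such as #datadir=/var/lib/mysql are skipped, so the output should show only the new location.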

Step7:
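Create a symbolic link at the original location, so that anything still referring to /var/lib/mysql (for example, a client looking for the socket at /var/lib/mysql/mysql.sock) does not fail with a configuration mismatch.

[root@c3 ~]# ln -s /home/mysql/ /var/lib/mysql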
[root@c3 ~]# ls -la /var/lib/mysql
lrwxrwxrwx. 1 root root 12 Oct 21 20:09 /var/lib/mysql -> /home/mysql/
[root@c3 ~]#
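Before starting the service, a quick sanity check of the link and the ownership of the new directory can catch typos early. This is a sketch assuming GNU coreutils (`readlink -f`, `stat -c %U`); check_datadir_link is a hypothetical helper name.

```shell
#!/bin/sh
# check_datadir_link LINK TARGET OWNER: verify that LINK is a symlink that
# resolves to TARGET and that TARGET is owned by OWNER.
# check_datadir_link is a hypothetical helper name used only in this sketch.
check_datadir_link() {
    link=$1 target=$2 owner=$3
    if [ ! -L "$link" ]; then
        echo "$link is not a symbolic link"; return 1
    fi
    # readlink -f resolves the whole chain of links to an absolute path.
    resolved=$(readlink -f "$link")
    expected=$(readlink -f "$target")
    if [ "$resolved" != "$expected" ]; then
        echo "$link points to $resolved, expected $expected"; return 1
    fi
    # stat -c %U prints the user name of the owner.
    actual_owner=$(stat -c %U "$target")
    if [ "$actual_owner" != "$owner" ]; then
        echo "$target is owned by $actual_owner, expected $owner"; return 1
    fi
    echo "OK: $link -> $resolved (owner $actual_owner)"
}

# Example: check_datadir_link /var/lib/mysql /home/mysql mysql
```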

Step8:
Start the MySQL service.

[root@c3 ~]# service mysqld start
Starting mysqld:                           [ OK ]
[root@c3 ~]#

Step9:
Now try to access the MySQL shell.

[root@c3 ~]# mysql -u root -p
Enter password: <ENTER MySQL PASSWORD>
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.73 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
+--------------------+
2 rows in set (0.00 sec)

mysql>

Step10:
Consider changing MySQL's tmpdir as well: temporary files and on-disk temporary tables are written there (usually /tmp by default), so they can also fill up the root partition. Edit /etc/my.cnf again, pointing tmpdir at a directory that exists and is writable by the mysql user, and restart mysqld afterwards.

[root@c3 ~]# vim /etc/my.cnf
    [mysqld]
    datadir=/home/mysql
    socket=/home/mysql/mysql.sock
    user=mysql
    tmpdir = /new/tmp/location

Step11:
Cheers!

Saturday, 4 January 2014

Password Less SSH Authentication On All The Nodes Of Hadoop Cluster

In many cases the administrator has to log on to remote nodes in the network. In a small network it is easy to manage the nodes one by one, but a data center may consist of thousands of connected nodes, and working on each node individually becomes a difficult job. Here we can make use of SSH (Secure Shell), one of the most trusted network protocols for logging on to a remote node in the same network; OpenSSH is its widely used open-source implementation. SSH can also be used to transfer files between nodes securely with SCP (Secure Copy).

OpenSSH can be used in two ways: logging in with the remote machine's password, or logging in without a password using SSH keys. Let's see how to set up password-less login with SSH keys to connect to remote Linux servers without entering a password.

Setup SSH Password less Login

A Hadoop cluster consists of a large number of Linux machines, so configuring each machine by hand is difficult. It is better to set up password-less SSH login from the admin machine to all the Linux machines in the network, so that we can administer the cluster remotely and synchronize the cluster configuration files with SCP, etc.

Let's have a look at the network configuration.

192.168.1.101 n1.xyz.com n1
192.168.1.102 n2.xyz.com n2
192.168.1.103 n3.xyz.com n3
192.168.1.104 n4.xyz.com n4
192.168.1.105 n5.xyz.com n5

Here 192.168.1.101 is the admin machine. We need to setup the SSH Password Less Login from this machine to all other nodes.

Make sure the OpenSSH server (sshd) is installed on all the nodes the administrator will log in to, and that the OpenSSH client is installed on the admin machine from which the administrator logs in without a password.

# yum -y install openssh-server    (on the nodes)
# yum -y install openssh-clients   (on the admin machine)

Step 1: Create authentication keys with ssh-keygen on the admin machine (192.168.1.101)

First, log in to the admin server 192.168.1.101 as root and generate a public/private key pair using the following command.

[root@n1 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): (Press Enter)
Enter passphrase (empty for no passphrase): (Press Enter)
Enter same passphrase again: (Press Enter)
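The prompts above can also be answered on the command line, which is handy when scripting the setup. A sketch using standard ssh-keygen options (-q quiet, -t key type, -N passphrase, here empty as when pressing Enter, -f output file); the helper name make_key is hypothetical.

```shell
#!/bin/sh
# make_key KEYFILE: create an RSA key pair KEYFILE / KEYFILE.pub with an
# empty passphrase, unless a key already exists at KEYFILE.
# make_key is a hypothetical helper name used only in this sketch.
make_key() {
    keyfile=$1
    if [ -f "$keyfile" ]; then
        echo "key $keyfile already exists; not overwriting"
        return 0
    fi
    mkdir -p "$(dirname "$keyfile")"
    chmod 700 "$(dirname "$keyfile")"
    # -N "" gives an empty passphrase, so no prompts are shown.
    ssh-keygen -q -t rsa -N "" -f "$keyfile"
}

# Example on the admin machine (n1): make_key /root/.ssh/id_rsa
```

Refusing to overwrite an existing key is deliberate: replacing id_rsa would break logins on nodes that already trust the old public key.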

Step 2: Create .ssh Directory on all the remaining nodes

From server 192.168.1.101, use SSH to connect to server 192.168.1.102 as root and create the .ssh directory in root's home directory, using the following command.

[root@n1 ~]# ssh root@192.168.1.102 mkdir -p .ssh
The authenticity of host '192.168.1.102 (192.168.1.102)' can't be established.
RSA key fingerprint is d1:d4:0a:d8:af:87:e3:a4:72:1d:63:a2:e4:13:68:a1.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.102' (RSA) to the list of known hosts.
root@192.168.1.102's password: [Enter Your Password Here]
[root@n1 ~]#

Step 3: Upload Generated Public Keys to all the remaining nodes

From server 192.168.1.101, upload the newly generated public key (id_rsa.pub) to server 192.168.1.102 and append it to root's .ssh/authorized_keys file.

[root@n1 ~]# cat .ssh/id_rsa.pub | ssh root@192.168.1.102 'cat >> .ssh/authorized_keys'
root@192.168.1.102's password: [Enter Your Password Here]

Step 4: Set Permissions on all the remaining nodes

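Default permissions differ between systems, and sshd (with its default StrictModes check) ignores authorized_keys when the .ssh directory or the file is writable by other users, so set the permissions explicitly.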

[root@n1 ~]# ssh root@192.168.1.102 "chmod 700 .ssh; chmod 640 .ssh/authorized_keys"
root@192.168.1.102's password: [Enter Your Password Here]
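Steps 2 to 4 each ask for the password. The work they do on the remote side can be collapsed into one function; in this sketch it runs against a local directory standing in for root's home, so it can be tried safely. install_key is a hypothetical helper, and unlike Step 3 it skips the append when the key is already present.

```shell
#!/bin/sh
# install_key PUBKEY_FILE HOME_DIR: add the public key to
# HOME_DIR/.ssh/authorized_keys (only once) with the Step 4 permissions.
# install_key is a hypothetical helper name used only in this sketch.
install_key() {
    pub=$1
    sshdir=$2/.ssh
    mkdir -p "$sshdir"                  # Step 2
    chmod 700 "$sshdir"                 # Step 4
    touch "$sshdir/authorized_keys"
    # Step 3, skipped if this exact key line is already authorized.
    if ! grep -qxF "$(cat "$pub")" "$sshdir/authorized_keys"; then
        cat "$pub" >> "$sshdir/authorized_keys"
    fi
    chmod 640 "$sshdir/authorized_keys" # Step 4
}
```

In practice, the ssh-copy-id command that comes with openssh-clients does essentially the same over SSH with a single password prompt, for example ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.1.102.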

Step 5: Log in from 192.168.1.101 to the 192.168.1.* nodes without a password

From now on, you can log in to 192.168.1.102 as root from server 192.168.1.101 without a password.

[root@n1 ~]# ssh root@192.168.1.102

Step 6: Disable SSH StrictHostKeyChecking to skip the RSA key fingerprint confirmation

Uncomment the line # StrictHostKeyChecking ask and change the value from ask to no:

# vi /etc/ssh/ssh_config

StrictHostKeyChecking no
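The manual edit above can also be scripted. This sketch assumes GNU sed (for -i and -E) and an ssh_config-style file; set_strict_host_key_checking is a hypothetical helper that keeps a .bak copy and appends the option if no such line exists.

```shell
#!/bin/sh
# set_strict_host_key_checking FILE VALUE: uncomment or replace the
# StrictHostKeyChecking line in FILE, or append one if none exists.
# set_strict_host_key_checking is a hypothetical helper for this sketch.
set_strict_host_key_checking() {
    file=$1 value=$2
    cp "$file" "$file.bak"              # keep a backup of the original
    if grep -Eq '^[#[:space:]]*StrictHostKeyChecking' "$file"; then
        # Matches both "#   StrictHostKeyChecking ask" and active lines.
        sed -i -E "s/^[#[:space:]]*StrictHostKeyChecking.*/StrictHostKeyChecking $value/" "$file"
    else
        echo "StrictHostKeyChecking $value" >> "$file"
    fi
}

# Example on the admin machine: set_strict_host_key_checking /etc/ssh/ssh_config no
```

An appended line ends up in whatever Host block is last in the file, so check where it landed when the file has several Host sections.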

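Steps 2 to 5 have to be repeated for every remaining node (192.168.1.102 to 192.168.1.105). Step 6 only changes the SSH client configuration, so it is needed once, on the admin machine.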
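Repeating the steps by hand for each node is exactly the chore password-less SSH is meant to remove. A minimal loop, assuming the host table shown earlier is saved in an /etc/hosts-style file and that ssh-copy-id (from openssh-clients) is available; push_keys and the DRY_RUN switch are hypothetical names for this sketch.

```shell
#!/bin/sh
# push_keys HOSTS_FILE ADMIN_IP: copy root's public key to every host in an
# /etc/hosts-style file except the admin machine.
# Set DRY_RUN=1 to print the commands instead of running them.
# push_keys and DRY_RUN are hypothetical names used only in this sketch.
push_keys() {
    hosts_file=$1 admin_ip=$2
    # Field 1 is the IP address; skip blank lines, comments and the admin.
    for ip in $(awk -v admin="$admin_ip" \
        'NF >= 2 && $1 !~ /^#/ && $1 != admin {print $1}' "$hosts_file"); do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh-copy-id -i /root/.ssh/id_rsa.pub root@$ip"
        else
            # Prompts once for each node's root password.
            ssh-copy-id -i /root/.ssh/id_rsa.pub "root@$ip"
        fi
    done
}

# Example: DRY_RUN=1 push_keys /etc/hosts 192.168.1.101
```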

Wednesday, 25 December 2013

Issue in Bringing up the "System eth0" in Virtual Machines.

In many cases, the default Ethernet interface on a Linux machine, "System eth0" (or simply "eth0"), does not come up after the operating system is installed. Many different errors can appear, and troubleshooting can turn into a long loop. The most common case is virtualization. A simple way to create a virtual machine from an existing one is VM cloning, and the problem can occur with both thin- and thick-provisioned clones. Let us discuss this issue in detail.

First, let's create a fresh VM and check whether this is true.

Let's check whether the network configuration is controlled by "eth0". Looking at the network connections, it is clear that the connection goes through "System eth0".

Because this VM was installed from scratch, it uses "System eth0" for its network connectivity.

In the /etc/sysconfig/network-scripts/ directory, we can see that by default there is only one configuration file for this connection: ifcfg-eth0, corresponding to "System eth0".

We have a base template Linux VM named "base" (on any platform, such as VMware or Oracle VirtualBox). When we need to add a new node to the cluster in the DC (data center), we just clone it from the appropriate template VM. Now let's clone the VM and see what happens next.

On the clone, we repeat the same checks and look for differences.

In the cloned machine, instead of "System eth0" we have "Auto eth1" and "Auto eth2", and their corresponding configuration files appear in the network configuration directory as well.

So when you clone a VMware or Oracle VirtualBox VM, you'll notice that its network interfaces stop working and throw errors. Let's try to bring up eth0 and list the errors that come up.

First, check that the network service is enabled in the expected runlevels:

[root@devhost ~]# chkconfig --list | grep network
network 0:off 1:off 2:on 3:on 4:on 5:on 6:off
[root@devhost ~]#

Error: 1

[root@localhost ~]# ifup eth0
Error: No suitable device found: no device found for connection 'System eth0'.
[root@localhost ~]#

# ifup eth0
Device eth0 does not seem to be present, delaying initialization

When we try to restart the network service, we may get an error like this:

Error: 2

[root@localhost ~]# service network restart
Shutting down loopback interface: [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface eth0: Error: No suitable device found: no device found for connection 'System eth0'.
[FAILED]
[root@localhost ~]#

Next, let's query the interface with network diagnostic tools such as mii-tool and ethtool and see what errors they report.

Error: 3

[root@localhost ~]# mii-tool eth0
SIOCGMIIPHY on 'eth0' failed: No such device
[root@localhost ~]#

Error: 4

[root@localhost ~]# ethtool eth0
Settings for eth0:
Cannot get device settings: No such device
Cannot get wake-on-lan settings: No such device
Cannot get message level: No such device
Cannot get link status: No such device
No data available
[root@localhost ~]#

These are the errors that typically appear when bringing up the eth0 (System eth0) interface.

So how do we troubleshoot this issue?

Solution:

We started by cloning the VM from the "base" VM. Cloning copies the entire VM state, including the guest's network configuration with the base VM's MAC (hardware) address. Machines with the same MAC address cannot communicate properly on the same network, so the clone's NIC is given a new MAC address. The guest then sees an unknown NIC and creates a new default connection for it, named "Auto eth1" or "Auto eth2". To confirm this, clone one more VM from the base machine and compare the eth0 configuration file in /etc/sysconfig/network-scripts/: the MAC address of eth0 on every clone is the same as that of the "base" VM. Let's see how to overcome this situation.

What happens behind the scenes is this: when you clone a VM (VirtualBox or VMware), the hypervisor assigns new MAC addresses to its network interfaces but does not update the Linux configuration files to match. The system therefore cannot find the interface that matches its configuration (with the old MAC address), and it finds a new interface (with the new MAC address) for which it has no configuration. As a result, the network service can start only the loopback interface, and eth0 stays dead.
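To see this mismatch directly, you can list the interfaces and MAC addresses the system currently knows about. A sketch that reads the standard sysfs path /sys/class/net/&lt;interface&gt;/address; the base directory is a parameter only so the function can be tried on a test directory, and list_nics is a hypothetical name.

```shell
#!/bin/sh
# list_nics [SYSFS_NET_DIR]: print each network interface and its MAC
# address as the kernel currently sees them (default /sys/class/net).
# list_nics is a hypothetical helper name used only in this sketch.
list_nics() {
    base=${1:-/sys/class/net}
    for dev in "$base"/*; do
        # Skip entries that are not interfaces (no address file).
        [ -r "$dev/address" ] || continue
        printf '%s %s\n' "$(basename "$dev")" "$(cat "$dev/address")"
    done
}

# Example: list_nics
```

On an affected clone, you would expect to see lo plus an eth1 carrying a MAC address that differs from the HWADDR in ifcfg-eth0, and no eth0 at all.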

Fixing it is not rocket science; here is how.

Let's look at the ifcfg-eth0 configuration file:

[root@localhost network-scripts]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
BOOTPROTO="dhcp"
NM_CONTROLLED="yes"
ONBOOT="yes"
HWADDR=00:0C:29:62:5C:CF
MTU=1500
TYPE=Ethernet
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="System eth0"
UUID=5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03
[root@localhost network-scripts]#
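The HWADDR in this file is the base VM's address. A small sketch that compares it with a MAC address you pass in (for example, one read from /sys/class/net); the comparison ignores case and quotes, and hwaddr_matches is a hypothetical helper name.

```shell
#!/bin/sh
# hwaddr_matches IFCFG_FILE MAC: succeed if the HWADDR in IFCFG_FILE equals
# MAC, ignoring letter case and optional double quotes.
# hwaddr_matches is a hypothetical helper name used only in this sketch.
hwaddr_matches() {
    cfg=$(sed -n 's/^HWADDR=//p' "$1" | tr -d '"' | tr 'A-F' 'a-f')
    actual=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    echo "configured: ${cfg:-<none>}  actual: $actual"
    [ -n "$cfg" ] && [ "$cfg" = "$actual" ]
}

# Example on a clone that only has eth1:
# hwaddr_matches /etc/sysconfig/network-scripts/ifcfg-eth0 "$(cat /sys/class/net/eth1/address)"
```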

Step: 1

Remove the udev rules that bind network interface names to MAC addresses, so that they are regenerated. The rules are kept in the /etc/udev/rules.d/ directory, in a file named 70-persistent-net.rules.

# rm -f /etc/udev/rules.d/70-persistent-net.rules

Step: 2

Once the network interface rules are removed, the operating system has to detect the NIC again with its new MAC/HW address. A reboot is needed for this.

# reboot

Once the OS is up, log in as root and check the IP configuration:

[root@localhost Desktop]# ifconfig
eth0      Link encap:Ethernet HWaddr 00:0C:29:59:84:BE
          inet addr:192.168.40.181 Bcast:192.168.40.255 Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe59:84be/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:191 errors:0 dropped:0 overruns:0 frame:0
          TX packets:85 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:29908 (29.2 KiB) TX bytes:11864 (11.5 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:16 errors:0 dropped:0 overruns:0 frame:0
          TX packets:16 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:960 (960.0 b) TX bytes:960 (960.0 b)

[root@localhost Desktop]#

Step: 3

Update the interface configuration file: copy the new HWaddr shown by ifconfig and use it to replace the old HWADDR value in /etc/sysconfig/network-scripts/ifcfg-eth0.

# vim /etc/sysconfig/network-scripts/ifcfg-eth0

Step: 4

If the file also has a MACADDR entry, remove it or update it to the new MAC address of the interface (listed in /etc/udev/rules.d/70-persistent-net.rules, which is regenerated at boot, or in the ifconfig output).
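To read the MAC-to-name mapping from the regenerated rules file without scanning it by eye, a sed one-liner is enough. This sketch assumes the usual one-rule-per-line format containing ATTR{address}=="..." and NAME="..." fields; udev_net_names is a hypothetical helper name.

```shell
#!/bin/sh
# udev_net_names [RULES_FILE]: print "NAME MAC" for every rule line in a
# 70-persistent-net.rules style file; comment lines are ignored.
# udev_net_names is a hypothetical helper name used only in this sketch.
udev_net_names() {
    rules=${1:-/etc/udev/rules.d/70-persistent-net.rules}
    # In a basic regular expression, { and } are literal characters.
    sed -n 's/.*ATTR{address}=="\([^"]*\)".*NAME="\([^"]*\)".*/\2 \1/p' "$rules"
}

# Example: udev_net_names
```

Run before Step 1 on a clone, this would be expected to show the base VM's MAC still bound to eth0 and the new MAC bound to eth1; after the reboot, the new MAC should appear as eth0.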

Step: 5

Remove the UUID entry
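Steps 3 to 5 can be applied in one go when many clones need fixing. A sketch assuming GNU sed -i; fix_ifcfg is a hypothetical helper that keeps a .orig backup, sets HWADDR, and drops any MACADDR and UUID lines.

```shell
#!/bin/sh
# fix_ifcfg FILE NEW_MAC: set HWADDR to NEW_MAC and delete MACADDR and UUID
# lines (Steps 3 to 5). A FILE.orig backup is kept.
# fix_ifcfg is a hypothetical helper name used only in this sketch.
fix_ifcfg() {
    file=$1 mac=$2
    cp -p "$file" "$file.orig"
    sed -i -e "s/^HWADDR=.*/HWADDR=$mac/" \
           -e '/^MACADDR=/d' \
           -e '/^UUID=/d' "$file"
    # Add HWADDR if the file did not have one at all.
    grep -q '^HWADDR=' "$file" || echo "HWADDR=$mac" >> "$file"
}

# Example: fix_ifcfg /etc/sysconfig/network-scripts/ifcfg-eth0 00:0C:29:59:84:BE
```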

Step: 6

Save and exit the file. The network connection indicator at the top of the desktop now shows that "System eth0" has been updated and is active.

If we restart the network service now, we get the expected output:

[root@localhost Desktop]# service network restart
Shutting down loopback interface: [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface eth0: Active connection state: activating
Active connection path: /org/freedesktop/NetworkManager/ActiveConnection/3
state: activated
Connection activated
[ OK ]
[root@localhost Desktop]#

If required, we can remove the unwanted connections such as "Auto eth1" and "Auto eth2".

Re-run the commands that gave unexpected output earlier (ifup, mii-tool, ethtool) and compare the results.

With that, the issue is solved.

Relation between NM_CONTROLLED and MACADDR/HWADDR

Let's go a little deeper into the configuration. Is there any relation between the NM_CONTROLLED setting in the ifcfg-eth* file and MACADDR/HWADDR? I would say yes.

To identify the NIC, the configuration file uses either the HWADDR or the MACADDR parameter, whose value is the MAC (hardware) address of the machine's NIC. In my tests, HWADDR worked with NM_CONTROLLED="yes", while MACADDR needed NM_CONTROLLED="no"; the reverse combinations brought back the same errors seen throughout the troubleshooting. (Red Hat's documentation describes HWADDR as the hardware address of the device, and MACADDR as an address assigned to the interface that overrides the NIC's own; the two directives should not be used together.)
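When trying the scenarios below, a small helper that reports which combination a file currently has can save some re-reading. A hypothetical sketch (addr_mode is not a standard tool) that also flags a file setting both HWADDR and MACADDR, the combination the Red Hat documentation advises against.

```shell
#!/bin/sh
# addr_mode IFCFG_FILE: print which address directive the file uses and its
# NM_CONTROLLED value; fail if both HWADDR and MACADDR are present.
# addr_mode is a hypothetical helper name used only in this sketch.
addr_mode() {
    f=$1
    nm=$(sed -n 's/^NM_CONTROLLED=//p' "$f" | tr -d '"')
    has_hw=0; has_mac=0
    grep -q '^HWADDR=' "$f" && has_hw=1
    grep -q '^MACADDR=' "$f" && has_mac=1
    if [ "$has_hw" = 1 ] && [ "$has_mac" = 1 ]; then
        echo "both HWADDR and MACADDR set"; return 1
    fi
    if [ "$has_hw" = 1 ]; then dir=HWADDR
    elif [ "$has_mac" = 1 ]; then dir=MACADDR
    else dir=none
    fi
    echo "$dir NM_CONTROLLED=${nm:-unset}"
}

# Example: addr_mode /etc/sysconfig/network-scripts/ifcfg-eth0
```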

Scenario: 1
HWADDR=00:0C:29:59:84:BE
NM_CONTROLLED="yes"
The connection is successful.

Scenario: 2
HWADDR=00:0C:29:59:84:BE
NM_CONTROLLED="no"
The connection is unsuccessful.

Scenario: 3
MACADDR=00:0C:29:59:84:BE
NM_CONTROLLED="no"
The result varies: sometimes the connection comes up, sometimes it does not.

Scenario: 4
MACADDR=00:0C:29:59:84:BE
NM_CONTROLLED="yes"

Those who are experimenting with this issue can try further combinations to pin down the exact behavior.

If you find any errors, please let me know by email.
Email: anulsasidharan@gmail.com