Wednesday 25 December 2013

Issue in Bringing Up "System eth0" in Virtual Machines



On a Linux machine, the default Ethernet port "System eth0" (or simply "eth0") sometimes fails to come up after the operating system is installed. Several different errors can appear, and troubleshooting can turn into a long loop. The most common place to hit this is virtualization. There is a simple way to create a virtual machine from an existing one, called VM cloning, and the problem can occur with both thin-provisioned and thick-provisioned clones. Let us discuss this issue in detail.
First of all, let's create a fresh VM and check whether the problem shows up there.
Let's check whether the network connection is controlled by "eth0". If we look at the network connections, it is clear that the active connection is "System eth0".


Because this VM was installed from scratch, it uses "System eth0" for its network connectivity.
Now let's move to the "/etc/sysconfig/network-scripts/" directory. By default it contains only one Ethernet configuration file, ifcfg-eth0, which corresponds to System eth0.



We have a base template Linux VM named "base" (it can be on any platform, such as VMware or Oracle VirtualBox). If we need to add a new node to the cluster in the data center (DC), we just clone the appropriate template VM. Now let's clone the VM and see what happens next.

Here we will repeat the same steps and see what is different.



In this cloned machine, instead of System eth0, we have Auto eth1 and Auto eth2. We can also see the corresponding configuration files in the network configuration directory.


However, if you clone a VMware or Oracle VirtualBox VM, you'll notice that the clone's network interfaces fail with errors. Let's try to bring up eth0 and list the errors that can appear.

First, check the status of the network service on the machine.


[root@devhost ~]# chkconfig --list | grep network
network              0:off  1:off  2:on   3:on   4:on   5:on   6:off
[root@devhost ~]#

Error: 1
[root@localhost ~]# ifup eth0
Error: No suitable device found: no device found for connection 'System eth0'.
[root@localhost ~]#


OR

#ifup eth0
Device eth0 does not seem to be present, delaying initialization

When we try to restart the network service, we may get an error like this:

Error: 2


[root@localhost ~]# service network restart
Shutting down loopback interface:                          [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface eth0:  Error: No suitable device found: no device found for connection 'System eth0'.
                                                           [FAILED]
[root@localhost ~]#

Now let's query the interface with the link-level tools mii-tool and ethtool and see what errors they report.

Error: 3

[root@localhost ~]# mii-tool eth0
SIOCGMIIPHY on 'eth0' failed: No such device
[root@localhost ~]#

Error: 4
[root@localhost ~]# ethtool eth0
Settings for eth0:
Cannot get device settings: No such device
Cannot get wake-on-lan settings: No such device
Cannot get message level: No such device
Cannot get link status: No such device
No data available
[root@localhost ~]#


These are the errors that can come up while bringing up the eth0 (System eth0) interface.
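All of the errors above point to the same thing: the kernel does not know any device called eth0. A minimal sketch for confirming that directly is shown here. The function names and the optional directory argument are my own additions for illustration (the argument only exists so the functions can be tried against a test directory); by default they read /sys/class/net, where the kernel lists its network interfaces.

```shell
#!/bin/sh
# Report whether the kernel knows an interface with the given name.
# $1 = interface name, $2 = sysfs directory (hypothetical hook, defaults
# to the real /sys/class/net).
iface_exists() {
    iface="$1"
    sysdir="${2:-/sys/class/net}"
    if [ -e "$sysdir/$iface" ]; then
        echo "$iface: present"
    else
        echo "$iface: no such device"
        return 1
    fi
}

# List every interface the kernel currently knows about.
list_ifaces() {
    sysdir="${1:-/sys/class/net}"
    ls "$sysdir"
}
```

On a freshly cloned VM, "iface_exists eth0" would be expected to report no such device, while "list_ifaces" shows the name the new NIC actually received (for example eth1).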

So, how do we troubleshoot this issue?


Solution:



We started by cloning the VM from the "base" VM. Cloning copies the entire state of the VM, and that includes the MAC address (hardware address) of the base VM recorded in its configuration files. Two machines with the same MAC address cannot communicate properly on the same network, so the clone's network card gets a new address, and the system creates a new connection for it by default, named Auto eth1 or Auto eth2. To confirm this, clone one more VM from the base machine and compare the eth0 configuration file in /etc/sysconfig/network-scripts/ on each machine. The MAC address configured for eth0 is the same on all of them as on the "base" VM. Let's see how to fix this.

What is happening behind the scenes is this: when you clone your VM, VirtualBox or VMware applies a new MAC address to its network interfaces but does not update the Linux configuration files to reflect the change. As a result, the system cannot find or start the interface that matches its configuration (which still has the old MAC address), and it finds a new interface (with the new MAC address) for which it has no configuration. The networking service can then start only the loopback interface, and eth0 is dead.
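The mismatch described above can be checked by hand. The following is only a sketch, with function names I made up for illustration: it reads the HWADDR line from an ifcfg file and compares it with the address the kernel reports for a live interface (the "address" file under /sys/class/net/<interface>/).

```shell
#!/bin/sh
# Print the HWADDR value from an ifcfg file, without quotes or spaces,
# in lower case so it can be compared with the sysfs value.
configured_mac() {
    sed -n 's/^HWADDR=//p' "$1" | tr -d '" ' | tr '[:upper:]' '[:lower:]'
}

# Compare the configured address with the address of a live interface.
# $1 = ifcfg file, $2 = sysfs address file of the interface to compare.
check_mac() {
    cfg="$1"
    addr_file="$2"
    want=$(configured_mac "$cfg")
    have=$(tr '[:upper:]' '[:lower:]' < "$addr_file")
    if [ "$want" = "$have" ]; then
        echo "match: $have"
    else
        echo "mismatch: config has $want, NIC has $have"
        return 1
    fi
}
```

On a clone one might run it as "check_mac /etc/sysconfig/network-scripts/ifcfg-eth0 /sys/class/net/eth1/address", assuming the new NIC came up as eth1.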

It is not rocket science to fix. Let's see how.
First, let's look at the ifcfg-eth0 configuration file.

[root@localhost network-scripts]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
BOOTPROTO="dhcp"
NM_CONTROLLED="yes"
ONBOOT="yes"
HWADDR=00:0C:29:62:5C:CF
MTU=1500
TYPE=Ethernet
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="System eth0"
UUID=5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03
[root@localhost network-scripts]#

Step: 1

Remove the kernel's persistent network interface rules so that they are regenerated. The rules are kept in the /etc/udev/rules.d/ directory, in a file named 70-persistent-net.rules.


# rm -f /etc/udev/rules.d/70-persistent-net.rules
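Before running the rm command above, it can be useful to see which MAC address udev has tied to which interface name. The sketch below assumes the usual rule format in this file, where each rule carries an ATTR{address}=="..." match and a NAME="..." assignment on one line; the function name is hypothetical.

```shell
#!/bin/sh
# Print "name mac" for every rule in the persistent net rules file.
# $1 = rules file (defaults to the standard location).
# Assumes each rule has ATTR{address}=="<mac>" followed later on the
# same line by NAME="<name>"; comment lines do not match and are skipped.
list_udev_names() {
    rules="${1:-/etc/udev/rules.d/70-persistent-net.rules}"
    sed -n 's/.*ATTR{address}=="\([^"]*\)".*NAME="\([^"]*\)".*/\2 \1/p' "$rules"
}
```

On a clone, the output would typically show eth0 still bound to the base VM's address and eth1 bound to the new one, which is why the file has to be regenerated.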

Step: 2
Once the rules for the network interfaces are removed, the operating system needs to pick up the new MAC/hardware address. A reboot is necessary for this.
# reboot


Once the OS is up, log in as root and check the IP configuration.


[root@localhost Desktop]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:59:84:BE
          inet addr:192.168.40.181  Bcast:192.168.40.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe59:84be/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:191 errors:0 dropped:0 overruns:0 frame:0
          TX packets:85 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:29908 (29.2 KiB)  TX bytes:11864 (11.5 KiB)

lo        Link encap:Local Loopback 
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:16 errors:0 dropped:0 overruns:0 frame:0
          TX packets:16 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:960 (960.0 b)  TX bytes:960 (960.0 b)

[root@localhost Desktop]#



Step: 3
Update your interface configuration file. Copy the new hardware address (HWaddr in the ifconfig output) and use it to replace the old one in the System eth0 configuration file, /etc/sysconfig/network-scripts/ifcfg-eth0.

# vim /etc/sysconfig/network-scripts/ifcfg-eth0

Step: 4
Remove the HWADDR (or MACADDR) entry, or update it to the new MAC address of the interface (as listed in the regenerated /etc/udev/rules.d/70-persistent-net.rules file or in the ifconfig output).

Step: 5
Remove the UUID entry

Step: 6
Save and exit the file. In the top panel we can see that eth0 is active.
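For those who prefer not to edit the file by hand, Steps 3 to 5 can be sketched as a small function. This is an illustration under my own assumptions, not a tested procedure: the function name is made up, it only handles a HWADDR line (not MACADDR), and it keeps a copy of the original file with a .bak suffix next to it.

```shell
#!/bin/sh
# Replace the HWADDR value in an ifcfg file and drop its UUID line.
# $1 = ifcfg file, $2 = new MAC address.
# The original file is kept as <file>.bak before anything is changed.
update_ifcfg() {
    cfg="$1"
    new_mac="$2"
    cp "$cfg" "$cfg.bak" || return 1
    sed -e "s/^HWADDR=.*/HWADDR=$new_mac/" -e '/^UUID=/d' "$cfg.bak" > "$cfg"
}
```

A possible call, once eth0 exists again after the reboot, would be: update_ifcfg /etc/sysconfig/network-scripts/ifcfg-eth0 "$(cat /sys/class/net/eth0/address)"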




Now, if we look at the network connections in the top panel, it is clear that System eth0 is updated and active.

If we try to restart the network service, we get the expected output.


[root@localhost Desktop]# service network restart
Shutting down loopback interface:                          [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface eth0:  Active connection state: activating
Active connection path: /org/freedesktop/NetworkManager/ActiveConnection/3
state: activated
Connection activated
                                                           [  OK  ]
[root@localhost Desktop]#


If required, we can remove the unwanted connections such as Auto eth1 and Auto eth2.

Try running again the commands that gave unexpected output earlier, and compare the results.


Hence the issue is solved.

Relation of NM_CONTROLLED to MACADDR/HWADDR



Let's go a little further into the configuration. Is there any relation between the NM_CONTROLLED setting in the ifcfg-eth* file and MACADDR/HWADDR? I would say yes.
To address the NIC in the configuration file we use either the MACADDR or the HWADDR parameter, and the value is the MAC address (hardware address) of the NIC in that machine. The two are not equivalent: HWADDR ties the configuration to the NIC that has that hardware address, while MACADDR assigns the given address to the interface, overriding the NIC's own, and the two should not be used together. From what I have observed, if we use MACADDR then NM_CONTROLLED must be set to "no", and if we use HWADDR then NM_CONTROLLED must be set to "yes". The reverse combination makes the same error repeat throughout the troubleshooting.

Scenario: 1

HWADDR=00:0C:29:59:84:BE 
NM_CONTROLLED="yes"


Connection is successful.



Scenario: 2

HWADDR=00:0C:29:59:84:BE 
NM_CONTROLLED="no"

Here the connection is unsuccessful.


Scenario: 3



MACADDR=00:0C:29:59:84:BE 
NM_CONTROLLED="no"

Here the result varies: sometimes the connection comes up, sometimes it does not.

Scenario: 4



MACADDR=00:0C:29:59:84:BE 
NM_CONTROLLED="yes"


Those who are experimenting with this issue can try some more combinations to get the exact result.



If you find any errors, please get back to me by mail.
Email: anulsasidharan@gmail.com













