Wednesday, December 29, 2010

Magic SysRq Key

You may come across the situation that your Linux server is hung or freeze and your system is not responding even for Ctrl+Alt+Del in console itself and you must need to do a hard reboot by pressing reset button. As everyone know, the hard reboots is not good and can crash the File systems. so what to do now?

There is a way in Linux,

Hold down the Right Alt and SysRq keys and press this sequence:

R E I S U B

This will cleanly unmount the drives, terminate the processes and nicely reboot your machine.

of course, To get this worked, you need to “enable” this feature on the running kernel first !

On 2.6 kernel

echo 1 > /proc/sys/kernel/sysrq

This will do the trick.

In Some distributions, you may have a way to enable this feature at boot time.

On Fedora and RHEL, edit the file /etc/sysctl.conf, and change the line kernel.sysrq = 0 to kernel.sysrq = 1

Detailed document

The magic SysRq key is a key combination understood by the Linux kernel, which allows the user to perform various low level commands regardless of the system's state. It is often used to recover from freezes, or to reboot a computer without corrupting the file system.

To be able to use this functionality the CONFIG_MAGIC_SYSRQ option has to be enabled at kernel compile time.

Linux is very stable, but sometimes a kernel panic could happen, sometimes this stops the X-server and you cant change to the console. What should be done? Hitting the reset button and risk file system integrity?
NO! There is a possibility to shut down the system cleanly or find out the source of the kernel panic. For this purpose there is a kernel option called "Magic SysRQ Key" in the section kernel hacking.
If this option is enabled you can use a set of keyboard commands.
Alt+SysRq+r takes keyboard and mouse control from the X server. This can be useful if the X-Server crashed, you can change to a console and kill the X-Server or check the error log.
Alt+SysRQ+k kills all processes on the current terminal. Its a bad idea to do this on a console where X is running. The graphic will stop and you cant see what you type.
Alt+SysRQ+b is like a reset: a reboot without umounting or sync.
Alt+SysRQ+o shuts down via APM.
Alt+SysRQ+s writes all data from the disc cache to the harddiscs, its a sync.
Alt+SysRQ+u remounts all mounted filesystem readonly. After using this key, you can reboot the system with Alt+SysRQ+b without harming the system.
Alt+SysRQ+m prints memory information to the console.
Alt+SysRQ+e sends SIGTERM to all processes except init.
Alt+SysRQ+i sends SIGKILL to all processes except init.
Alt+SysRQ+l sends SIGKILL to all processes, inclusive init. (The system is not working after using this.)...
To shut down the system after a really bad kernel panic, do the following:
Alt+SysRQ+e (sends TERM-signal, processes can shutdown properly (e.g. save data))
Alt+SysRQ+u (a sync will be done when unmounting anyway)
Alt+SysRQ+i (for the processes that didn't listen for the TERM signal, this is a kill -9 process)
Alt+SysRQ+b (reboot).

To be able to use the SysRq feature, you need enable

echo "1" > /proc/sys/kernel/sysrq

or add an entry to /etc/sysctl.conf:

kernel.sysrq = 1

By the way, its a good idea to print this out and store near your PC, when it happens, you can't check back here...
Ref: http://en.wikipedia.org/wiki/Magic_SysRq_key

Friday, December 24, 2010

Linux Run Level

For each run level, scripts are run to start each individual service, instead of having a few large files to edit by hand. These scripts are located in /etc/rc.d/init.d, and most take as an option start or stop. This is to allow the specific programs to start (on bootup) or stop (on shutdown).

This setup involves a bunch of directories under /etc/rc.d/. These are:

rc0.d Contains scripts to run when the system shuts down. Technically, halt or shutdown bring the system to runlevel 0. This directory is mostly made up of kill commands.

rc1.d through rc3.d Scripts to run when the system changes runlevels. Runlevel 1 is usually single-user mode, runlevel 2 is for multi-user setup without NFS, and runlevel 3 is full multi-user and networking.

Runlevel 4 is typically unused.

rc5.d Scripts to start the system in X11 mode. This is the same as runlevel 3, with the exception that the xdm program starts, which provides a graphical login screen.

rc6.d Scripts to run when the system reboots. These scripts are called by a reboot command.

init.d Actually contains all of the scripts. The files in the rc?.d directories are really links to the scripts in the init.d directory.

The Boot Sequence

Now that we know where files are located, let's look at what happens in a normal Red Hat boot sequence.

Once the system boots, /etc/rc.d/rc.sysinit is run first. The starting runlevel (specified in /etc/inittab) is found, and the /etc/rc.d/rc script is run, with the sole option being the runlevel we want to go to. For most startups, this is runlevel 3.

The rc program looks in the /etc/rc.d/rc3.d directory, executing any K* scripts (of which there are none in the rc3.d directory) with an option of stop. Then, all the S* scripts are started with an option of start. Scripts are started in numerical order—thus, the S10network script is started before the S85httpd script. This allows you to choose exactly when your script starts without having to edit files. The same is true of the K* scripts.

Let's look at what happens when we switch runlevels—say from runlevel 3 (full networking and multi-user mode) to runlevel 1 (single-user mode).

First, all the K* scripts in the level to which the system is changing are executed. My Caldera Preview II (Red Hat 2.0) setup has seven K scripts and one S script in the /etc/rc.d/rc.1/ directory. The K scripts shut down nfs, sendmail, lpd, inet, cron, and syslog. The S script then kills off any remaining programs and executes init -t1 S, which tells the system to really go into single-user mode.

Once in single-user mode, you can switch back to full multi-user mode by typing init 3.

Side-stepping init

There are two additional points I can make here.

Customizing the splash image in GRUB

Customizing the splash image in GRUB

The splash image is the image shown in the background when GRUB (the GRand Unified Bootloader) is displaying the list of operating systems you can boot. Typically, this is the corporate logo of your Linux distribution. But its very simple to customize it to an image of your choice. All you need is the GIMP and gzip. My GIMP version is 1.2. Even older versions may do the job.

Here's how:(You need to have root access)
1)Start the GIMP.
2)Click on File->New or type Ctrl+N
3)In the new image dialog, change Width to 640 pixels and Height to 480 pixels. (The image should be of size 640x480 pixels.) Now click OK.
4)Create the image which you would like to be the splash image. It's quite fun to experiment with the various tools of the GIMP!
5)After you have finished creating the image, hit Alt+i or right click on the image and click on Image->Mode->Indexed...
6)In the Indexed Color Conversion dialog that appears, click on the radio button "Generate optimal Palette" and in "# of colors" enter 14. Click OK.(The image should be of only 14 colors)
7)Now right-click on the image and click on File->Save As...Save the file as splash.xpm in a directory of your choice.
8)Now open a terminal window and navigate to the directory where you have saved splash.xpm
9)Now key in gzip splash.xpm
10)You will find that a file named splash.xpm.gz is created in the directory where splash.xpm used to exist.
11)Copy this splash.xpm.gz to the /boot/grub directory. You may want to back up the pre-existing splash.xpm.gz file in the /boot/grub directory first.

That's it! When you reboot, you will find your image in the background, with the menu of operating systems etc. in the foreground.

Thursday, December 23, 2010

Linux File System Hierarchy Standard

The `/boot/` Directory

The /boot/ directory contains static files required to boot the system, such as the Linux kernel. These files are essential for the system to boot properly

The `/dev/` Directory

The /dev/ directory contains file system entries which represent devices that are attached to the system. These files are essential for the system to function properly.

The `/etc/` Directory

The /etc/ directory is reserved for configuration files that are local to the machine. No binaries are to be placed in /etc/. Any binaries that were once located in /etc/ should be placed into /sbin/ or /bin/.

The X11/ and skel/ directories are subdirectories of the /etc/ directory:

/etc
|- X11/
|- skel/

The /etc/X11/ directory is for X Window System configuration files, such as xorg.conf. The /etc/skel/ directory is for "skeleton" user files, which are used to populate a home directory when a user is first created.

The `/lib/` Directory

The /lib/ directory should contain only those libraries needed to execute the binaries in /bin/ and /sbin/. These shared library images are particularly important for booting the system and executing commands within the root file system.

The `/media/` Directory

The /media/ directory contains subdirectories used as mount points for removeable media, such as 3.5 diskettes, CD-ROMs, and Zip disks.

The `/mnt/` Directory

The /mnt/ directory is reserved for temporarily mounted file systems, such as NFS file system mounts. For all removeable media, use the /media/ directory

The `/opt/` Directory

The /opt/ directory provides storage for large, static application software packages.

A package placing files in the /opt/ directory creates a directory bearing the same name as the package. This directory, in turn, holds files that otherwise would be scattered throughout the file system, giving the system administrator an easy way to determine the role of each file within a particular package.

For example, if sample is the name of a particular software package located within the /opt/ directory, then all of its files are placed in directories inside the /opt/sample/ directory, such as /opt/sample/bin/ for binaries and /opt/sample/man/ for manual pages.

Large packages that encompass many different sub-packages, each of which accomplish a particular task, are also located in the /opt/ directory, giving that large package a way to organize itself. In this way, our sample package may have different tools that each go in their own sub-directories, such as /opt/sample/tool1/ and /opt/sample/tool2/, each of which can have their own bin/, man/, and other similar directories.

The `/proc/` Directory

The /proc/ directory contains special files that either extract information from or send information to the kernel.

Due to the great variety of data available within /proc/ and the many ways this directory can be used to communicate with the kernel.

The `/sbin/` Directory

The /sbin/ directory stores executables used by the root user. The executables in /sbin/ are only used at boot time and perform system recovery operations. Of this directory, the FHS says:

/sbin contains binaries essential for booting, restoring, recovering, and/or repairing the system in addition to the binaries in /bin. Programs executed after /usr/ is known to be mounted (when there are no problems) are generally placed into /usr/sbin. Locally-installed system administration programs should be placed into /usr/local/sbin.

At a minimum, the following programs should be in /sbin/:

arp, clock,halt,
init, fsck.*, grub,
ifconfig, mingetty, mkfs.*,
mkswap, reboot, route,
shutdown, swapoff, swapon

The `/srv/` Directory

The /srv/ directory contains site-specific data served by your system running Red Hat Enterprise Linux. This directory gives users the location of data files for a particular service, such as FTP, WWW, or CVS. Data that only pertains to a specific user should go in the /home/ directory.

The `/sys/` Directory

The /sys/ directory utilizes the new sysfs virtual file system specific to the 2.6 kernel. With the increased support for hot plug hardware devices in the 2.6 kernel, the /sys/ directory contains information similarly held in /proc/, but displays a hierarchical view of specific device information in regards to hot plug devices.

To see how certain USB and FireWire devices are actually mounted, refer to the /sbin/hotplug and /sbin/udev man pages

The `/usr/` Directory

The /usr/ directory is for files that can be shared across multiple machines. The /usr/ directory is often on its own partition and is mounted read-only.

The `/usr/local/` Directory

The FHS says:

The /usr/local hierarchy is for use by the system administrator when installing software locally. It needs to be safe from being overwritten when the system software is updated. It may be used for programs and data that are shareable among a group of hosts, but not found in /usr.

The /usr/local/ directory is similar in structure to the /usr/ directory. It has the following subdirectories, which are similar in purpose to those in the /usr/ directory:

/usr/local
      |- bin/
      |- etc/
      |- games/
      |- include/
      |- lib/
      |- libexec/
      |- sbin/
      |- share/
      |- src/

In Red Hat Enterprise Linux, the intended use for the /usr/local/ directory is slightly different from that specified by the FHS. The FHS says that /usr/local/ should be where software that is to remain safe from system software upgrades is stored. Since software upgrades can be performed safely with RPM Package Manager (RPM), it is not necessary to protect files by putting them in /usr/local/. Instead, the /usr/local/ directory is used for software that is local to the machine.

For instance, if the /usr/ directory is mounted as a read-only NFS share from a remote host, it is still possible to install a package or program under the /usr/local/ directory.

The `/var/` Directory

Since the FHS requires Linux to mount /usr/ as read-only, any programs that write log files or need spool/ or lock/ directories should write them to the /var/ directory. The FHS states /var/ is for:

...variable data files. This includes spool directories and files, administrative and logging data, and transient and temporary files.
System log files, such as messages/ and lastlog/, go in the /var/log/ directory. The /var/lib/rpm/ directory contains RPM system databases. Lock files go in the /var/lock/ directory, usually in directories for the program using the file. The /var/spool/ directory has subdirectories for programs in which data files are stored.

Sunday, December 19, 2010

Linux block size

Filesystem block size

The block size specifies size that the filesystem will use to read and write data. Larger block sizes will help improve disk I/O performance when using large files, such as databases. This happens because the disk can read or write data for a longer period of time before having to search for the next block.

On the downside, if you are going to have a lot of smaller files on that filesystem, like the /etc , there the potential for a lot of wasted disk space.

For example, if you set your block size to 4096, or 4K, and you create a file that is 256 bytes in size, it will still consume 4K of space on your harddrive. For one file that may seem trivial, but when your filesystem contains hundreds or thousands of files, this can add up.

Determine the block size on hard disk filesystem for disk quota

by LinuxTitli on May 8, 2006 · 12 comments

When configuring user disk quotas I need to find out the block size on my SCSI hard disk drive. For example if I am using a block size of 1024 then setting block size to 102400 blocks limit my user to 100MB of disk space.

Therefore, it is necessary to determine the correct block size; otherwise, I will end up assigning wrong disk quota limit.

You can use dumpe2fs command, which prints the super block and blocks group information for the filesystem present on device. You need to type dumpe2fs command as the root user:

# tune2fs -l /dev/sdb3 | grep -i 'Block size'

Or

# dumpe2fs /dev/sdb3 | grep -i 'Block size'

Output:

Block size: 4096

Now setting a user quota of 40960 would limit a user to 10MB of disk space.

Please note that dumpe2fs command used to determine the actual size of a block on the filesystem (and not BLOCK SIZE OF FILESYSTEM not harddisk).

SAR Command

The sar command collect, report, or save UNIX / Linux system activity information. It will save selected counters in the operating system to the /var/log/sa/ directory . From the collected data, you get lots of information about your server:

1. CPU utilization
2. Memory paging and its utilization
3. Network I/O, and transfer statistics
4. Process creation activity
5. All block devices activity
6. Interrupts/sec etc.

sar output can be used for identifying server bottlenecks. However, analyzing information provided by sar can be difficult, so use kSar, which can take sar output and plot a nice easy to understand graph over period of time

User Level: Intermediate
Sadc (system activity data collector) is the program that gathers performance data. It pulls its data out of the virtual /proc filesystem, then it saves the data in a file (one per day) named /var/log/sa/saDD where DD is the day of the month.

Two shell scripts from the sysstat package control how the data collector is run. The first script, sa1, controls how often data is collected, while sa2 creates summary reports (one per day) in /var/log/sa/sarDD. Both scripts are run from cron. In the default configuration, data is collected every 10 minutes and summarized just before midnight.

If you suspect a performance problem with a particular program, you can use sadc to collect data on a particular process (with the -x argument), or its children (-X), but you will need to set up a custom script using those flags.

As Dr. Heisenberg showed, the act of measuring something changes it. Any tool that collects performance data has some overall negative impact on system performance, but with sar, the impact seems to be minimal. I ran a test with the sa1 cron job set to gather data every minute (on a server that was not busy) and it didn't cause any serious issues. That may not hold true on a busy system.

Creating reports

If the daily summary reports created by the sa2 script are not enough, you can create your own custom reports using sar. The sar program reads data from the current daily data file unless you specify otherwise. To have sar read a particular data file, use the -f /var/log/sa/saDD option. You can select multiple files by using multiple -f options. Since many of sar's reports are lengthy, you may want to pipe the output to a file.

To create a basic report showing CPU usage and I/O wait time percentage, use sar with no flags. It produces a report similar to this:

01:10:00 PM CPU %user %nice %system %iowait %idle
01:20:00 PM all 7.78 0.00 3.34 20.94 67.94
01:30:00 PM all 0.75 0.00 0.46 1.71 97.08
01:40:00 PM all 0.65 0.00 0.48 1.63 97.23
01:50:00 PM all 0.96 0.00 0.74 2.10 96.19
02:00:00 PM all 0.58 0.00 0.54 1.87 97.01
02:10:00 PM all 0.80 0.00 0.60 1.27 97.33
02:20:01 PM all 0.52 0.00 0.37 1.17 97.94
02:30:00 PM all 0.49 0.00 0.27 1.18 98.06
Average: all 1.85 0.00 0.44 2.56 95.14

If the %idle is near zero, your CPU is overloaded. If the %iowait is large, your disks are overloaded.

To check the kernel's paging performance, use sar -B, which will produce a report similar to this:

11:00:00 AM pgpgin/s pgpgout/s fault/s majflt/s
11:10:00 AM 8.90 34.08 0.00 0.00
11:20:00 AM 2.65 26.63 0.00 0.00
11:30:00 AM 1.91 34.92 0.00 0.00
11:40:01 AM 0.26 36.78 0.00 0.00
11:50:00 AM 0.53 32.94 0.00 0.00
12:00:00 PM 0.17 30.70 0.00 0.00
12:10:00 PM 1.22 27.89 0.00 0.00
12:20:00 PM 4.11 133.48 0.00 0.00
12:30:00 PM 0.41 31.31 0.00 0.00
Average: 130.91 27.04 0.00 0.00

Raw paging numbers may not be of concern, but a high number of major faults (majflt/s) indicate that the system needs more memory. Note that majflt/s is only valid with kernel versions 2.5 and later.

For network statistics, use sar -n DEV. The -n DEV option tells sar to generate a report that shows the number of packets and bytes sent and received for each interface. Here is an abbreviated version of the report:

# sar -r -n DEV -f /var/log/sa/sa19
Linux 2.6.9-42.ELsmp (util102) 12/19/2010

12:00:01 AM IFACE rxpck/s txpck/s rxbyt/s txbyt/s rxcmp/s txcmp/s rxmcst/s
12:10:01 AM lo 59.83 59.83 5067.06 5067.06 0.00 0.00 0.00
12:10:01 AM eth0 1210.52 1062.13 712124.65 498563.98 0.00 0.00 0.00
12:10:01 AM eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM sit0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:20:01 AM lo 52.41 52.41 4546.30 4546.30 0.00 0.00 0.00
12:20:01 AM eth0 1084.95 876.36 677283.00 280620.08 0.00 0.00 0.00
12:20:01 AM eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:20:01 AM sit0 0.00 0.00 0.00 0.00 0.00 0.00 0.00

To see network errors, try sar -n EDEV, which shows network failures, you can see following output there is no network errors.

# sar -r -n EDEV -f /var/log/sa/sa19

Linux 2.6.9-42.ELsmp (util102) 12/19/2010

12:00:01 AM IFACE rxerr/s txerr/s coll/s rxdrop/s txdrop/s txcarr/s rxfram/s rxfifo/s txfifo/s
12:10:01 AM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM sit0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:20:01 AM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:20:01 AM eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Reports on current activity

Sar can also be used to view what is happening with a specific subsystem, such as networking or I/O, almost in real time. By passing a time interval (in seconds) and a count for the number of reports to produce, you can take an immediate snapshot of a system to find a potential bottleneck.

For example, to see the basic report every second for the next 10 seconds, use sar 1 10. You can run any of the reports this way to see near real-time results.

Benchmarking

Even if you have plenty of horsepower to run your applications, you can use sar to track changes in the workload over time. To do this, save the summary reports (sar only saves seven) to a different directory over a period of a few weeks or a month. This set of reports can serve as a baseline for the normal system workload. Then compare new reports against the baseline to see how the workload is changing over time. You can automate your comparison reports with AWK or your favorite programming language.

In large systems management, benchmarking is important to predict when and how hardware should be upgraded. It also provides ammunition to justify your hardware upgrade requests.

Digging deeper

Most hardware performance problems are related to the disks, memory, or CPU. Perhaps more frequently, application programming errors or poorly designed databases cause serious performance issues.

Whatever the problems, sar and friends can give you a comprehensive view of how things are working and help track down bottlenecks to fix a sluggish system. The examples here just scratch the surface of what sar can do. If you take a look at the man pages, it should be easy to customize a set of reports for your needs.

Saturday, December 18, 2010

The /etc/skel Directory

The /etc/skel directory contains files and directories that are automatically copied over to a new user's home directory when such user is created by the useradd program.

A home directory, also called a login directory, is the directory on Unix-like operating systems that serves as the repository for a user's personal files, directories and programs, including personal configuration files. It is also the directory that a user is first in after logging into the system. The /etc directory and its subdirectories contain the many important configuration files for the system.

The useradd program is located in the /usr/sbin/ directory, and on most systems it is accessible only by the root (i.e., administrative) user. On some systems this program might be called adduser.

/etc/skel allows a system administrator to create a default home directory for all new users on a computer or network and thus to make certain that all users begin with the same settings or environment.

Several user configuration files are placed in /etc/skel by default when the operating system is installed. Typically they might include .bash_profile, .bashrc, .bash_logout, dircolors, .inputrc and .vimrc. The dots preceding the names of these files indicate that they are hidden files, i.e., files that are not normally visible in order to avoid visual clutter and help reduce the chances of accidental damage.

The contents of /etc/skel can be viewed by using the ls (i.e., list) command with its -a option (which shows all files and directories, including hidden ones), i.e.,

ls -a /etc/skel

The location of /etc/skel can be changed by editing the line that begins with SKEL= in the configuration file /etc/default/useradd. By default this line says SKEL=/etc/skel.

It is usually better to keep /etc/skel as small as possible and put system-wide configuration items into global configuration files such as /etc/profile. This is because the latter makes it much easier to update existing users' files because its settings take effect as soon as the system is turned on and apply to new users and old uses alike.

When a user is removed from the system by an administrator with the userdel command, that user's home directory, including the files and directories that have been copied into it from /etc/skel, remains intact.

The name of the directory skel is derived from the word skeleton, because the files it contains form the basic structure for users' home directories.

Linux KiLL command

Standard Signals

Linux supports the standard signals listed below. Several signal numbers are architecture dependent, as indicated in the "Value" column. (Where three values are given, the first one is usually valid for alpha and sparc, the middle one for i386, ppc and sh, and the last one for mips. A - denotes that a signal is absent on the corresponding architecture.)

First the signals described in the original POSIX.1-1990 standard.

Signal	Value	Action	Comment
SIGHUP
			or death of controlling process
SIGINT	2	Term	Interrupt from keyboard
SIGQUIT	3	Core	Quit from keyboard
SIGILL	4	Core	Illegal Instruction
SIGABRT	6	Core	Abort signal from abort(3)
SIGFPE	8	Core	Floating point exception
SIGKILL	9	Term	Kill signal
SIGSEGV	11	Core	Invalid memory reference
SIGPIPE	13	Term	Broken pipe: write to pipe with no readers
SIGALRM	14	Term	Timer signal from alarm(2)
SIGTERM	15	Term	Termination signal
SIGUSR1	30,10,16	Term	User-defined signal 1
SIGUSR2	31,12,17	Term	User-defined signal 2
SIGCHLD	20,17,18	Ign	Child stopped or terminated
SIGCONT	19,18,25	Cont	Continue if stopped
SIGSTOP	17,19,23	Stop	Stop process
SIGTSTP	18,20,24	Stop	Stop typed at tty
SIGTTIN	21,21,26	Stop	tty input for background process
SIGTTOU	22,22,27	Stop	tty output for background process

Thus, if it is desired to terminate a process with a PID of 485, the following will usually be sufficient:

kill 485

The kill command has a misleading name because it does not actually kill processes. Rather, it sends signals to them. Each process is supplied with a set of standard signal handlers by the operating system in order to deal with incoming signals. When no signal is explicitly included in the command, signal 15, named SIGTERM, is sent by default. If this fails, the stronger signal 9, called SIGKILL, should be used. For example, the following command would nearly guarantee that process 485 would be killed:

kill -9 485

The only situation in which signal 9 will fail is if the process is in the midst of making a system call, which is a request to the kernel (i.e., the core of the operating system) for some action such as process creation. In this situation, the process will die once it has returned from the system call.

There are more than 60 signals that can be used with kill, but, fortunately, most users will only need to be aware of signal 9. The full list is contained in the file /usr/include/linux/signal.h and can be viewed by using kill with its -l option, i.e.,

kill -l
SIGTERM vs. SIGKILL
Sending signals to processes using kill on a Unix system is not a new topic for most systems administrators, but I've been asked many times about the difference between kill and kill -9. Anytime you use kill on a process, you're actually sending the process a signal (in almost all situations - I'll get into that soon). Standard C applications have a header file that contains the steps that the process should follow if it receives a particular signal. You can get an entire list of the available signals on your system by checking the man page for kill. Consider a command like this: kill 2563 This would send a signal called SIGTERM to the process. Once the process receives the notice, a few different things can happen: * the process may stop immediately * the process may stop after a short delay after cleaning up resources * the process may keep running indefinitely The application can determine what it wants to do once a SIGTERM is received. While most applications will clean up their resources and stop, some may not. An application may be configured to do something completely different when a SIGTERM is received. Also, if the application is in a bad state, such as waiting for disk I/O, it may not be able to act on the signal that was sent. Most system administrators will usually resort to the more abrupt signal when an application doesn't respond to a SIGTERM: kill -9 2563 The -9 tells the kill command that you want to send signal #9, which is called SIGKILL. With a name like that, it's obvious that this signal carries a little more weight. Although SIGKILL is defined in the same signal header file as SIGTERM, it cannot be ignored by the process. In fact, the process isn't even made aware of the SIGKILL signal since the signal goes straight to the kernel init. At that point, init will stop the process. The process never gets the opportunity to catch the signal and act on it. However, the kernel may not be able to successfully kill the process in some situations. If the process is waiting for network or disk I/O, the kernel won't be able to stop it. Zombie processes and processes caught in an uninterruptible sleep cannot be stopped by the kernel, either. A reboot is required to clear those processes from the system.

Signal Description Signal number on Linux x86^[1]

SIGABRT Process aborted
6

SIGALRM Signal raised by alarm 14

SIGBUS Bus error: "access to undefined portion of memory object" 7

SIGCHLD Child process terminated, stopped (or continued*) 17

SIGCONT Continue if stopped 18

SIGFPE Floating point exception: "erroneous arithmetic operation" 8

SIGHUP Hangup 1

SIGILL Illegal instruction 4

SIGINT Interrupt 2

SIGKILL Kill (terminate immediately) 9

SIGPIPE Write to pipe with no one reading 13

SIGQUIT Quit and dump core 3

SIGSEGV Segmentation violation 11

SIGSTOP Stop executing temporarily 19

SIGTERM Termination (request to terminate) 15

SIGTSTP Terminal stop signal 20

SIGTTIN Background process attempting to read from tty ("in") 21

SIGTTOU Background process attempting to write to tty ("out") 22

SIGUSR1 User-defined 1 10

SIGUSR2 User-defined 2 12

SIGPOLL Pollable event 29

SIGPROF Profiling timer expired 27

SIGSYS Bad syscall 31

SIGTRAP Trace/breakpoint trap 5

SIGURG Urgent data available on socket 23

SIGVTALRM Signal raised by timer counting virtual time: "virtual timer expired" 26

SIGXCPU CPU time limit exceeded 24

SIGXFSZ File size limit exceeded 25

Signal	Description	Signal number on Linux x86^[1]
SIGABRT	Process aborted	6
SIGALRM	Signal raised by `alarm`	14
SIGBUS	Bus error: "access to undefined portion of memory object"	7
SIGCHLD	Child process terminated, stopped (or continued*)	17
SIGCONT	Continue if stopped	18
SIGFPE	Floating point exception: "erroneous arithmetic operation"	8
SIGHUP	Hangup	1
SIGILL	Illegal instruction	4
SIGINT	Interrupt	2
SIGKILL	Kill (terminate immediately)	9
SIGPIPE	Write to pipe with no one reading	13
SIGQUIT	Quit and dump core	3
SIGSEGV	Segmentation violation	11
SIGSTOP	Stop executing temporarily	19
SIGTERM	Termination (request to terminate)	15
SIGTSTP	Terminal stop signal	20
SIGTTIN	Background process attempting to read from tty ("in")	21
SIGTTOU	Background process attempting to write to tty ("out")	22
SIGUSR1	User-defined 1	10
SIGUSR2	User-defined 2	12
SIGPOLL	Pollable event	29
SIGPROF	Profiling timer expired	27
SIGSYS	Bad syscall	31
SIGTRAP	Trace/breakpoint trap	5
SIGURG	Urgent data available on socket	23
SIGVTALRM	Signal raised by timer counting virtual time: "virtual timer expired"	26
SIGXCPU	CPU time limit exceeded	24
SIGXFSZ	File size limit exceeded	25

Linux initial RAM disk (initrd) overview

What's an initial RAM disk?

The initial RAM disk (initrd) is an initial root file system that is mounted prior to when the real root file system is available. The initrd is bound to the kernel and loaded as part of the kernel boot procedure. The kernel then mounts this initrd as part of the two-stage boot process to load the modules to make the real file systems available and get at the real root file system.

The initrd contains a minimal set of directories and executables to achieve this, such as the insmod tool to install kernel modules into the kernel.

In the case of desktop or server Linux systems, the initrd is a transient file system. Its lifetime is short, only serving as a bridge to the real root file system. In embedded systems with no mutable storage, the initrd is the permanent root file system. This article explores both of these contexts.

Anatomy of the initrd

The initrd image contains the necessary executables and system files to support the second-stage boot of a Linux system.

Depending on which version of Linux you're running, the method for creating the initial RAM disk can vary. Prior to Fedora Core 3, the initrd is constructed using the loop device. The loop device is a device driver that allows you to mount a file as a block device and then interpret the file system it represents. The loop device may not be present in your kernel,but you can enable it through the kernel's configuration tool (make menuconfig) by selecting Device Drivers > Block Devices > Loopback Device Support. You can inspect the loop device as follows (your initrd file name will vary):

Listing 1. Inspecting the initrd (prior to FC3)

# mkdir temp ; cd temp

# cp /boot/initrd.img.gz .

# gunzip initrd.img.gz

# mount -t ext -o loop initrd.img /mnt/initrd

# ls -la /mnt/initrd

You can now inspect the /mnt/initrd subdirectory for the contents of the initrd. Note that even if your initrd image file

does not end with the .gz suffix, it's a compressed file, and you can add the .gz suffix to gunzip it.

Beginning with Fedora Core 3, the default initrd image is a compressed cpio archive file. Instead of mounting the file as

a compressed image using the loop device, you can use a cpio archive. To inspect the contents of a cpio archive, use the

following commands:

Listing 2. Inspecting the initrd (FC3 and later)

# mkdir temp ; cd temp

# cp /boot/initrd-2.6.14.2.img initrd-2.6.14.2.img.gz

# gunzip initrd-2.6.14.2.img.gz

# cpio -i --make-directories <>

The result is a small root file system, as shown in Listing 3. The small, but necessary, set of applications are present in the ./bin directory, including nash (not a shell, a script interpreter), insmod for loading kernel modules, and lvm (logical volume manager tools).

Listing 3. Default Linux initrd directory structure

# ls -la

drwxr-xr-x 10 root root 4096 May 7 02:48 .

drwxr-x--- 15 root root 4096 May 7 00:54 ..

drwxr-xr-x 2 root root 4096 May 7 02:48 bin

drwxr-xr-x 2 root root 4096 May 7 02:48 dev

drwxr-xr-x 4 root root 4096 May 7 02:48 etc

-rwxr-xr-x 1 root root 812 May 7 02:48 init

-rw-r--r-- 1 root root 1723392 May 7 02:45 initrd-2.6.14.2.img

drwxr-xr-x 2 root root 4096 May 7 02:48 lib

drwxr-xr-x 2 root root 4096 May 7 02:48 loopfs

drwxr-xr-x 2 root root 4096 May 7 02:48 proc

lrwxrwxrwx 1 root root 3 May 7 02:48 sbin -> bin

drwxr-xr-x 2 root root 4096 May 7 02:48 sys

drwxr-xr-x 2 root root 4096 May 7 02:48 sysroot

Of interest in Listing 3 is the init file at the root. This file, like the traditional Linux boot process, is invoked when the initrd image is decompressed into the RAM disk.

The cpio command

Using the cpio command, you can manipulate cpio files. Cpio is also a file format that is simply a concatenation of files with headers. The cpio file format permits both ASCII and binary files. For portability, use ASCII. For a reduced file size, use the binary version.

Tools for creating an initrd

Let's now go back to the beginning to formally understand how the initrd image is constructed in the first place. For a traditional Linux system, the initrd image is created during the Linux build process. Numerous tools, such as mkinitrd, can be used to automatically build an initrd with the necessary libraries and modules for bridging to the real root file system. The mkinitrd utility is actually a shell script, so you can see exactly how it achieves its result. There's also the YAIRD (Yet Another Mkinitrd) utility, which permits customization of every aspect of the initrd construction.

Manually building a custom initial RAM disk

Because there is no hard drive in many embedded systems based on Linux, the initrd also serves as the permanent root file system. Listing 4 shows how to create an initrd image. I'm using a standard Linux desktop so you can follow along without an embedded target. Other than cross-compilation, the concepts (as they apply to initrd construction) are the same for an embedded target.

Listing 4. Utility (mkird) to create a custom initrd

#!/bin/bash

# Housekeeping...

rm -f /tmp/ramdisk.img

rm -f /tmp/ramdisk.img.gz

# Ramdisk Constants

RDSIZE=4000

BLKSIZE=1024

# Create an empty ramdisk image

dd if=/dev/zero of=/tmp/ramdisk.img bs=$BLKSIZE count=$RDSIZE

# Make it an ext2 mountable file system

/sbin/mke2fs -F -m 0 -b $BLKSIZE /tmp/ramdisk.img $RDSIZE

# Mount it so that we can populate

mount /tmp/ramdisk.img /mnt/initrd -t ext2 -o loop=/dev/loop0

# Populate the filesystem (subdirectories)

mkdir /mnt/initrd/bin

mkdir /mnt/initrd/sys

mkdir /mnt/initrd/dev

mkdir /mnt/initrd/proc

# Grab busybox and create the symbolic links

pushd /mnt/initrd/bin

cp /usr/local/src/busybox-1.1.1/busybox .

ln -s busybox ash

ln -s busybox mount

ln -s busybox echo

ln -s busybox ls

ln -s busybox cat

ln -s busybox ps

ln -s busybox dmesg

ln -s busybox sysctl

popd

# Grab the necessary dev files

cp -a /dev/console /mnt/initrd/dev

cp -a /dev/ramdisk /mnt/initrd/dev

cp -a /dev/ram0 /mnt/initrd/dev

cp -a /dev/null /mnt/initrd/dev

cp -a /dev/tty1 /mnt/initrd/dev

cp -a /dev/tty2 /mnt/initrd/dev

# Equate sbin with bin

pushd /mnt/initrd

ln -s bin sbin

popd

# Create the init file

cat >> /mnt/initrd/linuxrc <<>

#!/bin/ash

echo

echo "Simple initrd is active"

echo

mount -t proc /proc /proc

mount -t sysfs none /sys

/bin/ash --login

EOF

chmod +x /mnt/initrd/linuxrc

# Finish up...

umount /mnt/initrd

gzip -9 /tmp/ramdisk.img

cp /tmp/ramdisk.img.gz /boot/ramdisk.img.gz

To create an initrd, begin by creating an empty file, using /dev/zero (a stream of zeroes) as input writing to the ramdisk.img file. The resulting file is 4MB in size (4000 1K blocks). Then use the mke2fs command to create an ext2 (second extended) file system using the empty file. Now that this file is an ext2 file system, mount the file to /mnt/initrd using the loop device. At the mount point, you now have a directory that represents an ext2 file system that you can populate for your initrd. Much of the rest of the script provides this functionality.

The next step is creating the necessary subdirectories that make up your root file system: /bin, /sys, /dev, and /proc. Only a handful are needed (for example, no libraries are present), but they contain quite a bit of functionality.

To make your root file system useful, use BusyBox. This utility is a single image that contains many individual utilities commonly found in Linux systems (such as ash, awk, sed, insmod, and so on). The advantage of BusyBox is that it packs many utilities into one while sharing their common elements, resulting in a much smaller image. This is ideal for embedded systems. Copy the BusyBox image from its source directory into your root in the /bin directory. A number of symbolic links are then created that all point to the BusyBox utility. BusyBox figures out which utility was invoked and performs that functionality. A small set of links are created in this directory to support your init script (with each command link pointing to BusyBox).

The next step is the creation of a small number of special device files. I copy these directly from my current /dev subdirectory, using the -a option (archive) to preserve their attributes.

The penultimate step is to generate the linuxrc file. After the kernel mounts the RAM disk, it searches for an init file to execute. If an init file is not found, the kernel invokes the linuxrc file as its startup script. You do the basic setup of the environment in this file, such as mounting the /proc file system. In addition to /proc, I also mount the /sys file system and emit a message to the console. Finally, I invoke ash (a Bourne Shell clone) so I can interact with the root file system. The linuxrc file is then made executable using chmod.

Finally, your root file system is complete. It's unmounted and then compressed using gzip. The resulting file (ramdisk.img.gz) is copied to the /boot subdirectory so it can be loaded via GNU GRUB.

To build the initial RAM disk, you simply invoke mkird, and the image is automatically created and copied to /boot.

Testing the custom initial RAM disk

Your new initrd image is in /boot, so the next step is to test it with your default kernel. You can now restart your Linux system. When GRUB appears, press the C key to enable the command-line utility within GRUB. You can now interact with GRUB to define the specific kernel and initrd image to load. The kernel command allows you to define the kernel file, and the initrd command allows you to specify the particular initrd image file. When these are defined, use the boot command to boot the kernel, as shown in Listing 5.

Listing 5. Manually booting the kernel and initrd using GRUB

GNU GRUB version 0.95 (638K lower / 97216K upper memory)

[ Minimal BASH-like line editing is supported. For the first word, TAB

lists possible command completions. Anywhere else TAB lists the possible

completions of a device/filename. ESC at any time exits.]

grub> kernel /bzImage-2.6.1

[Linux-bzImage, setup=0x1400, size=0x29672e]

grub> initrd /ramdisk.img.gz

[Linux-initrd @ 0x5f2a000, 0xb5108 bytes]

grub> boot

Uncompressing Linux... OK, booting the kernel.

After the kernel starts, it checks to see if an initrd image is available (more on this later), and then loads and mounts it as the root file system. You can see the end of this particular Linux startup in Listing 6. When started, the ash shell is available to enter commands. In this example, I explore the root file system and interrogate a virtual proc file system entry. I also demonstrate that you can write to the file system by touching a file (thus creating it). Note here that the first process created is linuxrc (commonly init).

Listing 6. Booting a Linux kernel with your simple initrd

...

md: Autodetecting RAID arrays

md: autorun

md: ... autorun DONE.

RAMDISK: Compressed image found at block 0

VFS: Mounted root (ext2 file system).

Freeing unused kernel memory: 208k freed

/ $ ls

bin etc linuxrc proc sys

dev lib lost+found sbin

/ $ cat /proc/1/cmdline

/bin/ash/linuxrc

/ $ cd bin

/bin $ ls

ash cat echo mount sysctl

busybox dmesg ls ps

/bin $ touch zfile

/bin $ ls

ash cat echo mount sysctl

busybox dmesg ls ps zfile

Booting with an initial RAM disk

Now that you've seen how to build and use a custom initial RAM disk, this section explores how the kernel identifies and mounts the initrd as its root file system. I walk through some of the major functions in the boot chain and explain what's happening.

The boot loader, such as GRUB, identifies the kernel that is to be loaded and copies this kernel image and any associated initrd into memory. You can find much of this functionality in the ./init subdirectory under your Linux kernel source directory.

After the kernel and initrd images are decompressed and copied into memory, the kernel is invoked. Various initialization is performed and, eventually, you find yourself in init/main.c:init() (subdir/file:function). This function performs a large amount of subsystem initialization. A call is made here to init/do_mounts.c:prepare_namespace(), which is used to prepare the namespace (mount the dev file system, RAID, or md, devices, and, finally, the initrd). Loading the initrd is done through a call to init/do_mounts_initrd.c:initrd_load().

The initrd_load() function calls init/do_mounts_rd.c:rd_load_image(), which determines the RAM disk image to load through a call to init/do_mounts_rd.c:identify_ramdisk_image(). This function checks the magic number of the image to determine if it's a minux, etc2, romfs, cramfs, or gzip format. Upon return to initrd_load_image, a call is made to init/do_mounts_rd:crd_load(). This function allocates space for the RAM disk, calculates the cyclic redundancy check (CRC), and then uncompresses and loads the RAM disk image into memory. At this point, you have the initrd image in a block device suitable for mounting.

Mounting the block device now as root begins with a call to init/do_mounts.c:mount_root(). The root device is created, and then a call is made to init/do_mounts.c:mount_block_root(). From here, init/do_mounts.c:do_mount_root() is called, which calls fs/namespace.c:sys_mount() to actually mount the root file system and then chdir to it. This is where you see the familiar message shown in Listing 6: VFS: Mounted root (ext2 file system).

Finally, you return to the init function and call init/main.c:run_init_process. This results in a call to execve to start the init process (in this case /linuxrc). The linuxrc can be an executable or a script (as long as a script interpreter is available for it).

The hierarchy of functions called is shown in Listing 7. Not all functions that are involved in copying and mounting the initial RAM disk are shown here, but this gives you a rough overview of the overall flow.

Listing 7. Hierarchy of major functions in initrd loading and mounting

init/main.c:init

init/do_mounts.c:prepare_namespace

init/do_mounts_initrd.c:initrd_load

init/do_mounts_rd.c:rd_load_image

init/do_mounts_rd.c:identify_ramdisk_image

init/do_mounts_rd.c:crd_load

lib/inflate.c:gunzip

init/do_mounts.c:mount_root

init/do_mounts.c:mount_block_root

init/do_mounts.c:do_mount_root

fs/namespace.c:sys_mount

init/main.c:run_init_process

execve

Diskless Boot

Much like embedded booting scenarios, a local disk (floppy or CD-ROM) isn't necessary to boot a kernel and ramdisk root filesystem. The Dynamic Host Configuration Protocol (or DHCP) can be used to identify network parameters such as IP address and subnet mask. The Trivial File Transfer Protocol (or TFTP) can then be used to transfer the kernel image and the initial ramdisk image to the local device. Once transferred, the Linux kernel can be booted and initrd mounted, as is done in a local image boot.

Shrinking your initrd

When you're building an embedded system and want the smallest initrd image possible, there are a few tips to consider. The first is to use BusyBox (demonstrated in this article). BusyBox takes several megabytes of utilities and shrinks them down to several hundred kilobytes.

In this example, the BusyBox image is statically linked so that no libraries are required. However, if you need the standard C library (for your custom binaries), there are other options beyond the massive glibc. The first small library is uClibc, which is a minimized version of the standard C library for space-constrained systems. Another library that's ideal for space-constrained environments is dietlib. Keep in mind that you'll need to recompile the binaries that you want in your embedded system using these libraries, so some additional work is required (but worth it).

Reinstalling the Boot Loader

GRUB

The GNU GRand Unified Boot loader (GRUB) is a program which enables the selection of the installed operating system or kernel to be loaded at system boot time. It also allows the user to pass arguments to the kernel.

In many cases, the GRUB boot loader can mistakenly be deleted, corrupted, or replaced by other operating systems and cursor blink when system booting.

The following steps detail the process on how GRUB is reinstalled on the master boot record:

Boot the system from an installation boot medium.
Type linux rescue at the installation boot prompt to enter the rescue environment.
Type chroot /mnt/sysimage to mount the root partition.
Type /sbin/grub-install /dev/hda to reinstall the GRUB boot loader, where /dev/hda is the boot partition.
Review the /boot/grub/grub.conf file, as additional entries may be needed for GRUB to control additional operating systems.
Reboot the system.

RHEL Boot Sequence

Very quick review of how the Linux boots. Here are the very brief steps.

1) When a PC is booted it starts running a BIOS program which is a memory resident program on an EEPROM integrated circuit. The BIOS program will eventually try to read the first sector on a booting media such as a hard or floppy drive. The boot sector contains a small program that the BIOS will load and attempt to pass run control to. This program will attempt to read the operating system from the disk and run it.
2) The small program containing in the boot sector that BIOS will load and attempt to pass control to is called bootloader. This is a small program residing in the 1st sector of primary partition. Primary partition by default is always /boot. This directory will have all files required for bootup.
The boot loader program is present in 2 stages in Linux.
Stage 1: Small stage and resides in MBR (Master boot record) or boot sector. This is the once we were taking above – A small program residing in the 1st sector of primary partition /boot.
Stage 2: This is the complete bootup program present in /boot partiton and is called from first stage.
So 1st stage is present in MBR and just called for stage 2.
This MBR sector is 512 bytes size. The first 446 bytes contains the GRUB program (1st stage).
3) GRUB program reads the configuration file /boot/grub/grub.conf during boot time. Since this file is present in /boot partition, this partition should be present in file system that GRUB will understand, because it has to read the file before OS starts. So /boot mount point should be formated with either ext2 or ext3.
We will discuss about GRUB file shortly after finishing this brief booting sequence steps.
So once control comes to GRUB program it executes and you will see a red spashing windown comes up which gives the countdown in seconds like
Booting up Redhat Enterprise Linux in 5..4..3.. Secs
Now before the conuntdown ends, you can press either space or enter. Once you press that, you will see a table on a screen with list of operating systems present in that server. Example if you have installed Windows initially and then you installed Linux this table will show both OS in the option.
But to get this list of OS you need to enter space or enter, else it will bootup the default operating system which will be Linux.
This showing up of operating system list can be controlled by parameter hiddenmenu present in GRUB config file. We will see this in a short while. Also you can control the time for which it should countdown (timeout parameter in GRUB file).
GRUB will load the specific kernel into the RAM (which kernel to load is passed to GRUB in its script) and uncompress the kernel program in RAM. Once it uncompresses, the control is taken over by kernel and job of GRUB script ends here.
4) The kernel initialization files generates output which may not be possible to see on screen as it scrolls quickly but can be seen in log message file /var/log/dmesg which contains the snapsot of these kernel messages taken just after control is passed to init.
Many packages and device drivers present in kernel program are called. Device drivers will check all there respective hardware devices if they are available. If successful in locating devices, the driver will initialize and usually log output to kernel message buffer.
kernel of Linux is made light weight and hence will load only the required module and packages. But if some of the modules also needs to be loaded along with the kernel, its not a good idea to make it a part of kernel. Instead in redhat those additional modules are included in initrd file, which is then temporarily mounted by kernel on a RAM disk to make modules available for initialization process. This file initrd is password as one of the arguement in GRUB file for loading the kernel. We will see this arguement when we see GRUB file.
After all essential drivers are loaded, kernel will mount the root filesystem in read-only mode so that no process while booting should make any changes to any file on disk.
The first process is then created and loaded and control is passed from kernel to that process. This first process that gets created is called “init”. This process is having PID=1 and its the initialization process. This will intialize the system.
5) Init process created above will read its configuration file /etc/inittab. This file stores the information like initial run level, system initialization script, run level specific scripts, trap certain key sequences etc.
We will discuss about this file in my next post when I will explain run levels.
The final file that is run in boot sequence is /etc/rc.d/rc.local
So if we have to make any customization or call a custom script, we can call from this file. Example if we want to start a database when our server boots up we can add a script in /etc/rc.d/rc.local to start a database.
Having a brief idea about boot sequence for Linux, lets see the content of GRUB config file
/boot/grub/grub.conf
The GRUB file in RHEL is present in /boot/grub/grub.conf
Following is the content of this file
# cat /boot/grub/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You have a /boot partition. This means that
# all kernel and initrd paths are relative to /boot/, eg.
# root (hd0,0)
# kernel /vmlinuz-version ro root=/dev/sda2
# initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(sd0,0)/grub/splash.xpm.gz
hiddenmenu
title Enterprise (2.6.9-55.0.0.0.2.ELhugemem)
root (sd0,0)
kernel /vmlinuz-2.6.9-55.0.0.0.2.ELhugemem ro root=LABEL=/ rhgb quiet
initrd /initrd-2.6.9-55.0.0.0.2.ELhugemem.img
title Enterprise-smp (2.6.9-55.0.0.0.2.ELsmp)
root (sd0,0)
kernel /vmlinuz-2.6.9-55.0.0.0.2.ELsmp ro root=LABEL=/ rhgb quiet
initrd /initrd-2.6.9-55.0.0.0.2.ELsmp.img
title Enterprise-up (2.6.9-55.0.0.0.2.EL)
root (sd0,0)
kernel /vmlinuz-2.6.9-55.0.0.0.2.EL ro root=LABEL=/ rhgb quiet
initrd /initrd-2.6.9-55.0.0.0.2.EL.img
lets understand meaning of each parameter here.
default -> This gives the default operating system to be booted in case the user is not giving any choice. Example if there are 2 OS installed on a server (lets say Linux and Windows, linux with number 0 and windows with number 1) then default=0 will boot Linux by default. The number is decided by the sequence in which these OS are listed in grub.conf above.
If we see the lines starting with “title”, these are the list of OS installed on server and this list (only the title) will be displayed at the time of booting when control goes from BIOS to GRUB. Since its present in grub.conf 1st stage of bootloader program will display this OS list.
In above file default=0 will pick “Enterprise (2.6.9-55.0.0.0.2.ELhugemem)” OS by default.
timeout -> This gives the time for which countdown will continue or time for which the list of OS should be displayed before taking default option. Example in out case timeout=5 means it will show the option for 5 seconds.
splashimage -> This option gives a red spashing window of redhat linux while showing the list of OS or while booting. If you remove this file it will show a black window.
hiddenmenu -> This options will hide the list of OS installed on server. It will only show the countdown. If you press enter or space (as explained previously in point 3 of boot sequence) then it will show the list of OS. If you remove this option by default while booting it will show the list of OS installed.
title -> As explained just now these are the title to be shown during booting from where you can select. You can change title to anything that you want. Example “My OS”. This will show “My OS” as one of the option during booting.
root (sd0,0) -> This tells us that the boot loader program is present in 1st disk (sd0 indicate 1st hard disk) and 1st partition (,0 indicates 1st partition). So here we are telling where exactly is the bootloader program.
kernel /vmlinuz-2.6.9-55.0.0.0.2.ELhugemem ro root=LABEL=/ rhgb quiet
kernel is a parameter, /vmlinuz-2.6.9-55.0.0.0.2.ELhugemem is a value -> This is the name of kernel file to be used.
ro -> opens the filesystem in readonly mode during booting. If we remove this arguement it will open the filesystem in read write mode.
rhgb -> This is redhar graphical boot. This parameters gives a graphical progress bar while booting. If we remove this then it will give a traditional black window with many [ OK ] messages.
root=LABEL=/ -> This gives the location of root directory where all the installation has happened. this is usaully /. We can also give device name directly example in my case the device name is /dev/sda2. So we can give root=/dev/sda2
This will also work.
quiet -> This will hide details while booting up and will show only few message ehgb mode or in traditional mode. Only few [ OK ] messages will be displayed corresponding to services that are getting started. Else if we remove this parameter many more detailed messages will be displayed.
initrd /initrd-2.6.9-55.0.0.0.2.ELsmp.img -> This gives the extended modules which we want to load while booting. As I explained that not all modules are part of kernel, in order to keep it light. So if additional extended modules needs to be installed then we need to give this file as input.

Wednesday, December 15, 2010

Apache web server load balancing using Pound

Pound is a reverse-proxy load balancing server. It accepts requests from HTTP / HTTPS clients and distributes them to one or more Web servers. The HTTPS requests are decrypted and passed to the back-ends as plain HTTP. It will act as:
a) Server load balancer
b) Reverse proxy server
c) Apache reverse proxy etc
d) It can detects when a backend server fails or recovers, and bases its load balancing decisions on this information: if a backend server fails, it will not receive requests until it recovers
e) It can decrypts https requests to http ones
f) Rejects incorrect requests
h) It can be used in a chroot environment (security feature)

If more than one back-end server is defined, Pound chooses one of them randomly, based on defined priorities. By default, Pound keeps track of associations between clients and back-end servers (sessions).

Install Pound Software

If you are using RHEL / CentOS, grab pound rpm here and type the command:
# rpm -ivh pound*
If you are using FreeBSD, enter:
# cd /usr/ports/www/pound/ && make install clean

How it works?

Let us assume your public IP address 202.54.1.5
Pound will run on 202.54.1.5 port 80
It will forward all incoming http requests to internal host 192.168.1.5 and 192.168.1.10 port 80 or 443
Pound keeps track of associations between clients and back-end servers

Pound Configuration

Forward all incoming request at 202.54.1.5 port 80 request to 192.168.1.5 Apache server running at 8080 port:
Open /etc/pound/pound.cfg file:

# vi /etc/pound/pound.cfg

Following example will distribute the all HTTP/HTTPS requests to two Web servers:

ListenHTTP
       Address 202.54.1.5
       Port    80
End

ListenHTTPS
      Address 202.54.1.5
      Port    443
      Cert    "/etc/ssl/local.server.pem"
End
Service
               BackEnd
                   Address 192.168.1.5
                   Port    80
                   Priority 6
               End
               BackEnd
                   Address 192.168.1.6
                   Port    8080
               End
End


Save and close the file. Restart pound:

# /etc/init.d/pound restart

Pound log file

By default pound log message using syslog:
# tail -f /var/log/messages # grep pound /var/log/messages

Sunday, December 12, 2010

Linux Bonding

Linux allows binding multiple network interfaces into a single channel/NIC

using special kernel module called bonding The Linux bonding driver provides

a method for aggregating multiple network interfaces into a single logical

"bonded" interface

Production Linux Server Configuration

Step #1: Create a bond0 configuration file

[root@linux703 ~]# cat /etc/sysconfig/network-

scripts/ifcfg-bond0
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
IPADDR=10.20.30.20
NETMASK=255.255.255.0
NETWORK=10.20.30.0
BROADCAST=10.20.30.255
GATEWAY=10.20.30.254
[root@linux703 ~]#

Step #2: Modify eth0 and eth1 config files:

[root@linux703 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
TYPE=Ethernet

[root@linux703 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
TYPE=Ethernet

Step # 3: Load bond driver/module

[root@linux703 ~]# cat /etc/modprobe.conf
alias eth0 bnx2
alias eth1 bnx2
alias scsi_hostadapter cciss
alias usb-controller uhci-hcd
alias usb-controller1 ehci-hcd
alias bond0 bonding
options bond0 mode=802.3ad miimon=100
[root@linux703 ~]#

Step # 4: Test configuration

First, load the bonding module:
# modprobe bondingRestart networking service in order to bring up bond0 interface:


#  service network restart

Verify everything is working:
# less /proc/net/bonding/bond0

All About Bonding in detailed

What is bonding?
Bonding is the same as port trunking. In the following I will use the word bonding because practically we will bond interfaces as one.
But still...what is bonding?
Bonding allows you to aggregate multiple ports into a single group, effectively combining the bandwidth into a single connection. Bonding also allows you to create multi-gigabit pipes to transport traffic through the highest traffic areas of your network. For example, you can aggregate three megabits ports (1 mb each) into a three-megabits trunk port. That is equivalent with having one interface with three megabits speed.
Where should I use bonding?
You can use it wherever you need redundant links, fault tolerance or load balancing networks. It is the best way to have a high availability network segment. A very useful way to use bonding is to use it in connection with 802.1q VLAN support (your network equipment must have 802.1q protocol implemented).

This small howto will try to cover the most used bonding types. The following script (the gray area) will configure a bond interface (bond0) using two ethernet interface (eth0 and eth1). You can place it onto your on file and run it at boot time..

#!/bin/bash

modprobe bonding mode=0 miimon=100 # load bonding module

ifconfig eth0 down # putting down the eth0 interface
ifconfig eth1 down # putting down the eth1 interface

ifconfig bond0 hw ether 00:11:22:33:44:55 # changing the MAC address of the bond0 interface
ifconfig bond0 192.168.55.55 up # to set ethX interfaces as slave the bond0 must have an ip.

ifenslave bond0 eth0 # putting the eth0 interface in the slave mod for bond0
ifenslave bond0 eth1 # putting the eth1 interface in the slave mod for bond0

You can set up your bond interface according to your needs. Changing one parameters (mode=X) you can have the following bonding types:
mode=0 (balance-rr)
Round-robin policy: Transmit packets in sequential order from the first available slave through the last. This mode provides load balancing and fault tolerance.

mode=1 (active-backup)
Active-backup policy: Only one slave in the bond is active. A different slave becomes active if, and only if, the active slave fails. The bond's MAC address is externally visible on only one port (network adapter) to avoid confusing the switch. This mode provides fault tolerance. The primary option affects the behavior of this mode.

mode=2 (balance-xor)
XOR policy: Transmit based on [(source MAC address XOR'd with destination MAC address) modulo slave count]. This selects the same slave for each destination MAC address. This mode provides load balancing and fault tolerance.

mode=3 (broadcast)
Broadcast policy: transmits everything on all slave interfaces. This mode provides fault tolerance.

mode=4 (802.3ad)
IEEE 802.3ad Dynamic link aggregation. Creates aggregation groups that share the same speed and duplex settings. Utilizes all slaves in the active aggregator according to the 802.3ad specification.

 Pre-requisites:
 1. Ethtool support in the base drivers for retrieving
 the speed and duplex of each slave.
 2. A switch that supports IEEE 802.3ad Dynamic link
 aggregation.
 Most switches will require some type of configuration
 to enable 802.3ad mode.

mode=5 (balance-tlb)
Adaptive transmit load balancing: channel bonding that does not require any special switch support. The outgoing traffic is distributed according to the current load (computed relative to the speed) on each slave. Incoming traffic is received by the current slave. If the receiving slave fails, another slave takes over the MAC address of the failed receiving slave.

 Prerequisite:
 Ethtool support in the base drivers for retrieving the
 speed of each slave.

mode=6 (balance-alb)
Adaptive load balancing: includes balance-tlb plus receive load balancing (rlb) for IPV4 traffic, and does not require any special switch support. The receive load balancing is achieved by ARP negotiation. The bonding driver intercepts the ARP Replies sent by the local system on their way out and overwrites the source hardware address with the unique hardware address of one of the slaves in the bond such that different peers use different hardware addresses for the server.

The most used are the first four mode types...

Also you can use multiple bond interface but for that you must load the bonding module as many as you need.
Presuming that you want two bond interface you must configure the /etc/modules.conf as follow:

 alias bond0 bonding
 options bond0 -o bond0 mode=0 miimon=100
 alias bond1 bonding
 options bond1 -o bond1 mode=1 miimon=100

Notes:

To restore your slaves MAC addresses, you need to detach them from the bond (`ifenslave -d bond0 eth0'). The bonding driver will then restore the MAC addresses that the slaves had before they were enslaved.
The bond MAC address will be the taken from its first slave device.
Promiscous mode: According to your bond type, when you put the bond interface in the promiscous mode it will propogates the setting to the slave devices as follow:
- for mode=0,2,3 and 4 the promiscuous mode setting is propogated to all slaves.
- for mode=1,5 and 6 the promiscuous mode setting is propogated only to the active slave.
  For balance-tlb mode the active slave is the slave currently receiving inbound traffic, for balance-alb mode the active slave is the slave used as a "primary." and for the active-backup, balance-tlb and balance-alb modes, when the active slave changes (e.g., due to a link failure), the promiscuous setting will be propogated to the new active slave.
Where does a bonding device get its MAC address from?

If not explicitly configured with ifconfig, the MAC address of the bonding device is taken from its first slave device. This MAC address is then passed to all following slaves and remains persistent (even if the the first slave is removed) until the bonding device is brought down or reconfigured.

If you wish to change the MAC address, you can set it with ifconfig:

                    # ifconfig bond0 ha ether 00:11:22:33:44:55

                    The MAC address can be also changed by bringing down/up the device

                    and then changing its slaves (or their order):



                    # ifconfig bond0 down ; modprobe -r bonding

                    # ifconfig bond0 .... up

                    # ifenslave bond0 eth...

This method will automatically take the address from the next slave that will be added.

To restore your slaves' MAC addresses, you need to detach them from the bond (`ifenslave -d bond0 eth0'), set them down (`ifconfig eth0 down'), unload the drivers (`rmmod 3c59x', for example) and reload them to get the MAC addresses from their eeproms. If the driver is shared by several devices, you need to turn them all down. Another solution is to look for the MAC address at boot time (dmesg or tail /var/log/messages) and to reset it by hand with ifconfig :

                    # ifconfig eth0 down

                    # ifconfig eth0 hw ether 00:20:40:60:80:A0