Wednesday, October 17, 2012

Three way Handshake

The three-way (or 3-way) handshake is how TCP establishes a reliable connection.
Before a client attempts to connect with a server, the server must first bind to a port and listen on it for connections: this is called a passive open. Once the passive open is established, a client may initiate an active open, and the three-way (or 3-step) handshake takes place:
1. The client performs the active open by sending a SYN to the server.
2. In response, the server replies with a SYN-ACK, acknowledging the client's SYN and sending a SYN of its own.
3. Finally, the client sends an ACK back to the server (this last step is sometimes written SYN-ACK-ACK); the connection is now established and data can flow in both directions.
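On the client side, the whole exchange happens inside the connect() call; user code never touches the SYN or ACK segments themselves. A minimal C sketch for illustration only (the 192.0.2.1:80 address is a documentation placeholder, not taken from the referenced article):

/* Hypothetical example: connect() performs the active open, i.e. the
 * kernel sends the SYN, waits for the SYN-ACK and answers with the ACK. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);        /* TCP socket */
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in srv;
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port   = htons(80);                      /* server must already be listening (passive open) */
    inet_pton(AF_INET, "192.0.2.1", &srv.sin_addr);  /* placeholder address */

    /* connect() triggers the active open and returns once the handshake completes */
    if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0) {
        perror("connect");                           /* handshake failed (e.g. RST or timeout) */
        close(fd);
        return 1;
    }
    printf("handshake complete, connection established\n");
    close(fd);
    return 0;
}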

Reference:
http://www.cisco.com/web/about/ac123/ac147/archived_issues/ipj_9-4/syn_flooding_attacks.html

 

What are Memory Leaks?


A memory leak (or leakage), in computer science, occurs when a computer program consumes memory but is unable to release it back to the operating system.

When a program needs to store some temporary information during execution, it can dynamically request a chunk of memory from the system. However, the system has a fixed amount of total memory available. If one application uses up all of the system’s free memory, then other applications will not be able to obtain the memory that they require. The implications of a “memory starved” application can range from a graceful shutdown to an unexpected crash. Most large scale applications regularly request memory, so running out of system memory tends to have a domino effect. Even if the applications do not terminate, the system will slow down to a crawl—or even hang—in low memory conditions. Clearly, none of these results are desirable, so the system never wants to run out—or run low—of memory.

It is the responsibility of each application to “free” dynamically requested memory when it is finished using it. Freeing the memory returns it to the system, where it can be re-allocated to another application when needed. When an application dynamically allocates memory and does not free that memory when it is finished using it, that program has a memory leak. The memory is not being used by the application anymore, but it cannot be used by the system or any other program either.

Memory leaks add up over time, and if they are not cleaned up, the system eventually runs out of memory. Almost everyone has seen the “Your computer is running low on virtual memory” message box on Windows when memory usage gets too high. It is typically accompanied by horribly slow response times, and often the user can't even close the wasteful application because of the sluggishness. The only remedy at that point is to reboot the computer.
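To make this concrete, here is a small, deliberately broken C program (a hypothetical leak.c written for illustration, not taken from any real application): it requests memory on every loop iteration and never frees it, so the leaked memory accumulates until the process exits.

/* leak.c - hypothetical example of a memory leak: each iteration
 * allocates 1024 bytes that are never returned with free(). */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    for (int i = 0; i < 100; i++) {
        char *buf = malloc(1024);              /* dynamically requested memory */
        if (buf == NULL)
            return 1;
        snprintf(buf, 1024, "record %d", i);   /* use the buffer briefly */
        /* buf goes out of scope here without free(buf) -- this is the leak */
    }
    puts("done");
    return 0;
}

Compiling it with debugging symbols (for example: gcc -g leak.c -o a.out) lets Valgrind, described in the next section, report the exact file and line where the lost memory was allocated.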

Linux: Check For Memory Leaks In Programs

Valgrind: a memory debugging, memory leak detection, and profiling tool for the Linux and Mac OS X operating systems. Valgrind is a flexible framework for debugging and profiling Linux executables.
 

How Do I Install Valgrind?

Type the following command under CentOS / Redhat / RHEL Linux:
# yum install valgrind
Type the following command under Debian / Ubuntu Linux:
# apt-get install valgrind

How Do I use Valgrind?

If you normally run your program like this:
./a.out arg1 arg2
OR
/path/to/myapp arg1 arg2
Use this command line to turn on the detailed memory leak detector:
valgrind --leak-check=yes ./a.out arg1 arg2
valgrind --leak-check=yes /path/to/myapp arg1 arg2

You can also write the report to a log file:
valgrind --log-file=output.file --leak-check=yes --tool=memcheck ./a.out arg1 arg2
View the resulting report with:
cat output.file

Wednesday, October 10, 2012

Understanding Linux CPU Load 


You might be familiar with Linux load averages already. Load averages are the three numbers shown with the uptime and top commands - they look like this:
load average: 0.09, 0.05, 0.01
Most people have an inkling of what the load averages mean: the three numbers represent averages over progressively longer periods of time (one, five, and fifteen minutes), and lower numbers are better. Higher numbers represent a problem or an overloaded machine. But what's the threshold? What constitutes "good" and "bad" load average values? When should you be concerned over a load average value, and when should you scramble to fix it ASAP?
First, a little background on what the load average values mean. We'll start out with the simplest case: a machine with one single-core processor.

The traffic analogy

A single-core CPU is like a single lane of traffic. Imagine you are a bridge operator ... sometimes your bridge is so busy there are cars lined up to cross. You want to let folks know how traffic is moving on your bridge. A decent metric would be how many cars are waiting at a particular time. If no cars are waiting, incoming drivers know they can drive across right away. If cars are backed up, drivers know they're in for delays.
So, Bridge Operator, what numbering system are you going to use? How about:
  • 0.00 means there's no traffic on the bridge at all. In fact, between 0.00 and 1.00 means there's no backup, and an arriving car will just go right on.
  • 1.00 means the bridge is exactly at capacity. All is still good, but if traffic gets a little heavier, things are going to slow down.
  • over 1.00 means there's backup. How much? Well, 2.00 means that there are two lanes' worth of cars total -- one lane's worth on the bridge, and one lane's worth waiting. 3.00 means there are three lanes' worth total -- one lane's worth on the bridge, and two lanes' worth waiting. Etc.
[Illustrations: bridges at loads of 1.00, 0.50, and 1.70]


This is basically what CPU load is. "Cars" are processes using a slice of CPU time ("crossing the bridge") or queued up to use the CPU. Unix refers to this as the run-queue length: the sum of the number of processes that are currently running plus the number that are waiting (queued) to run.
Like the bridge operator, you'd like your cars/processes to never be waiting. So, your CPU load should ideally stay below 1.00. Also like the bridge operator, you are still ok if you get some temporary spikes above 1.00 ... but when you're consistently above 1.00, you need to worry.

So you're saying the ideal load is 1.00?

Well, not exactly. The problem with a load of 1.00 is that you have no headroom. In practice, many sysadmins will draw a line at 0.70:
  • The "Need to Look into it" Rule of Thumb: 0.70. If your load average is staying above 0.70, it's time to investigate before things get worse.
  • The "Fix this now" Rule of Thumb: 1.00. If your load average stays above 1.00, find the problem and fix it now. Otherwise, you're going to get woken up in the middle of the night, and it's not going to be fun.
  • The "Arrgh, it's 3AM WTF?" Rule of Thumb: 5.00. If your load average is above 5.00, you could be in serious trouble: your box is either hanging or slowing way down, and this will (inexplicably) happen at the worst possible time, like in the middle of the night or when you're presenting at a conference. Don't let it get there.

What about Multi-processors? My load says 3.00, but things are running fine!

Got a quad-processor system? It's still healthy with a load of 3.00.
On a multi-processor system, the load is relative to the number of processor cores available. The "100% utilization" mark is 1.00 on a single-core system, 2.00 on a dual-core, 4.00 on a quad-core, etc.
If we go back to the bridge analogy, "1.00" really means "one lane's worth of traffic". On a one-lane bridge, that means it's filled up. On a two-lane bridge, a load of 1.00 means it's at 50% capacity -- only one lane is full, so there's another whole lane that can be filled.
[Illustration: load of 2.00 on a two-lane bridge]
Same with CPUs: a load of 1.00 is 100% CPU utilization on a single-core box. On a dual-core box, a load of 2.00 is 100% CPU utilization.

Multicore vs. multiprocessor

While we're on the topic, let's talk about multicore vs. multiprocessor. For performance purposes, is a machine with a single dual-core processor basically equivalent to a machine with two processors with one core each? Yes. Roughly. There are lots of subtleties here concerning amount of cache, frequency of process hand-offs between processors, etc. Despite those finer points, for the purposes of sizing up the CPU load value, the total number of cores is what matters, regardless of how many physical processors those cores are spread across.
Which leads us to two new Rules of Thumb:
  • The "number of cores = max load" Rule of Thumb: on a multicore system, your load should not exceed the number of cores available.
  • The "cores is cores" Rule of Thumb: How the cores are spread out over CPUs doesn't matter. Two quad-cores == four dual-cores == eight single-cores. It's all eight cores for these purposes.

Bringing It Home

Let's take a look at the load averages output from uptime:
~ $ uptime
23:05 up 14 days, 6:08, 7 users, load averages: 0.65 0.42 0.36
This is on a dual-core CPU, so we've got lots of headroom. I won't even think about it until load gets and stays above 1.7 or so.
Now, what about those three numbers? 0.65 is the average over the last minute, 0.42 is the average over the last five minutes, and 0.36 is the average over the last 15 minutes. Which brings us to the question:
Which average should I be observing? One, five, or 15 minute?
For the numbers we've talked about (1.00 = fix it now, etc), you should be looking at the five or 15-minute averages. Frankly, if your box spikes above 1.0 on the one-minute average, you're still fine. It's when the 15-minute average goes north of 1.0 and stays there that you need to snap to. (obviously, as we've learned, adjust these numbers to the number of processor cores your system has).
So # of cores is important to interpreting load averages ... how do I know how many cores my system has?
Run cat /proc/cpuinfo to get info on each processor in your system (note: this file is not available on OS X; Google for alternatives). To get just a count, run it through grep and word count: grep 'model name' /proc/cpuinfo | wc -l
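If you prefer to do the comparison programmatically rather than eyeball uptime and /proc/cpuinfo, here is a minimal C sketch (written for this note, not part of the original article) that reads the three load averages with glibc's getloadavg() and normalizes the 15-minute value by the number of online cores reported by sysconf():

/* Hypothetical example: print the 1-, 5- and 15-minute load averages
 * and the per-core 15-minute load. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    double load[3];
    if (getloadavg(load, 3) != 3) {
        fprintf(stderr, "getloadavg failed\n");
        return 1;
    }

    long cores = sysconf(_SC_NPROCESSORS_ONLN);   /* number of online cores */
    if (cores < 1)
        cores = 1;

    printf("load averages: %.2f %.2f %.2f on %ld core(s)\n",
           load[0], load[1], load[2], cores);
    printf("15-minute load per core: %.2f\n", load[2] / cores);   /* 1.00 = all cores busy */
    return 0;
}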

Tuesday, October 9, 2012

How to check for a failed Linux disk

Linux Server SCSI / SATA Hard Disk Failure check 

I/O errors in /var/log/messages indicate that something is wrong with the hard disk and that it may be failing. You can check the hard disk for errors using the smartctl command, which is a control and monitoring utility for SMART disks under Linux / UNIX-like operating systems.

smartctl for servers

smartctl is a command line utility designed to perform SMART tasks such as printing the SMART self-test and error logs, enabling and disabling SMART automatic testing, and initiating device self-tests. First, make sure S.M.A.R.T. support is enabled in the BIOS.
Next, run the following command to see if your hard disks support S.M.A.R.T technology or not:
# smartctl -i /dev/sdb

To enable SMART, run:
# smartctl -s on -d ata /dev/sdb

Sample outputs:
smartctl version 5.33 [x86_64-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.

To run the overall-health self-assessment test, enter:
# smartctl -d ata -H /dev/sdb

Sample outputs:
smartctl version 5.33 [x86_64-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
  

Sample detailed report from a failing hard disk

# smartctl -a /dev/sda

smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE    UPDATED WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022 044   033   045    Old_age Always  FAILING_NOW 56 (96 110 58 25)
 
The following will provide even more information about a failing hard disk:
# smartctl --attributes --log=selftest /dev/sda


You can read more data from the hard disk by typing the following command:
# smartctl -d ata -a /dev/sdb


A note about RAID controllers

To look at ATA disks behind 3ware SCSI RAID controllers, the syntax is:
# smartctl -a -d 3ware,2 /dev/sda
# smartctl -a -d 3ware,0 /dev/twe0


SATA Disk Health Check Syntax

# smartctl -d sat --all /dev/sgX
# smartctl -d sat --all /dev/sg1

Run test:
# smartctl -d sat --all /dev/sg1 -H
For SAS disk use the following syntax:
# smartctl -d scsi --all /dev/sgX
# smartctl -d scsi --all /dev/sg1
# smartctl -d scsi --all /dev/sg1 -H



Configure SMARTD

Red Hat Linux

  • Install smartmontools (the package that provides smartd): # yum install smartmontools
  • Enable monitoring by editing the /etc/smartd.conf file.
  • SMART configuration file: /etc/smartd.conf
  • Start/stop the daemon: /etc/init.d/smartd start | stop

 Example

You can put the following directive in the smartd configuration file (a fuller example follows below):
(a) Send an email to alert@nixcraft.in for /dev/sdb:
/dev/sdb -m alert@nixcraft.in
(b) Read the error log:
# smartctl -l error /dev/hdb
(c) Test the hard disk (short or long self-test):
# smartctl -t short /dev/hdb
# smartctl -t long /dev/hdb
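Building on example (a), a fuller /etc/smartd.conf entry might look like the sketch below (hedged: the option set and schedule regex are illustrative, so check smartd.conf(5) on your system before relying on them):

# Monitor /dev/sdb: -a tracks all SMART attributes, -o on enables automatic
# offline testing, -S on enables attribute autosave, -s schedules a short
# self-test every day at 02:00 and a long self-test every Saturday at 03:00,
# and -m mails any problems to the given address.
/dev/sdb -a -o on -S on -s (S/../.././02|L/../../6/03) -m alert@nixcraft.in

After editing the file, restart smartd (/etc/init.d/smartd restart) so the new directive is picked up.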



Source : http://sourceforge.net/apps/trac/smartmontools/wiki
 

Thursday, October 4, 2012

Telnet Mail server troubleshooting command

 Telnet - SMTP Commands (sending mail using telnet)

#telnet mail.domain.ext 25
You should receive a reply like:

Trying ???.???.???.???...
Connected to mail.domain.ext.
Escape character is '^]'.
220 mail.domain.ext ESMTP Sendmail ?version-number?; ?date+time+gmtoffset?

You will then need to declare where you are sending the email from:

HELO local.domain.name

This should give you:

250 mail.domain.ext Hello local.domain.name [loc.al.i.p], pleased to meet you

Now give your email address:

(On many mail servers the space after the : is required rather than optional. Thanks to Justing Goldberg)

MAIL FROM: mail@domain.ext

250 2.1.0 mail@domain.ext... Sender ok

Now give the recipient's address:

RCPT TO: mail@otherdomain.ext

250 2.1.0 mail@otherdomain.ext... Recipient ok

To start composing the message, issue the command DATA.

If you want a subject for your email, type Subject: followed by your subject text, then press Enter twice (the blank line is needed to conform to RFC 822, which separates the headers from the body).


You may now proceed to type the body of your message (e.g. hello mail@otherdomain.ext from mail@domain.ext)


To tell the mail server that you have completed the message, enter a single dot "." on a line of its own.

The mail server should reply with: 250 2.0.0 ???????? Message accepted for delivery


You can close the connection by issuing the QUIT command.

The mailserver should reply with something like:

221 2.0.0 mail.domain.ext closing connection
Connection closed by foreign host.
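Putting the whole SMTP exchange together, a complete session looks roughly like this (a condensed, hypothetical transcript; the exact response text varies by mail server):

#telnet mail.domain.ext 25
220 mail.domain.ext ESMTP Sendmail ...
HELO local.domain.name
250 mail.domain.ext Hello local.domain.name, pleased to meet you
MAIL FROM: mail@domain.ext
250 2.1.0 mail@domain.ext... Sender ok
RCPT TO: mail@otherdomain.ext
250 2.1.0 mail@otherdomain.ext... Recipient ok
DATA
354 Enter mail, end with "." on a line by itself
Subject: test from telnet

hello mail@otherdomain.ext from mail@domain.ext
.
250 2.0.0 ... Message accepted for delivery
QUIT
221 2.0.0 mail.domain.ext closing connection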

Telnet - POP Commands (retrieving mail using telnet)

telnet mail.domain.ext 110

You should receive a reply like:

Trying ???.???.???.???...
Connected to mail.domain.ext.
Escape character is '^]'.
+OK ready


Then log in:

USER userName

This should give you:

+OK Password required for userName.


Now give your password:


PASS paSwd


Should yield:

+OK userName has ? visible messages (? hidden) in ????? octets.


If it doesn't, please see possible problems.


To see a list of the emails awaiting collection, use the LIST command; this will also show you the ID number of each message (e.g. 1, 2, etc.).

To view the contents of an email, type RETR followed by the ID number of the message (e.g. RETR 1).

To delete a message, use DELE followed by the ID number of the message (e.g. DELE 1).

To leave your mailbox and close the connection, use QUIT.
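As with SMTP, here is the whole POP3 exchange condensed into one hypothetical session (the message counts, sizes, and response wording are illustrative only):

telnet mail.domain.ext 110
+OK ready
USER userName
+OK Password required for userName.
PASS paSwd
+OK userName has 2 visible messages (0 hidden) in 3219 octets.
LIST
+OK 2 messages
1 1605
2 1614
.
RETR 1
+OK
...headers and body of message 1...
.
DELE 1
+OK
QUIT
+OK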

Linux Sendmail Configuration

Configuring Sendmail to send mail from command line

Sendmail is the default MTA on most Linux distros and is very useful for system administration: if you want cron alerts or server alerts delivered by email, you need Sendmail (or another MTA) configured. I recommend Sendmail because it is easy to set up and use.

So let's get started :)

Configuring it is very simple. First you'll need the sendmail-cf package. Install it using yum:
[root@server ~]# yum install sendmail-cf
Edit the file /etc/mail/sendmail.mc and add the following lines, replacing yourdomain.com with your mail server's domain name:
MASQUERADE_AS(yourdomain.com)dnl
MASQUERADE_DOMAIN(yourdomain.com)dnl
In the same file, /etc/mail/sendmail.mc, remove the "dnl" from the beginning of the following lines so they look like this:
LOCAL_DOMAIN(`localhost.localdomain')dnl
FEATURE(masquerade_envelope)dnl
FEATURE(masquerade_entire_domain)dnl
Save the file and compile it using m4:
[root@server ~]# m4 /etc/mail/sendmail.mc > /etc/sendmail.cf
Send Sendmail a -HUP signal using kill or simply restart the daemon for the configuration changes to take effect:
[root@server ~]# service sendmail restart

Testing your configuration using sendmail

And that's it! You're done. Just send yourself a test email to make sure it is really working:
[root@server ~]# /usr/sbin/sendmail -t < mail.txt
Where the contents of the mail.txt file are:
Date: Thu Nov 11 08:41:54 2007
To: you@somewhere.com
Subject: The subject of the message
From: whatever@somewhere.com
Body of message goes here

Testing your configuration using mutt

You can also use mutt/mail to test, which is a bit simpler (and you can also add the -a parameter for file attachment):
[root@server ~]# mutt -s "Test Email" you@somewhere.com < /dev/null
[root@server ~]# mail -s "Test Email" you@somewhere.com < /dev/null

Troubleshooting a mail server using telnet:

http://linuxtroops.blogspot.in/search?q=telnet+25