Tuesday, October 9, 2012

How to check whether a Linux disk has failed

Linux Server SCSI / SATA Hard Disk Failure check 

I/O errors in /var/log/messages indicate that something is wrong with the hard disk and that it may be failing. You can check a hard disk for errors with the smartctl command, the control and monitoring utility for SMART disks on Linux and UNIX-like operating systems.
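As a first pass, the log can be filtered for the messages that usually accompany a failing disk. The sketch below is an illustration only: the function name find_io_errors and the pattern list are assumptions, and the exact wording of kernel messages varies between kernel versions and drivers.

```shell
# find_io_errors: print log lines that look like disk I/O errors.
# Pass one or more log files as arguments, or pipe text in on stdin.
# The pattern list is an assumption; extend it for your kernel and driver.
find_io_errors() {
    # "$@" expands to nothing when no file is given, so grep reads stdin.
    grep -i -E 'I/O error|medium error|unrecovered read error' "$@"
}
```

For example, `find_io_errors /var/log/messages` lists the matching lines. Any device named in them (such as sdb) is a candidate for the smartctl checks that follow.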

smartctl for servers

smartctl is a command line utility designed to perform SMART tasks such as printing the SMART self-test and error logs, enabling and disabling SMART automatic testing, and initiating device self-tests. First, make sure S.M.A.R.T. support is enabled in the BIOS.
Next, run the following command to check whether a hard disk supports S.M.A.R.T.:
# smartctl -i /dev/sdb

To enable SMART, run:
# smartctl -s on -d ata /dev/sdb

Sample outputs:
smartctl version 5.33 [x86_64-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.

To display the result of the overall-health self-assessment, enter:
# smartctl -d ata -H /dev/sdb

Sample outputs:
smartctl version 5.33 [x86_64-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
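For scripted checks, the verdict line can be pulled out of this output. The following is a minimal sketch, assuming the ATA-style result line shown in the sample output above. The function name health_status is hypothetical, and SCSI/SAS disks report their status on a differently worded line, so the pattern would need adjusting for them.

```shell
# health_status: read `smartctl -H` output on stdin and print the verdict
# found on the "overall-health self-assessment test result" line.
# Returns 0 only when that verdict is exactly PASSED.
health_status() {
    verdict=$(sed -n 's/^SMART overall-health self-assessment test result: *//p')
    printf '%s\n' "$verdict"
    [ "$verdict" = "PASSED" ]
}
```

Typical use would be `smartctl -d ata -H /dev/sdb | health_status || echo "check /dev/sdb"`, which prints the warning whenever the verdict is anything other than PASSED (including when no verdict line is found at all).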
 
  

Sample detailed report for a failing hard disk

# smartctl -a /dev/sda

smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022   044   033   045    Old_age   Always   FAILING_NOW 56 (96 110 58 25)
 
The following command provides even more information about a failing hard disk:
# smartctl --attributes --log=selftest /dev/sda
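In the attribute table that smartctl prints, the WHEN_FAILED column is the one to watch: it shows "-" for healthy attributes and a value such as FAILING_NOW otherwise. The sketch below filters the table down to the attributes that have failed. The function name failing_attributes is hypothetical, and the column positions are assumed to match the table layout shown in the sample report above.

```shell
# failing_attributes: read `smartctl -A` (or -a) output on stdin and print
# "ID NAME WHEN_FAILED" for each attribute row whose WHEN_FAILED column
# (field 9) is not "-". Rows are recognized by a numeric ID in field 1
# and a hexadecimal flag in field 3.
failing_attributes() {
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /^0x/ && NF >= 10 && $9 != "-" { print $1, $2, $9 }'
}
```

Run as `smartctl --attributes /dev/sda | failing_attributes`. For the Airflow_Temperature_Cel row in the sample report it prints `190 Airflow_Temperature_Cel FAILING_NOW`.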


You can read more data from the hard disk by typing the following command:
# smartctl -d ata -a /dev/sdb


A note about RAID controllers

To look at ATA disks behind 3ware SCSI RAID controllers, the syntax is:
# smartctl -a -d 3ware,2 /dev/sda
# smartctl -a -d 3ware,0 /dev/twe0


SATA disk health check syntax

# smartctl -d sat --all /dev/sgX
# smartctl -d sat --all /dev/sg1

Check the overall health status:
# smartctl -d sat --all /dev/sg1 -H

For SAS disks, use the following syntax:
# smartctl -d scsi --all /dev/sgX
# smartctl -d scsi --all /dev/sg1
# smartctl -d scsi --all /dev/sg1 -H



Configure SMARTD

Red Hat Linux

  • Install smartd (part of the smartmontools package): # yum install smartmontools
  • Enable monitoring by editing the /etc/smartd.conf file.
  • smartd configuration file: /etc/smartd.conf
  • Start/stop smartd: /etc/init.d/smartd start | stop

 Example

You can put directives like the following in the smartd configuration file:
(a) Send an email to alert@nixcraft.in when smartd detects a problem on /dev/sdb:
/dev/sdb -m alert@nixcraft.in

The remaining examples are smartctl commands run from the shell, not smartd.conf directives:
(b) Read the error log:
# smartctl -l error /dev/hdb
(c) Test the hard disk (short or long self-test):
# smartctl -t short /dev/hdb
# smartctl -t long /dev/hdb
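A self-test started with -t runs in the background on the drive, and its result appears in the self-test log (smartctl -l selftest) once it finishes. The sketch below picks out log entries that ended in a failure. The function name selftest_failures is hypothetical, and the match relies on the assumption that failed entries carry a status beginning with "Completed:" (for example "Completed: read failure"), while clean runs read "Completed without error".

```shell
# selftest_failures: read `smartctl -l selftest` output on stdin and print
# the numbered log entries whose status contains "Completed: ".
# Returns 0 when at least one such entry is found, non-zero otherwise.
selftest_failures() {
    grep -E '^# *[0-9]+ .*Completed: '
}
```

For instance, `smartctl -l selftest /dev/hdb | selftest_failures && echo "self-test failures logged"` prints the failed entries followed by the message, and prints nothing when every logged test completed without error.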



Source : http://sourceforge.net/apps/trac/smartmontools/wiki
 
