Monitoring RAID controller using Nagios
Monitoring the RAID controllers in your environment is very important: if a disk fails, you get notified and can take immediate action by replacing the failed disk. In this article I am going to share with you how to monitor a “3ware Inc 9650SE SATA-II RAID PCIe” controller using the Nagios monitoring tool.
Prerequisites:
- A RAID controller card installed.
- The tw_cli RPM installed.
- The Nagios plugin check_3ware_raid_1_1 downloaded.
- The NRPE client installed and configured on the remote machine.
- Access to both machines (the Nagios server and the remote machine).
First, check that the RAID controller is installed on the server:
# lspci -v | grep RAID
08:00.0 RAID bus controller: 3ware Inc 9650SE SATA-II RAID PCIe (rev 01)
        Subsystem: 3ware Inc 9650SE SATA-II RAID PCIe
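If you need to run this check on many servers, it can be scripted. The sketch below uses a hypothetical shell function, `detect_3ware` (a name made up for this example, not part of any package), that reads `lspci` output on stdin and prints the 3ware controller model, returning non-zero when none is found. It assumes the controller line has the `RAID bus controller: 3ware ...` form shown above.

```shell
# Hypothetical helper: read `lspci` output on stdin and print the model of
# any 3ware RAID controller found. Returns 1 if no such line is present.
detect_3ware() {
  # Keep only "RAID bus controller: 3ware..." lines and strip the bus prefix.
  out=$(sed -n 's/^.*RAID bus controller: \(3ware.*\)$/\1/p')
  [ -n "$out" ] || return 1
  printf '%s\n' "$out"
}
```

Usage: `lspci | detect_3ware || echo "No 3ware controller found"`.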
The 3ware 9650SE SATA-II RAID controller is monitored with tw_cli, the 3ware command-line management tool.
Download and install the tw_cli RPM:
# rpm -ivh http://dl.atrpms.net/el5-i386/atrpms/stable/tw_cli-2.00.03.018-7.i386.rpm
Check the package information:
# yum info tw_cli
Name        : tw_cli
Arch        : i386
Version     : 2.00.03.018
Release     : 7
Size        : 4.0 M
Repo        : installed
Summary     : 3ware Command Line Interface Tool
URL         : http://www.3ware.com/
License     : 3ware
Description : tw_cli is a Command Line Interface Storage Management Software for
            : AMCC/3ware ATA RAID Controller(s). It provides controller, logical
            : unit and drive management. tw_cli can be used in both interactive and
            : batch mode, providing higher-level API (Application Programming
            : Interface) functionalities.
List the files included in the package:
# rpm -ql tw_cli
/etc/tw_sched.cfg
/sbin/tw_cli
/sbin/tw_sched
/sbin/tw_update
/usr/share/doc/tw_cli-2.00.03.018
/usr/share/doc/tw_cli-2.00.03.018/tw_cli.8.html
/usr/share/doc/tw_cli-2.00.03.018/tw_sched.8.html
/usr/share/man/man8/tw_cli.8.gz
/usr/share/man/man8/tw_sched.8.gz
After installing the tw_cli RPM, view the disk array status with the following command:
# tw_cli
//hostname> /c0 show

Unit  UnitType  Status  %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   OK      -       -       256K    1862.62   OFF    ON

Port   Status   Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK       u0     931.51 GB   1953525168    WD-WCAW1
p1     OK       u0     931.51 GB   1953525168    WD-WCAW2
p2     OK       u0     931.51 GB   1953525168    WD-WCAW3
p3     OK       u0     931.51 GB   1953525168    WD-WCAW4
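To spot problems in that output without reading it by eye, you can filter it. The following is a minimal sketch, assuming the column layout shown above (unit status in the third column of `u` lines, port status in the second column of `p` lines); `tw_bad_entries` is a hypothetical name. It treats every status other than OK as a problem, so states that may be harmless, such as a port reporting NOT-PRESENT for an empty bay, might need to be whitelisted.

```shell
# Hypothetical helper: read `tw_cli /c0 show` output on stdin and print every
# unit or port whose status is not OK. Prints nothing when all are OK.
tw_bad_entries() {
  awk '
    # Unit lines look like: u0  RAID-10  OK  ...  (status is field 3)
    /^u[0-9]+ / && $3 != "OK" { print "unit " $1 ": " $3 }
    # Port lines look like: p0  OK  u0  ...      (status is field 2)
    /^p[0-9]+ / && $2 != "OK" { print "port " $1 ": " $2 }
  '
}
```

Usage: `tw_cli /c0 show | tw_bad_entries`. This runs tw_cli in batch mode, passing the command on the command line instead of typing it at the `//hostname>` prompt.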
Next, add the RAID check to Nagios through NRPE using the check_3ware_raid_1_1 plugin script.
Make the script executable and move it to the Nagios plugins folder:
# chmod +x check_3ware_raid_1_1
# mv check_3ware_raid_1_1 /usr/lib/nagios/plugins/check_raid
The plugins folder is /usr/lib/nagios/plugins on 32-bit systems and /usr/lib64/nagios/plugins on 64-bit systems.
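Rather than remembering which folder applies, you can let the shell find it. This is a minimal sketch with a hypothetical `find_plugin_dir` function that prints the first existing directory from a list of candidates, defaulting to the locations mentioned in this article; it assumes only one of them exists on the machine.

```shell
# Hypothetical helper: print the first existing Nagios plugin directory.
# With no arguments it tries the common locations mentioned in this article.
find_plugin_dir() {
  [ "$#" -gt 0 ] || set -- /usr/lib64/nagios/plugins /usr/lib/nagios/plugins /usr/local/nagios/libexec
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  echo "no Nagios plugin directory found" >&2
  return 1
}
```

Usage: `dir=$(find_plugin_dir) && install -m 0755 check_3ware_raid_1_1 "$dir/check_raid"`. Here `install -m 0755` copies the file and makes it executable in one step; remember to use the same path in the NRPE command later.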
Check that the script works properly by running it manually:
# /usr/lib/nagios/plugins/check_raid
RAID OK: All arrays OK [1 array checked on 1 controller]
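Nagios does not use the text a plugin prints to decide the service state; it uses the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). When testing by hand it helps to look at both. The helper below is a small hypothetical sketch that turns an exit code into the state name; it maps any other value to UNKNOWN.

```shell
# Hypothetical helper: translate a Nagios plugin exit code into a state name.
# 0 = OK, 1 = WARNING, 2 = CRITICAL; anything else is shown as UNKNOWN here.
nagios_state() {
  case "$1" in
    0) echo OK ;;
    1) echo WARNING ;;
    2) echo CRITICAL ;;
    *) echo UNKNOWN ;;
  esac
}
```

Usage: `/usr/lib/nagios/plugins/check_raid; nagios_state $?` prints the plugin message followed by the state Nagios would assign.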
Now connect to the remote machine and copy the script to the plugin directory, “/usr/local/nagios/libexec/” by default. If you changed your installation path, copy the script accordingly, and make sure the path in the NRPE command below points to the same location.
Open your NRPE configuration file:
# vi /etc/nagios/nrpe.cfg
Or
# vi /usr/local/nagios/nrpe.cfg
Add the following line to the file:
command[check_raid]=sudo /usr/lib/nagios/plugins/check_raid
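If you manage several hosts, it is handy to make this step safe to repeat. The sketch below is a hypothetical `add_nrpe_command` function that appends a `command[NAME]=...` line only when no command with that name is already defined in the given file. It only inspects that one file, so definitions pulled in through include directives are not detected.

```shell
# Hypothetical helper: append an NRPE command definition to a config file only
# if a command with that name is not already defined there.
# Usage: add_nrpe_command CFG NAME COMMAND_LINE
add_nrpe_command() {
  cfg=$1 name=$2 cmdline=$3
  if grep -q "^command\[$name\]=" "$cfg"; then
    echo "command[$name] already defined in $cfg" >&2
    return 0
  fi
  printf 'command[%s]=%s\n' "$name" "$cmdline" >> "$cfg"
}
```

Usage: `add_nrpe_command /etc/nagios/nrpe.cfg check_raid 'sudo /usr/lib/nagios/plugins/check_raid'`, then restart NRPE as described next.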
Restart the NRPE service:
# service nrpe restart
Shutting down Nagios NRPE daemon (nrpe):                   [  OK  ]
Starting Nagios NRPE daemon (nrpe):                        [  OK  ]
Check which user the NRPE daemon runs as:
# ps waux | grep nrpe
nagios  0.0  0.0  5092  884  ?  Ss  0:50  nrpe -c /etc/nagios/nrpe.cfg -d
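`ps waux | grep nrpe` can also match the grep process itself. A slightly more robust variant, sketched below under the assumption of a procps-style `ps` that supports `-eo user=,comm=`, selects processes by command name instead; `nrpe_user` is a hypothetical helper name.

```shell
# Hypothetical helper: read "USER COMMAND" pairs (as printed by
# `ps -eo user=,comm=`) on stdin and print each distinct user running nrpe.
nrpe_user() {
  awk '$2 == "nrpe" && !seen[$1]++ { print $1 }'
}
```

Usage: `ps -eo user=,comm= | nrpe_user` should print `nagios` for the setup shown above.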
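Note: The NRPE command above runs the script through sudo as the nagios user. If your environment does not permit the nagios user to run scripts this way, add the nagios user to the sudoers list by editing /etc/sudoers (preferably with visudo, which checks the syntax before saving); otherwise the check will fail.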
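One way to grant this is a rule limited to the single plugin rather than full sudo access. The function below is a hypothetical sketch that prints such a rule for a given user and plugin path. It also emits a `Defaults:USER !requiretty` line, because some distributions (older RHEL/CentOS releases, for example) ship sudoers with `requiretty`, which makes sudo refuse to run from a daemon like NRPE that has no terminal. Adjust both lines to your own policy.

```shell
# Hypothetical helper: print sudoers lines that let USER run one plugin as
# root without a password and without a TTY.
# Usage: nagios_sudoers_rule USER PLUGIN_PATH
nagios_sudoers_rule() {
  user=${1:?user required}
  plugin=${2:?plugin path required}
  printf 'Defaults:%s !requiretty\n' "$user"
  printf '%s ALL=(root) NOPASSWD: %s\n' "$user" "$plugin"
}
```

Usage: `nagios_sudoers_rule nagios /usr/lib/nagios/plugins/check_raid > /tmp/nagios_raid.sudoers && visudo -c -f /tmp/nagios_raid.sudoers`, then add the lines to your sudoers configuration with visudo.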
Now go back to your Nagios server terminal and edit the existing host configuration file, “/usr/local/nagios/etc/objects/REMOTEHOST.cfg” (in this case it is TEST.cfg), adding a service definition:
# vi /usr/local/nagios/etc/objects/TEST.cfg

define service{
        use                     generic-service
        host_name               TEST
        service_description     RAID
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        normal_check_interval   1
        retry_check_interval    1
        contact_groups          admins
        notification_interval   120
        notification_period     24x7
        notification_options    c,r
        check_command           check_nrpe!check_raid
        }
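Before waiting for the scheduler, you can run the same check by hand from the Nagios server. Note that `check_nrpe!check_raid` assumes a `check_nrpe` command object is already defined on the server (typically in commands.cfg). The sketch below wraps `check_nrpe -H HOST -c check_raid`; the function name `test_remote_raid` and the default path /usr/local/nagios/libexec/check_nrpe are assumptions for a source install, and the `CHECK_NRPE` variable lets you point it elsewhere.

```shell
# Hypothetical helper: run the check_raid command on a remote host through
# check_nrpe and report the exit code Nagios would see.
# CHECK_NRPE overrides the assumed check_nrpe location.
test_remote_raid() {
  host=${1:?usage: test_remote_raid HOST}
  "${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}" -H "$host" -c check_raid
  rc=$?
  echo "exit code: $rc"
  return "$rc"
}
```

Usage: `test_remote_raid TEST` should print the same `RAID OK: ...` line you saw on the remote machine, followed by `exit code: 0`.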
Save the file and exit.
Reload or restart the Nagios service:
# service nagios reload
or
# service nagios restart
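A typo in TEST.cfg is easier to catch before the reload than after. The sketch below is a hypothetical `safe_nagios_reload` function that runs `nagios -v` against the main configuration file and only reloads when the check passes. It assumes the `nagios` binary is on your PATH (on a source install it usually lives in /usr/local/nagios/bin) and that the main config is /usr/local/nagios/etc/nagios.cfg unless you pass another path.

```shell
# Hypothetical helper: verify the Nagios configuration, then reload.
# Usage: safe_nagios_reload [PATH_TO_nagios.cfg]
safe_nagios_reload() {
  cfg=${1:-/usr/local/nagios/etc/nagios.cfg}
  if nagios -v "$cfg" >/dev/null 2>&1; then
    service nagios reload
  else
    echo "Nagios config check failed; run: nagios -v $cfg" >&2
    return 1
  fi
}
```

If the check fails, running `nagios -v` by hand shows the exact file and line with the error.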
Check the result in the Nagios web console after 5 to 10 minutes.
Good luck!
Your feedback is most valuable to us.
Thanks for your wonderful support and encouragement.