Monitoring Windows Client Using Nagios
Monitoring Windows Client Using Nagios . To monitor the windows machine using Nagios we have to install the NSClient in windows Desktop / Server. NSClient++ is compatible for all windows versions such as Server and Desktop.
After installing NSClient in remote client, we have to verify the NSClient is responding from Nagios Server in order to verify the NSClient response run the below command from Nagios server.
[root@TechTutorial libexec]# pwd /usr/local/nagios/libexec [root@TechTutorial libexec]# ./check_nt -H 192.168.234.141 -s password -p 12489 -v CLIENTVERSION NSClient++ 0.4.4.15 2015-11-25
above command is confirming that NSClient is ready
Now we have to write the commands in Nagios server in order to monitor the windows services
To configure the commands edit the file called commands.cfg which is located in /usr/local/nagios/etc/objects/ (default installation) directory
[root@TechTutorial objects]# pwd /usr/local/nagios/etc/objects [root@TechTutorial objects]# vi commands.cfg
The commands to monitor the windows server and its services, what are the commands are required for you configure only that from below.
Disk space checking
below command is to check disk space utilization (.i.e. C: D: E: F: …… Z:)
define command{ command_name check_nt_disk command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v USEDDISKSPACE -l $ARG1$ -w $ARG2$ -c $ARG3$ }
Below command is to check CPU utilization
define command{ command_name check_nt_cpuload command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v CPULOAD -l $ARG1$ }
Check Server uptime
define command{ command_name check_nt_uptime command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v UPTIME }
Check NSClient installed version
define command{ command_name check_nt_clientversion command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v CLIENTVERSION }
Running process status verify using below command
define command{ command_name check_nt_process command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v PROCSTATE -l $ARG1$ }
below mentioned command will check all the windows services status as to mention in “service name” in argument1
define command{ command_name check_nt_service command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v SERVICESTATE -d SHOWALL -l $ARG1$ }
below mentioned command will check windows memory usage utilization
define command{ command_name check_nt_memuse command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v MEMUSE -w $ARG1$ -c $ARG2$ }
to check windows machine paging file usage
define command{ command_name check_nt_pagingfile command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\Paging File(_Total)\\% Usage","Paging File usage is %.2f %%" -w $ARG1$ -c $ARG2$ }
DHCP queue length verification
define command{ command_name check_nt_DHCP_queue_length command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\DHCP Server\\Conflict Check Queue Length","Waiting in DHCP Queue due to Conflict is %.f" -w 2 -c 5 }
DHCP Active queue
define command{ command_name check_nt_DHCP_active_queue_length command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\DHCP Server\\Active Queue Length","Waiting in DHCP Queue for Normal Processing is %.f" -w 15 -c 30 }
DHCP – Average response time calculation
define command{ command_name check_nt_DHCP_average_response_time command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\DHCP Server\\Milliseconds per packet (Avg)","Average DHCP Server Response in is %.f" -w 70 -c 250 }
To verify your DNS recursive query is resolving Name to IP we can check using below command
define command{ command_name check_nt_DNS_recursive_query_failures command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\DNS\\Recursive Query Failure/sec","DNS Recursive Queries are failing at %.f per second" -w 5 -c 80 }
DNS recursive query timeouts, given query to the DNS is resolving within given time
define command{ command_name check_nt_DNS_recursive_query_timeouts command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\DNS\\Recursive Query TimeOut/sec","DNS Recursive Queries are failing because Timed Out at %.f per second" -w 5 -c 80 }
DNS – Secure Update Failures
define command{ command_name check_nt_DNS_secure_update_failures command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\DNS\\Secure Update Failure","DNS Secure Update Failures since last Service Restart is %.f" -w 1 -c 15 }
DNS – Total queries received per second
define command{ command_name check_nt_DNS_total_queries_per_sec command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\WINS Server\\Failed Queries/sec","Total Queries received per second is %.f" -w 3 -c 5 }
This below command will check the User login errors after last reboot
define command{ command_name check_nt_logon_errors command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\Server\\Errors Logon","Logon Errors since last reboot is %.f" -w 50 -c 150 }
CIFS / SMB General System Errors, If you want to check errors of CIFS below command will check
define command{ command_name check_nt_SMB_general_system_errors command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\Server\\Errors System","SMB Errors due to Server problems is %.f" -w 2 -c 20 }
CIFS / SMB Blocking requests rejected
define command{ command_name check_nt_SMB_blocking_requests_rejected command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\Server\\Blocking Requests Rejected","SMB Blockiing requests rejected due to insufficient free resources is %.f Server Parameters need adjusting" -w 10 -c 100 }
CPU Load average every 10minutes, 60minutes and 24Hours
define command{ command_name check_nt_cpu_avg command_line $USER1$/check_nt -H $HOSTADDRESS$ -v CPULOAD -l 10,60,95,60,60,95,1440,60,95 }
Memory pool non paged
define command{ command_name check_nt_memory_pool_nonpaged_peak command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\Server\\Pool Nonpaged Peak","Maxium number of bytes of nonpaged pool which should be same as installed physical memory is %.f" }
Memory Pool Paged Failures
define command{ command_name check_nt_memory_pool_paged_failures command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\Server\\Pool Paged Failures","Number of times allocation from the page pool have failed is %.f Physical RAM or paging file too small" -w 2 -c 50 }
Paging File Usage 30% is warning and 60 is CRITICAL
define command{ command_name check_nt_paging_file_useage command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\Paging File(_Total)\\% Usage","Paging file usage is %.2f %%" -w 30 -c 60 }
System PTEs with 3GB switch
define command{ command_name check_nt_system_PTE_with_3GB command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\Memory\\Free System Page Table Entries","Number of Page Table Entries not being used is %.f Thresholds set for testing /3GB switch on or off" -w 8000 -c 5000 }
Registry Quota in Use (percent)
When applications such as Rdisk.exe and other Backup software are used to backup the registry, the amount of paged pool memory used by these applications are charged towards the registry’s quota. If the amount consumed reaches 95 percent of the Registry Size Limit then the warning popup mentioned above will be displayed. The warning is displayed only once for each boot cycle; which means that the popup will not be displayed until the system is rebooted, and the threshold reached again.
define command{ command_name check_nt_registry_quota_in_use command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\System\\% Registry Quota In Use","Percent Quota in use is %.2f %%" -w 60 -c 85 }
Server Queue Length
define command{ command_name check_nt_server_work_queues command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\Server Work Queues(0)\\Queue Length","Current work queue which is an indication of Processing Load is %.f " -w 4 -c 7 }
Disk Queue Length
define command{ command_name check_nt_queue_length command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\PhysicalDisk(_Total)\\Avg. Disk Queue Length","Average number of both read and write requests queued is %.2f Consider a faster disk array" -w 1 -c 5 }
Printer not ready error from last reboot
define command{ command_name check_nt_printer_not_ready command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\Print Queue(_Total)\\Out of Paper Errors","Out of Paper Printer Errors since last Service restart is %.f" -w 1 -c 3 }
Printer out of paper errors
define command{ command_name check_nt_printer_out_of_paper_errors command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\Print Queue(_Total)\\Out of Paper Errors","Out of Paper Printer Errors since last Service restart is %.f" -w 1 -c 3 }
SMTP Local delivery Queue
define command{ command_name check_nt_smtp_local_delivery command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\SMTP Server(_Total)\\Local Queue Length","Number of Messages waiting in queue for Local Recipients is %.f" -w 5 -c 15 }
SMTP Remote Delivery Queue
define command{ command_name check_nt_smtp_remote_delivery_queue command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\SMTP Server(_Total)\\Remote Queue Length","Number of Messages waiting in queue for Remote Recipients is %.f" -w 25 -c 50 }
Exchange Active users count
define command{ command_name check_nt_exchange_active_user_count command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\MSExchangeIS\\Active User Count" -w 4 -c 10 }
Exchange Connection Count
define command{ command_name check_nt_exchange_connection_count command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\MSExchangeIS\\Connection Count" -w 100 -c 250 }
Exchange Message Delivery time
define command{ command_name check_nt_exchange_delivery_time command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\MSExchangeIS(_Average Delivery Time)\\%%Usage","Average Delivery Time is %.2f%%" -w 2 -c 10 }
Exchange Maximum Users
define command{ command_name check_nt_exchange_maximum_users command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\MSExchangeIS\\Connection Count" -w 100 -c 250 }
Exchange messages delivered in Minute
define command{ command_name check_nt_exchange_messages_delivered_per_min command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "MSExchangeIS Mailbox(_Total)\\Messages Delivered/min" -w 25 -c 120 }
Exchange Messages Submitted in Minute
define command{ command_name check_nt_exchange_messages_submitted_per_min command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\MSExchangeIS Mailbox(_Total)\\Messages Submitted/min" -w 5 -c 35 }
Exchange Receive Queue
define command{ command_name check_nt_exchange_receive_queue command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\MSExchangeIS Mailbox(_Receive Queue Size)\\%%Usage","Queue Length is %.2f" -w 3 -c 15 }
Exchange Send queue
define command{ command_name check_nt_exchange_send_queue command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\MSExchangeIS(_Send Queue Size)\\%%Usage","Queue Length is $.2f%%" -w 3 -c 15 }
SQL Database Data file size (Total)
define command{ command_name check_nt_sql_database_files_size command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\SQLServer:Databases(_Total)\\Data File(s) Size (KB)","SQL Server Databases Datafile size total is %.f" -w 5 -c 30 }
SQL Database Log files size (Total)
define command{ command_name check_nt_sql_database_log_files_size command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\SQLServer:Databases(_Total)\\Log File(s) Size (KB)","SQL Server Databases Logfile size total %.f" -w 10 -c 100 }
SQL Database Data file size ( Individual)
define command{ command_name check_nt_sql_database_data_file_size_individual command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\MSSQL$SBSMONITORING:Databases(_Total)\\Data File(s) Size (KB)","SQL Server Databases Datafile size total is %.f" -w 5 -c 30 }
SQL Database Log File size (Individual)
define command{ command_name check_nt_sql_database_log_file_size_individual command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\MSSQL$SBSMONITORING:Databases(_Total)\\Log File(s) Size (KB)","SQL Server Databases Logfile size total %.f" -w 10 -c 100 }
SQL Server service status ( This we can also check using service status )
define command{ command_name check_nt_sql_server_service command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v SERVICESTATE -l mssqlserver }
SQL Database Size
define command{ command_name check_nt_sql_database_size command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\SQLServerDatabases(_Data Files() Size (KB)\\%%Usage","Databases Size is $.2f%%" -w 30000 -c 70000 }
SQL Server Deadlocks / Second
define command{ command_name check_nt_sql_server_deadlocks_per_sec command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\SQLServer:Locks(_Total)\\Number of Deadlocks/sec","SQL Server Deadlocks per second total %.f" -w 1 -c 5 }
SQL Server Connections
continuous 20 connections will give to warning and 40 connections will give you error
define command{ command_name check_nt_sql_server_connections command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\SQLServer:General Statistics\\User Connections","SQL Server Connections are %.f" -w 20 -c 40 }
SQL Database data size (Total)
define command{ command_name check_nt_sql_database_data_size command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\SQLServer:Databases(_Total)\\Data File(s) Size (KB)","SQL Server Databases Datafile size total is %.f" -w 5 -c 30 }
SQL Database Log Size (Total)
define command{ command_name check_nt_sql_database_data_size command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\SQLServer:Databases(_Total)\\Log File(s) Size (KB)","SQL Server Databases Logfile size total %.f" -w 10 -c 100 }
SQL Replication Agents (SharePoint Instance)
define command{ command_name check_nt_sql_replication_agents command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\MSSQL$Sharepoint:Replication Agents\\Running","Number of SQL Server Replication Agents running are %.f“ }
SQL Database Log size (SharePoint Instance)
define command{ command_name check_nt_sql_database_log_size_sharepoint command_line $USER1$/check_nt -H $HOSTADDRESS$ -s password -p 12489 -v COUNTER -l "\\MSSQL$Sharepoint:Databases(_Total)\\Log File(s) Size (KB)","SQL Server Databases Logfile size total %.f" -w 10 -c 30 }
Required commands we will add them to Nagios server, after adding them to commands.cfg file we have to configure the windows template and services to monitor.
Now Create a File with Windows Server name in Nagios Server
Monitoring Windows Client Using Nagios
# touch ARKIT-WINDOWS.cfg [root@TechTutorial objects]# cat ARKIT-WINDOWS.cfg ############################################################################### ############################################################################### # # HOST DEFINITIONS Tech Tutorial http://arkit.co.in # ############################################################################### ############################################################################### define host{ use windows-server ; Windows Template host_name ARKIT-WINDOWS ; ARKIT-WINDOWS machine alias Tech Tutorial Windows Server ARKIT-WINDOWS ; How much long you want you write contact_groups admins address 192.168.1.2 } ############################################################################### ############################################################################### # # SERVICE DEFINITIONS # ############################################################################### ############################################################################### #To Check Nagios Client Version define service{ use generic-service host_name ARKIT-WINDOWS service_description NSClient++ Version check_command check_nt_clientversion } ## To check Server UPTIME define service{ use generic-service host_name ARKIT-WINDOWS service_description Uptime check_command check_nt_uptime } ## To check CPULOAD define service{ use generic-service host_name ARKIT-WINDOWS service_description CPU Load check_command check_nt_cpuload!5 } # Memory Usage check define service{ use generic-service host_name ARKIT-WINDOWS service_description Memory Usage check_command check_nt_memuse!80!90 } # To check C: Drive space utilization define service{ use generic-service host_name ARKIT-WINDOWS service_description C:\ Drive Space check_command check_nt_disk!c!80!90 } define service{ use generic-service host_name ARKIT-WINDOWS service_description W3SVC check_command check_nt_service!W3SVC } ## END Config FILE ######
Place this file into your configuration directory and reload your nagios services.
If there is no configuration directory is configured then add this file path to nagios.cfg
approx 36 number line
# vi /usr/local/nagios/etc/objects/nagios.cfg cfg_file=/usr/local/nagios/etc/objects/ARKIT-WINDOWS.cfg
restart / reload nagios service
Thats it your windows server /desktop is added successfully into Nagios monitoring, go and check in Nagios server web console to services status.
Your feedback is more valuable to us… Write your comments below..
Monitoring Windows Client Using Nagios Monitoring Windows Client Using Nagios Monitoring Windows Client Using Nagios Monitoring Windows Client Using Nagios Monitoring Windows Client Using Nagios Monitoring Windows Client Using Nagios
Thanks for your wonderful Support and Encouragement