FAQ
Errors
wmic won’t compile: /usr/include/stdc-predef.h:0: Syntax error near ‘3’
Table of Contents
The plugin works on the command line, but not in Nagios
This could be so many things!
The chances are that this is not a plugin issue (i.e. a problem with the code itself), but rather a problem with the way the plugin or Nagios has been set up.
I’m only going to cover some of the areas at a high level, since there are plenty of other resources on the Internet to sort this one out. See this FAQ article for more detail.
- Make sure you can get some other checks working with some other plugin. Try a cut-down plugin with the following 3 lines of code:
#!/bin/sh
echo OK
exit 0
Then try a cut-down Perl plugin with the following 3 lines of code:
#!/usr/bin/perl
print "OK - Test\n";
exit 0;
- Run the plugin from the command line as the same user that runs the Nagios process. This should help with permissions problems.
- Set up your Nagios Service and Command definitions using the samples provided, here and here. Make sure that there are no hidden characters (e.g. spaces, tabs or carriage returns) in the wrong places in the files.
- Turn on Nagios debugging in the nagios.cfg file. Set it to dump everything with high verbosity. Then, in the debug output, look for where the plugin is run so you can see the exact command line; a few lines later you should be able to see the output of the plugin. If that is still not enough information, change your Nagios command and add -d to the end of the Check WMI Plus command line. You might want to do something like
tail -f /var/log/nagios/nagios.debug | grep -C 10 check_wmi_plus
to see the debug output.
- Try the Nagios-users mailing list
- Get a Nagios support contract 🙂
The plugin does not trigger a warning or critical status in Nagios
The most likely cause of this problem is that you have not read the --help output.
Here is the bit you want:
WARNING AND CRITICAL SPECIFICATION
If warning or critical specifications are not provided then no checking is done and the check simply returns the value and any related performance data. If they are specified then they should be formatted as shown below.
In other words, you have to specify some warning or critical criteria on the plugin commandline (-w or -c)!
Here are some examples.
Some checks return data, others don’t
Assuming that your Windows machine has the correct classes and applications/services installed then this is probably a permissions problem.
For example, if you have a Windows server which has DNS and DHCP configured but the checks only work for checkdns, then you probably have this problem.
You probably need to check your permissions as described here.
You can normally prove quickly that it is a permission problem by running the check with an administrator account. If the check succeeds with an administrator account but not with the account with reduced permissions, then you have a permission problem.
I’m getting NT_STATUS_CONNECTION_REFUSED
This is unlikely to be a plugin problem. If you run wmic directly from the command line do you get the same result? Probably.
It typically means that the WMI connection you are trying to make is being rejected, either by the Windows box or perhaps a firewall in-between. Often firewalls simply drop packets and so you may just get a timeout instead of the rejection message.
Sébastien Maury reports that the following ports are required to ensure successful WMI access:
Nagios --> Windows : 135 (TCP) - RPC
Nagios --> Windows : 49125 to 65535 (UDP/TCP) - RPC
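To tell a rejection apart from silently dropped packets, it can help to probe TCP port 135 from the Nagios server. The following is only a minimal sketch and is not part of Check WMI Plus: the function name probe_tcp and its return strings are made up for illustration, it tests TCP only (not UDP), and a successful connection says nothing about whether WMI itself will work.

```python
import socket

def probe_tcp(host, port, timeout=3.0):
    """Return 'open', 'refused', 'timeout' or 'error' for one TCP connect attempt.

    probe_tcp is a hypothetical helper for illustration only.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The target (or a firewall) actively rejected the connection
        return "refused"
    except socket.timeout:
        # Nothing came back, which is typical when packets are silently dropped
        return "timeout"
    except OSError:
        # Anything else, such as a host name that does not resolve
        return "error"
```

For example, probe_tcp("HOST", 135) returning "refused" points at an active rejection, while "timeout" suggests packets are being dropped along the way.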
I’m getting “UNKNOWN – The WMI query had problems. The error text from wmic is:”
This is a generic error message for problems encountered with the wmic command.
The plugin adds some English text to common/known wmic problems, but if wmic throws some other error then the plugin can’t do anything about it. In that case you will need to troubleshoot the wmic command yourself and ideally report the solution to us so that we can add a useful error message to the plugin.
Use the -d parameter to see the wmic command used and then test directly with the wmic command. If you also use the -z parameter, the complete wmic command, including your user/password details, will be in the debug output.
Normally, you’d try different hosts, usernames, passwords, operating system versions etc to try and find the common denominator to help you troubleshoot.
As an example, if wmic returns:
[librpc/rpc/dcerpc_connect.c:329:dcerpc_pipe_connect_ncacn_ip_tcp_recv()] failed NT status (c00000b5) in dcerpc_pipe_connect_ncacn_ip_tcp_recv
[librpc/rpc/dcerpc_connect.c:790:dcerpc_pipe_connect_b_recv()] failed NT status (c00000b5) in dcerpc_pipe_connect_b_recv
[wmi/wmic.c:196:main()] ERROR: Login to remote object.
NTSTATUS: NT_STATUS_IO_TIMEOUT - NT_STATUS_IO_TIMEOUT
We know that this relates to a hostname that probably does not resolve via DNS.
But if we get an error like:
[librpc/rpc/dcerpc_connect.c:329:dcerpc_pipe_connect_ncacn_ip_tcp_recv()] failed NT status (c00000c4) in dcerpc_pipe_connect_ncacn_ip_tcp_recv
[librpc/rpc/dcerpc_connect.c:790:dcerpc_pipe_connect_b_recv()] failed NT status (c00000c4) in dcerpc_pipe_connect_b_recv
We have no idea what this error really means and what causes it.
Could not correctly parse “customfield” definition in ini file
This happens when an ini file check contains an incorrectly formatted “customfield” definition. Correct the format.
Could not correctly parse “createlist” definition in ini file
This happens when an ini file check contains an incorrectly formatted “createlist” definition. Correct the format.
Checkfileage does not work when file names use {}
If you have a file name containing the curly braces characters {} then checkfileage will not work.
For example, testing the file age of the following file:
C:/Program Files/HP/{E94E150C-762B-4cd1-8A54-7228A07C0710}/scrubber.exe
check_wmi_plus.pl -H HOST -u USER -p PASS -m checkfileage -a "C:/Program Files/HP/{E94E150C-762B-4cd1-8A54-7228A07C0710}/scrubber.exe"
Results in:
UNKNOWN - Could not find the file C:/Program Files/HP/{E94E150C-762B-4cd1-8A54-7228A07C0710}/scrubber.exe
If you look at the debug output (add -d to the command line) you will notice that the actual WMI query used is:
Select name,lastmodified from CIM_DataFile where name="C:\\Program Files\\HP\\\\scrubber.exe"
Notice that the {E94E150C-762B-4cd1-8A54-7228A07C0710} part of the file name is missing entirely.
This is because Check WMI Plus uses { } internally to signify parameter substitution. That is, the WMI query in the code is written to use command line argument 1 as the filename. It is specified in the WMI query as {_arg1}, and a regular expression is used to find such placeholders in the WMI query and substitute the command line parameters into place before execution. So when {E94E150C-762B-4cd1-8A54-7228A07C0710} is specified in the filename, Check WMI Plus looks for a parameter “E94E150C-762B-4cd1-8A54-7228A07C0710”, which returns nothing, so “{E94E150C-762B-4cd1-8A54-7228A07C0710}” is replaced with nothing, i.e. deleted.
Check WMI Plus variable substitution is used extensively within ini files. It is documented in samples.ini.
Fortunately there is a workaround you can do to make this all work. Ironically, it is actually using parameter substitution.
For the above example, if we add a third command line parameter (we can do this since it’s not used for anything else by checkfileage) containing part of the filename, we can substitute this into the filename we are looking for. Here’s the example to help explain it:
Change the filename we are looking for to use a command line substitution variable for arg3.
Then set arg3 to be the missing part of the file name, {E94E150C-762B-4cd1-8A54-7228A07C0710}.
Like this:
check_wmi_plus.pl -H HOST -u USER -p PASS -m checkfileage -a "C:/Program Files/HP/{_arg3}/scrubber.exe" -3 "{E94E150C-762B-4cd1-8A54-7228A07C0710}"
Now before the WMI query runs, {_arg3} is replaced by {E94E150C-762B-4cd1-8A54-7228A07C0710} and we get our file name back!
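To make the effect easier to see, here is a toy model in Python. It is not the plugin's Perl code: the two-pass substitution, the TOKEN pattern and the function names are assumptions chosen only because they reproduce the behaviour described above, and the conversion of / to \\ in the path is left out.

```python
import re

# Assumed placeholder syntax: anything between a pair of curly braces
TOKEN = re.compile(r"\{([^{}]*)\}")

def substitute_once(text, params):
    """Replace each {name} with params[name]; unknown names become ''."""
    return TOKEN.sub(lambda m: params.get(m.group(1), ""), text)

def build_query(template, params, passes=2):
    """Apply the single-pass substitution `passes` times (assumed model)."""
    for _ in range(passes):
        template = substitute_once(template, params)
    return template

TEMPLATE = 'Select name,lastmodified from CIM_DataFile where name="{_arg1}"'
GUID = "{E94E150C-762B-4cd1-8A54-7228A07C0710}"

# File name passed directly: the braces are treated as an unknown parameter
broken = build_query(TEMPLATE, {
    "_arg1": "C:/Program Files/HP/" + GUID + "/scrubber.exe",
})

# Workaround: the braces arrive via {_arg3} and survive the last pass
fixed = build_query(TEMPLATE, {
    "_arg1": "C:/Program Files/HP/{_arg3}/scrubber.exe",
    "_arg3": GUID,
})

print(broken)
print(fixed)
```

In this model the first query loses the {…} part of the path, while the second keeps it because the text inserted for {_arg3} is not scanned again.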
Checkcpu Permission issues
For some reason obtaining CPU utilisation information via WMI can be somewhat trickier than any other WMI data.
The bottom line is that, if it works using an enabled administrator account, then you have a permissions problem.
Some of the following may work:
- Add the monitoring user to the performance log group
- Enable the administrator account (normally only applies to Windows desktop Operating Systems)
Also, try the following FAQ article: How do I setup the Windows User for wmic
**ePN /nagios-plugins/check_wmi_plus.pl: “Use of uninitialized value $opt in string eq at /usr/share/perl/5.x/Getopt/Long.pm line 487”
If you get the above error message or a very similar one from Getopt::Long then there is a workaround you can apply to the Getopt module itself.
Locate Long.pm on your system. It might be /usr/share/perl5/Getopt/Long.pm or similar.
Edit it and locate the code (which will be around the line number that the error message says):
while ( $goon && @$argv > 0 ) {
# Get next argument.
$opt = shift (@$argv);
print STDERR ("=> arg \"", $opt, "\"\n") if $debug;
Try modifying it to this:
while ( $goon && @$argv > 0 ) {
# Get next argument.
$opt = shift (@$argv) || '';
print STDERR ("=> arg \"", $opt, "\"\n") if $debug;
You are basically just adding || '' in the place shown (that is, 2 pipe characters followed by 2 single quotes).
Just be aware that if you apply this workaround then every time you upgrade Perl or whatever other package contains the Getopt module then you will need to reapply the fix.
Examples
Regular Expression
Check WMI Plus uses regular expressions for many of its arguments/parameters. These are very powerful “search strings”. If the --help output says that an argument uses regular expressions then you have a very powerful tool for finding exactly what you want.
Check WMI Plus uses Perl Regular Expressions.
Regular expressions can be used to search for one very precise item and even multiple very precise items in a single definition.
Common/Basic regular expression characters:
. | Any single character |
* | Zero or more of the preceding character |
+ | One or more of the preceding character |
| | Used for OR |
[] | Used for a group of characters |
^ | Used to mark the start of a string |
$ | Used to mark the end of a string |
There are many more regular expression characters available, read about them all here.
Below are some examples:
check_wmi_plus.pl -H win0 -u administrator -p parallel12 -m checkdrivesize --noishowusage -a REGEX
where REGEX is in the table below
REGEX | Description |
. | Matches all drives, since . matches any single character, and all drives have at least one character |
^C | Matches any drives starting with “C” |
C | Matches any drives containing “C” |
C$ | Matches any drives ending in “C” |
C|D | Matches any drives containing “C” or “D” |
C* | Matches any drives containing zero or more “C” characters, so in practice it matches every drive |
C+ | Matches any drives containing at least one “C” |
[CD] | Matches any drives containing “C” or “D”. This seems similar to C|D, and it is, but it can be used in the place of a single character e.g. DATA[123] would match DATA1 or DATA2 or DATA3 |
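The patterns in the table can be tried out without a Windows host. The sketch below uses Python's re module rather than Perl, which behaves the same way for these basic patterns. The list of names is made up, and the case-sensitive "search anywhere in the name" matching is an assumption about how the plugin applies the expression.

```python
import re

# Hypothetical drive/volume names, for illustration only
names = ["C:", "D:", "E:", "DATA1", "DATA3", "ABC"]

def matching(pattern, candidates):
    """Return the candidates in which the pattern is found anywhere."""
    return [name for name in candidates if re.search(pattern, name)]

for pattern in [".", "^C", "C", "C$", "C|D", "C*", "C+", "DATA[123]"]:
    print(f"{pattern:10} -> {matching(pattern, names)}")
```

Note how C* selects every name, because “zero or more C” is satisfied even by a name with no C in it.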
Can you show me some example warning/critical criteria?
The checks shown on this page are generated using the --iexample=2 parameter.
A valid -H HOSTNAME -u USER and -p PASSWORD must also be passed on the command line.
The examples are run against a test machine (which is not very busy) running Windows Server 2008 R2, IIS v7, SQL Express 2008 and Exchange 2010.
The Theory
Check WMI Plus follows the standard Nagios Plugin warning/critical range definitions for checking values. Read the --help output for details. You want to look at the section titled “WARNING AND CRITICAL SPECIFICATION”.
The generalised Nagios format for ranges is [@]start:end.
Check WMI Plus adds the ability to check against many different values at the same time on the command line. This makes the generalised format for ranges FIELD=[@]start:end.
Check WMI Plus also adds the ability to specify “multipliers” to the numbers that define the range. Multipliers such as m (for Mega) and g (for Giga) can be used. The complete list is shown in --help.
The range format generates warnings/criticals if the value is outside the range defined by start:end. If an @ is specified, the warning/critical is generated if the value is inside the range.
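The range rules can be sketched in a few lines of Python. This is an illustration based on the general Nagios plugin range conventions, not the plugin's own code: the function names are made up, multipliers are not handled, and the ~ (negative infinity) form is the standard Nagios one, which is assumed rather than confirmed for Check WMI Plus.

```python
def split_field(criteria, default_field):
    """Split 'FIELD=range' into (field, range); use the default if no field is given."""
    field, sep, spec = criteria.rpartition("=")
    return (field, spec) if sep else (default_field, criteria)

def parse_range(spec):
    """Parse '[@]start:end' into (inside, start, end)."""
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        start_text, end_text = spec.split(":", 1)
    else:
        # A bare number such as '10' is short for '0:10'
        start_text, end_text = "0", spec
    start = float("-inf") if start_text == "~" else float(start_text or 0)
    end = float("inf") if end_text == "" else float(end_text)
    return inside, start, end

def is_triggered(spec, value):
    """True if the value should raise the warning/critical for this range."""
    inside, start, end = parse_range(spec)
    in_range = start <= value <= end
    # Without @ we alert outside the range, with @ we alert inside it
    return in_range if inside else not in_range

print(split_field("_NumGood=4:", "_NumBad"))
print(is_triggered("4:", 2))    # 2 is below the range 4..infinity
print(is_triggered("@1:1", 4))  # 4 is not inside the range 1..1
```

The same helper gives is_triggered("0", 4) as True and is_triggered("2:4", 2) as False, which is the behaviour of -w 0 and -w _NumGood=2:4 in the examples on this page.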
Let’s Start Easy
Some of the following commands need at least 2 WMI data samples. If the command output shows “Collecting first WMI sample because the previous state data file (/tmp/cwpss_somefilename.state) contained no data. Results will be shown the next time the plugin runs.” then you need to run the command a second time to see the output.
The plugin output is colour coded as follows:
Plugin display output
Warning/Critical trigger information
Performance Data
Basic SQL Server Service Check with no Criteria
This check just lists the services and their state. Some are OK and some have problems.
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkservice -a sql
Output : OK - Found 6 Services(s), 2 OK and 4 with problems (0 excluded). 'Microsoft Search (Exchange)' (msftesql-Exchange) is Stopped, 'SQL Server (SQLEXPRESS)' (MSSQL$SQLEXPRESS) is Running, 'SQL Active Directory Helper Service' (MSSQLServerADHelper100) is Stopped, 'SQL Server Agent (SQLEXPRESS)' (SQLAgent$SQLEXPRESS) is Stopped, 'SQL Server Browser' (SQLBrowser) is Running, 'SQL Server VSS Writer' (SQLWriter) is Stopped.|'Total Service Count'=6; 'Service Count OK State'=2; 'Service Count Problem State'=4; 'Excluded Service Count'=0;
Checkservice defines a “good” service as one that is running and a “bad” service as one that is not running.
Let’s Add Some Warning Criteria
If you take a look at the --help output for checkservice you can see that the only Valid Warning/Critical Fields are:
_NumBad (Default), _NumGood, _NumExcluded, _Total.
So, if we want to warn if there is any service that is bad then we would use the following:
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkservice -a sql -w 0
Output : WARNING - [Triggered by _NumBad>0] - Found 6 Services(s), 2 OK and 4 with problems (0 excluded). 'Microsoft Search (Exchange)' (msftesql-Exchange) is Stopped, 'SQL Server (SQLEXPRESS)' (MSSQL$SQLEXPRESS) is Running, 'SQL Active Directory Helper Service' (MSSQLServerADHelper100) is Stopped, 'SQL Server Agent (SQLEXPRESS)' (SQLAgent$SQLEXPRESS) is Stopped, 'SQL Server Browser' (SQLBrowser) is Running, 'SQL Server VSS Writer' (SQLWriter) is Stopped.|'Total Service Count'=6; 'Service Count OK State'=2; 'Service Count Problem State'=4;0; 'Excluded Service Count'=0;
-w 0 is a short form of the full range specification 0:0, which defines a range from 0 to 0. If the number of “bad” services is outside this range then we get a warning state. So, if we find even 1 “bad” service we will get a warning.
Since we have not specified _NumBad, _NumGood etc, the default of _NumBad is used. So the warning criteria of -w 0 says that the plugin should go to warning state if it finds more than zero services in a “bad” state.
Now if we want to warn if we find more than zero “good” services we would have to tell the warning criteria that it should apply to the _NumGood value like this:
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkservice -a sql -w _NumGood=0
Output : WARNING - [Triggered by _NumGood>0] - Found 6 Services(s), 2 OK and 4 with problems (0 excluded). 'Microsoft Search (Exchange)' (msftesql-Exchange) is Stopped, 'SQL Server (SQLEXPRESS)' (MSSQL$SQLEXPRESS) is Running, 'SQL Active Directory Helper Service' (MSSQLServerADHelper100) is Stopped, 'SQL Server Agent (SQLEXPRESS)' (SQLAgent$SQLEXPRESS) is Stopped, 'SQL Server Browser' (SQLBrowser) is Running, 'SQL Server VSS Writer' (SQLWriter) is Stopped.|'Total Service Count'=6; 'Service Count OK State'=2;0; 'Service Count Problem State'=4; 'Excluded Service Count'=0;
Given the above states, if we wanted to warn if there were fewer than 4 “good” services we have to use a slightly different form of the range definition. We want to define a range that is from 4 to infinity, so that if we are outside this we get a warning. 4: defines a range like this.
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkservice -a sql -w _NumGood=4:
Output : WARNING - [Triggered by _NumGood<4] - Found 6 Services(s), 2 OK and 4 with problems (0 excluded). 'Microsoft Search (Exchange)' (msftesql-Exchange) is Stopped, 'SQL Server (SQLEXPRESS)' (MSSQL$SQLEXPRESS) is Running, 'SQL Active Directory Helper Service' (MSSQLServerADHelper100) is Stopped, 'SQL Server Agent (SQLEXPRESS)' (SQLAgent$SQLEXPRESS) is Stopped, 'SQL Server Browser' (SQLBrowser) is Running, 'SQL Server VSS Writer' (SQLWriter) is Stopped.|'Total Service Count'=6; 'Service Count OK State'=2;4; 'Service Count Problem State'=4; 'Excluded Service Count'=0;
If we wanted to warn when we found less than 2 “good” services OR more than 4 “good” services we need to define a range of 2 to 4. This is done like 2:4. The command is:
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkservice -a sql -w _NumGood=2:4
Output : OK - Found 6 Services(s), 2 OK and 4 with problems (0 excluded). 'Microsoft Search (Exchange)' (msftesql-Exchange) is Stopped, 'SQL Server (SQLEXPRESS)' (MSSQL$SQLEXPRESS) is Running, 'SQL Active Directory Helper Service' (MSSQLServerADHelper100) is Stopped, 'SQL Server Agent (SQLEXPRESS)' (SQLAgent$SQLEXPRESS) is Stopped, 'SQL Server Browser' (SQLBrowser) is Running, 'SQL Server VSS Writer' (SQLWriter) is Stopped.|'Total Service Count'=6; 'Service Count OK State'=2;4; 'Service Count Problem State'=4; 'Excluded Service Count'=0;
If we just wanted to warn if we find more than 10 services matching “sql” then we need to use the _Total field:
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkservice -a sql -w _Total=10
Output : OK - Found 6 Services(s), 2 OK and 4 with problems (0 excluded). 'Microsoft Search (Exchange)' (msftesql-Exchange) is Stopped, 'SQL Server (SQLEXPRESS)' (MSSQL$SQLEXPRESS) is Running, 'SQL Active Directory Helper Service' (MSSQLServerADHelper100) is Stopped, 'SQL Server Agent (SQLEXPRESS)' (SQLAgent$SQLEXPRESS) is Stopped, 'SQL Server Browser' (SQLBrowser) is Running, 'SQL Server VSS Writer' (SQLWriter) is Stopped.|'Total Service Count'=6;10; 'Service Count OK State'=2; 'Service Count Problem State'=4; 'Excluded Service Count'=0;
Warning If Inside the Range
So far we have specified ranges and warned if we go outside these ranges. Sometimes we need to warn when inside a range. For example, if we only want to warn if there is exactly 1 “bad” service (so less than 1 is ok and more than 1 is ok) then we have to specify a range of 1 to 1 and warn if we are inside this range. To do this we have to invert the range specification using the @.
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkservice -a sql -w @1:1
Output : OK - Found 6 Services(s), 2 OK and 4 with problems (0 excluded). 'Microsoft Search (Exchange)' (msftesql-Exchange) is Stopped, 'SQL Server (SQLEXPRESS)' (MSSQL$SQLEXPRESS) is Running, 'SQL Active Directory Helper Service' (MSSQLServerADHelper100) is Stopped, 'SQL Server Agent (SQLEXPRESS)' (SQLAgent$SQLEXPRESS) is Stopped, 'SQL Server Browser' (SQLBrowser) is Running, 'SQL Server VSS Writer' (SQLWriter) is Stopped.|'Total Service Count'=6; 'Service Count OK State'=2; 'Service Count Problem State'=4;1; 'Excluded Service Count'=0;
Multiple Warning Criteria
You can specify multiple criteria at the same time. For example, if we wanted to go to a warning state if there were any “bad” services (-w 0), fewer than 4 “good” services (-w _NumGood=4:) or more than 5 total services (-w _Total=5) we just add them all to the command line at the same time:
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkservice -a sql -w 0 -w _NumGood=4: -w _Total=5
Output : WARNING - [Triggered by _NumBad>0,_NumGood<4,_Total>5] - Found 6 Services(s), 2 OK and 4 with problems (0 excluded). 'Microsoft Search (Exchange)' (msftesql-Exchange) is Stopped, 'SQL Server (SQLEXPRESS)' (MSSQL$SQLEXPRESS) is Running, 'SQL Active Directory Helper Service' (MSSQLServerADHelper100) is Stopped, 'SQL Server Agent (SQLEXPRESS)' (SQLAgent$SQLEXPRESS) is Stopped, 'SQL Server Browser' (SQLBrowser) is Running, 'SQL Server VSS Writer' (SQLWriter) is Stopped.|'Total Service Count'=6;5; 'Service Count OK State'=2;4; 'Service Count Problem State'=4;0; 'Excluded Service Count'=0;
When there are multiple criteria specified, if any one of them is triggered, the plugin will exit with the appropriate warning/critical state.
The result of that last command output leads us to the next section:
What Triggered my Warning?
If the check has been defined properly, then it will show which field and which range triggered the warning.
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkservice -a sql -w 0 -w _NumGood=4: -w _Total=5
Output : WARNING - [Triggered by _NumBad>0,_NumGood<4,_Total>5] - Found 6 Services(s), 2 OK and 4 with problems (0 excluded). 'Microsoft Search (Exchange)' (msftesql-Exchange) is Stopped, 'SQL Server (SQLEXPRESS)' (MSSQL$SQLEXPRESS) is Running, 'SQL Active Directory Helper Service' (MSSQLServerADHelper100) is Stopped, 'SQL Server Agent (SQLEXPRESS)' (SQLAgent$SQLEXPRESS) is Stopped, 'SQL Server Browser' (SQLBrowser) is Running, 'SQL Server VSS Writer' (SQLWriter) is Stopped.|'Total Service Count'=6;5; 'Service Count OK State'=2;4; 'Service Count Problem State'=4;0; 'Excluded Service Count'=0;
Notice how the plugin output includes the red text: [Triggered by FIELD~VALUE]
This lists all the warning criteria that were triggered.
This becomes more important when there are multiple fields of data in the output and/or multiple warning/critical criteria.
Multipliers
Checkservice is not the best example to use here (since the numbers are so low), but we will do it anyway, so that we keep the example checks consistently using checkservice and “sql”.
Let’s say we wanted to check if we had more than 1000 “bad” services. We could use -w 1000 or we could use the “k” multiplier: -w 1k. In order to ensure that “k” means 1000 we use the command line parameter --bytefactor=1000.
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkservice -a sql -w 1k --bytefactor=1000
Output : OK - Found 6 Services(s), 2 OK and 4 with problems (0 excluded). 'Microsoft Search (Exchange)' (msftesql-Exchange) is Stopped, 'SQL Server (SQLEXPRESS)' (MSSQL$SQLEXPRESS) is Running, 'SQL Active Directory Helper Service' (MSSQLServerADHelper100) is Stopped, 'SQL Server Agent (SQLEXPRESS)' (SQLAgent$SQLEXPRESS) is Stopped, 'SQL Server Browser' (SQLBrowser) is Running, 'SQL Server VSS Writer' (SQLWriter) is Stopped.|'Total Service Count'=6; 'Service Count OK State'=2; 'Service Count Problem State'=4;1000; 'Excluded Service Count'=0;
If we wanted “k” to mean 1024 then we use the command line parameter --bytefactor=1024 or leave it out, since that is the default setting.
This one generates a warning if there are fewer than 1024 “bad” services (no field is named, so the default _NumBad is used):
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkservice -a sql -w 1k:
Output : WARNING - [Triggered by _NumBad<1k] - Found 6 Services(s), 2 OK and 4 with problems (0 excluded). 'Microsoft Search (Exchange)' (msftesql-Exchange) is Stopped, 'SQL Server (SQLEXPRESS)' (MSSQL$SQLEXPRESS) is Running, 'SQL Active Directory Helper Service' (MSSQLServerADHelper100) is Stopped, 'SQL Server Agent (SQLEXPRESS)' (SQLAgent$SQLEXPRESS) is Stopped, 'SQL Server Browser' (SQLBrowser) is Running, 'SQL Server VSS Writer' (SQLWriter) is Stopped.|'Total Service Count'=6; 'Service Count OK State'=2; 'Service Count Problem State'=4;1024; 'Excluded Service Count'=0;
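The arithmetic behind the multipliers can be sketched as follows. This is a simplified illustration, not the plugin's code: the function name expand is made up, only the k, m and g suffixes mentioned on this page are handled, and lower-case suffixes are assumed.

```python
def expand(number, bytefactor=1024):
    """Turn '1k', '1500m' or '2g' into a plain number using the given bytefactor."""
    powers = {"k": 1, "m": 2, "g": 3}  # only the suffixes named on this page
    suffix = number[-1]
    if suffix in powers:
        return float(number[:-1]) * bytefactor ** powers[suffix]
    return float(number)

print(expand("1k", bytefactor=1000))
print(expand("1k"))
print(expand("1500m"))
```

With the default bytefactor of 1024, 1500m works out to 1572864000 and 2g to 2147483648.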
What About Critical?
So far we have only generated warning states. That’s because we have only used -w. If you want to specify critical criteria use -c. If you have both -w and -c on the same command line and the critical criteria and the warning criteria are both triggered, the plugin will exit with a critical state.
For example, if we wanted to go to a critical state if there were any “bad” services (-c 0), but only a warning state if there were fewer than 4 “good” services (-w _NumGood=4:) or more than 5 total services (-w _Total=5) we do:
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkservice -a sql -c 0 -w _NumGood=4: -w _Total=5
Output : CRITICAL - [Triggered by _NumBad>0] - Found 6 Services(s), 2 OK and 4 with problems (0 excluded). 'Microsoft Search (Exchange)' (msftesql-Exchange) is Stopped, 'SQL Server (SQLEXPRESS)' (MSSQL$SQLEXPRESS) is Running, 'SQL Active Directory Helper Service' (MSSQLServerADHelper100) is Stopped, 'SQL Server Agent (SQLEXPRESS)' (SQLAgent$SQLEXPRESS) is Stopped, 'SQL Server Browser' (SQLBrowser) is Running, 'SQL Server VSS Writer' (SQLWriter) is Stopped.|'Total Service Count'=6;5; 'Service Count OK State'=2;4; 'Service Count Problem State'=4;0; 'Excluded Service Count'=0;
In this case only the critical triggers are listed.
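The precedence rule can be written down as a tiny Python sketch. The exit codes 0, 1 and 2 are the standard Nagios plugin codes; the function name and the way triggers are passed in as lists of strings are assumptions made for illustration, not the plugin's internals.

```python
# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL = 0, 1, 2

def overall_state(warning_triggers, critical_triggers):
    """Return (exit_code, label, triggers_to_report); critical wins over warning."""
    if critical_triggers:
        return CRITICAL, "CRITICAL", critical_triggers
    if warning_triggers:
        return WARNING, "WARNING", warning_triggers
    return OK, "OK", []

# Both warning criteria and the critical criterion fired, as in the example above
code, label, shown = overall_state(["_NumGood<4", "_Total>5"], ["_NumBad>0"])
print(f"{label} - [Triggered by {','.join(shown)}]")
```

Because the critical list is checked first, the warning triggers are left out of the report whenever a critical criterion has fired.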
Can you show me some example command lines?
The checks shown on this page are generated using the --iexample=1 parameter.
A valid -H HOSTNAME -u USER and -p PASSWORD must also be passed on the command line.
The examples are run against a machine (which is not very busy) running Windows Server 2008 R2, IIS v7, SQL Express 2008 and Exchange 2010.
Let’s Start Easy
Show the version and basic command line help
Command: check_wmi_plus.pl -H HOST -u USER -p PASS --version
Output : Version: 1.51
Some of the following commands need at least 2 WMI data samples. If the command output shows “Collecting first WMI sample because the previous state data file (/tmp/cwpss_somefilename.state) contained no data. Results will be shown the next time the plugin runs.” then you need to run the command a second time to see the output.
The plugin output is colour coded as follows:
Plugin display output
Warning/Critical trigger information
Performance Data
Check CPU utilisation
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkcpu
Output : OK (Sample Period 58 sec) - Average CPU Utilisation 3.10%|'Avg CPU Utilisation'=3.10%;
If you take a look at the --help output for checkcpu you can see that the only valid Warning/Critical Field is _AvgCPU, so you do not even need to specify it. So the command for going warning above 1% and critical above 90% is:
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkcpu -w 1 -c 90
Output : WARNING (Sample Period 0 sec) - [Triggered by _AvgCPU>1] - Average CPU Utilisation 7.69%|'Avg CPU Utilisation'=7.69%;1;90;
The Built-in Checks
Check the CPU Queue
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkcpuq
Output : OK - Average CPU Queue Length 0.3 (3 points with 1 sec delay gives values: 0, 1, 0)|'Avg CPU Queue Length'=0.3;
Check the CPU Queue 5 times as fast as possible (0 seconds apart)
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkcpuq -a 5 -y 0
Output : OK - Average CPU Queue Length 0.6 (5 points with 0 sec delay gives values: 1, 0, 0, 1, 1)|'Avg CPU Queue Length'=0.6;
Check the drive size of C:
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkdrivesize -a c:
Output : OK - C: Total=99.90GB, Used=13.09GB (13.1%), Free=86.81GB (86.9%) |'C: Space'=13.09GB; 'C: Utilisation'=13.1%;
Check the drive size of all drives, use volume names and include an overall total
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkdrivesize -o 1 -3 1
Output : OK - Overall Disk Total=99.90GB, Used=13.09GB (13.1%), Free=86.81GB (86.9%) |'Overall Disk Space'=13.09GB; 'Overall Disk Utilisation'=13.1%;
Check the system event log for the last 1 hour for errors
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkeventlog
Output : OK - 0 event(s) of at least Severity Level "Error", were recorded in the last 1 hours from the System Event Log.|'Event Count'=0;
Check the Application event log for warnings and errors (hence the -o 2) for the past 4 hours
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkeventlog -a application -o 2 -3 4
Output : OK - 0 event(s) of at least Severity Level "Warning", were recorded in the last 4 hours from the application Event Log.|'Event Count'=0;
Check the file age of c:/pagefile.sys and warn if it is older than 10 minutes, go critical if it is less than 30 minutes old
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkfileage -a c:/pagefile.sys -w 10min -c 30min:
Output : WARNING - [Triggered by _FileAge>10min] - Age of File c:/pagefile.sys is 1236 days 09:00:26 (1780380min) or 29673.01hr(s).|'c:/pagefile.sys Age'=29673.01hr;0.166666666666667;0.5;
Check the size of c:/pagefile.sys, warn if it is more than 1500MB and go critical if larger than 2GB
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkfilesize -a c:/pagefile.sys -w 1500m -c 2g
Output : OK - File c:/pagefile.sys is 1.000GB. Found 1 instance(s).|'c:/pagefile.sys Size'=1073741824bytes;1572864000;2147483648; 'File Count'=1;
Show the size of the files in c:/ (do not include sub directories)
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkfoldersize -a c:/
Output : OK - Folder c:/ is 1.000GB. Found 2 files(s). (List is on next line)|'c:/ Size'=1073792186bytes; 'File Count'=2; The file(s) found are c:\pagefile.sys
c:\wmiexplorer.ps1
Check the RAM utilisation
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkmem
Output : OK - Physical Memory: Total: 0.976GB - Used: 0.897GB (92%) - Free: 0.079GB (8%)|'Physical Memory Used'=963153920Bytes; 'Physical Memory Utilisation'=92%;
Check the RAM utilisation, warn if more than 40% utilised, go critical if more than 90%
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkmem -w 40 -c 90
Output : CRITICAL - [Triggered by _MemUsed%>90] - Physical Memory: Total: 0.976GB - Used: 0.897GB (92%) - Free: 0.079GB (8%)|'Physical Memory Used'=963174400Bytes; 'Physical Memory Utilisation'=92%;40;90;
Check the RAM utilisation, warn if less than 70% is free
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkmem -w _MemFree%=70:
Output : WARNING - [Triggered by _MemFree%<70] - Physical Memory: Total: 0.976GB - Used: 0.897GB (92%) - Free: 0.079GB (8%)|'Physical Memory Used'=963178496Bytes; 'Physical Memory Utilisation'=92%;
List valid network adapters for checknetwork
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checknetwork
Output : No Network Interfaces specified. Valid Interface Names are:
Intel[R] PRO_1000 MT Network Connection, LAN0, (192.168.3.201,fe80::79ad:8819:1156:9eaf), 00:0C:29:7C:1D:BC
Intel[R] PRO_1000 MT Network Connection _2, LAN1, (10.1.0.1,fe80::905c:856d:aa31:8fff), 00:0C:29:7C:1D:C6
isatap.lambert.rd.to, , ,
Teredo Tunneling Pseudo-Interface, , ,
isatap.{53B0E612-FFAF-4126-AFC5-A5322389AD44}, , ,
Specify the -a parameter with an adapter name. Use ' ' around the adapter name.
Check the network stats for the ‘LAN0’ interface (might not be valid for your system).
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checknetwork -a LAN0
Output : OK (Sample Period 59 sec) - Number of Interfaces=1. Interface Details - OK - Interface:LAN0, IP Address:(192.168.3.201,fe80::79ad:8819:1156:9eaf), MAC Address 00:0C:29:7C:1D:BC, Speed:1.000Gbit/s, DHCPEnabled=True, Byte Send Rate=0.018MB/sec, Byte Receive Rate=5.159KB/sec, Packet Send Rate=29.000packet/sec, Packet Receive Rate=37.000packet/sec, Output Queue Length=0, Packets Received Errors=0 |'LAN0 BytesSentPersec'=18143; 'LAN0 BytesReceivedPersec'=5159; 'LAN0 PacketsSentPersec'=29; 'LAN0 PacketsReceivedPersec'=37; 'LAN0 OutputQueueLength'=0; 'LAN0 PacketsReceivedErrors'=0;
Check the size of all page files using automatic warning and critical settings
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkpage -a auto -o .
Output : Overall Status - OK. Individual Page Files Detail: OK - C:\pagefile.sys Total: 1.000GB - Used: 0.153GB (15%) - Free: 0.847GB (85%), Peak Used: 0.168GB (17%) - Peak Free: 0.832GB (83%) |'C:\pagefile.sys Page File Size'=1073741824Bytes; 'C:\pagefile.sys Used'=164626432Bytes; 'C:\pagefile.sys Utilisation'=15%; 'C:\pagefile.sys Peak Used'=180355072Bytes; 'C:\pagefile.sys Peak Utilisation'=17%;
Check for all the processes whose Name matches svchost
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkprocess -a svchost
Output : OK - Found 16 Instance(s) of "svchost" running (0 excluded). (List is on next line)|'Process Count'=16; 'Excluded Process Count'=0; The process(es) found are 15x svchost.exe, SMSvcHost.exe
Check for all the processes whose Name matches svchost, display the full Commandline and warn if there are more than 4 of them
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkprocess -a svchost -o comm -w 4
Output : WARNING - [Triggered by _ItemCount>4] - Found 16 Instance(s) of "svchost" running (0 excluded). (List is on next line)|'Process Count'=16;4; 'Excluded Process Count'=0; The process(es) found are C:\Windows\system32\svchost.exe -k DcomLaunch, C:\Windows\system32\svchost.exe -k RPCSS, C:\Windows\System32\svchost.exe -k LocalServiceNetworkRestricted, C:\Windows\system32\svchost.exe -k netsvcs, C:\Windows\system32\svchost.exe -k LocalService, C:\Windows\System32\svchost.exe -k LocalSystemNetworkRestricted, C:\Windows\system32\svchost.exe -k NetworkService, C:\Windows\system32\svchost.exe -k LocalServiceNoNetwork, C:\Windows\system32\svchost.exe -k apphost, C:\Windows\system32\svchost.exe -k DHCPServer, "C:\Windows\Microsoft.NET\Framework64\v3.0\Windows Communication Foundation\SMSvcHost.exe", C:\Windows\system32\svchost.exe -k regsvc, C:\Windows\system32\svchost.exe -k iissvcs, C:\Windows\System32\svchost.exe -k termsvcs, C:\Windows\system32\svchost.exe -k NetworkServiceNetworkRestricted, C:\Windows\system32\svchost.exe -k LocalServiceAndNoImpersonation
Check for all the processes whose Commandline matches C:/Windows/system32/svchost.exe, display the full Commandline and exclude any of them that contain the string ‘serv’
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkprocess -s comm -a C:/Windows/system32/svchost.exe -o comm -3 serv
Output : OK - Found 8 Instance(s) of "C:/Windows/system32/svchost.exe" running (7 excluded). (List is on next line)|'Process Count'=8; 'Excluded Process Count'=7; The process(es) found are C:\Windows\system32\svchost.exe -k DcomLaunch, C:\Windows\system32\svchost.exe -k RPCSS, C:\Windows\system32\svchost.exe -k netsvcs, C:\Windows\System32\svchost.exe -k LocalSystemNetworkRestricted, C:\Windows\system32\svchost.exe -k apphost, C:\Windows\system32\svchost.exe -k regsvc, C:\Windows\system32\svchost.exe -k iissvcs, C:\Windows\System32\svchost.exe -k termsvcs
Check that all automatically started services are running OK. Warn if more than zero are not OK, go critical if more than 1 is not OK
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkservice -a auto -w 0 -c 1
Output : WARNING - [Triggered by _NumBad>0] - Found 55 Services(s), 54 OK and 1 with problems (0 excluded). 'Windows Licensing Monitoring Service' (WLMS) is Stopped.|'Total Service Count'=55; 'Service Count OK State'=54; 'Service Count Problem State'=1;0;1; 'Excluded Service Count'=0;
Check all services with the string ‘windows’ in the short or long name
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkservice -a windows
Output : OK - Found 20 Services(s), 9 OK and 11 with problems (0 excluded). 'Windows Audio Endpoint Builder' (AudioEndpointBuilder) is Stopped, 'Windows Audio' (AudioSrv) is Stopped, 'Windows Event Log' (eventlog) is Running, 'Windows Font Cache Service' (FontCache) is Running, 'Windows Presentation Foundation Font Cache 3.0.0.0' (FontCache3.0.0.0) is Stopped, 'Windows CardSpace' (idsvc) is Stopped, 'Windows Firewall' (MpsSvc) is Running, 'Windows Installer' (msiserver) is Stopped, 'Windows Modules Installer' (TrustedInstaller) is Running, 'Windows Time' (W32Time) is Running, 'Windows Process Activation Service' (WAS) is Running, 'Windows Color System' (WcsPlugInService) is Stopped, 'Windows Event Collector' (Wecsvc) is Stopped, 'Windows Error Reporting Service' (WerSvc) is Stopped, 'Windows Management Instrumentation' (Winmgmt) is Running, 'Windows Remote Management (WS-Management)' (WinRM) is Running, 'Windows Licensing Monitoring Service' (WLMS) is Stopped, 'Microsoft Exchange Server Extension for Windows Server Backup' (wsbexchange) is Stopped, 'Windows Update' (wuauserv) is Running, 'Windows Driver Foundation - User-mode Driver Framework' (wudfsvc) is Stopped.|'Total Service Count'=20; 'Service Count OK State'=9; 'Service Count Problem State'=11; 'Excluded Service Count'=0;
Check SMART status of all drives on the system.
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checksmart -H gold
Output : Overall Status - OK - Found 2 Disks(s), 2 OK and 0 failing |'5FB9WFRP_Reallocated_Sector_Count'=2; '5FB9WFRP_Power_On_Hours'=18836; '5FB9WFRP_Power_Cycle_Count'=1532; '5FB9WFRP_Temperature'=48; '5FB9WFRP_Current_Pending_Sector'=2; '5FB9WFRP_Offline_Uncorrectable'=0; 'Disk#1_Reallocated_Sector_Count'=17; 'Disk#1_Power_On_Hours'=23056; 'Disk#1_Power_Cycle_Count'=2289; 'Disk#1_Temperature'=44; 'Disk#1_Current_Pending_Sector'=0; 'Disk#1_Offline_Uncorrectable'=15; OK - Dev#0, ST340810 A SCSI Disk Device, Serial#5FB9WFRP, PredictFailure=False, Temperature=48
OK - Dev#1, Maxtor 6 Y120P0 SCSI Disk Device, Serial#(null), PredictFailure=False, Temperature=44
Check all services with the string ‘windows’ in the short or long name, exclude any that have ‘audio’ in them
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkservice -a windows -o audio
Output : OK - Found 18 Services(s), 9 OK and 9 with problems (2 excluded). 'Windows Event Log' (eventlog) is Running, 'Windows Font Cache Service' (FontCache) is Running, 'Windows Presentation Foundation Font Cache 3.0.0.0' (FontCache3.0.0.0) is Stopped, 'Windows CardSpace' (idsvc) is Stopped, 'Windows Firewall' (MpsSvc) is Running, 'Windows Installer' (msiserver) is Stopped, 'Windows Modules Installer' (TrustedInstaller) is Running, 'Windows Time' (W32Time) is Running, 'Windows Process Activation Service' (WAS) is Running, 'Windows Color System' (WcsPlugInService) is Stopped, 'Windows Event Collector' (Wecsvc) is Stopped, 'Windows Error Reporting Service' (WerSvc) is Stopped, 'Windows Management Instrumentation' (Winmgmt) is Running, 'Windows Remote Management (WS-Management)' (WinRM) is Running, 'Windows Licensing Monitoring Service' (WLMS) is Stopped, 'Microsoft Exchange Server Extension for Windows Server Backup' (wsbexchange) is Stopped, 'Windows Update' (wuauserv) is Running, 'Windows Driver Foundation - User-mode Driver Framework' (wudfsvc) is Stopped.|'Total Service Count'=18; 'Service Count OK State'=9; 'Service Count Problem State'=9; 'Excluded Service Count'=2;
Check the uptime and warn if it is less than 20 minutes, go critical if it is less than 10 minutes (and just so it will always show a warning for this example add -w 1min)
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkuptime -w 10min: -c 20min: -w 1min
Output : WARNING - [Triggered by _UptimeSec>1min] - System Uptime is 00:51:39 (51min).|'Uptime Minutes'=51min;1;20;
Some Example Ini File Checks
Only some of the checks from the ini files have been included. There are lots more.
Check DHCP stats. Warn if the active queue length exceeds 2
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkdhcp -s stats -w ActiveQueueLength=2
Output : OK (Sample Period 60 sec) - _AcksPersec=0.0, ActiveQueueLength=0, ConflictCheckQueueLength=0, Deniedduetomatch=0, Deniedduetononmatch=0, _DeclinesPersec=0.0, _DiscoversPersec=0.0, _OffersPersec=0.0, _PacketsReceivedPersec=0.0, _ReleasesPersec=0.0, _RequestsPersec=0.0|'_AcksPersec'=0.0; 'ActiveQueueLength'=0;2; 'ConflictCheckQueueLength'=0; 'Deniedduetomatch'=0; 'Deniedduetononmatch'=0; '_DeclinesPersec'=0.0; '_DiscoversPersec'=0.0; '_OffersPersec'=0.0; '_PacketsReceivedPersec'=0.0; '_ReleasesPersec'=0.0; '_RequestsPersec'=0.0;
Check the number of DNS A records defined
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkdns -s arecords
Output : OK - Number of DNS A Records=48|'DNS A Record Count'=48;
Check utilisation of each CPU, rather than just the overall total, warn if any of them goes above 5% utilisation
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkeachcpu -w 5
Output : WARNING (Sample Period 59 sec) - [Triggered by _AvgCPU>5] - CPU0=5.2% CPU1=2.5% CPU_Total=3.9% |'Avg Utilisation CPU0'=5.2%;5; 'Avg Utilisation CPU1'=2.5%;5; 'Avg Utilisation CPU_Total'=3.9%;5;
List Exchange DB Instances (needs at least Information Store and Transport services running)
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkexchange -s listDBInstances
Output : Number of Instances=3. DB Instance Names - 'edgetransport/_Total' SessionsInUse=13, 'edgetransport/Transport Mail Database' SessionsInUse=7, 'edgetransport/IP Filtering Database' SessionsInUse=6, |'DB Instance Count'=3; 'edgetransport/_Total Sessions in use'=13; 'edgetransport/Transport Mail Database Sessions in use'=7; 'edgetransport/IP Filtering Database Sessions in use'=6;
Check Exchange stats for any database name ending in _Total
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkexchange -s DBInstances -a %_total
Output : Overall Status - OK (Sample Period 60 sec) - Transport Name="edgetransport/_Total" (OK) - _DatabaseCachePercentHit=0, DatabaseCacheSizeMB=1, _DatabaseCacheRequestsPersec=1, _DatabaseCacheMissesPersec=0, _IODatabaseReadsAverageLatency=0ms, _IODatabaseWritesAverageLatency=0ms, _IOLogReadsAverageLatency=0ms, _IOLogWritesAverageLatency=0ms, _IODatabaseReadsPersec=0, _IODatabaseWritesPersec=0, _IOLogReadsPersec=0, _IOLogWritesPersec=0, _LogBytesWritePersec=0, SessionsInUse=13, _TableOpenCachePercentHit=0, TablesOpen=3, _TableOpensPersec=0|'_DatabaseCachePercentHit'=0; 'DatabaseCacheSizeMB'=1; '_DatabaseCacheRequestsPersec'=1; '_DatabaseCacheMissesPersec'=0; '_IODatabaseReadsAverageLatency'=0ms; '_IODatabaseWritesAverageLatency'=0ms; '_IOLogReadsAverageLatency'=0ms; '_IOLogWritesAverageLatency'=0ms; '_IODatabaseReadsPersec'=0; '_IODatabaseWritesPersec'=0; '_IOLogReadsPersec'=0; '_IOLogWritesPersec'=0; '_LogBytesWritePersec'=0; 'SessionsInUse'=13; '_TableOpenCachePercentHit'=0; 'TablesOpen'=3; '_TableOpensPersec'=0;
Check Exchange SMTP Receive states for all transports (_Total)
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkexchange -s SmtpReceive -a _total
Output : Overall Status - OK (Sample Period 60 sec) - Transport Name="_total" (OK) - _BytesReceivedPersec=0.000/sec, _ConnectionsCreatedPersec=0.000/sec, _DisconnectionsbyAgentsPersecond=0.000/sec, _MessageBytesReceivedPersec=0.000/sec, _MessagesReceivedPersec=0.000/sec, AveragebytesPerconnection_Base=0.000, AveragebytesPermessage=0.000, AveragebytesPermessage_Base=0.000, AveragemessagesPerconnection=0.000, AveragemessagesPerconnection_Base=0.000, AveragerecipientsPermessage=0.000, AveragerecipientsPermessage_Base=0.000, BytesReceivedTotal=0.000, ConnectionsTotal=0.000, Frequency_PerfTime=3.414M, MessageBytesReceivedTotal=0.000, MessagesReceivedTotal=0.000, RecipientsacceptedTotal=0.000, TarpittingDelaysAnonymous=0.000|'_BytesReceivedPersec'=0; '_ConnectionsCreatedPersec'=0; '_DisconnectionsbyAgentsPersecond'=0; '_MessageBytesReceivedPersec'=0; '_MessagesReceivedPersec'=0; 'BytesReceivedTotal'=0; 'MessageBytesReceivedTotal'=0; 'MessagesReceivedTotal'=0; 'RecipientsacceptedTotal'=0;
Check IIS Connection stats for all web sites
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkiis -s connections -a _total
Output : OK (Sample Period 60 sec) - Site Name="_Total", CurrentConnections=0.000, _ConnectionAttemptsPersec=0.000/sec|'CurrentConnections'=0; '_ConnectionAttemptsPersec'=0;
Check IIS Request stats and warn if the POST Requests per second exceeds 10 for the testsite
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkiis -s requests -a TestSite -w _PostRequestsPersec=10
Output : OK (Sample Period 60 sec) - Site Name="TESTSite", _GetRequestsPersec=0.000/sec, _HeadRequestsPersec=0.000/sec, _PostRequestsPersec=0.000/sec, _PropfindRequestsPersec=0.000/sec, _PutRequestsPersec=0.000/sec, _ISAPIExtensionRequestsPersec=0.000/sec, TotalGetRequests=0.000, TotalHeadRequests=0.000, TotalPostRequests=0.000, TotalPropfindRequests=0.000, TotalPutRequests=0.000, TotalISAPIExtensionRequests=0.000|'_GetRequestsPersec'=0; '_HeadRequestsPersec'=0; '_PostRequestsPersec'=0;10; '_PropfindRequestsPersec'=0; '_PutRequestsPersec'=0; '_ISAPIExtensionRequestsPersec'=0; 'TotalGetRequests'=0; 'TotalHeadRequests'=0; 'TotalPostRequests'=0; 'TotalPropfindRequests'=0; 'TotalPutRequests'=0; 'TotalISAPIExtensionRequests'=0;
Check the IO of the logical drive C:, warn if the current disk queue length is more than 10
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkio -s logical -a c: -w CurrentDiskQueueLength=10
Output : Overall Status - OK (Sample Period 59 sec) - Logical Drive Name="C:" (OK) - _PercentIdleTime=90%, _PercentBusyTime=10%, _PercentDiskTime=14%, _PercentDiskReadTime=0%, _PercentDiskWriteTime=13%, _DiskReadBytesPersec=6.480KB/sec, _DiskReadsPersec=1.000/sec, _DiskWriteBytesPersec=6.062KB/sec, _DiskWritesPersec=1.000/sec, CurrentDiskQueueLength=1, _AvgDiskQueueLength=0.1, _AvgDiskReadQueueLength=0.0, _AvgDiskWriteQueueLength=0.1|'_PercentIdleTimeC:'=90; '_PercentBusyTimeC:'=10; '_PercentDiskTimeC:'=14; '_PercentDiskReadTimeC:'=0; '_PercentDiskWriteTimeC:'=13; '_DiskReadBytesPersecC:'=6636; '_DiskReadsPersecC:'=1; '_DiskWriteBytesPersecC:'=6208; '_DiskWritesPersecC:'=1; 'CurrentDiskQueueLengthC:'=1;10; '_AvgDiskQueueLengthC:'=0.1; '_AvgDiskReadQueueLengthC:'=0.0; '_AvgDiskWriteQueueLengthC:'=0.1;
Check the IO of the physical drive C: (may be different to the logical C:)
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkio -s physical -a c:
Output : Overall Status - OK (Sample Period 59 sec) - Physical Drive Name="0 C:" (OK) - _PercentIdleTime=90%, _PercentBusyTime=10%, _PercentDiskTime=14%, _PercentDiskReadTime=0%, _PercentDiskWriteTime=14%, _DiskReadBytesPersec=6.474KB/sec, _DiskReadsPersec=1.000/sec, _DiskWriteBytesPersec=6.192KB/sec, _DiskWritesPersec=1.000/sec, CurrentDiskQueueLength=0, _AvgDiskQueueLength=0.1, _AvgDiskReadQueueLength=0.0, _AvgDiskWriteQueueLength=0.1|'_PercentIdleTime0 C:'=90; '_PercentBusyTime0 C:'=10; '_PercentDiskTime0 C:'=14; '_PercentDiskReadTime0 C:'=0; '_PercentDiskWriteTime0 C:'=14; '_DiskReadBytesPersec0 C:'=6629; '_DiskReadsPersec0 C:'=1; '_DiskWriteBytesPersec0 C:'=6341; '_DiskWritesPersec0 C:'=1; 'CurrentDiskQueueLength0 C:'=0; '_AvgDiskQueueLength0 C:'=0.1; '_AvgDiskReadQueueLength0 C:'=0.0; '_AvgDiskWriteQueueLength0 C:'=0.1;
Check the printer spooler, warn if OutofPaperErrors>0 (There are no printer servers running on this test machine!)
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkprint -s spooler -w OutofPaperErrors=0
Output : WMI Query returned no data. The item you were looking for may NOT exist or the software that creates the WMI Class may not be running, or all data has been excluded.
Check CPU utilisation for some SQL server processes, warn if utilisation is more than 10% or if there are more than 2 processes
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkproc -s cpu -a %sql% -w 10 -w _ItemCount=2
Output : OK (Sample Period 59 sec) - Found 1 Instance(s) of "%sql%" running. CPU_sqlservr(PID=2220)=0.0% |'Process Count'=1;2; 'Avg Utilisation CPU_sqlservr'=0.0%;10;
Check for processes using more than 50% of the CPU. Include all processes with the string ‘serv’. Also warn if there are more than 2 of them found using more than 50%
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkproc -s cpuabove -a %serv% -w 50 -w _ItemCount=2
Output : WARNING (Sample Period 59 sec) - [Triggered by _ItemCount>2] - Total Process Count=5 (Process details on next line)\nWARNING - [Triggered by _ItemCount>2] - CPU for services (PID=468)=0.5%\nOK - CPU for Microsoft.ActiveDirectory.WebServices (PID=1268)=0.0%\nOK - CPU for ismserv (PID=1496)=0.0%\nOK - CPU for sqlservr (PID=2220)=0.0%\nOK - CPU for MSExchangeADTopologyService (PID=3764)=0.0%\n|'Process Count'=5;2;
List the SQL Express DB Instances
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checksql -s listdb -a MSSQLSQLEXPRESS_MSSQLSQLEXPRESS
Output : Overall Status - OK -Number of Databases=6. DB Names - tempdb, msdb, model, mssqlsystemresource, _Total, master,
Check SQL Express cache stats totals (use a different value for -a for SQL Server)
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checksql -s cache -a MSSQLSQLEXPRESS_MSSQLSQLEXPRESS
Output : Overall Status - OK - Cache Type _Total (OK) - CacheHitRatio=0, CacheObjectCounts=0, CacheObjectsinuse=0, CachePages=204pages (Each Page is 8k). |'CacheHitRatio'=0; 'CacheObjectCounts'=0; 'CacheObjectsinuse'=0; 'CachePages'=204pages;
Check SQL Express latch stats (use a different value for -a for SQL Server)
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checksql -s latches -a MSSQLSQLEXPRESS_MSSQLSQLEXPRESS
Output : OK (Sample Period 59 sec) - AverageLatchWaitTimems=5532ms, _LatchWaitsPersec=0, NumberofSuperLatches=0.000, _SuperLatchDemotionsPersec=0.000/sec, _SuperLatchPromotionsPersec=0.000/sec, TotalLatchWaitTimems=5532ms|'AverageLatchWaitTimems'=5532ms; '_LatchWaitsPersec'=0; 'NumberofSuperLatches'=0; '_SuperLatchDemotionsPersec'=0; '_SuperLatchPromotionsPersec'=0; 'TotalLatchWaitTimems'=5532ms;
Check the numbers of Terminal Services sessions
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkts -s sessions
Output : OK - ActiveSessions=1, InactiveSessions=2, TotalSessions=3|'ActiveSessions'=1; 'InactiveSessions'=2; 'TotalSessions'=3;
Check for users that do not require a password and warn if you find some, go OK if none are found (--nodatamode)
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m checkusers -s count -a "PasswordRequired!='True'" --nodatamode -w 0
Output : WARNING - [Triggered by _ItemCount>0] - Number of Users=1 - User information shown on next line is: Name(FullName) \n Guest()|'Number of Users'=1;0;
Show the Operating System and Service Pack version, and the installation date. Warn if the installation is older than 2 years
Command: check_wmi_plus.pl -H HOST -u USER -p PASS -m info -s os -w 2yr
Output : WARNING - [Triggered by _InstallSec>2yr] - OS is Microsoft Windows Server 2008 R2 Datacenter , Service Pack 1, Installation Timestamp=20080116193016.000000+660 (1537.8 days old)|'OS Installation Age'=1537.8days;
Support
My Nagios system has high load when running Check WMI Plus
How high is high load? It may not really be an issue. You don’t want your hardware sitting idle, do you? 🙂
The real problem is if your Nagios check latency is being pushed out.
Check your “Check Latency” on the Nagios Performance Info screen.
With a lot of check_wmi_plus checks running you will be forking a lot of wmic commands.
This frequent forking is probably what is creating your high load.
You may be able to alleviate this issue by:
- Making sure you are “keeping state”. This is the default and will be used unless you specify the --nokeepstate option. Keeping state reduces the number of wmic invocations.
- Putting more CPU hardware in (Yuk!)
- Checking less often
- Checking fewer things
- Putting wmic and maybe even check_wmi_plus.pl on faster disk (may not be very effective since it’s probably already in RAM)
- Removing ini files you are not using (or renaming them so they do not end in .ini) so that they don’t have to be read every time the plugin starts. This probably makes hardly any difference, if any.
I’ve made a modification, do you want it?
Probably.
We normally take a look at your modifications and try and incorporate them into the plugin releases so that other users can benefit from them also.
Also read about requesting new features.
I’ve found a bug in Check WMI Plus
Send us the details. We’ll try and fix it.
We will want the plugin debug output.
If it’s a check someone else has helped us develop then we might not have the required test environments, so your help would be appreciated.
How do I get debug output?
Add the -d switch to the command line.
Add -d -d for even more debug.
We might ask you to do this if you report a bug, request a new feature, or have some kind of problem.
If we ask for the debug output then you’ll need to put the output into a file and send it to us as it can be very long.
Usernames/passwords are automatically masked in debug output. You can check if you like.
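Because the debug output can be very long, it is easiest to redirect it straight to a file. The following is a minimal sketch: the helper name `capture_debug` and the example file name are illustrative assumptions, and only the -d switch comes from the plugin itself.

```shell
# Run a check with the debug switches added and save everything it prints.
# capture_debug is a hypothetical helper name, not part of Check WMI Plus.
# Usage: capture_debug OUTPUT_FILE COMMAND [ARGS...]
capture_debug() (
    out_file=$1
    shift
    # -d -d asks for the more verbose debug level described above.
    # stderr is merged into the same file in case anything is written there.
    "$@" -d -d > "$out_file" 2>&1
    status=$?
    echo "Saved $(( $(wc -l < "$out_file") )) lines of debug output to $out_file"
    exit "$status"
)

# Example (HOST, USER and PASS are placeholders):
# capture_debug /tmp/cwp_debug.txt check_wmi_plus.pl -H HOST -u USER -p PASS -m checkmem
```

The function returns the exit status of the command it ran, so it can still be used to see whether the check itself reported OK, WARNING or CRITICAL.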
How do I create my own check in an ini file?
You need the Ini File Documentation, probably starting with Overview on How to Create an Ini File Check.
How do I setup the Windows User for wmic (or what permissions do I need)?
The Easy (Insecure) Way
Add a new user and add them to the administrators group. Perhaps disable login privileges. Possibly suitable for test environments etc.
A Better Way
Scroll about halfway down this page to the section titled “Configure remote WMI access in Windows”.
Note that some people report that, even after following the above instructions, not all checks work. We have not yet determined a resolution for this problem. Most of the time using an administrator account fixes this problem. If using the administrator account fixes your errors then you almost certainly have a permissions problem.
This page may also be helpful.
The primary relevant content from the op5 site is also available here
Other Permissions Snippets
There have been many suggestions scattered around the Internet on how to set up the permissions for wmic access. Some of them have been reproduced here. It typically comes down to trying various combinations and seeing what works for you.
WMI Connection Testing
This standalone executable sometimes gives somewhat more useful error messages to help find the WMI connection/permission issues:
Windows Management Infrastructure
- (Start → Run …) wmimgmt.msc
- Right-click on WMI Control (Local) → Properties → Security → Root → CIMV2 → Security
- Add users with the privileges:
  - Enable Account
  - Remote Enable
Enabling Remote DCOM
- Add the user(s) in question to the Performance Monitor Users group
- Under Services and Applications, bring up the Properties dialog of WMI Control. In the Security tab, highlight Root/CIMV2 and click Security; add Performance Monitor Users and enable the options Enable Account and Remote Enable
- Run dcomcnfg. At Component Services > Computers > My Computer, in the COM Security tab of the Properties dialog, click “Edit Limits” for both Access Permissions and Launch and Activation Permissions. Add Performance Monitor Users and allow remote access, remote launch, and remote activation
- Select Windows Management Instrumentation under Component Services > Computers > My Computer > DCOM Config and give Remote Launch and Remote Activation privileges to the Performance Monitor Users group.
Allowing NTLM
Run gpedit.msc and configure the following setting:
Forcing NTLMv2
Add the following command line argument to Check WMI Plus to force the use of NTLMv2:
--extrawmicarg "--option=client ntlmv2 auth=Yes"
Make Sure the Account is Active
Using an administrative command prompt
net user USERNAME /active:yes
Windows Firewall
Make sure the Windows firewall will allow WMI connections from the Check WMI Plus client.
If you are not sure if you have a firewall issue, disable the firewall and test. If it works when you disable the firewall but not when the firewall is enabled, then you need to add the correct firewall rule.
Workgroup and Systems that cannot communicate with their Domain
There are some reports that you have to disable UAC if you’re checking against a system that’s either in a workgroup or cannot communicate to a domain (even if joined).
Check checkxxxxxx does not work with Windows/Exchange/SQL etc Version X
Check out this FAQ Article first.
If it turns out that you don’t have the required WMI classes then
1) you might have a broken machine, and you need to fix it, or
2) maybe we can do a similar check using the WMI Classes you do have available. This becomes a feature request.
There was a reported problem where some WMI classes were missing and this was related to a bug in Adobe ColdFusion 9 64 Bit (refer to the bug report). The workaround for this problem was:
- Stop the WMI service
- Rename %windir%\system32\cfperfmon_9.dll to cfperfmon_9.bak
- Restart the WMI service
Is it possible to monitor Version X of Windows/Exchange/SQL etc?
It might be. It depends on the WMI Classes available. Most of the checks list which versions of Windows or other application they have been tested against, so we only know about the versions we say we have tested against.
If the WMI Classes/fields we need to do the monitoring do not exist then you’ll probably either have to modify the check, rebuild your WMI database, or you just can’t get the information.
Typically the checks are developed and tested against the latest versions of Windows and various applications. Over successive versions, Microsoft and other application providers normally add WMI Classes and fields to provide more data whilst leaving most or all existing classes and fields intact to provide backward compatibility. So normally, older versions just cannot provide the same detail via WMI as the newer versions.
If you have an older version eg Exchange 2003 when checkexchange has only been tested against Exchange 2007/2010, then you’ll have to try it. If you get errors like “The target host might not have the required WMI classes installed” then the WMI Classes just don’t exist and that version probably can’t be monitored unless a new check is developed to use the WMI Classes that your version uses.
If you run the check against an older version of the software and you get some data back and other fields back as NO_WMI_DATA then the version of your application just does not provide all the WMI fields. You might be able to modify the check and get some but the chances are that the application provider added some WMI fields as the versions progress and you are simply out of luck.
Try browsing your WMI Classes and compare what you have against what the check needs (see the query= definition in the ini file).
As an example, the check, “checkexchange listDBInstances” defines the WMI query as
query=SELECT Name,SessionsInUse FROM Win32_PerfRawData_ESE_MSExchangeDatabaseInstances
Hence, it is expecting a WMI Class called Win32_PerfRawData_ESE_MSExchangeDatabaseInstances.
Use WMI Explorer to browse the WMI Classes and see if this class actually exists.
If not, well, we can’t get the data using the existing checks to allow “checkexchange listDBInstances” to work.
If it ends up that you have some version we have never tested against and the WMI Classes we use just don’t exist, then it is possible that we might be able to create a similar check. You could try a feature request.
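To know which class names to look for in WMI Explorer, you can pull them out of the query= lines of an ini file. This is a rough sketch only: `wmi_classes_in_ini` is a hypothetical helper name, and it assumes each query is written on a single line in the `query=SELECT ... FROM ClassName` form shown above.

```shell
# Print the WMI class names referenced by query= lines in an ini file.
# wmi_classes_in_ini is a hypothetical helper, not part of Check WMI Plus.
# Assumes one query per line: query=SELECT ... FROM ClassName [WHERE ...]
wmi_classes_in_ini() {
    # sed -n with the p flag prints only the lines where the substitution matched.
    sed -n 's/^[[:space:]]*query=.*[[:space:]][Ff][Rr][Oo][Mm][[:space:]]\{1,\}\([A-Za-z0-9_]\{1,\}\).*/\1/p' "$1" | sort -u
}

# Example (the path is a placeholder for one of your own ini files):
# wmi_classes_in_ini /path/to/some_check.ini
```

Each name printed is a class that must exist on the target machine for the corresponding check to return data.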
Can you read the --help for me?
There are two answers to this:
1) Yes – if you buy a support contract 😉
2) No – we don’t have time to read it for you. We expect you to read it for yourself.
If you contact us with a question that is answered by the --help output do not be surprised when we reply saying “Please read the --help”.
Can I request a new feature?
Yes.
We’ll take a look at it and see what we can do.
We’ll probably ask you to help test it as sometimes we don’t have the required test environment.
We might ask you to be able to browse the WMI classes on your machine, using something like WMI Explorer v2.0 or Wmi Explorer Powershell so be ready for that.
So, only ask if you are prepared to help out.
Can you help me configure Nagios?
No.
That is what all the existing Nagios documentation and mailing lists are for.
We can help you with problems directly with Check WMI Plus.
Pro
How can I see how much CPU Pro is saving me?
This page assumes that you have Pro installed and functioning correctly.
At a high level, what we are going to run through to get this information is:
- Turn on usage stats
- Collect usage stats whilst Pro’s CPU saving options are enabled
- Collect usage stats whilst Pro’s CPU saving options are disabled
- Compare the stats
Turn on Usage Stats
In the Check WMI Plus Conf file make sure
$collect_usage_info=1;
is set.
Now, when Check WMI Plus runs it will log usage stats into the directory specified by the Conf file setting
$wmi_data_dir
(by default the data directory is the same directory as the plugin)
into a file called
$usage_db_file
(this is also a setting in the Conf file).
Check to see that the file exists and is growing as Check WMI Plus runs.
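One way to confirm the file is growing is to compare its size at two points in time. This is a minimal sketch: `usage_file_is_growing` is a hypothetical helper name, and you supply the path to your own usage file and a wait long enough for at least one check to run.

```shell
# Report whether a file grows between two size samples.
# usage_file_is_growing is a hypothetical helper, not part of Check WMI Plus.
# Usage: usage_file_is_growing FILE SECONDS
usage_file_is_growing() (
    file=$1
    wait_secs=$2
    [ -f "$file" ] || { echo "missing: $file"; exit 2; }
    # $(( ... )) strips any padding that wc adds on some systems.
    before=$(( $(wc -c < "$file") ))
    sleep "$wait_secs"
    after=$(( $(wc -c < "$file") ))
    if [ "$after" -gt "$before" ]; then
        echo "growing: $before -> $after bytes"
    else
        echo "not growing: still $after bytes"
        exit 1
    fi
)

# Example (the path is a placeholder; wait for at least one check interval):
# usage_file_is_growing /path/to/usage_db_file 300
```

The exit status is 0 if the file grew, 1 if it did not, and 2 if the file does not exist.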
Enable Pro CPU Saving Options
In the Conf file ensure the following settings are set:
$force_wmic_command=0;
$use_compiled_ini_files=1;
Start Collecting Stats
We want to collect usage stats to a nice clean usage file. To do this switch to a new usage file.
check_wmi_plus.pl --logswitch --logsuffix proenabled
(or check_wmi_plus.pl --logswitch --logsuffix prodisabled
if you are performing that test)
The output will tell you the name of the new usage file being used. Remember the name of this file because we will want it later.
Now leave it to collect usage stats. Ideally, you want to collect usage stats from at least 2 complete cycles of all your Check WMI Plus checks. For example, if you have it scheduled across a range of servers every 5 minutes, leave it for at least 10 minutes. The longer you leave this, the more stats you collect, and the more accurate your comparison result will be.
Stop Collecting Stats
When you have waited long enough, we want to close off the usage stats file. To do this switch to a new usage file.
check_wmi_plus.pl --logswitch
The output will tell you the name of the new usage file being used. We don’t care about the new file. We care about the usage file you remembered the name of when you first switched to it.
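The start, wait and stop steps can be wrapped into a single helper so that both test runs are collected the same way. This is only a sketch: `collect_usage_window` is a hypothetical name, the plugin path is whatever applies on your system, and it relies solely on the --logswitch and --logsuffix options shown above.

```shell
# Run one usage-stats collection window around the plugin's --logswitch option.
# collect_usage_window is a hypothetical wrapper, not part of Check WMI Plus.
# Usage: collect_usage_window PLUGIN SUFFIX SECONDS
collect_usage_window() (
    plugin=$1
    suffix=$2
    wait_secs=$3
    # Start: switch to a fresh usage file tagged with the suffix.
    "$plugin" --logswitch --logsuffix "$suffix" || exit 1
    # Let Nagios run its scheduled checks for the whole window.
    sleep "$wait_secs"
    # Stop: switch again so the tagged file is closed off.
    "$plugin" --logswitch
)

# Example: a 10 minute window (two 5 minute check cycles) with Pro enabled.
# collect_usage_window /path/to/check_wmi_plus.pl proenabled 600
```

The first line of output names the usage file that holds the stats for this window, so keep a note of it.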
Disable Pro CPU Saving Options
In the Conf file ensure the following settings are set:
$force_wmic_command=1;
$use_compiled_ini_files=0;
Now go back and repeat the starting and stopping of stats collection. This will give you another usage stats file ending with a prodisabled suffix.
Now you can re-enable the Pro CPU saving options and optionally undo the changes you made to enable the usage stats.
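After editing the Conf file it is worth confirming which values are actually in effect before and after each test run. The sketch below is illustrative only: `show_pro_settings` is a hypothetical helper, and it assumes each setting sits on its own uncommented line in the `$name=value;` form shown above.

```shell
# Print the current value of the two Pro CPU-saving settings in a Conf file.
# show_pro_settings is a hypothetical helper; pass the path to your own Conf file.
# Assumes each setting is on its own uncommented line as: $name=value;
show_pro_settings() (
    for name in force_wmic_command use_compiled_ini_files; do
        # If a setting appears more than once, report the last assignment.
        value=$(sed -n 's/^[[:space:]]*\$'"$name"'[[:space:]]*=[[:space:]]*\([^;]*\);.*/\1/p' "$1" | tail -n 1)
        echo "$name=${value:-unset}"
    done
)

# Example (the path is a placeholder):
# show_pro_settings /path/to/check_wmi_plus.conf
```

With the Pro options enabled you would expect to see force_wmic_command=0 and use_compiled_ini_files=1.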
Compare the Stats
Now you have 2 usagedb files, one with a suffix proenabled and one with a suffix prodisabled. Run a simple SQL query on each of the files and compare the results.
sqlite3 -header -column \
  /opt/nagios/bin/plugins/check_wmi_plus.data/check_wmi_plus.usagedb.proenabled \
  "select count(*) as 'Checks', round(sum(elapsedtime)/count(*),3) as 'Avg Elapsed (sec)', round(sum(totalcpu)/count(*),3) as 'AvgCPU (sec)', round(julianday(max(timestamp))-julianday(min(timestamp)),4) as 'Days', sum(iniused) as 'Ini Checks' from usage"
And run it again using the .prodisabled file.
From the proenabled stats you should see an output similar to the following:
Checks      Avg Elapsed (sec)  AvgCPU (sec)  Days        Ini Checks
----------  -----------------  ------------  ----------  ----------
123         1.804              0.059         0.0017      52
and from the prodisabled stats you should see an output similar to the following:
Checks      Avg Elapsed (sec)  AvgCPU (sec)  Days        Ini Checks
----------  -----------------  ------------  ----------  ----------
135         5.782              0.207         0.0004      60
You should clearly see that the AvgCPU figure (the average amount of CPU consumed per check) is higher with the Pro options disabled, and the Avg Elapsed figure probably is too.
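To put a number on the difference, you can work out the percentage reduction between the two runs. The helper below is a small sketch (`pct_reduction` is a hypothetical name) that uses awk for the floating point arithmetic, applied to the sample figures shown above.

```shell
# Percentage reduction from a baseline value to an improved value, to one decimal place.
# pct_reduction is a hypothetical helper. Usage: pct_reduction BASELINE IMPROVED
pct_reduction() {
    awk -v base="$1" -v improved="$2" 'BEGIN { printf "%.1f\n", (base - improved) / base * 100 }'
}

# Sample figures from the tables above (prodisabled first, then proenabled):
pct_reduction 0.207 0.059   # AvgCPU (sec)
pct_reduction 5.782 1.804   # Avg Elapsed (sec)
```

For the sample data this prints 71.5 and 68.8, that is roughly 71.5% less CPU per check and 68.8% less elapsed time with the Pro options enabled. The sample runs cover different numbers of checks over very short periods, so treat figures from such short runs as indicative only.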
I have Pro but my load/CPU has not improved
Really? Perform the procedure described on this page.
If your comparison of the proenabled and prodisabled stats files shows a reduction in CPU when the Pro options are enabled, then the CPU used by Check WMI Plus has been reduced and Pro is doing what it was intended to do.
Using Cached WMI Responses
If you still think your CPU/load has not changed, then it is something else. There is another Check WMI Plus test you can run, but you need to do this other test under controlled circumstances since it will cause Check WMI Plus checks to give errors and hence generate Nagios alerts. The test removes all WMI calls and just uses the last data retrieved by WMI in place of any future WMI calls. Obviously, this makes all responses from Check WMI Plus false and will make any checks that rely on 2 data points generate an error.
The benefit of this test is as follows: WMI calls take a long time. They are most often the longest part of the elapsed time of the plugin execution. The longer the plugin is running, the longer it stays in the run queue (whilst waiting for IO), which can increase the load figures you see even if CPU is not being consumed. If the plugin does not have to wait for WMI calls, it can complete much more quickly. This test will enable you to clearly see if the WMI calls are making your load appear high.
To perform this test, make the following change in your Conf file:
$use_cached_wmic_responses=1;
Remember to change the setting back!
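It is easy to forget a test setting once the testing is over. As a rough sketch, the following Python reads the text of your Conf file and reports the current values of the three settings changed on this page, so you can check nothing has been left in its test state. It assumes the settings appear as simple one-line Perl assignments like the ones shown above. The function names are hypothetical.

```python
import re

# Matches simple Perl-style assignments such as: $use_cached_wmic_responses=1;
# Lines starting with '#' are skipped because '$' must come first on the line.
ASSIGNMENT = re.compile(r"^\s*\$(\w+)\s*=\s*([^;]+?)\s*;", re.MULTILINE)

# The setting names used on this page.
TEST_SETTINGS = (
    "force_wmic_command",
    "use_compiled_ini_files",
    "use_cached_wmic_responses",
)


def read_settings(conf_text, names=TEST_SETTINGS):
    """Return {name: value} for the named settings found in the Conf text."""
    found = {}
    for name, value in ASSIGNMENT.findall(conf_text):
        if name in names:
            found[name] = value  # a later assignment overrides an earlier one
    return found


def cached_responses_enabled(conf_text):
    """True if the Conf text still has cached WMI responses switched on."""
    return read_settings(conf_text).get("use_cached_wmic_responses") == "1"
```

Read your Conf file into a string and pass it to read_settings. If cached_responses_enabled returns True, the plugin is still returning stale data and the setting needs changing back.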
Disable Check WMI Plus
Try taking Check WMI Plus out of the equation altogether. Rename the plugin, put an exit; as the second line of the plugin, or do something similar to make it do nothing.
See what that does to your load/CPU.
It’s Still a Problem for Me
If you are no longer running any Check WMI Plus checks and you still have high CPU/Load then it is clearly something else.
You’ll have to eliminate other possibilities just like we eliminated Check WMI Plus.
Capturing a Machine Performance Snapshot
This may be useful to record performance differences as you make changes.
During each of the collection cycles, run “top”. Halfway into the top session, press “1” to add per-CPU details. For each of the top sessions you run, video the screen using your phone, a video camera or, preferably, some screen grab software. Save the video with a meaningful name. You normally only need a session length of 1 minute or so to get a good feeling of how the system is performing.
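If you also want numbers you can compare side by side, a small script can record the load averages during each collection cycle. This is only a sketch to sit alongside the top recording, not a replacement for it. It uses Python's os.getloadavg(), which is available on Unix-like systems. The function names and the default sample count and interval are arbitrary choices for illustration.

```python
import os
import time


def sample_load(samples=6, interval=10.0):
    """Return a list of (unix_time, load1, load5, load15) tuples."""
    rows = []
    for i in range(samples):
        load1, load5, load15 = os.getloadavg()  # 1, 5 and 15 minute averages
        rows.append((time.time(), load1, load5, load15))
        if i < samples - 1:
            time.sleep(interval)  # no need to wait after the last sample
    return rows


def to_csv(rows):
    """Format the samples as CSV text with a header line."""
    lines = ["time,load1,load5,load15"]
    for t, l1, l5, l15 in rows:
        lines.append(f"{t:.0f},{l1:.2f},{l5:.2f},{l15:.2f}")
    return "\n".join(lines)
```

The defaults give about a minute of samples. Save the output of to_csv(sample_load()) to a file with a meaningful name for each cycle, the same as you would for the videos.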
How do I purchase Pro?
See this