Check WMI Plus Pro to improve performance and functionality

Check WMI Plus Pro improves performance in several ways (the first 3 items in the feature list) and also adds a number of functional improvements:

Pro Features

  • It changes wmic from a process that is forked to a library that is called from Perl. This reduces the CPU load caused by process forking. On our test machine the plugin uses around 20-30ms of CPU time per WMI call (some checks perform multiple WMI calls). With the Pro version installed, the same checks use 10-15ms of CPU time. That's an improvement worth having when your plugin is only using 50ms of CPU overall. This improvement affects all checks.
  • Ini performance improvements. This change massively improves performance when using ini-file-based checks. On our test machine the plugin uses around 320-380ms of CPU time per plugin invocation. With the Pro version installed, the same checks use 45-56ms of CPU time. How much this reduces the CPU utilisation of your Nagios server depends on how many ini-based checks you are using.
  • Helper and Join query caching. By default, helper and join queries are cached. This can significantly reduce the number of WMI queries performed, making the plugin faster. Typically, helper and join data is relatively static and only needs to be queried infrequently. The --helperexpiry and --joinexpiry options allow you to change the default cache time.
  • Ability to use Helper Queries in ini-based checks. Helper queries allow you to define checks which obtain WMI data from a query other than the primary query and make it available for use in calculations. Some examples of helper queries might be ones that obtain the total number of cores available, the total number of users logged in, or the total number of drives in the Windows machine. This data can then be used in calculations that require it. The ini check [checkproc cpubycore] shows a full example of how to obtain and use helper query data.
  • Ability to check disk quotas using the checkquota mode. This allows you to check for users that have exceeded or are about to exceed quotas for any combination of drives. Specify which drives/users to evaluate using --includedata/--excludedata.
  • Ability to check file shares using the checkshare mode. This allows you to check for changes to shares or if any shares are added/deleted eg any share added to C:. Specify which shares to evaluate using --includedata/--excludedata.
  • Ability to check users defined in groups. Alert on changes to users in groups. Specify which groups/users to evaluate using --includedata/--excludedata.
  • Ability to check logons ie currently logged in users using checklogon. Alert on changes to logons and/or various logon attributes eg Interactive logons. Specify which logons to evaluate using --includedata/--excludedata.
  • Ability to check startup commands using the checkstartupcommand mode. This allows you to check for changes to startup commands, for commands being added/deleted, or for other settings eg Users. Specify which startup commands to evaluate using --includedata/--excludedata.
  • Ability to check print queues using checkprintjob mode. Alert if the job queue does not change between invocations. Alert if the job queue goes above a specific number of jobs and other settings. Specify which print jobs to evaluate using --includedata/--excludedata.
  • Ability to check user accounts using checkuseraccount. Alert on changes to the accounts, or if an account becomes locked out, is disabled, has a password that does not expire etc. Specify which accounts to evaluate using --includedata/--excludedata.
  • Ability to check groups using checkgroup. Alert on changes to groups and/or various group attributes. Specify which groups to evaluate using --includedata/--excludedata.
  • Collect usage stats for later analysis. Over a 10 day period, these sample services with a 5 minute normal check interval performed around 96,000 checks. The usage stats file was 16.7MB in size.
  • Specify warning and critical criteria as regular expressions. This extends to --includedata and --excludedata since they use the same format as warning and critical specifications. This can be useful for testing for strings, for multiple discrete values (eg testing for 1007 or 1012 or 1015 but not the values in between) and for various other things eg if values such as True or False are returned by a WMI field.
  • Use variable substitutions. This allows translation of strings to other strings for some command line operations and display purposes. For example, for checkeventlog this can translate eventcodes eg 7036 to more meaningful words. Also used to translate timezone numbers eg +600 to timezone words eg AEST. Also for checkeventlog, allows you to specify record types by name not number eg use -o Info instead of -o 3. Can be used to allow use of words (instead of numbers) in critical and warning specifications eg -c SYSTEMDOWN instead of -c 4583. Some samples included.
  • Ability to modify the normal exit code of the plugin based on a specification.
  • Ability to use --includedata and --excludedata on specific internal checks (versus just ini file checks). This will be especially useful for modes like checkdrivesize where you can specify all drives with "-a ." and then exclude specific drives eg "--excludedata DeviceID=~'F:|e:'".
  • Ability to control the way events are included and excluded. By default, a match on any one specification is enough to include/exclude an event. Added a new mode 'includeall' where all match specifications have to be met for the event log records to be included. This allows precise selection of event log records.
  • Show some plugin performance stats at the end of each plugin output. The output for checkcpu looks like:
    OK (Sample Period 300 sec) - Average CPU Utilisation 22.40%

    Plugin Elapsed
    [Preparation: 0.000s]
    [wmic: 0.256s]
    Plugin Elapsed: 0.349s
    CPU - User:0.015997 (0.011998+0.003999) System:0.015996 (0.006998+0.008998) MyTotal:0.018996 ChildTotal:0.012997 Total:0.031993 Elapsed:0.349 WMIC_Calls:1 WMIC_Library_Calls:0
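
The discrete-value matching mentioned in the feature list (1007 or 1012 or 1015 but not the values in between) can be sketched in plain Python. This only illustrates the regular-expression idea and is not the plugin's own option syntax. The name `matches_any_code` is hypothetical.

```python
import re

# Illustration only: one pattern that accepts exactly three discrete values.
# The alternation lists each accepted value; fullmatch() rejects anything
# longer or shorter, so 1010 and 10071 do not match.
DISCRETE_CODES = re.compile(r"1007|1012|1015")


def matches_any_code(value):
    """Return True if value is exactly one of the listed codes."""
    return DISCRETE_CODES.fullmatch(str(value)) is not None


if __name__ == "__main__":
    for code in (1007, 1010, 1015):
        print(code, matches_any_code(code))
```

A numeric range test could not express this selection, since the values in between must not match.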

Running these sample services over about a 26 hour period with a 5 minute normal check interval (about 40% are ini-based checks) generated the following comparison between using the ini performance improvements and not using them (figures shown using an SQL query against the usage stats):

Pro  Checks      Avg Elapsed (sec)  AvgCPU (sec)  Days        Ini Checks
---  ----------  -----------------  ------------  ----------  ----------
No   9825        0.68               0.182         1.1         4120     
Yes* 7115        0.574              0.068         0.794       2982
Yes  10981       0.576              0.057         1.23        4604      

*Compiled ini option only, WMI Client Library disabled

Table generated using the command line:

    sqlite3 -header -column /opt/nagios/bin/plugins/check_wmi_plus.data/check_wmi_plus.usagedb.alternate "select count(*) as 'Checks',round(sum(elapsedtime)/count(*),3) as 'Avg Elapsed (sec)',round(sum(totalcpu)/count(*),3) as 'AvgCPU (sec)',round(julianday(max(timestamp))-julianday(min(timestamp)),2) as 'Days',sum(iniused) as 'Ini Checks' from usage"
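
To put the comparison table in relative terms, the short sketch below computes the percentage reduction in average CPU per check from the AvgCPU figures shown. The function name `percent_reduction` is just an illustrative helper.

```python
def percent_reduction(before, after):
    """Percentage by which 'after' is lower than 'before'."""
    return (before - after) / before * 100.0


# AvgCPU (sec) values copied from the comparison table.
avg_cpu = {
    "No": 0.182,
    "Yes (compiled ini option only)": 0.068,
    "Yes": 0.057,
}

if __name__ == "__main__":
    baseline = avg_cpu["No"]
    for label, value in avg_cpu.items():
        reduction = percent_reduction(baseline, value)
        print(f"{label}: {reduction:.1f}% less CPU per check")
```

On these figures the compiled ini option alone gives roughly a 63% reduction in average CPU per check, and the full Pro version roughly 69%.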

Going Pro

The Pro version is a chargeable option and hence is not publicly available. The latest versions of the plugin will automatically use the Pro version if it is installed on your system. It is available with a support contract. The support contract provides free Pro version upgrades for one year.

Before deciding to go Pro you need to understand the following:

  • The Pro version may be of benefit if you are seeing high CPU load. It might not help your "load problem".
  • If your load average is less than or equal to the number of total cores available to your system, then you don't have a load problem.
  • We are of the opinion that even if your load is higher than your number of cores, what really matters (or is at least a major factor) is your check latency. If your check latency is still good, then Pro may or may not really help you.
  • Check WMI Plus Pro can only reduce the CPU that the plugin uses. It may not be able to reduce the load on your server to something you deem acceptable, but it will reduce the CPU used by the plugin.
  • You are buying Pro so that Check WMI Plus uses less CPU.
  • We can only guarantee that Check WMI Plus Pro reduces the CPU that Check WMI Plus consumes. We cannot guarantee that it reduces your CPU utilisation or load.
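
The rule of thumb in the list above (a load average at or below the number of cores is not a load problem) can be checked with a few lines of Python. This is a minimal sketch for a Unix-like Nagios server. The function name `has_load_problem` is hypothetical, and the check says nothing about check latency, which still needs to be looked at separately.

```python
import os


def has_load_problem(load_average, core_count):
    """Rule of thumb: load at or below the core count is not a load problem."""
    return load_average > core_count


if __name__ == "__main__":
    # os.getloadavg() is only available on Unix-like systems.
    one_min, five_min, fifteen_min = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"cores={cores} load(1m/5m/15m)={one_min:.2f}/{five_min:.2f}/{fifteen_min:.2f}")
    print("load problem" if has_load_problem(five_min, cores) else "no load problem")
```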

There are a couple of tests you can run to confirm if the load is most likely generated by lots of Check WMI Plus checks.
Try using cached WMI responses and then try disabling Check WMI Plus.
If you try these tests, let us know the results.

If you understand the above and would like to go Pro, then register interest/comments via the Contact Page stating that you understand the above bullet points.

While you are waiting for a response, please prepare a 1 minute top session video (use a phone or screen grab software). Press "1" halfway into the video so that top shows your CPU details. Do this video when the server is busy. We will ask for this video. This will show us if Pro is likely to benefit you.

Once you have received confirmation that we are prepared to sell the Pro version to you, please do the following:

  • Download and review the EULA.
  • Send us an email from an appropriately authorised person in your company with their contact details, the EULA attached, saying that they agree to the EULA.
  • Tell us the number of Nagios servers you wish to install it on.

Check WMI Plus Pro is $250AUD per Nagios server that it is installed on with a $50AUD/year support contract. The support contract entitles you to free Pro version upgrades for one year.

Sample Usage Stats

Using:

    sqlite3 -header -column /opt/nagios/bin/plugins/check_wmi_plus.data/check_wmi_plus.usagedb "SELECT mode,count() as count,round(sum(totalcpu)/count(),3) as avgcpu,round(sum(elapsedtime)/count(),3) as avgelapsed,sum(wmiccalls) as WMIC,sum(wmiclibrarycalls) as WMIC_LIB FROM usage GROUP BY mode ORDER BY avgcpu DESC"

mode        count       avgcpu      avgelapsed  WMIC        WMIC_LIB  
----------  ----------  ----------  ----------  ----------  ----------
checkcpuq   8107        0.317       4.259       161762      0          
info        10924       0.08        0.764       10912       3         
checkfilea  13587       0.077       0.48        13576       1         
checkio     5451        0.068       0.43        5447        0         
checkproc   21976       0.068       0.509       21964       2         
checkts     13520       0.068       0.351       13516       0         
checkprint  5524        0.067       0.408       5523        0         
checkeachc  5456        0.066       0.446       5450        0         
test        10907       0.066       0.406       10900       1         
checkcpu    5561        0.058       0.417       5536        21        
checkservi  16561       0.058       0.994       16549       3         
checknetwo  5468        0.057       0.721       6306        0         
checkevent  10905       0.055       0.371       10897       2         
checkproce  16363       0.055       0.416       16347       2         
checkpage   8068        0.054       0.366       8529        0         
checkdrive  5457        0.053       0.553       5452        1         
checkmem    5458        0.053       0.36        5456        0         
checkuptim  5518        0.053       0.357       5513        1         
checkfiles  5454        0.052       0.353       5454        0
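
The same per-mode summary can be produced from a script rather than the sqlite3 command line. The sketch below runs the query with Python's standard sqlite3 module against a small in-memory table. The table and column names are taken from the query above, but the column types and the three demo rows are assumptions made up for illustration and are not real usage data.

```python
import sqlite3

# Same aggregation as the command-line query, one row per mode.
QUERY = """
SELECT mode,
       count(*) AS count,
       round(sum(totalcpu) / count(*), 3) AS avgcpu,
       round(sum(elapsedtime) / count(*), 3) AS avgelapsed,
       sum(wmiccalls) AS WMIC,
       sum(wmiclibrarycalls) AS WMIC_LIB
FROM usage
GROUP BY mode
ORDER BY avgcpu DESC
"""


def summarise_by_mode(conn):
    """Return the per-mode usage summary as a list of tuples."""
    return conn.execute(QUERY).fetchall()


def build_demo_db():
    """Create an in-memory stand-in for the usage database (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE usage (mode TEXT, totalcpu REAL, elapsedtime REAL, "
        "wmiccalls INTEGER, wmiclibrarycalls INTEGER)"
    )
    # Made-up demo rows, not measurements.
    conn.executemany(
        "INSERT INTO usage VALUES (?, ?, ?, ?, ?)",
        [
            ("checkcpu", 0.06, 0.40, 1, 0),
            ("checkcpu", 0.05, 0.44, 1, 0),
            ("checkmem", 0.05, 0.36, 1, 0),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in summarise_by_mode(build_demo_db()):
        print(row)
```

To run it against real data, pass `summarise_by_mode` a connection opened with `sqlite3.connect()` on the usage stats file instead of the demo database.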