Monitoring Dynamics AX with Nagios

Posted by Simon Butcher on

Since writing about how I monitor Dynamics AX batch processes from a more technical perspective, I have been asked how to set up monitoring for a complete Dynamics AX system.

Many years ago, Scott introduced me to Nagios, an open-source product designed to be able to monitor anything. At the office, we use it to monitor practically everything, and to fulfil Sarbanes-Oxley requirements, such as monitoring temperature or backup reliability. If we could, we'd monitor the coffee machine with Nagios.

Naturally, since we've built our monitoring around this system, we use it to monitor our Dynamics AX environment as well. Much of the information here could be adapted for monitoring other systems, but the focus of this article is simple monitoring for Dynamics AX.

Background

Firstly, there are some fundamentals that need to be covered. We run Nagios on a Linux server (running the Slackware Linux distribution). Nagios, in a nutshell, simply runs commands designed to return a status and a single line of information per service or host. Nagios can cope with an enormous number of these hosts, and each host can contain many services. A more detailed explanation of core functionality can be found within the Nagios documentation itself.
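
To make that "status plus a single line" contract concrete, here is a minimal Python sketch. It is not part of Nagios or of any plug-in mentioned in this article; the `Status` and `report` names are my own, and only the four exit codes (0 to 3) are the standard Nagios plug-in convention.

```python
from enum import IntEnum


class Status(IntEnum):
    """Exit codes a Nagios plug-in uses to report its result."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def report(service, status, detail):
    """Return the exit code and the single line of text for one check.

    'service' and 'detail' are free text chosen by the plug-in author.
    """
    line = f"{service} {status.name} - {detail}"
    return int(status), line


# A real plug-in would print the line and exit with the code, e.g.:
#   code, line = report("DISK", Status.WARNING, "C: 72% used")
#   print(line)
#   sys.exit(code)
```

Nagios reads the exit code to decide the state of the service and displays the printed line next to it.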

We monitor availability of hosts (servers, network equipment and network appliances) by pinging, and specific traits of those hosts such as resource utilisation, temperature/health, connectivity, and service availability. In many cases this extends to service operation, such as not just connecting to an LDAP service, but performing a query too.

Beyond this, we use Nagios’ ability to determine dependencies to highlight potential problems with other systems when one system becomes unavailable. We also use the escalation configuration, whereby ongoing (or critical) faults escalate to sending an SMS to appropriate people using Štěpán Roh’s smsd and a Siemens MC35T GSM modem.

To monitor Windows servers, we use NRPE-NT, the Windows variant of the Nagios Remote Plug-in Executor. This simply runs as a Windows service, and securely allows us to execute any service check plug-ins we like. All of the check plug-ins we use, including our own, are available on the Nagios Exchange website.

Configuring NRPE-NT for the server

Since the core Nagios system will actually be asking NRPE-NT on the server to execute check commands, let's examine the configuration of NRPE-NT first.

Firstly, you should install NRPE-NT on each server you're monitoring. Unzip the installation set from their website anywhere you like, and from the command-prompt run nrpe_nt -i to install the service. Beyond that, everything is configured in the file “nrpe.cfg” (which you can simply edit with Notepad), but be aware that any changes to this file require you to restart the “Nagios Remote Plugin Executor for NT/W2K” service before they come into effect.

We monitor the server itself using the basic NRPE-NT plug-ins, which allow us to monitor simple resources such as CPU utilisation, memory, disk space, and individual services.

Here is an example basic configuration for a server:

# This is the port number NRPE-NT will listen for connections on
server_port=5666

# If you have multiple addresses, set the listening address here
server_address=172.16.90.34

# For security, I recommend you set this. This defines the
# IP-address of your Nagios server. You can put more than one
# address here, separated by commas.
allowed_hosts=172.16.90.30,172.17.0.30

# Leaving this at 0 disables accepting command arguments from
# the Nagios server. Read the NRPE-NT manual very carefully
# before setting it to 1, as there are security implications.
dont_blame_nrpe=0

# This enables debugging, which can generate a lot of logs.
# In a normal production environment, you would leave this off.
debug=0

# This defines how long NRPE-NT will wait for a command to run.
# If the command has not finished by the time this setting, in
# seconds, has elapsed, then NRPE-NT will kill the check command
# and return a bad status to Nagios.
command_timeout=30

# This is the command 'nt_check_disk_c' which checks the
# available storage disk space on drive 'C:'. We want warnings at
# 70% disk-usage, and critical alerts at 90% disk-usage.
command[nt_check_disk_c]=c:\nrpe_nt\Plugins\diskspace_nrpe_nt.exe c: 70 90

# For an SQL server, you'll need a few more of those lines!
#command[nt_check_disk_d]=c:\nrpe_nt\Plugins\diskspace_nrpe_nt.exe d: 70 90
#command[nt_check_disk_e]=c:\nrpe_nt\Plugins\diskspace_nrpe_nt.exe e: 70 90
# etc...

# This command checks CPU usage. Warnings at 50%, critical alerts
# at 80%. In multiple CPU/core systems, this is the total usage
# from all logical CPUs in the system.
command[nt_cpuload]=c:\nrpe_nt\Plugins\cpuload_nrpe_nt.exe 50 80

# This command checks memory load. Because of cache, it can be
# normal for seemingly high memory use. Because of this, we check
# for warnings at 95% and critical alerts at 99% memory 
# utilisation.
command[nt_memload]=c:\nrpe_nt\Plugins\memload_nrpe_nt.exe 95 99

# I use terminal services to maintain the server, so let's 
# monitor that service. As a convention, I name the command after
# the service's executable name, but the check plug-in looks at
# the name of the service as Windows displays it.
command[nt_service_termsvcs]=c:\nrpe_nt\Plugins\service_nrpe_nt.exe "Terminal Services"

Configuring NRPE-NT for the AOS

Both Axapta 3.0 and Dynamics AX 4.0 can be monitored quite closely. Via NRPE-NT, I monitor the service itself, along with monitoring the number of clients (Axapta 3.0) and sessions (Dynamics AX 4.0). Monitoring client/session counters is done through the wincheck_counter plug-in, which grabs data from Windows Performance Monitor counters.

Note that if you're running Dynamics AX 4.0 and you connect to the server using remote desktop, you won't see any instances available for AX. So long as you can see them in Performance Monitor when directly using the server's console, don't worry as the wincheck_counter plug-in will see them too.

If you're using Axapta 3.0, you can add the following to your NRPE-NT configuration:

# Check for the Axapta 3.0 AOS service
command[nt_service_axaos]=c:\nrpe_nt\Plugins\service_nrpe_nt.exe "Axapta Object Server"

# This check returns the number of clients on the server.
# Here I'm referring to 'LIVE' as the server's instance name,
# with warnings when there are 40 clients, and critical alerts
# at 50 clients. You'll need to change this, obviously.
command[axaos_clients]=c:\nrpe_nt\Plugins\wincheck_counter.exe "Navision Axapta Object Server" -P "Clients" -f "%.0f clients online" -I "LIVE" -w 40 -c 50

If you're using Dynamics AX 4.0, add the following to your NRPE-NT configuration:

# Check for the Dynamics AX 4.0 service. This is for an
# instance called 'LIVE'. The double dollar-signs are
# intentional!
command[nt_service_axaos]=c:\nrpe_nt\Plugins\service_nrpe_nt.exe "Dynamics Server$$01-LIVE"

# This checks the number of sessions online. This includes
# clients, workers, servers, .NET/COM connectors, etc.
# Make sure the instance number matches the instance you're
# monitoring (here it's "01"). This line will generate
# warnings at 40 sessions, and critical alerts at 50 sessions.
command[axaos_clients]=c:\nrpe_nt\Plugins\wincheck_counter.exe "Microsoft Dynamics AX Object Server" -P "ACTIVE SESSIONS" -f "%.0f clients online" -I "01" -w 40 -c 50

Configuring NRPE-NT for the SQL server

Monitoring the SQL server is much the same. You'll need different services, obviously, but you can also monitor the server for adverse performance conditions if you're adventurous.

Below is a simple example based on a Microsoft SQL Server 2005 installation that just monitors services.

# These will check to make sure the SQL service and the
# SQL agent are running.
command[nt_service_sqlservr]=c:\nrpe_nt\Plugins\service_nrpe_nt.exe "SQL Server (MSSQLSERVER)"
command[nt_service_sqlagent]=c:\nrpe_nt\Plugins\service_nrpe_nt.exe "SQL Server Agent (MSSQLSERVER)"

Configuring NRPE-NT with batch monitoring

To monitor batch processes in Dynamics AX 4.0, you can use my plug-in from my previous article. This requires the .NET Business Connector to be installed, and obviously configured to point to your live environment. For convenience, we run these on the same machine that runs the batch processor.

To make maintenance easier for myself, I name these after the class name in Dynamics AX, but keep in mind that there is a limit of 31 characters for a command's name in NRPE-NT, and any commands with longer names seem to disappear. Also be aware that the names are case-sensitive.
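
Because an over-long command silently vanishing is hard to spot, a quick scan of nrpe.cfg can help. The following Python is only a sketch under my own assumptions: the function names are hypothetical, the 31-character limit is the one described above, and the second command in the sample is an invented name used purely to trip the limit.

```python
import re

MAX_NAME_LENGTH = 31  # limit described in the article

# Matches active lines such as: command[nt_cpuload]=c:\...
# Lines commented out with '#' do not match.
COMMAND_RE = re.compile(r"^\s*command\[([^\]]+)\]\s*=")


def command_names(config_text):
    """Return the names of all active command[...] definitions."""
    names = []
    for line in config_text.splitlines():
        match = COMMAND_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def overlong_names(config_text, limit=MAX_NAME_LENGTH):
    """Return the command names that exceed the length limit."""
    return [name for name in command_names(config_text) if len(name) > limit]


# Sample input; the second command name is made up for illustration.
SAMPLE = r"""
command[axbatch_EventJobCUD]=c:\nrpe_nt\Plugins\checkdaxbatch.exe EventJobCUD 600 3600 foo
command[axbatch_SalesFormLetter_PackingSlipUpdate]=c:\nrpe_nt\Plugins\checkdaxbatch.exe X 600 3600 foo
#command[nt_check_disk_d]=c:\nrpe_nt\Plugins\diskspace_nrpe_nt.exe d: 70 90
"""

for name in overlong_names(SAMPLE):
    print(f"{name} is {len(name)} characters long")
```

In practice you would read the text from your own nrpe.cfg rather than from an inline sample.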

Here's an example, monitoring two of the most common batch jobs:

# Monitor the two batch jobs that update "Alerts" in Dynamics AX. 
# This is based on running them every 5 minutes, therefore if the
# job has stalled we want warnings after 10 minutes (600 seconds)
# and critical alerts after an hour (3600 seconds). We check the
# jobs have run in the company 'foo'.
command[axbatch_EventJobCUD]=c:\nrpe_nt\Plugins\checkdaxbatch.exe EventJobCUD 600 3600 foo
command[axbatch_EventJobDueDate]=c:\nrpe_nt\Plugins\checkdaxbatch.exe EventJobDueDate 600 3600 foo
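
The two numbers in each line are staleness thresholds in seconds. As an illustration of that idea only (this is not the source of checkdaxbatch.exe, and it says nothing about how the plug-in finds the last run time), the decision could be sketched in Python like this; `batch_status` and its message wording are assumptions of mine.

```python
from datetime import datetime, timedelta

LABELS = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def batch_status(last_run, now, warn_seconds, crit_seconds):
    """Classify a batch job by the time elapsed since it last ran.

    Returns a (nagios_exit_code, status_line) pair.
    """
    age = (now - last_run).total_seconds()
    if age >= crit_seconds:
        code = 2
    elif age >= warn_seconds:
        code = 1
    else:
        code = 0
    return code, f"BATCH {LABELS[code]} - last run {age:.0f} seconds ago"


# A job expected every 5 minutes that last ran 15 minutes ago:
now = datetime(2025, 1, 1, 12, 0, 0)
print(batch_status(now - timedelta(minutes=15), now, 600, 3600))
```

With the thresholds from the example above, a job that is 15 minutes late produces a warning, and one that has not run for over an hour becomes critical.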

Configuring Nagios

I won't go into detail on configuring Nagios itself, since Nagios comes with its own manual and example configuration. I will show some basic elements you will need to configure to point you in the right direction.

Initially you'll need some command definitions to make calls to NRPE-NT on your servers. You'll need the standard Nagios plug-ins for the following to work:

# The check command to check via NRPE. $USER1$ is
# defined in the Nagios example config!
define command {
   command_name check_via_nrpe
   command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

# When checking batch processes, sometimes it can take
# longer than expected. To simplify configuration, this
# command definition can be used which adds a prefix to
# the NRPE-NT command and increases the normal timeout.
define command {
   command_name check_axbatch_job
   command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c axbatch_$ARG1$ -t 60
}

# To check the RPC port for Dynamics AX/Axapta, this will
# attempt a simple TCP connection to the standard port.
define command {
   command_name check_axapta
   command_line $USER1$/check_tcp -H $HOSTADDRESS$ -p 2712
}
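
The last of these checks is nothing more than a TCP connection attempt. check_tcp comes with the standard Nagios plug-ins; the Python below is not its implementation, just a sketch of what such a check amounts to. The function name and message wording are my own assumptions, and port 2712 is the standard AOS port used in the definition above.

```python
import socket
import time


def check_tcp(host, port, timeout=10.0):
    """Try to open a TCP connection and return (exit_code, status_line).

    0 means the connection succeeded; 2 means it failed or timed out.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
            return 0, f"TCP OK - {elapsed:.3f} second response time on port {port}"
    except OSError as exc:
        # Covers refused connections, timeouts and unreachable hosts.
        return 2, f"TCP CRITICAL - cannot connect to {host}:{port} ({exc})"


# Example, using the AOS address from this article:
#   code, line = check_tcp("172.16.90.34", 2712)
```

Bear in mind that a successful connection only shows the port is listening; it does not prove the AOS is answering requests.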

Beyond this, you'll need to define a host definition for each server:

# Define a host.
define host {
   host_name aos_server
   alias AOS Server
   address 172.16.90.34
   # ...
   # Many other definitions are missing. Please RTFM!
}

Each host contains several services which you'll also need to define. I always define several basic checks for the server in NRPE-NT, as you've seen above, so a server will usually contain these definitions:

# Check memory
define service {
   host_name aos_server
   service_description Memory
   check_command check_via_nrpe!nt_memload
}

# Check CPU
define service {
   host_name aos_server
   service_description CPU
   check_command check_via_nrpe!nt_cpuload
}

# Check disk C:
define service {
   host_name aos_server
   service_description Disk C:
   check_command check_via_nrpe!nt_check_disk_c
}

# Check terminal services service
define service {
   host_name aos_server
   service_description RDP
   check_command check_via_nrpe!nt_service_termsvcs
}

You'll need to extend these definitions or use templates, as described in the Nagios documentation.

Note the correlation between the command names defined in NRPE-NT and the check commands used in Nagios itself. Continue in this manner for the remaining checks; the exception is the RPC port on the AOS, which Nagios checks directly rather than via NRPE-NT:

# Check AOS TCP/IP connectivity
define service {
   host_name aos_server
   service_description RPC
   check_command check_axapta
}
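
Since every NRPE-based service in Nagios has to match a command name in nrpe.cfg exactly (including case), a small cross-check can catch typos before Nagios does. This Python sketch is an assumption-laden illustration rather than an existing tool: the function names are hypothetical, and the prefix table simply mirrors the two NRPE command definitions shown earlier.

```python
import re

# Active command[...] lines in nrpe.cfg (commented lines do not match).
NRPE_COMMAND_RE = re.compile(r"^\s*command\[([^\]]+)\]\s*=", re.MULTILINE)

# Nagios lines of the form: check_command some_command!first_argument
NAGIOS_CHECK_RE = re.compile(r"^\s*check_command\s+(\w+)!([^!\s]+)", re.MULTILINE)

# Prefix each Nagios command definition puts in front of its argument.
PREFIXES = {"check_via_nrpe": "", "check_axbatch_job": "axbatch_"}


def nrpe_names(nrpe_cfg):
    """Names of the commands defined in an nrpe.cfg text."""
    return set(NRPE_COMMAND_RE.findall(nrpe_cfg))


def expected_names(nagios_cfg):
    """NRPE-NT command names the Nagios services will ask for."""
    names = set()
    for command, argument in NAGIOS_CHECK_RE.findall(nagios_cfg):
        if command in PREFIXES:
            names.add(PREFIXES[command] + argument)
    return names


def missing_commands(nagios_cfg, nrpe_cfg):
    """Commands Nagios expects that nrpe.cfg does not define."""
    return sorted(expected_names(nagios_cfg) - nrpe_names(nrpe_cfg))
```

Feed it the text of your Nagios service definitions and of the server's nrpe.cfg; an empty list means every NRPE check has a matching command. Checks that do not go through NRPE-NT, such as check_axapta, are ignored.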

For the SQL server, follow the same pattern with the names changed.

Conclusion

Monitoring can become a complex beast, so I strongly recommend you read up on Nagios. Monitoring systems like this is worth the learning-curve and time invested because it's not a great idea to rely on users to report system down-time or potential problems.
