How to write your own monitors with MonitorAgent

MonitorAgent is an agent that is part of the QBroker project. As an agent, it periodically checks for predefined occurrences or listens on data sources for certain patterns. It is a standalone Java process running as a daemon that hosts a set of monitor objects, each designed for a particular category of occurrences. An occurrence can be any incident or event generated by applications, hardware, etc. Each occurrence is monitored by an instance of a monitor object that is registered with the container, MonitorAgent.

A monitor object consists of two components. The first is the report component, which has a method for testing or detecting the occurrence. The result of the report component is also called a report. The second is the action component, which evaluates the test result (report) and invokes various actions in case of failures or exceptions. Periodically, MonitorAgent launches the report component and passes the result to the action component. The action component checks the report to determine the priority of the event and sends the event to a centralized event collector, EventFlow, for further analysis and evaluation. In case of failure, MonitorAgent can invoke the pre-configured actions, such as sending an email alert, launching an action script to restart the service, or logging the errors to either ServiceNow or syslog, according to the pre-configured policies.

In practice, it is the customized components that do the real work. MonitorAgent is just a container that runs and manages all registered monitor instances. It also provides services such as scheduling, thread pooling, report sharing, a centralized repository, dynamic deployment and workflow support. Besides the monitors for reports and actions, instances of MessageFlow can also run inside MonitorAgent. The integration with MessageFlow provides support for dynamic monitors, node-level event processing and flexible workflows.

If you have a system or an application to monitor but cannot find an existing monitor component that fits your needs, you may have to write a new monitor to cover the new scenario. You have two choices. The first is to write your own monitor from scratch without following any of the standards required by MonitorAgent. If this is your choice, you do not have to read this documentation, even though you might still find it helpful. The second is to implement the two interfaces, MonitorReport and MonitorAction, and have your new monitor integrated with MonitorAgent. This way, you get many existing features, such as dependencies, action groups, checkpointing, events, report sharing, dynamic deployment and the centralized repository. This documentation is meant to help you write your own monitors integrated with MonitorAgent.

Another advantage of integrating your monitor with MonitorAgent is that MonitorAgent provides a gateway for your monitor to participate in a large-scale Enterprise Monitor Network. Even though MonitorAgent is a monitor container and is able to host many monitors, it is just one node in an Enterprise Monitor Network. With a number of distributed MonitorAgents, a centralized EventFlow and a configuration repository, it is easy to set up a monitor site with each MonitorAgent as a node or an agent. With multiple interconnected sites, you have a monitor domain. Each node is an instance of MonitorAgent and is a source of events. All the events are correlated and processed by the EventDispatcher on the site. Some of the events may be forwarded or escalated to other sites or to parent sites according to the subscriptions. With this hierarchical framework, decisions and actions can be made at any level: the monitor level, the node level, the site level or the domain level. Here is a conceptual diagram of a Monitor Network.

If you are interested in the MonitorAgent project and want to contribute your own monitor modules, you are very welcome to join us. All the source code for MonitorAgent is checked in at dmtscm:/cvs under pses/java/om. You are free to check out the source tree as a reference. You are also encouraged to implement any interfaces or extend any classes. Please feel free to copy any existing code under a new name and make your own changes. If you want to commit any changes to the original source code, please let Yannan Lu or the owners of the code know first. See the javadoc for the API specifications of MonitorAgent and other related classes.

Monitor Design Guidelines

A monitor is usually a program running either standalone or inside some kind of container, like a cron job. It constantly checks its targets to gather information on them and takes actions based on the data and rules. This fits well into the well-known Model 2 design pattern for Web Services, or the Smalltalk MVC (Model-View-Controller) framework. For example, Java developers now tend to use JSPs and servlets together (Model 2) instead of JSPs alone (Model 1) to develop a Web Service. JSPs are used to handle business data gathering and presentation. Servlets are used to maintain the internal state and to control the workflow based on the business logic, data and state information. Similarly, it is good practice to split a monitor process into two components: a detection component to gather information on the target, and an action component to evaluate the data based on the monitor logic and take actions on it.

MonitorAgent tries to model this dual-task characteristic of monitors. It is a monitor container, just as a web server is a web service container. To run inside the MonitorAgent container, each monitor is required to implement two public Java interfaces, MonitorReport and MonitorAction. MonitorReport watches for or detects occurrences and generates a report on the target. MonitorAction evaluates the report and takes actions on the occurrences. The report generated by the detection component is passed as an opaque object into the action component for evaluation. Unlike Web Services, where a JavaBean is used to hold the data shared by the JSPs and the servlets, MonitorReport and MonitorAction use a Java Map to share the data or the monitor report. This simple implementation makes the data passing much more efficient and also saves a lot of tedious work, since developers no longer need to write a bunch of setters and getters.

When you decide to write a new monitor, you can design a single Java class to implement both interfaces, or two classes to implement each of them separately. In your implementation, try to code your classes for generic use so that the code can be reused in other places.

JSON Configuration

MonitorAgent is a Java container, just like a standalone application server. It manages multiple monitors and invokes their public APIs according to the preconfigured schedules. Therefore, the first step is to have the container load the user-defined monitors and their configurations.

How does MonitorAgent learn about the details of the monitor you want? It is not trivial for a Java application to understand a common configuration file that represents or encapsulates generic hierarchical structures. It would be possible to use Java properties to represent an object, but this only works well for an object with a flat structure. To make this process easier, MonitorAgent uses a special mapper, JSON2Map, to parse the JSON configuration file of a monitor and create an appropriate Java Map object. This Map object is passed into the constructor of the desired monitor class to instantiate the object.

For applications to understand the content of the property Map, a one-to-one mapping rule has been defined. With the rule and the context, the application is able to map the content back to its original structure. The mapping rule is very simple and straightforward. Here is the rule:

Here is an example: misc/completed_f.json. As you can see, Name, ClassName, Site, Step, etc., are all Strings. DependencyGroup is a List whose first member is Dependency, which is also a List. The first member of Dependency is a Map that has three String members: Name, URI and Type. A JSON file like this maps easily into a Java Map, and its content is just as easy to access by walking through the key list of the Map.
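
The walk through the key list can be sketched as follows. This is only an illustration: the Map is built by hand instead of being parsed from misc/completed_f.json, and the exact nesting (each member of DependencyGroup taken to be a Map holding the Dependency List), all the values, and the class name ConfigWalkSketch are assumptions rather than anything defined by MonitorAgent.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of MonitorAgent: walks a configuration Map
// shaped like the example described above.
final class ConfigWalkSketch {

    // Builds a Map by hand that mimics the assumed layout; all values are made up.
    static Map<String, Object> sampleConfig() {
        Map<String, Object> dep = new LinkedHashMap<>();
        dep.put("Name", "input_file");
        dep.put("URI", "file:///tmp/input.dat");
        dep.put("Type", "FileTester");

        List<Object> dependency = new ArrayList<>();
        dependency.add(dep);

        Map<String, Object> group = new LinkedHashMap<>();
        group.put("Dependency", dependency);

        List<Object> dependencyGroup = new ArrayList<>();
        dependencyGroup.add(group);

        Map<String, Object> config = new LinkedHashMap<>();
        config.put("Name", "completed_f");
        config.put("ClassName", "org.example.MyMonitor");
        config.put("Site", "DEV");
        config.put("Step", "1");
        config.put("DependencyGroup", dependencyGroup);
        return config;
    }

    // Recursively lists every leaf as "path=value", descending into Maps and Lists.
    static void walk(String path, Object node, List<String> out) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String key = String.valueOf(e.getKey());
                walk(path.isEmpty() ? key : path + "." + key, e.getValue(), out);
            }
        } else if (node instanceof List) {
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                walk(path + "[" + i + "]", list.get(i), out);
            }
        } else {
            out.add(path + "=" + node);
        }
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        walk("", sampleConfig(), lines);
        lines.forEach(System.out::println);
    }
}
```

A monitor constructor would normally pick out only the keys it needs, but a generic walk like this can be handy for checking what the container actually handed over.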

Implementation of MonitorReport

MonitorReport is an interface to detect and report on occurrences. The following three public methods are the most important in the implementation of MonitorReport:

public Map generateReport(long currentTime) throws Exception
public int getSkippingStatus()
public void destroy()

The generateReport() method takes a timestamp as its input argument from MonitorAgent's scheduler for the current monitor. It is used as the current timestamp for the occurrence. The method detects and reports the occurrences at this timestamp, and returns a Map that contains all the information about the occurrence. MonitorAgent invokes this method on every heartbeat, so the class should be reentrant. As the developer, you also have to decide what information should be included in the report Map. The report Map will be passed to a MonitorAction object for examination. Please keep in mind that your report may be shared by other action objects. Therefore, you need to put a complete set of information in your report.

The other job of generateReport() is to check the dependencies. A monitor is allowed to have a group of dependencies. A dependency is just an instance of a MonitorReport. Each dependency has one of three statuses: true, false or unknown. A group of dependencies can be logically evaluated in an AND or OR relationship. The result of the dependencies determines the skipping status of the monitor. The getSkippingStatus() method is used to get the skipping status. MonitorAgent will skip or disable the monitor according to its skipping status. With its skipping status, the monitor can control its own pace or temporarily disable itself according to its dependencies.
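
How the three statuses combine within a group is not spelled out here. The sketch below assumes ordinary three-valued logic, which is one plausible reading; the enum, the class name DependencySketch and both method names are hypothetical, and the mapping from the combined result to an actual skipping status value is deliberately left out because those constants are not shown in this document.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of combining dependency statuses; not MonitorAgent's own code.
final class DependencySketch {

    // The three statuses a dependency can report, as named in the text.
    enum Status { TRUE, FALSE, UNKNOWN }

    // AND: any FALSE decides the result; otherwise any UNKNOWN keeps it UNKNOWN.
    static Status and(List<Status> deps) {
        boolean unknown = false;
        for (Status s : deps) {
            if (s == Status.FALSE) return Status.FALSE;
            if (s == Status.UNKNOWN) unknown = true;
        }
        return unknown ? Status.UNKNOWN : Status.TRUE;
    }

    // OR: any TRUE decides the result; otherwise any UNKNOWN keeps it UNKNOWN.
    static Status or(List<Status> deps) {
        boolean unknown = false;
        for (Status s : deps) {
            if (s == Status.TRUE) return Status.TRUE;
            if (s == Status.UNKNOWN) unknown = true;
        }
        return unknown ? Status.UNKNOWN : Status.FALSE;
    }

    public static void main(String[] args) {
        List<Status> deps = Arrays.asList(Status.TRUE, Status.UNKNOWN);
        System.out.println("AND: " + and(deps));
        System.out.println("OR:  " + or(deps));
    }
}
```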

The destroy() method is used to clean up resources at shutdown or reload. The developer is expected to close all outstanding resources, including native resources, in this method.

Here is an example: FileTester.java. The most important method to implement is generateReport(long). Please pay attention to the report Map.
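
For readers without the source tree at hand, here is a much smaller sketch of a report component. It is not FileTester.java. The MonitorReport interface below is a local stand-in that only mirrors the three signatures listed above (with a generic Map for readability), and the class name FileExistsReport, the report keys, the use of "URI" as a plain file path and the placeholder skipping status value are all assumptions.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Local stand-in mirroring the three method signatures listed above; the real
// MonitorReport interface ships with MonitorAgent and may declare more methods.
interface MonitorReport {
    Map<String, Object> generateReport(long currentTime) throws Exception;
    int getSkippingStatus();
    void destroy();
}

// Hypothetical report component: checks whether a file exists.
final class FileExistsReport implements MonitorReport {
    private final File file;
    // Placeholder value; the real skipping status constants are not shown here.
    private final int skippingStatus = 0;

    // props plays the role of the configuration Map passed to the constructor.
    FileExistsReport(Map<String, Object> props) {
        Object path = props.get("URI");
        if (path == null) {
            throw new IllegalArgumentException("URI is not defined");
        }
        this.file = new File(path.toString());
    }

    @Override
    public Map<String, Object> generateReport(long currentTime) {
        // A fresh Map per call, so a report shared with other action objects
        // is never modified by a later heartbeat.
        Map<String, Object> report = new HashMap<>();
        boolean exists = file.exists();
        report.put("Path", file.getPath());
        report.put("Exists", exists);
        report.put("Size", exists ? file.length() : -1L);
        report.put("TestTime", currentTime);
        return report;
    }

    @Override
    public int getSkippingStatus() {
        return skippingStatus;
    }

    @Override
    public void destroy() {
        // Nothing to release here; a real monitor would close connections, streams, etc.
    }

    public static void main(String[] args) {
        Map<String, Object> props = new HashMap<>();
        props.put("URI", args.length > 0 ? args[0] : "/tmp");
        System.out.println(new FileExistsReport(props).generateReport(System.currentTimeMillis()));
    }
}
```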

Implementation of MonitorAction

MonitorAction is an interface to examine the report on an occurrence, to determine its status, and to decide whether to take further actions, such as sending an event as an escalation and/or invoking action scripts. The following three public methods are the most important in the implementation of MonitorAction:

public Event performAction(int status, long currentTime, Map report)
public Map checkpoint()
public void restoreFromCheckpoint(Map ckpt)

The performAction() method takes three input arguments from MonitorAgent. The first is the status of the occurrence detection, the second is the current timestamp, and the last is the Map holding the report of the occurrence. Besides MonitorReport and MonitorAction, there is a scheduler attached to each monitor. MonitorAgent consults the scheduler every time to check the status of the monitor. Sometimes, the monitor may be blacked out in some preconfigured time windows or disabled because of its dependencies. MonitorAgent passes the status to notify the monitor's action object. The status may take one of the following values: NORMAL, OCCURRED, SLATE, ALATE, ELATE, BLACKOUT, DISABLED, EXCEPTION, and BLACKEXCEPTION.

The performAction() method actually carries out two tasks. First, it extracts the raw information from the report Map and evaluates the data according to the preconfigured policies. The evaluation transforms the raw data into a more accurate and operational end result and loads it into an Event object as a message. An Event is an information container, just like a JMS Message or a flat JSON-like message with self-explanatory content. It can be easily transported to remote destinations and easily integrated with message-driven applications. The second task is to send the event as a message to a centralized repository or other services for notification, presentation, correlation and further actions. If the monitor has local actions configured, these local actions will also be invoked in their natural order on the same event. That is why this method returns the Event. Currently, MonitorAgent does not process the returned event. In a future release, a correlation engine may be plugged into the container so that all the events can be correlated and processed at the node level.
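
The first of the two tasks can be sketched as follows. Everything here is illustrative: Event is a local stand-in that only mimics the constructor and accessors shown in this document (ERR and WARNING are assumed names, and all the numeric values are made up), the status codes are placeholders, and the "Size" key, the class name SizeAction and the tolerance policy are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-in for the real Event class; priority values are made up.
final class Event {
    static final int ERR = 3, WARNING = 4, INFO = 6;
    private final int priority;
    private final Map<String, String> attrs = new HashMap<>();

    Event(int priority, String text) {
        this.priority = priority;
        attrs.put("text", text);
    }
    int getPriority() { return priority; }
    void setAttribute(String key, String value) { attrs.put(key, value); }
    String getAttribute(String key) { return attrs.get(key); }
}

// Hypothetical action component evaluating a report that carries a "Size" entry.
final class SizeAction {
    // Placeholder status codes; the real constants come with MonitorAgent.
    static final int NORMAL = 0, BLACKOUT = 1, DISABLED = 2, EXCEPTION = 3;

    private final String name;
    private final long threshold;
    private final int tolerance;  // consecutive breaches required before escalating
    private int failures = 0;     // internal state kept between invocations

    SizeAction(String name, long threshold, int tolerance) {
        this.name = name;
        this.threshold = threshold;
        this.tolerance = tolerance;
    }

    // Same parameter list as performAction(int, long, Map) above.
    public Event performAction(int status, long currentTime, Map<String, Object> report) {
        // In this sketch, a blacked-out or disabled monitor stays silent.
        if (status == BLACKOUT || status == DISABLED) return null;

        Object size = (report == null) ? null : report.get("Size");
        int priority;
        String text;
        if (status == EXCEPTION || !(size instanceof Long)) {
            priority = Event.WARNING;
            text = "failed to retrieve the size";
        } else {
            long value = (Long) size;
            if (value > threshold) {
                failures++;
                // Escalate only when the breach persists, to avoid false alarms.
                priority = (failures >= tolerance) ? Event.ERR : Event.WARNING;
                text = "size " + value + " exceeds " + threshold;
            } else {
                failures = 0;
                priority = Event.INFO;
                text = "size " + value + " is within " + threshold;
            }
        }
        Event event = new Event(priority, text);
        event.setAttribute("name", name);
        event.setAttribute("testTime", String.valueOf(currentTime));
        event.setAttribute("failures", String.valueOf(failures));
        return event; // sending the event and running local actions is omitted here
    }
}
```

The tolerance counter is one simple way to keep internal state so that a single bad reading raises only a warning.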

Implementing this method is the most challenging task in the design of a monitor. All the monitor logic and policies should be implemented in this method. The implementation has to be reliable, dependable, flexible and extensible. If a monitor keeps sending false alarms, people are going to stop trusting and using it. If other applications rely on the events from the monitor, inaccurate events may trigger bad actions and cause even worse consequences. Therefore, please always remember that no alarm is better than a false alarm. For the same reason, we encourage you to focus on the first task, that is, how to maintain the internal state, how to enforce the control policies, and how to implement the evaluation logic, all for more accurate judgements and better decisions. All the complicated actions in the second task should be left to other dedicated Java classes. In fact, there is a dedicated public interface for this purpose: EventAction, with the method invokeAction(long currentTime, Event event). Currently, there are many implementations available for tasks such as sending an email alert, relaying an event, running a script or logging to syslog. There is also a class, EventActionGroup, that encapsulates a group of actions. With EventActionGroup, you just need to add the action objects to the configuration, instantiate the group in the constructor and call its invokeAction method on the event.
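
The idea of delegating to a group of actions can be sketched as below. This is not the real EventActionGroup: the EventAction interface is a local stand-in built from the one method named above (its void return type is an assumption), Event is a minimal stand-in with a made-up priority value, and ActionGroupSketch with its choice to keep going after a failed action is purely illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal local stand-in for the real Event class; the INFO value is made up.
final class Event {
    static final int INFO = 6;
    private final int priority;
    private final Map<String, String> attrs = new HashMap<>();

    Event(int priority, String text) {
        this.priority = priority;
        attrs.put("text", text);
    }
    int getPriority() { return priority; }
    String getAttribute(String key) { return attrs.get(key); }
}

// Stand-in for the EventAction interface; the return type is assumed to be void.
interface EventAction {
    void invokeAction(long currentTime, Event event);
}

// Hypothetical group that runs its member actions in the order they were added.
final class ActionGroupSketch implements EventAction {
    private final List<EventAction> actions = new ArrayList<>();

    void add(EventAction action) {
        actions.add(action);
    }

    @Override
    public void invokeAction(long currentTime, Event event) {
        for (EventAction action : actions) {
            try {
                action.invokeAction(currentTime, event);
            } catch (RuntimeException e) {
                // Design choice of this sketch: one failing action does not block the rest.
                System.err.println("action failed: " + e);
            }
        }
    }

    public static void main(String[] args) {
        ActionGroupSketch group = new ActionGroupSketch();
        group.add((t, e) -> System.out.println("email: " + e.getAttribute("text")));
        group.add((t, e) -> System.out.println("script at " + t));
        group.invokeAction(System.currentTimeMillis(), new Event(Event.INFO, "This is a test"));
    }
}
```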

The other two methods are for checkpointing. You may or may not need them. If your monitor has state to maintain, you should implement them to persist the state across reloads or restarts.
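
A minimal sketch of the two checkpoint methods, assuming the monitor keeps a failure counter and the time of its last event. The class name CheckpointSketch, the key names and the choice to store values as Strings are hypothetical; how and when MonitorAgent persists the returned Map is not covered here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stateful action object; only the checkpoint-related parts are shown.
final class CheckpointSketch {
    private int failures;
    private long lastEventTime;

    void recordFailure(long now) {
        failures++;
        lastEventTime = now;
    }
    int getFailures() { return failures; }
    long getLastEventTime() { return lastEventTime; }

    // Same shape as checkpoint() above: exports the state as a flat Map.
    public Map<String, Object> checkpoint() {
        Map<String, Object> ckpt = new HashMap<>();
        ckpt.put("Failures", String.valueOf(failures));
        ckpt.put("LastEventTime", String.valueOf(lastEventTime));
        return ckpt;
    }

    // Same shape as restoreFromCheckpoint(Map) above: tolerates missing or bad entries.
    public void restoreFromCheckpoint(Map<String, Object> ckpt) {
        if (ckpt == null || ckpt.isEmpty()) return; // nothing saved, keep current state
        failures = (int) toLong(ckpt.get("Failures"), failures);
        lastEventTime = toLong(ckpt.get("LastEventTime"), lastEventTime);
    }

    // Parses a stored value, falling back to the current value when it is unusable.
    private static long toLong(Object o, long fallback) {
        if (o == null) return fallback;
        try {
            return Long.parseLong(o.toString());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        CheckpointSketch before = new CheckpointSketch();
        before.recordFailure(System.currentTimeMillis());
        CheckpointSketch after = new CheckpointSketch();
        after.restoreFromCheckpoint(before.checkpoint());
        System.out.println("restored failures: " + after.getFailures());
    }
}
```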

Here is an example: ShortJob.java. The most important public method to implement is performAction(int, long, Map). Please pay attention to the way the Event object is created. Another example, ProcessMonitor.java, implements both MonitorReport and MonitorAction.

Extension of Monitor

To make implementation a bit easier, an abstract class, Monitor, has been added. It implements both MonitorReport and MonitorAction and leaves only two abstract methods, generateReport() and performAction(). The former is the public method to detect an issue and to generate the report on it. The latter is the public method to evaluate the report, to raise events and to invoke actions. Apart from implementing those two methods, you also need to implement a constructor to instantiate the monitor.
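
The shape of such a subclass might look like the sketch below. Both Monitor and Event are local stand-ins written for this example (the real abstract class provides far more, and its constructor signature is assumed here to take the configuration Map); FileSizeMonitor, the "URI" and "Size" keys and the priority values are hypothetical.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Minimal local stand-in for the real Event class; priority values are made up.
final class Event {
    static final int ERR = 3, INFO = 6;
    private final int priority;
    private final Map<String, String> attrs = new HashMap<>();

    Event(int priority, String text) {
        this.priority = priority;
        attrs.put("text", text);
    }
    int getPriority() { return priority; }
    void setAttribute(String key, String value) { attrs.put(key, value); }
    String getAttribute(String key) { return attrs.get(key); }
}

// Local stand-in for the abstract Monitor class with its two abstract methods.
abstract class Monitor {
    protected final String name;

    protected Monitor(Map<String, Object> props) {
        Object o = props.get("Name");
        if (o == null) throw new IllegalArgumentException("Name is not defined");
        name = o.toString();
    }

    public abstract Map<String, Object> generateReport(long currentTime) throws Exception;
    public abstract Event performAction(int status, long currentTime, Map<String, Object> report);
}

// Hypothetical monitor: one class covering both detection and evaluation.
final class FileSizeMonitor extends Monitor {
    private final File file;

    FileSizeMonitor(Map<String, Object> props) {
        super(props);
        file = new File(String.valueOf(props.get("URI")));
    }

    @Override
    public Map<String, Object> generateReport(long currentTime) {
        Map<String, Object> report = new HashMap<>();
        report.put("Size", file.isFile() ? file.length() : -1L);
        return report;
    }

    @Override
    public Event performAction(int status, long currentTime, Map<String, Object> report) {
        long size = (Long) report.get("Size");
        Event event = (size < 0)
            ? new Event(Event.ERR, file.getPath() + " is missing")
            : new Event(Event.INFO, file.getPath() + " has " + size + " bytes");
        event.setAttribute("name", name);
        return event;
    }

    public static void main(String[] args) {
        Map<String, Object> props = new HashMap<>();
        props.put("Name", "demo");
        props.put("URI", args.length > 0 ? args[0] : "/etc/hosts");
        FileSizeMonitor m = new FileSizeMonitor(props);
        long now = System.currentTimeMillis();
        System.out.println(m.performAction(0, now, m.generateReport(now)).getAttribute("text"));
    }
}
```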

Working with Event

An Event is a self-describing structured message, similar to a JSON message or a JMS MapMessage. It has certain mandatory properties, such as priority, name, site, type, text, etc. It may also have other customized, free-form properties. Applications may use Event to exchange information. The benefit of Event is that different applications on different platforms can easily parse, match, evaluate, correlate, present and process the information carried by the message.

The primary task of MonitorAction is to transform the raw data from the report into a more operational and interchangeable event. With Event, the details of the occurrence are able to flow across the network. This mobility allows any monitor to publish its report as an Event and allows other applications to subscribe to the content based on their interests.

It is very easy to use Event. Here is the code to create an instance of Event with INFO as its priority and "This is a test" as its text:

  Event event = new Event(Event.INFO, "This is a test");

To add more information to the event, call its API as follows:

  event.setAttribute("name", "MyTest");
  event.setAttribute("count", String.valueOf(count));

To retrieve the content of an event, use its getter APIs:

  int priority = event.getPriority();
  int count = Integer.parseInt(event.getAttribute("count"));
  String text = event.getAttribute("text");

Once the event is fully loaded, call its API to send it out:

  event.send();

By design, the send() method first logs the event via log4j. If a remote destination is configured, it also calls the specified transport method to transfer the event to the remote destination. However, please be aware that Event has a size limit that depends on the implementation of the transport scheme. With syslog, the size limit is 1K. With HTTP, the size limit is 40K. Therefore, you may have to truncate a large message or split it into pieces.
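
Truncation to a byte limit can be sketched as follows. This is an assumption-laden illustration: it treats "1K" and "40K" as 1024 and 40960 bytes, measures only the text in UTF-8, and ignores whatever overhead the other attributes and the transport add. The class name EventTextLimiter and its constants are hypothetical and not part of the Event API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for keeping an event text under a transport's size limit.
final class EventTextLimiter {
    static final int SYSLOG_LIMIT = 1024;    // "1K" from the text, taken as 1024 bytes
    static final int HTTP_LIMIT = 40 * 1024; // "40K" from the text, taken as 40960 bytes

    // Truncates text so its UTF-8 encoding fits in maxBytes without splitting a character.
    static String truncate(String text, int maxBytes) {
        if (maxBytes <= 0) return "";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= maxBytes) return text;
        int end = maxBytes;
        // bytes[end] is the first byte to drop; if it is a UTF-8 continuation
        // byte (10xxxxxx), step back to the start of that character.
        while (end > 0 && (bytes[end] & 0xC0) == 0x80) {
            end--;
        }
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) sb.append('x');
        String shortened = truncate(sb.toString(), SYSLOG_LIMIT);
        System.out.println("length after truncation: " + shortened.length());
    }
}
```

Splitting a long message into pieces instead of truncating it would need a convention for reassembly on the receiving side, which is beyond this sketch.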