MIRA
Error monitoring and Diagnostics


Overview

There are 3 levels of error management.

The last form of error management will be covered below.

DiagnosticsModule

In systems where many modules form a complex application, it is useful to have runtime error and status reports from the individual modules. To provide such diagnostic information in a generic way, a module can use the DiagnosticsModule class, which gets and sets the module's current status. This status is one of BOOTUP, RECOVER, OK, WARNING or ERROR. Multiple diagnostic modules can be grouped together in a parent module.

All diagnostic modules can be registered at the StatusManager. The StatusManager class keeps track of all registered diagnostic modules and combines their status. If every module is OK, the overall status is also OK. If at least one module signals a warning, the overall status is WARNING. The same holds true for ERROR. Every diagnostic module can also be in a boot-up state, no matter which other warnings or errors are set. Additionally, a module can signal that it is currently recovering from an error using the RECOVER status.

To keep things simple and uniform, each authority by default provides a StatusManager and implements a DiagnosticsModule that is registered at that StatusManager. The authority therefore provides all diagnostic functions of a DiagnosticsModule and allows the user to register additional modules in its StatusManager. The overall status of the authority is the combination of the authority's own status and the status of all registered diagnostic modules. The combined status of every authority is displayed in miracenter.
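The combination rule described above can be illustrated with a small standalone sketch. It is not the StatusManager implementation: the names Severity and combineStatus are hypothetical, and the sketch assumes that ERROR outranks WARNING when both are present. BOOTUP and RECOVER are left out because the text does not state how they rank against the other modes.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a module status. Only the three modes whose
// combination is described in the text are modelled (assumption:
// ERROR outranks WARNING, WARNING outranks OK).
enum class Severity { OK = 0, WARNING = 1, ERROR = 2 };

// Combine the statuses of all registered modules. The result is the most
// severe status reported by any module, or OK if nothing is registered.
inline Severity combineStatus(const std::vector<Severity>& modules)
{
    Severity overall = Severity::OK;
    for (Severity s : modules)
        overall = std::max(overall, s);
    return overall;
}
```

With this rule, a single module in WARNING is enough to turn an otherwise healthy set of modules into an overall WARNING, and a single ERROR dominates everything else.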

Status

Each status entry has a mode (BOOTUP, RECOVER, OK, ERROR, WARNING) and a category. Categories are used to differentiate between different warnings and errors. Each category can contain only an error or a warning, or it can be OK. If all categories are OK, there is no error or warning and the module is working correctly. There is no restriction on categories and no predefined set of them. Since a category is a string, the user can freely choose an appropriate name.
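The category rules above can be modelled with a map from category name to the current entry. The following is only a sketch under stated assumptions, not the DiagnosticsModule implementation: the class CategoryStatus, the Mode enum and the Entry struct are hypothetical, BOOTUP and RECOVER are omitted, and an OK category is represented simply by the absence of an entry.

```cpp
#include <map>
#include <string>

// Hypothetical modes; BOOTUP and RECOVER are intentionally not modelled.
enum class Mode { OK = 0, WARNING = 1, ERROR = 2 };

// One entry per category: the mode and the message that was reported.
struct Entry
{
    Mode mode;
    std::string message;
};

class CategoryStatus
{
public:
    // A category holds a single entry, so a new report replaces the old one.
    void warning(const std::string& category, const std::string& message)
    { set(category, Mode::WARNING, message); }

    void error(const std::string& category, const std::string& message)
    { set(category, Mode::ERROR, message); }

    // Reset one category to OK.
    void ok(const std::string& category) { mEntries.erase(category); }

    // Reset all categories to OK.
    void ok() { mEntries.clear(); }

    // Overall mode: OK if no category holds an entry, otherwise the most
    // severe mode among all categories (assumption: ERROR outranks WARNING).
    Mode mode() const
    {
        Mode overall = Mode::OK;
        for (const auto& kv : mEntries)
            if (kv.second.mode > overall)
                overall = kv.second.mode;
        return overall;
    }

private:
    void set(const std::string& category, Mode mode, const std::string& message)
    {
        Entry& e = mEntries[category];
        e.mode = mode;
        e.message = message;
    }

    std::map<std::string, Entry> mEntries;
};
```

Because categories are plain strings, unrelated problems such as a "Parser" error and a "Battery" warning can be set and cleared independently of each other.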

The status of the authority can be set by calling the bootup(), bootupFinished(), recover(), recoverFinished(), warning(), error() or ok() methods.

// signal that we are booting up
authority.bootup("Allocating some memory");
HeavyObject* o = new HeavyObject();
authority.bootup("Initializing memory");
o->init();
// signal that we are operational
authority.bootupFinished();
// one can set an error with a category and a message text
authority.error("Parser", MakeString() << "There was an error at line " << line);
// reset the status of a category by signaling ok status
authority.ok("Parser");
// or reset all errors and warnings in all categories.
authority.ok();

Units are Authorities and inherit the diagnostic functionality. In Units you can call bootup() during the initialization phase. There is no need to call bootupFinished(), as this is done automatically. Recovery functionality is also wrapped in Units (Recovery Mode).

PersistentErrors

Sometimes it is necessary to store errors persistently to be able to monitor the status of an application over time. This can be done using the ErrorService class of the framework. It supports nearly the same interface as the DiagnosticsModule class. The ErrorService database can be used for statistics (for example, how often an error was reported). If the application has already terminated, the error database can give some hints about why it was not working. The database also supports filtering. Another advantage arises when displaying errors to the end user: the error text can be translated into multiple languages.

// set an error with a category, a module reporting the error, a translatable text and a message
MIRA_FW.getErrorService()->setError("Parser", "MyModule", "Failed to read file", MakeString() << "There was an error at line " << line);

Watchdog

Some modules operate in a certain interval. If they fail to do so, there is probably an error. Such modules can use the watchdog functionality of the DiagnosticsModule. Authorities provide a mechanism to set a heartbeat interval and to reset the watchdog by calling heartbeat().

// we are working in a cyclic interval of one second
void init()
{
    authority.setHeartbeatInterval(Duration::seconds(1));
}

void process()
{
    // do some operations here that also can fail or block
    if ( fail )
        return;
    // reset the watchdog
    authority.heartbeat();
}

If heartbeat() is not called within the specified interval, the status of the module will automatically be set to ERROR, together with an appropriate error message.
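The watchdog idea itself can be sketched independently of the framework. The class below is a hypothetical illustration, not the mechanism used by authorities: the name Watchdog, its methods and the choice of std::chrono::steady_clock are assumptions. The current time is passed in explicitly so that the behavior is easy to follow and to test.

```cpp
#include <chrono>

// Hypothetical watchdog: it is considered expired when more than
// `interval` has elapsed since the last heartbeat.
class Watchdog
{
public:
    using Clock = std::chrono::steady_clock;

    Watchdog(Clock::duration interval, Clock::time_point now)
        : mInterval(interval), mLast(now) {}

    // Reset the watchdog; called after every successful cycle.
    void heartbeat(Clock::time_point now) { mLast = now; }

    // True if no heartbeat arrived within the interval. A monitor would
    // then switch the module status to an error state.
    bool expired(Clock::time_point now) const
    {
        return now - mLast > mInterval;
    }

private:
    Clock::duration mInterval;
    Clock::time_point mLast;
};
```

In a real program the monitor would pass Watchdog::Clock::now() to expired() at regular intervals, and the worker would pass the same clock's current time to heartbeat() at the end of each successful cycle.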