Commit 9b82bd9f authored by Alessio Netti

Analytics: documentation for Health Checker plugin

parent c8671269
......@@ -30,6 +30,7 @@
4. [Sink Plugins](#sinkplugins)
1. [File Sink Plugin](#filesinkPlugin)
2. [Cooling Control Plugin](#coolingcontrolPlugin)
3. [Health Checker Plugin](#healthcheckerPlugin)
5. [Writing Plugins](#writingPlugins)
### Additional Resources
......@@ -1018,6 +1019,33 @@ Finally, the plugin supports the following REST API actions:
|:----- |:----------- |
| status | Displays the temperature settings currently in use, as well as the cooling strategy.
## Health Checker Plugin <a name="healthcheckerPlugin"></a>
The _Health Checker_ plugin monitors sensors and raises alarms when anomalous conditions are detected. Alarm conditions are threshold-based and can be defined on a per-sensor basis.
Whenever an alarm is raised, the plugin can execute arbitrary scripts and programs to respond accordingly: for example, this can be used to send automatic emails to system administrators. The following is an example of an alarm produced by the Health Checker plugin:
[11:45:55] <warning>: The following alarm conditions were detected by the DCDB Health Checker plugin:
- Sensor /system/node1/power is not providing any data.
- Sensor /system/node1/temp has a reading greater than threshold 95000.
Operators in this plugin cannot have any output sensors. The operators themselves provide the following configuration parameters:
| Value | Explanation |
|:----- |:----------- |
| window | Length in milliseconds of the time window that is used to query sensors. Defaults to 0.
| command | Command to be executed when an alarm is raised. This must contain the _%s_ marker, which is replaced at runtime with a descriptive text of the current alarm. The command is executed in a shell environment (similarly to _popen_) and thus can contain most typical shell constructs (e.g., pipes or re-directs). Default is none.
| log | Boolean. If true, whenever an alarm is raised the event is written to the standard DCDB log in addition to being passed to the external command. Default is true.
| cooldown | Length in milliseconds of a _cooldown_ time window, during which a given alarm cannot be raised again. Defaults to 0.
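
The interplay of the _command_ and _cooldown_ parameters can be illustrated with a minimal Python sketch. It is not the plugin's actual implementation: the `AlarmDispatcher` class, its method names and the injectable clock are hypothetical, and quoting the alarm text with `shlex.quote` is an assumption made here for safety (it presumes the _%s_ marker is not already enclosed in quotes in the command string).

```python
import shlex
import subprocess
import time


class AlarmDispatcher:
    """Hypothetical helper (not part of DCDB) mimicking `command` and `cooldown`."""

    def __init__(self, command=None, cooldown_ms=0, clock=time.monotonic):
        self.command = command                 # shell command containing the %s marker
        self.cooldown_s = cooldown_ms / 1000.0
        self.clock = clock                     # injectable clock, eases testing
        self.last_raised = {}                  # alarm key -> time of the last raise

    def build_command(self, text):
        # Replace the %s marker with the alarm text, quoted for the shell
        # (assumption: the real plugin may substitute the text differently).
        return self.command.replace("%s", shlex.quote(text))

    def raise_alarm(self, key, text):
        """Run the command unless the same alarm is still in its cooldown window."""
        now = self.clock()
        last = self.last_raised.get(key)
        if last is not None and now - last < self.cooldown_s:
            return False                       # suppressed by the cooldown
        self.last_raised[key] = now
        if self.command:
            # shell=True allows pipes and redirects, similarly to popen
            subprocess.run(self.build_command(text), shell=True, check=False)
        return True
```

With a cooldown of 0 the suppression check never triggers, so every alarm is dispatched, which matches the documented default.
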
Sensors in the Health Checker plugin support the following parameters:
| Value | Explanation |
|:----- |:----------- |
| condition | Condition under which an alarm is raised for a given sensor, with respect to a certain threshold. This can be _above_ (readings are greater than the threshold), _below_ (readings are smaller than the threshold), _equals_ (readings are equal to the threshold) or _exists_ (the sensor is not providing any data). Only one condition can be defined per sensor.
| threshold | Numerical value against which the condition specified by the _condition_ parameter is verified.
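
As a rough illustration of how the four conditions could be evaluated, the following Python sketch checks the readings collected for one sensor. The `check_sensor` function is hypothetical and not taken from the plugin. It assumes that a single violating reading in the queried window is enough to raise an alarm, and the message wording for _below_ and _equals_ is modeled on the example alarm shown earlier rather than on actual plugin output.

```python
import operator

# Maps each threshold-based condition to a comparison and a wording for the message.
_COMPARATORS = {
    "above": (operator.gt, "greater than"),
    "below": (operator.lt, "smaller than"),
    "equals": (operator.eq, "equal to"),
}


def check_sensor(name, readings, condition, threshold=None):
    """Return an alarm message for the sensor, or None if no alarm applies."""
    if condition == "exists":
        # The alarm fires when the sensor delivered no data in the window.
        if not readings:
            return f"Sensor {name} is not providing any data."
        return None
    if condition not in _COMPARATORS:
        raise ValueError(f"unknown condition: {condition}")
    compare, wording = _COMPARATORS[condition]
    # Assumption: one violating reading is enough to raise the alarm.
    if any(compare(value, threshold) for value in readings):
        return f"Sensor {name} has a reading {wording} threshold {threshold}."
    return None
```

For instance, `check_sensor("/system/node1/temp", [90000, 96000], "above", 95000)` yields a message of the same form as the second line of the example alarm.
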
# Writing Wintermute Plugins <a name="writingPlugins"></a>
Generating a DCDB Wintermute plugin requires implementing an _Operator_ and a _Configurator_ class which contain all logic