24.09., 9:00 - 11:00: Due to updates GitLab will be unavailable for some minutes between 09:00 and 11:00.

README.md 46.3 KB
Newer Older
1
# Wintermute, the DCDB Data Analytics Framework
2 3

### Table of contents
4
1. [Introduction](#introduction)
5
2. [DCDB Wintermute](#dcdbanalytics)
6
    1. [Global Configuration](#globalConfiguration)
7
    2. [Operators](#operators)
8 9 10
    	1. [The Sensor Tree](#sensorTree)
    	2. [The Unit System](#unitSystem)
    	3. [Operational Modes](#opModes)
11
		4. [Operator Configuration](#operatorConfiguration)
12 13
			1. [Configuration Syntax](#configSyntax)
			2. [Instantiating Units](#instantiatingUnits)
Alessio Netti's avatar
Alessio Netti committed
14 15 16 17
			3. [Instantiating Hierarchical Units](#instantiatingHUnits)
			4. [Job Operators](#joboperators)
			5. [MQTT Topics](#mqttTopics)
			6. [Pipelining Operators](#pipelining)
18
    3. [Rest API](#restApi)
Micha Müller's avatar
Micha Müller committed
19 20
        1. [List of ressources](#listOfRessources)
        2. [Examples](#restExamples)
21
3. [Plugins](#plugins)
Alessio Netti's avatar
Alessio Netti committed
22 23
	1. [Aggregator Plugin](#averagePlugin)
	2. [Job Aggregator Plugin](#jobaveragePlugin)
24
	3. [Regressor Plugin](#regressorPlugin)
Alessio Netti's avatar
Alessio Netti committed
25 26 27
	4. [Classifier Plugin](#classifierPlugin)
	5. [Clustering Plugin](#clusteringPlugin)
	6. [Tester Plugin](#testerPlugin)
28 29 30
4. [Sink Plugins](#sinkplugins)
	1. [File Sink Plugin](#filesinkPlugin)
	2. [Writing Plugins](#writingPlugins)
31

32 33 34 35
### Additional Resources

* **End-to-end usage example** of the Wintermute framework to perform regression. Can be found in _operators/regressor_.

36
# Introduction <a name="introduction"></a>
37
In this Readme we describe Wintermute, the DCDB Data Analytics framework, and all data abstractions that are associated with it. 
38

39 40 41
# DCDB Wintermute <a name="dcdbanalytics"></a>
The Wintermute framework is built on top of DCDB, and allows to perform data analytics based on sensor data
in a variety of ways. Wintermute can be deployed both in DCDB Pusher and Collect Agent, with some minor
42 43
differences:

44
* **DCDB Pusher**: only sensor data that is sampled locally and that is contained within the sensor cache can be used for
45 46
data analytics. However, this is the preferable way to deploy simple models on a large-scale, as all computation is
performed within compute nodes, dramatically increasing scalability;
47
* **DCDB Collect Agent**: all available sensor data, in the local cache and in the Cassandra database, can be used for data
48
analytics. This approach is preferable for models that require data from multiple sources at once 
49
(e.g., clustering-based anomaly detection), or for models that are deployed in [on-demand](#operatorConfiguration) mode.
50 51

## Global Configuration <a name="globalConfiguration"></a>
Alessio Netti's avatar
Alessio Netti committed
52 53 54 55
Wintermute shares the same configuration structure as DCDB Pusher and Collect Agent, and it can be enabled via the 
respective (i.e., _dcdbpusher.conf_ or _collectagent.conf_) configuration file.  All output sensors of the frameworks 
are therefore affected by configuration parameters described in the global Readme. Additional parameters specific to 
this framework are the following:
56 57 58

| Value | Explanation |
|:----- |:----------- |
Alessio Netti's avatar
Alessio Netti committed
59
| **analytics** | Wrapper structure for the data analytics-specific values.
60
| hierarchy | Space-separated sequence of regular expressions used to infer the local (DCDB Pusher) or global (DCDB Collect Agent) sensor hierarchy. This parameter should be wrapped in quotes to ensure proper parsing. See the Sensor Tree [section](#sensorTree) for more details.
Alessio Netti's avatar
Alessio Netti committed
61
| filter | Regular expression used to filter the set of sensors in the sensor tree. Everything that matches is included, the rest is discarded.
62 63
| jobFilter | Regular expression used to filter the jobs processed by job operators. The expression is applied to all nodes of the job's nodelist to extract certain information (e.g., rack or island).
| jobMatch | String against which the node names filtered through the _jobFilter_ are checked, to determine if a job is to be processed (see this [section](#jobOperators)).
Alessio Netti's avatar
Alessio Netti committed
64
| **operatorPlugins** | Block containing the specification of all data analytics plugin to be instantiated.
65
| plugin _name_ | The plugin name is used to build the corresponding lib-name (e.g. average --> libdcdboperator_average.1.0)
66 67 68 69
| path | Specify the path where the plugin (the shared library) is located. If left empty, DCDB will look in the default lib-directories (usr/lib and friends) for the plugin file.
| config | One can specify a separate config-file (including path to it) for the plugin to use. If not specified, DCDB will look up pluginName.conf (e.g. average.conf) in the same directory where global.conf is located.
| | |

70
## Operators <a name="operators"></a>
Alessio Netti's avatar
Alessio Netti committed
71
Operators are the basic building block in Wintermute. An Operator is instantiated within a plugin, performs a specific
72
task and acts on sets of inputs and outputs called _units_. Operators are functionally equivalent to _sensor groups_
73
in DCDB Pusher, but instead of sampling data, they process such data and output new sensors. Some high-level examples
74
of operators are the following:
75

Alessio Netti's avatar
Alessio Netti committed
76
* An operator that performs time-series regression on a particular input sensor, and outputs its prediction;
77
* An operator that aggregates a series of input sensors, builds feature vectors, and performs machine 
78
learning-based tasks using a supervised model;
79
* An operator that performs clustering-based anomaly detection by using different sets of inputs associated to different
80
compute nodes;
81
* An operator that outputs statistical features related to the time series of a certain input sensor.
82 83

### The Sensor Tree <a name="sensorTree"></a>
84
Before diving into the configuration and instantiation of operators, we introduce the concept of _sensor tree_. A  sensor
85
tree is simply a data structure expressing the hierarchy of sensors that are being sampled; internal nodes express
Alessio Netti's avatar
Alessio Netti committed
86 87
hierarchical entities (e.g., clusters, racks, nodes, cpus), whereas leaf nodes express actual sensors. In DCDB Pusher, 
a sensor tree refers only to the local hierarchy, while in the Collect Agent it can capture the hierarchy of the entire
88 89
system being sampled.

90
A sensor tree is built at initialization time of DCDB Wintermute, and is implemented in the _SensorNavigator_ class. 
Alessio Netti's avatar
Alessio Netti committed
91 92 93 94
By default, if no hierarchy string has been specified in the configuration, the tree is built automatically by assuming 
that each forward slash-separated part of the sensor name expresses a level in the hierarchy. The total depth of the 
tree is thus determined at runtime as well. This is, in most cases, the preferable configuration, as it complies with
the MQTT topic standard, and interprets each sensor name as if it was a path in a file system.
95

Alessio Netti's avatar
Alessio Netti committed
96 97 98
For spacial cases, the _hierarchy_ global configuration parameter can be used to enforce a specific hierarchy, with
a fixed number of levels. More in general, the following could be a set of forward slash-separated sensor names, from
which we can construct a sensor tree with three levels corresponding to racks, nodes and CPUs respectively:
99 100

```
Alessio Netti's avatar
Alessio Netti committed
101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
/rack00/status
/rack00/node05/MemFree
/rack00/node05/energy
/rack00/node05/temp
/rack00/node05/cpu00/col_user
/rack00/node05/cpu00/instr
/rack00/node05/cpu00/branch-misses
/rack00/node05/cpu01/col_user
/rack00/node05/cpu01/instr
/rack00/node05/cpu01/branch-misses
/rack02/status
/rack02/node03/MemFree
/rack02/node03/energy
/rack02/node03/temp
/rack02/node03/cpu00/col_user
/rack02/node03/cpu00/instr
/rack02/node03/cpu00/branch-misses
/rack02/node03/cpu01/col_user
/rack02/node03/cpu01/instr
/rack02/node03/cpu01/branch-misses
121 122 123 124 125 126 127 128 129 130
``` 

Each sensor name is interpreted as a path within the sensor tree. Therefore, the _instr_ and _branch-misses_ sensors
will be placed as leaf nodes in the deepest level of the tree, as children of the respective cpu node they belong to.
Such cpu nodes will be in turn children of the nodes they belong to, and so on.

The generated sensor tree can then be used to navigate the sensor hierarchy, and perform actions such as _retrieving
all sensors belonging to a certain node, to a neighbor of a certain node, or to the rack a certain node belongs to_.
Please refer to the documentation of the _SensorNavigator_ class for more details.

Alessio Netti's avatar
Alessio Netti committed
131 132 133
> NOTE &ensp;&ensp;&ensp;&ensp;&ensp; In the example above, if sensor names were formatted in a different format (e.g.,
rackXX.nodeXX.cpuXX.sensorName), we should have defined a hierarchy string explicitly in order to generate a sensor
tree correctly. Such string would be, in this case "_rack\d{2}.  node\d{2}.  cpu\d{2}._". 
134 135

> NOTE 2 &ensp;&ensp;&ensp; Sensor trees are always built from the names of sensors _as they are published_. Therefore,
136
please make sure to use the _-a_ option in DCDB Pusher appropriately, to build sensor names that express the desired hierarchy.
137 138 139


### The Unit System <a name="unitSystem"></a>
140
Each operator operates on one or more _units_. A unit represents an abstract (or physical) entity in the current system that
141 142 143
is the target of analysis. A unit could be, for example, a rack, a node within a rack, a CPU within a node or an entire HPC system.
Units are identified by three components:

Alessio Netti's avatar
Alessio Netti committed
144
* **Name**: The name of this unit, that corresponds to the entity it represents. For example, _/rack02/node03/_ or _/rack00/node05/cpu01/_ could be unit names. A unit must always correspond to an existing internal node in the current sensor tree;
145 146 147 148 149
* **Input**: The set of sensors that constitute the input for analysis conducted on this unit. The sensors must share a hierarchical relationship with the unit: that is, they can either belong to the node represented by this unit, to its subtree, or to one of its ancestors; 
* **Output**: The set of output sensors that are produced from any analysis conducted on this unit. The output sensors are always directly associated with the node represented by the unit.

Units are a way to define _patterns_ in the sensor tree and retrieve sensors that are associated to each other by a 
hierarchical relationship. See the configuration [section](#instantiatingUnits) for more details on how to create
150
templates in order to define units suitable for operators.
151 152

### Operational Modes <a name="opModes"></a>
153
Operators can operate in two different modes:
154

155 156
* **Streaming**: streaming operators perform data analytics online and autonomously, processing incoming sensor data at regular intervals.
The units of streaming operators are completely resolved and instantiated at configuration time. The type of output of streaming
Alessio Netti's avatar
Alessio Netti committed
157 158
operators is identical to that of _sensors_ in DCDB Pusher, which are pushed to a Collect Agent and finally to the Cassandra datastore,
resulting in a time-series representation;
159 160 161
* **On-demand**: on-demand operators do not perform data analytics autonomously, but only when queried by users. Unlike
for streaming operators, the units of on-demand operators are not instantiated at configuration, but only when a query is performed. When 
such an event occurs, the operator verifies that the queried unit belongs to its _unit domain_, and then instantiates it,
162
resolving its inputs and outputs. Then, the unit is stored in a local cache for future re-use. The outputs of a on-demand
163
operator are exposed through the REST API, and are never pushed to the Cassandra database.
164

Alessio Netti's avatar
Alessio Netti committed
165
Use of streaming operators is advised when a time-series-like output is required, whereas on-demand operators are effective
166
when data is required at specific times and for specific purposes, and when the unit domain's size makes the use of streaming
167
operators unfeasible.
168

169
### Operator Configuration <a name="operatorConfiguration"></a>
170 171
Here we describe how to configure and instantiate operators in Wintermute. The configuration scheme is very similar
to that of _sensor groups_ in DCDB Pusher, and a _global_ configuration block can be defined in each plugin configuration
172
file. The following is instead a list of configuration parameters that are available for the operators themselves:
173 174 175

| Value | Explanation |
|:----- |:----------- |
176
| default | Name of the template that must be used to configure this operator.
Alessio Netti's avatar
Alessio Netti committed
177
| interval | Specifies how often (in milliseconds) the operator will be invoked to perform computations, and thus the sampling interval of its output sensors. Only used for operators in _streaming_ mode.
178
| relaxed | If set to _true_, the units of this operator will be instantiated even if some of the respective input sensors do not exist.
179
| delay | Delay in milliseconds to be applied to the interval of the operator. This parameter can be used to tune how operator pipelines work, ensuring that the next computation stage is started only after the previous one has finished.
180
| unitCacheLimit | Defines the maximum size of the unit cache that is used in the on-demand and job modes. Default is 1000.
181
| minValues |   Minimum number of readings that need to be stored in output sensors before these are pushed as MQTT messages. Only used for operators in _streaming_ mode.
Alessio Netti's avatar
Alessio Netti committed
182
| mqttPart |    Part of the MQTT topic associated to this operator. Only used for the _root_ unit or when the _enforceTopics_ flag is set to true (see this [section](#mqttTopics)).
183
| enforceTopics | If set to _true_, mqttPart will be forcibly pre-pended to the MQTT topics of all output sensors in the operator (see this [section](#mqttTopics)). 
184
| sync | If set to _true_, computation will be performed at time intervals synchronized with sensor readings.
Alessio Netti's avatar
Alessio Netti committed
185
| disabled | If set to _true_, the operator will be instantiated but will not be started and will not be available for queries.
186 187
| duplicate | 	If set to _false_, only one operator object will be instantiated. Such operator will perform computation over all units that are instantiated, at every interval, sequentially. If set to _true_, the operator object will be duplicated such that each copy will have one unit associated to it. This allows to exploit parallelism between units, but results in separate models to avoid race conditions.
| streaming |	If set to _true_, the operator will operate in _streaming_ mode, pushing output sensors regularly. If set to _false_, the operator will instead operate in _on-demand_ mode.
Alessio Netti's avatar
Alessio Netti committed
188 189 190
| unitInput | Block of input sensors that must be used to instantiate the units of this operator. These can both be a list of strings, or fully-qualified _Sensor_ blocks containing specific attributes (see DCDB Pusher Readme).
| unitOutput | Block of output sensors that will be associated to this operator. These must be _Sensor_ blocks containing valid MQTT suffixes. Note that the number of output sensors is usually fixed depending on the type of operator.
| globalOutput | Block for _global_ output sensors that are not associated with a specific unit. If this is defined, all units described by the _unitInput_ and _unitOutput_ blocks will be grouped under a hierarchical _root_ unit that contains the output sensors described here.
191 192 193
| | |

#### Configuration Syntax <a name="configSyntax"></a>
Alessio Netti's avatar
Alessio Netti committed
194
In the following we show a sample configuration block for the _Aggregator_ plugin. For the full version, please refer to the
195 196 197
default configuration file in the _config_ directory:

```
Alessio Netti's avatar
Alessio Netti committed
198 199 200 201 202
global {
	mqttprefix /analytics
}

template_aggregator def1 {
203 204 205 206 207 208
interval	1000
minValues	3
duplicate 	false
streaming	true
}

Alessio Netti's avatar
Alessio Netti committed
209
aggregator avg1 {
210
default     def1
211
mqttPart    /avg1
212

Alessio Netti's avatar
Alessio Netti committed
213
	unitInput {
214 215 216 217
		sensor col_user
		sensor MemFree
	}

Alessio Netti's avatar
Alessio Netti committed
218
	unitOutput {
219
		sensor sum {
Alessio Netti's avatar
Alessio Netti committed
220
			operation 	sum
221
			mqttsuffix  /sum
222 223 224
		}

		sensor max {
Alessio Netti's avatar
Alessio Netti committed
225
			operation	maximum
226
			mqttsuffix  /max
227 228 229
		}

		sensor avg {
Alessio Netti's avatar
Alessio Netti committed
230
			operation	average
231
			mqttsuffix  /avg
232 233 234 235 236 237
		}
	}
}
``` 

The configuration shown above uses a template _def1_ for some configuration parameters, which are then applied to the
238
_avg1_ operator. This operator takes the _col_user_ and _MemFree_ sensors as input (which must be available under this name),
Alessio Netti's avatar
Alessio Netti committed
239
 and outputs _sum_, _max_, and _avg_ sensors. In this configuration, the Unit System and sensor hierarchy are not used, 
240 241 242
 and therefore only one generic unit (called the _root_ unit) will be instantiated.

#### Instantiating Units <a name="instantiatingUnits"></a>
Alessio Netti's avatar
Alessio Netti committed
243
Here we propose once again the configuration discussed above, this time making use of the Unit System to abstract from
244 245 246
the specific system being used and simplify configuration. The adjusted configuration block is the following: 

```
Alessio Netti's avatar
Alessio Netti committed
247 248 249 250 251
global {
	mqttprefix /analytics
}

template_aggregator def1 {
252 253 254 255 256 257
interval	1000
minValues	3
duplicate 	false
streaming	true
}

Alessio Netti's avatar
Alessio Netti committed
258
aggregator avg1 {
259
default     def1
260
mqttPart    /avg1
261

Alessio Netti's avatar
Alessio Netti committed
262
	unitInput {
263 264 265 266
		sensor "<bottomup>col_user"
		sensor "<bottomup 1>MemFree"
	}

Alessio Netti's avatar
Alessio Netti committed
267
	unitOutput {
Alessio Netti's avatar
Alessio Netti committed
268
		sensor "<bottomup, filter cpu00>sum" {
Alessio Netti's avatar
Alessio Netti committed
269
			operation 	sum
270
			mqttsuffix  /sum
271 272
		}

Alessio Netti's avatar
Alessio Netti committed
273
		sensor "<bottomup, filter cpu00>max" {
Alessio Netti's avatar
Alessio Netti committed
274
			operation 	maximum
275
			mqttsuffix  /max
276 277
		}

Alessio Netti's avatar
Alessio Netti committed
278
		sensor "<bottomup, filter cpu00>avg" {
Alessio Netti's avatar
Alessio Netti committed
279
			operation 	average
280
			mqttsuffix  /avg
281 282 283 284 285 286
		}
	}
}
``` 

In each sensor declaration, the _< >_ block is a placeholder that will be replaced with the name of the units that will
287
be associated to the operator, thus resolving the sensor names. Such block allows to navigate the current sensor tree,
288 289 290 291 292 293 294 295 296 297 298 299 300 301
and select nodes that will constitute the units. Its syntax is the following:

```
< bottomup|topdown X, filter Y >SENSORNAME 
``` 

The first section specified the _level_ in the sensor tree at which nodes must be selected. _bottomup X_ and _topdown X_
respectively mean _"search X levels up from the deepest level in the sensor tree"_, and _"search X levels down from the 
topmost level in the sensor tree"_. The _X_ numerical value can be omitted as well.

The second section, on the other hand, allows to search the sensor tree _horizontally_. Within the level specified in the
first section of the configuration block, only the nodes whose names match with the regular expression Y will be selected.
This way, we can navigate the current sensor tree both vertically and horizontally, and easily instantiate units starting 
from nodes in the tree. The set of nodes in the current sensor tree that match with the specified configuration block is
302
defined as the _unit domain_ of the operator.
303 304 305

The configuration algorithm then works in two steps:

306
1. The _output_ block of the operator is read, and its unit domain is determined; this implies that all sensors in the 
307 308 309
output block must share the same _< >_ block, and therefore match the same unit domain;
2. For each unit in the domain, its input sensors are identified. We start from the _unit_ node in the sensor tree, and 
navigate to the corresponding sensor node according to its _< >_ block, which identifies its level in the tree. Each 
310
unit, once its inputs and outputs are defined, is then added to the operator.
311 312

According to the sensor tree built in the previous [section](#sensorTree), the configuration above would result in
Alessio Netti's avatar
Alessio Netti committed
313
an operator with the following set of _flat_ units:
314 315

```
Alessio Netti's avatar
Alessio Netti committed
316
/rack00/node05/cpu00/ {
317
	Inputs {
Alessio Netti's avatar
Alessio Netti committed
318 319
		/rack00/node05/cpu00/col_user
		/rack00/node05/MemFree
320 321 322
	}
	
	Outputs {
Alessio Netti's avatar
Alessio Netti committed
323 324 325
		/rack00/node05/cpu00/sum
		/rack00/node05/cpu00/max
		/rack00/node05/cpu00/avg
326 327 328
	}
}

Alessio Netti's avatar
Alessio Netti committed
329
/rack02/node03/cpu00/ {
330
	Inputs {
Alessio Netti's avatar
Alessio Netti committed
331 332
		/rack02/node03/cpu00/col_user
		/rack02/node03/MemFree
333 334 335
	}
                     	
	Outputs {
Alessio Netti's avatar
Alessio Netti committed
336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438
		/rack02/node03/cpu00/sum
		/rack02/node03/cpu00/max
		/rack02/node03/cpu00/avg
	}
}
``` 

#### Instantiating Hierarchical Units <a name="instantiatingHUnits"></a>
A second level of aggregation beyond ordinary units can be obtained by defining sensors in the _globalOutput_ block. In
this case, a series of units will be created like in the previous example, and they will be added as _sub-units_ of a 
top-level _root_ unit, which will have as outputs the sensors defined in the _globalOutput_ block. This type of unit
is called a _hierarchical_ unit.
 
 
Computation for a hierarchical unit is always performed starting from the top-level unit. This means that all sub-units
will be processed sequentially in the same computation interval, and that they cannot be split. However, both the
top-level unit and the respective sub-units are exposed to the outside, and their sensors can be queried. Please
refer to the plugins' documentation to see whether hierarchical units are supported or not.

Recalling the previous example, a hierarchical unit can be constructed with the following configuration:

```
global {
	mqttprefix /analytics
}

template_aggregator def1 {
interval	1000
minValues	3
duplicate 	false
streaming	true
}

aggregator avg1 {
default     def1
mqttPart    /avg1

	unitInput {
		sensor "<bottomup>col_user"
		sensor "<bottomup 1>MemFree"
	}

	unitOutput {
		sensor "<bottomup, filter cpu00>sum" {
			operation 	sum
			mqttsuffix  /sum
		}

		sensor "<bottomup, filter cpu00>max" {
			operation 	maximum
			mqttsuffix  /max
		}

		sensor "<bottomup, filter cpu00>avg" {
			operation 	average
			mqttsuffix  /avg
		}
	}
	
	globalOutput {
		sensor globalSum {
			operation 	sum
			mqttsuffix  /globalSum
		}
    }
}
``` 

Note that hierarchical units can only have global outputs, but not global inputs, as they are meant to perform aggregation
of the results obtained on single sub-units. Such a configuration would result in the following unit structure:

```
__root__ {
	Outputs {
		/analytics/avg1/globalSum
	}
	
	Sub-units {
		/rack00/node05/cpu00/ {
			Inputs {
				/rack00/node05/cpu00/col_user
				/rack00/node05/MemFree
			}
			
			Outputs {
				/rack00/node05/cpu00/sum
				/rack00/node05/cpu00/max
				/rack00/node05/cpu00/avg
			}
		}
		
		/rack02/node03/cpu00/ {
			Inputs {
				/rack02/node03/cpu00/col_user
				/rack02/node03/MemFree
			}
								
			Outputs {
				/rack02/node03/cpu00/sum
				/rack02/node03/cpu00/max
				/rack02/node03/cpu00/avg
			}
		}
439 440 441 442
	}
}
``` 

Alessio Netti's avatar
Alessio Netti committed
443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470
> NOTE&ensp;&ensp;&ensp;&ensp;&ensp;&ensp;&ensp; The _duplicate_ setting has no effect when hierarchical units are used 
(i.e., _globalOutput_ is defined).

> NOTE 2 &ensp;&ensp;&ensp;&ensp;&ensp; As of now the _aggregator_ plugin does not support hierarchical units. The
above is only meant as an example for how hierarchical units can be created in general.

#### Job Operators <a name="joboperators"></a>

_Job Operators_ are a class of operators which act on job-specific data. Such data is structured in _job units_. These
units are _hierarchical_, and work as described previously (see this [section](#instantiatingHUnits)). In particular,
 they are arranged as follows:

* The top unit is associated to the job itself and contains all of the required output sensors (_globalOutput_ block);
* One sub-unit for each node on which the job was running is allocated. Each of these sub-units contains all of the input
sensors that are required at configuration time (_unitInput_ block), along output sensors at the compute node level (_unitOutput_ block).

The computation algorithms driving job operators can then navigate freely this hierarchical unit design according to
their specific needs. Job-level sensors in the top unit do not require Unit System syntax (see this [section](#mqttTopics));
sensors that are defined in sub-units, if supported by the plugin, need however to be at the compute node level in the
current sensor tree, since all of the sub-units are tied to the nodes on which the job was running. This way, 
unit resolution is performed correctly by the Unit System.

Job operators also support the _streaming_ and _on-demand_ modes, which work like the following:

* In **streaming** mode, the job operator will retrieve the list of jobs that were running in the time interval starting
from the last computation to the present; it will then build one job unit for each of them, and subsequently perform computation;
* In **on-demand** mode, users can query a specific job id, for which a job unit is built and computation is performed.

471 472 473 474 475 476 477
A filtering mechanism can also be applied to select which jobs an operator should process. The default filtering policy uses
two parameters: a job _filter_ regular expression and a job _match_ string. When a job first appears in the system, the
job filter regex is applied to all of the node names in its nodelist. This regex could extract, for example, the portion
of the node name that encodes a certain _rack_ or _island_ in an HPC system. Then, frequencies are computed for each filtered
node name, and the mode is computed. If the mode corresponds to the job _match_ string, the job is assigned to the
operator. This policy can be overridden and changed on a per-plugin basis.

Alessio Netti's avatar
Alessio Netti committed
478 479
> NOTE &ensp;&ensp;&ensp;&ensp;&ensp; The _duplicate_ setting does not affect job operators.

Alessio Netti's avatar
Alessio Netti committed
480
> NOTE &ensp;&ensp;&ensp;&ensp;&ensp; In order to get units that operate at the _node_ level, the output sensors in the
Alessio Netti's avatar
Alessio Netti committed
481
configuration discussed earlier should have a unit block in the form of < bottomup 1 >.
Alessio Netti's avatar
Alessio Netti committed
482

483
#### MQTT Topics <a name="mqttTopics"></a>
484
The MQTT topics associated to output sensors of a certain operator are constructed in different ways depending
485 486 487
on the unit they belong to:

* **Root unit**: if the output sensors belong to the _root_ unit, that is, they do not belong to any level in the sensor
488
hierarchy and are uniquely defined, the respective topics are constructed like in DCDB Pusher sensors, by concatenating
Alessio Netti's avatar
Alessio Netti committed
489 490
the MQTT prefix, operator part and sensor suffix that are defined. The same happens for sensors defined in the _globalOutput_
block, which are part of the top level in a hierarchical unit, which also corresponds to the _root_ unit;
Alessio Netti's avatar
Alessio Netti committed
491 492
* **Job unit**: if the output sensors belong to a _job_ unit in a job operator (see below), the MQTT topic is constructed
by concatenating the MQTT prefix, the operator part, a job suffix (e.g., /job1334) and finally the sensor suffix;
493 494
* **Any other unit**: if the output sensor belongs to any other unit in the sensor tree, its MQTT topic is constructed
by concatenating the MQTT prefix associated to the unit (which is defined as _the portion of the MQTT topic shared by all sensors
Alessio Netti's avatar
Alessio Netti committed
495
belonging to such unit_) and the sensor suffix.
496

Alessio Netti's avatar
Alessio Netti committed
497 498 499 500 501 502 503 504 505 506 507 508 509 510 511
Even for units belonging to the last category, we can enforce arbitrary MQTT topics by enabling the _enforceTopics_ flag.
Using this, the MQTT prefix and operator part are pre-pended to the unit name and sensor suffix. This is enforced also
in the sub-units of hierarchical units (e.g., for job operators). Recalling the example above, this would lead to the following result:

``` 
MQTT Prefix	/analytics
MQTT Part	/avg1
Unit 		/rack02/node03/cpu00/

Without enforceTopics:
	/rack02/node03/cpu00/sum
With enforceTopics:
	/analytics/avg1/rack02/node03/cpu00/sum
``` 

512 513
Like ordinary sensors in DCDB Pusher, also operator output sensors can be published via the _auto-publish_ feature, and metadata can be specified for them. For more details, refer to the README of DCDB Pusher.

514
#### Pipelining Operators <a name="pipelining"></a>
515

516 517 518
The inputs and outputs of streaming operators can be chained so as to form a processing pipeline. To enable this, users
need to configure operators by enabling the _relaxed_ configuration parameter, and by selecting as input the output sensors
of other operators. This is necessary as the operators are instantiated sequentially at startup, and
519 520
the framework cannot infer the correct order of initialization so as to resolve all dependencies transparently.

521
> NOTE &ensp;&ensp;&ensp;&ensp;&ensp; This feature is not supported when using operators in _on demand_ mode.
522 523

## Rest API <a name="restApi"></a>
524 525
Wintermute provides a REST API that can be used to perform various management operations on the framework. The 
API is functionally identical to that of DCDB Pusher, and is hosted at the same address. All requests that are targeted
526 527
at the data analytics framework must have a resource path starting with _/analytics_.

Micha Müller's avatar
Micha Müller committed
528 529
### List of ressources <a name="listOfRessources"></a>

Micha Müller's avatar
Micha Müller committed
530
Prefix `/analytics` left out!
Micha Müller's avatar
Micha Müller committed
531 532 533

<table>
  <tr>
Micha Müller's avatar
Micha Müller committed
534
    <td colspan="2"><b>Ressource</b></td>
Micha Müller's avatar
Micha Müller committed
535 536 537 538 539 540 541 542 543 544 545 546
    <td colspan="2">Description</td>
  </tr>
  <tr>
  	<td>Query</td>
  	<td>Value</td>
  	<td>Opt.</td>
  	<td>Description</td>
  </tr>
</table>

<table>
  <tr>
Micha Müller's avatar
Micha Müller committed
547
    <td colspan="2"><b>GET /help</b></td>
Micha Müller's avatar
Micha Müller committed
548 549 550 551 552 553 554 555 556
    <td colspan="2">Return a cheatsheet of possible analytics REST API endpoints.</td>
  </tr>
  <tr>
  	<td colspan="4">No queries.</td>
  </tr>
</table>

<table>
  <tr>
Micha Müller's avatar
Micha Müller committed
557
    <td colspan="2"><b>GET /plugins</b></td>
Micha Müller's avatar
Micha Müller committed
558 559 560 561 562
    <td colspan="2">List all currently loaded data analytic plugins.</td>
  </tr>
  <tr>
  	<td>json</td>
  	<td>"true"</td>
Micha Müller's avatar
Micha Müller committed
563
  	<td>Yes</td>
Micha Müller's avatar
Micha Müller committed
564 565 566 567 568 569
  	<td>Format response as json.</td>
  </tr>
</table>

<table>
  <tr>
Micha Müller's avatar
Micha Müller committed
570
    <td colspan="2"><b>GET /sensors</b></td>
Micha Müller's avatar
Micha Müller committed
571 572 573 574
    <td colspan="2">List all sensors of a specific plugin.</td>
  </tr>
  <tr>
  	<td>plugin</td>
575
  	<td>All operator plugin names.</td>
Micha Müller's avatar
Micha Müller committed
576
  	<td>No</td>
Micha Müller's avatar
Micha Müller committed
577 578 579
  	<td>Specify the plugin.</td>
  </tr>
  <tr>
580 581
  	<td>operator</td>
  	<td>All operators of a plugin.</td>
Micha Müller's avatar
Micha Müller committed
582
  	<td>Yes</td>
583
  	<td>Restrict sensor list to an operator.</td>
Micha Müller's avatar
Micha Müller committed
584 585 586 587
  </tr>
  <tr>
  	<td>json</td>
  	<td>"true"</td>
Micha Müller's avatar
Micha Müller committed
588
  	<td>Yes</td>
Micha Müller's avatar
Micha Müller committed
589 590 591 592 593 594
  	<td>Format response as json.</td>
  </tr>
</table>

<table>
  <tr>
Micha Müller's avatar
Micha Müller committed
595
    <td colspan="2"><b>GET /units</b></td>
Micha Müller's avatar
Micha Müller committed
596 597 598 599
    <td colspan="2">List all units of a specific plugin.</td>
  </tr>
  <tr>
  	<td>plugin</td>
600
  	<td>All operator plugin names.</td>
Micha Müller's avatar
Micha Müller committed
601
  	<td>No</td>
Micha Müller's avatar
Micha Müller committed
602 603 604
  	<td>Specify the plugin.</td>
  </tr>
  <tr>
605 606
  	<td>operator</td>
  	<td>All operators of a plugin.</td>
Micha Müller's avatar
Micha Müller committed
607
  	<td>Yes</td>
608
  	<td>Restrict unit list to an operator.</td>
Micha Müller's avatar
Micha Müller committed
609 610 611 612
  </tr>
  <tr>
  	<td>json</td>
  	<td>"true"</td>
Micha Müller's avatar
Micha Müller committed
613
  	<td>Yes</td>
Micha Müller's avatar
Micha Müller committed
614 615 616 617 618 619
  	<td>Format response as json.</td>
  </tr>
</table>

<table>
  <tr>
620 621
    <td colspan="2"><b>GET /operators</b></td>
    <td colspan="2">List all operators of a specific plugin.</td>
Micha Müller's avatar
Micha Müller committed
622 623 624
  </tr>
  <tr>
  	<td>plugin</td>
625
  	<td>All operator plugin names.</td>
Micha Müller's avatar
Micha Müller committed
626
  	<td>No</td>
Micha Müller's avatar
Micha Müller committed
627 628 629 630 631
  	<td>Specify the plugin.</td>
  </tr>
  <tr>
  	<td>json</td>
  	<td>"true"</td>
Micha Müller's avatar
Micha Müller committed
632
  	<td>Yes</td>
Micha Müller's avatar
Micha Müller committed
633 634 635 636 637 638
  	<td>Format response as json.</td>
  </tr>
</table>

<table>
  <tr>
Micha Müller's avatar
Micha Müller committed
639
    <td colspan="2"><b>PUT /start</b></td>
640
    <td colspan="2">Start all or only a specific plugin. Or only start a specific streaming operator within a specific plugin.</td>
Micha Müller's avatar
Micha Müller committed
641 642 643 644
  </tr>
  <tr>
  	<td>plugin</td>
  	<td>All plugin names.</td>
Micha Müller's avatar
Micha Müller committed
645
  	<td>Yes</td>
Micha Müller's avatar
Micha Müller committed
646 647 648
  	<td>Specify the plugin.</td>
  </tr>
  <tr>
649 650
  	<td>operator</td>
  	<td>All operator names of a plugin.</td>
Micha Müller's avatar
Micha Müller committed
651
  	<td>Yes</td>
652
  	<td>Only start the specified operator. Requires a plugin to be specified. Limited to streaming operators.</td>
Micha Müller's avatar
Micha Müller committed
653 654 655 656 657
  </tr>
</table>

<table>
  <tr>
Micha Müller's avatar
Micha Müller committed
658
    <td colspan="2"><b>PUT /stop</b></td>
659
    <td colspan="2">Stop all or only a specific plugin. Or only stop a specific streaming operator within a specific plugin.</td>
Micha Müller's avatar
Micha Müller committed
660 661 662 663
  </tr>
  <tr>
  	<td>plugin</td>
  	<td>All plugin names.</td>
Micha Müller's avatar
Micha Müller committed
664
  	<td>Yes</td>
Micha Müller's avatar
Micha Müller committed
665 666 667
  	<td>Specify the plugin.</td>
  </tr>
  <tr>
668 669
  	<td>operator</td>
  	<td>All operator names of a plugin.</td>
Micha Müller's avatar
Micha Müller committed
670
  	<td>Yes</td>
671
  	<td>Only stop the specified operator. Requires a plugin to be specified. Limited to streaming operators.</td>
Micha Müller's avatar
Micha Müller committed
672 673 674 675 676
  </tr>
</table>

<table>
  <tr>
Micha Müller's avatar
Micha Müller committed
677
    <td colspan="2"><b>PUT /reload</b></td>
678
    <td colspan="2">Reload configuration and initialization of all or only a specific operator plugin.</td>
Micha Müller's avatar
Micha Müller committed
679 680 681 682
  </tr>
  <tr>
  	<td>plugin</td>
  	<td>All plugin names.</td>
Micha Müller's avatar
Micha Müller committed
683
  	<td>Yes</td>
Micha Müller's avatar
Micha Müller committed
684 685 686 687
  	<td>Reload only the specified plugin.</td>
  </tr>
</table>

688 689 690 691 692 693 694 695 696 697
<table>
  <tr>
    <td colspan="2"><b>PUT /navigator</b></td>
    <td colspan="2">Rebuild the Sensor Navigator used for instantiating operators.</td>
  </tr>
  <tr>
  	<td colspan="4">No queries.</td>
  </tr>
</table>

Micha Müller's avatar
Micha Müller committed
698 699
<table>
  <tr>
Micha Müller's avatar
Micha Müller committed
700
    <td colspan="2"><b>PUT /compute</b></td>
701
    <td colspan="2">Query the given operator for a certain input unit. Intended for "on-demand" operators, but works with "streaming" operators as well.</td>
Micha Müller's avatar
Micha Müller committed
702 703 704 705
  </tr>
  <tr>
  	<td>plugin</td>
  	<td>All plugin names.</td>
Micha Müller's avatar
Micha Müller committed
706
  	<td>No</td>
Micha Müller's avatar
Micha Müller committed
707 708 709
  	<td>Specify the plugin.</td>
  </tr>
  <tr>
710 711
  	<td>operator</td>
  	<td>All operator names of a plugin.</td>
Micha Müller's avatar
Micha Müller committed
712
  	<td>No</td>
713
  	<td>Specify the operator within the plugin.</td>
Micha Müller's avatar
Micha Müller committed
714 715 716 717
  </tr>
  <tr>
  	<td>unit</td>
  	<td>All units of a plugin.</td>
Micha Müller's avatar
Micha Müller committed
718
  	<td>Yes</td>
Micha Müller's avatar
Micha Müller committed
719 720 721 722 723
  	<td>Select the target unit. Defaults to the root unit if not specified.</td>
  </tr>
  <tr>
  	<td>json</td>
  	<td>"true"</td>
Micha Müller's avatar
Micha Müller committed
724
  	<td>Yes</td>
Micha Müller's avatar
Micha Müller committed
725 726 727 728 729 730
  	<td>Format response as json.</td>
  </tr>
</table>

<table>
  <tr>
731 732
    <td colspan="2"><b>PUT /operator</b></td>
    <td colspan="2">Perform a custom REST PUT action defined at operator level. See operator plugin documenation for such actions.</td>
Micha Müller's avatar
Micha Müller committed
733 734 735 736
  </tr>
  <tr>
  	<td>plugin</td>
  	<td>All plugin names.</td>
Micha Müller's avatar
Micha Müller committed
737
  	<td>No</td>
Micha Müller's avatar
Micha Müller committed
738 739 740 741
  	<td>Specify the plugin.</td>
  </tr>
  <tr>
  	<td>action</td>
742
  	<td>See operator plugin documentation.</td>
Micha Müller's avatar
Micha Müller committed
743
  	<td>No</td>
Micha Müller's avatar
Micha Müller committed
744 745 746
  	<td>Select custom action.</td>
  </tr>
  <tr>
747 748
  	<td>operator</td>
  	<td>All operators of a plugin.</td>
Micha Müller's avatar
Micha Müller committed
749
  	<td>Yes</td>
750
  	<td>Specify the operator within the plugin.</td>
Micha Müller's avatar
Micha Müller committed
751 752 753 754 755 756
  </tr>
  <tr>
  	<td colspan="4">Custom action may require or allow for more queries!</td>
  </tr>
</table>

Alessio Netti's avatar
Alessio Netti committed
757
> NOTE&ensp;&ensp;&ensp;&ensp;&ensp;&ensp;&ensp; Opt. = Optional
Micha Müller's avatar
Micha Müller committed
758

Alessio Netti's avatar
Alessio Netti committed
759
> NOTE 2 &ensp;&ensp;&ensp;&ensp;&ensp; The value of operator output sensors can be retrieved with the _compute_ resource, or with the _/average_ resource defined in the DCDB Pusher REST API.
Micha Müller's avatar
Micha Müller committed
760

761
> NOTE 3 &ensp;&ensp;&ensp;&ensp;&ensp; Developers can integrate their custom REST API resources that are plugin-specific, by implementing the _REST_ method in _OperatorTemplate_. To know more about plugin-specific resources, please refer to the respective documentation. 
762

Alessio Netti's avatar
Alessio Netti committed
763 764
> NOTE 4 &ensp;&ensp;&ensp;&ensp;&ensp; When operators employ a _root_ unit (e.g., when the Unit System is not used or a _globalOutput_ block is defined in regular operators) the _unit_ query can be omitted when performing a _/compute_ action.

765 766 767
### Rest Examples <a name="restExamples"></a>
In the following are some examples of REST requests over HTTPS:

Alessio Netti's avatar
Alessio Netti committed
768
* Listing the units associated to the _avgoperator1_ operator in the _aggregator_ plugin:
769
```bash
Alessio Netti's avatar
Alessio Netti committed
770
GET https://localhost:8000/analytics/units?plugin=aggregator;operator=avgOperator1
771
```
Alessio Netti's avatar
Alessio Netti committed
772
* Listing the output sensors associated to all operators in the _aggregator_ plugin:
773
```bash
Alessio Netti's avatar
Alessio Netti committed
774
GET https://localhost:8000/analytics/sensors?plugin=aggregator;
775
```
Alessio Netti's avatar
Alessio Netti committed
776
* Reloading the _aggregator_ plugin:
777
```bash
Alessio Netti's avatar
Alessio Netti committed
778
PUT https://localhost:8000/analytics/reload?plugin=aggregator
779
```
Alessio Netti's avatar
Alessio Netti committed
780
* Stopping the _avgOperator1_ operator in the _aggregator_ plugin:
781
```bash
Alessio Netti's avatar
Alessio Netti committed
782
PUT https://localhost:8000/analytics/stop?plugin=aggregator;operator=avgOperator1
783
```
Alessio Netti's avatar
Alessio Netti committed
784
* Performing a query for unit _/node00/cpu03/_ to the _avgOperator1_ operator in the _aggregator_ plugin:
785
```bash
Alessio Netti's avatar
Alessio Netti committed
786
PUT https://localhost:8000/analytics/compute?plugin=aggregator;operator=avgOperator1;unit=/node00/cpu03/
787 788
```

789 790
> NOTE &ensp;&ensp;&ensp;&ensp;&ensp; The analytics RestAPI requires authentication credentials as well.

791
# Plugins <a name="plugins"></a>
792
Here we describe available plugins in Wintermute, and how to configure them.
793

Alessio Netti's avatar
Alessio Netti committed
794 795
## Aggregator Plugin <a name="averagePlugin"></a>
The _Aggregator_ plugin implements simple data processing algorithms. Specifically, this plugin allows to perform basic
796
aggregation operations over a set of input sensors, which are then written as output.
Alessio Netti's avatar
Alessio Netti committed
797
The configuration parameters specific to the _Aggregator_ plugin are the following:
798 799 800 801

| Value | Explanation |
|:----- |:----------- |
| window | Length in milliseconds of the time window that is used to retrieve recent readings for the input sensors, starting from the latest one.
802

803
Additionally, output sensors in operators of the _Aggregator_ plugin accept the following parameters:
804 805 806

| Value | Explanation |
|:----- |:----------- |
807
| operation | Operation to be performed over the input sensors. Can be "sum", "average", "maximum", "minimum", "std", "percentiles" or "observations".
808
| percentile |  Specific percentile to be computed when using the "percentiles" operation. Can be an integer in the (0,100) range.
809 810
| relative | If true, the _relative_ query mode will be used. Otherwise the _absolute_ mode is used.

811

Alessio Netti's avatar
Alessio Netti committed
812 813 814 815
## Job Aggregator Plugin <a name="jobaveragePlugin"></a>

The _Job Aggregator_ plugin offers the same functionality as the _Aggregator_ plugin, but on a per-job basis. As such,
it performs aggregation of the specified input sensors across all nodes on which each job is running. Please refer
816
to the corresponding [section](#joboperators) for more details.
Alessio Netti's avatar
Alessio Netti committed
817

818 819
> NOTE &ensp;&ensp;&ensp;&ensp;&ensp; The Job Aggregator plugin does not support the _relative_ option supported by the Aggregator plugin, and always uses the _absolute_ sensor query mode.

820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836
## Smoothing Plugin <a name="smoothingPlugin"></a>
The _Smoothing_ plugin performs similar functions as the _Aggregator_ plugin, but is optimized for performing running averages. It uses a single accumulator to compute approximate running averages, improving memory usage and reducing the amount of queried data.

Units ins the _Smoothing_ plugin are also instantiated differently compared to other plugins. As it is meant to perform running average on most or all sensors present in a system, there is no need to specify the input sensors of an operator: the plugin will automatically fetch all instantiated sensors in the system, and create separate units for them, each with their separate average output sensors. 
For this reason, pattern expressions specified on output sensors (e.g., _<bottomup 1, filter cpu>avg60_) are ignored and only the MQTT suffixes of the output sensors are used to construct units. If required, users can still specify input sensors using the Unit System syntax, to select subsets of the available sensors in the system. Operators of the _Smoothing_ plugin accept the following configuration parameters: 

| Value | Explanation |
|:----- |:----------- |
| separator | Character used to separate the MQTT prefix of output sensors (which is the input sensor's name) from the suffix (which is the average's identifier). Default is "#". 
| exclude | String containing a regular expression, defining which sensors must be excluded from the smoothing process.

Additionally, output sensors in operators of the _Smoothing_ plugin accept the following parameters:

| Value | Explanation |
|:----- |:----------- |
| range | Range in milliseconds of the average to be stored in this output sensor.

837
> NOTE &ensp;&ensp;&ensp;&ensp;&ensp; The _Smoothing_ plugin will automatically update the metadata table of the input sensors in the units to expose the computed averages. This behavior can be altered by setting the _isOperation_ metadata attribute to _false_ for each output sensor. This way, the sensors associated to the averages will be published as independent entries.
838

839 840 841 842 843 844 845 846 847 848 849 850 851 852 853
## Regressor Plugin <a name="regressorPlugin"></a>

The _Regressor_ plugin is able to perform regression of sensors, by using _random forest_ machine learning predictors. The algorithmic backend is provided by the OpenCV library.
For each input sensor in a certain unit, statistical features from recent values of its time series are computed. These are the average, standard deviation, sum of differences, 25th quantile and 75th quantile. These statistical features are then combined together in a single _feature vector_. 

In order to operate correctly, the models used by the regressor plugin need to be trained: this procedure is performed automatically online when using the _streaming_ mode, and can be triggered arbitrarily over the REST API. In _on demand_ mode, automatic training cannot be performed, and as such a pre-trained model must be loaded from a file.
The following are the configuration parameters available for the _Regressor_ plugin:

| Value | Explanation |
|:----- |:----------- |
| window | Length in milliseconds of the time window that is used to retrieve recent readings for the input sensors, starting from the latest one.
| trainingSamples | Number of samples necessary to perform training of the current model.
| targetDistance | Temporal distance (in terms of lags) of the sample that is to be predicted.
| inputPath | Path of a file from which a pre-trained random forest model must be loaded.
| outputPath | Path of a file to which the random forest model trained at runtime must be saved.
854
| getImportances | If true, the random forest will also compute feature importance values when trained, which are printed.
855 856 857

> NOTE &ensp;&ensp;&ensp;&ensp;&ensp; When the _duplicate_ option is enabled, the _outputPath_ field is ignored to avoid file collisions from multiple regressors.

Alessio Netti's avatar
Alessio Netti committed
858
> NOTE 2 &ensp;&ensp;&ensp;&ensp;&ensp; When loading the model from a file and _getImportances_ is set to true, importance values will be printed only if the original model had this feature enabled upon training.
859

860
Additionally, input sensors in operators of the Regressor plugin accept the following parameter:
861 862 863

| Value | Explanation |
|:----- |:----------- |
864
| target | Boolean value. If true, this sensor represents the target for regression. Every unit in operators of the regressor plugin must have excatly one target sensor.
865

866
Finally, the Regressor plugin supports the following additional REST API actions:
867 868 869 870

| Action | Explanation |
|:----- |:----------- |
| train | Triggers a new training phase for the random forest model. Feature vectors are temporarily collected in-memory until _trainingSamples_ vectors are obtained. Until this moment, the old random forest model is still used to perform prediction.
871
| importances | Returns the sorted importance values for the input features, together with the respective labels, if available.
872

Alessio Netti's avatar
Alessio Netti committed
873 874 875 876 877 878 879 880
## Classifier Plugin <a name="classifierPlugin"></a>

The _Classifier_ plugin, as the name implies, performs machine learning classification. It is based on the Regressor plugin, and as such it also uses OpenCV random forest models. The plugin supplies the same options and has the same behavior as the Regressor plugin, with the following two exceptions:

* The _target_ parameter here indicates a sensor which stores the labels (as numerical integer identifiers) to be used for training and on which classification will be based. The mapping from the integer labels to their text equivalent is left to the users. Moreover, unlike in the
Regressor plugin, the target sensor is always excluded from the feature vectors.
* The _targetDistance_ parameter is not used here, as it is only meaningful for regression.

881 882 883 884 885 886 887 888 889 890
## Clustering Plugin <a name="clusteringPlugin"></a>

The _Clustering_ plugin implements a gaussian mixture model for performance variation analysis and outlier detection. The plugin is based on the OpenCV library, similarly to the _Regressor_ plugin.
The input sensors of operators define the axes of the clustering space and hence the number of dimensions of the associated gaussian components. The units of the operator, instead, define the number of points in the clustering space: for this reason, 
the Clustering plugin employs hierarchical units, so that the clustering is performed once for all present sub-units, at each computation interval. When prediction is performed (after training of the GMM model, depending on the _reuseModel_ attribute) the
label of the gaussian component to which each sub-unit belongs is stored in the only output sensor of that sub-unit. Operators of the Clustering plugin support the following configuration parameters:

| Value | Explanation |
|:----- |:----------- |
| window | Length in milliseconds of the time window that is used to retrieve recent readings for the input sensors, starting from the latest one.
891
| lookbackWindow | Enables a lookback window (whose length is expressed in milliseconds) to collect additional points from previous time windows in order to perform model training. The additional points will be accumulated in memory until training can be performed. Disabled by default.
892 893 894 895 896 897 898 899 900 901 902 903 904
| numComponents | Number of gaussian components in the mixture model.
| reuseModel | Boolean value. If false, the GMM model is trained at each computation, otherwise only once or when the "train" REST API action is used. Default is false.
| outlierCut | Threshold used to determine outliers when performing the Mahalanobis distance. 
| inputPath | Path of a file from which a pre-trained random forest model must be loaded.
| outputPath | Path of a file to which the random forest model trained at runtime must be saved.

The Clustering plugin does not provide any additional configuration parameters for its input and output sensors. 
However, it supports the following additional REST API actions:

| Action | Explanation |
|:----- |:----------- |
| train | Triggers a new training phase for the gaussian mixture model. At the next computation interval, the feature vectors of all units of the operator are combined to perform training, after which predicted labels are given as output.
| means | Returns the means of the generated gaussian components in the trained mixture model.
905
| covs | Returns the covariance matrices of the generated gaussian components in the trained mixture model.
906

907
## Tester Plugin <a name="testerPlugin"></a>
Alessio Netti's avatar
Alessio Netti committed
908
The _Tester_ plugin can be used to test the functionality and performance of the query engine, as well as of the Unit System. It will perform a specified number of queries over the set of input sensors for each unit, and then output as a sensor the total number of retrieved readings. The following are the configuration parameters for operators in the _Tester_ plugin:
909 910 911 912 913 914 915 916

| Value | Explanation |
|:----- |:----------- |
| window | Length in milliseconds of the time window that is used to retrieve recent readings for the input sensors, starting from the latest one.
| queries | Number of queries to be performed at each computation interval. If more than the number of input sensors per unit, these will be looped over multiple times.
| relative | If true, the _relative_ query mode will be used. Otherwise the _absolute_ mode is used.

# Sink Plugins <a name="sinkplugins"></a>
917
Here we describe available plugins in Wintermute that are devoted to the output of sensor data (_sinks_), and that do not perform any analysis.
918

919 920
## File Sink Plugin <a name="filesinkPlugin"></a>
The _File Sink_ plugin allows to write the output of any other sensor to the local file system. As such, it does not produce output sensors by itself, and only reads from input sensors.
Alessio Netti's avatar
Alessio Netti committed
921
The input sensors can either be fully qualified, or can be described through the Unit System. In this case, multiple input sensors can be generated automatically, and the respective output paths need to be adjusted by enabling the _autoName_ attribute described below, to prevent multiple sensors from being written to the same file. The file sink operators (named sinks) support the following attributes:
922 923 924 925 926 927 928 929 930 931 932

| Value | Explanation |
|:----- |:----------- |
| autoName | Boolean. If false, the output paths associated to sensors are interpreted literally, and a file is opened for them. If true, only the part in the path describing the current directory is used, while the file itself is named accordingly to the MQTT topic of the specific sensor.

Additionally, input sensors in sinks accept the following parameters:

| Value | Explanation |
|:----- |:----------- |
| path | The path to which the sensors's readings should be written. It is interpreted as described above for the _autoName_ attribute.

933
## Writing DCDB Analytics Plugins <a name="writingPlugins"></a>
934
Generating a DCDB Wintermute plugin requires implementing a _Operator_ and _Configurator_ class which contain all logic
935
tied to the specific plugin. Such classes should be derived from _OperatorTemplate_ and _OperatorConfiguratorTemplate_
936
respectively, which contain all plugin-agnostic configuration and runtime features. Please refer to the documentation 
Alessio Netti's avatar
Alessio Netti committed
937
of the _Aggregator_ plugin for an overview of how a basic plugin can be implemented.