README.md 39.4 KB
Newer Older
1
# Wintermute, the DCDB Data Analytics Framework
2
3

### Table of contents
4
1. [Introduction](#introduction)
5
2. [DCDB Wintermute](#dcdbanalytics)
6
    1. [Global Configuration](#globalConfiguration)
7
    2. [Operators](#operators)
8
9
10
    	1. [The Sensor Tree](#sensorTree)
    	2. [The Unit System](#unitSystem)
    	3. [Operational Modes](#opModes)
11
		4. [Operator Configuration](#operatorConfiguration)
12
13
			1. [Configuration Syntax](#configSyntax)
			2. [Instantiating Units](#instantiatingUnits)
Alessio Netti's avatar
Alessio Netti committed
14
15
16
17
			3. [Instantiating Hierarchical Units](#instantiatingHUnits)
			4. [Job Operators](#joboperators)
			5. [MQTT Topics](#mqttTopics)
			6. [Pipelining Operators](#pipelining)
18
    3. [Rest API](#restApi)
Micha Mueller's avatar
Micha Mueller committed
19
20
        1. [List of ressources](#listOfRessources)
        2. [Examples](#restExamples)
21
3. [Plugins](#plugins)
Alessio Netti's avatar
Alessio Netti committed
22
23
	1. [Aggregator Plugin](#averagePlugin)
	2. [Job Aggregator Plugin](#jobaveragePlugin)
24
	3. [Regressor Plugin](#regressorPlugin)
25
26
27
28
	4. [Tester Plugin](#testerPlugin)
4. [Sink Plugins](#sinkplugins)
	1. [File Sink Plugin](#filesinkPlugin)
	2. [Writing Plugins](#writingPlugins)
29
30

# Introduction <a name="introduction"></a>
31
In this Readme we describe Wintermute, the DCDB Data Analytics framework, and all data abstractions that are associated with it. 
32

33
34
35
# DCDB Wintermute <a name="dcdbanalytics"></a>
The Wintermute framework is built on top of DCDB, and allows to perform data analytics based on sensor data
in a variety of ways. Wintermute can be deployed both in DCDB Pusher and Collect Agent, with some minor
36
37
differences:

38
* **DCDB Pusher**: only sensor data that is sampled locally and that is contained within the sensor cache can be used for
39
40
data analytics. However, this is the preferable way to deploy simple models on a large-scale, as all computation is
performed within compute nodes, dramatically increasing scalability;
41
* **DCDB Collect Agent**: all available sensor data, in the local cache and in the Cassandra database, can be used for data
42
analytics. This approach is preferable for models that require data from multiple sources at once 
43
(e.g., clustering-based anomaly detection), or for models that are deployed in [on-demand](#operatorConfiguration) mode.
44
45

## Global Configuration <a name="globalConfiguration"></a>
Alessio Netti's avatar
Alessio Netti committed
46
47
48
49
Wintermute shares the same configuration structure as DCDB Pusher and Collect Agent, and it can be enabled via the 
respective (i.e., _dcdbpusher.conf_ or _collectagent.conf_) configuration file.  All output sensors of the frameworks 
are therefore affected by configuration parameters described in the global Readme. Additional parameters specific to 
this framework are the following:
50
51
52

| Value | Explanation |
|:----- |:----------- |
Alessio Netti's avatar
Alessio Netti committed
53
| **analytics** | Wrapper structure for the data analytics-specific values.
54
| hierarchy | Space-separated sequence of regular expressions used to infer the local (DCDB Pusher) or global (DCDB Collect Agent) sensor hierarchy. This parameter should be wrapped in quotes to ensure proper parsing. See the Sensor Tree [section](#sensorTree) for more details.
Alessio Netti's avatar
Alessio Netti committed
55
| filter | Regular expression used to filter the set of sensors in the sensor tree. Everything that matches is included, the rest is discarded.
Alessio Netti's avatar
Alessio Netti committed
56
| jobFilter | Regular expression used to filter the jobs processed by job operators. The expression is applied to the first node of the job's nodelist. If a match is found the job is processed, otherwise it is discarded.
Alessio Netti's avatar
Alessio Netti committed
57
| **operatorPlugins** | Block containing the specification of all data analytics plugin to be instantiated.
58
| plugin _name_ | The plugin name is used to build the corresponding lib-name (e.g. average --> libdcdboperator_average.1.0)
59
60
61
62
| path | Specify the path where the plugin (the shared library) is located. If left empty, DCDB will look in the default lib-directories (usr/lib and friends) for the plugin file.
| config | One can specify a separate config-file (including path to it) for the plugin to use. If not specified, DCDB will look up pluginName.conf (e.g. average.conf) in the same directory where global.conf is located.
| | |

63
## Operators <a name="operators"></a>
Alessio Netti's avatar
Alessio Netti committed
64
Operators are the basic building block in Wintermute. An Operator is instantiated within a plugin, performs a specific
65
task and acts on sets of inputs and outputs called _units_. Operators are functionally equivalent to _sensor groups_
66
in DCDB Pusher, but instead of sampling data, they process such data and output new sensors. Some high-level examples
67
of operators are the following:
68

Alessio Netti's avatar
Alessio Netti committed
69
* An operator that performs time-series regression on a particular input sensor, and outputs its prediction;
70
* An operator that aggregates a series of input sensors, builds feature vectors, and performs machine 
71
learning-based tasks using a supervised model;
72
* An operator that performs clustering-based anomaly detection by using different sets of inputs associated to different
73
compute nodes;
74
* An operator that outputs statistical features related to the time series of a certain input sensor.
75
76

### The Sensor Tree <a name="sensorTree"></a>
77
Before diving into the configuration and instantiation of operators, we introduce the concept of _sensor tree_. A  sensor
78
tree is simply a data structure expressing the hierarchy of sensors that are being sampled; internal nodes express
Alessio Netti's avatar
Alessio Netti committed
79
80
hierarchical entities (e.g., clusters, racks, nodes, cpus), whereas leaf nodes express actual sensors. In DCDB Pusher, 
a sensor tree refers only to the local hierarchy, while in the Collect Agent it can capture the hierarchy of the entire
81
82
system being sampled.

83
A sensor tree is built at initialization time of DCDB Wintermute, and is implemented in the _SensorNavigator_ class. 
Alessio Netti's avatar
Alessio Netti committed
84
85
86
87
By default, if no hierarchy string has been specified in the configuration, the tree is built automatically by assuming 
that each forward slash-separated part of the sensor name expresses a level in the hierarchy. The total depth of the 
tree is thus determined at runtime as well. This is, in most cases, the preferable configuration, as it complies with
the MQTT topic standard, and interprets each sensor name as if it was a path in a file system.
88

Alessio Netti's avatar
Alessio Netti committed
89
90
91
For spacial cases, the _hierarchy_ global configuration parameter can be used to enforce a specific hierarchy, with
a fixed number of levels. More in general, the following could be a set of forward slash-separated sensor names, from
which we can construct a sensor tree with three levels corresponding to racks, nodes and CPUs respectively:
92
93

```
Alessio Netti's avatar
Alessio Netti committed
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
/rack00/status
/rack00/node05/MemFree
/rack00/node05/energy
/rack00/node05/temp
/rack00/node05/cpu00/col_user
/rack00/node05/cpu00/instr
/rack00/node05/cpu00/branch-misses
/rack00/node05/cpu01/col_user
/rack00/node05/cpu01/instr
/rack00/node05/cpu01/branch-misses
/rack02/status
/rack02/node03/MemFree
/rack02/node03/energy
/rack02/node03/temp
/rack02/node03/cpu00/col_user
/rack02/node03/cpu00/instr
/rack02/node03/cpu00/branch-misses
/rack02/node03/cpu01/col_user
/rack02/node03/cpu01/instr
/rack02/node03/cpu01/branch-misses
114
115
116
117
118
119
120
121
122
123
``` 

Each sensor name is interpreted as a path within the sensor tree. Therefore, the _instr_ and _branch-misses_ sensors
will be placed as leaf nodes in the deepest level of the tree, as children of the respective cpu node they belong to.
Such cpu nodes will be in turn children of the nodes they belong to, and so on.

The generated sensor tree can then be used to navigate the sensor hierarchy, and perform actions such as _retrieving
all sensors belonging to a certain node, to a neighbor of a certain node, or to the rack a certain node belongs to_.
Please refer to the documentation of the _SensorNavigator_ class for more details.

Alessio Netti's avatar
Alessio Netti committed
124
125
126
> NOTE &ensp;&ensp;&ensp;&ensp;&ensp; In the example above, if sensor names were formatted in a different format (e.g.,
rackXX.nodeXX.cpuXX.sensorName), we should have defined a hierarchy string explicitly in order to generate a sensor
tree correctly. Such string would be, in this case "_rack\d{2}.  node\d{2}.  cpu\d{2}._". 
127
128

> NOTE 2 &ensp;&ensp;&ensp; Sensor trees are always built from the names of sensors _as they are published_. Therefore,
129
please make sure to use the _-a_ option in DCDB Pusher appropriately, to build sensor names that express the desired hierarchy.
130
131
132


### The Unit System <a name="unitSystem"></a>
133
Each operator operates on one or more _units_. A unit represents an abstract (or physical) entity in the current system that
134
135
136
is the target of analysis. A unit could be, for example, a rack, a node within a rack, a CPU within a node or an entire HPC system.
Units are identified by three components:

Alessio Netti's avatar
Alessio Netti committed
137
* **Name**: The name of this unit, that corresponds to the entity it represents. For example, _/rack02/node03/_ or _/rack00/node05/cpu01/_ could be unit names. A unit must always correspond to an existing internal node in the current sensor tree;
138
139
140
141
142
* **Input**: The set of sensors that constitute the input for analysis conducted on this unit. The sensors must share a hierarchical relationship with the unit: that is, they can either belong to the node represented by this unit, to its subtree, or to one of its ancestors; 
* **Output**: The set of output sensors that are produced from any analysis conducted on this unit. The output sensors are always directly associated with the node represented by the unit.

Units are a way to define _patterns_ in the sensor tree and retrieve sensors that are associated to each other by a 
hierarchical relationship. See the configuration [section](#instantiatingUnits) for more details on how to create
143
templates in order to define units suitable for operators.
144
145

### Operational Modes <a name="opModes"></a>
146
Operators can operate in two different modes:
147

148
149
* **Streaming**: streaming operators perform data analytics online and autonomously, processing incoming sensor data at regular intervals.
The units of streaming operators are completely resolved and instantiated at configuration time. The type of output of streaming
Alessio Netti's avatar
Alessio Netti committed
150
151
operators is identical to that of _sensors_ in DCDB Pusher, which are pushed to a Collect Agent and finally to the Cassandra datastore,
resulting in a time-series representation;
152
153
154
* **On-demand**: on-demand operators do not perform data analytics autonomously, but only when queried by users. Unlike
for streaming operators, the units of on-demand operators are not instantiated at configuration, but only when a query is performed. When 
such an event occurs, the operator verifies that the queried unit belongs to its _unit domain_, and then instantiates it,
155
resolving its inputs and outputs. Then, the unit is stored in a local cache for future re-use. The outputs of a on-demand
156
operator are exposed through the REST API, and are never pushed to the Cassandra database.
157

Alessio Netti's avatar
Alessio Netti committed
158
Use of streaming operators is advised when a time-series-like output is required, whereas on-demand operators are effective
159
when data is required at specific times and for specific purposes, and when the unit domain's size makes the use of streaming
160
operators unfeasible.
161

162
### Operator Configuration <a name="operatorConfiguration"></a>
163
164
Here we describe how to configure and instantiate operators in Wintermute. The configuration scheme is very similar
to that of _sensor groups_ in DCDB Pusher, and a _global_ configuration block can be defined in each plugin configuration
165
file. The following is instead a list of configuration parameters that are available for the operators themselves:
166
167
168

| Value | Explanation |
|:----- |:----------- |
169
| default | Name of the template that must be used to configure this operator.
Alessio Netti's avatar
Alessio Netti committed
170
| interval | Specifies how often (in milliseconds) the operator will be invoked to perform computations, and thus the sampling interval of its output sensors. Only used for operators in _streaming_ mode.
171
| relaxed | If set to _true_, the units of this operator will be instantiated even if some of the respective input sensors do not exist.
172
| delay | Delay in milliseconds to be applied to the interval of the operator. This parameter can be used to tune how operator pipelines work, ensuring that the next computation stage is started only after the previous one has finished.
173
| unitCacheLimit | Defines the maximum size of the unit cache that is used in the on-demand and job modes. Default is 1000.
174
| minValues |   Minimum number of readings that need to be stored in output sensors before these are pushed as MQTT messages. Only used for operators in _streaming_ mode.
Alessio Netti's avatar
Alessio Netti committed
175
| mqttPart |    Part of the MQTT topic associated to this operator. Only used for the _root_ unit or when the _enforceTopics_ flag is set to true (see this [section](#mqttTopics)).
176
| enforceTopics | If set to _true_, mqttPart will be forcibly pre-pended to the MQTT topics of all output sensors in the operator (see this [section](#mqttTopics)). 
177
| sync | If set to _true_, computation will be performed at time intervals synchronized with sensor readings.
Alessio Netti's avatar
Alessio Netti committed
178
| disabled | If set to _true_, the operator will be instantiated but will not be started and will not be available for queries.
179
180
| duplicate | 	If set to _false_, only one operator object will be instantiated. Such operator will perform computation over all units that are instantiated, at every interval, sequentially. If set to _true_, the operator object will be duplicated such that each copy will have one unit associated to it. This allows to exploit parallelism between units, but results in separate models to avoid race conditions.
| streaming |	If set to _true_, the operator will operate in _streaming_ mode, pushing output sensors regularly. If set to _false_, the operator will instead operate in _on-demand_ mode.
Alessio Netti's avatar
Alessio Netti committed
181
182
183
| unitInput | Block of input sensors that must be used to instantiate the units of this operator. These can both be a list of strings, or fully-qualified _Sensor_ blocks containing specific attributes (see DCDB Pusher Readme).
| unitOutput | Block of output sensors that will be associated to this operator. These must be _Sensor_ blocks containing valid MQTT suffixes. Note that the number of output sensors is usually fixed depending on the type of operator.
| globalOutput | Block for _global_ output sensors that are not associated with a specific unit. If this is defined, all units described by the _unitInput_ and _unitOutput_ blocks will be grouped under a hierarchical _root_ unit that contains the output sensors described here.
184
185
186
| | |

#### Configuration Syntax <a name="configSyntax"></a>
Alessio Netti's avatar
Alessio Netti committed
187
In the following we show a sample configuration block for the _Aggregator_ plugin. For the full version, please refer to the
188
189
190
default configuration file in the _config_ directory:

```
Alessio Netti's avatar
Alessio Netti committed
191
192
193
194
195
global {
	mqttprefix /analytics
}

template_aggregator def1 {
196
197
198
199
200
201
interval	1000
minValues	3
duplicate 	false
streaming	true
}

Alessio Netti's avatar
Alessio Netti committed
202
aggregator avg1 {
203
default     def1
204
mqttPart    /avg1
205

Alessio Netti's avatar
Alessio Netti committed
206
	unitInput {
207
208
209
210
		sensor col_user
		sensor MemFree
	}

Alessio Netti's avatar
Alessio Netti committed
211
	unitOutput {
212
		sensor sum {
Alessio Netti's avatar
Alessio Netti committed
213
			operation 	sum
214
			mqttsuffix  /sum
215
216
217
		}

		sensor max {
Alessio Netti's avatar
Alessio Netti committed
218
			operation	maximum
219
			mqttsuffix  /max
220
221
222
		}

		sensor avg {
Alessio Netti's avatar
Alessio Netti committed
223
			operation	average
224
			mqttsuffix  /avg
225
226
227
228
229
230
		}
	}
}
``` 

The configuration shown above uses a template _def1_ for some configuration parameters, which are then applied to the
231
_avg1_ operator. This operator takes the _col_user_ and _MemFree_ sensors as input (which must be available under this name),
Alessio Netti's avatar
Alessio Netti committed
232
 and outputs _sum_, _max_, and _avg_ sensors. In this configuration, the Unit System and sensor hierarchy are not used, 
233
234
235
 and therefore only one generic unit (called the _root_ unit) will be instantiated.

#### Instantiating Units <a name="instantiatingUnits"></a>
Alessio Netti's avatar
Alessio Netti committed
236
Here we propose once again the configuration discussed above, this time making use of the Unit System to abstract from
237
238
239
the specific system being used and simplify configuration. The adjusted configuration block is the following: 

```
Alessio Netti's avatar
Alessio Netti committed
240
241
242
243
244
global {
	mqttprefix /analytics
}

template_aggregator def1 {
245
246
247
248
249
250
interval	1000
minValues	3
duplicate 	false
streaming	true
}

Alessio Netti's avatar
Alessio Netti committed
251
aggregator avg1 {
252
default     def1
253
mqttPart    /avg1
254

Alessio Netti's avatar
Alessio Netti committed
255
	unitInput {
256
257
258
259
		sensor "<bottomup>col_user"
		sensor "<bottomup 1>MemFree"
	}

Alessio Netti's avatar
Alessio Netti committed
260
	unitOutput {
Alessio Netti's avatar
Alessio Netti committed
261
		sensor "<bottomup, filter cpu00>sum" {
Alessio Netti's avatar
Alessio Netti committed
262
			operation 	sum
263
			mqttsuffix  /sum
264
265
		}

Alessio Netti's avatar
Alessio Netti committed
266
		sensor "<bottomup, filter cpu00>max" {
Alessio Netti's avatar
Alessio Netti committed
267
			operation 	maximum
268
			mqttsuffix  /max
269
270
		}

Alessio Netti's avatar
Alessio Netti committed
271
		sensor "<bottomup, filter cpu00>avg" {
Alessio Netti's avatar
Alessio Netti committed
272
			operation 	average
273
			mqttsuffix  /avg
274
275
276
277
278
279
		}
	}
}
``` 

In each sensor declaration, the _< >_ block is a placeholder that will be replaced with the name of the units that will
280
be associated to the operator, thus resolving the sensor names. Such block allows to navigate the current sensor tree,
281
282
283
284
285
286
287
288
289
290
291
292
293
294
and select nodes that will constitute the units. Its syntax is the following:

```
< bottomup|topdown X, filter Y >SENSORNAME 
``` 

The first section specified the _level_ in the sensor tree at which nodes must be selected. _bottomup X_ and _topdown X_
respectively mean _"search X levels up from the deepest level in the sensor tree"_, and _"search X levels down from the 
topmost level in the sensor tree"_. The _X_ numerical value can be omitted as well.

The second section, on the other hand, allows to search the sensor tree _horizontally_. Within the level specified in the
first section of the configuration block, only the nodes whose names match with the regular expression Y will be selected.
This way, we can navigate the current sensor tree both vertically and horizontally, and easily instantiate units starting 
from nodes in the tree. The set of nodes in the current sensor tree that match with the specified configuration block is
295
defined as the _unit domain_ of the operator.
296
297
298

The configuration algorithm then works in two steps:

299
1. The _output_ block of the operator is read, and its unit domain is determined; this implies that all sensors in the 
300
301
302
output block must share the same _< >_ block, and therefore match the same unit domain;
2. For each unit in the domain, its input sensors are identified. We start from the _unit_ node in the sensor tree, and 
navigate to the corresponding sensor node according to its _< >_ block, which identifies its level in the tree. Each 
303
unit, once its inputs and outputs are defined, is then added to the operator.
304
305

According to the sensor tree built in the previous [section](#sensorTree), the configuration above would result in
Alessio Netti's avatar
Alessio Netti committed
306
an operator with the following set of _flat_ units:
307
308

```
Alessio Netti's avatar
Alessio Netti committed
309
/rack00/node05/cpu00/ {
310
	Inputs {
Alessio Netti's avatar
Alessio Netti committed
311
312
		/rack00/node05/cpu00/col_user
		/rack00/node05/MemFree
313
314
315
	}
	
	Outputs {
Alessio Netti's avatar
Alessio Netti committed
316
317
318
		/rack00/node05/cpu00/sum
		/rack00/node05/cpu00/max
		/rack00/node05/cpu00/avg
319
320
321
	}
}

Alessio Netti's avatar
Alessio Netti committed
322
/rack02/node03/cpu00/ {
323
	Inputs {
Alessio Netti's avatar
Alessio Netti committed
324
325
		/rack02/node03/cpu00/col_user
		/rack02/node03/MemFree
326
327
328
	}
                     	
	Outputs {
Alessio Netti's avatar
Alessio Netti committed
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
		/rack02/node03/cpu00/sum
		/rack02/node03/cpu00/max
		/rack02/node03/cpu00/avg
	}
}
``` 

#### Instantiating Hierarchical Units <a name="instantiatingHUnits"></a>
A second level of aggregation beyond ordinary units can be obtained by defining sensors in the _globalOutput_ block. In
this case, a series of units will be created like in the previous example, and they will be added as _sub-units_ of a 
top-level _root_ unit, which will have as outputs the sensors defined in the _globalOutput_ block. This type of unit
is called a _hierarchical_ unit.
 
 
Computation for a hierarchical unit is always performed starting from the top-level unit. This means that all sub-units
will be processed sequentially in the same computation interval, and that they cannot be split. However, both the
top-level unit and the respective sub-units are exposed to the outside, and their sensors can be queried. Please
refer to the plugins' documentation to see whether hierarchical units are supported or not.

Recalling the previous example, a hierarchical unit can be constructed with the following configuration:

```
global {
	mqttprefix /analytics
}

template_aggregator def1 {
interval	1000
minValues	3
duplicate 	false
streaming	true
}

aggregator avg1 {
default     def1
mqttPart    /avg1

	unitInput {
		sensor "<bottomup>col_user"
		sensor "<bottomup 1>MemFree"
	}

	unitOutput {
		sensor "<bottomup, filter cpu00>sum" {
			operation 	sum
			mqttsuffix  /sum
		}

		sensor "<bottomup, filter cpu00>max" {
			operation 	maximum
			mqttsuffix  /max
		}

		sensor "<bottomup, filter cpu00>avg" {
			operation 	average
			mqttsuffix  /avg
		}
	}
	
	globalOutput {
		sensor globalSum {
			operation 	sum
			mqttsuffix  /globalSum
		}
    }
}
``` 

Note that hierarchical units can only have global outputs, but not global inputs, as they are meant to perform aggregation
of the results obtained on single sub-units. Such a configuration would result in the following unit structure:

```
__root__ {
	Outputs {
		/analytics/avg1/globalSum
	}
	
	Sub-units {
		/rack00/node05/cpu00/ {
			Inputs {
				/rack00/node05/cpu00/col_user
				/rack00/node05/MemFree
			}
			
			Outputs {
				/rack00/node05/cpu00/sum
				/rack00/node05/cpu00/max
				/rack00/node05/cpu00/avg
			}
		}
		
		/rack02/node03/cpu00/ {
			Inputs {
				/rack02/node03/cpu00/col_user
				/rack02/node03/MemFree
			}
								
			Outputs {
				/rack02/node03/cpu00/sum
				/rack02/node03/cpu00/max
				/rack02/node03/cpu00/avg
			}
		}
432
433
434
435
	}
}
``` 

Alessio Netti's avatar
Alessio Netti committed
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
> NOTE&ensp;&ensp;&ensp;&ensp;&ensp;&ensp;&ensp; The _duplicate_ setting has no effect when hierarchical units are used 
(i.e., _globalOutput_ is defined).

> NOTE 2 &ensp;&ensp;&ensp;&ensp;&ensp; As of now the _aggregator_ plugin does not support hierarchical units. The
above is only meant as an example for how hierarchical units can be created in general.

#### Job Operators <a name="joboperators"></a>

_Job Operators_ are a class of operators which act on job-specific data. Such data is structured in _job units_. These
units are _hierarchical_, and work as described previously (see this [section](#instantiatingHUnits)). In particular,
 they are arranged as follows:

* The top unit is associated to the job itself and contains all of the required output sensors (_globalOutput_ block);
* One sub-unit for each node on which the job was running is allocated. Each of these sub-units contains all of the input
sensors that are required at configuration time (_unitInput_ block), along output sensors at the compute node level (_unitOutput_ block).

The computation algorithms driving job operators can then navigate freely this hierarchical unit design according to
their specific needs. Job-level sensors in the top unit do not require Unit System syntax (see this [section](#mqttTopics));
sensors that are defined in sub-units, if supported by the plugin, need however to be at the compute node level in the
current sensor tree, since all of the sub-units are tied to the nodes on which the job was running. This way, 
unit resolution is performed correctly by the Unit System.

Job operators also support the _streaming_ and _on-demand_ modes, which work like the following:

* In **streaming** mode, the job operator will retrieve the list of jobs that were running in the time interval starting
from the last computation to the present; it will then build one job unit for each of them, and subsequently perform computation;
* In **on-demand** mode, users can query a specific job id, for which a job unit is built and computation is performed.

> NOTE &ensp;&ensp;&ensp;&ensp;&ensp; The _duplicate_ setting does not affect job operators.

Alessio Netti's avatar
Alessio Netti committed
466
> NOTE &ensp;&ensp;&ensp;&ensp;&ensp; In order to get units that operate at the _node_ level, the output sensors in the
Alessio Netti's avatar
Alessio Netti committed
467
configuration discussed earlier should have a unit block in the form of < bottomup 1 >.
Alessio Netti's avatar
Alessio Netti committed
468

469
#### MQTT Topics <a name="mqttTopics"></a>
470
The MQTT topics associated to output sensors of a certain operator are constructed in different ways depending
471
472
473
on the unit they belong to:

* **Root unit**: if the output sensors belong to the _root_ unit, that is, they do not belong to any level in the sensor
474
hierarchy and are uniquely defined, the respective topics are constructed like in DCDB Pusher sensors, by concatenating
Alessio Netti's avatar
Alessio Netti committed
475
476
the MQTT prefix, operator part and sensor suffix that are defined. The same happens for sensors defined in the _globalOutput_
block, which are part of the top level in a hierarchical unit, which also corresponds to the _root_ unit;
Alessio Netti's avatar
Alessio Netti committed
477
478
* **Job unit**: if the output sensors belong to a _job_ unit in a job operator (see below), the MQTT topic is constructed
by concatenating the MQTT prefix, the operator part, a job suffix (e.g., /job1334) and finally the sensor suffix;
479
480
* **Any other unit**: if the output sensor belongs to any other unit in the sensor tree, its MQTT topic is constructed
by concatenating the MQTT prefix associated to the unit (which is defined as _the portion of the MQTT topic shared by all sensors
Alessio Netti's avatar
Alessio Netti committed
481
belonging to such unit_) and the sensor suffix.
482

Alessio Netti's avatar
Alessio Netti committed
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
Even for units belonging to the last category, we can enforce arbitrary MQTT topics by enabling the _enforceTopics_ flag.
Using this, the MQTT prefix and operator part are pre-pended to the unit name and sensor suffix. This is enforced also
in the sub-units of hierarchical units (e.g., for job operators). Recalling the example above, this would lead to the following result:

``` 
MQTT Prefix	/analytics
MQTT Part	/avg1
Unit 		/rack02/node03/cpu00/

Without enforceTopics:
	/rack02/node03/cpu00/sum
With enforceTopics:
	/analytics/avg1/rack02/node03/cpu00/sum
``` 

498
#### Pipelining Operators <a name="pipelining"></a>
499

500
501
502
The inputs and outputs of streaming operators can be chained so as to form a processing pipeline. To enable this, users
need to configure operators by enabling the _relaxed_ configuration parameter, and by selecting as input the output sensors
of other operators. This is necessary as the operators are instantiated sequentially at startup, and
503
504
the framework cannot infer the correct order of initialization so as to resolve all dependencies transparently.

505
> NOTE &ensp;&ensp;&ensp;&ensp;&ensp; This feature is not supported when using operators in _on demand_ mode.
506
507

## Rest API <a name="restApi"></a>
508
509
Wintermute provides a REST API that can be used to perform various management operations on the framework. The 
API is functionally identical to that of DCDB Pusher, and is hosted at the same address. All requests that are targeted
510
511
at the data analytics framework must have a resource path starting with _/analytics_.

Micha Mueller's avatar
Micha Mueller committed
512
513
### List of ressources <a name="listOfRessources"></a>

Micha Mueller's avatar
Micha Mueller committed
514
Prefix `/analytics` left out!
Micha Mueller's avatar
Micha Mueller committed
515
516
517

<table>
  <tr>
Micha Mueller's avatar
Micha Mueller committed
518
    <td colspan="2"><b>Ressource</b></td>
Micha Mueller's avatar
Micha Mueller committed
519
520
521
522
523
524
525
526
527
528
529
530
    <td colspan="2">Description</td>
  </tr>
  <tr>
  	<td>Query</td>
  	<td>Value</td>
  	<td>Opt.</td>
  	<td>Description</td>
  </tr>
</table>

<table>
  <tr>
Micha Mueller's avatar
Micha Mueller committed
531
    <td colspan="2"><b>GET /help</b></td>
Micha Mueller's avatar
Micha Mueller committed
532
533
534
535
536
537
538
539
540
    <td colspan="2">Return a cheatsheet of possible analytics REST API endpoints.</td>
  </tr>
  <tr>
  	<td colspan="4">No queries.</td>
  </tr>
</table>

<table>
  <tr>
Micha Mueller's avatar
Micha Mueller committed
541
    <td colspan="2"><b>GET /plugins</b></td>
Micha Mueller's avatar
Micha Mueller committed
542
543
544
545
546
    <td colspan="2">List all currently loaded data analytic plugins.</td>
  </tr>
  <tr>
  	<td>json</td>
  	<td>"true"</td>
Micha Mueller's avatar
Micha Mueller committed
547
  	<td>Yes</td>
Micha Mueller's avatar
Micha Mueller committed
548
549
550
551
552
553
  	<td>Format response as json.</td>
  </tr>
</table>

<table>
  <tr>
Micha Mueller's avatar
Micha Mueller committed
554
    <td colspan="2"><b>GET /sensors</b></td>
Micha Mueller's avatar
Micha Mueller committed
555
556
557
558
    <td colspan="2">List all sensors of a specific plugin.</td>
  </tr>
  <tr>
  	<td>plugin</td>
559
  	<td>All operator plugin names.</td>
Micha Mueller's avatar
Micha Mueller committed
560
  	<td>No</td>
Micha Mueller's avatar
Micha Mueller committed
561
562
563
  	<td>Specify the plugin.</td>
  </tr>
  <tr>
564
565
  	<td>operator</td>
  	<td>All operators of a plugin.</td>
Micha Mueller's avatar
Micha Mueller committed
566
  	<td>Yes</td>
567
  	<td>Restrict sensor list to an operator.</td>
Micha Mueller's avatar
Micha Mueller committed
568
569
570
571
  </tr>
  <tr>
  	<td>json</td>
  	<td>"true"</td>
Micha Mueller's avatar
Micha Mueller committed
572
  	<td>Yes</td>
Micha Mueller's avatar
Micha Mueller committed
573
574
575
576
577
578
  	<td>Format response as json.</td>
  </tr>
</table>

<table>
  <tr>
Micha Mueller's avatar
Micha Mueller committed
579
    <td colspan="2"><b>GET /units</b></td>
Micha Mueller's avatar
Micha Mueller committed
580
581
582
583
    <td colspan="2">List all units of a specific plugin.</td>
  </tr>
  <tr>
  	<td>plugin</td>
584
  	<td>All operator plugin names.</td>
Micha Mueller's avatar
Micha Mueller committed
585
  	<td>No</td>
Micha Mueller's avatar
Micha Mueller committed
586
587
588
  	<td>Specify the plugin.</td>
  </tr>
  <tr>
589
590
  	<td>operator</td>
  	<td>All operators of a plugin.</td>
Micha Mueller's avatar
Micha Mueller committed
591
  	<td>Yes</td>
592
  	<td>Restrict unit list to an operator.</td>
Micha Mueller's avatar
Micha Mueller committed
593
594
595
596
  </tr>
  <tr>
  	<td>json</td>
  	<td>"true"</td>
Micha Mueller's avatar
Micha Mueller committed
597
  	<td>Yes</td>
Micha Mueller's avatar
Micha Mueller committed
598
599
600
601
602
603
  	<td>Format response as json.</td>
  </tr>
</table>

<table>
  <tr>
604
605
    <td colspan="2"><b>GET /operators</b></td>
    <td colspan="2">List all operators of a specific plugin.</td>
Micha Mueller's avatar
Micha Mueller committed
606
607
608
  </tr>
  <tr>
  	<td>plugin</td>
609
  	<td>All operator plugin names.</td>
Micha Mueller's avatar
Micha Mueller committed
610
  	<td>No</td>
Micha Mueller's avatar
Micha Mueller committed
611
612
613
614
615
  	<td>Specify the plugin.</td>
  </tr>
  <tr>
  	<td>json</td>
  	<td>"true"</td>
Micha Mueller's avatar
Micha Mueller committed
616
  	<td>Yes</td>
Micha Mueller's avatar
Micha Mueller committed
617
618
619
620
621
622
  	<td>Format response as json.</td>
  </tr>
</table>

<table>
  <tr>
Micha Mueller's avatar
Micha Mueller committed
623
    <td colspan="2"><b>PUT /start</b></td>
624
    <td colspan="2">Start all or only a specific plugin. Or only start a specific streaming operator within a specific plugin.</td>
Micha Mueller's avatar
Micha Mueller committed
625
626
627
628
  </tr>
  <tr>
  	<td>plugin</td>
  	<td>All plugin names.</td>
Micha Mueller's avatar
Micha Mueller committed
629
  	<td>Yes</td>
Micha Mueller's avatar
Micha Mueller committed
630
631
632
  	<td>Specify the plugin.</td>
  </tr>
  <tr>
633
634
  	<td>operator</td>
  	<td>All operator names of a plugin.</td>
Micha Mueller's avatar
Micha Mueller committed
635
  	<td>Yes</td>
636
  	<td>Only start the specified operator. Requires a plugin to be specified. Limited to streaming operators.</td>
Micha Mueller's avatar
Micha Mueller committed
637
638
639
640
641
  </tr>
</table>

<table>
  <tr>
Micha Mueller's avatar
Micha Mueller committed
642
    <td colspan="2"><b>PUT /stop</b></td>
643
    <td colspan="2">Stop all or only a specific plugin. Or only stop a specific streaming operator within a specific plugin.</td>
Micha Mueller's avatar
Micha Mueller committed
644
645
646
647
  </tr>
  <tr>
  	<td>plugin</td>
  	<td>All plugin names.</td>
Micha Mueller's avatar
Micha Mueller committed
648
  	<td>Yes</td>
Micha Mueller's avatar
Micha Mueller committed
649
650
651
  	<td>Specify the plugin.</td>
  </tr>
  <tr>
652
653
  	<td>operator</td>
  	<td>All operator names of a plugin.</td>
Micha Mueller's avatar
Micha Mueller committed
654
  	<td>Yes</td>
655
  	<td>Only stop the specified operator. Requires a plugin to be specified. Limited to streaming operators.</td>
Micha Mueller's avatar
Micha Mueller committed
656
657
658
659
660
  </tr>
</table>

<table>
  <tr>
Micha Mueller's avatar
Micha Mueller committed
661
    <td colspan="2"><b>PUT /reload</b></td>
662
    <td colspan="2">Reload configuration and initialization of all or only a specific operator plugin.</td>
Micha Mueller's avatar
Micha Mueller committed
663
664
665
666
  </tr>
  <tr>
  	<td>plugin</td>
  	<td>All plugin names.</td>
Micha Mueller's avatar
Micha Mueller committed
667
  	<td>Yes</td>
Micha Mueller's avatar
Micha Mueller committed
668
669
670
671
672
673
  	<td>Reload only the specified plugin.</td>
  </tr>
</table>

<table>
  <tr>
Micha Mueller's avatar
Micha Mueller committed
674
    <td colspan="2"><b>PUT /compute</b></td>
675
    <td colspan="2">Query the given operator for a certain input unit. Intended for "on-demand" operators, but works with "streaming" operators as well.</td>
Micha Mueller's avatar
Micha Mueller committed
676
677
678
679
  </tr>
  <tr>
  	<td>plugin</td>
  	<td>All plugin names.</td>
Micha Mueller's avatar
Micha Mueller committed
680
  	<td>No</td>
Micha Mueller's avatar
Micha Mueller committed
681
682
683
  	<td>Specify the plugin.</td>
  </tr>
  <tr>
684
685
  	<td>operator</td>
  	<td>All operator names of a plugin.</td>
Micha Mueller's avatar
Micha Mueller committed
686
  	<td>No</td>
687
  	<td>Specify the operator within the plugin.</td>
Micha Mueller's avatar
Micha Mueller committed
688
689
690
691
  </tr>
  <tr>
  	<td>unit</td>
  	<td>All units of a plugin.</td>
Micha Mueller's avatar
Micha Mueller committed
692
  	<td>Yes</td>
Micha Mueller's avatar
Micha Mueller committed
693
694
695
696
697
  	<td>Select the target unit. Defaults to the root unit if not specified.</td>
  </tr>
  <tr>
  	<td>json</td>
  	<td>"true"</td>
Micha Mueller's avatar
Micha Mueller committed
698
  	<td>Yes</td>
Micha Mueller's avatar
Micha Mueller committed
699
700
701
702
703
704
  	<td>Format response as json.</td>
  </tr>
</table>

<table>
  <tr>
705
706
    <td colspan="2"><b>PUT /operator</b></td>
    <td colspan="2">Perform a custom REST PUT action defined at operator level. See operator plugin documenation for such actions.</td>
Micha Mueller's avatar
Micha Mueller committed
707
708
709
710
  </tr>
  <tr>
  	<td>plugin</td>
  	<td>All plugin names.</td>
Micha Mueller's avatar
Micha Mueller committed
711
  	<td>No</td>
Micha Mueller's avatar
Micha Mueller committed
712
713
714
715
  	<td>Specify the plugin.</td>
  </tr>
  <tr>
  	<td>action</td>
716
  	<td>See operator plugin documentation.</td>
Micha Mueller's avatar
Micha Mueller committed
717
  	<td>No</td>
Micha Mueller's avatar
Micha Mueller committed
718
719
720
  	<td>Select custom action.</td>
  </tr>
  <tr>
721
722
  	<td>operator</td>
  	<td>All operators of a plugin.</td>
Micha Mueller's avatar
Micha Mueller committed
723
  	<td>Yes</td>
724
  	<td>Specify the operator within the plugin.</td>
Micha Mueller's avatar
Micha Mueller committed
725
726
727
728
729
730
  </tr>
  <tr>
  	<td colspan="4">Custom action may require or allow for more queries!</td>
  </tr>
</table>

Alessio Netti's avatar
Alessio Netti committed
731
> NOTE&ensp;&ensp;&ensp;&ensp;&ensp;&ensp;&ensp; Opt. = Optional
Micha Mueller's avatar
Micha Mueller committed
732

Alessio Netti's avatar
Alessio Netti committed
733
> NOTE 2 &ensp;&ensp;&ensp;&ensp;&ensp; The value of operator output sensors can be retrieved with the _compute_ resource, or with the _/average_ resource defined in the DCDB Pusher REST API.
Micha Mueller's avatar
Micha Mueller committed
734

735
> NOTE 3 &ensp;&ensp;&ensp;&ensp;&ensp; Developers can integrate their custom REST API resources that are plugin-specific, by implementing the _REST_ method in _OperatorTemplate_. To know more about plugin-specific resources, please refer to the respective documentation. 
736

Alessio Netti's avatar
Alessio Netti committed
737
738
> NOTE 4 &ensp;&ensp;&ensp;&ensp;&ensp; When operators employ a _root_ unit (e.g., when the Unit System is not used or a _globalOutput_ block is defined in regular operators) the _unit_ query can be omitted when performing a _/compute_ action.

739
740
741
### Rest Examples <a name="restExamples"></a>
In the following are some examples of REST requests over HTTPS:

Alessio Netti's avatar
Alessio Netti committed
742
* Listing the units associated to the _avgoperator1_ operator in the _aggregator_ plugin:
743
```bash
Alessio Netti's avatar
Alessio Netti committed
744
GET https://localhost:8000/analytics/units?plugin=aggregator;operator=avgOperator1
745
```
Alessio Netti's avatar
Alessio Netti committed
746
* Listing the output sensors associated to all operators in the _aggregator_ plugin:
747
```bash
Alessio Netti's avatar
Alessio Netti committed
748
GET https://localhost:8000/analytics/sensors?plugin=aggregator;
749
```
Alessio Netti's avatar
Alessio Netti committed
750
* Reloading the _aggregator_ plugin:
751
```bash
Alessio Netti's avatar
Alessio Netti committed
752
PUT https://localhost:8000/analytics/reload?plugin=aggregator
753
```
Alessio Netti's avatar
Alessio Netti committed
754
* Stopping the _avgOperator1_ operator in the _aggregator_ plugin:
755
```bash
Alessio Netti's avatar
Alessio Netti committed
756
PUT https://localhost:8000/analytics/stop?plugin=aggregator;operator=avgOperator1
757
```
Alessio Netti's avatar
Alessio Netti committed
758
* Performing a query for unit _/node00/cpu03/_ to the _avgOperator1_ operator in the _aggregator_ plugin:
759
```bash
Alessio Netti's avatar
Alessio Netti committed
760
PUT https://localhost:8000/analytics/compute?plugin=aggregator;operator=avgOperator1;unit=/node00/cpu03/
761
762
```

763
764
> NOTE &ensp;&ensp;&ensp;&ensp;&ensp; The analytics RestAPI requires authentication credentials as well.

765
# Plugins <a name="plugins"></a>
766
Here we describe available plugins in Wintermute, and how to configure them.
767

Alessio Netti's avatar
Alessio Netti committed
768
769
## Aggregator Plugin <a name="averagePlugin"></a>
The _Aggregator_ plugin implements simple data processing algorithms. Specifically, this plugin allows to perform basic
770
aggregation operations over a set of input sensors, which are then written as output.
Alessio Netti's avatar
Alessio Netti committed
771
The configuration parameters specific to the _Aggregator_ plugin are the following:
772
773
774
775

| Value | Explanation |
|:----- |:----------- |
| window | Length in milliseconds of the time window that is used to retrieve recent readings for the input sensors, starting from the latest one.
776

777
Additionally, output sensors in operators of the Aggregator plugin accept the following parameters:
778
779
780

| Value | Explanation |
|:----- |:----------- |
781
| operation | Operation to be performed over the input sensors. Can be "sum", "average", "maximum", "minimum", "std", "percentiles" or "observations".
782
| percentile |  Specific percentile to be computed when using the "percentiles" operation. Can be an integer in the (0,100) range.
783
784
| relative | If true, the _relative_ query mode will be used. Otherwise the _absolute_ mode is used.

785

Alessio Netti's avatar
Alessio Netti committed
786
787
788
789
## Job Aggregator Plugin <a name="jobaveragePlugin"></a>

The _Job Aggregator_ plugin offers the same functionality as the _Aggregator_ plugin, but on a per-job basis. As such,
it performs aggregation of the specified input sensors across all nodes on which each job is running. Please refer
790
to the corresponding [section](#joboperators) for more details.
Alessio Netti's avatar
Alessio Netti committed
791

792
793
> NOTE &ensp;&ensp;&ensp;&ensp;&ensp; The Job Aggregator plugin does not support the _relative_ option supported by the Aggregator plugin, and always uses the _absolute_ sensor query mode.

794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
## Regressor Plugin <a name="regressorPlugin"></a>

The _Regressor_ plugin is able to perform regression of sensors, by using _random forest_ machine learning predictors. The algorithmic backend is provided by the OpenCV library.
For each input sensor in a certain unit, statistical features from recent values of its time series are computed. These are the average, standard deviation, sum of differences, 25th quantile and 75th quantile. These statistical features are then combined together in a single _feature vector_. 

In order to operate correctly, the models used by the regressor plugin need to be trained: this procedure is performed automatically online when using the _streaming_ mode, and can be triggered arbitrarily over the REST API. In _on demand_ mode, automatic training cannot be performed, and as such a pre-trained model must be loaded from a file.
The following are the configuration parameters available for the _Regressor_ plugin:

| Value | Explanation |
|:----- |:----------- |
| window | Length in milliseconds of the time window that is used to retrieve recent readings for the input sensors, starting from the latest one.
| trainingSamples | Number of samples necessary to perform training of the current model.
| targetDistance | Temporal distance (in terms of lags) of the sample that is to be predicted.
| inputPath | Path of a file from which a pre-trained random forest model must be loaded.
| outputPath | Path of a file to which the random forest model trained at runtime must be saved.
809
| getImportances | If true, the random forest will also compute feature importance values when trained, which are printed.
810
811
812

> NOTE &ensp;&ensp;&ensp;&ensp;&ensp; When the _duplicate_ option is enabled, the _outputPath_ field is ignored to avoid file collisions from multiple regressors.

Alessio Netti's avatar
Alessio Netti committed
813
> NOTE 2 &ensp;&ensp;&ensp;&ensp;&ensp; When loading the model from a file and _getImportances_ is set to true, importance values will be printed only if the original model had this feature enabled upon training.
814

815
Additionally, input sensors in operators of the Regressor plugin accept the following parameter:
816
817
818

| Value | Explanation |
|:----- |:----------- |
819
| target | Boolean value. If true, this sensor represents the target for regression. Every unit in operators of the regressor plugin must have excatly one target sensor.
820
821
822
823
824
825

Finally, the Regressor plugin supports the following additional REST API action:

| Action | Explanation |
|:----- |:----------- |
| train | Triggers a new training phase for the random forest model. Feature vectors are temporarily collected in-memory until _trainingSamples_ vectors are obtained. Until this moment, the old random forest model is still used to perform prediction.
826
| importances | Returns the sorted importance values for the input features, together with the respective labels, if available.
827

828
## Tester Plugin <a name="testerPlugin"></a>
Alessio Netti's avatar
Alessio Netti committed
829
The _Tester_ plugin can be used to test the functionality and performance of the query engine, as well as of the Unit System. It will perform a specified number of queries over the set of input sensors for each unit, and then output as a sensor the total number of retrieved readings. The following are the configuration parameters for operators in the _Tester_ plugin:
830
831
832
833
834
835
836
837

| Value | Explanation |
|:----- |:----------- |
| window | Length in milliseconds of the time window that is used to retrieve recent readings for the input sensors, starting from the latest one.
| queries | Number of queries to be performed at each computation interval. If more than the number of input sensors per unit, these will be looped over multiple times.
| relative | If true, the _relative_ query mode will be used. Otherwise the _absolute_ mode is used.

# Sink Plugins <a name="sinkplugins"></a>
838
Here we describe available plugins in Wintermute that are devoted to the output of sensor data (_sinks_), and that do not perform any analysis.
839

840
841
## File Sink Plugin <a name="filesinkPlugin"></a>
The _File Sink_ plugin allows to write the output of any other sensor to the local file system. As such, it does not produce output sensors by itself, and only reads from input sensors.
Alessio Netti's avatar
Alessio Netti committed
842
The input sensors can either be fully qualified, or can be described through the Unit System. In this case, multiple input sensors can be generated automatically, and the respective output paths need to be adjusted by enabling the _autoName_ attribute described below, to prevent multiple sensors from being written to the same file. The file sink operators (named sinks) support the following attributes:
843
844
845
846
847
848
849
850
851
852
853

| Value | Explanation |
|:----- |:----------- |
| autoName | Boolean. If false, the output paths associated to sensors are interpreted literally, and a file is opened for them. If true, only the part in the path describing the current directory is used, while the file itself is named accordingly to the MQTT topic of the specific sensor.

Additionally, input sensors in sinks accept the following parameters:

| Value | Explanation |
|:----- |:----------- |
| path | The path to which the sensors's readings should be written. It is interpreted as described above for the _autoName_ attribute.

854
## Writing DCDB Analytics Plugins <a name="writingPlugins"></a>
855
Generating a DCDB Wintermute plugin requires implementing a _Operator_ and _Configurator_ class which contain all logic
856
tied to the specific plugin. Such classes should be derived from _OperatorTemplate_ and _OperatorConfiguratorTemplate_
857
respectively, which contain all plugin-agnostic configuration and runtime features. Please refer to the documentation 
Alessio Netti's avatar
Alessio Netti committed
858
of the _Aggregator_ plugin for an overview of how a basic plugin can be implemented.