# DCDB Pusher

### Table of contents
1. [Introduction](#introduction)
2. [dcdbpusher](#dcdbpusher)
    1. [Global Configuration](#globalConfiguration)
    2. [Rest API](#restApi)
        1. [List of resources](#listOfRessources)
        2. [Examples](#restExamples)
    3. [MQTT topic](#mqttTopic)
3. [Plugins](#plugins)
    1. [IPMI](#ipmi)
    2. [Perf-event](#perf)
        1. [type and config](#perfTypeConfig)
        2. [Footnotes](#perfFootnotes)
    3. [SNMP](#snmp)
    4. [SysFS](#sysfs)
    5. [PDU](#pdu)
    6. [BACnet](#bacnet)
    7. [OPA](#opa)
        1. [counterData](#opaCounterData)
    8. [ProcFS](#procfs)
    9. [Caliper](#caliper)
    10. [Metadata Management](#metadataManagement)
    11. [Writing own plugins](#writingOwnPlugins)

## Introduction <a name="introduction"></a>
DCDB (DataCenter DataBase) is a database that collects various (sensor) values of a datacenter for further analysis.
Harvesting the data is the task of dcdbpusher.

# dcdbpusher <a name="dcdbpusher"></a>

This is a general MQTT pusher which sends values of various sensors to the DCDB database.
It ships with plugins for BACnet, IPMI, PDU (a proprietary Power Delivery Unit, but the plugin could also be used as an XML plugin), perfcounter, SNMP and sysFS.

Build it by simply running
```bash
make
```
or alternatively use
```bash
make debug
```
within the `dcdbpusher` directory to build a version which will print additional debug-information during runtime.

The logic for the various sensors is encapsulated in plugins (shared dynamic libraries; the makefile takes care of compiling them for you). The dcdbpusher dynamically opens the libraries if they are specified in the [global configuration](#globalConfiguration) file. Conversely, if a sensor functionality (e.g. sysFS) is not specified, the corresponding shared library (libdcdbplugin_sysfs.so) does not have to be present.

You can run dcdbpusher by executing
```bash
./dcdbpusher path/to/configfile/
```
or run
```bash
./dcdbpusher -h
```
to print the help-section of dcdbpusher.

Dcdbpusher will check the given file-path for the global configuration file which has to be named `dcdbpusher.conf`.

### Global Configuration  <a name="globalConfiguration"></a>

The global configuration specifies various settings for dcdbpusher in general, e.g. which plugins should be loaded etc.
Please have a look at the provided `config/dcdbpusher.conf` example to get familiar with the file scheme. The example also forms a good starting point for writing a custom `dcdbpusher.conf`. The different sections and values are explained in the following table:

| Value | Explanation |
|:----- |:----------- |
| global | Wrapper structure for the global values.
| mqttBroker | Define address and port of the MQTT-broker which collects the messages (sensor values) sent by dcdbpusher.
| mqttprefix | To not rewrite a full MQTT-topic for every sensor one can specify here a consistent prefix.
| sensorpattern | pattern used to perform automatic sensor name publishing. See the corresponding [section](#autopublish) for more information.
| threads | Specify how many threads should be created to handle the sensors asynchronously. Default value of threads is 1. Note that the MQTTPusher always starts an extra thread, so the actual number of started threads is always one more than defined here. Specifying too few threads can result in a delay for some sensors until they are read.
| maxMsgNum | To avoid publishing too many MQTT messages at once you can define here a maximum count of values that are published in one turn. After reaching this limit the MQTTPusher will be forced to sleep for a short time before continuing.
|maxInflightMsgNum|Maximum number of messages that can be "inflight". This is a MQTT term and should match the broker's setting. Set to 0 for unlimited.
|maxQueuedMsgNum|Maximum number of MQTT messages (including "inflight") that should be queued. This is to limit the amount of memory that is used for buffering. Set to 0 for unlimited.
| verbosity | Level of detail in the logfile (dcdb.log). Set to a value between 5 (all log-messages, default) and 0 (only fatal messages). NOTE: level of verbosity for the command-line log can be set via the -v flag independently when invoking dcdbpusher.
| daemonize | Set to 'true' if dcdbpusher should run detached as daemon. Default is false.
| tempdir | One can specify a writeable directory where dcdbpusher can write its temporary and logging files to. Default is the current (`./`) directory.
| cacheInterval | Define a time interval in seconds. The last sensor readings within this time interval will be kept. This value can be overwritten by plugins.
| | |
| restAPI | Bundles all values related to the RestAPI. See the corresponding [section](#restApi) for more information on supported functionality.
| address | Define (IP-)address and port where the REST API server should run on.
| certificate | Provide the (path and) file which the HTTPS server should use as certificate.
| privateKey | Provide the (path and) file which should be used as corresponding private key for the certificate. If private key and certificate are stored in the same file one should nevertheless provide the path to the cert-file here again.
| dhFile | Provide the (path and) file where Diffie-Hellman parameters for the key exchange are stored.
| authkey | This struct is used to define authentication key tokens for the REST API. Within the struct, define which operations over the REST API are allowed for the token (e.g. PUTReq or GETReq). Each token must be unique.
| | |
| plugins | In this section one can specify the plugins which should be used.
| plugin _name_ | The plugin name is used to build the corresponding lib-name (e.g. sysfs --> libdcdbplugin_sysfs.1.0)
| path | Specify the path where the plugin (the shared library) is located. If left empty, dcdbpusher will look in the default lib-directories (/usr/lib and friends) for the plugin-file.
| config | One can specify a separate config-file (including path to it) for the plugin to use. If not specified, dcdbpusher will look up pluginName.conf (e.g. sysfs.conf) in the same directory where dcdbpusher.conf is located.
| | |

Formats of the other sensor-specific config-files are explained in the corresponding [subsections](#ipmi). Example configuration-files can be found in the `config/` directory.
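
As a rough orientation, the snippet below writes a minimal `dcdbpusher.conf` into a scratch directory and prints the matching start command. It is only a sketch: the nesting of the `global`, `restAPI` and `plugins` sections and the `key value` syntax are assumptions inferred from the table above and from the plugin configuration examples. The broker address `localhost:1883`, the prefix `/mysystem` and the certificate paths are placeholders, and the `authkey` struct is left out because its syntax is not described here. The shipped `config/dcdbpusher.conf` remains the authoritative reference.

```bash
# Hypothetical scratch directory; any directory readable by dcdbpusher would do.
conf_dir="$(mktemp -d)"

# Quoted heredoc delimiter: the content is written verbatim, without shell expansion.
# Section layout and values are assumptions, see the note above.
cat > "$conf_dir/dcdbpusher.conf" <<'EOF'
global {
	mqttBroker	localhost:1883		;placeholder broker address and port
	mqttprefix	/mysystem
	threads		2
	verbosity	5
	daemonize	false
	tempdir		./
	cacheInterval	120
}

restAPI {
	address		127.0.0.1:8000
	certificate	/path/to/cert.pem	;placeholder
	privateKey	/path/to/key.pem	;placeholder
	dhFile		/path/to/dh.pem		;placeholder
}

plugins {
	plugin sysfs {
		path	./			;assumes the plugin library lies in the current directory
	}
}
EOF

echo "Wrote $conf_dir/dcdbpusher.conf, start with: ./dcdbpusher $conf_dir/"
```

Since no `config` value is given for the sysfs plugin, dcdbpusher would look for `sysfs.conf` next to `dcdbpusher.conf`, as described in the table.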


## REST API <a name="restApi"></a>

Dcdbpusher runs an HTTPS server which allows it to be controlled over a RESTful API. By default the API is hosted at port 8000 on 127.0.0.1, but the address can be changed in the [`dcdbpusher.conf`](#globalConfiguration).

An HTTPS request to dcdbpusher should have the following format: `[GET|PUT] host:port[resource]?[queries]`.
Tables with the allowed resources, sorted by REST method, can be found below. A query consists of a key-value pair of the format `key=value`. Multiple queries are separated by semicolons (';'). For all requests (except /help) basic authentication credentials must be provided.

### List of resources <a name="listOfRessources"></a>

<table>
  <tr>
    <td colspan="2"><b>Resource</b></td>
    <td colspan="2">Description</td>
  </tr>
  <tr>
    <td>Query</td>
    <td>Value</td>
    <td>Opt.</td>
    <td>Description</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>GET /help</b></td>
    <td colspan="2">Return a cheatsheet of possible REST API endpoints.</td>
  </tr>
  <tr>
    <td colspan="4">No queries.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>GET /plugins</b></td>
    <td colspan="2">List all loaded dcdbpusher plugins.</td>
  </tr>
  <tr>
    <td>json</td>
    <td>"true"</td>
    <td>Yes</td>
    <td>Format response as json.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>GET /sensors</b></td>
    <td colspan="2">List all sensors of a specific plugin.</td>
  </tr>
  <tr>
    <td>plugin</td>
    <td>All plugin names.</td>
    <td>No</td>
    <td>Specify the plugin.</td>
  </tr>
  <tr>
    <td>json</td>
    <td>"true"</td>
    <td>Yes</td>
    <td>Format response as json.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>GET /average</b></td>
    <td colspan="2">Get the average of the last readings of a sensor. Also allows access to analytics sensors.</td>
  </tr>
  <tr>
    <td>plugin</td>
    <td>All plugin names.</td>
    <td>No</td>
    <td>Specify the plugin.</td>
  </tr>
  <tr>
    <td>sensor</td>
    <td>All sensor names of the plugin or the operator manager.</td>
    <td>No</td>
    <td>Specify the sensor within the plugin.</td>
  </tr>
  <tr>
    <td>interval</td>
    <td>Number of seconds.</td>
    <td>Yes</td>
    <td>Use only readings more recent than (now - interval) for average calculation. Defaults to zero, i.e. all cached sensor readings are included in average calculation.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>PUT /quit</b></td>
    <td colspan="2">Exits the Pusher with a user-specified return code.</td>
  </tr>
  <tr>
  	<td>code</td>
  	<td>Return code.</td>
  	<td>Yes</td>
  	<td>Return code to be used when exiting.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>PUT /load</b></td>
    <td colspan="2">Load and initialize a new plugin but do not start it. Use the /start request to kick off the plugin's data collection.</td>
  </tr>
  <tr>
  	<td>plugin</td>
  	<td>Plugin name.</td>
  	<td>No</td>
  	<td>Name of the new plugin. Is used to build the shared library file name which holds the plugin. Shared lib file name is of the form libdcdbplugin_PLUGINNAME.so (or .dylib for Apple).</td>
  </tr>
  <tr>
  	<td>path</td>
  	<td>A file path.</td>
  	<td>Yes</td>
  	<td>Path to where the shared library for the plugin is located. If not specified the default library directories (/usr/lib and friends) are searched.</td>
  </tr>
  <tr>
  	<td>config</td>
  	<td>A file path including file name.</td>
  	<td>Yes</td>
  	<td>Path and name of the plugin configuration file. If not specified we will search for "./PLUGINNAME.conf".</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>PUT /unload</b></td>
    <td colspan="2">Unload a plugin, removing it completely from dcdbpusher. To use the plugin again one has to /load it first.</td>
  </tr>
  <tr>
  	<td>plugin</td>
  	<td>All plugin names.</td>
  	<td>No</td>
  	<td>Specify the plugin.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>PUT /reload</b></td>
    <td colspan="2">Reload a plugin's configuration (includes fresh creation of a plugin's sensors and a plugin restart).</td>
  </tr>
  <tr>
    <td>plugin</td>
    <td>All plugin names.</td>
    <td>No</td>
    <td>Specify the plugin.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>POST /start</b></td>
    <td colspan="2">Start a plugin, i.e. its sensors start polling.</td>
  </tr>
  <tr>
    <td>plugin</td>
    <td>All plugin names.</td>
    <td>No</td>
    <td>Specify the plugin.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>POST /stop</b></td>
    <td colspan="2">Stop a plugin, i.e. its sensors stop polling.</td>
  </tr>
  <tr>
    <td>plugin</td>
    <td>All plugin names.</td>
    <td>No</td>
    <td>Specify the plugin.</td>
  </tr>
</table>

> NOTE &ensp;&ensp;&ensp;&ensp;&ensp; Opt. = Optional

### Examples <a name="restExamples"></a>

Two examples for HTTPS requests (authentication credentials not shown):

```bash
GET https://localhost:8000/average?plugin=sysfs;sensor=freq1;interval=15
```
```bash
PUT https://localhost:8000/stop?plugin=bacnet
```
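
Because queries are separated by semicolons, a request URL has to be quoted when it is used in a shell; otherwise the shell treats each ';' as a command separator. The helper below is a small sketch that assembles a URL in the format described above. The function name `rest_url` is made up for this example, and `curl -u` is shown only as one possible way to pass basic authentication credentials.

```bash
# Build a request URL from a base address, a resource and key=value queries.
# Queries are joined with ';'. The function body runs in a subshell, so its
# variables do not leak into the calling shell.
rest_url() (
    base="$1"
    resource="$2"
    shift 2
    query=""
    for kv in "$@"; do
        query="${query:+$query;}$kv"
    done
    printf '%s%s%s\n' "$base" "$resource" "${query:+?$query}"
)

url="$(rest_url https://localhost:8000 /average plugin=sysfs sensor=freq1 interval=15)"
echo "$url"

# The request could then be sent with, for example (USER and PASSWORD are placeholders):
#   curl -u USER:PASSWORD "$url"
```

If the server uses a self-signed certificate, curl additionally needs to be told how to verify it (e.g. via `--cacert`).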

## MQTT topic <a name="mqttTopic"></a>

For communication between the different DCDB-components (database, dcdbpusher) the [MQTT protocol](https://mqtt.org/) is used. In order to identify each sensor, each has to have a unique MQTT topic assigned. The topic for a sensor is built by appending up to 4 parts:
1. mqttprefix    (e.g. /mysystem)
2. mqttpart of entity (if supported by plugin, e.g. /host0)
3. mqttpart of group    (e.g. /eth0)
4. mqttsuffix    (e.g. /xmitdata)

Then the topic for the sensor is /mysystem/host0/eth0/xmitdata.
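
The composition can be sketched in a few lines of shell. The function name `mqtt_topic` is hypothetical, and the sketch assumes that the parts are simply concatenated in the order listed above, with unused parts left empty.

```bash
# Concatenate the given topic parts in order. Empty parts (e.g. for a plugin
# without entities) contribute nothing to the resulting topic.
mqtt_topic() (
    topic=""
    for part in "$@"; do
        topic="$topic$part"
    done
    printf '%s\n' "$topic"
)

mqtt_topic /mysystem /host0 /eth0 /xmitdata   # prefix, entity, group, suffix
mqtt_topic /mysystem "" /eth0 /xmitdata       # same sensor without an entity part
```

The first call reproduces the topic from the example above; the second shows that a missing entity part simply shortens the topic.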

Additionally, sensors can be published automatically to the Storage Backend under their specified MQTT topics by using the _auto-publish_ feature. This feature is enabled via the _-a_ switch of DCDB Pusher. This way, the metadata tables in the Storage Backend are populated with the information of all instantiated sensors, and these become visible for queries.



# Plugins <a name ="plugins"></a>

The core of dcdbpusher is responsible for collecting all the values read by the sensors and sending them to the database. However, the main functionality of the sensors comes from the various plugins. Every plugin corresponds to a special sensor functionality.
All plugins share some general principles regarding the sensor structure and configuration. Those principles should also be obeyed when [writing own plugins](#writingOwnPlugins).
1. There are three hierarchical levels (from bottom up):
    1. Sensors
    2. Groups
    3. Entities (optional)
2. Sensors never exist on their own. Every sensor belongs to a group.
3. Multiple groups may or may not be aggregated by an entity. Entities can be optionally used by the plugin developer to aggregate groups which belong together, e.g. because they all query the same host.
4. Every hierarchical level is associated with some attributes. In the following are some hints on how one (when developing own plugins) should decide which attributes are associated with which level. Also for every level the common base attributes are listed (with explanation), which are specified independently of a plugin:
    1. Entities (if present) hold all attributes which are required to query the represented entity or which all of its associated groups have in common. Common entity attributes:
        * __default__     (One can define the name of a template group (see below) whose values and groups should be used as default)
        * Other entity attributes could be: mqttPart, protocol-version, host address and port.
    2. Groups hold all attributes which multiple sensors belonging to it share in common. Common group attributes:
        * __interval__    (Time in [ms] between two consecutive sensor reads. Default is 1000[ms] = 1[s])
        * __queueSize__   (Maximum number of sensor readings to queue to bridge connectivity issues with the CollectAgent. Default is 1024)
        * __minValues__   (Minimum number of sensor reads the sensors in a group should gather before they are sent together to the database. Useful to reduce MQTT-overhead. Default is 1 (every sensor value is sent on its own))
        * __mqttPart__    (Part for the [mqtt-topic](#mqttTopic) all sensors in this group should share in common)
        * __default__     (One can define the name of a template group (see below) whose values and sensors should be used as default)
    3. Sensors hold only those attributes which are necessary to uniquely identify the target sensor. Common base attributes:
        * __mqttsuffix__  (to make its [mqtt-topic](#mqttTopic) unique)
        * __delta__ (identifies a monotonic sensor. If set to "true", differences between successive readings are collected)
        * __subSampling__ (subsampling factor S. If S>=1, only one reading every S is sent over MQTT, and the others are kept locally. If S<1, readings are never sent out and only kept locally)
        * __publish__ (if set to "true", the sensor will be published when the auto-publish feature is enabled. Otherwise it is omitted. Default is "true".)
5. Be aware that the naming of sensor/group/entity is not fixed. A plugin developer can name them as they like, e.g. counter/multicounter/host.
6. It is possible to define template sensors, groups, or entities in the config file. To specify a template sensor/group/entity simply prefix its definition with `template_` (see the example below). You can reference them later by using the `default` attribute. A template entity can consist of groups and these in turn can consist of sensors. When using a template, all of its attribute values are copied to the actual entity/group/sensor. Copied attributes can be overwritten in the actual entity/group/sensor (some of them even should be overwritten, e.g. the mqttPart). Groups/sensors associated with a template are copied to the actual entity/group. One can specify further groups/sensors which are then added to those copied from the template. If a group's/sensor's name is identical to one of the groups/sensors introduced by the template, it will not be added but instead overwrites the corresponding group/sensor of the template (overwrite means: specified attributes replace template attributes; all other template values are kept). This can be used to purposefully overwrite single (attributes of) groups/sensors introduced by a template. Template entities/groups/sensors themselves are never used in live operation of the plugin. They are purely cosmetic for convenient configuration.
 
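To make the `delta` and `subSampling` sensor attributes more concrete, the following sketch applies them to a list of readings. The function names are hypothetical, and two details are assumptions that the description above leaves open: no difference is produced for the very first reading, and the first reading of every run of S readings is the one that is sent.

```bash
# Print the differences between successive readings, i.e. what a sensor with
# delta set to "true" would collect. Assumption: the first reading yields no value.
deltas() (
    prev=""
    for r in "$@"; do
        if [ -n "$prev" ]; then
            echo $((r - prev))
        fi
        prev="$r"
    done
)

# Print only the readings that would be sent over MQTT for subsampling factor S
# (first argument). For S<1 the condition is never true, so nothing is printed
# and all readings stay local.
subsample() (
    s="$1"
    shift
    i=0
    for r in "$@"; do
        if [ "$s" -ge 1 ] && [ $((i % s)) -eq 0 ]; then
            echo "$r"
        fi
        i=$((i + 1))
    done
)

deltas 100 130 135 190              # monotonic counter readings
subsample 3 10 20 30 40 50 60 70    # S=3: every third reading is sent
subsample 0 10 20 30                # S<1: nothing is sent
```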
The following two abstract config files visualize the structure, one without and one with the optional entity level. A real example configuration file for every plugin should be provided in the `config/` directory. Use them as a starting point for writing your own configuration files.
```
 Without entity:
------------------------------------------------

global {
	mqttprefix /myprefix
	cacheInterval 120
	...
}

template_group temp1 {			;template group named temp1 (is not used in live operation)
	interval	1000			;While it is possible to define entities/groups/sensors without
	minValues	3				;a name, this is strongly discouraged. Naming entities/groups/sensors
	mqttPart	/aa				;simplifies debugging and especially enables one to reference
								;templates later on. Also names should be always unique.
	sensor s1 {
		mqttsuffix		/s1
		...						;usually the sensor would require additional attributes
	}

	sensor s2 {
		mqttsuffix		/s2
		...
	}
}

group g1 {
	default		temp1			;use temp1 as template group
	mqttPart	/bb				;overwrite the mqttPart from temp1, to avoid identical
								;mqtt-topics if another group uses the same template
	sensor s3 {					;g1 has now 3 sensors: s1, s2 (both taken over from temp1)
		mqttsuffix		/s3		;and s3
		...
	}
}

group g2 {						;g2 consists of only one sensor (s21) and uses
 	sensor s21 {				;for every attribute the default value
		mqttsuffix	/s21		;by using a longer mqttsuffix we do not need a
		...						;group mqtt-part
	}
}

...
```

```
 With entity:
------------------------------------------------

global {
	...
}

template_entity temp1 {				;template entity which is not used in live operation
	...								;here go entity attributes

	group g1 {
		interval	1000
		minValues	3
		mqttPart	/aa

		sensor s1 {
			mqttsuffix		/s1
			...						;usually the sensor would require additional attributes
		}

		sensor s2 {
			mqttsuffix		/s2
			...
		}
	}
}

entity ent1 {
	default		temp1				;use temp1 as template entity

	group g2 {						;ent1 has now two groups (g1 and g2) with a total of
	 	sensor s21 {				;3 sensors (s1, s2, s21)
			mqttsuffix	/s21
			...
		}
	}
}

...
```
Note the global section in the examples, which was not mentioned before. In this section the user can (but is not obliged to) overwrite values from the `dcdbpusher.conf` for this plugin or specify other settings which are global for this plugin.
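
The override rule for templates (point 6 above) can be illustrated with the attributes of group `g1` and template `temp1` from the first example. This is only a sketch of the rule, not of dcdbpusher's implementation; the helper name `merge_attrs` is made up, and attributes are written as `key=value` purely for this illustration.

```bash
# Merge key=value attributes: values given later replace earlier ones with the
# same key, all other values are kept. Keys are printed in order of first appearance.
merge_attrs() {
    printf '%s\n' "$@" | awk -F= '
        !($1 in value) { order[++n] = $1 }   # remember first appearance of a key
        { value[$1] = $2 }                   # later assignments win
        END { for (i = 1; i <= n; i++) print order[i] "=" value[order[i]] }
    '
}

# Template attributes of temp1 first, then the attribute set explicitly in g1.
merge_attrs interval=1000 minValues=3 mqttPart=/aa   mqttPart=/bb
```

The result keeps `interval` and `minValues` from the template while `mqttPart` takes the explicitly specified value `/bb`.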

## IPMI <a name="ipmi"></a>

The [IPMI](https://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface) plugin enables dcdbpusher to collect sensor values offered by a baseboard management controller (BMC).

Explanation of the values specific for the IPMI plugin:

| Value | Explanation |
|:----- |:----------- |
| sessiontimeout | Session timeout value for the IPMI-connection
| retransmissiontimeout | Retransmission timeout value for the IPMI-connection
| username | For the remote IPMI-connection login credentials are required
| password | For the remote IPMI-connection login credentials are required
| ipmiversion | IPMI version to use for LAN connections (1 or 2)
| cipher | Cipher to use for IPMI 2.0 LAN connections (currently supported: 0, 1, 2, 3, 6, 7, 8, 11, 12)
| cmd | One can define a raw IPMI-command (in hex-notation) to be sent. In this case the lsb and msb offsets for the response (see below) also have to be defined. Alternatively, one can define the record-ID of the sensor (see below).
| lsb | Offset of the least significant byte of the wanted return value within the response to an IPMI raw command<sup>[1](#ipmifn1)</sup>
| msb | Offset of the most significant byte of the wanted return value within the response to an IPMI raw command<sup>[1](#ipmifn1)</sup>
| recordId | Define the record-ID of the sensor to be read. One can look up the corresponding record-IDs for every sensor with the "ipmi-sensors" command line tool (ships with the freeipmi-library). Alternatively, one can define a raw IPMI-command (see above).
| factor | One can specify a factor to scale the read value before it is stored in the database (to adjust precision).

#### Footnotes <a name="ipmiFootnotes"></a>

<a name="ipmifn1">**1**</a>: &ensp; Use lsb > msb values if response is Little-endian (LSB first), use lsb < msb values if response is Big-Endian (MSB first). Maximum length is 8 bytes.  

## Perf-event <a name="perf"></a>

The Perfevent functionality is tasked with collecting data from the CPUs' various performance counters (PMUs).
> NOTE &ensp;&ensp;&ensp; The perf-event plugin measures PMUs for all processes running on a specific CPU. Therefore a value of less than 1 is required in `/proc/sys/kernel/perf_event_paranoid`. Other values (>=1) restrict the access to PMUs. See this [footnote](#fn1) for additional information.

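Whether the current kernel setting is sufficient can be checked before starting dcdbpusher. The following is a minimal sketch; the function name `paranoid_allows_cpu_wide` is hypothetical and it only encodes the "less than 1" rule from the note above.

```bash
# Succeeds if the given perf_event_paranoid value permits measuring all
# processes on a CPU, i.e. if it is less than 1.
paranoid_allows_cpu_wide() {
    [ "$1" -lt 1 ]
}

paranoid_file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$paranoid_file" ]; then
    level="$(cat "$paranoid_file")"
    if paranoid_allows_cpu_wide "$level"; then
        echo "perf_event_paranoid=$level: sufficient for the perf-event plugin"
    else
        # The value can typically be lowered by root, e.g. with
        #   sysctl -w kernel.perf_event_paranoid=0
        echo "perf_event_paranoid=$level: too restrictive, a value below 1 is required"
    fi
else
    echo "$paranoid_file is not readable (not a Linux system?)"
fi
```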
Explanation of the values specific for the perfevent plugin:

| Value | Explanation |
|:----- |:----------- |
| type | Type of which the counter should be. Each type determines different possible values for the config-field. Possible type-values are described below.
| config | Together with the type-field config determines which performance counter should be read. Possible values and what they measure are listed below.
| cpus | One can define a comma-separated list of cpu numbers (also value ranges can be specified, e.g. 2-4 equals 2,3,4). The hardware counter will then be only opened on the specified cpus.
| htVal | Specify multiplier for CPU aggregation. All CPUs where (CPU-number % htVal) has the same result are aggregated together. Only CPUs which are included in the "cpus" field (or all CPUs if the "cpus" field is not present) are aggregated. Background: To reduce the amount of pushed sensor data, it is possible to aggregate cpu readings. This feature is specifically aimed at processors which are hyper-threading enabled but can also come in handy for other use cases. Only the values pushed via the MQTT-Pusher are aggregated. There still exist sensors for each CPU and they store unaggregated readings in their local caches.
| mqttsuffix | In the context of the perfevent plugin the CPU id is integrated into the suffix. Sensors will be duplicated in order to open a hardware counter for each CPU. Therefore an identifier in the style of "/cpuxx" will be prepended to the mqttsuffix when building the topics.

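The interplay of `cpus` and `htVal` can be sketched as follows. The function names are hypothetical and the code only mirrors the description in the table (list and range expansion, then grouping by `CPU-number % htVal`); it is not taken from the plugin itself.

```bash
# Expand a cpus specification such as "0-3,8" into one CPU number per line.
expand_cpus() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        last="${last:-$first}"          # a single number is a range of length one
        i="$first"
        while [ "$i" -le "$last" ]; do
            echo "$i"
            i=$((i + 1))
        done
    done
}

# Print "<cpu> <bucket>" for every CPU of the specification in $1. CPUs with the
# same (cpu % htVal) value, htVal being $2, fall into the same bucket and would
# be aggregated together.
ht_buckets() {
    expand_cpus "$1" | while read -r cpu; do
        echo "$cpu $((cpu % $2))"
    done
}

expand_cpus 2-4
ht_buckets 0-3,8 2
```

With `htVal` set to 2, CPUs 0, 2 and 8 share bucket 0 while CPUs 1 and 3 share bucket 1, so five per-CPU sensors would be pushed as two aggregated values.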

> NOTE &ensp;&ensp;&ensp; As perfevent counters are usually monotonic, the delta attribute is set to true by default for all sensors. One has to explicitly set delta to "off" for a sensor to override this behaviour.


### type and config <a name="perfTypeConfig"></a>

(see the [perf_event_open man-page](http://man7.org/linux/man-pages/man2/perf_event_open.2.html) for more detailed explanations)

| Type | Config | Explanation |
|:----:|:------ |:----------- |
| PERF_TYPE_HARDWARE | | generalized hardware CPU events
| " | PERF_COUNT_HW_CPU_CYCLES | total cycles (affected by frequency scaling)
| " | PERF_COUNT_HW_INSTRUCTIONS | retired instructions
| " | PERF_COUNT_HW_CACHE_REFERENCES | cache accesses (usually last level)
| " | PERF_COUNT_HW_CACHE_MISSES | cache misses (usually last level)
| " | PERF_COUNT_HW_BRANCH_INSTRUCTIONS | retired branch instructions
| " | PERF_COUNT_HW_BRANCH_MISSES | mispredicted branch instructions
| " | PERF_COUNT_HW_BUS_CYCLES | bus cycles
| " | PERF_COUNT_HW_STALLED_CYCLES_FRONTEND | stalled cycles during issue
| " | PERF_COUNT_HW_STALLED_CYCLES_BACKEND  | stalled cycles during retirement
| " | PERF_COUNT_HW_REF_CPU_CYCLES | total cycles (unaffected by frequency scaling)
| | | |
| PERF_TYPE_SOFTWARE | | software events provided by the kernel
| " | PERF_COUNT_SW_CPU_CLOCK | reports CPU clock
| " | PERF_COUNT_SW_TASK_CLOCK | clock count specific to the running task
| " | PERF_COUNT_SW_PAGE_FAULTS | number of page faults
| " | PERF_COUNT_SW_CONTEXT_SWITCHES | count of context switches
| " | PERF_COUNT_SW_CPU_MIGRATIONS | times the process has migrated to a new CPU
| " | PERF_COUNT_SW_PAGE_FAULTS_MIN | number of minor page faults (no disk-I/O)
| " | PERF_COUNT_SW_PAGE_FAULTS_MAJ | number of major page faults (disk-I/O was required)
| " | PERF_COUNT_SW_ALIGNMENT_FAULTS | alignment faults when accessing unaligned memory
| " | PERF_COUNT_SW_EMULATION_FAULTS | number of unimplemented instructions which had to be emulated
| " | PERF_COUNT_SW_DUMMY | placeholder which counts nothing
| | | |
| PERF_TYPE_TRACEPOINT | | not yet implemented
| PERF_TYPE_HW_CACHE | | not yet implemented
| | | |
| PERF_TYPE_RAW | | user can define architecture-specific raw events here.
| " | *XXXX* | Config must be a raw event config value, see <sup>[2](#fn2)</sup>
| | | |
| PERF_TYPE_BREAKPOINT | --- | config not required, any values will be ignored. However config must still be specified (even if empty)
|<Custom>|<Custom>| dynamic PMU event, see <sup>[3](#fn3)</sup>

#### Footnotes <a name="perfFootnotes"></a>

Taken from the [perf_event_open man-page](http://man7.org/linux/man-pages/man2/perf_event_open.2.html):

<a name="fn1">**1**</a>: &ensp; The pid and cpu arguments allow specifying which process and CPU to monitor:  
[...]  
pid == -1 and cpu >= 0  
This measures all processes/threads on the specified CPU. This requires CAP_SYS_ADMIN capability or a /proc/sys/kernel/perf_event_paranoid value of less than 1.

[...]

The perf_event_paranoid file can be set to restrict access to the performance counters.

| Value | Restriction |
|:-----:|:----------- |
| 2 | allow only user-space measurements (default since Linux 4.6) |
| 1 | allow both kernel and user measurements (default before Linux 4.6) |
| 0 | allow access to CPU-specific data but not raw trace-point samples |
| -1 | no restrictions |
	
The existence of the perf_event_paranoid file is the official method for determining if a kernel supports perf_event_open()
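
The rules quoted above can be checked before starting dcdbpusher. The following is a minimal sketch; the helper names are hypothetical and the capability check is reduced to a boolean argument that the caller has to supply.

```python
from pathlib import Path

PARANOID_FILE = Path("/proc/sys/kernel/perf_event_paranoid")


def read_paranoid_level(path=PARANOID_FILE):
    """Return the paranoid level, or None if the file does not exist
    (which indicates that the kernel does not support perf_event_open())."""
    try:
        return int(Path(path).read_text().strip())
    except FileNotFoundError:
        return None


def can_monitor_all_processes(paranoid_level, has_cap_sys_admin=False):
    """pid == -1 and cpu >= 0 requires CAP_SYS_ADMIN or a paranoid value below 1."""
    if paranoid_level is None:
        return False
    return has_cap_sys_admin or paranoid_level < 1


if __name__ == "__main__":
    level = read_paranoid_level()
    print("perf_event_paranoid:", level)
    print("per-CPU monitoring of all processes possible without CAP_SYS_ADMIN:",
          can_monitor_all_processes(level))
```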

<a name="fn2">**2**</a>: &ensp; If type is *PERF_TYPE_RAW*, then a custom "raw" config value is needed. Most CPUs support events that are not covered by the "generalized" events. These are implementation defined; see your CPU manual (for example the Intel Volume 3B documentation or the AMD BIOS and Kernel Developer Guide). The libpfm4 library can be used to translate from the name in the architectural manual to the raw hex value perf_event_open() expects in this field.
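
As an illustration of what such a raw value looks like: on x86 CPUs the basic encoding places the event select in bits 0-7 and the unit mask in bits 8-15. The sketch below covers only this basic layout (further bits such as counter mask or edge detect exist and are vendor-specific); the function name is made up for this example, and the CPU manual or libpfm4 remains the authoritative source.

```python
def x86_raw_config(event_select, umask):
    """Combine an 8-bit event select and an 8-bit unit mask into a raw config value.

    Assumption: basic x86 layout only (event select in bits 0-7, umask in bits 8-15).
    """
    if not 0 <= event_select <= 0xFF:
        raise ValueError("event_select must fit into 8 bits")
    if not 0 <= umask <= 0xFF:
        raise ValueError("umask must fit into 8 bits")
    return (umask << 8) | event_select


if __name__ == "__main__":
    # Intel architectural "LLC Misses" event: event select 0x2E, umask 0x41.
    print(hex(x86_raw_config(0x2E, 0x41)))
```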

<a name="fn3">**3**</a>: &ensp; Custom type and config values can be specified to use the PMU of a specific device. The necessary configuration parameters can be obtained from the type and config files in the respective `/sys/devices/<device>` tree.
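
A minimal sketch of how these values could be looked up. It assumes the usual Linux sysfs layout, in which a PMU directory contains a `type` file and an `events/` directory whose files hold terms such as `event=0x3c,umask=0x00`. The helper names are hypothetical.

```python
from pathlib import Path


def read_pmu_type(device_dir):
    """Read the numeric PMU type from <device_dir>/type."""
    return int((Path(device_dir) / "type").read_text().strip())


def parse_event_spec(spec):
    """Parse an event description such as "event=0x3c,umask=0x00" into a dict.

    Terms without a value are stored as 1; values that are not numeric
    (e.g. "?" placeholders) are kept as strings.
    """
    terms = {}
    for term in spec.strip().split(","):
        if not term:
            continue
        name, sep, value = term.partition("=")
        if not sep:
            terms[name] = 1
            continue
        try:
            terms[name] = int(value, 0)
        except ValueError:
            terms[name] = value
    return terms


if __name__ == "__main__":
    cpu_pmu = Path("/sys/devices/cpu")
    if (cpu_pmu / "type").exists():
        print("type of the cpu PMU:", read_pmu_type(cpu_pmu))
```

How the individual terms are packed into the final config value is described by the files in the PMU's `format/` directory and is not covered by this sketch.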

## SNMP <a name="snmp"></a>

The SNMP plugin enables dcdbpusher to talk to devices which have an SNMP agent running and to query values from them. An SNMP sensor corresponds to a single value as identified by its unique OID. Sensors are aggregated by connections. See the exemplary snmp.conf file in the `config/` directory.
> NOTE &ensp;&ensp;&ensp; In the SNMP context the word privacy is used synonymously for encryption.

Explanation of the values specific for the SNMP plugin:

| Value | Explanation |
|:----- |:----------- |
| connection | An aggregating connection
| Type | Type of the SNMP application which runs on the device queried by the connection. Currently only the type Agent is supported.
| Host | Host name of the device which is to be queried. Follows net-snmp's `[<transport-specifier>:]<transport-address>` format, e.g. `udp:hostname:161`
| OIDPrefix | This OIDPrefix is used for all following sensors.
| |
| Version | Which SNMP version to use (either 2 (maps to 2c) or 3).
| Community | Which SNMP community to use (required only if version 2 is used).
| Username | Username to authenticate with (only required for version 3).
| SecLevel | The security level to be used (only required for version 3). Can be either `noAuthNoPriv` for no authentication and no privacy ("privacy" is SNMP's synonym for encryption), `authNoPriv` for authentication only or `authPriv` for authentication and privacy.
| AuthProto | Which protocol to use for authentication (only required for version 3 and if SecLevel != noAuthNoPriv). Can be MD5 or SHA1.
| AuthKey | The passphrase for authentication (only required for version 3 and if SecLevel != noAuthNoPriv). Must be at least 8 characters long.
| PrivProto | Which protocol to use for privacy (only required for version 3 and if SecLevel = authPriv). Can be DES or AES.
| PrivKey | The passphrase for privacy encryption (only required for version 3 and if SecLevel = authPriv). Must be at least 8 characters long.
| mqttPart | Connection-specific MQTT part which is appended to the MQTT prefix and followed by the sensor-specific suffix.
| |
| OID | OID suffix which together with the OIDPrefix forms the unique OID identifying a value to query.
| passphrase | has to be at least 8 characters long
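
The dependencies between the connection values can be summarized in a short sketch. It is not part of dcdbpusher: `full_oid` and `validate_connection` are hypothetical helpers, the configuration is represented as a plain dictionary, and joining OIDPrefix and OID with a single dot is an assumption.

```python
SEC_LEVELS = ("noAuthNoPriv", "authNoPriv", "authPriv")


def full_oid(oid_prefix, oid_suffix):
    """Join OIDPrefix and OID suffix (assumption: separated by exactly one dot)."""
    return oid_prefix.rstrip(".") + "." + oid_suffix.lstrip(".")


def _check_key(cfg, name, errors):
    if len(str(cfg.get(name, ""))) < 8:
        errors.append(f"{name} must be at least 8 characters long")


def validate_connection(cfg):
    """Return a list of problems found in an SNMP connection configuration."""
    errors = []
    version = str(cfg.get("Version", ""))
    if version == "2":
        if not cfg.get("Community"):
            errors.append("Community is required for version 2")
    elif version == "3":
        if not cfg.get("Username"):
            errors.append("Username is required for version 3")
        level = cfg.get("SecLevel")
        if level not in SEC_LEVELS:
            errors.append("SecLevel must be one of " + ", ".join(SEC_LEVELS))
        if level in ("authNoPriv", "authPriv"):
            if cfg.get("AuthProto") not in ("MD5", "SHA1"):
                errors.append("AuthProto must be MD5 or SHA1")
            _check_key(cfg, "AuthKey", errors)
        if level == "authPriv":
            if cfg.get("PrivProto") not in ("DES", "AES"):
                errors.append("PrivProto must be DES or AES")
            _check_key(cfg, "PrivKey", errors)
    else:
        errors.append("Version must be 2 or 3")
    return errors


if __name__ == "__main__":
    print(validate_connection({"Version": 3, "Username": "monitor", "SecLevel": "authNoPriv",
                               "AuthProto": "SHA1", "AuthKey": "short"}))
```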

## sysFS <a name="sysfs"></a>

SysFS sensors read data from sysFS files. The configuration file of the plugin corresponds to the generic plugin configuration with standalone sensors. Additionally for a sysFS sensor the following parameters are mandatory/possible:

Explanation of the values specific for the sysFS plugin:

| Value | Explanation |
|:----- |:----------- |
| path | Path to the sysFS file the sensor should read from. This parameter is mandatory.
| filter | One can define an optional filter if the sysFS file consists of more than only the sensor value. Please note the following points for filters: <br> 1.  The filter supports substitutions. For substitution sed syntax ("s/.../.../") is used. Therefore extended regular expressions (ERE) are used as regex-syntax. ERE is closest to Basic RE (BRE), which is actually used by sed, but requires less escaping. <br> 2.  If a \ ("backslash") is needed in the regex (for escaping), always use \\ ("double backslash") as the regex is read in as string and strings also escape with backslash <br> 3.  Whitespaces are actually used as value separators in the config files. If your filter requires whitespaces either use [[:space:]] in the regex or put it in quotation marks ("") <br> 4.  To be able to reference parts of the match (for substitution) use groups. Groups are created with parentheses. <br>  5.  If using character classes like [[:digit:]] always make sure to use double brackets ("[[" and "]]") or they will not be recognized. <br>  See [ERE-syntax](https://www.gnu.org/software/sed/manual/html_node/ERE-syntax.html#ERE-syntax) <br>  See [substitution syntax](http://www.boost.org/doc/libs/1_65_1/libs/regex/doc/html/boost_regex/format/sed_format.html)
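
To try out a filter before putting it into the configuration, a rough stand-in can be written with Python's `re` module. This is only an approximation: the plugin uses sed-style substitution with ERE syntax, whereas Python's `re` does not support POSIX classes such as `[[:digit:]]` (the example uses `[0-9]` instead). The function name and the sample file content are made up.

```python
import re


def apply_sed_filter(filter_expr, text):
    """Apply a sed-style "s/pattern/replacement/" filter to text.

    Splits on slashes that are not escaped with a backslash; flags after the
    final slash are accepted but ignored in this sketch.
    """
    parts = re.split(r"(?<!\\)/", filter_expr)
    if len(parts) != 4 or parts[0] != "s":
        raise ValueError("expected a filter of the form s/pattern/replacement/")
    _, pattern, replacement, _flags = parts
    return re.sub(pattern, replacement, text)


if __name__ == "__main__":
    # Hypothetical file content; the group (parentheses) captures the sensor value.
    content = "temp=42000 mC"
    print(apply_sed_filter(r"s/.*temp=([0-9]+).*/\1/", content))
```

Note that in the configuration file itself the backslash of `\1` would have to be doubled, as described in point 2 above.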

## PDU <a name="pdu"></a>

The Power Delivery Unit (PDU) plugin is in charge of sending a network-request to the PDUs and gathering specified sensor data from the XML-file response.

Explanation of the values specific for the PDU plugin:

| Value | Explanation |
|:----- |:----------- |
| host | Hostname and (optional) port where to fetch the XML-file with sensor data from. If no port is specified, 443 is used. The plugin requests the file via HTTPS.
| request | Define the request to be sent to the host via HTTPS as a string. One should put the request in quotation marks (' " ') to enable the use of whitespaces within the request. Special characters (like usage of ' " ' within the request) should be escaped (' " ' --> ' \" '; ' \ ' --> ' \\\\ '; newline --> ' \n '; ...).
| path | Define a dot-separated path to the value to be read in the XML file. One can specify attribute values a node has to fulfil in brackets after the node. Multiple (comma-separated) attributes can be given, however no whitespaces should be used (!) as they will not be filtered and could therefore be treated as part of the attribute's name.

## BACnet <a name="bacnet"></a>

The BACnet plugin enables dcdbpusher to communicate and request data from devices which communicate via the BACnet protocol. A so called "read property" request is sent by the plugin to the BACnet devices as configured in the config file. The response value is then stored in the database. Usually one is only interested in collecting the current reading of a BACnet device (property PROP_PRESENT_VALUE, ID 85). However, also reading of other properties is supported.
> NOTE &ensp;&ensp;&ensp; On startup the BACnet plugin performs no device discovery. Instead it relies on the user providing a file with the addresses of all required BACnet devices. One can generate such an address file for example by using the `bacwi` demo tool provided by the BACnet-Stack.

Explanation of the values specific for the BACnet plugin:

| Value | Explanation |
|:----- |:----------- |
| address_cache | (Path to and) filename of the address cache file where the addresses of BACnet devices are stored (as noted above).
| interface | Network interface (IPv4) which is to be used by the plugin to send its "Read Property" requests.
| port | Port to use on the interface
| timeout | Time in microseconds to wait for a response packet.
| apdu_timeout | Time in microseconds before sending a request times out.
| apdu_retries | How often sending a request should be retried.
| templates | One can define template properties in this section for convenience.
| factor | Described in the section for the [IPMI-plugin](#ipmi).
| devices | Starts the part in the config file where the actual BACnet devices are configured. A BACnet device consists of multiple nested parts: device > objects > properties.
| instance (device) | Instance of the BACnet-device.
| type | Type of the object within the device.
| instance (object) | Instance of the object within the device.
| id | ID of the property to be read from the BACnet device-object. Assignment of numbers to properties is done according to the enum as defined in `bacenum.h`.

## Opa (Intel Omni-Path Architecture) <a name="opa"></a>

The Opa plugin enables dcdbpusher to query various counters from omni-path interconnects.

Explanation of the values specific for the Opa plugin:

| Value | Explanation |
|:----- |:----------- |
| hfiNum | Number of the omni-path Host Fabric Interface to query (starting with 1)
| portNum | Number of the omni-path port to query (starting with 1)
| cntData | Name of the data counter to query. A list of possible values can be found below.


> NOTE &ensp;&ensp;&ensp; As opa counters are usually monotonic, the delta attribute is set to true by default for all sensors. One has to explicitly set delta to "off" for a sensor to override this behaviour.

### counterData <a name="opaCounterData"></a>

Possible values for cntData:
* portXmitData
* portRcvData
* portXmitPkts
* portRcvPkts
* portMulticastXmitPkts
* portMulticastRcvPkts
* localLinkIntegrityErrors
* fmConfigErrors
* portRcvErrors
* excessiveBufferOverruns
* portRcvConstraintErrors
* portRcvSwitchRelayErrors
* portXmitDiscards
* portXmitConstraintErrors
* portRcvRemotePhysicalErrors
* swPortCongestion
* portXmitWait
* portRcvFECN
* portRcvBECN
* portXmitTimeCong
* portXmitWastedBW
* portXmitWaitData
* portRcvBubble
* portMarkFECN
* linkErrorRecovery
* linkDowned
* uncorrectableErrors

## ProcFS (/proc filesystem) <a name="procfs"></a>

The ProcFS plugin enables dcdbpusher to sample resource usage metrics from a variety of files in the /proc virtual filesystem generated by the Linux kernel. Each defined sensor group is assigned to a specific file, which is periodically parsed. Currently supported files for sampling are:
* /proc/vmstat: contains virtual memory-related usage metrics;
* /proc/meminfo: contains RAM memory-related usage metrics (note that some of the metrics overlap with /proc/vmstat);
* /proc/stat: contains CPU usage-related metrics, both at system and core levels.

Note that the ProcFS plugin can operate in two distinct modes, with respect to MQTT topics:
* Automatic: if no sensors are specified, all metrics discovered in the underlying parsed file are acquired; sensors and MQTT topics are generated for them. Please be careful when configuring the plugin so that its MQTT topics do not overlap with those of other plugins.
* Manual: If at least one sensor is specified, only the corresponding metrics are acquired, and all other metrics in the parsed file are discarded. MQTT topics are assigned accordingly, using the mqttPrefix, mqttPart and mqttSuffix fields.

Explanation of the values specific for the ProcFS plugin:

| Value | Explanation |
|:----- |:----------- |
| type | The type of the file parsed by the sensor group. Can be either "vmstat", "meminfo", "procstat" or "sar"
| path | Path of the file, if different from the default path in the /proc filesystem
| cpus | Defines the set of CPU cores for which metrics must be collected. Only affects extraction of core-specific metrics (e.g. those in /proc/stat), whereas system-level metrics are acquired regardless of this setting. If no CPU cores set is defined, metrics for all available CPU cores will be collected. This parameter follows the same syntax as in the Perf-event plugin.
| htVal | Specify a multiplier for CPU aggregation. All CPUs where (CPU-number % htVal) has the same result are aggregated together. If specified, only CPUs which are included in the "cpus" field are aggregated. See Perf-event plugin for more details.
| scalingFactor | A scaling factor to be applied to ratio-like metrics. Default is 1000000.
| mqttSuffix | the mqttSuffix field in the ProcFS plugin, for sensors that are CPU-related such as the ones in procstat files, behaves as described for the perf-event plugin.

Additionally, sensors in the ProcFS plugin (defined with the "metric" keyword) support the following additional values:

| Value | Explanation |
|:----- |:----------- |
| type | The type of the specific metric associated to the sensor. This field must match the exact name of a metric in the underlying parsed file. If such a match does not exist, the sensor is discarded.
| perCpu | Boolean. If set to "on", the metric will be collected for each CPU core specified with the "cpus" sensor group parameter, or for all CPU cores if none is specified. Otherwise, the metric will be collected only at system level. This parameter has no effect on metrics that are not acquired at CPU core level (e.g. those in /proc/vmstat and /proc/meminfo).

The "type" field can be inferred for each sensor by simply checking the underlying file parsed by the sensor group. For /proc/stat files, on the other hand, CPU core-related metrics are collected in separate columns, which adopt the following naming scheme that can be used to define sensors: 
* col_user 
* col_nice 
* col_system 
* col_idle 
* col_iowait
* col_irq
* col_softirq
* col_steal
* col_guest
* col_guest_nice

Additional CPU-related metrics (that may be introduced in future versions of the Linux kernel) are not supported by the DCDB ProcFS plugin.
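
The mapping between the columns of /proc/stat and the `col_*` names can be sketched as follows. This is an illustration only, not the plugin's parser; `parse_proc_stat` is a hypothetical helper and the sample values are made up.

```python
# Column names in the order in which they appear in the "cpu" lines of /proc/stat.
CPU_COLUMNS = (
    "col_user", "col_nice", "col_system", "col_idle", "col_iowait",
    "col_irq", "col_softirq", "col_steal", "col_guest", "col_guest_nice",
)


def parse_proc_stat(text):
    """Map each "cpu"/"cpuN" line to a dict of col_* metrics; other lines are skipped."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("cpu"):
            continue
        # Columns beyond the known ten are ignored, mirroring the note above.
        values = [int(v) for v in fields[1:1 + len(CPU_COLUMNS)]]
        result[fields[0]] = dict(zip(CPU_COLUMNS, values))
    return result


if __name__ == "__main__":
    sample = "cpu  100 5 50 1000 10 0 2 0 0 0\ncpu0 60 3 30 500 5 0 1 0 0 0\nintr 12345\n"
    print(parse_proc_stat(sample)["cpu0"]["col_idle"])
```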
Note that for /proc/meminfo instances, an additional synthetic sensor of type "MemUsed" can be defined. This sensor will automatically extract the amount of used memory from the MemTotal and MemFree values present in meminfo files.
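
A minimal sketch of such a synthetic value, assuming that "used" simply means MemTotal minus MemFree (the exact formula used by the plugin is not stated here). The helper names are hypothetical.

```python
def parse_meminfo(text):
    """Parse "Name:   value kB" lines of a meminfo file into a dict of integers."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[name.strip()] = int(fields[0])
    return values


def mem_used(values):
    """Assumption: used memory is MemTotal minus MemFree."""
    return values["MemTotal"] - values["MemFree"]


if __name__ == "__main__":
    sample = "MemTotal:       16384 kB\nMemFree:         4096 kB\n"
    print(mem_used(parse_meminfo(sample)))
```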

## Caliper <a name="caliper"></a>

The Caliper plugin collects application introspection data and therefore allows for application performance analysis in retrospect. This plugin is special as it does not work on its own but also requires a corresponding Caliper framework service running on application side. Please see Caliper's [official documentation](https://software.llnl.gov/Caliper/) for an exhaustive introduction.
The Caliper plugin supports two use cases:
* **Sampling** Low-overhead automatic sampling of program counter (PC) values. This makes it possible to analyze in retrospect how much time was spent in a function. It is the default case and enabled at all times.
* **Instrumentation** Users can instrument their application with Caliper annotations. The event data generated by the annotations is picked up by Pusher in addition to the sampling data and stored within DCDB. The annotation data can then be correlated with other monitoring data and allows for more fine-grained introspection than the sampling approach. On the downside, this usually induces more overhead.

### Caliper framework side
Caliper is an application introspection system. Its functionality stems from so-called services. To work with the Pusher plugin, the custom Dcdbpusher service for Caliper is required as well as the stock timestamp, pthread, and sampler services. For instrumentation, the event service is required as well.
Caliper has to be integrated into the application. This can be done either manually by the application developer or in a more automated fashion by the system administrator by "hijacking" applications, e.g. overwriting main methods before execution. For the sampling case it is sufficient to use the Caliper framework just once, i.e. initialize it somewhere. However, one can still use the full functionality of Caliper services in parallel if desired.
The dcdbpusher service retrieves all relevant data from snapshots (Timestamp, CPU, PC (sampling), annotation data (instrumentation)). In case of sampling, the PC value is resolved to the actual function name via the binary's symbol data. Retrieved data is temporarily stored in a thread-local buffer. Eventually it gets written to a shared-memory queue which is used to communicate with the Pusher plugin.

The Caliper services can be controlled by the environment variables listed below:

| Value | Explanation |
|:----- |:----------- |
| CALI_SERVICES_ENABLE | Specify which Caliper services to enable. Should be at least `event:sampler:timestamp:pthread:dcdbpusher`.
| CALI_SAMPLER_FREQUENCY | Frequency of the sampler service in Hz.
| CALI_TIMER_TIMESTAMP | Must be set to `true` to enrich all snapshots with timestamps of their creation.
| CALI_DCDBPUSHER_SUS_CYCLE | Symbol update service (SUS) cycle. To resolve PC values to function names the symbol data of the binary and loaded libraries is locally buffered in a so called "symbol index". In case a symbol could not be resolved by the dcdbpusher service (e.g. because the PC points to a newly loaded library that has not yet been indexed) it informs the background SUS thread to update the symbol index. Updating the symbol index is a heavy blocking task. To limit overhead and avoid continuous rebuild of the symbol index this environment variable can be used to set the cycle interval of the SUS in seconds (e.g. `export CALI_DCDBPUSHER_SUS_CYCLE=x`). The SUS only checks every x seconds if a symbol data update is requested. Increasing this value reduces overhead of repeated symbol index rebuilds but decreases responsiveness if rebuilds are requested seldomly. Default is 15 seconds.
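
When an application is started from a launcher script, the variables above could be assembled as in the following sketch. The function name and the chosen sampler frequency are examples only; the variable names and the service list are taken from the table.

```python
import os


def caliper_env(sampler_frequency_hz, sus_cycle_s=15, base_env=None):
    """Return an environment with the Caliper variables described above."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "CALI_SERVICES_ENABLE": "event:sampler:timestamp:pthread:dcdbpusher",
        "CALI_SAMPLER_FREQUENCY": str(sampler_frequency_hz),
        "CALI_TIMER_TIMESTAMP": "true",
        "CALI_DCDBPUSHER_SUS_CYCLE": str(sus_cycle_s),
    })
    return env


if __name__ == "__main__":
    for name, value in sorted(caliper_env(100, base_env={}).items()):
        print(f"{name}={value}")
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching the instrumented application.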

### Pusher plugin side
The pusher plugin serves as data sink for the snapshot data from the Caliper service. It can handle multiple different applications at once. However, it is mainly intended for only one application with multiple threads/(MPI-)processes. 
The plugin consumes the snapshot data from the shared-memory queue. For each unique snapshot datum a new sensor is created. For each subsequent encounter of the same data (function name or annotation), a reading value of 1 is stored with the sensor.
After an application terminates or times out, or when the maxSensors value is reached, all sensors are cleared.

Explanation of the values specific for this plugin:

| Value | Explanation |
|:----- |:----------- |
| interval | In case of sampling, the interval value of a SensorGroup (or SingleSensor) has a small side effect. Within the same read cycle, multiple encounters of the same function name will be aggregated. Instead of a value of one for each encounter, only the aggregated value at the end of the read cycle will be actually stored with the corresponding sensor. Therefore the read cycle interval also determines the granularity of the sampling data. A lower interval results in more fine-grained sampling data resolution but also requires more memory in the storage backend. 
| maxSensors | To limit indefinite memory usage by the creation of new Sensor objects one can specify a threshold here. If the number of sensors exceeds this value, they will be cleared. Default is 500.
| timeout | Number of read cycles after which a Caliper application is assumed to be terminated if no new values have been received. The connection (shared memory) is torn down on timeout. Default is 15.
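
The interplay of `interval` and `maxSensors` can be pictured with a simplified model. It is a sketch under assumptions, not the plugin's code: the class and method names are invented, sensors are reduced to a set of keys, and the exact moment at which sensors are cleared (here: just before a new sensor would exceed the limit) is a guess.

```python
from collections import Counter


class SnapshotAggregator:
    """Counts encounters per function name or annotation within one read cycle."""

    def __init__(self, max_sensors=500):
        self.max_sensors = max_sensors
        self.sensors = set()      # one entry per unique snapshot datum
        self._cycle = Counter()   # encounters since the last read cycle

    def on_snapshot(self, key):
        if key not in self.sensors:
            # Assumption: clear everything once the limit would be exceeded.
            if len(self.sensors) >= self.max_sensors:
                self.sensors.clear()
                self._cycle.clear()
            self.sensors.add(key)
        self._cycle[key] += 1

    def end_read_cycle(self):
        """Return the aggregated value per sensor and start a new cycle."""
        readings = dict(self._cycle)
        self._cycle.clear()
        return readings


if __name__ == "__main__":
    agg = SnapshotAggregator()
    for name in ["foo", "foo", "bar", "foo"]:
        agg.on_snapshot(name)
    print(agg.end_read_cycle())
```

A shorter interval means `end_read_cycle` runs more often, which gives finer-grained data but more stored readings.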

### Shortcomings
Usage of the Caliper plugin is currently obstructed by a few shortcomings:
* The Caliper framework has to be integrated manually by users into their application for this plugin to work.
* The Caliper framework seems to interfere with Intel libraries, which may cause [application crashes](https://github.com/LLNL/Caliper/issues/223).

## Metadata Management <a name="metadataManagement"></a>

Sensor metadata can be included in Pusher configurations, and will be published to the Storage Backend if the _auto-publish_ feature is enabled. A metadata block looks like the following:

```
...

group g2 {						
 	sensor pw {				
		mqttsuffix	/power		
		...
		
		metadata {
			unit	      Watt
			scale	      1000
			ttl           3600000
			operations    avg5,min,max
		}		
	}
}

...
```

Available fields that can be published as metadata are the following:

| Value | Explanation |
|:----- |:----------- |
| unit | String containing the unit of measure for the sensor, if any. |
| scale | Scaling factor (as a floating point value) to be applied to readings of the sensor upon queries. |
| ttl | Time to live for the readings of this sensor in milliseconds, after which they are automatically deleted from the Storage Backend. |
| monotonic | Boolean flag specifying whether the sensor is monotonic or not. |
| integrable | Boolean flag specifying whether the sensor's time series can be integrated or not. |
| interval | Sampling interval in milliseconds of the sensor. |
| operations | Comma-separated list of operations available for the sensor, whose values can be retrieved by appending their names to the sensor name. |

An additional _isOperation_ field is available for the output sensors of operators in the Wintermute framework. If these output sensors are generated starting from a single input, this field allows publishing them as _operations_ of the latter; they will then be listed in the associated database entry. For this to apply, however, the MQTT topic of the output sensor must be identical to that of the input, plus a suffix that describes the operation. Enabling this option invalidates all other metadata fields.

## Writing own plugins <a name="writingOwnPlugins"></a>
First make sure you read the [plugins](#plugins) section. 

It is recommended to use the `pluginGenerator/generatePlugin.sh` script to kick off plugin development. Running `./generatePlugin.sh -h` gives instructions on how to use the script. On success, the script generates all required source files for a new plugin with instructions on how to continue from there.