# DCDB Pusher

### Table of contents
1. [Introduction](#introduction)
2. [dcdbpusher](#dcdbpusher)
    1. [Global Configuration](#globalConfiguration)
    2. [REST API](#restApi)
        1. [List of resources](#listOfResources)
        2. [Examples](#restExamples)
    3. [MQTT topic](#mqttTopic)
3. [Plugins](#plugins)
    1. [IPMI](#ipmi)
    2. [Perf-event](#perf)
        1. [type and config](#perfTypeConfig)
        2. [Footnotes](#perfFootnotes)
    3. [SNMP](#snmp)
    4. [SysFS](#sysfs)
    5. [PDU](#pdu)
    6. [BACnet](#bacnet)
    7. [OPA](#opa)
        1. [counterData](#opaCounterData)
    8. [ProcFS](#procfs)
    9. [Caliper](#caliper)
    10. [Metadata Management](#metadataManagement)
    11. [Writing own plugins](#writingOwnPlugins)

## Introduction <a name="introduction"></a>
DCDB (DataCenter DataBase) is a database that collects various sensor values of a datacenter for further analysis.
Harvesting the data is the task of dcdbpusher.

# dcdbpusher <a name="dcdbpusher"></a>

This is a general MQTT pusher which sends values of various sensors to the DCDB-database.
It ships with plugins for BACnet, IPMI, PDU (proprietary Power Delivery Unit, but could be used as XML plugin), perfcounter, SNMP and sysFS, among others.

Build it by simply running
```bash
make
```
or alternatively use
```bash
make debug
```
within the `dcdbpusher` directory to build a version which will print additional debug-information during runtime.

The logic for the various sensors is encapsulated in plugins (shared dynamic libraries; the makefile takes care of compiling them for you). dcdbpusher dynamically opens the libraries that are specified in the [global configuration](#globalConfiguration) file. Conversely, if a sensor functionality such as sysFS is not specified there, the corresponding shared library (`libdcdbplugin_sysfs.so`) does not have to be present.

You can run dcdbpusher by executing
```bash
./dcdbpusher path/to/configfile/
```
or run
```bash
./dcdbpusher -h
```
to print the help-section of dcdbpusher.

Dcdbpusher will check the given file-path for the global configuration file which has to be named `dcdbpusher.conf`.

### Global Configuration  <a name="globalConfiguration"></a>

The global configuration specifies general settings for dcdbpusher, e.g. which plugins should be loaded.
Please have a look at the provided `config/dcdbpusher.conf` example to get familiar with the file scheme. The example also forms a good starting point for writing a custom `dcdbpusher.conf`. The different sections and values are explained in the following table:

| Value | Explanation |
|:----- |:----------- |
| global | Wrapper structure for the global values.
| mqttBroker | Define address and port of the MQTT-broker which collects the messages (sensor values) sent by dcdbpusher.
| mqttprefix | To avoid rewriting a full MQTT-topic for every sensor, one can specify a consistent prefix here.
| sensorpattern | Pattern used to perform automatic sensor name publishing. See the corresponding [section](#autopublish) for more information.
| threads | Specify how many threads should be created to handle the sensors asynchronously, as well as the Wintermute plugins. Default value of threads is 1. Note that the MQTT Pusher always starts an extra thread, so the actual number of started threads is always one more than defined here. Specifying too few threads can delay the reading of some sensors.
| maxMsgNum | To avoid publishing too many MQTT messages at once, one can define a maximum count of values that are published in one turn. After reaching this limit the MQTT Pusher will be forced to sleep for a short time before continuing.
| maxInflightMsgNum | Maximum number of messages that can be "inflight". This is an MQTT term and should match the broker's setting. Set to 0 for unlimited.
| maxQueuedMsgNum | Maximum number of MQTT messages (including "inflight") that should be queued. This limits the amount of memory that is used for buffering. Set to 0 for unlimited.
| verbosity | Level of detail in the logfile (dcdb.log). Set to a value between 5 (all log-messages, default) and 0 (only fatal messages). NOTE: the level of verbosity for the command-line log can be set independently via the -v flag when invoking dcdbpusher.
| daemonize | Set to 'true' if dcdbpusher should run detached as daemon. Default is false.
| tempdir | One can specify a writeable directory where dcdbpusher can write its temporary and logging files to. Default is the current directory (`./`).
| cacheInterval | Define a time interval in seconds. The last sensor readings within this time interval will be kept. This value can be overwritten by plugins.
| | |
| restAPI | Bundles all values related to the REST API. See the corresponding [section](#restApi) for more information on supported functionality.
| address | Define the (IP-)address and port on which the REST API server should run.
| certificate | Provide the (path and) file which the HTTPS server should use as certificate.
| privateKey | Provide the (path and) file which should be used as corresponding private key for the certificate. If private key and certificate are stored in the same file one should nevertheless provide the path to the cert-file here again.
| dhFile | Provide the (path and) file where Diffie-Hellman parameters for the key exchange are stored.
| user | This struct is used to define username and password for users of the REST API, along with their respective allowed REST operations (i.e., GET, PUT, POST).
| | |
| plugins | In this section one can specify the plugins which should be used.
| plugin _name_ | The plugin name is used to build the corresponding lib-name (e.g. sysfs --> libdcdbplugin_sysfs.1.0)
| path | Specify the path where the plugin (the shared library) is located. If left empty, dcdbpusher will look in the default lib-directories (`/usr/lib` and friends) for the plugin-file.
| config | One can specify a separate config-file (including path to it) for the plugin to use. If not specified, dcdbpusher will look up pluginName.conf (e.g. sysfs.conf) in the same directory where dcdbpusher.conf is located.
| | |

Formats of the other sensor-specific config-files are explained in the corresponding [subsections](#ipmi), 
 while example configuration-files can be found in the `config/` directory. An explanation of how to deploy Wintermute 
 data analytics plugins can be found in the corresponding readme document.


## REST API <a name="restApi"></a>

Dcdbpusher runs an HTTPS server which allows it to be controlled over a RESTful API. By default the API is hosted at port 8000 on 127.0.0.1, but the address can be changed in [`dcdbpusher.conf`](#globalConfiguration).

An HTTPS request to dcdbpusher should have the following format: `[GET|PUT|POST] host:port[resource]?[queries]`.
Tables with allowed resources sorted by REST methods can be found below. A query consists of a key-value pair of the format `key=value`. Multiple queries are separated by semicolons (';'). For all requests (except /help) basic authentication credentials must be provided.

### List of resources <a name="listOfResources"></a>

<table>
  <tr>
    <td colspan="2"><b>Resource</b></td>
    <td colspan="2">Description</td>
  </tr>
  <tr>
    <td>Query</td>
    <td>Value</td>
    <td>Opt.</td>
    <td>Description</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>GET /help</b></td>
    <td colspan="2">Return a cheatsheet of possible REST API endpoints.</td>
  </tr>
  <tr>
    <td colspan="4">No queries.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>GET /plugins</b></td>
    <td colspan="2">List all loaded dcdbpusher plugins.</td>
  </tr>
  <tr>
    <td>json</td>
    <td>"true"</td>
    <td>Yes</td>
    <td>Format response as JSON.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>GET /sensors</b></td>
    <td colspan="2">List all sensors of a specific plugin.</td>
  </tr>
  <tr>
    <td>plugin</td>
    <td>All plugin names.</td>
    <td>No</td>
    <td>Specify the plugin.</td>
  </tr>
  <tr>
    <td>json</td>
    <td>"true"</td>
    <td>Yes</td>
    <td>Format response as JSON.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>GET /average</b></td>
    <td colspan="2">Get the average of the last readings of a sensor. Also allows access to analytics sensors.</td>
  </tr>
  <tr>
    <td>plugin</td>
    <td>All plugin names.</td>
    <td>No</td>
    <td>Specify the plugin.</td>
  </tr>
  <tr>
    <td>sensor</td>
    <td>All sensor names of the plugin or the operator manager.</td>
    <td>No</td>
    <td>Specify the sensor within the plugin.</td>
  </tr>
  <tr>
    <td>interval</td>
    <td>Number of seconds.</td>
    <td>Yes</td>
    <td>Use only readings more recent than (now - interval) for average calculation. Defaults to zero, i.e. all cached sensor readings are included in average calculation.</td>
  </tr>
</table>
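The `interval` semantics of `GET /average` can be illustrated with a small Python sketch. It is not taken from the dcdbpusher sources: the cache layout (a list of `(timestamp, value)` pairs) and the function name `average` are assumptions made for illustration, and a plain arithmetic mean is assumed.

```python
import time


def average(cache, interval=0, now=None):
    """Average cached readings, optionally restricted to the last `interval` seconds.

    cache: list of (timestamp_in_seconds, value) pairs (hypothetical layout).
    interval: 0 means that all cached readings are included.
    """
    if now is None:
        now = time.time()
    if interval > 0:
        # Keep only readings more recent than (now - interval).
        values = [value for timestamp, value in cache if timestamp > now - interval]
    else:
        values = [value for _, value in cache]
    if not values:
        raise ValueError("no readings in the requested interval")
    return sum(values) / len(values)
```

With readings taken at 100 s, 110 s and 118 s and `now=120`, an interval of 15 only considers the last two readings, while the default interval of zero considers all three.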

<table>
  <tr>
    <td colspan="2"><b>PUT /quit</b></td>
    <td colspan="2">Exits the Pusher with a user-specified return code.</td>
  </tr>
  <tr>
  	<td>code</td>
  	<td>Return code.</td>
  	<td>Yes</td>
  	<td>Return code to be used when exiting.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>PUT /load</b></td>
    <td colspan="2">Load and initialize a new plugin but do not start it. Use the /start request to kick off the plugin's data collection.</td>
  </tr>
  <tr>
  	<td>plugin</td>
  	<td>Plugin name.</td>
  	<td>No</td>
  	<td>Name of the new plugin. Is used to build the shared library file name which holds the plugin. Shared lib file name is of the form libdcdbplugin_PLUGINNAME.so (or .dylib for Apple).</td>
  </tr>
  <tr>
  	<td>path</td>
  	<td>A file path.</td>
  	<td>Yes</td>
  	<td>Path to where the shared library for the plugin is located. If not specified the default library directories (/usr/lib and friends) are searched.</td>
  </tr>
  <tr>
  	<td>config</td>
  	<td>A file path including file name.</td>
  	<td>Yes</td>
  	<td>Path and name of the plugin configuration file. If not specified we will search for "./PLUGINNAME.conf".</td>
  </tr>
</table>
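The naming conventions used by `/load` can be sketched in Python as follows. The helper names `plugin_library_name` and `default_config_path` are hypothetical and only restate the rules from the table above; they are not part of dcdbpusher.

```python
import sys


def plugin_library_name(plugin, platform=sys.platform):
    """Build the shared library file name for a plugin name."""
    # .dylib on Apple systems, .so otherwise (as stated in the table above).
    extension = ".dylib" if platform == "darwin" else ".so"
    return f"libdcdbplugin_{plugin}{extension}"


def default_config_path(plugin):
    """Configuration file that is searched if no `config` query is given."""
    return f"./{plugin}.conf"
```

For the plugin name `sysfs` this yields `libdcdbplugin_sysfs.so` on Linux and `./sysfs.conf` as the default configuration file.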

<table>
  <tr>
    <td colspan="2"><b>PUT /unload</b></td>
    <td colspan="2">Unload a plugin, removing it completely from dcdbpusher. To use the plugin again one has to /load it first.</td>
  </tr>
  <tr>
  	<td>plugin</td>
  	<td>All plugin names.</td>
  	<td>No</td>
  	<td>Specify the plugin.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>PUT /reload</b></td>
    <td colspan="2">Reload a plugin's configuration (includes fresh creation of a plugin's sensors and a plugin restart).</td>
  </tr>
  <tr>
    <td>plugin</td>
    <td>All plugin names.</td>
    <td>No</td>
    <td>Specify the plugin.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>POST /start</b></td>
    <td colspan="2">Start a plugin, i.e. its sensors start polling.</td>
  </tr>
  <tr>
    <td>plugin</td>
    <td>All plugin names.</td>
    <td>No</td>
    <td>Specify the plugin.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>POST /stop</b></td>
    <td colspan="2">Stop a plugin, i.e. its sensors stop polling.</td>
  </tr>
  <tr>
    <td>plugin</td>
    <td>All plugin names.</td>
    <td>No</td>
    <td>Specify the plugin.</td>
  </tr>
</table>

> NOTE &ensp;&ensp;&ensp;&ensp;&ensp; Opt. = Optional

### Examples <a name="restExamples"></a>

Two examples for HTTPS requests (authentication credentials not shown):

```bash
GET https://localhost:8000/average?plugin=sysfs;sensor=freq1;interval=15
```
```bash
POST https://localhost:8000/stop?plugin=bacnet
```
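Such requests could also be issued from a script. The following Python sketch uses only the standard library; the helper names `build_url` and `send_request`, the credentials and the `verify_tls` switch are assumptions for illustration and not part of dcdbpusher.

```python
import base64
import ssl
import urllib.request


def build_url(host, port, resource, queries=None):
    """Assemble host:port[resource]?[queries] with ';'-separated key=value pairs."""
    url = f"https://{host}:{port}{resource}"
    if queries:
        url += "?" + ";".join(f"{key}={value}" for key, value in queries.items())
    return url


def send_request(method, url, user, password, verify_tls=True):
    """Send one request with HTTP basic authentication and return the response body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, method=method)
    request.add_header("Authorization", f"Basic {token}")
    context = ssl.create_default_context()
    if not verify_tls:
        # Only sensible for test setups with self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode()
```

For example, `build_url("localhost", 8000, "/average", {"plugin": "sysfs", "sensor": "freq1", "interval": 15})` reproduces the URL of the first request above, which can then be passed to `send_request("GET", url, user, password)` with the credentials of a user defined in `dcdbpusher.conf`.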

## MQTT topic <a name="mqttTopic"></a>

For communication between the different DCDB-components (database, dcdbpusher) the [MQTT protocol](https://mqtt.org/) is used. In order to identify each sensor, each has to have a unique MQTT topic assigned. The topic for a sensor is built by appending up to 4 parts:
1. mqttprefix    (e.g. /mysystem)
2. mqttpart of entity (if supported by plugin, e.g. /host0)
3. mqttpart of group    (e.g. /eth0)
4. mqttsuffix    (e.g. /xmitdata)

Then the topic for the sensor is /mysystem/host0/eth0/xmitdata.
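The concatenation of the four parts can be written down as a short Python sketch. The function name `build_topic` is hypothetical; the entity and group parts are optional and default to an empty string.

```python
def build_topic(prefix, suffix, group_part="", entity_part=""):
    """Concatenate the topic parts in the order prefix, entity, group, suffix."""
    return f"{prefix}{entity_part}{group_part}{suffix}"


# Values taken from the list above.
topic = build_topic("/mysystem", "/xmitdata", group_part="/eth0", entity_part="/host0")
```

Here `topic` is `/mysystem/host0/eth0/xmitdata`. A sensor whose plugin has no entity level and whose group defines no mqttPart ends up with a topic made of prefix and suffix only.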

Additionally, sensors can be published automatically to the Storage Backend under their specified MQTT topics by using the _auto-publish_ feature. This feature is enabled via the _-a_ switch of DCDB Pusher. The metadata tables in the Storage Backend are then populated with the information of all instantiated sensors, which become visible for queries.



# Plugins <a name ="plugins"></a>

The core of dcdbpusher is responsible for collecting all the values read by the sensors and sending them to the database. However, the main functionality of the sensors comes from the various plugins. Every plugin corresponds to a special sensor functionality.
All plugins share some general principles regarding the sensor structure and configuration. Those principles should also be obeyed when [writing own plugins](#writingOwnPlugins).
1. There are three hierarchical levels (from bottom up):
    1. Sensors
    2. Groups
    3. Entities (optional)
2. No sensor exists on its own. Every sensor belongs to a group.
3. Multiple groups may or may not be aggregated by an entity. Entities can be optionally used by the plugin developer to aggregate groups which belong together, e.g. because they all query the same host.
4. Every hierarchical level is associated with some attributes. The following hints help to decide (when developing own plugins) which attributes belong to which level. For every level the common base attributes, which are available independently of the plugin, are also listed and explained:
    1. Entities (if present) hold all attributes which are required to query the represented entity or which all of its associated groups have in common. Common entity attributes:
        * __default__     (One can define the name of a template entity (see below) whose values and groups should be used as default)
        * Other entity attributes could be: mqttPart, protocol-version, host address and port.
    2. Groups hold all attributes which the sensors belonging to them have in common. Common group attributes:
        * __interval__    (Time in [ms] between two consecutive sensor reads. Default is 1000[ms] = 1[s])
        * __queueSize__   (Maximum number of sensor readings to queue to bridge connectivity issues with the CollectAgent. Default is 1024)
        * __minValues__   (Minimum number of sensor reads the sensors in a group should gather before they are sent together to the database. Useful to reduce MQTT-overhead. Default is 1 (every sensor value is sent on its own))
        * __mqttPart__    (Part for the [mqtt-topic](#mqttTopic) all sensors in this group should share in common)
        * __default__     (One can define the name of a template group (see below) whose values and sensors should be used as default)
    3. Sensors hold only those attributes which are necessary to uniquely identify the target sensor. Common base attributes:
        * __mqttsuffix__  (to make its [mqtt-topic](#mqttTopic) unique)
        * __delta__ (identifies a monotonic sensor. If set to "true", differences between successive readings are collected)
        * __deltaMax__ (used only for monotonic sensors. Establishes the maximum wrap-around value for the accumulator. Default is LLONG_MAX.)
        * __subSampling__ (subsampling factor S. If S>=1, only one reading every S is sent over MQTT, and the others are kept locally. If S<1, readings are never sent out and only kept locally)
        * __publish__ (if set to "true", the sensor will be published when the auto-publish feature is enabled. Otherwise it is omitted. Default is "true".)
5. Be aware that the naming of sensor/group/entity is not fixed. A plugin developer can name them freely, e.g. counter/multicounter/host.
6. It is possible to define template sensors, groups, or entities in the config file. To specify a template sensor/group/entity simply prefix its definition with `template_` (see the example below). You can reference them later by using the `default` attribute. A template entity can consist of groups and these in turn can consist of sensors. When using a template, all of its attribute values are copied to the actual entity/group/sensor. Copied attributes can be overwritten in the actual entity/group/sensor (some of them even should be overwritten, e.g. the mqttPart). Groups/sensors associated with a template are copied to the actual entity/group. One can specify further groups/sensors which are then added to those copied from the template. If a group's/sensor's name is identical to one of the groups/sensors introduced by the template, it is not added but instead overwrites the corresponding group/sensor of the template (overwrite means: specified attributes replace template attributes, all other template values are kept). This can be used to purposefully overwrite single (attributes of) groups/sensors introduced by a template. Template entities/groups/sensors themselves are never used in live operation of the plugin. They only exist for convenient configuration.
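The sensor attributes `delta`, `deltaMax` and `subSampling` from item 4 can be illustrated with a short Python sketch. The function names are hypothetical, and the handling of a counter wrap-around (a single wrap at `deltaMax`) is an assumption rather than a description of the actual implementation.

```python
LLONG_MAX = 2**63 - 1  # default deltaMax named in the attribute list


def monotonic_delta(previous, current, delta_max=LLONG_MAX):
    """Difference between two successive readings of a monotonic sensor."""
    if current >= previous:
        return current - previous
    # Assumption: the counter wrapped around once after reaching delta_max.
    return current + (delta_max - previous)


def is_sent(reading_index, sub_sampling):
    """Decide whether the reading with the given running index is sent over MQTT."""
    if sub_sampling < 1:
        return False  # readings are never sent out and only kept locally
    return reading_index % sub_sampling == 0
```

With a subsampling factor of 3, the readings with index 0, 3, 6 and so on are sent, while all others only stay in the local cache.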
 
Two abstract config files are shown below to visualize the structure, one without the optional entity level and one with it. A real example configuration file for every plugin should be provided in the `config/` directory. One should use them as a starting point for writing own configuration files.
```
 Without entity:
------------------------------------------------

global {
	mqttprefix /myprefix
	cacheInterval 120
	...
}

template_group temp1 {			;template group named temp1 (is not used in live operation)
	interval	1000			;While it is possible to define entities/groups/sensors without
	minValues	3				;a name, this is strongly discouraged. Naming entities/groups/sensors
	mqttPart	/aa				;simplifies debugging and especially enables one to reference
								;templates later on. Also, names should always be unique.
	sensor s1 {
		mqttsuffix		/s1
		...						;usually the sensor would require additional attributes
	}

	sensor s2 {
		mqttsuffix		/s2
		...
	}
}

group g1 {
	default		temp1			;use temp1 as template group
	mqttPart	/bb				;overwrite the mqttPart from temp1, to avoid identical
								;mqtt-topics if another group uses the same template
	sensor s3 {					;g1 has now 3 sensors: s1, s2 (both taken over from temp1)
		mqttsuffix		/s3		;and s3
		...
	}
}

group g2 {						;g2 consists of only one sensor (s21) and uses
	sensor s21 {				;for every attribute the default value
		mqttsuffix	/s21		;by using a longer mqttsuffix we do not need a
		...						;group mqtt-part
	}
}

...
```

```
 With entity:
------------------------------------------------

global {
	...
}

template_entity temp1 {				;template entity which is not used in live operation
	...								;here go entity attributes

	group g1 {
		interval	1000
		minValues	3
		mqttPart	/aa

		sensor s1 {
			mqttsuffix		/s1
			...						;usually the sensor would require additional attributes
		}

		sensor s2 {
			mqttsuffix		/s2
			...
		}
	}
}

entity ent1 {
	default		temp1				;use temp1 as template entity

	group g2 {						;ent1 has now two groups (g1 and g2) with a total of
		sensor s21 {				;3 sensors (s1, s2, s21)
			mqttsuffix	/s21
			...
		}
	}
}

...
```
Note the global section in the examples, which was not mentioned before. In this section the user can (but does not have to) overwrite values from `dcdbpusher.conf` for this plugin or specify other settings which are global for this plugin.
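The template rules (copy everything from the template, let the actual definition overwrite attributes, merge sensors by name) can be mimicked with plain Python dictionaries. This is only a sketch of the described semantics: the dictionary layout and the function name `apply_template` are assumptions and do not mirror the internal data structures of dcdbpusher.

```python
import copy


def apply_template(template, actual):
    """Return the effective group: template values overlaid by the actual definition."""
    result = copy.deepcopy(template)  # the template itself stays untouched
    # Attributes given in the actual group replace the template attributes.
    result["attributes"].update(actual.get("attributes", {}))
    for name, attributes in actual.get("sensors", {}).items():
        # Same name: only the specified attributes are overwritten.
        # New name: the sensor is added to those copied from the template.
        result["sensors"].setdefault(name, {}).update(attributes)
    return result


# Mirrors template_group temp1 and group g1 from the first example above.
temp1 = {
    "attributes": {"interval": 1000, "minValues": 3, "mqttPart": "/aa"},
    "sensors": {"s1": {"mqttsuffix": "/s1"}, "s2": {"mqttsuffix": "/s2"}},
}
g1 = {
    "attributes": {"mqttPart": "/bb"},
    "sensors": {"s3": {"mqttsuffix": "/s3"}},
}
effective_g1 = apply_template(temp1, g1)
```

`effective_g1` then contains the sensors s1, s2 and s3, keeps `interval` and `minValues` from the template, and uses `/bb` as its mqttPart.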

## IPMI <a name="ipmi"></a>

The [IPMI](https://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface) plugin enables dcdbpusher to collect sensor values offered by a baseboard management controller (BMC).

Explanation of the values specific for the IPMI plugin:

| Value | Explanation |
|:----- |:----------- |
| sessiontimeout | Session timeout value for the IPMI-connection
| retransmissiontimeout | Retransmission timeout value for the IPMI-connection
| username | Login credentials are required for the remote IPMI-connection
| password | Login credentials are required for the remote IPMI-connection
| ipmiversion | IPMI version to use for LAN connections (1 or 2)
| cipher | Cipher to use for IPMI 2.0 LAN connections (currently supported: 0, 1, 2, 3, 6, 7, 8, 11, 12)
| cmd | One can define a raw IPMI-command (in hex-notation) to be sent. In this case the lsb and msb fields for the response also have to be defined. Alternatively, one can define the record-ID of the sensor (see below).
| lsb | Offset of the least significant byte of the desired return value within the response to a raw IPMI command<sup>[1](#ipmifn1)</sup>
| msb | Offset of the most significant byte of the desired return value within the response to a raw IPMI command<sup>[1](#ipmifn1)</sup>
| recordId | Define the record-ID of the sensor to be read. One can look up the corresponding record-IDs for every sensor with the "ipmi-sensors" command line tool (ships with the freeipmi-library). Alternatively, one can define a raw IPMI-command (see above).
| factor | One can specify a factor to scale the read value before it is stored in the database (to adjust precision).

#### Footnotes <a name="ipmiFootnotes"></a>

<a name="ipmifn1">**1**</a>: &ensp; Use lsb > msb values if the response is little-endian (LSB first), use lsb < msb values if the response is big-endian (MSB first). Maximum length is 8 bytes.  

## Perf-event <a name="perf"></a>

The perf-event plugin collects data from the CPU's various performance counters (PMUs).

> NOTE &ensp;&ensp;&ensp; The perf-event plugin measures PMUs for all processes running on a specific CPU. Therefore a value of less than 1 is required in `/proc/sys/kernel/perf_event_paranoid`. Other values (>=1) restrict the access to PMUs. See this [footnote](#fn1) for additional information.

Explanation of the values specific for the perfevent plugin:

| Value | Explanation |
|:----- |:----------- |
| type | Type of the counter. Each type determines different possible values for the config-field. Possible type-values are described below.
| config | Together with the type-field, config determines which performance counter should be read. Possible values and what they measure are listed below.
| cpus | One can define a comma-separated list of CPU numbers (value ranges can also be specified, e.g. 2-4 equals 2,3,4). The hardware counter will then only be opened on the specified CPUs.
| htVal | Specify multiplier for CPU aggregation. All CPUs where (CPU-number % htVal) has the same result are aggregated together. Only CPUs which are included in the "cpus" field (or all CPUs if the "cpus" field is not present) are aggregated. Background: To reduce the amount of pushed sensor data, it is possible to aggregate CPU readings. This feature is specifically aimed at processors which are hyper-threading enabled but can also come in handy for other use cases. Only the values pushed via the MQTT-Pusher are aggregated. There still exist sensors for each CPU and they store unaggregated readings in their local caches.
| mqttsuffix | In the context of the perfevent plugin the CPU id is integrated in the suffix. Sensors are duplicated in order to open a hardware counter for each CPU. Therefore an identifier in the style of "/cpuxx" is prepended to the mqttsuffix when building the topics.
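The `cpus` and `htVal` fields can be illustrated with a small Python sketch. The function names are hypothetical, and summing the readings is an assumption: the table only states that CPUs with the same value of (CPU-number % htVal) are aggregated together.

```python
from collections import defaultdict


def parse_cpus(spec):
    """Expand a comma-separated list with ranges, e.g. '0,2-4' into [0, 2, 3, 4]."""
    cpus = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def aggregate_by_ht(readings, ht_val):
    """Combine per-CPU readings whose CPU number has the same remainder modulo ht_val."""
    totals = defaultdict(int)
    for cpu, value in readings.items():
        totals[cpu % ht_val] += value  # assumption: aggregation by summation
    return dict(totals)
```

On a machine with four logical CPUs and `htVal 2`, the readings of CPUs 0 and 2 are combined into one pushed value and those of CPUs 1 and 3 into another.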


> NOTE &ensp;&ensp;&ensp; As perfevent counters are usually monotonic, the delta attribute is set to true by default for all sensors. One has to explicitly set delta to "off" for a sensor to override this behaviour.


### type and config <a name="perfTypeConfig"></a>

(see the [perf_event_open man-page](http://man7.org/linux/man-pages/man2/perf_event_open.2.html) for more detailed explanations)

| Type | Config | Explanation |
|:----:|:------ |:----------- |
| PERF_TYPE_HARDWARE | | generalized hardware CPU events
| " | PERF_COUNT_HW_CPU_CYCLES | total cycles (affected by frequency scaling)
| " | PERF_COUNT_HW_INSTRUCTIONS | retired instructions
| " | PERF_COUNT_HW_CACHE_REFERENCES | cache accesses (usually last level)
| " | PERF_COUNT_HW_CACHE_MISSES | cache misses (usually last level)
| " | PERF_COUNT_HW_BRANCH_INSTRUCTIONS | retired branch instructions
| " | PERF_COUNT_HW_BRANCH_MISSES | mispredicted branch instructions
| " | PERF_COUNT_HW_BUS_CYCLES | bus cycles
| " | PERF_COUNT_HW_STALLED_CYCLES_FRONTEND | stalled cycles during issue
| " | PERF_COUNT_HW_STALLED_CYCLES_BACKEND  | stalled cycles during retirement
| " | PERF_COUNT_HW_REF_CPU_CYCLES | total cycles (unaffected by frequency scaling)
| | | |
| PERF_TYPE_SOFTWARE | | software events provided by the kernel
| " | PERF_COUNT_SW_CPU_CLOCK | reports CPU clock
| " | PERF_COUNT_SW_TASK_CLOCK | clock count specific to the running task
| " | PERF_COUNT_SW_PAGE_FAULTS | number of page faults
| " | PERF_COUNT_SW_CONTEXT_SWITCHES | count of context switches
| " | PERF_COUNT_SW_CPU_MIGRATIONS | times the process has migrated to a new CPU
| " | PERF_COUNT_SW_PAGE_FAULTS_MIN | number of minor page faults (no disk-I/O)
| " | PERF_COUNT_SW_PAGE_FAULTS_MAJ | number of major page faults (disk-I/O was required)
| " | PERF_COUNT_SW_ALIGNMENT_FAULTS | alignment faults when accessing unaligned memory
| " | PERF_COUNT_SW_EMULATION_FAULTS | number of unimplemented instructions which had to be emulated
| " | PERF_COUNT_SW_DUMMY | placeholder which counts nothing
| | | |
| PERF_TYPE_TRACEPOINT | | not yet implemented
| PERF_TYPE_HW_CACHE | | not yet implemented
| | | |
| PERF_TYPE_RAW | | user can define architecture-specific raw events here.
| " | *XXXX* | Config must be a raw event config value, see <sup>[2](#fn2)</sup>
| | | |
| PERF_TYPE_BREAKPOINT | --- | config not required, any values will be ignored. However config must still be specified (even if empty)
| \<Custom\> | \<Custom\> | dynamic PMU event, see <sup>[3](#fn3)</sup>

#### Footnotes <a name="perfFootnotes"></a>

Taken from the [perf_event_open man-page](http://man7.org/linux/man-pages/man2/perf_event_open.2.html):

<a name="fn1">**1**</a>: &ensp; The pid and cpu arguments allow specifying which process and CPU to monitor:  
[...]  
pid == -1 and cpu >= 0  
This measures all processes/threads on the specified CPU. This requires CAP_SYS_ADMIN capability or a /proc/sys/kernel/perf_event_paranoid value of less than 1.

[...]

The perf_event_paranoid file can be set to restrict access to the performance counters.

| Value | Restriction |
|:-----:|:----------- |
| 2 | allow only user-space measurements (default since Linux 4.6) |
| 1 | allow both kernel and user measurements (default before Linux 4.6) |
| 0 | allow access to CPU-specific data but not raw trace-point samples |
| -1 | no restrictions |
	
The existence of the perf_event_paranoid file is the official method for determining if a kernel supports perf_event_open()

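Whether the current system satisfies the requirement quoted in footnote 1 can be checked from the shell. The following is a minimal sketch, assuming the perfevent plugin opens its counters for all processes on a given CPU (the pid == -1 and cpu >= 0 case); `paranoid_allows_cpu_wide` is a helper invented for this example.

```bash
#!/usr/bin/env bash
# Sketch: check whether CPU-wide counting is possible without CAP_SYS_ADMIN.
# paranoid_allows_cpu_wide is a helper made up for this example.

# Succeeds if the given perf_event_paranoid value is less than 1.
paranoid_allows_cpu_wide() {
    [ "$1" -lt 1 ]
}

paranoid_file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$paranoid_file" ]; then
    level=$(cat "$paranoid_file")
    if paranoid_allows_cpu_wide "$level"; then
        echo "perf_event_paranoid=$level: CPU-wide counting needs no extra capability"
    else
        echo "perf_event_paranoid=$level: CAP_SYS_ADMIN or a value below 1 is required"
    fi
else
    echo "$paranoid_file not found: perf_event_open() is not supported"
fi
```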
<a name="fn2">**2**</a>: &ensp; If type is *PERF_TYPE_RAW*, then a custom "raw" config value is needed. Most CPUs support events that are not covered by the "generalized" events. These are implementation defined; see your CPU manual (for example the Intel Volume 3B documentation or the AMD BIOS and Kernel Developer Guide). The libpfm4 library can be used to translate from the name in the architectural manual to the raw hex value perf_event_open() expects in this field.

<a name="fn3">**3**</a>: &ensp; Custom type and config values can be specified to use the PMU of a specific device. The necessary configuration parameters can be obtained from the type and config files in the respective `/sys/devices/<device>` tree.

## SNMP <a name="snmp"></a>

The SNMP plugin enables dcdbpusher to talk to devices which have an SNMP agent running and to send queries to them. An SNMP sensor corresponds to a single value as identified by its unique OID. Sensors are aggregated by connections. See the exemplary snmp.conf file in the `config/` directory.
> NOTE &ensp;&ensp;&ensp; In the SNMP context the word privacy is used synonymously for encryption.

Explanation of the values specific for the SNMP plugin:

| Value | Explanation |
|:----- |:----------- |
| connection | An aggregating connection
| Type | Type of the SNMP application which runs on the device queried by the connection. Currently only the type Agent is supported.
| Host | Host name of the device which is to be queried. Follows net-snmp's `[<transport-specifier>:]<transport-address>` format, e.g. `udp:hostname:161`
| OIDPrefix | This OIDPrefix is used for all following sensors.
| |
| Version | Which SNMP version to use (either 2 (maps to 2c) or 3).
| Community | Which SNMP community to use (required only if version 2 is used).
| Username | Username to authenticate with (only required for version 3).
| SecLevel | The security level to be used (only required for version 3). Can be either `noAuthNoPriv` for neither authentication nor privacy ("privacy" is SNMP's synonym for encryption), `authNoPriv` for authentication only, or `authPriv` for authentication and privacy.
| AuthProto | Which protocol to use for authentication (only required for version 3 and if SecLevel != noAuthNoPriv). Can be MD5 or SHA1.
| AuthKey | The passphrase for authentication (only required for version 3 and if SecLevel != noAuthNoPriv). Must be at least 8 characters long.
| PrivProto | Which protocol to use for privacy (only required for version 3 and if SecLevel = authPriv). Can be DES or AES.
| PrivKey | The passphrase for privacy encryption (only required for version 3 and if SecLevel = authPriv). Must be at least 8 characters long.
| mqttPart | Connection-specific MQTT-part which is appended to the MQTT-prefix and followed by the sensor-specific suffix.
| |
| OID | OID suffix which together with the OIDPrefix forms the unique OID identifying a value to query.
| passphrase | has to be at least 8 characters long

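Before putting connection settings into the plugin configuration, it can help to verify them with the net-snmp command line tools. The sketch below only prints the matching `snmpget` invocations instead of running them. Host, OID, user name and passphrases are placeholders, and the helpers `snmp_v2_cmd` and `snmp_v3_cmd` are invented for this example. Note that net-snmp spells the SHA1 protocol as `SHA` on the command line.

```bash
#!/usr/bin/env bash
# Sketch: print the net-snmp command line that matches a connection's settings.
# All values are placeholders; snmp_v2_cmd and snmp_v3_cmd are made-up helpers.

HOST="udp:hostname:161"        # "Host" value of the connection
FULL_OID="1.3.6.1.2.1.1.3.0"   # complete OID (sysUpTime.0 from the standard MIB-2)

# Version 2 (2c): only the community string is needed.
snmp_v2_cmd() {
    echo "snmpget -v 2c -c $1 $HOST $FULL_OID"
}

# Version 3 with SecLevel authPriv: user name plus both passphrases.
snmp_v3_cmd() {
    local user=$1 auth_proto=$2 auth_key=$3 priv_proto=$4 priv_key=$5
    echo "snmpget -v 3 -l authPriv -u $user -a $auth_proto -A $auth_key -x $priv_proto -X $priv_key $HOST $FULL_OID"
}

snmp_v2_cmd public
snmp_v3_cmd monitor SHA authpassphrase AES privpassphrase
```

If the printed command returns the expected value when executed, the same settings should be usable for the corresponding connection block.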
## sysFS <a name="sysfs"></a>

SysFS sensors read data from sysFS files. The configuration file of the plugin corresponds to the generic plugin configuration with standalone sensors. Additionally for a sysFS sensor the following parameters are mandatory/possible:

Explanation of the values specific for the sysFS plugin:

| Value | Explanation |
|:----- |:----------- |
| path | Path to the sysFS file the sensor should read from. This parameter is mandatory.
| filter | One can define an optional filter if the sysFS file contains more than just the sensor value. Please note the following points for filters: <br> 1.  The filter supports substitutions. For substitution sed syntax (`s/.../.../`) is used, with extended regular expressions (ERE) as regex syntax. ERE is close to the basic regular expressions (BRE) that sed uses by default, but requires less escaping. <br> 2.  If a `\` (backslash) is needed in the regex (for escaping), always use `\\` (double backslash), as the regex is read in as a string and strings also escape with backslash. <br> 3.  Whitespace is used as value separator in the config files. If your filter requires whitespace, either use `[[:space:]]` in the regex or put the filter in quotation marks (""). <br> 4.  To be able to reference parts of the match (for substitution) use groups. Groups are created with parentheses. <br>  5.  If using character classes like `[[:digit:]]`, always make sure to use double brackets (`[[` and `]]`) or they will not be recognized. <br>  See [ERE-syntax](https://www.gnu.org/software/sed/manual/html_node/ERE-syntax.html#ERE-syntax) <br>  See [substitution syntax](http://www.boost.org/doc/libs/1_65_1/libs/regex/doc/html/boost_regex/format/sed_format.html)

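A filter expression can be tried out with `sed` before it is put into the configuration. The sample line and the expression below are invented for this example. Since the substitution syntax linked above is the Boost sed format, `sed -E` should be seen as a close approximation rather than the exact engine used by the plugin.

```bash
#!/usr/bin/env bash
# Sketch: try out a filter expression with sed before putting it into the config.
# The sample line and the expression are invented for this example.

sample="temp: 42000 mC"
filter='s/temp:[[:space:]]*([[:digit:]]+).*/\1/'

# -E selects extended regular expressions; \1 refers to the first group.
echo "$sample" | sed -E "$filter"
```

The command prints only the numeric part of the line. When copying the expression into the config file, remember point 2 above: the backslash of the group reference has to be doubled there.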
## PDU <a name="pdu"></a>

The Power Delivery Unit (PDU) plugin is in charge of sending a network-request to the PDUs and gathering specified sensor data from the XML-file response.

Explanation of the values specific for the PDU plugin:

| Value | Explanation |
|:----- |:----------- |
| host | Hostname and (optional) port where to fetch the XML-file with sensor data from. If no port is specified, 443 is used. The plugin requests the file via HTTPS.
| request | Define the request to be sent to the host via HTTPS as a string. One should put the request in quotation marks (' " ') to enable the use of whitespaces within the request. Special characters (like usage of ' " ' within the request) should be escaped (' " ' --> ' \" '; ' \ ' --> ' \\\\ '; newline --> ' \n '; ...).
| path | Define a dot-separated path to the value to be read in the XML file. One can specify attribute values a node has to fulfil in brackets after the node. Multiple (comma-separated) attributes can be given as well, however no whitespace should be used (!) as it will not be filtered and could therefore be treated as part of the attribute's name.

## BACnet <a name="bacnet"></a>

The BACnet plugin enables dcdbpusher to communicate with and request data from devices which use the BACnet protocol. A so-called "read property" request is sent by the plugin to the BACnet devices as configured in the config file. The response value is then stored in the database. Usually one is only interested in collecting the current reading of a BACnet device (property PROP_PRESENT_VALUE, ID 85). However, reading other properties is also supported.
> NOTE &ensp;&ensp;&ensp; On startup the BACnet plugin performs no device discovery. Instead it relies on the user providing a file with the addresses of all required BACnet devices. Such an address file can be generated, for example, with the `bacwi` demo tool provided by the BACnet-Stack.

Explanation of the values specific for the BACnet plugin:

| Value | Explanation |
|:----- |:----------- |
| address_cache | (Path to and) filename of the address cache file where the addresses of BACnet devices are stored (as noted above).
| interface | Network interface (IPv4) which is to be used by the plugin to send its "Read Property" requests.
| port | Port to use on the interface
| timeout | Time in microseconds to wait for a response packet.
| apdu_timeout | Time in microseconds before sending a request times out.
| apdu_retries | How often sending a request should be retried.
| templates | One can define template properties in this section for convenience.
| factor | Described in the section for the [IPMI-plugin](#ipmi).
| devices | Starts the part in the config file where the actual BACnet devices are configured. A BACnet device consists of multiple nested parts: device > objects > properties.
| instance (device) | Instance of the BACnet-device.
| type | Type of the object within the device.
| instance (object) | Instance of the object within the device.
| id | ID of the property to be read from the BACnet device-object. Assignment of numbers to properties is done according to the enum as defined in `bacenum.h`.

## Opa (Intel Omni-Path Architecture) <a name="opa"></a>

The Opa plugin enables dcdbpusher to query various counters from Omni-Path interconnects.

Explanation of the values specific for the Opa plugin:

| Value | Explanation |
|:----- |:----------- |
| hfiNum | Number of the Omni-Path Host Fabric Interface to query (starting with 1)
| portNum | Number of the Omni-Path port to query (starting with 1)
| cntData | Name of the data counter to query. A list of possible values can be found below.


> NOTE &ensp;&ensp;&ensp; As opa counters are usually monotonic, the delta attribute is set to true by default for all sensors. One has to explicitly set delta to "off" for a sensor to override this behaviour.

### counterData <a name="opaCounterData"></a>

Possible values for cntData:
* portXmitData
* portRcvData
* portXmitPkts
* portRcvPkts
* portMulticastXmitPkts
* portMulticastRcvPkts
* localLinkIntegrityErrors
* fmConfigErrors
* portRcvErrors
* excessiveBufferOverruns
* portRcvConstraintErrors
* portRcvSwitchRelayErrors
* portXmitDiscards
* portXmitConstraintErrors
* portRcvRemotePhysicalErrors
* swPortCongestion
* portXmitWait
* portRcvFECN
* portRcvBECN
* portXmitTimeCong
* portXmitWastedBW
* portXmitWaitData
* portRcvBubble
* portMarkFECN
* linkErrorRecovery
* linkDowned
* uncorrectableErrors

## ProcFS (/proc filesystem) <a name="procfs"></a>

The ProcFS plugin enables dcdbpusher to sample resource usage metrics from a variety of files in the /proc virtual filesystem generated by the Linux kernel. Each defined sensor group is assigned to a specific file, which is periodically parsed. Currently supported files for sampling are:
* /proc/vmstat: contains virtual memory-related usage metrics;
* /proc/meminfo: contains RAM memory-related usage metrics (note that some of the metrics overlap with /proc/vmstat);
* /proc/stat: contains CPU usage-related metrics, both at system and core levels.

Note that the ProcFS plugin can operate in two distinct modes, with respect to MQTT topics:
* Automatic: if no sensors are specified, all metrics discovered in the underlying parsed file are acquired; sensors and MQTT topics are generated for them. Please be careful when configuring the plugin so that its MQTT topics do not overlap with those of other plugins.
* Manual: If at least one sensor is specified, only the corresponding metrics are acquired, and all other metrics in the parsed file are discarded. MQTT topics are assigned accordingly, using the mqttPrefix, mqttPart and mqttSuffix fields.

Explanation of the values specific for the ProcFS plugin:

| Value | Explanation |
|:----- |:----------- |
| type | The type of the file parsed by the sensor group. Can be either "vmstat", "meminfo", "procstat" or "sar"
| path | Path of the file, if different from the default path in the /proc filesystem
| cpus | Defines the set of CPU cores for which metrics must be collected. Only affects extraction of core-specific metrics (e.g. those in /proc/stat), whereas system-level metrics are acquired regardless of this setting. If no CPU cores set is defined, metrics for all available CPU cores will be collected. This parameter follows the same syntax as in the Perf-event plugin.
| htVal | Specify a multiplier for CPU aggregation. All CPUs where (CPU-number % htVal) has the same result are aggregated together. If specified, only CPUs which are included in the "cpus" field are aggregated. See Perf-event plugin for more details.
| scalingFactor | A scaling factor to be applied to ratio-like metrics. Default is 1000000.
| mqttSuffix | For CPU-related sensors, such as the ones in procstat files, the mqttSuffix field behaves as described for the Perf-event plugin.

Sensors in the ProcFS plugin (defined with the "metric" keyword) additionally support the following values:

| Value | Explanation |
|:----- |:----------- |
| type | The type of the specific metric associated to the sensor. This field must match the exact name of a metric in the underlying parsed file. If such a match does not exist, the sensor is discarded.
| perCpu | Boolean. If set to "on", the metric will be collected for each CPU core specified with the "cpus" sensor group parameter, or for all CPU cores if none is specified. Otherwise, the metric will be collected only at system level. This parameter has no effect on metrics that are not acquired at CPU core level (e.g. those in /proc/vmstat and /proc/meminfo).

The "type" field can be inferred for each sensor by simply checking the underlying file parsed by the sensor group. For /proc/stat files, on the other hand, CPU core-related metrics are collected in separate columns, which adopt the following naming scheme that can be used to define sensors: 
* col_user 
* col_nice 
* col_system 
* col_idle 
* col_iowait
* col_irq
* col_softirq
* col_steal
* col_guest
* col_guest_nice

Additional CPU-related metrics (that may be introduced in future versions of the Linux kernel) are not supported by the DCDB ProcFS plugin.
Note that for /proc/meminfo instances, an additional synthetic sensor of type "MemUsed" can be defined. This sensor will automatically extract the amount of used memory from the MemTotal and MemFree values present in meminfo files.

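The synthetic "MemUsed" sensor can be approximated on the command line. The sketch below assumes that used memory is simply MemTotal minus MemFree. The text above only states that both values are involved, so the exact formula used by the plugin is an assumption here, and `mem_used_kb` is a helper made up for this example.

```bash
#!/usr/bin/env bash
# Sketch: derive a "MemUsed" value (in kB) from a meminfo-style file.
# Assumption: used memory = MemTotal - MemFree; mem_used_kb is a made-up helper.
mem_used_kb() {
    awk '$1 == "MemTotal:" { total = $2 }
         $1 == "MemFree:"  { free = $2 }
         END               { print total - free }' "${1:-/proc/meminfo}"
}

# Self-contained demonstration on a two-line sample instead of the live file.
sample=$(mktemp)
printf 'MemTotal:       16000000 kB\nMemFree:         4000000 kB\n' > "$sample"
mem_used_kb "$sample"
rm -f "$sample"
```

Calling `mem_used_kb` without an argument reads `/proc/meminfo` of the local machine, which allows a quick plausibility check of the values pushed by the plugin.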
## Caliper <a name="caliper"></a>

The Caliper plugin collects application introspection data and therefore allows for application performance analysis in retrospect. This plugin is special as it does not work on its own but also requires a corresponding Caliper framework service running on application side. Please see Caliper's [official documentation](https://software.llnl.gov/Caliper/) for an exhaustive introduction.
The Caliper plugin supports two use cases:
* **Sampling** Low-overhead automatic sampling of program counter (PC) values. This allows analyzing in retrospect how much time was spent in a function. This is the default case and is enabled at all times.
* **Instrumentation** Users can instrument their application with Caliper annotations. The event data generated by the annotations is picked up by Pusher in addition to the sampling data and stored within DCDB. The annotation data can then be correlated with other monitoring data and allows for more fine-grained introspection than the sampling approach. On the downside, this usually induces more overhead.

### Caliper framework side
Caliper is an application introspection system. Its functionality stems from so-called services. To work with the Pusher plugin, the custom dcdbpusher service for Caliper is required, as well as the stock timestamp, pthread, and sampler services. For instrumentation, the event service is required as well.
Caliper has to be integrated into the application. This can be done either manually by the application developer or in a more automated fashion by the system administrator by "hijacking" applications, e.g. overwriting main methods before execution. For the sampling case it is sufficient to use the Caliper framework just once, i.e. initialize it somewhere. However, one can still use the full functionality of Caliper services in parallel as desired.
The dcdbpusher service retrieves all relevant data from snapshots (Timestamp, CPU, PC (sampling), annotation data (instrumentation)). In case of sampling, the PC value is resolved to the actual function name via the binary's symbol data. Retrieved data is temporarily stored in a thread-local buffer. Eventually it gets written to a shared-memory queue which is used to communicate with the Pusher plugin.

The Caliper services can be controlled by the environment variables listed below:

| Value | Explanation |
|:----- |:----------- |
| CALI_SERVICES_ENABLE | Specify which Caliper services to enable. Should be at least `event:sampler:timestamp:pthread:dcdbpusher`.
| CALI_SAMPLER_FREQUENCY | Frequency of the sampler service in Hz.
| CALI_TIMER_TIMESTAMP | Must be set to `true` to enrich all snapshots with timestamps of their creation.
| CALI_DCDBPUSHER_SUS_CYCLE | Symbol update service (SUS) cycle. To resolve PC values to function names, the symbol data of the binary and loaded libraries is buffered locally in a so-called "symbol index". If a symbol cannot be resolved by the dcdbpusher service (e.g. because the PC points to a newly loaded library that has not yet been indexed), it asks the background SUS thread to update the symbol index. Updating the symbol index is a heavy blocking task. To limit overhead and avoid continuous rebuilds of the symbol index, this environment variable sets the cycle interval of the SUS in seconds (e.g. `export CALI_DCDBPUSHER_SUS_CYCLE=x`). The SUS then only checks every x seconds whether a symbol data update has been requested. Increasing this value reduces the overhead of repeated symbol index rebuilds but decreases responsiveness when rebuilds are requested only rarely. Default is 15 seconds.

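Put together, the environment for a Caliper-enabled application could be prepared as sketched below. The service list and the SUS default are taken from the table above. The sampler frequency of 100 Hz is an arbitrary example value, and the application name in the comment is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: environment for an application that was built with Caliper support.
export CALI_SERVICES_ENABLE=event:sampler:timestamp:pthread:dcdbpusher
export CALI_SAMPLER_FREQUENCY=100      # example value in Hz, not a recommendation
export CALI_TIMER_TIMESTAMP=true
export CALI_DCDBPUSHER_SUS_CYCLE=15    # seconds; same as the default

# Start the instrumented application afterwards, e.g. (hypothetical binary):
#   ./my_application
```

If only sampling is needed, the `event` service can presumably be left out of `CALI_SERVICES_ENABLE`, as it is described above as a requirement for instrumentation only.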
### Pusher plugin side
The Pusher plugin serves as data sink for the snapshot data from the Caliper service. It can handle multiple different applications at once. However, it is mainly intended for only one application with multiple threads/(MPI-)processes.
The plugin consumes the snapshot data from the shared-memory queue. For each unique piece of snapshot data a new sensor is created. On each subsequent encounter of the same data (function name or annotation), a reading value of 1 is stored with the sensor.
After an application terminates or times out, or when the maxSensors value is reached, all sensors get cleared.

Explanation of the values specific for this plugin:

| Value | Explanation |
|:----- |:----------- |
| interval | In case of sampling, the interval value of a SensorGroup (or SingleSensor) has a small side effect. Within the same read cycle, multiple encounters of the same function name will be aggregated. Instead of a value of one for each encounter, only the aggregated value at the end of the read cycle will be actually stored with the corresponding sensor. Therefore the read cycle interval also determines the granularity of the sampling data. A lower interval results in more fine-grained sampling data resolution but also requires more memory in the storage backend. 
| maxSensors | To prevent unbounded memory usage through the creation of new Sensor objects, one can specify a threshold here. If the number of sensors exceeds this value, they will be cleared. Default is 500.
| timeout | Number of read cycles after which a Caliper application is assumed to be terminated if no new values have been received. The connection (shared memory) is torn down on timeout. Default is 15.

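The effect of the `interval` value on sampling data can be pictured as a per-cycle count of function names. The following sketch is not the plugin's implementation: `aggregate_cycle` is a helper invented for this example and the function names are made up. It merely shows how several encounters within one read cycle collapse into a single value per function.

```bash
#!/usr/bin/env bash
# Sketch: collapse the samples of one read cycle into one value per function.
# aggregate_cycle is a made-up helper; input is one function name per line.
aggregate_cycle() {
    awk '{ count[$0]++ } END { for (name in count) print name, count[name] }' | sort
}

# Five samples taken within the same read cycle (invented names).
printf '%s\n' solve solve exchange_halo solve exchange_halo | aggregate_cycle
```

A shorter interval spreads the same samples over more read cycles, which yields more but smaller values per function and hence a finer time resolution at the cost of more stored readings.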
### Shortcomings
Usage of the Caliper plugin is currently obstructed by a few shortcomings:
* The Caliper framework has to be integrated manually by users into their application for this plugin to work.
* The Caliper framework seems to interfere with Intel libraries, which may cause [application crashes](https://github.com/LLNL/Caliper/issues/223).

## Metadata Management <a name="metadataManagement"></a>

Sensor metadata can be included in Pusher configurations, and will be published to the Storage Backend if the _auto-publish_ feature is enabled. A metadata block looks like the following:

```
...

group g2 {						
 	sensor pw {				
		mqttsuffix	/power		
		...
		
		metadata {
			unit	      Watt
			scale	      1000
			ttl           3600000
			operations    avg5,min,max
		}		
	}
}

...
```

Available fields that can be published as metadata are the following:

| Value | Explanation |
|:----- |:----------- |
| unit | String containing the unit of measure for the sensor, if any. |
| scale | Scaling factor (as a floating point value) to be applied to readings of the sensor upon queries. |
| ttl | Time to live for the readings of this sensor in milliseconds, after which they are automatically deleted from the Storage Backend. |
| monotonic | Boolean flag specifying whether the sensor is monotonic or not. |
| integrable | Boolean flag specifying whether the sensor's time series can be integrated or not. |
| interval | Sampling interval in milliseconds of the sensor. |
| operations | Comma-separated list of operations available for the sensor, whose values can be retrieved by appending their names to the sensor name. |

An additional _isOperation_ field is available for the output sensors of operators in the Wintermute framework. If these output sensors are generated starting from a single input, this field allows publishing them as _operations_ of the latter, so that they are listed in the associated database entry. For this to apply, however, the MQTT topic of the output sensor must be identical to that of the input, plus a suffix that describes the operation. Enabling this option invalidates all other metadata fields.

## Writing own plugins <a name="writingOwnPlugins"></a>
First make sure you read the [plugins](#plugins) section. 

It is recommended to use the `pluginGenerator/generatePlugin.sh` script to kick off plugin development. Running `./generatePlugin.sh -h` gives instructions on how to use the script. On success, the script generates all required source files for a new plugin with instructions on how to continue from there.