# DCDB Pusher

### Table of contents
1. [Introduction](#introduction)
2. [dcdbpusher](#dcdbpusher)
    1. [Global Configuration](#globalConfiguration)
    2. [REST API](#restApi)
        1. [List of resources](#listOfRessources)
        2. [Examples](#restExamples)
    3. [MQTT topic](#mqttTopic)
3. [Plugins](#plugins)
    1. [IPMI](#ipmi)
    2. [Perf-event](#perf)
        1. [type and config](#perfTypeConfig)
        2. [Footnotes](#perfFootnotes)
    3. [SNMP](#snmp)
    4. [SysFS](#sysfs)
    5. [PDU](#pdu)
    6. [BACnet](#bacnet)
    7. [OPA](#opa)
        1. [counterData](#opaCounterData)
    8. [ProcFS](#procfs)
    9. [Caliper](#caliper)
    10. [Writing own plugins](#writingOwnPlugins)

## Introduction <a name="introduction"></a>
DCDB (DataCenter DataBase) is a database that collects various (sensor) values of a datacenter for further analysis.
Harvesting the data is the task of dcdbpusher.

# dcdbpusher <a name="dcdbpusher"></a>

This is a general MQTT pusher which sends the values of various sensors to the DCDB database.
It ships with plugins for BACnet, IPMI, PDU (a proprietary Power Delivery Unit, but the plugin can also be used as an XML plugin), perfcounter, SNMP and sysFS.

Build it by simply running
```bash
make
```
or alternatively use
```bash
make debug
```
within the `dcdbpusher` directory to build a version that prints additional debug information at runtime.

The logic for the various sensors is encapsulated in plugins (shared dynamic libraries; the makefile takes care of compiling them for you). dcdbpusher dynamically opens the libraries that are specified in the [global configuration](#globalConfiguration) file. Conversely, if a particular sensor functionality (e.g. sysFS) is not specified, the corresponding shared library `libdcdbplugin_sysfs.so` does not have to be present.

You can run dcdbpusher by executing
```bash
./dcdbpusher path/to/configfile/
```
or run
```bash
./dcdbpusher -h
```
to print the help section of dcdbpusher.

dcdbpusher looks in the given path for the global configuration file, which has to be named `dcdbpusher.conf`.

## Global Configuration <a name="globalConfiguration"></a>

The global configuration specifies general settings for dcdbpusher, e.g. which plugins should be loaded.
Please have a look at the provided `config/dcdbpusher.conf` example to get familiar with the file scheme. The example is also a good starting point for writing a custom `dcdbpusher.conf`. The different sections and values are explained in the following table:

| Value | Explanation |
|:----- |:----------- |
| global | Wrapper structure for the global values.
| mqttBroker | Address and port of the MQTT broker which collects the messages (sensor values) sent by dcdbpusher.
| mqttprefix | A common prefix for all sensors, so that the full MQTT topic does not have to be rewritten for every sensor.
| sensorpattern | Pattern used to perform automatic sensor name publishing. See the corresponding [section](#autopublish) for more information.
| threads | Number of threads created to handle the sensors asynchronously. Default is 1. Note that the MQTTPusher always starts an extra thread, so the actual number of started threads is always one more than defined here. Specifying too few threads can delay the reading of some sensors.
| maxMsgNum | Maximum number of values published in one turn, to avoid publishing too many MQTT messages at once. After reaching this limit the MQTTPusher is forced to sleep for a short time before continuing.
| maxInflightMsgNum | Maximum number of messages that can be "inflight". This is an MQTT term and should match the broker's setting. Set to 0 for unlimited.
| maxQueuedMsgNum | Maximum number of MQTT messages (including "inflight") that should be queued. This limits the amount of memory used for buffering. Set to 0 for unlimited.
| verbosity | Level of detail in the log file (dcdb.log). Set to a value between 5 (all log messages, default) and 0 (only fatal messages). NOTE: the verbosity of the command-line log can be set independently via the -v flag when invoking dcdbpusher.
| daemonize | Set to 'true' if dcdbpusher should run detached as a daemon. Default is false.
| tempdir | A writeable directory where dcdbpusher can write its temporary and logging files. Default is the current (`./`) directory.
| cacheInterval | Time interval in seconds. The last sensor readings within this interval are kept. This value can be overwritten by plugins.
| | |
| restAPI | Bundles all values related to the REST API. See the corresponding [section](#restApi) for more information on supported functionality.
| address | (IP) address and port on which the REST API server should run.
| certificate | Path and file name of the certificate the HTTPS server should use.
| privateKey | Path and file name of the private key corresponding to the certificate. If private key and certificate are stored in the same file, the path to the certificate file must nevertheless be given here again.
| dhFile | Path and file name of the file storing the Diffie-Hellman parameters for the key exchange.
| authkey | This struct defines authentication key tokens for the REST API. Within the struct, define which operations over the REST API are allowed for the token (e.g. PUTReq or GETReq). Each token must be unique.
| | |
| plugins | In this section one can specify the plugins which should be used.
| plugin _name_ | The plugin name is used to build the corresponding library name (e.g. sysfs --> libdcdbplugin_sysfs.1.0)
| path | Path where the plugin (the shared library) is located. If left empty, dcdbpusher will look in the default library directories (/usr/lib and friends) for the plugin file.
| config | A separate config file (including its path) for the plugin to use. If not specified, dcdbpusher will look up pluginName.conf (e.g. sysfs.conf) in the directory where dcdbpusher.conf is located.
| | |

The formats of the plugin-specific config files are explained in the corresponding [subsections](#ipmi). Example configuration files can be found in the `config/` directory.


## REST API <a name="restApi"></a>

Dcdbpusher runs an HTTPS server which allows some functionality to be controlled over a RESTful API. By default the API is hosted at port 8000 on 127.0.0.1, but the address can be changed in the [`dcdbpusher.conf`](#globalConfiguration).

An HTTPS request to dcdbpusher should have the following format: `[GET|PUT] host:port[resource]?[queries]`.
Tables with the allowed resources, sorted by REST method, can be found below. A query consists of a key-value pair of the format `key=value`. Multiple queries are separated by semicolons (';'). For all requests (except /help) basic authentication credentials must be provided.

### List of resources <a name="listOfRessources"></a>

<table>
  <tr>
    <td colspan="2"><b>Resource</b></td>
    <td colspan="2">Description</td>
  </tr>
  <tr>
    <td>Query</td>
    <td>Value</td>
    <td>Opt.</td>
    <td>Description</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>GET /help</b></td>
    <td colspan="2">Return a cheatsheet of possible REST API endpoints.</td>
  </tr>
  <tr>
    <td colspan="4">No queries.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>GET /plugins</b></td>
    <td colspan="2">List all loaded dcdbpusher plugins.</td>
  </tr>
  <tr>
    <td>json</td>
    <td>"true"</td>
    <td>Yes</td>
    <td>Format response as json.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>GET /sensors</b></td>
    <td colspan="2">List all sensors of a specific plugin.</td>
  </tr>
  <tr>
    <td>plugin</td>
    <td>All plugin names.</td>
    <td>No</td>
    <td>Specify the plugin.</td>
  </tr>
  <tr>
    <td>json</td>
    <td>"true"</td>
    <td>Yes</td>
    <td>Format response as json.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>GET /average</b></td>
    <td colspan="2">Get the average of the last readings of a sensor. Also allows access to analytics sensors.</td>
  </tr>
  <tr>
    <td>plugin</td>
    <td>All plugin names.</td>
    <td>No</td>
    <td>Specify the plugin.</td>
  </tr>
  <tr>
    <td>sensor</td>
    <td>All sensor names of the plugin or the operator manager.</td>
    <td>No</td>
    <td>Specify the sensor within the plugin.</td>
  </tr>
  <tr>
    <td>interval</td>
    <td>Number of seconds.</td>
    <td>Yes</td>
    <td>Use only readings more recent than (now - interval) for average calculation. Defaults to zero, i.e. all cached sensor readings are included in average calculation.</td>
  </tr>
</table>
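
The effect of the `interval` query on `/average` can be sketched as follows. This is only an illustration of the documented semantics, not dcdbpusher's implementation: the helper name `average_since`, the `timestamp value` input format and the sample readings are assumptions made for this example.

```bash
# Hypothetical helper: average the readings newer than (now - interval).
# Reads "timestamp value" pairs (timestamps in seconds) from stdin.
# An interval of 0 means "use all cached readings", as in the table above.
average_since() {
    local now="$1" interval="$2" sum=0 count=0 ts value
    while read -r ts value; do
        if [ "$interval" -eq 0 ] || [ "$ts" -gt $(( now - interval )) ]; then
            sum=$(( sum + value ))
            count=$(( count + 1 ))
        fi
    done
    # Integer average; prints nothing if no reading qualifies.
    [ "$count" -gt 0 ] && echo $(( sum / count ))
}

# Three cached readings, "now" is 120, only the last 15 seconds count.
printf '%s\n' "100 10" "110 20" "118 30" | average_since 120 15   # prints 25
```

With `interval` set to 0 the same input yields 20, because all three readings are included.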

<table>
  <tr>
    <td colspan="2"><b>PUT /load</b></td>
    <td colspan="2">Load and initialize a new plugin but do not start it. Use the /start request to kick off the plugin's data collection.</td>
  </tr>
  <tr>
  	<td>plugin</td>
  	<td>Plugin name.</td>
  	<td>No</td>
  	<td>Name of the new plugin. Is used to build the shared library file name which holds the plugin. Shared lib file name is of the form libdcdbplugin_PLUGINNAME.so (or .dylib for Apple).</td>
  </tr>
  <tr>
  	<td>path</td>
  	<td>A file path.</td>
  	<td>Yes</td>
  	<td>Path to where the shared library for the plugin is located. If not specified the default library directories (/usr/lib and friends) are searched.</td>
  </tr>
  <tr>
  	<td>config</td>
  	<td>A file path including file name.</td>
  	<td>Yes</td>
  	<td>Path and name of the plugin configuration file. If not specified we will search for "./PLUGINNAME.conf".</td>
  </tr>
</table>
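
The naming rules from the `/load` table can be sketched as two small shell helpers. The function names are hypothetical and only mirror the rules stated above (library name built from the plugin name, `.dylib` on Apple systems, default config `./PLUGINNAME.conf`); they are not part of dcdbpusher.

```bash
# Hypothetical helper: shared library file name for a plugin.
# The second argument is the OS name as reported by `uname -s`.
plugin_lib_name() {
    local plugin="$1" os="${2:-$(uname -s)}"
    # macOS reports "Darwin" and uses .dylib; every other system is assumed to use .so.
    if [ "$os" = "Darwin" ]; then
        echo "libdcdbplugin_${plugin}.dylib"
    else
        echo "libdcdbplugin_${plugin}.so"
    fi
}

# Hypothetical helper: config file searched when the "config" query is omitted.
plugin_default_config() {
    echo "./$1.conf"
}

plugin_lib_name sysfs Linux      # prints libdcdbplugin_sysfs.so
plugin_default_config sysfs      # prints ./sysfs.conf
```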

<table>
  <tr>
    <td colspan="2"><b>PUT /unload</b></td>
    <td colspan="2">Unload a plugin, removing it completely from dcdbpusher. To use the plugin again one has to /load it first.</td>
  </tr>
  <tr>
  	<td>plugin</td>
  	<td>All plugin names.</td>
  	<td>No</td>
  	<td>Specify the plugin.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>PUT /start</b></td>
    <td colspan="2">Start a plugin, i.e. its sensors start polling.</td>
  </tr>
  <tr>
    <td>plugin</td>
    <td>All plugin names.</td>
    <td>No</td>
    <td>Specify the plugin.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>PUT /stop</b></td>
    <td colspan="2">Stop a plugin, i.e. its sensors stop polling.</td>
  </tr>
  <tr>
    <td>plugin</td>
    <td>All plugin names.</td>
    <td>No</td>
    <td>Specify the plugin.</td>
  </tr>
</table>

<table>
  <tr>
    <td colspan="2"><b>PUT /reload</b></td>
    <td colspan="2">Reload a plugin's configuration (includes fresh creation of a plugin's sensors and a plugin restart).</td>
  </tr>
  <tr>
    <td>plugin</td>
    <td>All plugin names.</td>
    <td>No</td>
    <td>Specify the plugin.</td>
  </tr>
</table>

> NOTE &ensp;&ensp;&ensp;&ensp;&ensp; Opt. = Optional

### Examples <a name="restExamples"></a>

Two examples for HTTPS requests (authentication credentials not shown):

```bash
GET https://localhost:8000/average?plugin=sysfs;sensor=freq1;interval=15
```
```bash
PUT https://localhost:8000/stop?plugin=bacnet
```
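
A possible way to assemble such a request from the shell is sketched below. The helper `rest_url` is hypothetical and only joins the queries with ';' as described above; the credentials in the `curl` comment are placeholders.

```bash
# Hypothetical helper: build a request URL from a base, a resource and key=value queries.
rest_url() {
    local base="$1" resource="$2"
    shift 2
    # Resources without queries (e.g. /help) get no '?' appended.
    if [ "$#" -eq 0 ]; then
        echo "${base}${resource}"
        return
    fi
    # "$*" joins the remaining arguments with the first character of IFS.
    local IFS=';'
    echo "${base}${resource}?$*"
}

url=$(rest_url "https://localhost:8000" /average plugin=sysfs sensor=freq1 interval=15)
echo "$url"   # prints https://localhost:8000/average?plugin=sysfs;sensor=freq1;interval=15

# Sending the request with curl (user name and password are placeholders):
# curl -u "user:password" -X GET "$url"
```

Keep the URL quoted when passing it to a command: an unquoted ';' would end the shell command at that point.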

## MQTT topic <a name="mqttTopic"></a>

For communication between the different DCDB components (database, dcdbpusher) the [MQTT protocol](https://mqtt.org/) is used. To identify each sensor, every sensor must have a unique MQTT topic assigned. An MQTT topic for DCDB consists of exactly 112 bits (= 28 hex characters), not including '/' separators. The topic for a sensor is built by appending up to 4 parts:

1. mqttprefix    (e.g. /mysystem)
2. mqttpart of entity (if supported by plugin, e.g. /host0)
3. mqttpart of group    (e.g. /eth0)
4. mqttsuffix    (e.g. /xmitdata)

Then the topic for the sensor is /mysystem/host0/eth0/xmitdata.
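
The concatenation can be sketched in a few lines of shell. The variable names `entity_part` and `group_part` are made up for this illustration; the values are the ones from the list above.

```bash
# The four parts from the list above.
mqttprefix="/mysystem"
entity_part="/host0"     # only present if the plugin supports entities
group_part="/eth0"
mqttsuffix="/xmitdata"

# Parts that are not used stay empty and simply drop out of the topic.
topic="${mqttprefix}${entity_part}${group_part}${mqttsuffix}"
echo "$topic"   # prints /mysystem/host0/eth0/xmitdata
```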

# Plugins <a name="plugins"></a>

The core of dcdbpusher is responsible for collecting all the values read by the sensors and sending them to the database. However, the actual sensor functionality comes from the various plugins. Every plugin corresponds to a specific kind of sensor functionality.
All plugins share some general principles regarding sensor structure and configuration. These principles should also be followed when [writing own plugins](#writingOwnPlugins).

1. There are three hierarchical levels (from bottom up):
    1. Sensors
    2. Groups
    3. Entities (optional)
2. There are no stand-alone sensors. Every sensor belongs to a group.
3. Multiple groups may or may not be aggregated by an entity. Entities can optionally be used by the plugin developer to aggregate groups which belong together, e.g. because they all query the same host.
4. Every hierarchical level is associated with some attributes. The following hints help to decide (when developing own plugins) which attributes belong to which level. For every level the common base attributes, which are specified independently of a plugin, are also listed with an explanation:
    1. Entities (if present) hold all attributes which are required to query the represented entity or which all its associated groups have in common. Common entity attributes:
        * __default__     (One can define the name of a template entity (see below) whose values and groups should be used as default)
        * Other entity attributes could be: mqttPart, protocol version, host address and port.
    2. Groups hold all attributes which the sensors belonging to them have in common. Common group attributes:
        * __interval__    (Time in [ms] between two consecutive sensor reads. Default is 1000[ms] = 1[s])
        * __minValues__   (Minimum number of sensor reads the sensors in a group should gather before they are sent together to the database. Useful to reduce MQTT overhead. Default is 1 (every sensor value is sent on its own))
        * __mqttPart__    (Part of the [MQTT topic](#mqttTopic) which all sensors in this group have in common)
        * __default__     (One can define the name of a template group (see below) whose values and sensors should be used as default)
    3. Sensors hold only those attributes which are necessary to uniquely identify the target sensor. Common base attributes:
        * __mqttsuffix__  (to make its [MQTT topic](#mqttTopic) unique)
        * __delta__ (identifies a monotonic sensor. If set to "true", differences between successive readings are collected)
        * __subSampling__ (subsampling factor S. If S>=1, only one reading every S is sent over MQTT, and the others are kept locally. If S<1, readings are never sent out and only kept locally)
        * __publish__ (if set to "true", the sensor will be published when the auto-publish feature is enabled. Otherwise it is omitted. Default is "true".)
5. Be aware that the naming of sensor/group/entity is not fixed. A plugin developer can name them as they like, e.g. counter/multicounter/host.
6. It is possible to define template groups or entities in the config file, but not template sensors (as a sensor should only consist of attributes which make it unique, this would not be too useful). To specify a template group/entity simply prefix its definition with `template_` (see the example below). You can reference them later by using the `default` attribute. A template entity can consist of groups and these in turn can consist of sensors. When using a template, all of its attribute values are copied to the actual entity/group. Copied attributes can be overwritten in the actual entity/group (some of them even should be overwritten, e.g. the mqttPart). However, groups/sensors associated with a template are copied to the actual entity/group and can NOT be overwritten. One can specify further groups/sensors which are then added to those copied from the template. Template entities/groups themselves and the sensors within them are never used in live operation of the plugin. They are purely a convenience for configuration.
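
The interplay of the `delta` and `subSampling` sensor attributes from item 4 can be illustrated with a short shell sketch. It is not taken from dcdbpusher: the function name and the sample readings are hypothetical, and it assumes that the first reading has no delta (there is no predecessor) and that the first of every S values is the one sent out.

```bash
# Hypothetical sketch: apply delta "true" and a subSampling factor S to raw readings.
# Usage: process_readings S reading1 reading2 ...
process_readings() {
    local S="$1" prev="" i=0 raw delta
    shift
    for raw in "$@"; do
        if [ -n "$prev" ]; then
            # delta: store the difference between successive readings.
            delta=$(( raw - prev ))
            # subSampling: with S >= 1 only one value out of every S is sent.
            if [ "$S" -ge 1 ] && [ $(( i % S )) -eq 0 ]; then
                echo "send $delta"
            else
                echo "keep $delta"   # stays in the local cache only
            fi
            i=$(( i + 1 ))
        fi
        prev="$raw"
    done
}

# A monotonic counter read six times, with S=2.
process_readings 2 100 130 175 240 260 300
```

For the sample input the deltas are 30, 45, 65, 20 and 40, of which every second one (30, 65, 40) is marked as sent.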
 
Two abstract config files are shown below to visualize the structure, one without the optional entity level and one with it. A real example configuration file for every plugin should be provided in the `config/` directory. Use them as a starting point for writing your own configuration files.
```
 Without entity:
------------------------------------------------

global {
    mqttprefix /myprefix
    cacheInterval 120
    ...
}

template_group temp1 {          ;template group named temp1 (is not used in live operation)
    interval    1000            ;While it is possible to define entities/groups/sensors without a
    minValues   3               ;name, this is strongly discouraged. Naming entities/groups/sensors
    mqttPart    /aa             ;simplifies debugging and especially enables one to reference
                                ;templates later on. Names should also always be unique.
    sensor s1 {
        mqttsuffix      /s1
        ...                     ;usually the sensor would require additional attributes
    }

    sensor s2 {
        mqttsuffix      /s2
        ...
    }
}

group g1 {
    default     temp1           ;use temp1 as template group
    mqttPart    /bb             ;overwrite the mqttPart from temp1, to avoid identical
                                ;mqtt-topics if another group uses the same template
    sensor s3 {                 ;g1 has now 3 sensors: s1, s2 (both taken over from temp1)
        mqttsuffix      /s3     ;and s3
        ...
    }
}

group g2 {                      ;g2 consists of only one sensor (s21) and uses
    sensor s21 {                ;for every attribute the default value
        mqttsuffix  /s21        ;by using a longer mqttsuffix we do not need a
        ...                     ;group mqtt-part
    }
}

...
```

```
 With entity:
------------------------------------------------

global {
    ...
}

template_entity temp1 {             ;template entity which is not used in live operation
    ...                             ;here go entity attributes

    group g1 {
        interval    1000
        minValues   3
        mqttPart    /aa

        sensor s1 {
            mqttsuffix      /s1
            ...                     ;usually the sensor would require additional attributes
        }

        sensor s2 {
            mqttsuffix      /s2
            ...
        }
    }
}

entity ent1 {
    default     temp1               ;use temp1 as template entity

    group g2 {                      ;ent1 has now two groups (g1 and g2) with a total of
        sensor s21 {                ;3 sensors (s1, s2, s21)
            mqttsuffix  /s21
            ...
        }
    }
}

...
```
Note the `global` section in the examples, which was not mentioned before. In this section the user can (but is not obliged to) overwrite values from `dcdbpusher.conf` for this plugin or specify other settings which are global to this plugin.

## IPMI <a name="ipmi"></a>

The [IPMI](https://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface) plugin enables dcdbpusher to collect sensor values offered by a baseboard management controller (BMC).

Explanation of the values specific to the IPMI plugin:

| Value | Explanation |
|:----- |:----------- |
| sessiontimeout | Session timeout value for the IPMI connection
| retransmissiontimeout | Retransmission timeout value for the IPMI connection
| username | User name for the remote IPMI connection (login credentials are required)
| password | Password for the remote IPMI connection (login credentials are required)
| ipmiversion | IPMI version to use for LAN connections (1 or 2)
| cipher | Cipher to use for IPMI 2.0 LAN connections (currently supported: 0, 1, 2, 3, 6, 7, 8, 11, 12)
| cmd | One can define a raw IPMI command (in hex notation) to be sent. In this case the lsb and msb fields for the response also have to be defined. Alternatively, one can define the record ID of the sensor (see below).
| lsb | Offset of the least significant byte of the wanted return value within the response to a raw IPMI command<sup>[1](#ipmifn1)</sup>
| msb | Offset of the most significant byte of the wanted return value within the response to a raw IPMI command<sup>[1](#ipmifn1)</sup>
| recordId | Define the record ID of the sensor to be read. One can look up the corresponding record IDs for every sensor with the "ipmi-sensors" command line tool (ships with the freeipmi library). Alternatively, one can define a raw IPMI command (see above).
| factor | One can specify a factor to scale the read value before it is stored in the database (to adjust precision).

#### Footnotes <a name="ipmiFootnotes"></a>

<a name="ipmifn1">**1**</a>: &ensp; Use lsb > msb values if the response is little-endian (LSB first), use lsb < msb values if the response is big-endian (MSB first). Maximum length is 8 bytes.  

## Perf-event <a name="perf"></a>

The perf-event plugin collects data from the CPU's various performance counters (PMUs).
> NOTE &ensp;&ensp;&ensp; The perf-event plugin measures PMUs for all processes running on a specific CPU. Therefore a value of less than 1 is required in `/proc/sys/kernel/perf_event_paranoid`. Other values (>=1) restrict the access to PMUs. See this [footnote](#fn1) for additional information.
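
Whether the current setting is permissive enough can be checked from the shell. The helper `paranoid_ok` below is a hypothetical convenience for this README, not part of dcdbpusher; it only encodes the "less than 1" rule from the note above.

```bash
# Hypothetical check: succeeds if the perf_event_paranoid level is below 1.
# Without an argument the level is read from the kernel.
paranoid_ok() {
    local level="$1"
    if [ -z "$level" ]; then
        level=$(cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null) || return 2
    fi
    [ "$level" -lt 1 ]
}

if paranoid_ok; then
    echo "perf_event_paranoid allows measuring all processes on a CPU"
else
    echo "perf_event_paranoid is too restrictive (or could not be read)"
    # As root, the value can be lowered until the next reboot with:
    #   sysctl -w kernel.perf_event_paranoid=0
fi
```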

Explanation of the values specific to the perfevent plugin:

| Value | Explanation |
|:----- |:----------- |
| type | Type of the counter. Each type determines different possible values for the config field. Possible type values are described below.
| config | Together with the type field, config determines which performance counter should be read. Possible values and what they measure are listed below.
| cpus | One can define a comma-separated list of CPU numbers (value ranges can also be specified, e.g. 2-4 equals 2,3,4). The hardware counter will then only be opened on the specified CPUs.
| htVal | Specify the multiplier for CPU aggregation. All CPUs where (CPU-number % htVal) has the same result are aggregated together. Only CPUs which are included in the "cpus" field (or all CPUs if the "cpus" field is not present) are aggregated. Background: To reduce the amount of pushed sensor data, it is possible to aggregate CPU readings. This feature is specifically aimed at processors with hyper-threading enabled but can also come in handy for other use cases. Only the values pushed via the MQTTPusher are aggregated. There still exist sensors for each CPU and they store unaggregated readings in their local caches.
| mqttsuffix | In the context of the perfevent plugin the CPU ID is integrated into the suffix. Sensors are duplicated in order to open a hardware counter for each CPU. Therefore an identifier in the style of "/cpuxx" is prepended to the mqttsuffix when building the topics.

> NOTE &ensp;&ensp;&ensp; As perfevent counters are usually monotonic, the delta attribute is set to true by default for all sensors. One has to explicitly set delta to "off" for a sensor to overwrite this behaviour.
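
How the `cpus` and `htVal` fields select and group CPUs can be sketched in shell. Both function names are hypothetical and the sketch only mirrors the rules from the table (range expansion and grouping by `CPU-number % htVal`); the actual aggregation inside the plugin is not shown.

```bash
# Hypothetical helper: expand a cpus list such as "0,2-4" into one CPU number per line.
expand_cpus() {
    local part lo hi n
    local IFS=','
    for part in $1; do
        case "$part" in
            *-*)
                lo="${part%-*}"; hi="${part#*-}"
                n="$lo"
                while [ "$n" -le "$hi" ]; do
                    echo "$n"
                    n=$(( n + 1 ))
                done
                ;;
            *)
                echo "$part"
                ;;
        esac
    done
}

# Hypothetical helper: print "bucket: cpu" for every CPU read from stdin.
# CPUs with the same (cpu % htVal) end up in the same aggregation bucket.
aggregation_buckets() {
    local htVal="$1" cpu
    while read -r cpu; do
        echo "$(( cpu % htVal )): $cpu"
    done
}

# CPUs 0-3 with htVal 2: CPUs 0 and 2 share bucket 0, CPUs 1 and 3 share bucket 1.
expand_cpus "0-3" | aggregation_buckets 2
```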


### type and config <a name="perfTypeConfig"></a>

(see the [perf_event_open man-page](http://man7.org/linux/man-pages/man2/perf_event_open.2.html) for more detailed explanations)

| Type | Config | Explanation |
|:----:|:------ |:----------- |
| PERF_TYPE_HARDWARE | | generalized hardware CPU events
| " | PERF_COUNT_HW_CPU_CYCLES | total cycles (affected by frequency scaling)
| " | PERF_COUNT_HW_INSTRUCTIONS | retired instructions
| " | PERF_COUNT_HW_CACHE_REFERENCES | cache accesses (usually last level)
| " | PERF_COUNT_HW_CACHE_MISSES | cache misses (usually last level)
| " | PERF_COUNT_HW_BRANCH_INSTRUCTIONS | retired branch instructions
| " | PERF_COUNT_HW_BRANCH_MISSES | mispredicted branch instructions
| " | PERF_COUNT_HW_BUS_CYCLES | bus cycles
| " | PERF_COUNT_HW_STALLED_CYCLES_FRONTEND | stalled cycles during issue
| " | PERF_COUNT_HW_STALLED_CYCLES_BACKEND  | stalled cycles during retirement
| " | PERF_COUNT_HW_REF_CPU_CYCLES | total cycles (unaffected by frequency scaling)
460
| | | |
Micha Mueller's avatar
Micha Mueller committed
461
462
463
464
465
466
467
468
469
470
471
| PERF_TYPE_SOFTWARE | | software events provided by the kernel
| " | PERF_COUNT_SW_CPU_CLOCK | reports CPU clock
| " | PERF_COUNT_SW_TASK_CLOCK | clock count specific to the running task
| " | PERF_COUNT_SW_PAGE_FAULTS | number of page faults
| " | PERF_COUNT_SW_CONTEXT_SWITCHES | count of context switches
| " | PERF_COUNT_SW_CPU_MIGRATIONS | times the process has migrated to a new CPU
| " | PERF_COUNT_SW_PAGE_FAULTS_MIN | number of minor page faults (no disk-I/O)
| " | PERF_COUNT_SW_PAGE_FAULTS_MAJ | number of major page faults (disk-I/O was required)
| " | PERF_COUNT_SW_ALIGNMENT_FAULTS | alignment faults when accessing unaligned memory
| " | PERF_COUNT_SW_EMULATION_FAULTS | number of unimplemented instructions which had to be emulated
| " | PERF_COUNT_SW_DUMMY | placeholder which counts nothing
| | | |
| PERF_TYPE_TRACEPOINT | | not yet implemented
| PERF_TYPE_HW_CACHE | | not yet implemented
| | | |
| PERF_TYPE_RAW | | user can define architecture-specific raw events here.
| " | *XXXX* | Config must be a raw event config value, see <sup>[2](#fn2)</sup>
| | | |
| PERF_TYPE_BREAKPOINT | --- | config not required, any values will be ignored. However config must still be specified (even if empty)
| \<Custom\> | \<Custom\> | dynamic PMU event, see <sup>[3](#fn3)</sup>

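The names in the table correspond to integer constants from `linux/perf_event.h`, which end up in the `type` and `config` fields passed to perf_event_open(). The Python sketch below shows one way such a name lookup could work. The enum and function names are hypothetical, and treating a raw config as a hexadecimal string is an assumption about the input format, not a statement about how the plugin parses it.

```python
from enum import IntEnum


class PerfType(IntEnum):
    PERF_TYPE_HARDWARE = 0
    PERF_TYPE_SOFTWARE = 1
    PERF_TYPE_TRACEPOINT = 2
    PERF_TYPE_HW_CACHE = 3
    PERF_TYPE_RAW = 4
    PERF_TYPE_BREAKPOINT = 5


class HardwareConfig(IntEnum):
    PERF_COUNT_HW_CPU_CYCLES = 0
    PERF_COUNT_HW_INSTRUCTIONS = 1
    PERF_COUNT_HW_CACHE_REFERENCES = 2
    PERF_COUNT_HW_CACHE_MISSES = 3
    PERF_COUNT_HW_BRANCH_INSTRUCTIONS = 4
    PERF_COUNT_HW_BRANCH_MISSES = 5
    PERF_COUNT_HW_BUS_CYCLES = 6
    PERF_COUNT_HW_STALLED_CYCLES_FRONTEND = 7
    PERF_COUNT_HW_STALLED_CYCLES_BACKEND = 8
    PERF_COUNT_HW_REF_CPU_CYCLES = 9


class SoftwareConfig(IntEnum):
    PERF_COUNT_SW_CPU_CLOCK = 0
    PERF_COUNT_SW_TASK_CLOCK = 1
    PERF_COUNT_SW_PAGE_FAULTS = 2
    PERF_COUNT_SW_CONTEXT_SWITCHES = 3
    PERF_COUNT_SW_CPU_MIGRATIONS = 4
    PERF_COUNT_SW_PAGE_FAULTS_MIN = 5
    PERF_COUNT_SW_PAGE_FAULTS_MAJ = 6
    PERF_COUNT_SW_ALIGNMENT_FAULTS = 7
    PERF_COUNT_SW_EMULATION_FAULTS = 8
    PERF_COUNT_SW_DUMMY = 9


def resolve_event(type_name, config):
    """Translate a (type, config) pair of names into the integers the kernel expects."""
    perf_type = PerfType[type_name]
    if perf_type is PerfType.PERF_TYPE_HARDWARE:
        return int(perf_type), int(HardwareConfig[config])
    if perf_type is PerfType.PERF_TYPE_SOFTWARE:
        return int(perf_type), int(SoftwareConfig[config])
    if perf_type is PerfType.PERF_TYPE_RAW:
        # Assumption: the raw config is given as a hexadecimal string.
        return int(perf_type), int(config, 16)
    if perf_type is PerfType.PERF_TYPE_BREAKPOINT:
        # The config value is ignored for breakpoints.
        return int(perf_type), 0
    raise NotImplementedError(f"{type_name} is not yet implemented")


print(resolve_event("PERF_TYPE_HARDWARE", "PERF_COUNT_HW_INSTRUCTIONS"))
print(resolve_event("PERF_TYPE_RAW", "0x3c"))
```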
#### Footnotes <a name="perfFootnotes"></a>

Taken from the [perf_event_open man-page](http://man7.org/linux/man-pages/man2/perf_event_open.2.html):

<a name="fn1">**1**</a>: &ensp; The pid and cpu arguments allow specifying which process and CPU to monitor:  
[...]  
pid == -1 and cpu >= 0  
This measures all processes/threads on the specified CPU. This requires CAP_SYS_ADMIN capability or a /proc/sys/kernel/perf_event_paranoid value of less than 1.

[...]

The perf_event_paranoid file can be set to restrict access to the performance counters.

| Value | Restriction |
|:-----:|:----------- |
| 2 | allow only user-space measurements (default since Linux 4.6) |
| 1 | allow both kernel and user measurements (default before Linux 4.6) |
| 0 | allow access to CPU-specific data but not raw trace-point samples |
| -1 | no restrictions |
	
The existence of the perf_event_paranoid file is the official method for determining if a kernel supports perf_event_open()

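The rules quoted above can be checked before starting dcdbpusher. The following Python sketch reads the paranoid level and decides whether system-wide per-CPU monitoring (pid == -1, cpu >= 0) should be permitted. Both function names are hypothetical, and whether the process holds CAP_SYS_ADMIN is passed in by the caller rather than detected.

```python
from pathlib import Path

PARANOID_FILE = "/proc/sys/kernel/perf_event_paranoid"


def read_paranoid_level(path=PARANOID_FILE):
    """Return the paranoid level, or None if the file (and thus perf support) is missing."""
    try:
        return int(Path(path).read_text().strip())
    except FileNotFoundError:
        return None


def can_monitor_whole_cpu(level, has_cap_sys_admin=False):
    """True if measuring all processes on a CPU should be allowed."""
    if level is None:
        return False  # kernel does not support perf_event_open()
    return has_cap_sys_admin or level < 1


level = read_paranoid_level()
print("paranoid level:", level)
print("per-CPU monitoring allowed:", can_monitor_whole_cpu(level))
```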
<a name="fn2">**2**</a>: &ensp; If type is *PERF_TYPE_RAW*, then a custom "raw" config value is needed. Most CPUs support events that are not covered by the "generalized" events. These are implementation defined; see your CPU manual (for example the Intel Volume 3B documentation or the AMD BIOS and Kernel Developer Guide). The libpfm4 library can be used to translate from the name in the architectural manual to the raw hex value perf_event_open() expects in this field.

<a name="fn3">**3**</a>: &ensp; Custom type and config values can be specified to use the PMU of a specific device. The necessary configuration parameters can be obtained from the type and config files in the respective `/sys/devices/<device>` tree.

## SNMP <a name="snmp"></a>

The SNMP plugin enables dcdbpusher to query devices which have an SNMP agent running. An SNMP sensor corresponds to a single value as identified by its unique OID. Sensors are aggregated by connections. See the exemplary snmp.conf file in the `config/` directory.
> NOTE &ensp;&ensp;&ensp; In the SNMP context the word privacy is used synonymously for encryption.

Explanation of the values specific for the SNMP plugin:

| Value | Explanation |
|:----- |:----------- |
| connection | An aggregating connection
| Type | Type of the SNMP application which runs on the device queried by the connection. Currently only the type Agent is supported.
| Host | Host name of the device which is to be queried.
| Port | SNMP port of the device. This is usually 161, so no changes should be required here.
| OIDPrefix | This OIDPrefix is used for all following sensors.
| |
| Version | Which SNMP version to use (either 2 (maps to 2c) or 3).
| Community | Which SNMP community to use (required only if version 2 is used).
| Username | Username to authenticate with (only required for version 3).
| SecLevel | The security level to be used (only required for version 3). Can be either `noAuthNoPriv` for neither authentication nor privacy ("privacy" is SNMP's synonym for encryption), `authNoPriv` for authentication only or `authPriv` for authentication and privacy.
| AuthProto | Which protocol to use for authentication (only required for version 3 and if SecLevel != `noAuthNoPriv`). Can be MD5 or SHA1.
| AuthKey | The passphrase for authentication (only required for version 3 and if SecLevel != `noAuthNoPriv`). Must be at least 8 characters long.
| PrivProto | Which protocol to use for privacy (only required for version 3 and if SecLevel = `authPriv`). Can be DES or AES.
| PrivKey | The passphrase for privacy encryption (only required for version 3 and if SecLevel = `authPriv`). Must be at least 8 characters long.
| mqttPart | Connection specific MQTT-part which is appended to the MQTT-prefix and followed by the sensor specific suffix.
| |
| OID | OID suffix which together with the OIDPrefix forms the unique OID identifying a value to query.
| passphrase | has to be at least 8 characters long

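The dependencies between Version, SecLevel and the key settings in the table can be summarised in a few lines of Python. This is only a sketch of the rules as stated above: `check_connection` and `full_oid` are hypothetical helpers, the connection is represented as a plain dictionary rather than the real config format, and joining OIDPrefix and OID with a dot is an assumption.

```python
def full_oid(oid_prefix, oid_suffix):
    """Combine the connection's OIDPrefix with a sensor's OID suffix."""
    # Assumption: prefix and suffix are joined with a single dot.
    return f"{oid_prefix.rstrip('.')}.{oid_suffix.lstrip('.')}"


def check_connection(conn):
    """Return a list of problems found in a connection's settings (empty if none)."""
    version = str(conn.get("Version"))
    if version == "2":
        required = ["Community"]
    elif version == "3":
        required = ["Username", "SecLevel"]
        level = conn.get("SecLevel")
        if level in ("authNoPriv", "authPriv"):
            required += ["AuthProto", "AuthKey"]
        if level == "authPriv":
            required += ["PrivProto", "PrivKey"]
    else:
        return [f"unsupported Version: {version}"]

    problems = [f"missing {key}" for key in required if key not in conn]
    for key in ("AuthKey", "PrivKey"):
        if key in required and key in conn and len(conn[key]) < 8:
            problems.append(f"{key} shorter than 8 characters")
    return problems


v3 = {"Version": 3, "Username": "monitor", "SecLevel": "authPriv",
      "AuthProto": "SHA1", "AuthKey": "short", "PrivProto": "AES"}
print(check_connection(v3))
print(full_oid("1.3.6.1.4.1", "2.1"))
```

For the example connection the check reports the missing PrivKey and the too-short AuthKey.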
## sysFS <a name="sysfs"></a>

SysFS sensors read data from sysFS files. The configuration file of the plugin corresponds to the generic plugin configuration with standalone sensors. Additionally for a sysFS sensor the following parameters are mandatory/possible:

Explanation of the values specific for the sysFS plugin:

| Value | Explanation |
|:----- |:----------- |
| path | Path to the sysFS file the sensor should read from. This parameter is mandatory.
| filter | One can define an optional filter if the sysFS file consists of more than only the sensor value. Please note the following points for filters: <br> 1.  The filter supports substitutions. For substitution sed syntax ("s/.../.../") is used. Therefore extended regular expressions (ERE) are used as regex-syntax. ERE is closest to Basic RE (BRE), which is actually used by sed, but requires less escaping. <br> 2.  If a \ ("backslash") is needed in the regex (for escaping), always use \\ ("double backslash") as the regex is read in as string and strings also escape with backslash <br> 3.  Whitespaces are actually used as value separators in the config files. If your filter requires whitespaces either use [[:space:]] in the regex or put it in quotation marks ("") <br> 4.  To be able to reference parts of the match (for substitution) use groups. Groups are created with parentheses. <br>  5.  If using character classes like [[:digit:]] always make sure to use double brackets ("[[" and "]]") or they will not be recognized. <br>  See [ERE-syntax](https://www.gnu.org/software/sed/manual/html_node/ERE-syntax.html#ERE-syntax) <br>  See [substitution syntax](http://www.boost.org/doc/libs/1_65_1/libs/regex/doc/html/boost_regex/format/sed_format.html)

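To illustrate what a sed-style filter does to the content of a sysFS file, here is a small Python sketch. It is not the plugin's implementation: the plugin relies on Boost regular expressions, whereas this example uses Python's `re` module, which does not understand POSIX classes such as `[[:digit:]]` (hence `[0-9]` below). The helper name, the sample file content and the simplification that "/" may not appear inside the pattern are all assumptions of this example.

```python
import re


def apply_sed_filter(expression, text):
    """Apply a simple "s/pattern/replacement/" expression to text."""
    # Simplification: "/" must not occur inside the pattern or the replacement.
    command, pattern, replacement, _flags = expression.split("/")
    if command != "s":
        raise ValueError("only substitutions are supported")
    return re.sub(pattern, replacement, text)


raw = "temperature=45000 mC"
# The group in parentheses is referenced as \1 in the replacement.
value = int(apply_sed_filter(r"s/temperature=([0-9]+) mC/\1/", raw))
print(value)
```

The group captures only the digits, so the remaining text of the file is dropped before the value is converted to a number.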
## PDU <a name="pdu"></a>

The Power Delivery Unit (PDU) plugin is in charge of sending a network-request to the PDUs and gathering specified sensor data from the XML-file response.

Explanation of the values specific for the PDU plugin:

| Value | Explanation |
|:----- |:----------- |
| host | Hostname and (optional) port where to fetch the XML-file with sensor data from. If no port is specified, 443 is used. The plugin requests the file via HTTPS.
| TTL | To avoid requesting a current XML-file every time a sensor wants to read its value, one can define a time to live (TTL) for the file here. A new XML-file is requested only once the TTL of the current one has expired. Default value is 1000 ms.
| request | Define the request to be sent to the host via HTTPS as a string. One should put the request in quotation marks (' " ') to enable the use of whitespaces within the request. Special characters (like usage of ' " ' within the request) should be escaped (' " ' --> ' \" '; ' \ ' --> ' \\\\ '; newline --> ' \n '; ...).
| path | Define a dot-separated path to the value to be read in the XML file. One can specify attribute values a node has to fulfil in brackets after the node. Multiple (comma-separated) attributes can be given; however, no whitespaces should be used, as they are not filtered and could therefore be treated as part of an attribute's name.

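The TTL behaviour can be pictured as a small cache in front of the HTTPS request. The Python sketch below uses a hypothetical `CachedDocument` class with an injectable clock and fetch function so that it runs without a network; it illustrates the idea only and does not mirror the plugin's code.

```python
import time


class CachedDocument:
    """Re-fetch a document only once its time to live has expired."""

    def __init__(self, fetch, ttl_ms=1000, clock=time.monotonic):
        self._fetch = fetch
        self._ttl_s = ttl_ms / 1000.0
        self._clock = clock
        self._document = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl_s:
            self._document = self._fetch()
            self._fetched_at = now
        return self._document


# Demonstration with a fake clock and a fake request.
fake_now = [0.0]
requests = []


def fake_fetch():
    requests.append(fake_now[0])
    return "<xml/>"


cache = CachedDocument(fake_fetch, ttl_ms=1000, clock=lambda: fake_now[0])
cache.get()            # first read triggers a request
fake_now[0] = 0.5
cache.get()            # served from the cached file
fake_now[0] = 1.2
cache.get()            # TTL expired, a new request is sent
print(requests)        # [0.0, 1.2]
```

Several sensors reading from the same PDU within one TTL therefore share a single XML response.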
## BACnet <a name="bacnet"></a>

The BACnet plugin enables dcdbpusher to communicate with and request data from devices which use the BACnet protocol. A so-called "read property" request is sent by the plugin to the BACnet devices as configured in the config file. The response value is then stored in the database. Usually one is only interested in collecting the current reading of a BACnet device (property PROP_PRESENT_VALUE, ID 85). However, reading other properties is supported as well.
> NOTE &ensp;&ensp;&ensp; On startup the BACnet plugin performs no device discovery. Instead it relies on the user providing a file with the addresses of all required BACnet devices. One can generate such an address file for example by using the `bacwi` demo tool provided by the BACnet-Stack.

Explanation of the values specific for the BACnet plugin:

| Value | Explanation |
|:----- |:----------- |
| address_cache | (Path to and) filename of the address cache file where the addresses of BACnet devices are stored (as noted above).
| interface | Network interface (IPv4) which is to be used by the plugin to send its "Read Property" requests.
| port | Port to use on the interface
| timeout | Number of microseconds to wait for a response packet.
| apdu_timeout | Number of microseconds before sending a request times out.
| apdu_retries | How often sending a request is retried.
| templates | One can define template properties in this section for convenience.
| factor | Described in the section for the [IPMI-plugin](#ipmi).
| devices | Starts the part in the config file where the actual BACnet devices are configured. A BACnet device consists of multiple nested parts: device > objects > properties.
| instance (device) | Instance of the BACnet-device.
| type | Type of the object within the device.
| instance (object) | Instance of the object within the device.
| id | ID of the property to be read from the BACnet device-object. Assignment of numbers to properties is done according to the enum as defined in `bacenum.h`.

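The nesting device > objects > properties can be modelled as follows. This Python sketch only mirrors the structure described in the table; the class names, the example instance numbers and the idea of flattening the tree into one tuple per "read property" request are assumptions made for illustration.

```python
from dataclasses import dataclass, field

PROP_PRESENT_VALUE = 85  # property ID mentioned in the introduction of this section


@dataclass
class BacnetProperty:
    id: int = PROP_PRESENT_VALUE


@dataclass
class BacnetObject:
    type: int
    instance: int
    properties: list = field(default_factory=list)


@dataclass
class BacnetDevice:
    instance: int
    objects: list = field(default_factory=list)


def read_requests(devices):
    """Yield (device instance, object type, object instance, property id) per property."""
    for device in devices:
        for obj in device.objects:
            for prop in obj.properties:
                yield (device.instance, obj.type, obj.instance, prop.id)


# Hypothetical device 1234 with one object that exposes two properties.
device = BacnetDevice(
    instance=1234,
    objects=[BacnetObject(type=0, instance=3,
                          properties=[BacnetProperty(), BacnetProperty(id=28)])],
)
planned = list(read_requests([device]))
print(planned)
```

Each tuple corresponds to one sensor, i.e. one value that is requested from the device per read cycle.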
## Opa (Intel Omni-Path Architecture) <a name="opa"></a>

The Opa plugin enables dcdbpusher to query various counters from omni-path interconnects.

Explanation of the values specific for the Opa plugin:

| Value | Explanation |
|:----- |:----------- |
| hfiNum | Number of the Omni-Path Host Fabric Interface to query (starting with 1)
| portNum | Number of the Omni-Path port to query (starting with 1)
| cntData | Name of the data counter to query. A list of possible values can be found below.


> NOTE &ensp;&ensp;&ensp; As Opa counters are usually monotonic, the delta attribute is set to true by default for all sensors. One has to explicitly set delta to "off" for a sensor to override this behaviour.

### counterData <a name="opaCounterData"></a>

Possible values for cntData:
* portXmitData
* portRcvData
* portXmitPkts
* portRcvPkts
* portMulticastXmitPkts
* portMulticastRcvPkts
* localLinkIntegrityErrors
* fmConfigErrors
* portRcvErrors
* excessiveBufferOverruns
* portRcvConstraintErrors
* portRcvSwitchRelayErrors
* portXmitDiscards
* portXmitConstraintErrors
* portRcvRemotePhysicalErrors
* swPortCongestion
* portXmitWait
* portRcvFECN
* portRcvBECN
* portXmitTimeCong
* portXmitWastedBW
* portXmitWaitData
* portRcvBubble
* portMarkFECN
* linkErrorRecovery
* linkDowned
* uncorrectableErrors

## ProcFS (/proc filesystem) <a name="procfs"></a>

The ProcFS plugin enables dcdbpusher to sample resource usage metrics from a variety of files in the /proc virtual filesystem generated by the Linux kernel. Each defined sensor group is assigned to a specific file, which is periodically parsed. Currently supported files for sampling are:
* /proc/vmstat: contains virtual memory-related usage metrics;
* /proc/meminfo: contains RAM memory-related usage metrics (note that some of the metrics overlap with /proc/vmstat);
* /proc/stat: contains CPU usage-related metrics, both at system and core levels.

Note that the ProcFS plugin can operate in two distinct modes, with respect to MQTT topics:
* Automatic: if no sensors are specified, all metrics discovered in the underlying parsed file are acquired; sensors and MQTT topics are generated for them. Please be careful when configuring the plugin so that its MQTT topics do not overlap with those of other plugins.
* Manual: If at least one sensor is specified, only the corresponding metrics are acquired, and all other metrics in the parsed file are discarded. MQTT topics are assigned accordingly, using the mqttPrefix, mqttPart and mqttSuffix fields.

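The difference between the two modes can be sketched in Python as follows. The parser assumes the simple "name value" layout of /proc/vmstat; both function names and the sample content are made up for this example and do not reflect the plugin's internals.

```python
def parse_key_value_file(text):
    """Parse "name value" lines as found in /proc/vmstat."""
    metrics = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            metrics[fields[0]] = int(fields[1])
    return metrics


def select_metrics(metrics, configured):
    """Automatic mode if no sensors are configured, manual mode otherwise."""
    if not configured:
        return dict(metrics)  # automatic: every discovered metric becomes a sensor
    # manual: keep configured metrics only; unknown names are discarded
    return {name: metrics[name] for name in configured if name in metrics}


sample = "nr_free_pages 100\npgfault 42\npgmajfault 7\n"
metrics = parse_key_value_file(sample)
automatic = select_metrics(metrics, [])
manual = select_metrics(metrics, ["pgfault", "not_a_metric"])
print(sorted(automatic))
print(manual)
```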
Explanation of the values specific for the ProcFS plugin:

| Value | Explanation |
|:----- |:----------- |
| type | The type of the file parsed by the sensor group. Can be either "vmstat", "meminfo", "procstat" or "sar"
| path | Path of the file, if different from the default path in the /proc filesystem
| cpus | Defines the set of CPU cores for which metrics must be collected. Only affects extraction of core-specific metrics (e.g. those in /proc/stat), whereas system-level metrics are acquired regardless of this setting. If no CPU cores set is defined, metrics for all available CPU cores will be collected. This parameter follows the same syntax as in the Perf-event plugin.
| htVal | Specify a multiplier for CPU aggregation. All CPUs where (CPU-number % htVal) has the same result are aggregated together. If specified, only CPUs which are included in the "cpus" field are aggregated. See Perf-event plugin for more details.
| scalingFactor | A scaling factor to be applied to ratio-like metrics. Default is 1000000.
| mqttSuffix | For CPU-related sensors, such as the ones in procstat files, the mqttSuffix field behaves as described for the Perf-event plugin.

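The grouping rule behind htVal, "all CPUs where (CPU-number % htVal) has the same result are aggregated together", can be written down in a few lines of Python. The function name is hypothetical, and the sketch only shows which CPUs end up in the same group, not how their readings are combined.

```python
from collections import defaultdict


def aggregation_groups(cpus, ht_val):
    """Group CPU numbers by their remainder modulo ht_val."""
    groups = defaultdict(list)
    for cpu in sorted(cpus):
        groups[cpu % ht_val].append(cpu)
    return dict(groups)


# Eight logical CPUs with htVal = 4: CPU n and CPU n + 4 share a group.
groups = aggregation_groups(range(8), 4)
print(groups)  # {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
```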
Sensors in the ProcFS plugin (defined with the "metric" keyword) additionally support the following values:

| Value | Explanation |
|:----- |:----------- |
| type | The type of the specific metric associated to the sensor. This field must match the exact name of a metric in the underlying parsed file. If such a match does not exist, the sensor is discarded.
| perCpu | Boolean. If set to "on", the metric will be collected for each CPU core specified with the "cpus" sensor group parameter, or for all CPU cores if none is specified. Otherwise, the metric will be collected only at system level. This parameter has no effect on metrics that are not acquired at CPU core level (e.g. those in /proc/vmstat and /proc/meminfo).

The "type" field can be inferred for each sensor by simply checking the underlying file parsed by the sensor group. For /proc/stat files, on the other hand, CPU core-related metrics are collected in separate columns, which adopt the following naming scheme that can be used to define sensors: 
* col_user 
* col_nice 
* col_system 
* col_idle 
* col_iowait
* col_irq
* col_softirq
* col_steal
* col_guest
* col_guest_nice

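How the column names above line up with the numbers in a /proc/stat file is shown by the following Python sketch. The parser is a simplified stand-in for the plugin's own logic, the sample content is invented, and columns beyond the ten listed names are simply ignored.

```python
COLUMNS = ("col_user", "col_nice", "col_system", "col_idle", "col_iowait",
           "col_irq", "col_softirq", "col_steal", "col_guest", "col_guest_nice")


def parse_proc_stat(text):
    """Map each "cpu"/"cpuN" line of /proc/stat to a dict of named columns."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu"):
            values = [int(v) for v in fields[1:]]
            result[fields[0]] = dict(zip(COLUMNS, values))
    return result


sample = """cpu  100 2 30 4000 5 0 1 0 0 0
cpu0 60 1 20 2000 3 0 1 0 0 0
cpu1 40 1 10 2000 2 0 0 0 0 0
intr 12345 0 0
"""
stats = parse_proc_stat(sample)
print(stats["cpu"]["col_user"])   # system-level value
print(stats["cpu0"]["col_idle"])  # per-core value, relevant when perCpu is "on"
```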
Additional CPU-related metrics (that may be introduced in future versions of the Linux kernel) are not supported by the DCDB ProcFS plugin.
Note that for /proc/meminfo instances, an additional synthetic sensor of type "MemUsed" can be defined. This sensor will automatically extract the amount of used memory from the MemTotal and MemFree values present in meminfo files.

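A possible reading of the synthetic "MemUsed" sensor is sketched below in Python. It assumes that used memory is simply MemTotal minus MemFree; the text above only states that both values are involved, so the exact formula used by the plugin may differ. Function names and sample values are hypothetical.

```python
def parse_meminfo(text):
    """Parse "Name:   value kB" lines into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def mem_used(meminfo):
    # Assumption: "used" is MemTotal minus MemFree.
    return meminfo["MemTotal"] - meminfo["MemFree"]


sample = """MemTotal:       16384000 kB
MemFree:         4096000 kB
HugePages_Total:       0
"""
info = parse_meminfo(sample)
print(mem_used(info))  # 12288000
```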
## Caliper <a name="caliper"></a>

The Caliper plugin collects application sample data and therefore allows for application performance analysis in retrospect. The plugin receives program counter (PC) values at periodic time intervals from the [Caliper](https://github.com/LLNL/Caliper) framework and tries to resolve the PC to a symbol name (aka function name) during runtime. Currently, this plugin is intended to get insight into usage of provided system libraries used by applications.
This plugin is special as it does not work on its own but also requires a corresponding Caliper framework service running on application side. Please see Caliper's [official documentation](https://software.llnl.gov/Caliper/) for an exhaustive introduction.

### Caliper framework side
Caliper is an application introspection system. Its functionality stems from so-called services. To work with the Pusher plugin, the custom Dcdbpusher service for Caliper is required as well as the stock pthread and sampler services. Furthermore, a patched version of the stock timestamp Caliper service is required for nanosecond precision.
Caliper has to be integrated into the application. This can be done either manually by the application developer or in a more automated fashion by the system administrator by "hijacking" applications, e.g. overwriting main methods before execution. For the dcdbpusher service it is sufficient to use the Caliper framework just once, i.e. initialize it somewhere. However, one can still use the full functionality of Caliper services in parallel as desired.
The dcdbpusher service retrieves all symbol (function name) data from the application and associated libraries and stores it in a file shared with the Pusher plugin. The service processes snapshots from the sampler on a per-thread basis. It retrieves all required data (program counter, cpu and timestamp) and makes the data accessible for the Pusher plugin via a queue realized in the shared memory file.

The dcdbpusher Caliper service can be controlled by the environment variables listed below:

| Value | Explanation |
|:----- |:----------- |
| CALI_DCDBPUSHER_SUS_CYCLE | Symbol update service (SUS) cycle. If a symbol cannot be resolved by the Pusher plugin, the plugin informs the background SUS on the Caliper service side to update the symbol data in the shared memory file. Updating the symbol data is a heavy blocking task. To limit overhead and avoid continuous rebuilds of the symbol data, this environment variable can be used to set the cycle interval of the SUS in seconds (e.g. `export CALI_DCDBPUSHER_SUS_CYCLE=x`). The SUS only checks every x seconds whether a symbol data update is requested. Increasing this value reduces the overhead of repeated symbol data rebuilds but decreases responsiveness if rebuilds are seldom requested. Default is 15 seconds.

### Pusher plugin side
The pusher plugin serves as data sink for the snapshot data from the Caliper service. It can handle multiple different applications at once. However, it is mainly intended for only one application with multiple threads/(MPI-)processes. 
The plugin consumes the PC (snapshot) data from shared memory and resolves it to function names via the provided shared symbol data. If a PC value could not be resolved it requests a rebuild of the symbol data index. Every read cycle the plugin consumes all snapshots available in the queue of a process.
From every snapshot the plugin builds a name of the form CPU/BinaryFile::functionName. CPU is the cpu number where the snapshot was captured, BinaryFile the full path of the executable or library the PC resolves to and functionName the symbol within the binary (optional, as functionName cannot always be resolved). For every unique name a new sensor is created. The number of encounters of a sensor name during one read cycle is stored in the sensor, from where it is pushed to the CollectAgent. Therefore the read cycle interval also determines the granularity of the sampling data. A lower interval results in more fine-grained sampling data but also requires more memory in the storage backend. After an application terminates and "disconnects", the corresponding sensors may get cleared.

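The processing of one read cycle, as described above, can be approximated with the following Python sketch. It resolves program counter values against a sorted symbol table and counts how often each resulting name occurs. The symbol table, the addresses and all function names are invented for illustration; in particular, dropping the leading slash of the binary path and omitting `::functionName` for unresolved symbols are assumptions about the name format.

```python
import bisect
from collections import Counter

# Hypothetical symbol table: (start address, end address, binary, function).
SYMBOLS = sorted([
    (0x1000, 0x1100, "/usr/lib/libm.so", "cos"),
    (0x1100, 0x1200, "/usr/lib/libm.so", "sin"),
    (0x4000, 0x5000, "/opt/app/bin/solver", ""),  # binary known, symbol unknown
])
STARTS = [entry[0] for entry in SYMBOLS]


def sensor_name(cpu, pc):
    """Build "CPU/BinaryFile::functionName", or return None if pc is unknown."""
    index = bisect.bisect_right(STARTS, pc) - 1
    if index < 0 or pc >= SYMBOLS[index][1]:
        return None  # here the plugin would request a rebuild of the symbol data
    _, _, binary, function = SYMBOLS[index]
    name = f"{cpu}/{binary.lstrip('/')}"
    return f"{name}::{function}" if function else name


def count_read_cycle(snapshots):
    """Count the occurrences of every sensor name within one read cycle."""
    names = (sensor_name(cpu, pc) for cpu, pc in snapshots)
    return Counter(name for name in names if name is not None)


snapshots = [(0, 0x1010), (0, 0x1020), (1, 0x1150), (1, 0x4ABC), (0, 0x9000)]
counts = count_read_cycle(snapshots)
for name, hits in counts.items():
    print(name, hits)
```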
Explanation of the values specific for this plugin:

| Value | Explanation |
|:----- |:----------- |
| timeout | Number of read cycles after which a Caliper application is assumed to have terminated if no new values have been received. The connection (shared memory) is torn down on timeout.

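The timeout can be thought of as a counter of consecutive read cycles without new snapshots. A minimal Python sketch, using a hypothetical `ConnectionWatchdog` class and assuming that any received snapshot resets the counter:

```python
class ConnectionWatchdog:
    """Track idle read cycles of one application connection."""

    def __init__(self, timeout_cycles):
        self.timeout_cycles = timeout_cycles
        self.idle_cycles = 0

    def on_read_cycle(self, received_snapshots):
        """Return True if the connection should be torn down."""
        if received_snapshots > 0:
            self.idle_cycles = 0
        else:
            self.idle_cycles += 1
        return self.idle_cycles >= self.timeout_cycles


watchdog = ConnectionWatchdog(timeout_cycles=3)
results = [watchdog.on_read_cycle(n) for n in (5, 0, 0, 2, 0, 0, 0)]
print(results)  # [False, False, False, False, False, False, True]
```

Since the timeout is counted in read cycles, its wall-clock duration depends on the configured read interval.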
## Writing own plugins <a name="writingOwnPlugins"></a>
First make sure you read the [plugins](#plugins) section. 

It is recommended to use the `pluginGenerator/generatePlugin.sh` script to kick off plugin development. Running `./generatePlugin.sh -h` gives instructions on how to use the script. On success, the script generates all required source files for a new plugin with instructions on how to continue from there.