DCDB (DataCenter DataBase) is a database for collecting various (sensor) values of a datacenter for further analysis.
...
...
@@ -418,6 +419,7 @@ Explanation of the values specific for the IPMI plugin:
#### Footnotes <a name="ipmiFootnotes"></a>
<a name="ipmifn1">**1**</a>:   Use lsb > msb values if response is Little-endian (LSB first), use lsb < msb values if response is Big-Endian (MSB first). Maximum length is 8 bytes.
## Perf-event <a name="perf"></a>
The Perfevent functionality is tasked with collecting data from the CPUs' various performance counters (PMUs).
...
...
@@ -501,6 +503,7 @@ The existence of the perf_event_paranoid file is the official method for determi
<a name="fn2">**2**</a>:   If type is *PERF_TYPE_RAW*, then a custom "raw" config value is needed. Most CPUs support events that are not covered by the "generalized" events. These are implementation defined; see your CPU manual (for example the Intel Volume 3B documentation or the AMD BIOS and Kernel Developer Guide). The libpfm4 library can be used to translate from the name in the architectural manual to the raw hex value perf_event_open() expects in this field.
<a name="fn3">**3**</a>:   Custom type and config values can be specified to use the PMU of a specific device. The necessary configuration parameters can be obtained from the type and config files in the respective `/sys/devices/<device>` tree.
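To illustrate footnote 2, the following minimal sketch composes a raw config value by hand. It assumes the basic Intel x86 core PMU layout, in which the event select occupies bits 0-7 and the unit mask bits 8-15; further fields (such as counter mask or edge detect) and other vendors' layouts are not covered. The function name `intel_raw_config` is hypothetical, and libpfm4 remains the more reliable way to obtain such values.

```python
def intel_raw_config(event_select: int, umask: int) -> int:
    """Combine event select and unit mask into a raw perf config value.

    Assumption: basic Intel core PMU layout (event select in bits 0-7,
    unit mask in bits 8-15). Other fields are left at zero.
    """
    for name, value in (("event_select", event_select), ("umask", umask)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in one byte")
    return (umask << 8) | event_select


# Event select 0x2E with unit mask 0x41 (values taken from a CPU manual).
print(hex(intel_raw_config(0x2E, 0x41)))
```

The resulting number would be used as the config value of a sensor whose type is *PERF_TYPE_RAW*.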
## SNMP <a name="snmp"></a>
The SNMP plugin enables dcdbpusher to communicate with devices that run an SNMP agent and to query values from them. An SNMP sensor corresponds to a single value identified by its unique OID. Sensors are aggregated by connections. See the exemplary snmp.conf file in the `config/` directory.
...
...
@@ -664,6 +667,27 @@ The "type" field can be inferred for each sensor by simply checking the underlyi
Additional CPU-related metrics (that may be introduced in future versions of the Linux kernel) are not supported by the DCDB ProcFS plugin.
Note that for /proc/meminfo instances, an additional synthetic sensor of type "MemUsed" can be defined. This sensor will automatically extract the amount of used memory from the MemTotal and MemFree values present in meminfo files.
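The idea behind the synthetic "MemUsed" sensor can be sketched as follows. This is not the plugin's implementation; it assumes that used memory is simply MemTotal minus MemFree, and the helper names `parse_meminfo` and `mem_used` as well as the sample values are hypothetical.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse 'Key:   value kB' lines of a meminfo file into a dict."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a 'Key: value' line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def mem_used(values: Dict[str, int]) -> int:
    """Assumption: used memory = MemTotal - MemFree (same unit as input)."""
    return values["MemTotal"] - values["MemFree"]


# Hypothetical excerpt of a /proc/meminfo file.
SAMPLE = """MemTotal:       16384000 kB
MemFree:         4096000 kB
MemAvailable:    8192000 kB
"""

print(mem_used(parse_meminfo(SAMPLE)))
```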
## Caliper <a name="caliper"></a>
The Caliper plugin collects application sample data and thereby allows for retrospective application performance analysis. The plugin receives program counter (PC) values at periodic time intervals from the [Caliper](https://github.com/LLNL/Caliper) framework and tries to resolve each PC to a symbol name (i.e. function name) at runtime. Currently, this plugin is intended to give insight into how applications use the provided system libraries.
This plugin is special in that it does not work on its own but requires a corresponding Caliper framework service running on the application side. Please see Caliper's [official documentation](https://software.llnl.gov/Caliper/) for an exhaustive introduction.
### Caliper framework side
Caliper is an application introspection system. Its functionality stems from so-called services. To work with the Pusher plugin, the custom Dcdbpusher service for Caliper is required, as well as the stock pthread and sampler services. Furthermore, a patched version of the stock timestamp Caliper service is required for nanosecond precision.
Caliper has to be integrated into the application. This can be done either manually by the application developer or in a more automated fashion by the system administrator by "hijacking" applications, e.g. overwriting main methods before execution. For the dcdbpusher service it is sufficient to use the Caliper framework just once, i.e. to initialize it somewhere. However, the full functionality of Caliper services can still be used in parallel as desired.
The dcdbpusher service retrieves all symbol (function name) data from the application and its associated libraries and stores it in a file shared with the Pusher plugin. The service processes snapshots from the sampler on a per-thread basis. It retrieves all required data (program counter, CPU and timestamp) and makes the data accessible to the Pusher plugin via a queue realized in the shared memory file.
### Pusher plugin side
The Pusher plugin serves as the data sink for the snapshot data from the Caliper service. It can handle multiple different applications at once. However, it is mainly intended for a single application with multiple threads/(MPI-)processes.
The plugin consumes the PC (snapshot) data from shared memory and resolves it to function names via the provided shared symbol data. If a PC value cannot be resolved, it requests a rebuild of the symbol data index. In every read cycle the plugin consumes all snapshots available in the queue of a process.
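Resolving a PC to a symbol is essentially a range lookup. The following minimal sketch shows one common way to do it, a binary search over symbols sorted by start address. It is not the plugin's actual data structure; the class `SymbolIndex`, the `(start, size, name)` tuple layout and the example addresses are assumptions made for illustration.

```python
import bisect
from typing import List, Optional, Tuple


class SymbolIndex:
    """Hypothetical address index: maps a PC to the symbol containing it."""

    def __init__(self, symbols: List[Tuple[int, int, str]]):
        # Each entry is (start_address, size_in_bytes, name).
        self._symbols = sorted(symbols)
        self._starts = [start for start, _, _ in self._symbols]

    def resolve(self, pc: int) -> Optional[str]:
        # Find the last symbol whose start address is <= pc.
        i = bisect.bisect_right(self._starts, pc) - 1
        if i < 0:
            return None
        start, size, name = self._symbols[i]
        # The PC only matches if it lies inside the symbol's address range.
        return name if pc < start + size else None


index = SymbolIndex([(0x1000, 0x40, "main"), (0x2000, 0x100, "compute")])
print(index.resolve(0x1010))  # inside main
print(index.resolve(0x3000))  # unresolved: a rebuild could be requested here
```

A `None` result corresponds to the case described above in which the PC cannot be resolved and a rebuild of the symbol data index is requested.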
From every snapshot the plugin builds a name of the form CPU/BinaryFile::functionName. CPU is the number of the CPU on which the snapshot was captured, BinaryFile is the full path of the executable or library the PC resolves to, and functionName is the symbol within the binary (optional, as functionName cannot always be resolved). For every unique name a new sensor is created. The number of times a sensor name is encountered during one read cycle is stored in the sensor, from where it is pushed to the CollectAgent. The read cycle interval therefore also determines the granularity of the sampling data. A shorter interval results in finer-grained sampling data but also requires more memory in the storage backend. After an application terminates and "disconnects", the corresponding sensors may get cleared.
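The naming and counting scheme can be sketched as follows. This is an illustration under assumptions, not the plugin's code: the `Snapshot` type, the helper names and the sample paths are hypothetical, and the name components are joined literally as CPU/BinaryFile::functionName, which may differ from how the plugin normalises absolute paths.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class Snapshot(NamedTuple):
    """Hypothetical resolved snapshot as seen by the plugin."""
    cpu: int
    binary: str
    function: Optional[str]  # None if the symbol could not be resolved


def sensor_name(snapshot: Snapshot) -> str:
    # Literal CPU/BinaryFile::functionName join; functionName is optional.
    name = f"{snapshot.cpu}/{snapshot.binary}"
    if snapshot.function:
        name += f"::{snapshot.function}"
    return name


def count_read_cycle(snapshots: Iterable[Snapshot]) -> Counter:
    """Count how often each sensor name occurs within one read cycle."""
    return Counter(sensor_name(s) for s in snapshots)


cycle = [
    Snapshot(0, "/usr/lib/libm.so.6", "cos"),
    Snapshot(0, "/usr/lib/libm.so.6", "cos"),
    Snapshot(1, "/opt/app/bin/solver", None),
]
for name, count in count_read_cycle(cycle).items():
    print(name, count)
```

Each key of the resulting counter corresponds to one sensor, and each count to the value that sensor would hold for that read cycle.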
Explanation of the values specific for this plugin:
| Value | Explanation |
|:----- |:----------- |
| timeout | Number of read cycles after which a Caliper application is assumed to have terminated if no new values have been received. The connection (shared memory) is torn down on timeout. |
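The effect of `timeout` can be pictured as an idle-cycle counter per connection. The sketch below is a simplified model rather than the plugin's logic; the class `ConnectionWatchdog` is hypothetical, and whether the plugin tears down the connection exactly at or only after `timeout` idle cycles is an assumption.

```python
class ConnectionWatchdog:
    """Hypothetical per-connection counter of consecutive idle read cycles."""

    def __init__(self, timeout: int):
        self.timeout = timeout
        self.idle_cycles = 0

    def on_read_cycle(self, new_values: int) -> bool:
        """Return True if the connection should be torn down."""
        if new_values > 0:
            # Any new snapshot resets the idle counter.
            self.idle_cycles = 0
            return False
        self.idle_cycles += 1
        # Assumption: teardown once `timeout` idle cycles have been reached.
        return self.idle_cycles >= self.timeout


watchdog = ConnectionWatchdog(timeout=3)
for received in (5, 0, 0, 0):
    if watchdog.on_read_cycle(received):
        print("application assumed terminated, tearing down connection")
```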
## Writing own plugins <a name="writingOwnPlugins"></a>
First make sure you read the [plugins](#plugins) section.