
Commit 0f2b3e30 authored by Micha Müller

Pusher Readme: Update Caliper plugin section

parent 15bd792a
......@@ -675,32 +675,43 @@ Note that for /proc/meminfo instances, an additional synthetic sensor of type "M
## Caliper <a name="caliper"></a>
The Caliper plugin collects application sample data and therefore allows for application performance analysis in retrospect. The plugin receives program counter (PC) values at periodic time intervals from the [Caliper](https://github.com/LLNL/Caliper) framework and tries to resolve each PC to a symbol name (i.e. function name) at runtime. Currently, this plugin is intended to give insight into how applications use the provided system libraries.
This plugin is special as it does not work on its own but also requires a corresponding Caliper framework service running on the application side. Please see Caliper's [official documentation](https://software.llnl.gov/Caliper/) for an exhaustive introduction.
The Caliper plugin collects application introspection data and therefore allows for application performance analysis in retrospect. This plugin is special as it does not work on its own but also requires a corresponding Caliper framework service running on the application side. Please see Caliper's [official documentation](https://software.llnl.gov/Caliper/) for an exhaustive introduction.
The Caliper plugin supports two use cases:
* **Sampling** Low-overhead automatic sampling of program counter (PC) values. This allows analyzing in retrospect how much time was spent in a function. Sampling is the default case and is enabled at all times.
* **Instrumentation** Users can instrument their application with Caliper annotations. The event data generated by the annotations is picked up by Pusher in addition to the sampling data and stored within DCDB. The annotation data can then be correlated with other monitoring data and allows for more fine-grained introspection than the sampling approach. On the downside, this usually induces more overhead.
### Caliper framework side
Caliper is an application introspection system. Its functionality stems from so-called services. To work with the Pusher plugin, the custom dcdbpusher service for Caliper is required as well as the stock pthread and sampler services. Furthermore, a patched version of the stock timestamp Caliper service is required for nanosecond precision.
Caliper has to be integrated into the application. This can be done either manually by the application developer or, in a more automated fashion, by the system administrator by "hijacking" applications, e.g. overwriting main methods before execution. For the dcdbpusher service it is sufficient to use the Caliper framework just once, i.e. initialize it somewhere. However, one can still use the full functionality of Caliper services in parallel as desired.
The dcdbpusher service retrieves all symbol (function name) data from the application and associated libraries and stores it in a file shared with the Pusher plugin. The service processes snapshots from the sampler on a per-thread basis. It retrieves all required data (program counter, CPU and timestamp) and makes the data accessible to the Pusher plugin via a queue realized in the shared memory file.
Caliper is an application introspection system. Its functionality stems from so-called services. To work with the Pusher plugin, the custom dcdbpusher service for Caliper is required as well as the stock timestamp, pthread, and sampler services. For instrumentation, the event service is required as well.
Caliper has to be integrated into the application. This can be done either manually by the application developer or, in a more automated fashion, by the system administrator by "hijacking" applications, e.g. overwriting main methods before execution. For the sampling case it is sufficient to use the Caliper framework just once, i.e. initialize it somewhere. However, one can still use the full functionality of Caliper services in parallel as desired.
The dcdbpusher service retrieves all relevant data from snapshots: timestamp, CPU, PC (sampling), and annotation data (instrumentation). In case of sampling, the PC value is resolved to the actual function name via the binary's symbol data. Retrieved data is temporarily stored in a thread-local buffer. Eventually it gets written to a shared-memory queue which is used to communicate with the Pusher plugin.
The dcdbpusher Caliper service can be controlled by the environment variables listed below:
The Caliper services can be controlled by the environment variables listed below:
| Value | Explanation |
|:----- |:----------- |
| CALI_DCDBPUSHER_SUS_CYCLE | Symbol update service (SUS) cycle. If a symbol could not be resolved by the Pusher plugin, the plugin informs the background SUS on the Caliper service side to update the symbol data in the shared memory file. Updating the symbol data is a heavy blocking task. To limit overhead and avoid continuous rebuilds of the symbol data, this environment variable can be used to set the cycle interval of the SUS in seconds (e.g. `export CALI_DCDBPUSHER_SUS_CYCLE=x`). The SUS only checks every x seconds whether a symbol data update is requested. Increasing this value reduces the overhead of repeated symbol data rebuilds but decreases responsiveness if rebuilds are requested seldom. Default is 15 seconds. |
| CALI_SERVICES_ENABLE | Specify which Caliper services to enable. Should be at least `event:sampler:timestamp:pthread:dcdbpusher`. |
| CALI_SAMPLER_FREQUENCY | Frequency of the sampler service in Hz. |
| CALI_TIMER_TIMESTAMP | Must be set to `true` to enrich all snapshots with timestamps of their creation. |
| CALI_DCDBPUSHER_SUS_CYCLE | Symbol update service (SUS) cycle. To resolve PC values to function names, the symbol data of the binary and loaded libraries is locally buffered in a so-called "symbol index". If a symbol could not be resolved by the dcdbpusher service (e.g. because the PC points to a newly loaded library that has not yet been indexed), the service informs the background SUS thread to update the symbol index. Updating the symbol index is a heavy blocking task. To limit overhead and avoid continuous rebuilds of the symbol index, this environment variable can be used to set the cycle interval of the SUS in seconds (e.g. `export CALI_DCDBPUSHER_SUS_CYCLE=x`). The SUS only checks every x seconds whether a symbol index update is requested. Increasing this value reduces the overhead of repeated symbol index rebuilds but decreases responsiveness if rebuilds are requested seldom. Default is 15 seconds. |
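
As a minimal sketch of how the variables in the table above might be combined before launching an application, consider the following shell snippet. The sampler frequency of 100 Hz is an arbitrary illustrative value, not a recommendation from this README, and `./my_caliper_app` is a hypothetical placeholder for a Caliper-enabled binary.

```shell
# Enable the services listed in the table; "event" is only needed for instrumentation.
export CALI_SERVICES_ENABLE=event:sampler:timestamp:pthread:dcdbpusher
# Sampler frequency in Hz (assumed example value, tune to your overhead budget).
export CALI_SAMPLER_FREQUENCY=100
# Enrich every snapshot with the timestamp of its creation.
export CALI_TIMER_TIMESTAMP=true
# SUS cycle in seconds; 15 is the documented default, set explicitly here for clarity.
export CALI_DCDBPUSHER_SUS_CYCLE=15

# Hypothetical application; uncomment and replace with your own Caliper-enabled program.
# ./my_caliper_app
```

Setting the variables in the launching shell (or a job script) keeps the configuration outside the application binary, so the same build can be run with different sampling settings.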
### Pusher plugin side
The Pusher plugin serves as a data sink for the snapshot data from the Caliper service. It can handle multiple different applications at once. However, it is mainly intended for only one application with multiple threads/(MPI-)processes.
The plugin consumes the PC (snapshot) data from shared memory and resolves it to function names via the provided shared symbol data. If a PC value could not be resolved, it requests a rebuild of the symbol data index. Every read cycle the plugin consumes all snapshots available in the queue of a process.
From every snapshot the plugin builds a name of the form CPU/BinaryFile::functionName. CPU is the CPU number where the snapshot was captured, BinaryFile is the full path of the executable or library the PC resolves to, and functionName is the symbol within the binary (optional, as functionName cannot always be resolved). For every unique name a new sensor is created. The number of encounters of a sensor name during one read cycle gets stored in the sensor, from where it will be pushed to the CollectAgent. Therefore the read cycle interval also determines the granularity of the sampling data. A lower interval results in more fine-grained sampling data resolution but also requires more memory in the storage backend. After an application terminates and "disconnects", the corresponding sensors may get cleared.
The plugin consumes the snapshot data from the shared-memory queue. For each unique piece of snapshot data a new sensor is created. For every subsequent encounter of the same data (function name or annotation), a reading value of 1 is stored with the sensor.
After an application terminates or times out, or when the maxSensors value is reached, all sensors get cleared.
Explanation of the values specific for this plugin:
| Value | Explanation |
|:----- |:----------- |
| interval | In case of sampling, the interval value of a SensorGroup (or SingleSensor) has a small side effect. Within the same read cycle, multiple encounters of the same function name will be aggregated. Instead of a value of one for each encounter, only the aggregated value at the end of the read cycle will actually be stored with the corresponding sensor. Therefore the read cycle interval also determines the granularity of the sampling data. A lower interval results in more fine-grained sampling data resolution but also requires more memory in the storage backend. |
| maxSensors | To prevent unbounded memory usage from the creation of new Sensor objects, one can specify a threshold here. If the number of sensors exceeds this value, they will be cleared. Default is 500. |
| timeout | Number of read cycles after which a Caliper application is assumed to be terminated if no new values have been received. The connection (shared memory) is torn down on timeout. Default is 15. |
### Shortcomings
Usage of the Caliper plugin is currently obstructed by a few shortcomings:
* The Caliper framework has to be integrated manually by users into their application for this plugin to work.
* The Caliper framework seems to interfere with Intel libraries, which may cause [application crashes](https://github.com/LLNL/Caliper/issues/223).
## Metadata Management <a name="metadataManagement"></a>
Sensor metadata can be included in Pusher configurations, and will be published to the Storage Backend if the _auto-publish_ feature is enabled. A metadata block looks like the following:
......