dcdb issues (https://gitlab.lrz.de/dcdb/dcdb/-/issues)

Issue #23: Document whitespace insertion in sensor ID naming in Cassandra DB
URL: https://gitlab.lrz.de/dcdb/dcdb/-/issues/23
Updated: 2024-02-09 | Author: Philipp Friese (philipp.friese@tum.de)

Directly querying DCDB-aggregated data from the Cassandra DB shows unexpected sensor ID name padding.
Given a `dcdbpusher` configured only with the `procfs` plugin, and the stock `procfs` plugin configuration present in [`procfs.conf`](https://gitlab.lrz.de/dcdb/dcdb/-/blob/development/dcdbpusher/config/procfs.conf), the following sensor IDs are inserted into the database:
```
cassandra@cqlsh> select sid,ws, count(*) from dcdb.sensordata group by sid,ws;

 sid                          | ws   | count
------------------------------+------+-------
             /test/ctxt       | 2821 |    18
      /test/meminfo/anonpages | 2821 |    12
   /test/vmstat/nr-file-pages | 2821 |     6
             /test/col-user   | 2821 |    12
 /test/vmstat/nr-dirty-thresh | 2821 |     9
         /test/cpu36/col-idle | 2821 |    18
             /test/col-idle   | 2821 |    18
        /test/meminfo/memfree | 2821 |     6
```
Because the `sid` values are right-aligned by cqlsh, the padding is visible in the output above: sensor ID `/test/col-idle`, for example, contains two trailing whitespace characters, i.e. it is stored as `'/test/col-idle  '`. This is deliberate behaviour, as stated in a comment in the function [`SensorId::mqttTopicConvert`](https://gitlab.lrz.de/dcdb/dcdb/-/blob/development/lib/src/sensorid.cpp?ref_type=heads#L48-59):
```cpp
// lib/src/sensorid.cpp:48-59
/* Fill string with trailing whitespace to 128bits so Cassandra's ByteOrder
   Partitioner creates proper numerically sorted tokens */
if (data.size() < 16) {
    data.append(16 - data.size(), ' ');
}
```
I found this after digging through the code for some time in search of an explanation for the trailing whitespace. It might be helpful to document this so that users unfamiliar with these intricacies are not surprised.

Issue #22: Infinite loop while parsing option characters
URL: https://gitlab.lrz.de/dcdb/dcdb/-/issues/22
Updated: 2024-01-26 | Author: Philipp Friese (philipp.friese@tum.de)

The DCDB source code assumes `char` to be signed. This causes issues on platforms where `char` is unsigned, such as ARMv8.
This mainly caused issues in `getopt` related routines:
```cpp
// dcdbpusher/dcdbpusher.cpp
int main(int argc, char **argv) {
    [..]
    char c;
    while ((c = getopt(argc, argv, opts)) != -1) {
        [..]
    }
    [..]
}
```
`getopt` [returns a signed `int`](https://www.man7.org/linux/man-pages/man3/getopt.3.html), which is set to `-1` once the option characters are exhausted. Given the cast to `char` in the code above, on platforms with unsigned `char`s, this leads to an infinite loop as `-1` is interpreted as `255`.
I encountered three cases:
- [`dcdbpusher.cpp`](https://gitlab.lrz.de/dcdb/dcdb/-/blob/development/dcdbpusher/dcdbpusher.cpp?ref_type=heads#L205-206)
- [`collectagent.cpp`](https://gitlab.lrz.de/dcdb/dcdb/-/blob/development/collectagent/collectagent.cpp?ref_type=heads#L180-181)
- [`GrafanaServer.cpp`](https://gitlab.lrz.de/dcdb/dcdb/-/blob/development/grafana/GrafanaServer.cpp?ref_type=heads#L120-121)
These issues appeared on a system with the following parameters:
- OS: OpenSUSE Leap 15.5
- Compiler: GCC-13 (13.2.1)
- Platform: ARMv8 aarch64 (Ampere Altra Max)
- Commit: [`8327f336be5d41a61c6f78df25048341e85af3f3`](https://gitlab.lrz.de/dcdb/dcdb/-/commit/8327f336be5d41a61c6f78df25048341e85af3f3) (development branch, cloned 2024-01-23)
I will create a merge request with a proposed fix: simply adjusting the variable type from `char` to `int`. With this change, I was able to resolve the issue on my end.

Issue #21: Management of default TTL values
URL: https://gitlab.lrz.de/dcdb/dcdb/-/issues/21
Updated: 2019-12-17 | Author: Ghost User

As of now, the values of a sensor always use its own TTL. If a TTL is defined for the sensor, the data expires according to it; otherwise a default TTL of 0 is used and the data never expires. The TTL value defined in the Collect Agent configuration is used only for sensors that are not published (i.e., have no entry in the publishedsensors table).
We should evaluate whether it would be convenient to use the Collect Agent's TTL value also when a sensor is published but has no TTL value associated with it, rather than only when the sensor is not published at all.

Issue #20: Introduce separate table for sensor meta data
URL: https://gitlab.lrz.de/dcdb/dcdb/-/issues/20
Updated: 2019-12-17 | Author: Ott, Michael

Move all sensor meta data from the `publishedsensors` table to a separate table. The `publishedsensors` table would then only be used to actually publish sensors (and hide others) and to create aliases, whereas the new meta data table would hold information about hidden sensors as well.
Fields in the meta data table:
* SID
* Unit
* Factor
How should virtual sensors be handled? In a separate table?

Issue #19: OpenSSL 1.1.1c too recent
URL: https://gitlab.lrz.de/dcdb/dcdb/-/issues/19
Updated: 2019-12-09 | Author: Ghost User

The OpenSSL update to version 1.1.1c **may** cause problems with other software.
If `dcdb/install/lib` is in `LD_LIBRARY_PATH` (e.g. by sourcing `dcdb/install/dcdb.bash`), DCDB's bleeding-edge OpenSSL library may be used instead of the system's built-in OpenSSL. This can cause issues with other software that relies on older or custom OpenSSL versions which still support protocols like SSLv3.

Immediate solution: remove DCDB's install directory from `LD_LIBRARY_PATH` (e.g. by opening a new terminal).

Long-term solution: wait until other software supports more recent OpenSSL libraries.

Issue #18: DCDBPusher segfault on return from main
URL: https://gitlab.lrz.de/dcdb/dcdb/-/issues/18
Updated: 2019-06-17 | Author: Ghost User

DCDBPusher will crash with a segmentation fault after returning from main under certain configurations. A wide variety of factors affect how and when the bug occurs, such as:
- Number of plugins, sensors and sensor groups instantiated;
- Compilation and execution environment;
- Minimal, uncorrelated changes to the code.
The bug has been confirmed to occur independently of the REST API, the data analytics framework, and the specific pusher plugins being used. A temporary fix has been identified: NOT unloading dynamic libraries when destroying the PluginManager object. However, it is not certain whether this fix merely reduces the likelihood of the bug or actually eliminates it.
We suspect the bug might be related to the termination and destruction of objects used by the Boost.Asio backend (`boost::asio::io_service`), which might have global scope.

Issue #17: CI/CD Runners
URL: https://gitlab.lrz.de/dcdb/dcdb/-/issues/17
Updated: 2020-05-02 | Author: Ghost User

Set up runners for the dcdb and dcdbpusher CI/CD pipelines.
Pipelines are switched off for the moment until I figure out how to properly set up a runner.

Issue #16: Automatic tests (Wishlist)
URL: https://gitlab.lrz.de/dcdb/dcdb/-/issues/16
Updated: 2018-12-21 | Author: Ghost User

It would be nice to have some sort of automatic build checks and test cases to detect errors (also for dcdbpusher). Perhaps we could set up GitLab's CI/CD for this.

Issue #15: Collectagent scalability issues
URL: https://gitlab.lrz.de/dcdb/dcdb/-/issues/15
Updated: 2018-12-19 | Author: Ghost User

The collectagent does not work properly when a relatively high incoming message rate is reached.
On a Macbook Pro 2016 (2.9Ghz Intel i5), issues can be consistently reproduced with the following series of actions:
1) Launch dcdbpusher with full ProcFS instantiation and 1000ms sampling period (circa 2600 sensors/sec)
2) Launch second dcdbpusher with same configuration as No.1, but with a sampling period of 100ms (circa 26000 sensors/sec)
3) Kill second dcdbpusher after some time
Collectagent will work fine after step 1, with all sensors being pushed correctly to Cassandra and a message rate of circa 900/s, reflecting the number of metrics. After step 2, the message rate will slightly decrease, meaning collectagent cannot keep up with the number of incoming messages, and data in Cassandra will be added at an extremely low rate. After step 3, the message rate will sharply increase once again, and after a few seconds the collectagent is able to flush all incoming messages from dcdbpusher 1, whose recent data is once again reflected within Cassandra.
This problem is very likely scalability-related. The average processing time for a message with 10 sensor readings in the collectagent is between 2 and 3 ms, which is too high. The issue is dramatically worsened when MQTT QoS features are enabled (level 1 or 2), leading to messages constantly timing out, being resent by dcdbpusher, and piling up indefinitely.
It should also be noted that all dcdbpusher connections are handled by one thread - sequentially - until the maximum number of connection slots (16) is reached. Setting a maximum number of connection slots of 1 does not lead to issues in the scenario presented above.
Hypothetical solutions:
* Switch to Cassandra async operations (likely main culprit)
* Aggregate multiple Cassandra insert operations in batches
* Avoid heap allocations when processing messages
* Pre-allocate cache vectors to avoid memory reallocations

Issue #14: dcdbquery not compiling
URL: https://gitlab.lrz.de/dcdb/dcdb/-/issues/14
Updated: 2018-12-19 | Author: Ghost User

dcdb (more precisely: dcdbquery) is currently not compiling for me (using `make all` on the master branch):
Multiple errors of the form:
```
query.cpp:108:31: error: call of overloaded ‘TimeStamp(long long unsigned int)’ is ambiguous
DCDB::TimeStamp prevT(0llu);
```
Changing `0llu` to `static_cast<uint64_t>(0)` solves the compile errors but reveals linker errors, e.g.:
```
/LRZ/dcdb/dcdb/tools/dcdbquery/query.cpp:142: undefined reference to `DCDB::delta(unsigned long, unsigned long, long*)'
```
(abbreviated)

Issue #13: Minor refinements
URL: https://gitlab.lrz.de/dcdb/dcdb/-/issues/13
Updated: 2019-03-13 | Author: Ghost User

Some minor refinements should eventually be made so that the collectagent and dcdbpusher are consistent with each other:
* The option for defining the cache interval is -c in dcdbpusher but -C in collectagent. It should be made uniform for both programs (-c in collectagent is used for the Cassandra host);
* Make the syntax of the configuration files uniform (e.g. boolean true is written as "on" in some cases and "true" in others; use only one value);
* Add logging to the collectagent using the BOOST facility like in dcdbpusher;
* Rename the global.conf configuration file for dcdbpusher to dcdbpusher.conf or pusher.conf;
* Add display of default values for command line options in dcdbpusher, like the collectagent does.

Issue #12: Manage sensor hierarchies in Grafana
URL: https://gitlab.lrz.de/dcdb/dcdb/-/issues/12
Updated: 2019-12-17 | Author: Daniele Tafani | Label: WebUI

Investigate the use of Grafana variables to define placeholders for sensor hierarchies (e.g., system, rack, chassis, ...).

Issue #11: Fix collectagent anomalous behaviour when pushers terminate abruptly
URL: https://gitlab.lrz.de/dcdb/dcdb/-/issues/11
Updated: 2018-11-16 | Author: Ghost User

When a DCDB pusher terminates abruptly (without flushing the queues and properly closing its connections), the collect agent may enter a "frozen" state (likely caused by corruption in the input MQTT message queue) in which no further MQTT messages are processed and pushed to the Cassandra database, even if new pushers connect. To restore normal functionality, the collect agent must be restarted.
MQTT messages are correctly sent by the pusher to the collect agent, but they are not correctly parsed and are therefore discarded (the collect agent will often show "message malformed" or "wrong mqtt topic format" messages).
The appearance of this anomalous behaviour is not deterministic, but depends on the number and frequency of pushed sensors, becoming more frequent (roughly 40% probability at each pusher termination) when 2000+ sensors are pushed each second. This bug mainly occurs when pushers are terminated without cleanup, but it has been observed once even in the case of proper termination (under the SIGINT signal).

Issue #10: DCDB installation with rpm
URL: https://gitlab.lrz.de/dcdb/dcdb/-/issues/10
Updated: 2018-04-26 | Author: Daniele Tafani

Issue #9: Manage overflow for monotonic sensors
URL: https://gitlab.lrz.de/dcdb/dcdb/-/issues/9
Updated: 2018-08-03 | Author: Daniele Tafani

Implement a mechanism to detect overflow for monotonic sensors (e.g., energy consumption) so that sensor operations (e.g., deltas) are not affected negatively. Basically, substitute the field "integrable" with a "flag" field which will hold a 64-bit mask for different potential flags (e.g., integrable, monotonic, etc.). This flag should be configurable via CLI (e.g., `-flag monotonic,integrable,...`).
Issue #8: Move standard command line arguments in dcdblib
URL: https://gitlab.lrz.de/dcdb/dcdb/-/issues/8
Updated: 2018-04-25 | Author: Daniele Tafani

Standard command line args (like `-h hostname`, `-u user`, `-p pass`) should be moved to dcdblib, and only dedicated ones should remain within the respective tools (e.g., sensor names, time ranges).

Issue #7: Design a DCDB Data Source plugin for Grafana
URL: https://gitlab.lrz.de/dcdb/dcdb/-/issues/7
Updated: 2018-11-09 | Author: Daniele Tafani | Label: WebUI

We could start by looking at the generic JSON data source plugin here: https://github.com/grafana/simple-json-datasource
Or just by leveraging this: https://github.com/pjta/grafana

Issue #6: Get rid of the week in the timestamps
URL: https://gitlab.lrz.de/dcdb/dcdb/-/issues/6
Updated: 2018-05-17 | Author: Daniele Tafani

We should use composite partition keys in Cassandra to get rid of the week in the timestamps, allowing for a nice 1-to-1 match between sensor names and SIDs.

Issue #5: Switch to Apache Cassandra 3
URL: https://gitlab.lrz.de/dcdb/dcdb/-/issues/5
Updated: 2019-05-22 | Author: Daniele Tafani

Time to do it, I guess...

Issue #4: Use topics in dcdbquery to query sensors
URL: https://gitlab.lrz.de/dcdb/dcdb/-/issues/4
Updated: 2019-06-17 | Author: Ott, Michael

Allow topics to be used in dcdbquery instead of published sensor names.