Document whitespace insertion in sensor ID naming in Cassandra DB
Directly querying DCDB-aggregated data from the Cassandra DB shows unexpected sensor ID name padding.
Given a `dcdbpusher` configured only with the `procfs` plugin, and the stock `procfs` plugin configuration present in `procfs.conf`, the following sensor IDs are inserted into the database:
cassandra@cqlsh> select sid,ws, count(*) from dcdb.sensordata group by sid,ws;
 sid                          | ws   | count
------------------------------+------+-------
             /test/ctxt       | 2821 |    18
      /test/meminfo/anonpages | 2821 |    12
   /test/vmstat/nr-file-pages | 2821 |     6
             /test/col-user   | 2821 |    12
 /test/vmstat/nr-dirty-thresh | 2821 |     9
         /test/cpu36/col-idle | 2821 |    18
             /test/col-idle   | 2821 |    18
        /test/meminfo/memfree | 2821 |     6
The `sid` values are right-aligned in the cqlsh output, which reveals that short sensor IDs carry trailing whitespace. For example, the sensor ID `/test/col-idle` is stored with two trailing spaces, i.e. `'/test/col-idle  '`. This is deliberate behaviour, as stated in a comment in the function `SensorId::mqttTopicConvert`:
> lib/src/sensorid.cpp:L48-59
/* Fill string with trailing whitespace to 128bits so Cassandra's ByteOrder
   Partitioner creates proper numerically sorted tokens */
if (data.size() < 16) {
    data.append(16 - data.size(), ' ');
}
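
To make the effect of the snippet concrete, here is a minimal stand-alone sketch of the padding rule, plus a helper that strips the padding again. The function names `padSensorId` and `trimSensorId` are hypothetical and not part of DCDB. The sketch only assumes what the quoted comment states: IDs shorter than 16 bytes (128 bits) are filled with trailing spaces, and longer IDs are left unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only; these names do not exist in DCDB.

// Pads a sensor ID with trailing spaces to at least 16 bytes (128 bits),
// mirroring the quoted snippet. IDs of 16 bytes or more are returned unchanged.
std::string padSensorId(std::string sid) {
    constexpr std::size_t kMinLength = 16;
    if (sid.size() < kMinLength) {
        sid.append(kMinLength - sid.size(), ' ');
    }
    assert(sid.size() >= kMinLength);
    return sid;
}

// Removes trailing spaces, e.g. before displaying an ID read back
// from the database.
std::string trimSensorId(std::string sid) {
    const std::size_t last = sid.find_last_not_of(' ');
    sid.erase(last == std::string::npos ? 0 : last + 1);
    return sid;
}
```

Under these assumptions, `padSensorId("/test/col-idle")` returns a 16-character string ending in two spaces, while `/test/cpu36/col-idle` (20 characters) is returned as-is. A client that compares IDs against values read directly from Cassandra would presumably need to apply the same padding, or trim the stored values first.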
I only found this after digging through the code for some time in search of an explanation for the trailing whitespace. It would be helpful to document this behaviour so that users who are not familiar with these intricacies are not surprised.