dcdbpusher/sensors/nvml/README
This DCDB plugin uses the NVML library to capture the following GPU metrics (a minimal query sketch follows the sensor list):
* Power - sensor /test/nvml/power
Uses the nvmlDeviceGetPowerUsage function to retrieve the power usage,
in milliwatts, of this GPU and its associated circuitry (e.g. memory).
* Temperature - sensor /test/nvml/temp
Uses the nvmlDeviceGetTemperature function to retrieve the current
temperature readings for the device, in degrees C.
* Energy - sensor /test/nvml/energy
Uses the nvmlDeviceGetTotalEnergyConsumption function to retrieve total
energy consumption for this GPU in millijoules (mJ) since the driver was
last reloaded.
* Running Compute Processes - sensor /test/nvml/run_prcs
Set up to use the nvmlDeviceGetComputeRunningProcesses function to get
the number of running processes with a compute context (e.g. CUDA
applications that have an active context) on the device.
* ECC errors - sensor /test/nvml/ecc_errors
Set up to use the nvmlDeviceGetTotalEccErrors function to retrieve the
NVML_MEMORY_ERROR_TYPE_CORRECTED errors (memory errors that were corrected;
for ECC errors these are single-bit errors, for Texture memory these are
errors fixed by resend) for the NVML_VOLATILE_ECC counter (volatile counts
are reset each time the driver loads).
Requires ECC Mode to be enabled.
* Graphics Clock speed - sensor /test/nvml/clock_graphics
Set up to use the nvmlDeviceGetClock function to retrieve the clock speed
(current actual clock value) for the graphics clock domain in MHz.
* SM Clock speed - sensor /test/nvml/clock_sm
Set up to use the nvmlDeviceGetClock function to retrieve the clock speed
(current actual clock value) for the SM clock domain in MHz.
* Memory Clock speed - sensor /test/nvml/clock_mem
Set up to use the nvmlDeviceGetClock function to retrieve the clock speed
(current actual clock value) for the memory clock domain in MHz.
* Total memory - sensor /test/nvml/memory_tot
Set up to use the nvmlDeviceGetMemoryInfo function to retrieve the amount
of total memory available on the device, in bytes.
* Free memory - sensor /test/nvml/memory_free
Set up to use the nvmlDeviceGetMemoryInfo function to retrieve the amount
of free memory available on the device, in bytes.
* Used memory - sensor /test/nvml/memory_used
Set up to use the nvmlDeviceGetMemoryInfo function to retrieve the amount
of used memory on the device, in bytes.
* Memory utilisation rate - sensor /test/nvml/util_mem
Set up to use the nvmlDeviceGetUtilizationRates function to retrieve the
current utilization rates for the memory subsystem. It's reported as a
percent of time over the past sample period during which global (device)
memory was being read or written.
* GPU utilisation - sensor /test/nvml/util_gpu
Set up to use the nvmlDeviceGetUtilizationRates function to retrieve the
current utilization rates for the GPU. It's reported as a percent of time
over the past sample period during which one or more kernels were executing
on the GPU.
* PCIe throughput - sensor /test/nvml/pcie_thru
Set up to use the nvmlDeviceGetPcieThroughput function to retrieve PCIe
utilization information. This function queries a byte counter over a 20ms
interval, so the value reported is the PCIe throughput (NVML_PCIE_UTIL_COUNT)
over that interval. The throughput is returned in KB/s.
Other possible counters are: NVML_PCIE_UTIL_TX_BYTES (transmitted bytes)
and NVML_PCIE_UTIL_RX_BYTES (received bytes).
* Fan - sensor /test/nvml/fan
Set up to use the nvmlDeviceGetFanSpeed function to retrieve the intended
operating speed of the device's fan. The fan speed is expressed as a percent
of the maximum, i.e. full speed is 100%.
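
The following is a minimal, standalone sketch (not taken from the plugin
source) showing how the NVML calls listed above can be combined to read
each metric for GPU 0. It assumes the NVML header and the nvidia-ml library
are installed; the file name nvml_query.cpp and the printed labels are just
for illustration, and the plugin's sensor naming and configuration handling
are omitted. Compile e.g. with: g++ nvml_query.cpp -o nvml_query -lnvidia-ml

    // Standalone sketch: read the metrics described above from GPU 0.
    // Not part of the plugin; error handling is reduced to return-code checks.
    #include <nvml.h>
    #include <cstdio>

    int main() {
        if (nvmlInit() != NVML_SUCCESS) {
            fprintf(stderr, "failed to initialise NVML\n");
            return 1;
        }

        nvmlDevice_t dev;
        if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) {
            fprintf(stderr, "could not get a handle for GPU 0\n");
            nvmlShutdown();
            return 1;
        }

        unsigned int power_mw = 0, temp_c = 0, clock_mhz = 0, fan_pct = 0, pcie_kbps = 0;
        unsigned long long energy_mj = 0, ecc_corrected = 0;
        nvmlUtilization_t util;
        nvmlMemory_t mem;

        if (nvmlDeviceGetPowerUsage(dev, &power_mw) == NVML_SUCCESS)
            printf("power:          %u mW\n", power_mw);
        if (nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp_c) == NVML_SUCCESS)
            printf("temperature:    %u C\n", temp_c);
        if (nvmlDeviceGetTotalEnergyConsumption(dev, &energy_mj) == NVML_SUCCESS)
            printf("energy:         %llu mJ since driver reload\n", energy_mj);
        if (nvmlDeviceGetClock(dev, NVML_CLOCK_GRAPHICS, NVML_CLOCK_ID_CURRENT, &clock_mhz) == NVML_SUCCESS)
            printf("graphics clock: %u MHz\n", clock_mhz);
        if (nvmlDeviceGetClock(dev, NVML_CLOCK_SM, NVML_CLOCK_ID_CURRENT, &clock_mhz) == NVML_SUCCESS)
            printf("SM clock:       %u MHz\n", clock_mhz);
        if (nvmlDeviceGetClock(dev, NVML_CLOCK_MEM, NVML_CLOCK_ID_CURRENT, &clock_mhz) == NVML_SUCCESS)
            printf("memory clock:   %u MHz\n", clock_mhz);
        if (nvmlDeviceGetMemoryInfo(dev, &mem) == NVML_SUCCESS)
            printf("memory:         %llu total / %llu free / %llu used bytes\n",
                   mem.total, mem.free, mem.used);
        if (nvmlDeviceGetUtilizationRates(dev, &util) == NVML_SUCCESS)
            printf("utilisation:    gpu %u%%, memory %u%%\n", util.gpu, util.memory);
        if (nvmlDeviceGetPcieThroughput(dev, NVML_PCIE_UTIL_COUNT, &pcie_kbps) == NVML_SUCCESS)
            printf("PCIe:           %u KB/s\n", pcie_kbps);
        if (nvmlDeviceGetFanSpeed(dev, &fan_pct) == NVML_SUCCESS)
            printf("fan:            %u%%\n", fan_pct);
        if (nvmlDeviceGetTotalEccErrors(dev, NVML_MEMORY_ERROR_TYPE_CORRECTED,
                                        NVML_VOLATILE_ECC, &ecc_corrected) == NVML_SUCCESS)
            printf("ECC corrected:  %llu\n", ecc_corrected);

        // To count running compute processes, call with infoCount = 0 and a
        // NULL buffer; NVML reports the number of processes via infoCount
        // (return code is NVML_ERROR_INSUFFICIENT_SIZE when any are running).
        unsigned int proc_count = 0;
        nvmlReturn_t rc = nvmlDeviceGetComputeRunningProcesses(dev, &proc_count, NULL);
        if (rc == NVML_SUCCESS || rc == NVML_ERROR_INSUFFICIENT_SIZE)
            printf("compute procs:  %u\n", proc_count);

        nvmlShutdown();
        return 0;
    }

Every call checks its return code because some readings (e.g. total energy
consumption, ECC counters) are only available on GPUs and driver versions
that support them.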