A Java CLI, compiled to a native executable with GraalVM, that captures disk I/O statistics per process and emits them as a stream of data that Prometheus can store and interpret. The resulting graphs show which process or group of processes used the disk most during a given period of time.
## Stats reads
pgio runs only on modern Linux kernels (> 2.2.x). It reads /proc/vmstat (no documentation found for this one, help welcome) and /proc/<pid>/io (http://man7.org/linux/man-pages/man5/proc.5.html).
Taking iotop as an example (using this as a source of info: https://unix.stackexchange.com/questions/248197/iotop-showing-1-5-mb-s-of-disk-write-but-all-programs-have-0-00-b-s/248218#248218):
iotop reads information per process from /proc/<pid>/io, in particular:
```
rchar: characters read
The number of bytes which this task has caused to be
read from storage. This is simply the sum of bytes
which this process passed to read(2) and similar system
calls. It includes things such as terminal I/O and is
unaffected by whether or not actual physical disk I/O
was required (the read might have been satisfied from
pagecache).
wchar: characters written
The number of bytes which this task has caused, or
shall cause to be written to disk. Similar caveats
apply here as with rchar.
```
iotop reads information globally from /proc/vmstat, in particular:
```
pgpgin – Number of kilobytes the system has paged in from disk per second.
pgpgout – Number of kilobytes the system has paged out to disk per second.
```
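As an illustration, the two counters above could be extracted with a few lines of Java. This is a minimal sketch, not pgio's actual code: the class name `VmstatSketch` and the method `parseCounters` are hypothetical. It also assumes that /proc/vmstat holds one `name value` pair per line and that the values are cumulative counters, so a per-second figure would come from the difference between two samples divided by the elapsed time.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VmstatSketch {

    // Parse "name value" lines and keep only the requested counters.
    // Lines that do not have exactly two fields are skipped.
    static Map<String, Long> parseCounters(List<String> lines, String... names) {
        List<String> wanted = Arrays.asList(names);
        Map<String, Long> counters = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2 && wanted.contains(parts[0])) {
                counters.put(parts[0], Long.parseLong(parts[1]));
            }
        }
        return counters;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: running on Linux, where /proc/vmstat is world-readable.
        List<String> lines = Files.readAllLines(Paths.get("/proc/vmstat"));
        System.out.println(parseCounters(lines, "pgpgin", "pgpgout"));
    }
}
```

Keeping the parser separate from the file access makes it possible to feed it canned lines, which is convenient on machines without /proc.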
Combining per-process info with global info, iotop calculates the percentage of read/write throughput each process is consuming.
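The percentage calculation can be sketched as follows. This is only an illustration under stated assumptions, not the code of iotop or pgio: the class `ShareSketch` and the method `percentShare` are hypothetical, and both the per-process deltas and the system-wide delta are assumed to cover the same interval and to be expressed in the same unit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ShareSketch {

    // For each pid, compute its share (in percent) of the system-wide delta.
    // A non-positive system delta yields 0 for every process to avoid dividing by zero.
    static Map<Integer, Double> percentShare(Map<Integer, Long> perProcessDelta, long systemDelta) {
        Map<Integer, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<Integer, Long> e : perProcessDelta.entrySet()) {
            double pct = systemDelta > 0 ? 100.0 * e.getValue() / systemDelta : 0.0;
            shares.put(e.getKey(), pct);
        }
        return shares;
    }

    public static void main(String[] args) {
        // Made-up deltas for two hypothetical pids over one sampling interval.
        Map<Integer, Long> deltas = new LinkedHashMap<>();
        deltas.put(101, 250L);
        deltas.put(202, 750L);
        System.out.println(percentShare(deltas, 1000L));
    }
}
```

Since the global counters also include I/O that no listed process accounts for, the per-process shares do not necessarily add up to 100.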
However, the iotop discussed on Stack Exchange seems to be a Python remake of the original iotop (https://github.com/analogue/iotop).
The real iotop code (https://github.com/Tomas-M/iotop) uses Taskstats (https://www.kernel.org/doc/Documentation/accounting/taskstats.txt).
Since we want to know the bytes read and written per process, we should look at these fields of /proc/<pid>/io:
```
read_bytes: bytes read
Attempt to count the number of bytes which this process
really did cause to be fetched from the storage layer.
This is accurate for block-backed filesystems.
write_bytes: bytes written
Attempt to count the number of bytes which this process
caused to be sent to the storage layer.
cancelled_write_bytes:
The big inaccuracy here is truncate. If a process
writes 1MB to a file and then deletes the file, it will
in fact perform no writeout. But it will have been
accounted as having caused 1MB of write. In other
words: this field represents the number of bytes which
this process caused to not happen, by truncating
pagecache. A task can cause "negative" I/O too. If this
task truncates some dirty pagecache, some I/O which
another task has been accounted for (in its
write_bytes) will not be happening.
```
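Reading these three fields for a single process could look like the sketch below. The class `ProcIoSketch` and its methods are hypothetical names, not pgio's implementation. Subtracting `cancelled_write_bytes` from `write_bytes` is one possible way to account for truncated page cache and is an assumption made here for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ProcIoSketch {
    final long readBytes;
    final long writeBytes;
    final long cancelledWriteBytes;

    ProcIoSketch(long readBytes, long writeBytes, long cancelledWriteBytes) {
        this.readBytes = readBytes;
        this.writeBytes = writeBytes;
        this.cancelledWriteBytes = cancelledWriteBytes;
    }

    // Parse "key: value" lines as found in /proc/<pid>/io; other keys are ignored.
    static ProcIoSketch parse(List<String> lines) {
        long read = 0L;
        long write = 0L;
        long cancelled = 0L;
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            switch (key) {
                case "read_bytes": read = Long.parseLong(value); break;
                case "write_bytes": write = Long.parseLong(value); break;
                case "cancelled_write_bytes": cancelled = Long.parseLong(value); break;
                default: break;
            }
        }
        return new ProcIoSketch(read, write, cancelled);
    }

    // Assumption: written bytes net of cancelled writes, clamped at zero.
    long netWriteBytes() {
        return Math.max(0L, writeBytes - cancelledWriteBytes);
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current process, which needs no extra privileges.
        String pid = args.length > 0 ? args[0] : "self";
        ProcIoSketch io = parse(Files.readAllLines(Paths.get("/proc", pid, "io")));
        System.out.println("read_bytes=" + io.readBytes + " net_write_bytes=" + io.netWriteBytes());
    }
}
```

The values are cumulative for the lifetime of the process, so throughput over an interval is the difference between two successive reads.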
## Build
To build the project as a tar.gz with all Java dependencies:
```
mvn clean package -P assembler
```
To build the project as a tar.gz with an executable built using GraalVM `native-image` tool:
```
mvn clean package -P executable
```
## Run
This prints the collected stats of all processes every 3 seconds (it should be run as root):
```
bin/pgio
```
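Scanning all processes amounts to walking the numeric directories under /proc. The sketch below shows one way to do that; `PidScanSketch`, `isPid` and `listPids` are hypothetical names and this is not necessarily how pgio enumerates processes. Reading /proc/<pid>/io of processes owned by other users generally requires elevated privileges, which is why root is recommended.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class PidScanSketch {

    // A /proc entry is treated as a process directory when its name is all ASCII digits.
    static boolean isPid(String name) {
        return !name.isEmpty() && name.chars().allMatch(c -> c >= '0' && c <= '9');
    }

    // Collect the pids found directly under the given proc root.
    static List<Integer> listPids(Path procRoot) throws IOException {
        List<Integer> pids = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(procRoot)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (isPid(name)) {
                    pids.add(Integer.parseInt(name));
                }
            }
        }
        return pids;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: running on Linux with /proc mounted.
        System.out.println(listPids(Paths.get("/proc")).size() + " processes found");
    }
}
```

A process can exit between the directory listing and the read of its io file, so a real collector would need to tolerate missing files.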
## Options
```
Option                   Description
------                   -----------
-g, --group <String>     Group results using specified group configuration file
-h, --help               Displays this help message and quit
-i, --interval <String>  Interval in milliseconds to gather stats (default: 3000)
-o, --show-other         Print read/write data not accounted by any listed process
--ppid <String>          Parent pid of the process to scan (if not specified will scan all processes)
--print-header           Print header
-s, --show-system        Print read/write data for the whole system
-v, --version            Show version and quit
```
### Group configuration file
Specifying this file enables a special mode where, instead of single processes, groups with aggregated data are printed.
A JSON file describing the groups has the following syntax (regular expressions use the pattern syntax from [java.util.regex.Pattern](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html)):