Commit 2da6b52d authored by Matteo Melli

Improved README.md

parent 1c32db6b
# pgio
A Java CLI turned in executable using GraalVM that is able to capture stats on disk IO usage per process and to turn this info into a stream of data that can be stored and interpreted by Prometheus to draw graphs that explain which process or group of processes is using the disk most during a period of time.
A Java CLI, compiled into a native executable with GraalVM, that captures
per-process disk IO usage stats and turns them into a stream of data (CSV by
default, with Prometheus also supported) that can be stored and analyzed to
find out which part of PostgreSQL uses the disk most over a period of time.
## Stats reads
## Stats collected
pgio runs only on Linux modern kernel (> 2.2.x) reading /proc/vmstat (help no docs available on this one) and /proc/<pid>/io (http://man7.org/linux/man-pages/man5/proc.5.html)
pgio runs only on modern Linux kernels (>= 2.6.x), reading `/proc/vmstat` and
`/proc/<pid>/(cmdline|stat|io)`
(http://man7.org/linux/man-pages/man5/proc.5.html)
Taking iotop as an explample (using this as a source of info: https://unix.stackexchange.com/questions/248197/iotop-showing-1-5-mb-s-of-disk-write-but-all-programs-have-0-00-b-s/248218#248218):
Taking iotop as a reference (see https://unix.stackexchange.com/questions/248197/iotop-showing-1-5-mb-s-of-disk-write-but-all-programs-have-0-00-b-s/248218#248218):
iotop read information per process from /proc/<pid>/io, in particular:
iotop reads per-process information from `/proc/<pid>/io`, in particular:
```
rchar: characters read
......@@ -33,37 +39,47 @@ pgpgin – Number of kilobytes the system has paged in from disk per second.
pgpgout – Number of kilobytes the system has paged out to disk per second.
```
Comnining per process info with global info iotop calculate the % of write/read throughput each process is consuming.
Combining per-process info with global info, iotop calculates the % of
write/read throughput each process is consuming.
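
The percentage calculation described above can be sketched as follows. This is only an illustration, not code from pgio or iotop: the class `IoShare`, the method `percentOfTotal` and the sample numbers are hypothetical, and the inputs are assumed to be byte deltas measured over the same sampling interval.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IoShare {

    /**
     * Returns, for each pid, its share (0 to 100) of the global throughput.
     * Both the per-process values and the global value are assumed to be
     * byte deltas taken over the same sampling interval.
     */
    static Map<Integer, Double> percentOfTotal(Map<Integer, Long> perProcessBytes,
                                               long globalBytes) {
        Map<Integer, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<Integer, Long> entry : perProcessBytes.entrySet()) {
            // Guard against an idle interval where no global IO was recorded.
            double pct = globalBytes <= 0 ? 0.0 : 100.0 * entry.getValue() / globalBytes;
            shares.put(entry.getKey(), pct);
        }
        return shares;
    }

    public static void main(String[] args) {
        Map<Integer, Long> deltas = new LinkedHashMap<>();
        deltas.put(101, 3_000_000L); // hypothetical pid and bytes written
        deltas.put(202, 1_000_000L);
        // prints {101=37.5, 202=12.5}
        System.out.println(percentOfTotal(deltas, 8_000_000L));
    }
}
```

Because per-process and global counters are sampled at slightly different moments, the shares do not have to add up to 100.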
Anyway the iotop commented in stackexchange seems to be a python remake of original iotop (https://github.com/analogue/iotop).
The referenced iotop is a Python rewrite of the original iotop
(https://github.com/analogue/iotop).
In real iotop code (https://github.com/Tomas-M/iotop) is uses Taskstats (https://www.kernel.org/doc/Documentation/accounting/taskstats.txt).
The original iotop code (https://github.com/Tomas-M/iotop) uses Taskstats
(https://www.kernel.org/doc/Documentation/accounting/taskstats.txt).
It seems that the stats obtained through taskstats calls include the following
stats from `/proc/<pid>/io`:
Since we want to know per process bytes read and written we should look for:
```
read_bytes: bytes read
Attempt to count the number of bytes which this process
really did cause to be fetched from the storage layer.
This is accurate for block-backed filesystems.
write_bytes: bytes written
Attempt to count the number of bytes which this process
caused to be sent to the storage layer.
cancelled_write_bytes:
The big inaccuracy here is truncate. If a process
writes 1MB to a file and then deletes the file, it will
in fact perform no writeout. But it will have been
accounted as having caused 1MB of write. In other
words: this field represents the number of bytes which
this process caused to not happen, by truncating page‐
cache. A task can cause "negative" I/O too. If this
task truncates some dirty pagecache, some I/O which
another task has been accounted for (in its
write_bytes) will not be happening.
```
Since the global and per-process info can be out of sync, we include the `rchar`
and `wchar` stats (more closely correlated with pgpgin and pgpgout, but possibly
including IO not related to disk) together with the `read_bytes`, `write_bytes`
and `cancelled_write_bytes` stats (directly related to disk IO but less
precise) to allow a better analysis of the collected stats.
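
As a rough illustration of how these counters can be read, the sketch below parses the `name: value` lines of `/proc/<pid>/io` into a map. It is not pgio's actual implementation: the class `ProcIoParser` and its `parse` method are hypothetical names, and the fallback sample values are made up so the example also runs where `/proc` is not available.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProcIoParser {

    /** Parses lines such as "read_bytes: 4096" into a name -> counter map. */
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "name: value" line
            }
            try {
                counters.put(line.substring(0, colon).trim(),
                        Long.parseLong(line.substring(colon + 1).trim()));
            } catch (NumberFormatException ignored) {
                // skip lines whose value is not a plain integer
            }
        }
        return counters;
    }

    public static void main(String[] args) throws Exception {
        // Reading another process's io file usually requires root.
        Path io = Paths.get("/proc", args.length > 0 ? args[0] : "self", "io");
        List<String> lines = Files.isReadable(io)
                ? Files.readAllLines(io)
                // made-up sample used only when /proc is not available
                : Arrays.asList("rchar: 10", "wchar: 20", "read_bytes: 4096",
                        "write_bytes: 8192", "cancelled_write_bytes: 0");
        Map<String, Long> counters = parse(lines);
        for (String name : Arrays.asList("rchar", "wchar", "read_bytes",
                "write_bytes", "cancelled_write_bytes")) {
            System.out.println(name + "=" + counters.getOrDefault(name, 0L));
        }
    }
}
```

The counters are cumulative since process start, so a collector would typically sample them periodically and work with the differences between samples.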
## Build
To build the project as a tar.gz with all Java dependencies:
......@@ -72,7 +88,8 @@ To build the project as a tar.gz with all Java dependencies:
mvn clean package -P assembler
```
To build the project as a tar.gz with an executable built using GraalVM `native-image` tool:
To build the project as a tar.gz with an executable built using GraalVM
`native-image` tool:
```
mvn clean package -P executable
......@@ -80,7 +97,8 @@ mvn clean package -P executable
## Run
This will generate output with collected stats of all processes (should be run as root) every 3 seconds:
This will generate output with collected stats of all processes (should be run
as root) every 3 seconds:
```
bin/pgio -D <postgresql data dir>
......@@ -115,9 +133,11 @@ Option Description
### Group configuration file
Specifing this file will enable a special mode where instead of single processes groups with aggregated data will be printed.
Specifying this file enables a special mode where, instead of single processes,
groups with aggregated data are printed.
A JSON indicating groups has the following syntax (regular expression will use pattern from [java.util.regex.Pattern](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html):
A JSON file defining groups has the following syntax (regular expressions use
the pattern syntax of [java.util.regex.Pattern](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html)):
```
{
......