Commit a2cb2757 authored by Matteo Melli

Fixes and improvements for README.md

parent 16db12b4
# pgio
A Java CLI, turned into an executable using GraalVM, that captures disk IO
usage stats per process group (the default groups are based on a common
PostgreSQL installation) as well as totals for the whole system.
The tool produces a stream of data that can be stored and interpreted (CSV by
default, or exported to Prometheus) to analyze and find out which part of
PostgreSQL is using the disk most during a period of time.
## Stats collected
pgio runs only on modern Linux kernels (>= 2.6.x), reading `/proc/vmstat` and
`/proc/<pid>/(cmdline|stat|io)`
(see [here](http://man7.org/linux/man-pages/man5/proc.5.html)).
Taking iotop as a reference (see [here](https://unix.stackexchange.com/questions/248197/iotop-showing-1-5-mb-s-of-disk-write-but-all-programs-have-0-00-b-s/248218#248218)):
iotop reads information per process from `/proc/<pid>/io`, in particular
counters such as `read_bytes` and `write_bytes`, and global paging activity
from `/proc/vmstat`, in particular
pgpgout – the number of kilobytes the system has paged out to disk per second.
Combining the per-process info with the global info, iotop calculates the
percentage of read/write throughput each process is consuming.
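This calculation can be sketched roughly as follows (a minimal illustration, not pgio's actual code; the `IoShare` class, the helper names and the sample counter values are hypothetical, while the field names are the real `/proc/<pid>/io` and `/proc/vmstat` keys):

```java
import java.util.HashMap;
import java.util.Map;

public class IoShare {

    // Parse "key value" lines as found in /proc/<pid>/io (keys end with ':')
    // and /proc/vmstat (keys are bare words).
    static Map<String, Long> parseCounters(String content) {
        Map<String, Long> counters = new HashMap<>();
        for (String line : content.split("\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) continue;
            String key = parts[0].endsWith(":")
                    ? parts[0].substring(0, parts[0].length() - 1)
                    : parts[0];
            counters.put(key, Long.parseLong(parts[1]));
        }
        return counters;
    }

    // Share of global write throughput attributable to one process:
    // write_bytes comes from /proc/<pid>/io (bytes), pgpgout from
    // /proc/vmstat (kilobytes), both taken as deltas over the same interval.
    static double writeShare(long processWriteBytes, long globalPgpgoutKb) {
        if (globalPgpgoutKb == 0) return 0.0;
        return (double) processWriteBytes / (globalPgpgoutKb * 1024L);
    }

    public static void main(String[] args) {
        // Hypothetical deltas over one 3-second sampling interval.
        Map<String, Long> pidIo = parseCounters(
                "rchar: 4096\nwchar: 1048576\nread_bytes: 0\nwrite_bytes: 524288\n");
        Map<String, Long> vmstat = parseCounters("pgpgin 128\npgpgout 2048\n");
        double share = writeShare(pidIo.get("write_bytes"), vmstat.get("pgpgout"));
        // 524288 bytes out of 2048 KiB paged out -> 25.0%
        System.out.printf("process write share: %.1f%%%n", share * 100);
    }
}
```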
The referenced iotop is a Python rewrite of the original
[iotop](https://github.com/analogue/iotop).
The original [iotop code](https://github.com/Tomas-M/iotop) uses
[Taskstats](https://www.kernel.org/doc/Documentation/accounting/taskstats.txt).
The stats obtained through taskstats calls seem to include the same counters
exposed in `/proc/<pid>/io`.
To build the project as a tar.gz with all Java dependencies:
```
mvn clean package -P assembler
```
To build the project as a tar.gz with an executable built using the
[GraalVM](https://www.graalvm.org/) `native-image` tool:
```
export GRAALVM_HOME=<path to graalvm home>
mvn clean package -P executable
```
## Run
This will generate output with the collected stats of grouped processes every
3 seconds (it should be run as a user with sufficient privileges, such as
postgres):
```
bin/pgio -D <postgresql data dir>
```
### Group configuration file
Specifying this file allows you to change the default group configuration.
A JSON document describing the groups has the following syntax (regular
expressions use the pattern syntax of
[java.util.regex.Pattern](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html)):
```
[
  { "<group name>": [ <list of regexp> ] },
  ...
]
```
The default group configuration is:
```
[
  { "archiver": [ ".*archiver.*" ] },
  { "wal sender": [ ".*wal sender.*" ] },
  { "bgwriter": [ ".*writer process.*" ] },
  { "autovacuum": [ ".*autovacuum worker process.*" ] },
  { "stats": [ ".*stats collector process.*" ] },
  { "wal writer": [ ".*wal writer process.*" ] },
  { "checkpoint": [ ".*checkpointer process.*" ] },
  { "query": [ "postgres: " ] }
]
```
If the configuration is saved in a file `postgresql.json`, it can be used as
follows:
```
bin/pgio -D <postgresql data dir> --advanced --group postgresql.json
```
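As an illustration of how such a configuration can be applied (a minimal sketch, not pgio's actual implementation: the `GroupMatcher` class is hypothetical, and first-match-wins substring matching via `Matcher.find` is an assumption), each process command line is tested against the group regular expressions in order, falling back to an `other` group:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class GroupMatcher {

    // Ordered group -> patterns map mirroring the default configuration;
    // insertion order matters because the first matching group wins.
    static final Map<String, Pattern[]> GROUPS = new LinkedHashMap<>();
    static {
        GROUPS.put("archiver", new Pattern[] { Pattern.compile(".*archiver.*") });
        GROUPS.put("wal sender", new Pattern[] { Pattern.compile(".*wal sender.*") });
        GROUPS.put("bgwriter", new Pattern[] { Pattern.compile(".*writer process.*") });
        GROUPS.put("autovacuum", new Pattern[] { Pattern.compile(".*autovacuum worker process.*") });
        GROUPS.put("stats", new Pattern[] { Pattern.compile(".*stats collector process.*") });
        GROUPS.put("wal writer", new Pattern[] { Pattern.compile(".*wal writer process.*") });
        GROUPS.put("checkpoint", new Pattern[] { Pattern.compile(".*checkpointer process.*") });
        GROUPS.put("query", new Pattern[] { Pattern.compile("postgres: ") });
    }

    // Return the first group whose pattern is found in the command line.
    static String groupFor(String cmdline) {
        for (Map.Entry<String, Pattern[]> group : GROUPS.entrySet()) {
            for (Pattern pattern : group.getValue()) {
                if (pattern.matcher(cmdline).find()) {
                    return group.getKey();
                }
            }
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(groupFor("postgres: checkpointer process"));   // checkpoint
        System.out.println(groupFor("postgres: alice mydb [local] SELECT")); // query
        System.out.println(groupFor("/usr/sbin/sshd"));                   // other
    }
}
```

Note that with substring matching the group order matters: for example, `.*writer process.*` would also match a `wal writer process` command line, so earlier groups shadow later ones.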
The idea is that you can extend those groups to include other relevant tools
like wal-e, pgbouncer, or particular users' queries.
Grouping stats by process type reduces the number of stats collected and
provides more understandable metrics.
## Prometheus service
To start pgio as a Prometheus service:
```
bin/pgio -D <postgresql data dir> --advanced --prometheus-service --prometheus-bind 0.0.0.0 --prometheus-port 9090
```
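Prometheus can then scrape the service; a minimal `scrape_configs` sketch, assuming pgio exposes its metrics at the conventional `/metrics` path on the bind address and port configured above (the job name is arbitrary):

```
scrape_configs:
  - job_name: pgio
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:9090"]
```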
### CSV output example
```
$ pgio
timestamp,pid,ppid,label,rchar,wchar,read_bytes,write_bytes,cancelled_write_bytes
2018-12-19T14:42:11.070Z,"archiver",,,0,0,0,0,0
2018-12-19T14:42:11.070Z,"wal sender",,,0,0,0,0,0
2018-12-19T14:42:11.070Z,"bgwriter",,,0,0,0,0,0
2018-12-19T14:42:11.070Z,"autovacuum",,,0,0,0,0,0
2018-12-19T14:42:11.070Z,"stats",,,0,0,0,0,0
2018-12-19T14:42:11.070Z,"wal writer",,,0,0,0,0,0
2018-12-19T14:42:11.069Z,"checkpoint",,,0,0,0,0,0
2018-12-19T14:42:11.070Z,"other",,,0,0,0,0,0
2018-12-19T14:42:14.017Z,"archiver",,,0,0,0,0,0
2018-12-19T14:42:14.017Z,"wal sender",,,0,0,0,0,0
2018-12-19T14:42:14.016Z,"bgwriter",,,1,50061313,73728,50061312,0
2018-12-19T14:42:14.017Z,"autovacuum",,,0,0,0,0,0
2018-12-19T14:42:14.017Z,"stats",,,0,0,0,0,0
2018-12-19T14:42:14.016Z,"wal writer",,,0,50061312,73728,50061312,0
2018-12-19T14:42:14.016Z,"checkpoint",,,0,0,0,0,0
2018-12-19T14:42:14.017Z,"other",,,42409984,342833920,30908416,249819136,0
2018-12-19T14:42:17.017Z,"archiver",,,0,0,0,0,0
2018-12-19T14:42:17.017Z,"wal sender",,,0,0,0,0,0
2018-12-19T14:42:17.016Z,"bgwriter",,,0,75595776,135168,75595776,0
2018-12-19T14:42:17.017Z,"autovacuum",,,0,0,0,0,0
2018-12-19T14:42:17.016Z,"stats",,,0,0,0,0,0
2018-12-19T14:42:17.016Z,"wal writer",,,0,75595776,135168,75595776,0
2018-12-19T14:42:17.016Z,"checkpoint",,,0,0,0,0,0
2018-12-19T14:42:17.016Z,"other",,,44113920,359063552,45285376,244834304,0
2018-12-19T14:42:20.019Z,"archiver",,,0,0,0,0,0
2018-12-19T14:42:20.019Z,"wal sender",,,0,0,0,0,0
2018-12-19T14:42:20.019Z,"bgwriter",,,0,98852864,0,98852864,0
2018-12-19T14:42:20.020Z,"autovacuum",,,0,0,0,0,0
2018-12-19T14:42:20.019Z,"stats",,,0,0,0,0,0
2018-12-19T14:42:20.019Z,"wal writer",,,0,98852864,0,98852864,0
2018-12-19T14:42:20.018Z,"checkpoint",,,0,221370,299008,225280,0
2018-12-19T14:42:20.019Z,"other",,,44843008,335036416,45166592,235864064,0
2018-12-19T14:42:23.019Z,"archiver",,,0,0,0,0,0
2018-12-19T14:42:23.019Z,"wal sender",,,0,0,0,0,0
2018-12-19T14:42:23.018Z,"bgwriter",,,1,9003009,0,9003008,0
2018-12-19T14:42:23.020Z,"autovacuum",,,0,0,0,0,0
2018-12-19T14:42:23.019Z,"stats",,,0,0,4096,0,0
2018-12-19T14:42:23.018Z,"wal writer",,,1,9003009,0,9003008,0
2018-12-19T14:42:23.018Z,"checkpoint",,,1,16670992,409600,16678912,0
2018-12-19T14:42:23.019Z,"other",,,8633088,75464765,8630272,58880000,0
2018-12-19T14:42:26.023Z,"archiver",,,0,0,0,0,0
2018-12-19T14:42:26.023Z,"wal sender",,,0,0,0,0,0
2018-12-19T14:42:26.022Z,"bgwriter",,,0,0,0,0,0
2018-12-19T14:42:26.024Z,"autovacuum",,,0,0,0,0,0
2018-12-19T14:42:26.022Z,"stats",,,0,0,0,0,0
2018-12-19T14:42:26.022Z,"wal writer",,,0,0,0,0,0
2018-12-19T14:42:26.022Z,"checkpoint",,,0,0,0,0,0
2018-12-19T14:42:26.022Z,"other",,,0,0,0,0,0
```
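The counters in each row appear to be per-interval deltas rather than cumulative totals (they drop between samples). Under that assumption, a throughput rate can be derived by dividing a row's counter by the gap to the previous sample for the same group. A minimal sketch using the `wal writer` rows above (the `CsvRate` class and its helper are hypothetical, not part of pgio):

```java
import java.time.Duration;
import java.time.Instant;

public class CsvRate {

    // A per-interval delta divided by the sampling interval gives a rate.
    static double bytesPerSecond(long deltaBytes, long intervalMillis) {
        return deltaBytes * 1000.0 / intervalMillis;
    }

    public static void main(String[] args) {
        // A "wal writer" row from the sample output above, and the timestamp
        // of the previous row for the same group.
        String[] row = "2018-12-19T14:42:17.016Z,\"wal writer\",,,0,75595776,135168,75595776,0"
                .split(",");
        long intervalMillis = Duration
                .between(Instant.parse("2018-12-19T14:42:14.016Z"), Instant.parse(row[0]))
                .toMillis();
        long writeBytes = Long.parseLong(row[7]); // write_bytes column
        // prints: wal writer: 25198592 bytes/s
        System.out.printf("wal writer: %.0f bytes/s%n",
                bytesPerSecond(writeBytes, intervalMillis));
    }
}
```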