Continuing with the setup of my new server, I moved on to filesystem settings. Unlike my previous servers, this one runs FreeBSD with a ZFS file system. ZFS has many features, but the important ones for osm2pgsql are transparent on-disk compression and an adjustable record size.
There is contradictory information available about both of these tunables. Gregory Smith's *PostgreSQL 9.0 High Performance* suggests adjusting the recordsize to match the PostgreSQL block size of 8K when doing scattered random IO, and using transparent compression for scans of data too small to be compressed by TOAST. On the other hand, some mailing list posts have suggested 128K for ETL workloads like osm2pgsql.
I was unable to find any ZFS tuning suggestions specific to spatial databases, so I was operating completely in the dark, which called for benchmarking.
Using the previous PostgreSQL tuning, I ran a series of imports, adjusting the ZFS recordsize and compression settings.
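For reference, switching between the tested variants comes down to two dataset properties. The commands below are only a sketch, assuming a hypothetical pool `tank` with the tablespace on a `tank/postgres` dataset; the names are placeholders rather than my actual layout.

```sh
# Sanity check: the 8K recordsize is chosen to match PostgreSQL's block size (8192 bytes).
psql -c "SHOW block_size;"

# One of the four tested combinations: 8K records with lz4 compression.
# recordsize only applies to files written after it is set, so set it before the import.
zfs set recordsize=8K tank/postgres
zfs set compression=lz4 tank/postgres

# The other combinations swap in recordsize=128K and/or compression=off.
```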
Results
| Compression | 8K recordsize | 128K recordsize |
|---|---|---|
| None | 6.3h | 8.4h |
| lz4 | 6.1h | 10.8h |
Processing time covers nodes, ways and relations, and this stage is single-threaded. A faster SSD-only server with a faster CPU can complete this stage in 4 hours.
| Compression | 8K recordsize | 128K recordsize |
|---|---|---|
| None | 18.4h | 21.8h |
| lz4 | 12.8h | 30.4h |
This is the PostgreSQL index and cluster stage, and its time is dominated by the creation of a 100GB GIN index on a bigint[] column.
| Compression | 8K recordsize | 128K recordsize |
|---|---|---|
| None | 27.2h | 32.7h |
| lz4 | 21.5h | 43.7h |
The final table shows the total import time. Pending ways is not broken out separately, as it is largely unaffected by ZFS settings.
Record size
The one clear conclusion is that 8K recordsizes are faster than 128K recordsizes. This holds true if you further subdivide the import into individual tasks like processing ways or relations. I found no cases where 128K was faster.
Compression and a 128K recordsize is a particularly bad combination. I'm not familiar enough with the internals of ZFS to say for certain, but it's possible that every time PostgreSQL writes an 8K block, ZFS is forced to fetch the 128K record, decompress it, merge in the new 8K, and recompress it.
An examination of the CPU usage supports this: the difference between 128K compressed and uncompressed is bigger than the difference between 8K compressed and uncompressed, for both CPU utilization and total CPU time used.
Compression
Compression is more interesting: it is a performance gain with an 8K recordsize, but a loss with 128K. The 8K case matters more, as it is faster all around, so it is worth looking at in more detail.
Times are in seconds, for the 8K recordsize imports.

| Compression setting | | off | lz4 |
|---|---|---|---|
| Processing | node | 1516 | 1514 |
| | way | 5589 | 5938 |
| | relation | 15853 | 14716 |
| | Total | 22958 | 22168 |
| Pending ways | | 8746 | 8889 |
| PostgreSQL index and cluster | | 66330 | 46365 |
| Total | | 98058 | 77441 |
Pending ways is unique in that it is the only part of the import that is purely writes and IO bound, with the load consisting of many COPY statements. It is also the only part that is slower on a compressed ZFS volume.
Total CPU usage
With compression, CPU utilization can be expected to be higher simply because the import completes faster, and on a sufficiently fast IO system CPU could again become the limiting factor. The 8K lz4 import used 33.9 CPU-hours while the 8K uncompressed import used 34.2 CPU-hours. On a modern system with multiple cores, CPU usage is unlikely to be a reason to reject lz4 compression, though it may be an issue with slower methods like lzjb.
Conclusions
A recordsize of 8K is preferable to 128K for ZFS volumes holding a tablespace under all conditions found. For anything but the purely write portions of the import, lz4 compression offers speed advantages, and very significant ones for large GIN index creation.
Further work
ZFS settings were kept constant at a 128K recordsize with no compression for the xlog volume. This may not be optimal, given the gains shown with an 8K recordsize on the tablespace volume.
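A natural follow-up would be to try the tablespace settings on the xlog dataset as well. A minimal sketch, again assuming a hypothetical `tank/pgxlog` dataset holding pg_xlog:

```sh
# Untested follow-up: mirror the tablespace settings on the WAL dataset.
# As before, recordsize only affects newly written files.
zfs set recordsize=8K tank/pgxlog
zfs set compression=lz4 tank/pgxlog
```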
Large PostGIS geometries are stored compressed, with TOAST and `MAIN` storage. This compression is on top of the lz4 compression done by ZFS, so it results in duplicated work. Better performance might be obtained by switching all `MAIN` storage to `PLAIN` or `EXTERNAL`.
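As a sketch of what that change might look like (the `gis` database name and the osm2pgsql default table and column names here are assumptions for illustration, not settings from this import):

```sh
# Hypothetical example: store the geometry column of one osm2pgsql output table
# uncompressed and out-of-line, leaving compression to ZFS instead of TOAST.
# Repeat for the other planet_osm_* tables; existing rows are not rewritten.
psql -d gis -c "ALTER TABLE planet_osm_polygon ALTER COLUMN way SET STORAGE EXTERNAL;"
```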