Paul’s Blog


Optimizing Osm2pgsql CXXFLAGS

While investigating an osm2pgsql bug report, I tested how osm2pgsql's node parsing speed changes with different CXXFLAGS. CXXFLAGS is the variable that holds options for the C++ compiler, including optimizations that can make the compiled program faster.

The two flags I tested were -O and -march. There are countless others, many of them unrelated to optimization, but these two are the only ones that normally need adjusting. Adding more flags is generally not useful.

-O sets the overall level of optimization. Higher levels apply more optimizations, which increases compile time and can reduce the size of the resulting program. I looked at -O0, the default (no optimization); -O2, the most commonly recommended level; and -O3, a more aggressive level that is known to sometimes decrease performance.

-march tells GCC which CPU to generate code for, allowing it to use instruction set extensions such as MMX, SSE, or AVX. This can speed up some operations, but the resulting binaries may not run on CPUs that lack those extensions. -march=native auto-detects the CPU doing the compiling, but for reproducible tests I used -march=core-avx2, which corresponds to my fairly recent CPU.
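Because a -march=core-avx2 binary may contain AVX2 and other Haswell-era instructions, it can fail with an illegal-instruction error on a CPU without them. A minimal sketch of a pre-flight check, assuming Linux, where /proc/cpuinfo lists CPU features on its `flags` lines. The function names are hypothetical, and the checked set is only a subset of what core-avx2 enables, so a passing check is necessary but not sufficient.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags found in /proc/cpuinfo-style text.

    Flags from every processor entry are merged into one set.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


# Assumption: a representative subset of the extensions -march=core-avx2
# enables (it also enables others); this is not an exhaustive list.
CORE_AVX2_SUBSET = {"avx", "avx2", "fma", "bmi2"}


def missing_for_core_avx2(cpuinfo_text):
    """Return the checked extensions this CPU lacks (empty set = check passed)."""
    return CORE_AVX2_SUBSET - cpu_flags(cpuinfo_text)
```

On Linux, `missing_for_core_avx2(open("/proc/cpuinfo").read())` returning an empty set only means the checked extensions are present; a portable build without -march avoids the question entirely.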

This all started with a bug report about node processing speed, so to test I ran osm2pgsql -O null --flat-nodes nodes.bin -s -C 22000 planet-150923.osm.pbf. Here -O null is osm2pgsql's output option, unrelated to the compiler's -O flag: it skips writing to a backend and is intended for this kind of debugging. I timed only node processing and cancelled the run once nodes were finished. Node processing is generally single-threaded and CPU-limited.
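The six builds compared here come from crossing the three -O levels with and without -march=core-avx2. A minimal sketch that enumerates those CXXFLAGS strings alongside the benchmark command from this post; the helper name is hypothetical, and how each build hands CXXFLAGS to the compiler (for example through the build system's configure step) is deliberately left out.

```python
import itertools
import shlex

OPT_LEVELS = ["-O0", "-O2", "-O3"]
ARCH_OPTIONS = [None, "-march=core-avx2"]  # None means no -march flag

# The node parsing benchmark. osm2pgsql's own -O selects the output
# backend; it is not a compiler optimization level.
BENCH_ARGS = ["osm2pgsql", "-O", "null", "--flat-nodes", "nodes.bin",
              "-s", "-C", "22000", "planet-150923.osm.pbf"]


def cxxflags_matrix():
    """Yield one CXXFLAGS string per build configuration."""
    for opt, arch in itertools.product(OPT_LEVELS, ARCH_OPTIONS):
        yield " ".join(flag for flag in (opt, arch) if flag)


for flags in cxxflags_matrix():
    print(f"CXXFLAGS={shlex.quote(flags)}")
print(shlex.join(BENCH_ARGS))
```

Each build is then run with the same command, so only the compiler flags differ between measurements.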

Node processing speed (k/s = thousands of nodes per second)

Optimization level   No other flags   -march=core-avx2
-O0                  1179k/s          1171k/s
-O2                  5149k/s          5193k/s
-O3                  5072k/s          5175k/s
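To make the differences in the node processing speed table concrete, the sketch below turns the measured values into relative speeds. The numbers are taken directly from the table; the dictionary layout and function names are only illustrative.

```python
# Node processing speed from the table, in thousands of nodes per second.
SPEED = {
    "-O0": {"plain": 1179, "core-avx2": 1171},
    "-O2": {"plain": 5149, "core-avx2": 5193},
    "-O3": {"plain": 5072, "core-avx2": 5175},
}


def ratio(level_a, level_b, arch="plain"):
    """Speed of level_a relative to level_b with the same -march setting."""
    return SPEED[level_a][arch] / SPEED[level_b][arch]


def march_gain(level):
    """Relative change in speed from adding -march=core-avx2 at one -O level."""
    return SPEED[level]["core-avx2"] / SPEED[level]["plain"] - 1


print(f"-O2 vs -O0: {ratio('-O2', '-O0'):.2f}x")
print(f"-O3 vs -O2: {ratio('-O3', '-O2'):.3f}x")
for level in SPEED:
    print(f"{level}: -march=core-avx2 changes speed by {march_gain(level):+.1%}")
```

With these numbers, -O2 is about 4.4 times as fast as -O0, -O3 is about 1.5% slower than -O2 without -march, and adding -march=core-avx2 changes speed by between -0.7% and +2.0%.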

Binary size

Optimization level   No other flags   -march=core-avx2
-O0                  1.6M             1.6M
-O2                  685K             685K
-O3                  693K             689K
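The binary sizes look like the human-readable output of ls -h, where K and M are powers of 1024; that convention is an assumption here, since the measurement method isn't stated. A small sketch converts them to bytes so the builds without -march can be compared directly; the helper name is hypothetical.

```python
# Assumption: sizes follow the ls -h convention (K = 1024 bytes, M = 1024**2).
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(size):
    """Convert a size such as '685K' or '1.6M' to an approximate byte count."""
    size = size.strip()
    if size and size[-1].upper() in UNITS:
        return float(size[:-1]) * UNITS[size[-1].upper()]
    return float(size)  # no suffix: already in bytes


SIZES = {"-O0": "1.6M", "-O2": "685K", "-O3": "693K"}  # builds without -march

base = to_bytes(SIZES["-O2"])
for level, size in SIZES.items():
    print(f"{level}: {to_bytes(size) / base:.2f}x the size of the -O2 build")
```

Because the listed sizes are rounded, the ratios are approximate: the -O0 binary is roughly 2.4 times the size of the -O2 one, and -O3 is about 1% larger than -O2.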

Conclusions

-O2 should be used with osm2pgsql. In these tests -O3 was no faster than -O2 and produced a slightly larger binary, which is consistent with the usual recommendations from GCC and others, and -march=core-avx2 changed speed by at most about 2%. The new osm2pgsql CMake build system uses -O2 automatically.

Other parts of an osm2pgsql import take more time than node processing, and a --slim import without --drop spends much of its time in PostgreSQL building a large index, where no speedup to osm2pgsql itself can help.