While looking into an osm2pgsql bug report, I tested osm2pgsql's node parsing speed with various CXXFLAGS. CXXFLAGS is a variable that passes options to the C++ compiler, including optimizations that can make the compiled program faster.
The two flags I tested were -O and -march. There are countless others, many of them unrelated to optimization, but these two are the only ones that generally need adjusting; more flags are rarely useful.
-O sets the overall level of optimization. Higher levels increase compile time and can substantially reduce the size of the resulting program compared with no optimization. I looked at -O0, the default; -O2, the most commonly recommended level; and -O3, a higher level that is known to sometimes decrease performance.
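One way to see whether a given -O level actually turns optimization on is to ask GCC which macros it predefines. The sketch below is a hypothetical helper, assuming g++ is on the PATH; it relies on GCC's -dM -E options, which print every predefined macro, and on GCC defining __OPTIMIZE__ whenever an optimization level above -O0 is in effect. The function names are illustrative only.

```python
import shutil
import subprocess


def parse_defines(text):
    """Parse `#define NAME VALUE` lines into a {NAME: VALUE} dict."""
    macros = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) >= 2 and parts[0] == "#define":
            name = parts[1].split("(")[0]  # drop parameters of function-like macros
            macros[name] = parts[2] if len(parts) == 3 else ""
    return macros


def predefined_macros(flags):
    """Ask g++ which macros it predefines for the given list of flags.

    `-dM -E` preprocesses an empty C++ input read from stdin ("-") and
    prints every macro defined at the end, one `#define` per line.
    """
    result = subprocess.run(
        ["g++", *flags, "-dM", "-E", "-x", "c++", "-"],
        input="", capture_output=True, text=True, check=True,
    )
    return parse_defines(result.stdout)


def is_optimized(macros):
    """GCC defines __OPTIMIZE__ when any optimization level above -O0 is used."""
    return "__OPTIMIZE__" in macros


def main():
    if shutil.which("g++") is None:
        print("g++ not found on PATH")
        return
    for level in ("-O0", "-O2", "-O3"):
        state = "optimized" if is_optimized(predefined_macros([level])) else "not optimized"
        print(level, state)


if __name__ == "__main__":
    main()
```

With GCC behaving as documented, -O0 should report "not optimized" and the other two "optimized". The macro only says whether optimization is on; to list the individual optimizations each level enables, GCC also accepts -Q --help=optimizers, for example g++ -Q -O3 --help=optimizers.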
-march tells GCC which CPU to generate code for, allowing it to use CPU-specific instruction sets like MMX, SSE, or AVX. This can speed up some operations, but the resulting binaries may not run on CPUs that lack those instructions. -march=native auto-detects the CPU of the machine doing the compiling, but for reproducible tests I’m using -march=core-avx2, which corresponds to my fairly recent CPU.
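Because a -march build may contain instructions the running CPU lacks, it can help to compare what the compiler was told to assume with what the CPU reports. This is a minimal sketch, assuming Linux on x86 (it reads /proc/cpuinfo) and g++ on the PATH; the function names and the small EXTENSIONS table are hypothetical and cover only a few extensions.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical mapping from GCC's predefined instruction-set macros to the
# flag names the Linux kernel lists in /proc/cpuinfo on x86.
EXTENSIONS = {
    "__SSE4_2__": "sse4_2",
    "__AVX__": "avx",
    "__AVX2__": "avx2",
    "__FMA__": "fma",
}


def enabled_by_flags(define_output):
    """Return the EXTENSIONS macros that appear in `g++ -dM -E` output."""
    defined = set()
    for line in define_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "#define":
            defined.add(parts[1])
    return {macro for macro in EXTENSIONS if macro in defined}


def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first "flags" line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_on_this_cpu(enabled_macros, flags):
    """List extensions the compiler may use that this CPU does not report."""
    return sorted(EXTENSIONS[m] for m in enabled_macros if EXTENSIONS[m] not in flags)


def main(march="-march=core-avx2"):
    cpuinfo = Path("/proc/cpuinfo")
    if shutil.which("g++") is None or not cpuinfo.exists():
        print("this check needs g++ and a Linux /proc/cpuinfo")
        return
    try:
        out = subprocess.run(
            ["g++", march, "-dM", "-E", "-x", "c++", "-"],
            input="", capture_output=True, text=True, check=True,
        ).stdout
    except subprocess.CalledProcessError as err:
        print("g++ rejected", march, err.stderr.strip())  # e.g. a non-x86 compiler
        return
    enabled = enabled_by_flags(out)
    missing = missing_on_this_cpu(enabled, cpu_flags(cpuinfo.read_text()))
    print(march, "enables:", sorted(enabled))
    print("missing on this CPU:", missing or "none")


if __name__ == "__main__":
    main()
```

An empty "missing" list does not prove a binary is portable to other machines; it only means this CPU reports the extensions that this -march setting allows GCC to use.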
This all originated with a bug report about node processing speed, so to test I ran osm2pgsql -O null --flat-nodes nodes.bin -s -C 22000 planet-150923.osm.pbf. Here -O is osm2pgsql's output option, unrelated to the compiler's -O flag: -O null discards the data instead of writing it to a database and is intended for this kind of debugging. I timed only node processing and cancelled the run once the nodes were finished. Node processing is generally single-threaded and CPU-bound.
Node processing speed
Optimization level | No other flags | -march=core-avx2 |
---|---|---|
-O0 | 1179k/s | 1171k/s |
-O2 | 5149k/s | 5193k/s |
-O3 | 5072k/s | 5175k/s |
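The relative differences in the speed table can be worked out directly from its figures. A small sketch with the table's numbers hard-coded; the RATES name and its layout are just for illustration.

```python
# Node processing speed from the table above, in thousands of nodes per
# second, keyed by (optimization level, built with -march=core-avx2).
RATES = {
    ("-O0", False): 1179, ("-O0", True): 1171,
    ("-O2", False): 5149, ("-O2", True): 5193,
    ("-O3", False): 5072, ("-O3", True): 5175,
}


def percent_change(new, base):
    """Relative change from base to new, in percent."""
    return (new / base - 1.0) * 100.0


def main():
    # Compare optimization levels within each column.
    for march in (False, True):
        label = "-march=core-avx2" if march else "no other flags"
        o0, o2, o3 = (RATES[(lvl, march)] for lvl in ("-O0", "-O2", "-O3"))
        print(f"{label}: -O2 is {o2 / o0:.2f}x -O0, "
              f"-O3 vs -O2 {percent_change(o3, o2):+.1f}%")
    # Compare the two columns at each optimization level.
    for lvl in ("-O0", "-O2", "-O3"):
        delta = percent_change(RATES[(lvl, True)], RATES[(lvl, False)])
        print(f"{lvl}: adding -march=core-avx2 {delta:+.1f}%")


if __name__ == "__main__":
    main()
```

On these figures -O2 is roughly 4.4 times as fast as -O0, while adding -march=core-avx2 or moving from -O2 to -O3 changes the rate by at most about 2% in either direction.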
Binary size
Optimization level | No other flags | -march=core-avx2 |
---|---|---|
-O0 | 1.6M | 1.6M |
-O2 | 685K | 685K |
-O3 | 693K | 689K |
Conclusions
-O2 should be used with osm2pgsql. There is no evidence that -O3 is an improvement, which is consistent with GCC and other recommendations. The new osm2pgsql CMake build system will automatically use -O2.
Other parts of the osm2pgsql import consume more time, and a --slim import without --drop will spend much of its time in PostgreSQL building a large index, where no osm2pgsql speedups can help.