Paul’s Blog

A blog without a good name

Osm2pgsql and Hstore

In my last post I looked at osm2pgsql style files and import and rendering speed. This was really looking at the width of rendering tables and how that impacted rendering speed. I found that eliminating a few unused columns decreased database size by 0.28% and increased rendering speed by 2.35%. The style file is not the only option that affects table width. Osm2pgsql has a series of options which allow you to store additional tags in an hstore column.

hstore is a special data type in postgresql, designed for storing sets of arbitrary key/value pairs. OpenStreetMap tags are also key/value pairs, which makes this data type ideal for storing a complete list of tags.

TOAST tables

To make sense of benchmarking results, it’s necessary to understand TOAST tables, EXTENDED storage, and how PostgreSQL stores large values. In PostgreSQL tuples (rows) are stored on pages, each of which is 8 kB, and a tuple must fit into one page. Obviously, an alternate strategy is required for when a tuple exceeds 8 kB it needs to be stored using another method. The methods that are used in these cases are compression and storage in a special TOAST table. These can only be used on variable-length (varlena) types, but text, hstore and geometry are all variable-length, and most of the columns in an osm2pgsql rendering table are of these types.

When a row is wider than 2 kB (or whatever TOAST_TUPLE_THRESHOLD is set to) PostgreSQL will try to compress field values and move them from the table to the TOAST table. Left behind in place of the data is an ID that points to a row in the TOAST table. When you want to access the TOAST’ed data PostgreSQL will automatically detect it is stored in the TOAST tables, retrieve the chunks from the TOAST tables by ID, and assemble them into the result, presenting what you expect without you ever realizing it was stored in a TOAST table.

The TOAST tables are so transparent to the user that I have not found a way of finding which values, if any, are stored in TOAST tables. Nonetheless, TOAST storage does matter when considering hstore performance because the hstore data may be stored in a TOAST table, changing size increase of rows.

osm2pgsql hstore options

Osm2pgsql has five hstore options

  • --hstore or -k adds any tags not already in a conventional column to a hstore column. With the standard stylesheet this would result in tags like highway appearing in a conventional column while tags not in the style like name:en or lanes:forward would appear only in the hstore column.
  • --hstore-all or -j adds all tags to a hstore column, even if they’re already stored in a conventional column. With the standard stylesheet this would result in tags like highway appearing in conventional column and the hstore column while tags not in the style like name:en or lanes:forward would appear only in the hstore column.
  • --hstore-column or -z, which adds an additional column for tags starting with a specified string, e.g. --hstore-column "name:' produces a hstore column that contains all name:xx tags
  • --hstore-match-only modifies the above options and prevents objects from being added if they only have tags in the hstore column and no conventional tags.
  • --hstore-add-index adds an index to the hstore columns

Either --hstore or --hstore-all when combined with --hstore-match-only should give the same rows, just with an additional unused column.

Results

As before, we’re using a modified osm2pgsql which doesn’t build indexes at import time. Sizes are from pg_column_size and pg_table_size.

The total number of rows was the same without hstore, with -k --match-only and with -j --match-only and 0.6% larger with -k or -j

Import type Rendering rate (MT/s) Speed decrease osm2pgsql time Rendering tables size Size increase Hstore size
No hstore 8.665 ± 0.017 N/A 14600 60.7 GB N/A N/A
-k –match-only 8.632 ± 0.022 0.38% 14780 66.2 GB 9% 5.3 GB
-j –match-only 8.592 ± 0.010 0.84% 14984 75.8 GB 25% 14.6 GB
-k 8.644 ± 0.018 0.24% 14895 66.5 GB 10% 5.36 GB
-j 8.594 ± 0.013 0.82% 15041 76.1 GB 25% 14.7 GB

Conclusions

The rendering rate without hstore agrees with previous results, which validates the methods used.

It is immediately obvious that --match-only has virtually no impact when used with the standard style because most objects have at least one tag used by default.style. This is not surprising because there is a strong feedback cycle between what is mapped and what is displayed on the main page. Additional tags are mainly used to provide additional information.

Looking at the size of the hstore columns, we can tell that most tags are present in the normal columns. There is little benefit to -j over -k as all -j serves to do is duplicate all the data in normal columns to the hstore, at the cost of slower rendering, duplicated data and hstore columns three times the size.

Previous results found a rendering speed increase of 2.35% with a table size decrease 0.28%. This compares with the hstore results where a much larger size increase results in a minor speed decrease.

Summary

There are minimal speed decreases from a hstore column created with -k/--hstore, although there is a more significant size increase. Other hstore options like -j/--hstore-all and --match-only are of minimal use except for specialized cases.