Paul’s Blog

A blog without a good name

Loading the Data

With data downloaded and the style built, the next step is to load the data. Sometimes this scares people, but really shouldn’t. A modern server with the capacity to serve the world will have no problems building the database.

Loading can easily be done on a single CPU server and the RAM needed is less than you want for caching later on.

Like before, the first step is setting some variables.

1
2
3
4
5
6
#!/usr/bin/env bash

set -euf -o pipefail

PLANET_FILE='data.osm.pbf'
export PGDATABASE='osmcarto_prerender'

Next, a database is needed. OpenStreetMap Carto documents what extensions are needed by it, so we just need to follow those directions.

1
2
3
4
dropdb --if-exists "${PGDATABASE}"

createdb
psql -Xqw -c 'CREATE EXTENSION postgis; CREATE EXTENSION hstore;'

OpenStreetMap Carto needs data loaded with osm2pgsql, like most styles. The osm2pgsql options can be broken down into three groups: style settings, performance, and locations.

The style settings control how the data in the database is represented. These are given by the style. We don’t have to know what they mean, so we just have to use what OpenStreetMap Carto’s documentation says: -G --hstore --style openstreetmap-carto.style --tag-transform-script openstreetmap-carto.lua

The locations are where to get the OSM data, database names, and other information that relates to where to read and save everything.

Performance options are the only ones that require some judgement to set. Because this script is intended for the full planet, we use --slim --flat-nodes ${FLAT_NODES}, just like the osm2pgsql documentation suggests. Also, we know the database will not be updated with --append, so we can use the --drop option, which skips indexing the slim tables and drops them instead, saving time and space.

We need to set the how much memory is used to cache node positions. This should never be set so high that the server runs out of RAM, but there’s no gain to setting it to more than is needed to cache every node. A general rule of thumb is to set it to 75% of RAM size, in MB. With the size of the planet right now, I also know that it doesn’t need more than 40GB, but this is subject to change.

This results in the osm2pgsql command

1
2
3
4
5
6
7
FLAT_NODES='nodes.bin'
OSM2PGSQL_CACHE='40000'

osm2pgsql -G --hstore --style 'openstreetmap-carto/openstreetmap-carto.style' \
  --tag-transform-script 'openstreetmap-carto/openstreetmap-carto.lua' \
  --slim --drop --flat-nodes "${FLAT_NODES}" --cache "${OSM2PGSQL_CACHE}" \
  -d "${PGDATABASE}" "${PLANET_FILE}"

On a SSD-based server with 64GB RAM, this should take 10-20 hours to process the planet. On a tuned server with NVMe drives, it can be under 5 hours.

Rearranging the order of some commands and adding cleanup, we get a script that we can run.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
#!/usr/bin/env bash

set -euf -o pipefail

PLANET_FILE='data.osm.pbf'
export PGDATABASE='osmcarto_prerender'
FLAT_NODES='nodes.bin'
OSM2PGSQL_CACHE='40000'

# PGDATABASE is set, so postgres commands don't need a database name supplied

# Clean up any existing db and files
dropdb --if-exists "${PGDATABASE}"
rm -f -- "${FLAT_NODES}"

createdb
psql -Xqw -c 'CREATE EXTENSION postgis; CREATE EXTENSION hstore;'

osm2pgsql -G --hstore --style 'openstreetmap-carto/openstreetmap-carto.style' \
  --tag-transform-script 'openstreetmap-carto/openstreetmap-carto.lua' \
  --slim --drop --flat-nodes "${FLAT_NODES}" --cache "${OSM2PGSQL_CACHE}" \
  -d "${PGDATABASE}" "${PLANET_FILE}"

rm -f -- "${FLAT_NODES}"