Last post ended with downloading OpenStreetMap data. This post will leave the data aside and switch to downloading and building a style. There’s lots of styles available, but we’re going to use OpenStreetMap Carto, the current default on OpenStreetMap.org. Also, because we need software not packaged in Debian, that needs to be installed.
To do something with OpenStreetMap data, we have to download it first. This can be the entire data from planet.openstreetmap.org or a smaller extract from a provider like Geofabrik. If you’re doing this manually, it’s easy. Just a single command will call
wget, or you can download it from the browser. If you want to script it, it’s a bit harder. You have to worry about error conditions, what can go wrong, and make sure everything can happen unattended. So, to make sure we can do this, we write a simple bash script.
The most common use for OpenStreetMap data is hosting your own map. If you need up to the minute data, the entire world, and high zooms, this requires a dedicated server running renderd+mod_tile or other specialized software that handles requests. On the other hand, if less frequently updated data and low zooms is all that’s needed, it can make more sense to pre-render tiles and serve them off of an existing server as files from disk.
Over the next few posts, I’m going to be walking through step-by-step on how to generate these files, starting with downloading OpenStreetMap data, and ending up with rendered tiles.
I’m often asked how much space you need for an osm2pgsql database of the planet. The answer depends on what you plan to do with it, how you want to load the data, and how long you want your server to last.
If you want to serve vector tiles, there are a few server options that have developed, each with different strengths and weaknesses.
In my last post I took a sample of PostgreSQL queries and loaded them into a table to do some analysis. The next step is to
This log will include queries from osm2pgsql updating the database, but normally we’re only optimizing rendering queries. Mapnik starts all of these with
SELECT ST_AsBinary("way") AS geom so we can make a view of just rendering queries with
CREATE VIEW render_query_log AS SELECT * FROM query_log WHERE query LIKE 'SELECT ST_AsBinary("way") AS geom%';
There are some tools to turn PostgreSQL queries back into style layers and zoom, but none of them work well on modern complex style queries, so we need to figure out a different way to group queries for the same layers.
Mapnik has four tokens which can appear in a query.
!bbox!is replaced the buffered bounding box of the area being rendered. This text ends up looking like
!scale_denominator!is replaced by the scale in projected units per pixel. OpenStreetMap Carto doesn’t use this, but it’s often seen in vector tile styles to calculate the zoom level. This is a number, either an integer or floating point depending on zoom.
!pixel_height!which are the width and height of a pixel in projected units. These are both numbers, either integers or floating points. They can be zero. In OpenStreetMap carto they are alway seen in the form
way_area > [some number]*!pixel_width!::real*!pixel_height!::real.
To get the common part of the query we can cut off the query text at the first occurance of
way_area. The former is only inserted by Mapnik and the latter only appears on lines with Mapnik tokens or as part of an `ORDER BY very late in the query.
Splitting the string by these tokens and taking the first part can solve this.
1 2 3 4
SELECT count(*), sum(run_length), query q FROM render_query_log GROUP BY q
regexp_split_to_array(‘select foo from a way_area b seasdfas’, ‘(way_area|ST_SetSRID)’)
Part of optimizing a style or application using PostgreSQL is finding what queries are taking up the most time, and improving them. The first part of this is finding the slow queries.
Logging slow queries gives a misleading view on what is using the most resources. A query that takes 100 seconds to run but only runs once a day will use less resource than a 100ms query running every second, but the fast often repeated query might not appear in the logs. This is common with map rendering where low-zoom queries are slow but infrequently run.
Logging all queries is comprehensive but can have a noticeable performance impact. Every query causes a write to the server logs. This can be an unacceptable performance drag on a production server.
Both methods suffer from adjusting log file settings. This means adjusting the server configuration, having the setting apply to all databases, having to remove extraneous information from the logs, and generally being hard to parse. In a shared environment this can be impossible to get done. If you have statements with newlines the newlines make it into the logfile unescaped, making it hard to parse them with normal tools.
Fortunately, there is another method: sampling running queries.
I’ve been experimenting with generating my own vector tiles and client-side rendering with Tangram in order to figure out how to best write a style in its language.
Tangram is a GL-based renderer written by Mapzen and normally used with their Tilezen vector tiles, but I’m interested in being able to make my own vector tiles with different cartographic choices.
Part of normal database maintenance is to rebuild indexes and recluster tables, but this often gets ignored on rendering servers.
Antidotal reports report a speed increase of 25%–50% from reclustering, and this can be done without shutting down the rendering server. The overall plan is to create a new copy of the tables, build new indexes, then replace the old tables.