Paul’s Blog

A blog without a good name

More Work on Bolder

After the birds of a feather session Richard Fairhurst lead at State of the Map, I was motivated to continue some work on bolder, a client-side style I’ve been working on.

While I was working at the Wikimedia Foundation, I developed brighmed, a CartoCSS style using vector tiles. Wikimedia decided not to flip the switch to deploy the style, but the style is open source, so I can use it elsewhere. Making this decision, I spent a day implementing most of it in Tangram.

Bolder example image

What’s next?

I’ve got some missing features like service roads and some railway values to add, then I can look at new stuff like POIs. For that I’ll need to look at icons and where to fit them into colourspace.

There’s a bunch of label work that needs to be done, what I have is just a first pass, and some things like motorway names have big issues, and ref tags still need rendering. Label quality is of course a unending quest, but I should be able to get some big gains without much work.

Richard is planning to do some work on writing a schema, and if it works, I’d like to adopt it. At the same time, I don’t want to tie myself to an external schema which may have different cartographic aims, so I’ll have to see how that works out. Looking at past OpenStreetMap Carto changes to project.mml, I found that what would be breaking schema changes on a vector tile project are less common than I thought, happening about once every 4-6 months. Most of the schema changes that would have happened were compatible and could be handled by regenerating tiles in the background.

“Make the Website Use the API” GSOC Project

I’m a potential mentor for the Google Summer of Code project. The goal of this project is to change the website to rely on API calls instead of website-only database queries. It’s focused on the “browse” pages for each object, and might need additions to the API to fully reproduce the functionality. Because I get asked a lot, this is a blog post on what I’d suggest doing if you’re a student interested in the project.

Installing MapProxy

Switching gears, with the database loaded, it’s time to install more software.

OpenStreetMap Carto generates a Mapnik XML stylesheet, which can be used by any software that includes Mapnik. Some of the common options are

Loading the Data

With data downloaded and the style built, the next step is to load the data. Sometimes this scares people, but really shouldn’t. A modern server with the capacity to serve the world will have no problems building the database.

Loading can easily be done on a single CPU server and the RAM needed is less than you want for caching later on.

Add Some Style

Last post ended with downloading OpenStreetMap data. This post will leave the data aside and switch to downloading and building a style. There’s lots of styles available, but we’re going to use OpenStreetMap Carto, the current default on OpenStreetMap.org. Also, because we need software not packaged in Debian, that needs to be installed.

It Starts With the Planet

To do something with OpenStreetMap data, we have to download it first. This can be the entire data from planet.openstreetmap.org or a smaller extract from a provider like Geofabrik. If you’re doing this manually, it’s easy. Just a single command will call curl or wget, or you can download it from the browser. If you want to script it, it’s a bit harder. You have to worry about error conditions, what can go wrong, and make sure everything can happen unattended. So, to make sure we can do this, we write a simple bash script.

Data to Tiles

The most common use for OpenStreetMap data is hosting your own map. If you need up to the minute data, the entire world, and high zooms, this requires a dedicated server running renderd+mod_tile or other specialized software that handles requests. On the other hand, if less frequently updated data and low zooms is all that’s needed, it can make more sense to pre-render tiles and serve them off of an existing server as files from disk.

Over the next few posts, I’m going to be walking through step-by-step on how to generate these files, starting with downloading OpenStreetMap data, and ending up with rendered tiles.

Serving Vector Tiles

If you want to serve vector tiles, there are a few server options that have developed, each with different strengths and weaknesses.

Identifying Slow Rendering Queries

In my last post I took a sample of PostgreSQL queries and loaded them into a table to do some analysis. The next step is to

This log will include queries from osm2pgsql updating the database, but normally we’re only optimizing rendering queries. Mapnik starts all of these with SELECT ST_AsBinary("way") AS geom so we can make a view of just rendering queries with CREATE VIEW render_query_log AS SELECT * FROM query_log WHERE query LIKE 'SELECT ST_AsBinary("way") AS geom%';

There are some tools to turn PostgreSQL queries back into style layers and zoom, but none of them work well on modern complex style queries, so we need to figure out a different way to group queries for the same layers.

Mapnik has four tokens which can appear in a query.

  1. !bbox! is replaced the buffered bounding box of the area being rendered. This text ends up looking like ST_SetSRID('BOX3D(...

  2. !scale_denominator! is replaced by the scale in projected units per pixel. OpenStreetMap Carto doesn’t use this, but it’s often seen in vector tile styles to calculate the zoom level. This is a number, either an integer or floating point depending on zoom.

  3. !pixel_width! and !pixel_height! which are the width and height of a pixel in projected units. These are both numbers, either integers or floating points. They can be zero. In OpenStreetMap carto they are alway seen in the form way_area/NULLIF(!pixel_width!::real*!pixel_height!::real,0) or way_area > [some number]*!pixel_width!::real*!pixel_height!::real.

To get the common part of the query we can cut off the query text at the first occurance of ST_SetSRID or way_area. The former is only inserted by Mapnik and the latter only appears on lines with Mapnik tokens or as part of an `ORDER BY very late in the query.

Splitting the string by these tokens and taking the first part can solve this.

1
2
3
4
SELECT count(*), sum(run_length), (regexp_split_to_array(query, '(way_area|ST_SetSRID)'))[1] AS q
  FROM render_query_log
  GROUP BY q
  ORDER by sum DESC;

SELECT count(*), sum(run_length), query q FROM render_query_log GROUP BY q

regexp_split_to_array(‘select foo from a way_area b seasdfas’, ‘(way_area|ST_SetSRID)’)

Sampling Slow Postgres Queries

Part of optimizing a style or application using PostgreSQL is finding what queries are taking up the most time, and improving them. The first part of this is finding the slow queries.

A popular way to do this is using log_min_duration_statement to log all queries over a certain duration or to use log_statement to log all queries. These both have downsides.

Logging slow queries gives a misleading view on what is using the most resources. A query that takes 100 seconds to run but only runs once a day will use less resource than a 100ms query running every second, but the fast often repeated query might not appear in the logs. This is common with map rendering where low-zoom queries are slow but infrequently run.

Logging all queries is comprehensive but can have a noticeable performance impact. Every query causes a write to the server logs. This can be an unacceptable performance drag on a production server.

Both methods suffer from adjusting log file settings. This means adjusting the server configuration, having the setting apply to all databases, having to remove extraneous information from the logs, and generally being hard to parse. In a shared environment this can be impossible to get done. If you have statements with newlines the newlines make it into the logfile unescaped, making it hard to parse them with normal tools.

Fortunately, there is another method: sampling running queries.