I’ve written about how to set up Avecado and how to run avecado_server to serve tiles, but I haven’t yet gotten as far as generating benchmark results, which is the goal of my work.
avecado_server serves tiles over HTTP, which allows the use of HTTP benchmarking tools. There are many specialized tools, but for the simple needs here, curl and GNU Parallel do the job and are simple to use.
A best practice for benchmarking is to keep the list of tiles in a file, so that tiles are generated in the same order on every run and it is easy to generate either all tiles or a pre-defined subset.
For this benchmark, I’m using a list of all medium-low zoom tiles. This can be generated with a set of nested for loops.
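A minimal sketch of those loops, assuming the list covers zooms 5 and 6 (which, with the three-level offset, correspond to zoom 8 and 9 meta-tiles) and is written to a file named tile_list; the zoom range and filename are assumptions:

```sh
# Emit one z/x/y line per tile; zoom range and tile_list name are assumptions
for z in 5 6; do
  max=$(( 2**z - 1 ))
  for x in $(seq 0 "$max"); do
    for y in $(seq 0 "$max"); do
      echo "$z/$x/$y"
    done
  done
done > tile_list
```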
These loops build a list of medium-low zoom tiles to render. With the offset of three zoom levels introduced before, the list is equivalent to zoom 8 and 9 meta-tiles (a meta-tile is an 8×8 block of tiles, so one tile three zoom levels up covers the same area). Tiles of a lower zoom are not well suited for benchmarking: there are so few of them, and each takes so long individually, that the results are highly variable.
Fetching the tiles
GNU Parallel can be used to manage fetching the tiles. On Ubuntu it can be installed with sudo apt-get install parallel. It is then just a matter of constructing a suitable parallel command.
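A sketch of such a command; the server address, port, and URL pattern are assumptions about the local setup:

```sh
# Fetch every tile in tile_list, 8 at a time, discarding the responses;
# the localhost:8080 URL and .pbf extension are assumed, not confirmed
time parallel --progress -j 8 \
  curl -s -o /dev/null "http://localhost:8080/{}.pbf" < tile_list
```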
This command tells parallel to start up instances of curl to fetch the vector tiles, discard the results, keep 8 downloads running at a time, and display a progress meter. After all the tiles are downloaded, the time taken is printed.
A constant complication with benchmarking is caching. Normally, when a SQL query is run or a tile is rendered twice, it is faster the second time because the data is cached in RAM. This tends to be less of an issue with low-zoom tiles than high-zoom ones, but it still needs to be accounted for. There are several techniques for handling this; the one best suited here is to run the benchmark multiple times and discard the results of the first run.
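One way to do this, continuing the assumed URL from the sketch above, is an untimed warm-up pass followed by a timed run:

```sh
# Warm-up pass to populate caches (result discarded), then the timed run
parallel -j 8 curl -s -o /dev/null "http://localhost:8080/{}.pbf" < tile_list
time parallel -j 8 curl -s -o /dev/null "http://localhost:8080/{}.pbf" < tile_list
```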
Running this, I get a time of 18 minutes 47 seconds, or 1127 seconds.
Measuring variability
It’s important to know the variability of a benchmark. This is done by repeating the same test multiple times, in this case 25 times, resulting in an average of 1231 seconds and a standard deviation of 54 seconds, or 4.3%. This is not great, but reasonable given the small number of tiles and their low zoom.
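A small sketch of that calculation, assuming the wall-clock time of each run has been recorded, one value in seconds per line, in a hypothetical file named times.txt:

```sh
# Compute mean and sample standard deviation of the recorded run times
awk '{ s += $1; ss += $1 * $1; n++ }
     END {
       mean = s / n
       sd = sqrt((ss - n * mean * mean) / (n - 1))
       printf "mean %.0f s, stddev %.0f s (%.1f%%)\n", mean, sd, 100 * sd / mean
     }' times.txt
```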
Testing a custom index
The time to complete a benchmark run will vary with server hardware, database setup, OS setup, data loaded, and other factors, so a single time by itself isn’t very useful. By changing something and testing again, we can identify whether that change was an improvement. One simple change is adding a custom partial index.
A partial index indexes only a subset of a table rather than the entire table. This makes the index smaller and faster to use, and it returns fewer rows, eliminating some disk fetches and subsequent filtering of rows. If a query has a WHERE clause that matches that of the index, PostgreSQL can make use of the partial index.
OpenStreetMap-Carto has a number of layers used for labeling that have name IS NOT NULL in a WHERE clause in their SQL. Of the 167 million polygons in the planet database, only 4.8 million, or 3%, have names and match the condition. A partial index can be created for these named polygons.
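A sketch of such an index, assuming the standard osm2pgsql planet_osm_polygon table and its way geometry column; the index name is illustrative:

```sql
-- Index only the geometries of polygons with a name
-- (table, column, and index names assume the standard osm2pgsql schema)
CREATE INDEX planet_osm_polygon_named
  ON planet_osm_polygon
  USING GiST (way)
  WHERE name IS NOT NULL;
```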
This index is 0.5 GB instead of the 17 GB of the normal polygon index, so it’s reasonable to expect significant speed improvements for those layers, resulting in overall gains. The entire point of benchmarking is to verify that changes we think will improve speed actually do. Running the benchmark four times and discarding the first run gives an average time of 1013 seconds, 18% faster than before.
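The index sizes can be checked with a query along these lines; the index names are assumptions matching the sketch above and the default osm2pgsql naming:

```sql
-- Report the on-disk size of each index (index names are assumptions)
SELECT relname, pg_size_pretty(pg_relation_size(oid))
  FROM pg_class
 WHERE relname IN ('planet_osm_polygon_index', 'planet_osm_polygon_named');
```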
This is a significant speed increase, showing the value of partial indexes, particularly for medium-low zoom tiles.