I’ve been looking at how many tiles change when OSM data is updated, in order to better guide resource estimates, and have completed some benchmarks. This is the technical post with the details; I’ll be doing a high-level post later.
Software like Tilemaker and Planetiler is great for generating a complete set of tiles, updated about once a day, but they can’t handle minutely updates. Most users are fine with daily or slower updates, but OSM.org users are different, and minutely updates are critical for them. All the current minutely ways to generate map tiles involve loading the changes and regenerating tiles when data in them may have changed. I used osm2pgsql, the standard way to load OSM data for rendering, but the results should be applicable to other ways including different schemas.
Using the Shortbread schema from osm2pgsql-themepark I loaded the data with osm2pgsql and ran updates. osm2pgsql can output a list of changed tiles (“expired tiles”) and I did this for zoom 1 to 14 for each update. Because I was running this on real data, a particularly large update sometimes took longer than 60 seconds to process, and in this case the next run would combine multiple updates from OSM. Combining multiple updates reduces how much work the server has to do at the cost of less frequent updates, and this has been well documented since 2012, but no one has looked at how combining updates affects the number of expired tiles.
To do this testing I was using a Hetzner server with 2x1TB NVMe drives in RAID0, 64GB of RAM, and an Intel i7-8700 @ 3.2 GHz. Osm2pgsql 1.10 was used, the latest version at the time. The version of themepark was equivalent to the latest version.
The updates were run for a week from 2023-12-30T08:24:00Z to 2024-01-06T20:31:45Z. There were some interruptions in the updates, but I did an update without expiring tiles after the interruptions so they wouldn’t impact the results.
To run the updates I used a simple shell script.
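A loop along these lines would do the job. This is a minimal sketch rather than the exact script used: the database name `osm`, the output directory, and the file naming are assumptions, and depending on the osm2pgsql version the style file may also need to be passed after the `--`.

```shell
#!/bin/sh
# Assumed settings; adjust for the local setup.
DB=${DB:-osm}              # database name (hypothetical)
INTERVAL=${INTERVAL:-60}   # minimum seconds between update runs

# run_updates COUNT OUTDIR: apply COUNT updates, writing one expire list per run.
run_updates() {
  count=$1; outdir=$2; i=0
  mkdir -p "$outdir"
  while [ "$i" -lt "$count" ]; do
    start=$(date +%s)
    stamp=$(date -u +%Y%m%dT%H%M%SZ)
    # Options after "--" are handed to osm2pgsql. Stop on failure, no restart.
    osm2pgsql-replication update -d "$DB" -- \
      --expire-tiles=1-14 \
      --expire-output="$outdir/expire-$stamp-$i.txt" || return 1
    elapsed=$(( $(date +%s) - start ))
    # A slow run skips the sleep, so the next run picks up several diffs at once.
    if [ "$elapsed" -lt "$INTERVAL" ]; then sleep $((INTERVAL - elapsed)); fi
    i=$((i + 1))
  done
}
```

Calling `run_updates 10080 expire` would cover roughly a week of minutely updates, with each run leaving one list of `z/x/y` lines behind.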
Normally I’d set up a systemd service and timer as described in the manual, but this setup was an unusual test where I didn’t want it to automatically restart.
I then used grep to count the number of expired tiles at each zoom in each file, creating a list of counts for each zoom.
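The counting step can be sketched as follows. It assumes the expire lists are `*.txt` files of `z/x/y` lines in one directory and writes the counts to a separate directory as `z1.txt` to `z14.txt`; the function name and the file layout are illustrative, not the original commands.

```shell
#!/bin/sh
# count_by_zoom INDIR OUTDIR: for each zoom 1-14, write one count per expire file.
count_by_zoom() {
  indir=$1; outdir=$2
  mkdir -p "$outdir"
  z=1
  while [ "$z" -le 14 ]; do
    : > "$outdir/z$z.txt"
    for f in "$indir"/*.txt; do
      [ -e "$f" ] || continue
      # The trailing slash keeps "^1/" from matching zoom 10-14 lines.
      # grep -c still prints 0 when nothing matches, but exits non-zero.
      grep -c "^$z/" "$f" >> "$outdir/z$z.txt" || true
    done
    z=$((z + 1))
  done
}
```

Each output file then holds one number per update, in the same order as the input files.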
This let me use a crude script to get percentiles and the mean, and assemble them into a CSV.
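One way to get a CSV row of mean and percentiles from a list of counts is sort plus awk. The sketch below assumes linearly interpolated percentiles, which would explain fractional values such as 2.6 in the tables, but the original script may have used a different definition. `stats_row` is a hypothetical helper name.

```shell
#!/bin/sh
# stats_row LABEL FILE: print "label,mean,p0,p1,p5,p25,p50,p75,p95,p99,p100".
stats_row() {
  sort -n "$2" | awk -v label="$1" '
    { v[NR] = $1; sum += $1 }
    # Linearly interpolated percentile over the sorted values v[1..NR].
    function pct(p,    r, lo, frac) {
      r = (p / 100) * (NR - 1) + 1
      lo = int(r); frac = r - lo
      if (lo >= NR) return v[NR]
      return v[lo] + frac * (v[lo + 1] - v[lo])
    }
    END {
      if (NR == 0) exit 1
      printf "%s,%g", label, sum / NR
      np = split("0 1 5 25 50 75 95 99 100", ps, " ")
      for (i = 1; i <= np; i++) printf ",%g", pct(ps[i])
      printf "\n"
    }'
}
```

Running `stats_row z14 z14.txt` for each zoom and concatenating the rows under a header line gives a table like the ones below. Note that `%g` switches to exponent notation for values of a million or more, which is enough headroom for these counts.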
A look at the percentiles for zoom 14 immediately reveals some outliers, with a mean of 249 tiles, median of 113, p99 of 6854, and p100 of 101824. I was curious what was making this so large and found the p100 was with sequence number 5880335, which was also the largest diff. This diff was surrounded by normal-sized diffs, so it wasn’t a lot of data. The data consumed would have been the diff 005/880/336.
A bit of shell magic got me a list of changesets that did something other than add a node: `osmium cat 005880336.osc.gz -f opl | egrep -v '^n[[:digit:]]+ v1' | cut -d ' ' -f 4 | sort | uniq | sed 's/c\(.*\)/\1/'`
Looking at the changesets with achavi, 145229319 stood out as taking some time to load. Two of the nodes modified were information boards that were part of the Belarus-Ukraine border and the Belarus-Russia border. Thus, this changeset changed the Russia, Ukraine, and Belarus polygons. As these are large polygons, only the tiles along the edge were considered dirty, but this is still a lot of tiles!
After validating that the results make sense, I got the following means and percentiles, which may be useful to others.
Tiles per minute, with updates every minute
zoom | mean | p0 | p1 | p5 | p25 | p50 | p75 | p95 | p99 | p100 |
---|---|---|---|---|---|---|---|---|---|---|
z1 | 3.3 | 1 | 2 | 2 | 3 | 3 | 4 | 4 | 4 | 4 |
z2 | 5.1 | 1 | 2.6 | 3 | 4 | 5 | 6 | 7 | 7 | 10 |
z3 | 9.1 | 1 | 4 | 5 | 8 | 9 | 11 | 13 | 15 | 24 |
z4 | 12.8 | 1 | 5 | 7 | 10 | 12 | 15 | 20 | 24 | 52 |
z5 | 17.1 | 1 | 5 | 8 | 13 | 17 | 20 | 28 | 35 | 114 |
z6 | 21.7 | 1 | 6 | 9 | 15 | 21 | 26 | 37 | 48 | 262 |
z7 | 25.6 | 1 | 6 | 9 | 17 | 24 | 31 | 46 | 63 | 591 |
z8 | 29.2 | 1 | 6 | 9 | 17 | 26 | 34 | 55 | 92 | 1299 |
z9 | 34.5 | 1 | 6 | 10 | 18 | 28 | 37 | 64 | 173 | 2699 |
z10 | 44.6 | 1 | 7 | 10 | 20 | 31 | 41 | 80 | 330 | 5588 |
z11 | 65.6 | 1 | 7 | 12 | 23 | 35 | 49 | 125 | 668 | 11639 |
z12 | 111 | 1 | 8 | 14 | 29 | 44 | 64 | 238 | 1409 | 24506 |
z13 | 215 | 1 | 10 | 18 | 40 | 64 | 102 | 527 | 3150 | 52824 |
z14 | 468 | 1 | 14 | 27 | 66 | 113 | 199 | 1224 | 7306 | 119801 |
Based on historical OpenStreetMap Carto data the capacity of a rendering server is about 1 req/s per hardware thread. Current performance is slower, but includes The new OSMF general purpose servers are mid-range servers and have 80 threads, so should be able to render about 80 tiles per second, or about 4800 tiles in the 60 seconds between updates. This means that approximately 95% of the time the server will be able to complete re-rendering tiles within the 60 seconds between updates. A couple of times an hour it will be slower.
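As a rough check of that estimate, the sketch below compares what 80 threads at about 1 tile/s each can render in a 60-second interval against the p95 and p99 columns of the table above, summed over zooms 1 to 14. Summing per-zoom percentiles is only an approximation of the percentile of the total, so treat the result as a ballpark.

```shell
#!/bin/sh
threads=80; rate=1; interval=60
budget=$((threads * rate * interval))   # tiles renderable between two updates

# p95 and p99 columns for z1..z14, copied from the minutely table.
p95="4 7 13 20 28 37 46 55 64 80 125 238 527 1224"
p99="4 7 15 24 35 48 63 92 173 330 668 1409 3150 7306"

# sum_list "A B C": print the sum of a space-separated list of integers.
sum_list() {
  total=0
  for n in $1; do total=$((total + n)); done
  echo "$total"
}

echo "budget=$budget p95_total=$(sum_list "$p95") p99_total=$(sum_list "$p99")"
```

The budget of 4800 tiles sits between the p95 total (2468) and the p99 total (13324), which is consistent with the server keeping up about 95% of the time.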
As mentioned earlier, when updates take over 60 seconds, multiple updates combine into one and reduce the amount of work to be done. I simulated this by merging every k files together. Continuing the theme of patched-together scripts, I did this with a shell script, based on StackExchange.
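The merge can be sketched as below. It is not the StackExchange-derived script itself: it assumes the expire lists are `*.txt` files whose names sort chronologically, removes tiles that appear in more than one of the k files, and drops a trailing group with fewer than k files so every merged update covers the same interval. The function and file names are illustrative.

```shell
#!/bin/sh
# merge_every_k K INDIR OUTDIR: merge each run of K consecutive expire lists.
merge_every_k() {
  k=$1; indir=$2; outdir=$3
  mkdir -p "$outdir"
  group=0
  set --                      # reuse the positional parameters as the batch
  for f in "$indir"/*.txt; do
    [ -e "$f" ] || continue
    set -- "$@" "$f"
    if [ "$#" -eq "$k" ]; then
      # sort -u merges the batch and keeps each tile only once.
      sort -u "$@" > "$outdir/merged-$(printf '%06d' "$group").txt"
      group=$((group + 1))
      set --
    fi
  done
}
```

The merged files can then go through the same counting and percentile steps as the original ones.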
Running the results through the same process for percentiles generates numbers in tiles per update, but updates happen only 1/k as often, so in terms of work done per unit of time, all the numbers need to be divided by k. For a few values of k, here are the results.
k=2
zoom | mean | p0 | p1 | p5 | p25 | p50 | p75 | p95 | p99 | p100 |
---|---|---|---|---|---|---|---|---|---|---|
z1 | 1.7 | 0.5 | 1 | 1 | 1.5 | 1.5 | 2 | 2 | 2 | 2 |
z2 | 2.5 | 0.5 | 1 | 1.5 | 2 | 2.5 | 3 | 3.5 | 3.5 | 5 |
z3 | 4.5 | 0.5 | 2 | 2.5 | 4 | 4.5 | 5.5 | 6.5 | 7.5 | 12 |
z4 | 6.4 | 0.5 | 2.5 | 3.5 | 5 | 6 | 7.5 | 10 | 12.5 | 26 |
z5 | 8.6 | 0.5 | 2.5 | 4 | 6.5 | 8.5 | 10 | 14 | 17.5 | 51 |
z6 | 10.9 | 0.5 | 2.9 | 4.5 | 7.5 | 10.5 | 13 | 18.5 | 24.5 | 107 |
z7 | 13.0 | 0.5 | 3 | 4.5 | 8.5 | 12 | 15.5 | 23 | 32 | 239 |
z8 | 14.9 | 0.5 | 3 | 4.5 | 9 | 13 | 17 | 27 | 50 | 535 |
z9 | 17.8 | 0.5 | 3 | 5 | 9.5 | 14 | 18.5 | 32 | 97 | 1127 |
z10 | 24 | 0.5 | 3 | 5 | 10 | 15.5 | 20.5 | 41 | 192 | 2347 |
z11 | 36 | 0.5 | 3.5 | 6 | 11.5 | 17.5 | 24 | 65 | 395 | 4888 |
z12 | 64 | 0.5 | 4 | 7 | 14.5 | 22 | 32 | 120 | 844 | 10338 |
z13 | 120 | 0.5 | 5 | 9 | 20 | 32 | 50 | 265 | 1786 | 22379 |
z14 | 263 | 0.5 | 7 | 14 | 33 | 56 | 99 | 617 | 3988 | 50912 |
k=5
zoom | mean | p0 | p1 | p5 | p25 | p50 | p75 | p95 | p99 | p100 |
---|---|---|---|---|---|---|---|---|---|---|
z1 | 0.66 | 0.20 | 0.40 | 0.40 | 0.60 | 0.60 | 0.80 | 0.80 | 0.80 | 0.80 |
z2 | 1.01 | 0.20 | 0.40 | 0.60 | 0.80 | 1.00 | 1.20 | 1.40 | 1.40 | 2.00 |
z3 | 1.82 | 0.20 | 0.80 | 1.00 | 1.60 | 1.80 | 2.20 | 2.60 | 3.00 | 4.60 |
z4 | 2.54 | 0.20 | 1.00 | 1.40 | 2.00 | 2.40 | 3.00 | 4.00 | 4.80 | 8.00 |
z5 | 3.40 | 0.20 | 1.00 | 1.60 | 2.60 | 3.40 | 4.00 | 5.40 | 7.00 | 18.80 |
z6 | 4.31 | 0.20 | 1.02 | 1.80 | 3.20 | 4.20 | 5.20 | 7.40 | 9.80 | 42.60 |
z7 | 5.08 | 0.20 | 1.20 | 1.80 | 3.40 | 4.80 | 6.20 | 9.20 | 12.60 | 93.60 |
z8 | 5.78 | 0.20 | 1.20 | 1.80 | 3.40 | 5.20 | 6.80 | 11.00 | 18.93 | 206.20 |
z9 | 6.78 | 0.20 | 1.20 | 2.00 | 3.60 | 5.60 | 7.40 | 13.00 | 35.40 | 430.40 |
z10 | 8.73 | 0.20 | 1.40 | 2.00 | 4.00 | 6.20 | 8.20 | 16.40 | 67.48 | 895.20 |
z11 | 12.76 | 0.20 | 1.40 | 2.40 | 4.60 | 7.00 | 9.60 | 25.16 | 150.32 | 1865.40 |
z12 | 21.60 | 0.40 | 1.60 | 2.80 | 5.80 | 8.80 | 12.80 | 47.00 | 328.89 | 3932.40 |
z13 | 41.88 | 0.40 | 2.00 | 3.60 | 8.00 | 12.80 | 20.60 | 102.08 | 712.36 | 8486.80 |
z14 | 91.76 | 0.40 | 2.80 | 5.40 | 13.00 | 22.80 | 40.40 | 239.88 | 1597.66 | 19274.40 |
Finally, we can reproduce the Geofabrik graph, looking at tiles per minute against update interval, and get approximately work ∝ update^-1.05, where update is the number of minutes between updates. This means combining multiple updates is very effective at reducing load.
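An exponent like this can be estimated with a least-squares fit of log(tiles per minute) against log(update interval). The sketch below uses only the three zoom 14 means from the tables above (k = 1, 2, 5), so it is an illustration of the method, not a reproduction of the fit behind the figure quoted in the text.

```shell
#!/bin/sh
# fit_exponent: read "interval tiles_per_minute" pairs on stdin and print the
# slope of the least-squares line through (log interval, log tiles).
fit_exponent() {
  awk '
    { x = log($1); y = log($2); n++
      sx += x; sy += y; sxx += x * x; sxy += x * y }
    END { printf "%.2f\n", (n * sxy - sx * sy) / (n * sxx - sx * sx) }'
}

# Zoom 14 means for update intervals of 1, 2, and 5 minutes.
printf '1 468\n2 263\n5 91.76\n' | fit_exponent
```

With just these three points the slope comes out close to -1, in line with the approximate exponent above; more intervals would be needed to pin it down further.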
What does all this mean?
This has been a lot of numbers, which is useful for someone in my position, but what does this mean at a practical level?
Big updates happen sometimes, which will slow everything down. Even a powerful server will slow down when multiple large country borders need to be regenerated.
As the update interval gets longer, the tile server has less work to do per minute and can catch up. Updates every 10 minutes involve approximately 5 times less work than minutely updates, so when a particularly large update happens, the server can easily catch up.
A lower-end server capable of 10 tiles/second can still update every 3 minutes or faster 95% of the time, 3-15 minutes 4% of the time, and only 1% of the time fall farther behind.
You probably don’t want to keep a minutely updated tileset running on your laptop.