Paul’s Blog

A blog without a good name

Tilekiln Tile Storage

I’m rewriting Tilekiln, tile generation software which leverages PostGIS to allow using established toolchains like osm2pgsql.

Tile storage is a difficult problem. For a tileset going to zoom 14, there are 358 million tiles, and for one going to zoom 15, there are 1.4 billion. Most tiles are small, with 80% typically being about 100 bytes, while the largest tiles might be about 1 megabyte.
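The tile counts above follow from each zoom level having four times as many tiles as the one before it. A minimal sketch of the arithmetic, with a function name made up for illustration:

```python
def tiles_through_zoom(max_zoom: int) -> int:
    """Total tiles in a complete pyramid from zoom 0 through max_zoom."""
    # Zoom z is a 2**z by 2**z grid, so it holds 4**z tiles.
    return sum(4 ** z for z in range(max_zoom + 1))


print(tiles_through_zoom(14))  # 357913941, about 358 million
print(tiles_through_zoom(15))  # 1431655765, about 1.4 billion
```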

Tilekiln’s storage must be able to handle these numbers, as well as incremental minutely updates and maintenance work like deleting tilesets. The ability to distribute tilesets easily would be nice to have, but is not essential.

Options

PMTiles

PMTiles is a file format designed to store an entire tileset in one file. It consists of a directory, which lists offsets for where tiles are within the larger file. Using range requests, any tile can be retrieved in 3 requests in the worst case, while any caching at all will bring this to 2 requests, and typical caching can bring it close to one.

It features de-duplication, both for tiles that are bytewise-identical, as well as for adjacent offset listings pointing at the same tile.
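To make the two kinds of de-duplication concrete, here is a toy in-memory sketch. It is not the real PMTiles binary layout: the integer tile IDs, the four-field directory entries, and the function names are all assumptions chosen for illustration.

```python
import hashlib


def build_archive(tiles):
    """Build a toy archive from (tile_id, bytes) pairs sorted by tile_id.

    Returns (directory, data), where each directory entry is
    [first_tile_id, run_length, offset, length].
    """
    data = bytearray()
    seen = {}  # content hash -> (offset, length) of bytes already stored
    directory = []
    for tile_id, blob in tiles:
        digest = hashlib.sha256(blob).digest()
        if digest not in seen:
            # First time this content appears: append it to the data section.
            seen[digest] = (len(data), len(blob))
            data.extend(blob)
        offset, length = seen[digest]
        last = directory[-1] if directory else None
        if (last is not None and last[0] + last[1] == tile_id
                and (last[2], last[3]) == (offset, length)):
            # Adjacent entry pointing at the same bytes: extend the run.
            last[1] += 1
        else:
            directory.append([tile_id, 1, offset, length])
    return directory, bytes(data)


def read_tile(directory, data, tile_id):
    """Look up a tile by scanning the directory (a real reader would search)."""
    for first, run, offset, length in directory:
        if first <= tile_id < first + run:
            return data[offset:offset + length]
    return None


tiles = [(0, b"land"), (1, b"ocean"), (2, b"ocean"), (3, b"ocean"), (4, b"land")]
directory, data = build_archive(tiles)
print(directory)  # [[0, 1, 0, 4], [1, 3, 4, 5], [4, 1, 0, 4]]
print(len(data))  # 9 bytes stored for 5 tiles
```

In this sketch, five tiles need only three directory entries and two stored blobs. In a remote archive, the slice in read_tile would be an HTTP range request for those bytes.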

There is client-side support in some browser-based map display libraries, but most applications will require a server that handles conventional z/x/y URLs and serves tiles from the PMTiles file. As a fairly new format, support from other applications is limited.

Updating the PMTiles archive in place is possible, because clients use etags to identify when the archive has changed and invalidate their client-side cache. With minutely updates, this means each client would hit the 3-request worst case once per minute. In practice, this doesn’t matter, because a large tileset cannot be rewritten that frequently: writing out the complete file takes longer than a minute.

Pros

  • Generally the most space-efficient single-file tileset archive format
  • Easy to distribute
  • Can directly serve to some clients

Cons

  • Impossible to minutely update
  • Poor support for the archive format outside of specialized software and browser-based libraries

MBTiles

Like PMTiles, MBTiles is a single-file archive format. It was developed by Mapbox for users to generate tiles and upload them to Mapbox’s servers. Its format is a SQLite database with tables consisting of tile indexes and tile data as binary blobs. Because it’s based on SQLite, and has been around for longer, support is widespread, with several generation tools available. Browser-based support is limited, and it wasn’t designed with that in mind.
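As a rough illustration of the tile index plus blob layout, the sketch below creates the simple form of the tables described in the MBTiles specification (a metadata table and a tiles table) in an in-memory SQLite database. The helper names are made up for this example, and real files produced by generation tools may use views or extra tables instead.

```python
import sqlite3


def open_mbtiles(path=":memory:"):
    """Create a minimal MBTiles-style SQLite database."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE metadata (name text, value text)")
    conn.execute(
        "CREATE TABLE tiles (zoom_level integer, tile_column integer, "
        "tile_row integer, tile_data blob)"
    )
    conn.execute(
        "CREATE UNIQUE INDEX tile_index "
        "ON tiles (zoom_level, tile_column, tile_row)"
    )
    return conn


def tms_row(z, y):
    # MBTiles rows follow the TMS scheme, which counts from the bottom,
    # so the usual z/x/y "y" has to be flipped.
    return (1 << z) - 1 - y


def put_tile(conn, z, x, y, data):
    conn.execute(
        "INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?)",
        (z, x, tms_row(z, y), data),
    )


def get_tile(conn, z, x, y):
    row = conn.execute(
        "SELECT tile_data FROM tiles "
        "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
        (z, x, tms_row(z, y)),
    ).fetchone()
    return row[0] if row else None


conn = open_mbtiles()
put_tile(conn, 2, 1, 0, b"example tile bytes")
print(get_tile(conn, 2, 1, 0))  # b'example tile bytes'
```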

Minutely updates are theoretically possible, but in practice, not a good idea. SQLite databases do not work well with high volumes of concurrent reads and writes, generally requiring all work to go through one process. This requires coupling the generation and serving systems.

Pros

  • Easy to distribute
  • Good support for non-browser clients

Cons

  • Poor minutely support
  • Not suitable for directly serving to browsers

PostgreSQL

Because Tilekiln already requires PostgreSQL, it would be possible to store tiles in it, the same way that MBTiles does.

Pros

  • Supports minutely updates
  • Uses software already required

Cons

  • Custom format
  • Impossible to distribute the archive

Tiles on disk

Instead of an archive format, it’s possible to store tiles on disk as files. This is the most well-established method, and the simplest. Tiles can be updated atomically, and serving tiles is just serving files from disk. The downside comes from managing millions or billions of tiny files. File systems are not designed for this, and can have problems with

  • minimum file sizes,
  • inode usage,
  • inodes per directory, and
  • cleaning up tilesets.

In particular, it can take a day or longer to delete a tileset.
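The atomic update mentioned above is usually done by writing to a temporary file in the same directory and renaming it over the old tile. A minimal sketch, assuming a z/x/y directory layout and a .mvt extension, neither of which is necessarily what Tilekiln uses:

```python
import os
import tempfile
from pathlib import Path


def write_tile_atomically(root, z, x, y, data):
    """Write a tile so readers see either the old file or the new one."""
    path = Path(root) / str(z) / str(x) / f"{y}.mvt"
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must be on the same file system as the destination
    # for the final rename to be atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # mkstemp creates files readable only by the owner; relax that
        # so a separate serving process can read the tile.
        os.chmod(tmp_name, 0o644)
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return path


with tempfile.TemporaryDirectory() as root:
    write_tile_atomically(root, 14, 4823, 6160, b"old")
    tile_path = write_tile_atomically(root, 14, 4823, 6160, b"new")
    print(tile_path.read_bytes())  # b'new'
```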

Pros

  • Supports minutely updates
  • Simple serving

Cons

  • Does not scale to planet-wide tilesets
  • No archive to distribute

Object stores

A popular approach is to store tiles in some form of object store, like S3. All commercial object stores I’ve looked at perform badly with large numbers of small objects. While there are sometimes work-arounds for this, their pricing structure generally makes it very expensive to store tiles this way.

Pros

  • Easy to serve out of
  • Supports minutely updates

Cons

  • Very expensive, or requires running your own object store
  • Slow

Tapalcatl 2

Tapalcatl 2 is a system of using zip files to combine tiles, reducing the number of files that need to be stored. It is similar to how raster tiles are combined into metatiles, except that the vector tiles are pre-sliced within the zip file and a zip file can contain multiple zooms.

In a typical configuration, there are zip files generated for tiles on zooms 0, 4, 8, and 12. Each zip file contains the “root” tile and then tiles from the next three zooms that lie within it. This means that a zip archive contains 85 tiles, all tiles within a small area. By combining tiles into one zip archive, this reduces the number of files on disk to 16.8 million files, a small enough number to be reasonably managed on disk.
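The grouping described above can be sketched in a few lines. This only follows the description in this post; the function names are invented, and nothing here reflects how Tapalcatl 2 actually names or lays out its archives.

```python
def archive_for_tile(z, x, y, group=4):
    """Return the (z, x, y) of the root tile whose zip file holds tile z/x/y."""
    root_z = (z // group) * group  # 0, 4, 8, 12, ...
    shift = z - root_z             # zoom levels below the root
    return root_z, x >> shift, y >> shift


def tiles_per_archive(group=4):
    # The root tile plus its descendants on the next group - 1 zooms.
    return sum(4 ** dz for dz in range(group))


def archive_count(max_zoom=15, group=4):
    # One zip file per tile on each root zoom.
    return sum(4 ** z for z in range(0, max_zoom + 1, group))


print(archive_for_tile(14, 4823, 6160))  # (12, 1205, 1540)
print(tiles_per_archive())               # 85
print(archive_count())                   # 16843009, about 16.8 million
```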

The format hasn’t had a great deal of usage since it was developed, so support is limited to some server-side programs that take tapalcatl archives and present tiles to the user. These server-side programs are known to have some issues, like not supporting updates to remote tapalcatl tilesets.

Updates are possible in two ways. The first is by taking an existing zip file, replacing the changed tiles within it, and generating a new zip file. The second is to completely regenerate all the tiles in the zip file, which is simpler, but involves more tile generation.
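The first approach amounts to copying the unchanged entries into a new zip file and writing the changed tiles alongside them. A minimal sketch using Python's zipfile module, with hypothetical entry names; it works on bytes in memory, so publishing the result atomically is left to whatever storage is used:

```python
import io
import zipfile


def replace_tiles(archive_bytes, changed):
    """Return new zip bytes with the entries in `changed` replaced or added.

    `changed` maps entry names to their new contents.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as old, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as new:
        for info in old.infolist():
            if info.filename not in changed:
                # Copy untouched tiles across without regenerating them.
                new.writestr(info, old.read(info.filename))
        for name, data in changed.items():
            new.writestr(name, data)
    return out.getvalue()


# Build a small archive, then replace one tile in it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("12/1205/1540.mvt", b"root tile")
    zf.writestr("13/2410/3080.mvt", b"old child tile")

updated = replace_tiles(buf.getvalue(), {"13/2410/3080.mvt": b"new child tile"})
with zipfile.ZipFile(io.BytesIO(updated)) as zf:
    print(zf.read("13/2410/3080.mvt"))  # b'new child tile'
```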

Pros

  • Supports minutely updates
  • Allows good decoupling of serving and generation

Cons

  • Limited client support
  • Minutely updates are more complicated

Recommendations

The two options which require further investigation are PostgreSQL and Tapalcatl 2. Both support updates, but come with downsides.