I’m rewriting Tilekiln, tile generation software which leverages PostGIS to allow using established toolchains like osm2pgsql.
Tile storage is a difficult problem. For a tileset going to zoom 14, there are 358 million tiles, and for one going to zoom 15, there are 1.4 billion. Most tiles are small, with 80% typically being about 100 bytes, while the largest tiles might be about 1 megabyte.
Tilekiln’s storage must be able to handle these numbers, as well as incremental minutely updates and maintenance work like deleting tilesets. The ability to distribute tilesets easily would be nice to have, but it is not essential.
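The tile counts above follow from each zoom level having four times as many tiles as the one before it. As a quick check of the arithmetic (the function name is just for illustration):

```python
def tiles_through_zoom(max_zoom: int) -> int:
    """Total number of tiles in a full pyramid from zoom 0 to max_zoom."""
    # Zoom z is a 2**z by 2**z grid, which is 4**z tiles.
    return sum(4 ** z for z in range(max_zoom + 1))


print(f"{tiles_through_zoom(14):,}")  # 357,913,941
print(f"{tiles_through_zoom(15):,}")  # 1,431,655,765
```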
Options
PMTiles
PMTiles is a file format designed to store an entire tileset in one file. It consists of a directory, which lists offsets for where tiles are within the larger file. Using range requests, any tile can be retrieved in 3 requests in the worst case, while any caching at all will bring this to 2 requests, and typical caching can bring it close to one.
It features de-duplication, both for tiles that are bytewise-identical and for adjacent offset listings pointing at the same tile.
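To make the two kinds of de-duplication concrete, here is a toy sketch in Python. It is not the PMTiles on-disk format: the in-memory directory, the function names, and the use of insertion order as "adjacent" are all assumptions made for illustration.

```python
import hashlib


def build_archive(tiles):
    """Pack (tile_id, bytes) pairs into one blob, storing identical bytes once."""
    data = bytearray()
    seen = {}       # content hash -> (offset, length) of the stored copy
    directory = {}  # tile_id -> (offset, length)
    for tile_id, body in tiles:
        digest = hashlib.sha256(body).digest()
        if digest not in seen:
            # First time these exact bytes appear: append them to the blob.
            seen[digest] = (len(data), len(body))
            data.extend(body)
        # Identical tiles all point at the same stored copy.
        directory[tile_id] = seen[digest]
    return directory, bytes(data)


def run_length_encode(locations):
    """Collapse consecutive entries that point at the same bytes into runs."""
    runs = []
    for location in locations:
        if runs and runs[-1][0] == location:
            runs[-1][1] += 1
        else:
            runs.append([location, 1])
    return [(location, count) for location, count in runs]


def read_tile(directory, data, tile_id):
    """Look up a tile's byte range, as a range request would."""
    offset, length = directory[tile_id]
    return data[offset:offset + length]
```

With four tiles where three are the same ocean tile, the blob holds only two tile bodies, and the first two directory entries collapse into a single run.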
There is client-side support in some browser-based map display libraries, but most applications will require a server that handles conventional z/x/y URLs and serves tiles from the PMTiles file. As a fairly new format, support from other applications is limited.
Updating the PMTiles archive in place is possible, because clients use ETags to detect when the archive has changed and invalidate their client-side cache. With minutely updates, this means that once a minute each client's next tile fetch hits the worst case of 3 requests. In practice, this doesn’t matter, because a large tileset cannot be rewritten that frequently: writing out the complete file takes longer than a minute.
Pros
- Generally the most space-efficient single-file tileset archive format
- Easy to distribute
- Can directly serve to some clients
Cons
- Impossible to minutely update
- Poor support for the archive format outside of specialized software and browser-based libraries
MBTiles
Like PMTiles, MBTiles is a single-file archive format. It was developed by Mapbox for users to generate tiles and upload them to Mapbox’s servers. Its format is a SQLite database with tables consisting of tile indexes and tile data as binary blobs. Because it’s based on SQLite and has been around for longer, support is widespread, with several tile generators able to write it. Browser-based support is limited, as the format wasn’t designed with that in mind.
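As a rough sketch of what this looks like, the following uses Python's built-in `sqlite3` module with the `metadata` and `tiles` table layout from the MBTiles specification as I understand it, including its TMS-style row numbering, which counts from the bottom. The helper names are hypothetical, and a real archive would also need its metadata rows filled in.

```python
import sqlite3


def open_mbtiles(path=":memory:"):
    """Open (or create) a SQLite database with MBTiles-style tables."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS metadata (name TEXT, value TEXT)")
    db.execute(
        "CREATE TABLE IF NOT EXISTS tiles ("
        "zoom_level INTEGER, tile_column INTEGER, "
        "tile_row INTEGER, tile_data BLOB)"
    )
    # The unique index lets INSERT OR REPLACE overwrite an existing tile.
    db.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS tile_index "
        "ON tiles (zoom_level, tile_column, tile_row)"
    )
    return db


def tms_row(z, y):
    """Convert a conventional z/x/y row to a TMS row (y axis flipped)."""
    return (1 << z) - 1 - y


def put_tile(db, z, x, y, data):
    db.execute(
        "INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?)",
        (z, x, tms_row(z, y), data),
    )


def get_tile(db, z, x, y):
    row = db.execute(
        "SELECT tile_data FROM tiles "
        "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
        (z, x, tms_row(z, y)),
    ).fetchone()
    return row[0] if row else None
```

Every read and write here goes through a single SQLite connection, which is the source of the concurrency concern for minutely updates.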
Minutely updates are theoretically possible, but in practice, not a good idea. SQLite databases do not work well with high volumes of concurrent reads and writes, generally requiring all work to go through one process. This requires coupling the generation and serving systems.
Pros
- Easy to distribute
- Good support for non-browser clients
Cons
- Poor minutely support
- Not suitable for directly serving to browsers
PostgreSQL
Because Tilekiln already requires PostgreSQL, it would be possible to store tiles in it, the same way that MBTiles does.
Pros
- Supports minutely updates
- Uses software already required
Cons
- Custom format
- Impossible to distribute the archive
Tiles on disk
Instead of an archive format, it’s possible to store tiles on disk as files. This is the simplest and most well-established method. Tiles can be updated atomically, and serving tiles is just serving files from disk. The downside comes with managing millions or billions of tiny files. File systems are not designed for this, and can have problems with
- minimum file sizes,
- inode usage,
- inodes per directory, and
- cleaning up tilesets.
In particular, it can take a day or longer to delete a tileset.
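One common way to get atomic tile updates on disk is to write the new tile to a temporary file in the same directory and then rename it over the old one. This is a minimal sketch of that pattern, not a description of how Tilekiln does it. The `z/x/y.mvt` layout and the function names are assumptions.

```python
import os
import tempfile


def tile_path(root, z, x, y, ext="mvt"):
    """Hypothetical layout: <root>/<z>/<x>/<y>.<ext>"""
    return os.path.join(root, str(z), str(x), f"{y}.{ext}")


def write_tile_atomically(root, z, x, y, data):
    """Write a tile so readers see either the old file or the new one."""
    path = tile_path(root, z, x, y)
    directory = os.path.dirname(path)
    os.makedirs(directory, exist_ok=True)
    # The temporary file must be on the same file system as the target,
    # so it is created in the target directory.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # os.replace swaps the new file into place in a single step.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return path
```

Note that `mkstemp` creates files readable only by their owner, so a real deployment would likely need to adjust permissions before serving them.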
Pros
- Supports minutely updates
- Simple serving
Cons
- Does not scale to planet-wide tilesets
- No archive to distribute
Object stores
A popular approach is to store tiles in some form of object store, like S3. All commercial object stores I’ve looked at perform badly with large numbers of small objects. While there are sometimes work-arounds for this, their pricing structure generally makes it very expensive to store tiles this way.
Pros
- Easy to serve out of
- Supports minutely updates
Cons
- Very expensive, or requires running your own object store
- Slow
Tapalcatl 2
Tapalcatl 2 is a system of using zip files to combine tiles, reducing the number of files that need to be stored. It is similar to how raster tiles are combined into metatiles, except that the vector tiles are pre-sliced within the zip file and can contain multiple zooms.
In a typical configuration, zip files are generated for tiles on zooms 0, 4, 8, and 12. Each zip file contains the “root” tile and then the tiles from the next three zooms that lie within it. Each zip archive therefore contains 85 tiles, all within a small area. Combining tiles this way reduces the number of files on disk to 16.8 million, a small enough number to be reasonably managed on disk.
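The numbers in this configuration can be checked with a little arithmetic, and the same idea gives the zip file any tile belongs to. The function name and the `step` parameter are illustrative and not part of Tapalcatl itself.

```python
def zip_root(z, x, y, step=4):
    """Return the (z, x, y) of the root tile whose zip would hold this tile."""
    root_z = z - (z % step)   # roots sit on zooms 0, 4, 8, 12, ...
    shift = z - root_z        # how many zooms below the root this tile is
    # Each zoom up halves the x and y coordinates.
    return root_z, x >> shift, y >> shift


# A root tile plus its descendants over the next three zooms.
tiles_per_zip = sum(4 ** depth for depth in range(4))
# One zip per tile on each of the root zooms.
zip_count = sum(4 ** z for z in (0, 4, 8, 12))

print(tiles_per_zip)                # 85
print(f"{zip_count:,}")             # 16,843,009
print(zip_root(14, 4823, 6160))     # (12, 1205, 1540)
```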
The format hasn’t had a great deal of usage since it was developed, so support is limited to some server-side programs that take tapalcatl archives and present tiles to the user. These server-side programs are known to have some issues, like not supporting updates to remote tapalcatl tilesets.
Updates are possible in two ways. The first is by taking an existing zip file, replacing the changed tiles within it, and generating a new zip file. The second is to completely regenerate all the tiles in the zip file, which is simpler, but involves more tile generation.
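The first approach could look something like the sketch below, which uses Python's standard `zipfile` module. Zip members cannot be replaced in place, so it copies the unchanged members into a new archive, adds the changed tiles, and swaps the new file over the old one. The member naming, the default (uncompressed) storage for new members, and the function name are assumptions, not Tapalcatl's actual behaviour.

```python
import os
import tempfile
import zipfile


def replace_tiles_in_zip(zip_path, changed):
    """Rewrite zip_path with the members in `changed` (name -> bytes) replaced."""
    directory = os.path.dirname(os.path.abspath(zip_path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".zip.tmp")
    os.close(fd)
    try:
        with zipfile.ZipFile(zip_path) as old, zipfile.ZipFile(tmp, "w") as new:
            # Copy every member that has not changed, keeping its metadata.
            for info in old.infolist():
                if info.filename not in changed:
                    new.writestr(info, old.read(info.filename))
            # Write the regenerated tiles.
            for name, data in changed.items():
                new.writestr(name, data)
        # Swap the new archive into place in a single step.
        os.replace(tmp, zip_path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

The second approach skips the copy loop entirely and writes all 85 tiles fresh, at the cost of regenerating tiles that did not change.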
Pros
- Supports minutely updates
- Allows good decoupling of serving and generation
Cons
- Limited client support
- Minutely updates are more complicated
Recommendations
The two options that require further investigation are PostgreSQL and Tapalcatl 2. Both support updates, but come with downsides.