Paul’s Blog

A blog without a good name

Self-hosted Vector Tiles and Tangram

I’ve been experimenting with generating my own vector tiles and client-side rendering with Tangram in order to figure out how to best write a style in its language.

Tangram is a GL-based renderer written by Mapzen and normally used with their Tilezen vector tiles, but I’m interested in being able to make my own vector tiles with different cartographic choices.

Improving Rendering Speed With Reclustering

Part of normal database maintenance is to rebuild indexes and recluster tables, but this often gets ignored on rendering servers.

Antidotal reports report a speed increase of 25%–50% from reclustering, and this can be done without shutting down the rendering server. The overall plan is to create a new copy of the tables, build new indexes, then replace the old tables.

OpenStreetMap Active Users

Periodically people make the claim of over 2 million active users for OpenStreetMap, but what this mean? This is the total number of accounts, including those who never edited, those who left long ago, spammers, and actual active contributors.

The closest metric to a standard is active users over the last 30 days. Although we can’t get that number, we can look at the changeset dump and analyze it with ChangesetMD and some SQL.

New Drive Testing

I had to buy a new hard drive for my array recently, which meant verifying that it works before I put it into service.

I don’t do burn-in tests of drives. Drives have a bathtub curve for reliability, like most components, but I find that if a drive is failing, it will start exhibiting performance problems that a thorough testing reveals.

Running a sufficiently long burn-in is increasingly impractical. A burn-in would probably involve writing and reading everywhere on the disk multiple times, and disks have been getting bigger and bigger. Denser platters help with sequential speeds, but I’d estimate it would take several days to burn in a new drive.

OpenStreetMap Carto Complexity

I often refer to OpenStreetMap Carto as the largest most complex open multi-contributor map style, but what does that mean?

Broken down, it means

  • It’s the largest open stylesheet. If you measure in code size, features rendered, or complexity, nothing else is close;

  • It’s the largest multi-contributor map style that doesn’t have a company dictating what is worked on. This means we get merge conflicts. They got so bad we changed the technology we use to define layers to make them solvable; and

  • It’s the largest style using OpenStreetMap data. Some proprietary styles like OpenCycleMap, MapQuest Open, and Mapbox Streets are complex, but none of them render the range of features we do.

More OpenStreetMap Futures

Andy recently blogged the developer numbers from his OpenStreetMap Futures talk at SOTM US.

Wanting to play with the numbers myself, I took the osm100 code and added in additional projects. The original list of repos came from a list of “Core Software” from the Engineering Working Group, and since then some of the software has been replaced, and there’s other older software which used to be core, but isn’t.

Optimizing Osm2pgsql CXXFLAGS

When testing an osm2pgsql bug report I did some testing of osm2pgsql node parsing speed and various CXXFLAGS. CXXFLAGS is a variable that can tell the compiler to apply various optimizations when compiling the code, potentially resulting in speed increases.

The two flags I tested were -O and -march. There are countless others, and many that are not optimization related, but these two are all that need to be adjusted. More flags are not generally useful.

Running the Redaction Bot

The redaction bot is a piece of software designed to remove legally problematic data from OpenStreetMap, generally data that violates copyright.

It’s not often needed, as OpenStreetMap users are careful about copyright, but it is available if needed. Requiring special knowledge and API permissions, a grand total of one person runs it regularly – me. This post might not be the most interesting for others…

Real Performance Numbers for a Style

The tile list used for performance testing a stylesheet is critical. It needs to represent a realistic mix of zooms and tile complexity, and be large enough to have reasonable caching behavior. The best source for a tile list is logs from a real rendering server.