Paul’s Blog

A blog without a good name

Writing a Shp to Osm Conversion Translation Dictionary

ogr2osm is an amazing tool for converting shapefiles to .osm files, but to make use of it you have to write a translation dictionary. The quality of your translation file determines the quality of your tagging and how useful your import is. To illustrate this, I am using the example cadLotsSHP from the City of Surrey. The dataset covers the city’s 92 617 lots. Although this data will not be imported into OSM as a whole, writing a conversion is still useful because selected lots, such as parks, can be imported.

The first step is to identify the fields in the shapefile. This can be done with a GIS tool like QGIS, or by running ogr2osm with no translation file and examining the resulting .osm file with JOSM. It’s important to give JOSM enough memory if the shapefile is large. In this case, we see 25 attributes are used.

AMANDARSN
BLOCK_NO
CRE_BY
CRE_DATE
DESCRIPTIO
GEODB_OID
GLOBALID
HOOK_CNTR
IN_PLAN
LOT_NO
LOT_TYPE
LOT_TYPE2
MOD_BY
MOD_DATE
MSLINK
NAME
OBJECTID
OWNER_TYPE
PID
PLANIMAGEI
PLAN_NO
PLAN_TYPE
PLAN_YR
REM_FLAG
SHAPE_AREA
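
The field list above can also be pulled out of the untranslated .osm file with a short script instead of JOSM. This is a minimal sketch, assuming ogr2osm copied every shapefile attribute through as a tag; `count_tag_keys` is a hypothetical helper name and the sample string is made up to stand in for the real file.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tag_keys(osm_xml):
    """Return a Counter of tag keys found anywhere in an .osm XML string."""
    root = ET.fromstring(osm_xml)
    # Every <tag k="..." v="..."/> element contributes one key, whether it
    # sits under a node, a way or a relation.
    return Counter(tag.get("k") for tag in root.iter("tag"))


# Tiny made-up sample standing in for the untranslated ogr2osm output.
SAMPLE = """<osm version="0.6">
  <way id="-1"><tag k="NAME" v="Brookside Park"/><tag k="PLAN_YR" v="1975"/></way>
  <way id="-2"><tag k="PLAN_YR" v="1980"/></way>
</osm>"""

key_counts = count_tag_keys(SAMPLE)
for key in sorted(key_counts):
    print(key, key_counts[key])
```

For a file as large as the full lot dataset, reading everything into memory at once may be too slow, and a streaming parser such as `ET.iterparse` would likely be a better fit.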

Some of these we can immediately identify as not useful for OpenStreetMap as they are internal tracking fields, such as CRE_BY, the user ID of the person who created the feature in the database, or SHAPE_AREA, the area of the shape. The goal here is to eliminate excess information to reduce the number of fields that you have to look at. After some work, we’re left with a dozen fields.

BLOCK_NO
DESCRIPTIO
HOOK_CNTR
IN_PLAN
LOT_NO
LOT_TYPE
LOT_TYPE2
NAME
OWNER_TYPE
PLAN_TYPE
PLAN_YR
REM_FLAG

Experience with the Surrey datasets tells me that LOT_TYPE is a numerical code that has a human-readable version in LOT_TYPE2, so LOT_TYPE can be eliminated.

The first field is BLOCK_NO. By selecting random lots and looking at how BLOCK_NO changes, we can see that it is constant within an area, but the values have no obvious meaning. We’ll put it aside, along with IN_PLAN, and see if the values make sense after we’re done.

To evaluate DESCRIPTIO, we select everything in JOSM, select DESCRIPTIO in the tag editor, then copy and paste into a text editor. This gives us the field name followed by the frequency of each value. We can see that there are 90575 blanks, and the remaining values fall into two classes. The first is descriptions of lots in rivers, and the second is “Park xxx-xx”. This might be useful for identifying the use of a lot if we have no other way.
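
The same frequency table can be produced outside JOSM. The following is a rough sketch, assuming each lot ended up as a way in the .osm file and that a lot with no value for the field should count as blank; `value_frequencies` is a hypothetical helper and the sample data is made up.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def value_frequencies(osm_xml, key):
    """Count how often each value of `key` occurs across ways in an .osm string.

    Ways that lack the key, or carry it with an empty value, are counted
    under the empty string so that blanks show up in the tally.
    """
    root = ET.fromstring(osm_xml)
    counts = Counter()
    for way in root.iter("way"):
        value = ""
        for tag in way.findall("tag"):
            if tag.get("k") == key:
                value = (tag.get("v") or "").strip()
                break
        counts[value] += 1
    return counts


# Made-up sample: one park description, one empty value, one way without the key.
SAMPLE = """<osm version="0.6">
  <way id="-1"><tag k="DESCRIPTIO" v="Park xxx-xx"/></way>
  <way id="-2"><tag k="DESCRIPTIO" v=""/></way>
  <way id="-3"><tag k="NAME" v="Brookside Park"/></way>
</osm>"""

frequencies = value_frequencies(SAMPLE, "DESCRIPTIO")
for value, count in frequencies.most_common():
    print(repr(value), count)
```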

The next field is HOOK_CNTR. The name doesn’t give anything away, but we can see that 92047 lots have it blank, and the rest have it set to a number from 0 to 8. Searching with JOSM’s search shows that these tend to be unoccupied lots or parks, but the correlation isn’t perfect. This field seems like it won’t be any use.

LOT_NO seems to go with BLOCK_NO, and is another field of no use.

The frequencies for LOT_TYPE2 are Bareland Strata=1417, City Road=42, Early Copy=131, FRPA Foreshore Tenure=51, Provincial Road=4, Standard Lot=89775, and Standard Strata=1197.

None of these categories seems to mean anything for OSM tagging.

NAME is a promising field, but an inspection of the values shows it’s hard to determine if it should map to name or ref. Some of the values are refs like “33H – Greenbelt” while others are names like “Brookside Park”.

OWNER_TYPE is another potentially useful field. The majority of the lots have OWNER_TYPE=PRIVATE.

PLAN_YR is the year that the lot was created. This maps to start_date.
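
As a small sketch of that mapping, the helper below only emits start_date when the field looks like a four-digit year. The four-digit format is an assumption about the data, not something checked against the shapefile, and `plan_year_to_start_date` is a hypothetical name.

```python
import re


def plan_year_to_start_date(plan_yr):
    """Return {'start_date': year} for a four-digit PLAN_YR, else an empty dict."""
    value = str(plan_yr or "").strip()
    # Assumption: valid years are exactly four digits; anything else is dropped.
    if re.fullmatch(r"\d{4}", value):
        return {"start_date": value}
    return {}


print(plan_year_to_start_date("1975"))
print(plan_year_to_start_date(""))
```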

REM_FLAG doesn’t seem to map to anything useful.

From our initial 25 fields, we’re now down to three:

NAME
OWNER_TYPE
PLAN_YR

With this, we can write a simple translation file that only looks at these three, to get a simpler .osm file.
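
Such a file might look like the sketch below. It assumes the function-style translation interface, in which ogr2osm calls a module-level `filterTags(attrs)` with a dictionary of the feature’s attributes and uses the returned dictionary as the tags; other ogr2osm versions may expect a different interface. The raw field names are kept as keys, since the goal at this stage is only a smaller file to inspect.

```python
# Fields worth keeping for inspection; everything else is dropped.
KEEP_FIELDS = ("NAME", "OWNER_TYPE", "PLAN_YR")


def filterTags(attrs):
    """Keep only the non-empty NAME, OWNER_TYPE and PLAN_YR attributes."""
    if not attrs:
        return {}
    tags = {}
    for field in KEEP_FIELDS:
        value = str(attrs.get(field) or "").strip()
        if value:
            tags[field] = value
    return tags


# Quick check with a made-up feature.
print(filterTags({"NAME": "Brookside Park", "CRE_BY": "someone", "PLAN_YR": " "}))
```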

We then have to go through each possible value of OWNER_TYPE and identify what it should map to in OSM tags.

The first of these, BC Gas, does not map to anything obvious. Some lots are gas substations while others are empty lots. We’ll just set the owner field and nothing else.

The next, BC Hydro, is tempting to map to a power key. Unfortunately, lots with this value include rights of way which have train tracks on them, so we just set the owner key.

The three rail values, BC Rail, BN Rail and CN Rail look like they map to landuse=railway and they do, just not very well. There isn’t a great correlation between where the rail yards end and where the property boundaries end, but we’ll set landuse=railway and be careful when using the data.

City land doesn’t tell us anything. These lots tend to be the more interesting ones and are open areas, but all sorts of different areas are covered by this value. A review on taginfo shows that the practice when setting the owner is to set it to the name of the city, in this case “City of Surrey”.

Federal land is an odd mix of rail bridges and other lots in the middle of the river. We’ll completely ignore it.

GVRD owned land tends to be open areas or areas associated with water infrastructure, so we can only map this to owner.

Harbour Board land is port associated industrial land, so we can add landuse=industrial here.

We finally reach the four types of parks: city dedicated, city purchased, provincial, and regional. To look at only these, we search for owner:park in JOSM, merge the results to a new layer, then hide the old layer. This process takes a while, so grab a drink while it merges.

There is no clear difference between city dedicated and purchased parks.

Provincial parks correspond to the Peace Arch Park and the wooded area beside Highway 1, across the city.

Regional corresponds to large nature-reserve-type parks, so we’ll map it as a nature reserve and all the other park types as leisure=park.

Provincial is another owner that we just map to owner and can’t derive any more information from.

Road -City and Road – Provincial are another two values that tell us nothing useful, so we completely ignore them.

School tells us that the land belongs to a school, which is actually useful information.

The last value, Transit, tells us nothing useful. We just set the owner to Translink.
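
The owner decisions above could be collected into a lookup like the following sketch. Several things here are assumptions rather than facts from the data: the exact spellings of the OWNER_TYPE values, the owner strings other than “City of Surrey” and “Translink”, the idea that park values contain the word “park”, and the use of leisure=nature_reserve for regional parks. School and private lots are left untagged because no tag was settled on above. `owner_tags` is a hypothetical helper name.

```python
# Values that only set owner=*. Keys are guessed spellings, lower-cased.
OWNER_ONLY = {
    "bc gas": "BC Gas",
    "bc hydro": "BC Hydro",
    "city": "City of Surrey",
    "gvrd": "GVRD",
    "provincial": "Provincial",  # placeholder owner string
    "transit": "Translink",
}

# Rail owners map to landuse=railway, to be used with care.
RAIL_OWNERS = {"bc rail", "bn rail", "cn rail"}


def owner_tags(owner_type):
    """Translate one OWNER_TYPE value into a dict of OSM tags."""
    # Normalise case and whitespace so small spelling differences still match.
    key = " ".join((owner_type or "").lower().split())
    if key in OWNER_ONLY:
        return {"owner": OWNER_ONLY[key]}
    if key in RAIL_OWNERS:
        return {"landuse": "railway"}
    if key == "harbour board":
        return {"landuse": "industrial"}
    if "park" in key:
        # Regional parks are the large nature-reserve-type ones.
        if "regional" in key:
            return {"leisure": "nature_reserve"}
        return {"leisure": "park"}
    # Federal, road, school, private and unknown values get no tags here.
    return {}


for sample in ("CN Rail", "City", "Park - Regional", "Federal"):
    print(sample, owner_tags(sample))
```

Keeping the table separate from the function makes it easier to show the full mapping to other mappers for review.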

After having finished owner, we now have to consider name vs. ref. It looks like whatever is a ref starts with a number, while names don’t.
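
That rule is simple enough to sketch directly. `name_or_ref` is a hypothetical helper, and the leading-digit test is only the heuristic described above, so odd values should still be checked by hand.

```python
def name_or_ref(name):
    """Map a NAME value to ref=* if it starts with a digit, else to name=*."""
    value = (name or "").strip()
    if not value:
        return {}
    if value[0].isdigit():
        return {"ref": value}
    return {"name": value}


print(name_or_ref("33H – Greenbelt"))
print(name_or_ref("Brookside Park"))
```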

At this point, I’ve been writing the translation file for about three to four hours, so I find it’s best to take a break and come back later to review the work with a fresh mind.

After reviewing the next day and finding no mistakes, I’m ready to convert the shapefile and start using it as a background layer with JOSM.

Writing this dictionary took about four hours of research and work. Admittedly, property lot data is complicated, but my experience with the Surrey datasets helped reduce the time required. Anyone contemplating an import needs to be prepared to dedicate significant time to writing a dictionary and working out the tagging. Even the simplest of shapefiles needs careful consideration.

If I were proposing to import this data set, I’d want to include example tags for each feature type in the data source in my import proposal so the community can inspect the tagging. This is a step many imports omit, resulting in questionable tagging.