Paul’s Blog

A blog without a good name

A Simpler Shapefile Conversion

After my last marathon shapefile conversion, I thought a simpler example of how to write a translation might be useful. For this I’m going to use the Surrey parkNaturalAreasSHP shapefile, which contains data on forested areas in Surrey parks.

My first step was establishing which shapefile fields might be useful. To find out, I ran ogr2osm on the shapefile with no translation. I got back 12 fields: AGE_CLASS, AREA_TYPE, CRE_BYE, CRE_DATE, EQ_NO, GEODB_OID, GLOBALID, MOD_BY, MOD_DATE, OBJECTID, PARK_NAME and SHAPE_AREA. Of these, the potentially interesting ones are AGE_CLASS, AREA_TYPE and PARK_NAME.

The first of these, AGE_CLASS, has four possible values, which JOSM reports as {=104, Mature=1100, N/A=6, Young=214}; the first entry is an empty value. This appears to be the age of the stand of trees. Unfortunately there’s no corresponding tagging for this information, but there might be some day. My preferred method for this type of information is to add it in the surrey: namespace, in this case as surrey:wood_age=.
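As a rough sketch of how AGE_CLASS could be handled in a translation, the helper below turns the field into a surrey:wood_age tag. The function name wood_age_tags is hypothetical, and so are two choices the text above does not settle: lowercasing the value, and emitting no tag at all for the empty and N/A values.

```python
def wood_age_tags(attrs):
    """Return a surrey:wood_age tag for one feature, or no tags if the age is unknown.

    attrs is assumed to be a dict of shapefile field names to string values.
    """
    # Treat a missing field the same as an empty one.
    age = (attrs.get('AGE_CLASS') or '').strip()

    # Assumption: empty and N/A carry no information, so no tag is emitted.
    if age in ('', 'N/A'):
        return {}

    # Assumption: values are lowercased (Mature -> mature, Young -> young).
    return {'surrey:wood_age': age.lower()}
```

Returning an empty dict for unknown ages keeps the namespaced tag off features where it would say nothing.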

The second, AREA_TYPE, has five possible values, {Forest-coniferous=161, Forest-deciduous=542, Forest-mixed=610, Grassland=47, Shrubland=64}.

There is debate over how to tag forests. Without going into excessive detail, I am of the school of thought that forested areas in parks should be tagged as natural=wood. This is consistent with what I’ve seen most other local mappers do, and is Approach 2 on the wiki. After checking the wiki for how to tag coniferous vs. deciduous forests, I mapped the three forest types to natural=wood plus wood=coniferous, wood=deciduous or wood=mixed.

I now have to consider how to tag AREA_TYPE=Grassland. Two obvious options are landuse=grass and landuse=meadow. After looking at 2010 on-leaf and 2008 off-leaf imagery, I can conclude that the areas are unmanaged and contain a mix of plants, so they are best described as a meadow and I use landuse=meadow. It helps to be very familiar with your imagery to make a determination like this without a site survey.

The last value is AREA_TYPE=Shrubland. Here it could be natural=heath or natural=scrub. Of these, I lean towards natural=scrub based on the imagery. There’s no clear dividing line between the two, but I see enough bush coverage that I decide scrub is more appropriate.
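The AREA_TYPE decisions above can be written as a small lookup table. This is a minimal sketch, not the actual translation file: it assumes the tag-filtering hook is a module-level function called filterTags that receives a dict of field values and returns a dict of OSM tags, which is how ogr2osm translations were commonly written at the time. Check the hook name and signature against your version of ogr2osm. The table name AREA_TYPE_TAGS is hypothetical.

```python
# Lookup table from AREA_TYPE values to OSM tags, following the choices
# discussed above. The name AREA_TYPE_TAGS is illustrative.
AREA_TYPE_TAGS = {
    'Forest-coniferous': {'natural': 'wood', 'wood': 'coniferous'},
    'Forest-deciduous': {'natural': 'wood', 'wood': 'deciduous'},
    'Forest-mixed': {'natural': 'wood', 'wood': 'mixed'},
    'Grassland': {'landuse': 'meadow'},
    'Shrubland': {'natural': 'scrub'},
}


def filterTags(attrs):
    """Translate one feature's shapefile fields into OSM tags.

    Assumption: attrs is a dict of field names to string values.
    """
    if not attrs:
        return {}

    area_type = (attrs.get('AREA_TYPE') or '').strip()

    # Build the result from the table only, so every other shapefile
    # field is left out. Unknown values produce no tags.
    return dict(AREA_TYPE_TAGS.get(area_type, {}))
```

Because the result is built up from an empty set of tags rather than by deleting fields from attrs, any field not handled explicitly is dropped by default.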

The last of the fields is PARK_NAME. This could be used to determine the name of a park if I had no other way, but I have received permission to copy from Surrey’s online maps which include park names. In addition, the lot shapefile I converted previously has park names.

An important theme in both this translation and the property lot translation is to strip off excess fields that are not relevant to OSM. Too many imports want to import all the fields, leading to unwieldy tagging schemes that are intimidating to mappers.