Sanitizing GPX files for public sharing

GPX is a popular XML format for running or cycling tracks with geocoordinates. This is a how-to for cleaning up a GPX file by removing unwanted or privacy-sensitive information.

What’s in a GPX file?

Many apps that record workout routes and can export them as GPX files include more data than the plain GPS coordinates. For instance, a GPX file from my favorite recording app, Guru Maps, looks like this:

<?xml version="1.0" encoding="utf-8"?>
<gpx version="1.1" creator="Guru Maps/4.5.2" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns="http://www.topografix.com/GPX/1/1" 
  xmlns:gom="https://gurumaps.app/gpx/v2" 
  xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd https://gurumaps.app/gpx/v2 https://gurumaps.app/gpx/v2/schema.xsd">
  <trk>
    <name>Barnimer Dörferweg</name>
    <type>TrackStyle_FF7F00C8</type>
    <trkseg>
      <trkpt lat="52.6254614634" lon="13.4092010169">
        <ele>54.238586451</ele>
        <time>2020-05-10T05:30:38.997Z</time>
        <hdop>4.6875</hdop>
        <vdop>3.375</vdop>
        <extensions>
          <gom:speed>5.5661926282</gom:speed>
          <gom:course>329.1938658731</gom:course>
        </extensions>
      </trkpt><!-- thousands of track points -->

This track includes the following properties for each track point:

  • Geocoordinates (latitude and longitude)
  • Elevation
  • Timestamp
  • Horizontal and vertical dilution of precision (hdop/vdop)
  • Current speed
  • Current course/heading

Plus, Guru Maps repurposes the track’s <type> element to encode the track’s display color in the app, using a non-standardized format (TrackStyle_FF7F00C8).

Some apps also include heart rate or other fitness measurements.

Stripping out unwanted data

All this data is useful for archiving tracks or importing them into another app. But before sharing this track publicly, I’d want to clean the data up first:

  • The only truly important pieces of information are the coordinates and possibly the elevation.
  • Timestamps are private data. I don’t want to share those.
  • The other measurements are largely irrelevant.

GPX files can become pretty large (thousands of track points is common), so reducing the amount of data is also good for file sizes and parsing performance.

Requirements

  1. XMLStarlet

    I use XMLStarlet to do most of the XML processing. On macOS, you can install XMLStarlet via Homebrew:

    brew install xmlstarlet
    
  2. xmllint

    One optional processing step uses xmllint, which comes preinstalled on macOS.

  3. XSLT file for removing unused namespaces

    Finally, download this XSLT file remove-unused-namespaces.xslt, either from this Gist or from my server. We’re gonna use it in one processing step to strip unused namespaces from the GPX file.

    Original source: Dimitre Novatchev on Stack Overflow.
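Before running the full command, it can help to confirm that everything is in place. The sketch below uses a hypothetical helper, `check_requirements`, which only checks that both tools are on the `PATH` and that the XSLT file sits in the current directory (the command refers to it by relative path). It treats xmllint as required because the full command below uses it.

```shell
#!/usr/bin/env bash
# Sketch: check the requirements listed above before running the pipeline.
# check_requirements is a hypothetical helper, not part of any tool.
check_requirements() {
  local missing=0 tool
  for tool in xmlstarlet xmllint; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      missing=1
    fi
  done
  # The pipeline refers to the XSLT file by a relative path.
  if [ ! -f remove-unused-namespaces.xslt ]; then
    echo "missing file: remove-unused-namespaces.xslt (not in $PWD)" >&2
    missing=1
  fi
  return "$missing"
}

if check_requirements; then
  echo "All set."
fi
```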

Running the command

Assuming your source file is named input.gpx and the XSLT file you downloaded above is in the current directory, this is the full command to process the GPX file and save the result to output.gpx:

xmlstarlet ed \
  -d "//_:extensions" \
  -d "/_:gpx/_:metadata/_:time" \
  -d "/_:gpx/_:trk/_:type" \
  -d "//_:trkpt/_:time" \
  -d "//_:trkpt/_:hdop" \
  -d "//_:trkpt/_:vdop" \
  -d "//_:trkpt/_:pdop" \
  -u "/_:gpx/@creator" -v "Shell script" \
  input.gpx \
  | xmlstarlet tr remove-unused-namespaces.xslt - \
  | xmlstarlet ed -u "/_:gpx/@xsi:schemaLocation" -v "http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd" \
  | xmllint --c14n11 --pretty 2 - \
  > output.gpx

This sequence performs the following steps:

  • Delete all <extensions> elements.
  • Delete the timestamp from the file’s <metadata> section if present.
  • Delete the <trk><type> element.
  • Delete the <time>, <hdop>, <vdop>, and <pdop> elements from all track points.
  • Set the file’s creator attribute.
  • Now that extension fields are gone, remove all unused XML namespaces from the file header.
  • Delete all xsi:schemaLocation entries except the one for the GPX schema.
  • Run the file through xmllint for formatting. The --c14n11 option performs XML Canonicalization (C14N 1.1). Among many other things, canonicalization replaces numeric character references in the XML with the Unicode characters they stand for, which is important for my use case.

    For example, the text “D&#xF6;rferweg” in the source would become “Dörferweg”. I found that some of the tools I use write non-ASCII characters as numeric character references, and other tools don’t display those correctly.
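The canonicalization effect from the last step can be tried in isolation. This sketch feeds xmllint a made-up one-element document, `entity.xml`, containing the reference `&#xF6;` (the code point of “ö”); the canonical output should contain the literal character instead.

```shell
#!/usr/bin/env bash
# Sketch: show how xmllint --c14n11 replaces a numeric character reference
# with the literal character. entity.xml is a hypothetical test document.
printf '%s\n' '<name>Barnimer D&#xF6;rferweg</name>' > entity.xml

# Canonical XML is always UTF-8, so the reference is written out as "ö".
xmllint --c14n11 entity.xml
```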

The processed GPX file looks like this:

<gpx xmlns="http://www.topografix.com/GPX/1/1" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  creator="Shell script" version="1.1" 
  xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd">
  <trk>
    <name>Barnimer Dörferweg</name>
    <trkseg>
      <trkpt lat="52.6254614634" lon="13.4092010169">
        <ele>54.238586451</ele>
      </trkpt>
      <trkpt lat="52.6255090307" lon="13.4091548326">
        <ele>53.9600219977</ele>
      </trkpt>

The processing steps above are the ones that work for me given the apps I use. Your mileage may vary if your tools add other data to your GPX files; feel free to edit the command accordingly. XMLStarlet uses XPath syntax to select which elements to operate on, and the xmlstarlet sel command is useful for inspecting a source file and trying out the required XPath incantations.

Validation

Finally, it’s a good idea to validate the processed GPX file against the official GPX schema:

xmlstarlet val --quiet --err --xsd \
  http://www.topografix.com/GPX/1/1/gpx.xsd \
  output.gpx

Happy processing!


PS: If you’re ever in Berlin, this is a nice long bike route (55 km) with minimal car traffic. Starts and ends at Hauptbahnhof. Download the (sanitized) GPX file.