Transport data models comparative review

published on 20 April 2020 by Jungle Bus

This study is a comparative review of data models used to store public transport information. It compares the models from the following sources: GTFS, OpenStreetMap, Wikidata, topo.transport.data.gouv.fr, and NeTEx with Transmodel.

Although not limited to transportation, the two collaborative projects, OpenStreetMap and Wikidata, both allow the description of public transport objects.

General data format description

GTFS format

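A GTFS feed is composed of several text files collected in a .zip file. The main files used in this review, and the notions they describe, are agency.txt (agencies), routes.txt (routes), trips.txt (trips), stop_times.txt (the stops served by each trip), stops.txt (stops) and shapes.txt (the geographical path of the trips).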

Each text file is a CSV file whose columns describe the content and attributes of the objects.
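As an illustration, here is a minimal sketch of how one file of a feed could be read with the Python standard library. The function names and the tiny in-memory feed are made up for the example. A real feed would simply be opened from its .zip path.

```python
import csv
import io
import zipfile


def read_gtfs_table(feed, filename):
    """Return the rows of one CSV file of a GTFS feed as a list of dicts."""
    with zipfile.ZipFile(feed) as archive:
        with archive.open(filename) as raw:
            # utf-8-sig drops the byte order mark that some feeds include.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


def build_sample_feed():
    """Build a tiny in-memory feed (made-up data) so the sketch is runnable."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "routes.txt",
            "route_id,agency_id,route_short_name,route_type\n"
            "R12,A1,12,1\n",
        )
    buffer.seek(0)
    return buffer


routes = read_gtfs_table(build_sample_feed(), "routes.txt")
print(routes[0]["route_short_name"])
```

Every value is read as text, so numeric columns such as route_type have to be converted explicitly when needed.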

The full description of the model can be found in the GTFS reference.

OpenStreetMap

OpenStreetMap is an open-licensed collaborative geographical database with three types of elements: nodes, ways and relations.

Each element has a unique id and a set of tags (key = value) to describe its attributes.
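To make this concrete, an element can be pictured as a small record holding its type, its id and a dictionary of tags. The class below is only a simplified sketch, not an OpenStreetMap API. The id and the name of the sample stop are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Simplified OpenStreetMap element: a type, an id and free-form tags."""
    osm_type: str  # "node", "way" or "relation"
    osm_id: int
    tags: dict = field(default_factory=dict)


# Hypothetical bus stop: the id and the name are placeholders.
stop = Element("node", 1, {"highway": "bus_stop", "name": "Example Stop"})
print(stop.tags.get("highway"))
```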

Several public transport data models coexist in the OpenStreetMap database. We will mostly consider the one approved by the community in 2011 (sometimes called ptv2, which stands for “Public Transport version 2”), but we will add special notes to point out other frequent models that can also be found in the database.

The full description of the ptv2 model can be found in the OpenStreetMap wiki.

Wikidata

Wikidata is an open-licensed collaborative structured database that supports Wikipedia and other projects of the Wikimedia Foundation.

The Wikidata database is made of elements called items uniquely identified by an identifier (a Q followed by a number). Each one can have a label, a description and any number of aliases. Their attributes are called statements and consist of a property (uniquely identified by a P followed by a number) and a value.
A value can be a link to another item, or a literal value (such as a text label, a number or coordinates). A property can also link to an external dataset, in which case it is called an identifier.

Wikidata contains little information about transport, mainly about train, subway and tramway modes. Although it is a structured database, the model is not very standardized and there are many local divergences in how the information is modelled. Moreover, unlike the previous databases, original content such as images (logos, pictograms, line maps, etc.) or photos can be found thanks to the link with Wikimedia Commons.

topo.transport.data.gouv.fr

topo.transport.data.gouv.fr is also a structured database and relies on the same software as Wikidata. It already contains all the French GTFS datasets available as open data.

NeTEx format and Transmodel

NeTEx (Network Timetable Exchange) is a European standard for exchanging public transport data. It relies on Transmodel, another European standard, which is a conceptual data model covering most of the data domain of public transport.
In practice, we rarely use all the concepts covered by the standard: the stakeholders who are going to exchange data with each other agree on a subset of the schema useful to their needs and in accordance with their local context (this is called a profile).
Data in NeTEx format is encoded as XML documents.

The basic concepts of Transmodel are POINTs (the described entities are represented in capital letters in most normative documents) and LINKs (an oriented spatial object of dimension 1 with view to the overall description of a network, describing a connection between two POINTs). An ordered set of POINTs or LINKs is called a LINK SEQUENCE. These are the generic building blocks of the Public Transport network model.

To account for the fact that data can come from different sources that use the same word to represent different things, the Transmodel data model uses distinct elements to represent the precise semantics of each separate concept in LAYERs, which can be mapped to one another using PROJECTIONs.

The full description of the model can be found in the Transmodel reference.

Geographical description of the route

Most public transport systems run along fixed routes with set embarkation and disembarkation stops.

Let’s take Paris Métro Line 12 as an example.

This line is part of the Paris Métro public transport network. It is operated by RATP, Paris’s public transport company. It links “Mairie d’Issy” station in the south to “Front Populaire” station in the north and is about 15 km long.

GTFS

In the GTFS of the Paris region, you will find it as a line in the routes.txt file.
Its attributes are:

In the agency.txt file, this agency has the following attributes:

If you want to get the tracks of the line (for instance to draw it on a map), you will need to use some other GTFS files.
The trips.txt file contains the trips of each route, with a route_id column to map it to its route.
In the GTFS file studied here, we have 2024 trips for the Paris Métro Line 12. Among other attributes, a trip has:

The sequence of the stops can be found using the stop_times.txt file, which references both the trips.txt and stops.txt files through their identifiers.
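The lookup of a trip's stop sequence can be sketched as follows. The column names (trip_id, stop_id, stop_sequence, stop_name) come from the GTFS reference, while the rows are made up and far smaller than a real feed.

```python
import csv
import io

# Made-up extracts of stop_times.txt and stops.txt.
STOP_TIMES = """trip_id,stop_id,stop_sequence
T1,S2,2
T1,S1,1
T1,S3,3
T2,S3,1
"""
STOPS = """stop_id,stop_name
S1,First Stop
S2,Second Stop
S3,Third Stop
"""


def stop_names_for_trip(trip_id):
    """Return the stop names of a trip, ordered by stop_sequence."""
    names = {row["stop_id"]: row["stop_name"]
             for row in csv.DictReader(io.StringIO(STOPS))}
    rows = [row for row in csv.DictReader(io.StringIO(STOP_TIMES))
            if row["trip_id"] == trip_id]
    # stop_sequence is stored as text, so compare it as an integer.
    rows.sort(key=lambda row: int(row["stop_sequence"]))
    return [names[row["stop_id"]] for row in rows]


print(stop_names_for_trip("T1"))
```

Sorting on stop_sequence matters because nothing guarantees that the rows of stop_times.txt are stored in travel order.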

In the shapes.txt file, you can find the sequence of geographical points through which the vehicle passes in order for each trip.
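Getting the track of a trip then amounts to following its shape_id into shapes.txt and ordering the points. The sketch below assumes the trip has a shape_id (the column is optional in GTFS) and uses made-up coordinates.

```python
import csv
import io

# Made-up extracts of trips.txt and shapes.txt.
TRIPS = """route_id,trip_id,shape_id
R12,T1,SH1
"""
SHAPES = """shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence
SH1,48.83,2.27,10
SH1,48.82,2.27,5
SH2,48.90,2.36,1
SH1,48.84,2.28,20
"""


def shape_points_for_trip(trip_id):
    """Return the (lat, lon) points of the shape of a trip, in travel order."""
    trips = {row["trip_id"]: row for row in csv.DictReader(io.StringIO(TRIPS))}
    shape_id = trips[trip_id]["shape_id"]
    points = [row for row in csv.DictReader(io.StringIO(SHAPES))
              if row["shape_id"] == shape_id]
    # Sequence numbers only need to increase; they are not always consecutive.
    points.sort(key=lambda row: int(row["shape_pt_sequence"]))
    return [(float(row["shape_pt_lat"]), float(row["shape_pt_lon"]))
            for row in points]


print(shape_points_for_trip("T1"))
```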

OpenStreetMap

In OpenStreetMap, a public transport line is a relation of type “route_master”.
Paris Métro Line 12 is the relation with OpenStreetMap id 7420642.
It has the following tags:

A route_master relation contains relations with type=route as members.
A route relation represents a path taken repeatedly by vehicles. Its members are the stops (where passengers can embark) and the roads taken by the vehicles. Route relations are sometimes called route variants or itineraries.
The closest concept in GTFS is the trip, but unlike the trip, which has a temporal dimension, the route relation has only a geographical meaning. The Paris Métro Line 12 route_master relation has two child route relations, one for each direction.
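The hierarchy between a route_master and its route relations can be pictured with the sketch below. Only the route_master id comes from the text above. The two route ids and all tag values are placeholders, and the dictionary is a simplified stand-in for data that would normally be fetched from the OpenStreetMap database.

```python
# Illustrative relations: ids 101 and 102 and the tag values are placeholders.
RELATIONS = {
    7420642: {
        "tags": {"type": "route_master", "route_master": "subway", "ref": "12"},
        "members": [("relation", 101), ("relation", 102)],
    },
    101: {"tags": {"type": "route", "route": "subway", "ref": "12"}, "members": []},
    102: {"tags": {"type": "route", "route": "subway", "ref": "12"}, "members": []},
}


def child_routes(route_master_id):
    """Return the ids of the type=route relations that belong to a route_master."""
    master = RELATIONS[route_master_id]
    return [
        member_id
        for member_type, member_id in master["members"]
        if member_type == "relation"
        and RELATIONS[member_id]["tags"].get("type") == "route"
    ]


print(child_routes(7420642))
```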

The route relation inherits many attributes from the route_master relation: it has a route number (in the ref tag), a public transport mode (in the route tag), a network name (in the network tag), an operator (in the operator tag), a route color (in the colour tag), etc. It also has a few additional tags:

The route relation has two kinds of members: stops and “roads”. Inside the route relation, you will find:

The modeling of the data in OpenStreetMap is far from homogeneous and many variations from this model exist:

Wikidata

Wikidata is a structured database that consists mainly of objects linked together by relationships. There are few transport lines in Wikidata and there are many modelling discrepancies.

If we take our example from Paris Métro Line 12:

Finally, many other attributes are available, concerning its color (P465) and its length (P2043), but also its history (date of official opening (P1619), etc.) or media from the Wikimedia Commons project (route map (P15), logo image (P154), etc.).

topo.transport.data.gouv.fr

topo.transport.data.gouv.fr is also a structured database that consists of objects linked together by relationships. The data model is very close to the GTFS one.

Our Paris Métro Line 12:

NeTEx and Transmodel

The main element in the representation of a network in NeTEx and Transmodel is the ROUTE:
an ordered list of located ROUTE POINTs defining one single logical path.

The ROUTE entity represents an abstract concept and is independent of both the infrastructure and the operational concerns.

A group of ROUTEs which is generally known to the public by a similar name or number is called a LINE.
LINEs may be grouped into GROUPs OF LINES that can be used to define a public transport network.

The spatial description of the services is then described using the JOURNEY PATTERN entity: an ordered list of STOP POINTs and TIMING POINTs on a single ROUTE.
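The relationships between these entities can be summarised with a few plain Python classes. This is only a reading aid under strong simplifications: the class and field names are ours, not the NeTEx XML schema, and the identifiers are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    route_id: str
    route_points: list  # ordered ROUTE POINT identifiers


@dataclass
class JourneyPattern:
    pattern_id: str
    route: Route        # the single ROUTE the pattern runs on
    stop_points: list   # ordered STOP POINT identifiers on that ROUTE


@dataclass
class Line:
    name: str
    routes: list = field(default_factory=list)  # ROUTEs known under the same name


outbound = Route("route-out", ["rp1", "rp2", "rp3"])
inbound = Route("route-in", ["rp3", "rp2", "rp1"])
line = Line("12", [outbound, inbound])
pattern = JourneyPattern("jp-out", outbound, ["stop-a", "stop-b"])
print(pattern.route.route_id)
```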

NeTEx also includes some GIS features (ROAD POINT, ROAD ELEMENT, ROAD RAIL ELEMENT, etc) that can be linked to public transport objects. PROJECTIONs can also be used to link entities with GIS concepts (for instance, a LINK SEQUENCE PROJECTION can be used to set the geometry of the path between two points that are part of a ROUTE).

Figure: an example representation of a line in NeTEx

Comparative review

The main modelling difference concerns the agency, which is a single notion in GTFS (and in topo.transport.data.gouv.fr, whose model is very similar). In the other models, the concept is split between transport network and transport operator.
However, the treatment of these notions is not uniform: they are notions in their own right in GTFS, Transmodel and Wikidata, while they are only attributes of the transport lines and stops in OpenStreetMap.

The notion of terminus is present in both OpenStreetMap and Wikidata. It is close to the GTFS notion of trip_headsign, which is attached to the trip and is therefore less visible at the level of the GTFS route (and non-existent in the topo.transport.data.gouv.fr route model).
Both GTFS and OpenStreetMap use plain text for the terminus, while Wikidata goes further and uses a proper stop entity with an optional text.

The notion of route shape also differs widely: it is non-existent in Wikidata and topo.transport.data.gouv.fr, made up of exact crossing points connected to a road or rail network in OpenStreetMap, and entirely decoupled from any network in GTFS, where it is a plain sequence of geographical points. Transmodel allows both types of modelling.

Finally, there is a great diversity of attributes, reflecting the specifics of each database. Line numbers and colors are found in all of them, but Wikidata has a wider variety of attributes (historical data, logos, photos and images of signs or vehicles, passenger numbers, etc.). In addition, both collaborative databases are open-ended: new attributes can be proposed on their objects as context and needs evolve.

Geographical description of the stop or station

Wikidata

Wikidata mostly contains railway stations.
A typical railway station:

You can also build the ordered list of the stops within a line using the adjacent station (P197) property. It takes as qualifiers a connecting line (P81) and a terminus station, given with the towards (P5051) property.
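The traversal could look like the sketch below. It assumes the adjacency statements have already been extracted into a dictionary, and that each statement carries both qualifiers, which is not guaranteed in practice given the modelling discrepancies mentioned earlier. Station and line names are made up.

```python
# Made-up statements: station -> list of
# (adjacent station, connecting line qualifier, towards qualifier).
ADJACENT = {
    "A": [("B", "Line 1", "D")],
    "B": [("A", "Line 1", "A"), ("C", "Line 1", "D"), ("X", "Line 2", "Y")],
    "C": [("B", "Line 1", "A"), ("D", "Line 1", "D")],
    "D": [("C", "Line 1", "A")],
}


def ordered_stops(start, terminus, line):
    """Walk from start to terminus by following the adjacency statements."""
    stops = [start]
    current = start
    while current != terminus:
        candidates = [
            adjacent
            for adjacent, connecting_line, towards in ADJACENT[current]
            if connecting_line == line and towards == terminus
        ]
        if len(candidates) != 1:
            raise ValueError(f"cannot pick the stop after {current}")
        current = candidates[0]
        if current in stops:
            raise ValueError(f"loop detected at {current}")
        stops.append(current)
    return stops


print(ordered_stops("A", "D", "Line 1"))
```

The two error cases cover missing or ambiguous statements, which are likely on lines with branches.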

OpenStreetMap

In OpenStreetMap, the stop position is the place where the vehicle usually stops on the rails or on the street. It is an OpenStreetMap node.
The platform is the place where passengers wait for the vehicles. It is an OpenStreetMap node or way.

The stop area is the logical combination of all elements of a stop, with one unique name. It can also contain other elements such as taxi stops or station buildings. It is an OpenStreetMap relation.

In fact, several data models coexist in OpenStreetMap for describing stops. The public transport model (also called ptv2) described above was approved by the community in 2011, but its adoption is still far from complete several years later.

Historically, OpenStreetMap had dedicated tags for stops in each mode; some denoted platforms and others stop positions. As the public transport model was intended to clarify the “old” one rather than replace it, these historical tags are still commonly found, sometimes alongside the new ones, and it is frequent to have only platforms or only stop positions.
Modeling is highly variable across locations and communities, but the following is a summary of what these tags represent:



mode            historical tag           platform    stop_position
bus             highway=bus_stop         frequent    sometimes
metro / train   railway=station          usual       sometimes
tram            railway=tram_stop        usual       frequent
bus boat        amenity=ferry_terminal   rare        frequent

The stop area is probably the most contested concept. It sometimes groups together logical elements representing the same object (the platform and the stop position of the same stop), and sometimes several stops.

Stop areas can sometimes be nested to represent a multimodal hub, although another attribute is sometimes used for this purpose (public_transport=stop_area_group).

It should be noted that, as opposed to the stop area, which is a logical grouping of stops, there is also physical grouping, represented by a polygon (closed way) drawing an area and encompassing objects: this is used for the outline of subway and train stations, or for bus stations.

GTFS

In the GTFS model, the stops are gathered in the stops.txt file. The kind of stop is represented by the location_type attribute.

Stops can be linked hierarchically using the parent_station attribute, which holds another stop_id: entrances and platforms are linked to their parent station, while boarding areas are linked to a platform.
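Resolving the station that any stop belongs to is then a matter of following parent_station until it is empty, as in this sketch. The location_type values follow the GTFS reference (0 for a stop or platform, 1 for a station, 2 for an entrance or exit, 4 for a boarding area). The rows themselves are made up.

```python
import csv
import io

# Made-up extract of stops.txt.
STOPS = """stop_id,stop_name,location_type,parent_station
ST,Example Station,1,
P1,Platform 1,0,ST
E1,Main Entrance,2,ST
B1,Boarding Area,4,P1
"""


def station_of(stop_id):
    """Follow parent_station links up to the top of the hierarchy."""
    stops = {row["stop_id"]: row for row in csv.DictReader(io.StringIO(STOPS))}
    current = stops[stop_id]
    while current["parent_station"]:
        current = stops[current["parent_station"]]
    return current["stop_id"]


print(station_of("B1"))
```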

topo.transport.data.gouv.fr

For the stops too, topo.transport.data.gouv.fr’s model is very close to the GTFS one:

In addition, the entities that are stop points (instance of Stop Point, Q12) are also linked to their route entities (with the part of property, P15). This is a major difference from GTFS, where linking a stop to a route requires merging several files.
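For comparison, here is a sketch of the merges needed on the GTFS side to find the routes serving a stop: stop_times.txt gives the trips, trips.txt gives the routes, and routes.txt gives their names. The sample rows are made up.

```python
import csv
import io

# Made-up extracts of stop_times.txt, trips.txt and routes.txt.
STOP_TIMES = """trip_id,stop_id,stop_sequence
T1,S1,1
T1,S2,2
T2,S2,1
T3,S3,1
"""
TRIPS = """route_id,trip_id
R12,T1
R4,T2
R4,T3
"""
ROUTES = """route_id,route_short_name
R12,12
R4,4
"""


def rows(text):
    return list(csv.DictReader(io.StringIO(text)))


def routes_serving(stop_id):
    """Return the short names of the routes with at least one trip at a stop."""
    trip_ids = {r["trip_id"] for r in rows(STOP_TIMES) if r["stop_id"] == stop_id}
    route_ids = {r["route_id"] for r in rows(TRIPS) if r["trip_id"] in trip_ids}
    return sorted(r["route_short_name"] for r in rows(ROUTES)
                  if r["route_id"] in route_ids)


print(routes_serving("S2"))
```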

NeTEx and Transmodel

A STOP PLACE represents a station or a physical stop.

Its main component is the QUAY, which is the physical point of access to transport.

It can also contain ENTRANCEs, which are points where a passenger can access a stop place on foot.

It can also contain ACCESS SPACEs, which can be nested to model complex stations. QUAYs and ACCESS SPACEs can be connected to each other using PATH LINKs, and ENTRANCEs can model both external entrances (for instance a subway entrance) and internal ones (for instance from an entrance concourse to a platform).

A STOP ASSIGNMENT can be used to connect a STOP PLACE to a SCHEDULED STOP POINT (the representation of the stop in a timetable).
There is also a concept of STOP AREA, but it represents a group of SCHEDULED STOP POINTs close to each other.

Figure: an example representation of a stop

A pair of bus stops on a bus route can be represented simply with a STOP PLACE and two QUAYs.
A metro station will also use a STOP PLACE and some QUAYs, but also ENTRANCEs and ACCESS SPACES.

Comparative review

In the modeling of stops, we find the same terminologies (platform, stop, station, stop area, etc) but with different definitions.

There is a differentiation between physical stop and logical stop:

Most of them have the notion of a logical stop area, grouping stops.

Transmodel, OpenStreetMap and GTFS also provide means to represent, physically or logically (in some graphical form), the interior of stations, with spaces and links to connect entrances and platforms.

Transmodel and GTFS also propose a notion of boarding point on a platform, which is not found in OpenStreetMap.

Although most of the models studied attempt to propose a unified model, there is a certain difference in modelling between modes, which is physically justified: it is difficult to represent a large metro station and a bus stop identically.

The link between the line and the stops is also modeled in a differentiated way:

Credits

This comparative review was carried out by Jungle Bus for MobilityData as part of the Mobility Database project.


This document is made available under the conditions of the CC-BY-SA license.
