published on 20 April 2020 by Jungle Bus
This study is a comparative review of data models used to store public transport information. It compares the models from the following sources:
Although not limited to transportation, both collaborative projects allow the description of public transport objects.
A GTFS feed is composed of several text files collected in a ZIP file. Here are the main files and the associated notions:
Each text file is a CSV file whose columns describe the content and attributes of the objects.
The full description of the model can be found in the GTFS reference.
OpenStreetMap is an open-licensed collaborative geographical database and has three types of element:
node: a single point in space. A stop can be a node.
way: an ordered list of nodes that defines a linear feature or a polygon area. A train platform may be a way.
relation: an ordered list of elements. These elements can be nodes, ways or relations, are called members and can have an optional role. A bus route can be a relation.
Each element has a unique id and a set of tags (key/value pairs) to describe its attributes.
Several public transport data models co-exist in the OpenStreetMap database. We will mostly consider the one approved by the community in 2011 (sometimes called ptv2, which stands for “Public Transport version 2”), but we will add special notes to point out other frequent models that can also be found in the database.
The full description of the ptv2 model can be found in OpenStreetMap wiki.
Wikidata is an open-licensed collaborative structured database that supports Wikipedia and other projects of the Wikimedia Foundation.
The Wikidata database is made of elements called items, each uniquely identified by an identifier (a Q followed by a number). Each one can have a label, a description and any number of aliases. Their attributes are called statements and consist of a property (uniquely identified by a P followed by a number) and a value.
A value can be a link to another item, or a proper value (such as a text label, a number or coordinates). A property can also link to an external dataset (it is then called an external identifier).
Wikidata contains little information about transport, mainly about train, subway and tramway modes. Although it is a structured database, the model is not very standardized and there are many local divergences in the modelling of the information. Moreover, unlike the previous databases, original content such as images (logos, pictograms, line maps, etc.) or photos can be found thanks to the link with Wikimedia Commons.
topo.transport.data.gouv.fr is also a structured database and relies on the same software as Wikidata. It already contains all the French GTFS datasets available in open data.
NeTEx (Network Timetable Exchange) is a European standard for exchanging public transport data. It relies on Transmodel, another European standard that is a conceptual data model covering most of the data domain of public transport.
In practice, we rarely use all the concepts covered by the standard: the stakeholders who are going to exchange data with each other agree on a subset of the schema useful to their needs and in accordance with their local context (this is called a profile).
Data in NeTEx format is encoded as XML documents.
The basic concepts of Transmodel are POINTs (the described entities are represented in capital letters in most normative documents) and LINKs (an oriented spatial object of dimension 1 with view to the overall description of a network, describing a connection between two POINTs). An ordered set of POINTs or LINKs is called a LINK SEQUENCE. These are the generic building blocks of the Public Transport network model.
To account for the fact that data can come from different sources that use the same word to represent different things, the Transmodel data model uses distinct elements to represent the precise semantics of each separate concept in LAYERs, that can be mapped using PROJECTIONs.
The full description of the model can be found in the Transmodel reference.
Most public transport systems run along fixed routes with set embarkation and disembarkation stops.
Let’s take Paris Métro Line 12 as an example.
This line is part of the Paris Métro public transport network. It is operated by RATP, Paris’s public transport company. It links “Mairie d’Issy” station at the South to “Front Populaire” station at the North and is about 15km long.
In the GTFS of the Paris region, you will find it as a line in the routes.txt file.
Its attributes are:
route_id, which is an identifier for this object
route_long_name: also “12”
route_type: “1”. The GTFS reference maps each authorized value to a public transport mode or type of transportation (“1” is Subway or Metro).
route_color and route_text_color. These are used to represent the route in public-facing material: route_color is the background color, while the text is printed on top of it with the route_text_color.
route_desc (a description of a route),
route_url (a URL of a web page about this route),
route_sort_order (to order the routes in a customary way for presentation to the passengers)
agency_id: this is an identifier referencing an agency object in the agency.txt file.
In the agency.txt file, this agency has the following attributes:
agency_url to link to the web page of the agency
If you want to get the tracks of the line (for instance to draw it on a map), you will need to use some other GTFS files.
The trips.txt file contains the trips of each route, with a route_id column to map each trip to its route.
In the GTFS file studied here, we have 2024 trips for the Paris Métro Line 12. Among other attributes, a trip has:
trip_headsign, which defines the destination of the trip for passenger information
direction_id (0 or 1) to indicate the direction of travel in timetables
shape_id, which is an identifier referencing a shape object in the shapes.txt file
The sequence of the stops can be found using the stop_times.txt file, which holds identifiers referencing both the trips.txt and stops.txt files.
In the shapes.txt file, you can find the sequence of geographical points through which the vehicle passes in order for each trip.
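As a rough illustration of the chain described above (route, then trips, then shapes), here is a minimal Python sketch. The sample rows, identifiers (`R12`, `SH_A`, etc.) and coordinates are hypothetical and heavily truncated; a real feed has many more columns and rows.

```python
import csv
import io

# Hypothetical sample data (not taken from the real Paris feed).
TRIPS_TXT = """route_id,service_id,trip_id,trip_headsign,direction_id,shape_id
R12,S1,T1,Front Populaire,0,SH_A
R12,S1,T2,Mairie d'Issy,1,SH_B
R12,S1,T3,Front Populaire,0,SH_A
"""

# Points are deliberately out of order to show why shape_pt_sequence matters.
SHAPES_TXT = """shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence
SH_A,48.906,2.365,2
SH_A,48.824,2.273,1
SH_B,48.906,2.365,1
SH_B,48.824,2.273,2
"""


def read_rows(text):
    """Parse a GTFS text file (CSV with a header line) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def shapes_for_route(route_id, trips, shapes):
    """Return {shape_id: [(lat, lon), ...]} for every shape used by the route."""
    shape_ids = {
        trip["shape_id"]
        for trip in trips
        if trip["route_id"] == route_id and trip["shape_id"]
    }
    result = {}
    for shape_id in sorted(shape_ids):
        points = [row for row in shapes if row["shape_id"] == shape_id]
        points.sort(key=lambda row: int(row["shape_pt_sequence"]))
        result[shape_id] = [
            (float(row["shape_pt_lat"]), float(row["shape_pt_lon"]))
            for row in points
        ]
    return result


line_shapes = shapes_for_route("R12", read_rows(TRIPS_TXT), read_rows(SHAPES_TXT))
```

Many trips usually share the same shape, which is why the sketch deduplicates the `shape_id` values before reading the points.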
In OpenStreetMap, a public transport line is a relation of type route_master.
Paris Métro Line 12 is the relation with OpenStreetMap id 7420642.
It has the following tags:
type = route_master: this is a public transport line
route_master = subway: its public transport mode is subway. OpenStreetMap defines a set of values that represents the different types of transportation
ref = 12: the reference by which the service or route is known
name = Métro 12
network = RATP and
operator = RATP: the network tag represents the commercial name displayed for passengers for this network while the operator is the name of the company that operates the service. Most of the time, these tags have different values (for instance, a night bus could have
wikidata tag), a URL, a social media account, etc.
A route_master relation contains relations with type = route.
A route relation represents a path taken repeatedly by people and vehicles and contains as members stops (where passengers can embark) and roads taken by the vehicles. Route relations are sometimes called route variants or itineraries.
The closest concept in GTFS is the trip, but as opposed to the trip which has a temporal vision, the route relation has only a geographical meaning. Paris Métro Line 12 route_master relation has two child route relations, one for each direction.
The route relation inherits a lot of attributes from the route_master relation: it has a route number (in the ref tag), a public transport mode (in the route tag), a network name (in the network tag), an operator (in the operator tag), a route color (in the colour tag), etc. It also has a few additional tags:
to, which represents the destination of the route variant
from, which represents its origin
The route relation has two kinds of members: stops and “roads”. Inside the route relation, you will find:
ways: these are actual street objects for bus routes, or railway objects for tramway, subway or train routes
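To make the route_master / route structure more concrete, here is a minimal Python sketch of the relations described above. The classes `Member` and `Relation` are illustrative only, the negative ids are placeholders (only 7420642 comes from the text), and the `stop` role on stop members is an assumption based on the usual ptv2 convention.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    type: str      # "node", "way" or "relation"
    ref: int       # id of the referenced element
    role: str = ""  # optional role of the member inside the relation


@dataclass
class Relation:
    id: int
    tags: dict
    members: list = field(default_factory=list)


# Tags inherited from the line; values are those listed in the text.
common = {"ref": "12", "network": "RATP", "operator": "RATP"}

# Two route variants, one per direction (placeholder ids and members).
northbound = Relation(
    id=-1,
    tags={**common, "type": "route", "route": "subway",
          "from": "Mairie d'Issy", "to": "Front Populaire"},
    members=[Member("node", -101, "stop"), Member("way", -201),
             Member("node", -102, "stop")],
)
southbound = Relation(
    id=-2,
    tags={**common, "type": "route", "route": "subway",
          "from": "Front Populaire", "to": "Mairie d'Issy"},
    members=[Member("node", -102, "stop"), Member("way", -201),
             Member("node", -101, "stop")],
)

line = Relation(
    id=7420642,
    tags={**common, "type": "route_master", "route_master": "subway",
          "name": "Métro 12"},
    members=[Member("relation", -1), Member("relation", -2)],
)


def destinations(master, relations_by_id):
    """List the 'to' tag of every child route relation of a route_master."""
    return [
        relations_by_id[m.ref].tags.get("to")
        for m in master.members
        if m.type == "relation"
    ]


index = {r.id: r for r in (northbound, southbound)}
line_destinations = destinations(line, index)
```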
The modeling of the data in OpenStreetMap is far from homogeneous and many variations from this model exist:
a type=network relation that groups all public transport lines of the same network is sometimes found in addition to or instead of the network tag on route and route_master relations.
Wikidata is a structured database and consists mainly of objects linked together by relationships. There are few transport lines in Wikidata and there are many modelling discrepancies.
If we take our example from Paris Métro Line 12:
instance of (property P31) a rapid transit railway line (item Q15079663). If we go up the semantic hierarchy of instances and subclasses, we can find a parent item transport line, from which most of the types of transport lines are derived.
part of (P361) property.
terminus (P559) property, with additional qualifiers for the direction.
Finally, many other attributes are available concerning its color (P465), its length (P2043), but also its history (date of official opening (P1619), etc.) or media from the Wikimedia Commons project (route map (P15), logo image (P154), etc.).
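The statement structure used above can be sketched in Python as follows. This is only an illustrative in-memory model, not the Wikidata API: the `Item` and `Statement` classes are hypothetical, and every identifier starting with `Q_` is a placeholder for an item whose real identifier is not given in this study.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    prop: str                  # property identifier, e.g. "P31"
    value: object              # another item id or a plain value
    qualifiers: dict = field(default_factory=dict)


@dataclass
class Item:
    id: str
    label: str
    statements: list = field(default_factory=list)

    def values(self, prop):
        """Return all values recorded for the given property."""
        return [s.value for s in self.statements if s.prop == prop]


line_12 = Item(
    id="Q_LINE_12",  # placeholder identifier
    label="Paris Métro Line 12",
    statements=[
        Statement("P31", "Q15079663"),        # instance of: rapid transit railway line
        Statement("P361", "Q_PARIS_METRO"),   # part of: placeholder network item
        Statement("P559", "Q_MAIRIE_D_ISSY"),    # terminus (placeholder)
        Statement("P559", "Q_FRONT_POPULAIRE"),  # terminus (placeholder)
    ],
)

termini = line_12.values("P559")
```

A property can appear several times on the same item, which is why `values` returns a list: a line typically has one terminus statement per end.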
topo.transport.data.gouv.fr is also a structured database that consists of objects linked together by relationships. The data model is very close to the GTFS one.
Our Paris Métro Line 12:
instance of (property P3) a route (item Q3).
physical mode (property P8), which is itself an instance of a physical mode (item Q2)
imported from). It links to a producer (Q1), which models a GTFS agency.
The main element in the representation of a network in NeTEx and Transmodel is the ROUTE:
an ordered list of located ROUTE POINTs defining one single logical path.
The ROUTE entity represents an abstract concept and is independent of both the infrastructure and the operational concerns.
A group of ROUTEs which is generally known to the public by a similar name or number is called a LINE.
LINEs may be grouped into GROUPs OF LINES that can be used to define a public transport network.
The spatial description of the services is then described using the JOURNEY PATTERN entity: an ordered list of STOP POINTs and TIMING POINTs on a single ROUTE.
NeTEx also includes some GIS features (ROAD POINT, ROAD ELEMENT, ROAD RAIL ELEMENT, etc) that can be linked to public transport objects. PROJECTIONs can also be used to link entities with GIS concepts (for instance, a LINK SEQUENCE PROJECTION can be used to set the geometry of the path between two points that are part of a ROUTE).
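The relationships between LINE, ROUTE and JOURNEY PATTERN can be sketched with a few Python classes. This is a simplified conceptual illustration under our own assumptions, not the NeTEx XML schema: class and field names are hypothetical, as are all identifiers and coordinates.

```python
from dataclasses import dataclass, field


@dataclass
class RoutePoint:
    id: str
    lat: float
    lon: float


@dataclass
class Route:
    id: str
    points: list  # ordered ROUTE POINTs defining one logical path


@dataclass
class JourneyPattern:
    id: str
    route_id: str         # a JOURNEY PATTERN belongs to a single ROUTE
    stop_point_ids: list  # ordered STOP POINT identifiers


@dataclass
class Line:
    id: str
    name: str
    routes: list = field(default_factory=list)


def patterns_for_line(line, patterns):
    """Return the journey patterns attached to any ROUTE of the LINE."""
    route_ids = {route.id for route in line.routes}
    return [p for p in patterns if p.route_id in route_ids]


# Hypothetical sample data.
outbound = Route("route:out", [RoutePoint("rp1", 48.824, 2.273),
                               RoutePoint("rp2", 48.906, 2.365)])
inbound = Route("route:in", [RoutePoint("rp2", 48.906, 2.365),
                             RoutePoint("rp1", 48.824, 2.273)])
line = Line("line:12", "12", [outbound, inbound])

patterns = [
    JourneyPattern("jp:1", "route:out", ["sp:a", "sp:b"]),
    JourneyPattern("jp:2", "route:in", ["sp:b", "sp:a"]),
    JourneyPattern("jp:3", "route:other", ["sp:x"]),
]
line_patterns = patterns_for_line(line, patterns)
```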
an example of representation of a line in NeTEx
The main modelling difference concerns the agency, which is a single notion in GTFS (and also in topo.transport.data.gouv.fr, whose model is very similar). In the other models, by contrast, the concept is split between transport network and transport operator.
However, the treatment of these notions is not uniform: it is a notion in its own right for GTFS, Transmodel and Wikidata, while it is only an attribute of the transport lines and stops for OpenStreetMap.
The notion of terminus is present in both OpenStreetMap and Wikidata. It is close to the GTFS notion of trip_headsign, which is more hidden for the GTFS route (and non-existent in the topo.transport.data.gouv.fr route).
Both GTFS and OpenStreetMap use plain text for the terminus, while Wikidata goes further and uses a proper stop entity with an optional text.
The notion of route shape is also very different: non-existent in Wikidata and topo.transport.data.gouv.fr, made up of exact crossing points connected to a road or rail network for OpenStreetMap, or completely decorrelated and made up of a sequence of geographical points for the GTFS. Transmodel allows both types of modelling.
Finally, there is a great diversity of attributes, according to the specificity of each base. We will find the notion of line number and colors in all databases, but Wikidata has a wider variety of attributes (historical data, logos, photos and images of signs or vehicles, passenger numbers, etc.). In addition, both collaborative databases are open-ended and can dynamically propose new attributes on their objects according to the context and needs.
Wikidata mostly contains railway stations.
A typical railway station:
instance of (property P31) a metro station Q928830 (or a railway station Q55488). If we go up the semantic hierarchy of instances and subclasses, we can find a parent item transportation stop, from which most of the types of transport stops are derived.
part of (P361) property.
connecting line (property P81).
You can also build the ordered list of the stops within a line using the adjacent (P197) property. It uses as qualifiers a line (property P81) and a terminus station that indicates the direction.
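Rebuilding the ordered list of stops from these adjacency statements amounts to walking a linked list. Here is a minimal Python sketch under simplifying assumptions: a single line without branches, and hypothetical station names `A`, `B`, `C`. Each entry pairs an adjacent station with the terminus towards which that neighbour lies.

```python
def ordered_stops(start, terminus, adjacent):
    """Walk from `start` to `terminus` following adjacency statements.

    `adjacent` maps a station to a list of (neighbour, towards) pairs,
    where `towards` is the terminus used as direction qualifier.
    """
    order = [start]
    current = start
    while current != terminus:
        following = next(
            (station for station, towards in adjacent.get(current, [])
             if towards == terminus),
            None,
        )
        # Stop on missing data or on a loop instead of iterating forever.
        if following is None or following in order:
            raise ValueError(f"cannot reach {terminus!r} from {current!r}")
        order.append(following)
        current = following
    return order


# Hypothetical three-station line A - B - C.
ADJACENT = {
    "A": [("B", "C")],
    "B": [("A", "A"), ("C", "C")],
    "C": [("B", "A")],
}

northbound = ordered_stops("A", "C", ADJACENT)
southbound = ordered_stops("C", "A", ADJACENT)
```

Given the modelling discrepancies mentioned earlier, real data may well be incomplete, hence the explicit error when the chain is broken.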
In OpenStreetMap, the stop position is the place where the vehicle usually stops on the rails or on the street. It is an OpenStreetMap node.
The platform is the place where passengers are waiting for the vehicles. It is an OpenStreetMap node or way.
The stop area is the logical combination of all elements of a stop, with one unique name. It can also contain other elements such as taxi stops or station buildings. It is an OpenStreetMap relation.
In fact, several data models co-exist in OpenStreetMap for describing stops. The public transport model (also called ptv2) described above was approved by the community in 2011, but its adoption is far from complete several years later.
Historically, there were dedicated tags in OpenStreetMap for stops in each mode: some were platforms and some were stop positions. As the public transport model was not intended to replace the “old” one, but only to clarify it, we now usually find these historical tags, sometimes in addition to the new ones, and it is frequent to have only platforms or only stop positions.
Modeling is highly variable across locations and communities, but the following is a summary of what these tags represent:
[Table: historical stop tags by mode (row label recovered: metro / train)]
The stop area is probably the most contested concept. It sometimes groups together logical elements representing the same object (the platform and the stop position of the same stop), and sometimes several stops.
Stop areas can sometimes be nested to represent multimodal hubs, although another attribute is sometimes used for this purpose.
It should be noted that, as opposed to the stop area, which is a logical grouping of stops, there is also a physical grouping, represented by a polygon (closed way) drawing an area and encompassing objects: this is used for the outline of subway and train stations, or for bus stations.
In the GTFS model, the stops are gathered in the stops.txt file. The kind of stop is represented by the location_type attribute:
location_type = 0 represents the most basic stop, which is called a platform if it is inside a station
location_type = 1 is used to represent a station, which is a group of stops
location_type = 2 represents entrances and exits, usually for train or subway stations
location_type = 3 is a generic node, which represents a point used in a pathway (a graph representation of a walkway inside a station)
location_type = 4 is a boarding area (a location on a platform where passengers can board vehicles)
You can link stops to one another in a hierarchy by using the parent_station attribute, which holds another stop_id: entrances and platforms are linked to their parent station, while boarding areas link to a platform.
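The hierarchy built with `parent_station` can be reconstructed with a simple grouping pass. The following Python sketch uses hypothetical sample rows (identifiers and names are made up) and keeps only the columns needed for the illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, truncated stops.txt content.
STOPS_TXT = """stop_id,stop_name,location_type,parent_station
STATION,Example Station,1,
PLATFORM_A,Example Station,0,STATION
PLATFORM_B,Example Station,0,STATION
ENTRANCE_1,Main entrance,2,STATION
BOARDING_A1,Front of platform A,4,PLATFORM_A
"""

LOCATION_TYPES = {
    "0": "stop or platform",
    "1": "station",
    "2": "entrance/exit",
    "3": "generic node",
    "4": "boarding area",
}


def children_by_parent(rows):
    """Group stop_ids under the stop_id referenced by parent_station."""
    children = defaultdict(list)
    for row in rows:
        if row["parent_station"]:
            children[row["parent_station"]].append(row["stop_id"])
    return dict(children)


rows = list(csv.DictReader(io.StringIO(STOPS_TXT)))
tree = children_by_parent(rows)
kinds = {row["stop_id"]: LOCATION_TYPES[row["location_type"]] for row in rows}
```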
For the stops too, topo.transport.data.gouv.fr’s model is very close to the GTFS one:
location_type: Stop Point, Stop Area, Entrance, Generic Nodes, Boarding Points. Each stop is an instance of (property P3) some kind of stop
part of property (P15)
In addition, the entities that are stop points (Stop Point Q12) are also linked (with the part of P15 property) to their route entities. This is a major difference with GTFS, where the link between a stop and a route requires combining several files.
A STOP PLACE represents a station or a physical stop.
Its main component is the QUAY, that is the physical point of access to transport.
It can also contain ENTRANCEs, which are points where a passenger can access a stop place on foot.
It can also contain ACCESS SPACEs, which can be nested to model complex stations. QUAYs and ACCESS SPACEs can be connected to each other using PATH LINKs, and ENTRANCEs model both external access (for instance a subway entrance) and internal access (for instance from an entrance concourse to a platform).
A STOP ASSIGNMENT can be used to connect a STOP PLACE to a SCHEDULED STOP POINT (the representation of the stop in a timetable).
There is also a concept of STOP AREA, but it represents a group of SCHEDULED STOP POINTs close to each other.
an example representation of a stop
A simple representation of a pair of bus stops on a bus route can be achieved with a STOP PLACE and 2 QUAYs.
A metro station will also use a STOP PLACE and some QUAYs, but also ENTRANCEs and ACCESS SPACES.
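These two cases can be sketched in Python as follows. As before, this is a conceptual illustration under our own assumptions rather than the NeTEx XML schema: class names, field names and identifiers are hypothetical, and the stop assignment is reduced to a plain mapping.

```python
from dataclasses import dataclass, field


@dataclass
class Quay:
    id: str
    name: str


@dataclass
class Entrance:
    id: str
    name: str


@dataclass
class StopPlace:
    id: str
    name: str
    quays: list = field(default_factory=list)
    entrances: list = field(default_factory=list)


# A pair of bus stops facing each other: one STOP PLACE, two QUAYs.
bus_stop = StopPlace(
    id="sp:bus",
    name="Example Street",
    quays=[Quay("q:bus:1", "Direction north"),
           Quay("q:bus:2", "Direction south")],
)

# A metro station adds ENTRANCEs (ACCESS SPACEs are left out of this sketch).
metro_station = StopPlace(
    id="sp:metro",
    name="Example Station",
    quays=[Quay("q:metro:1", "Platform 1"), Quay("q:metro:2", "Platform 2")],
    entrances=[Entrance("e:1", "Main entrance")],
)

# Simplified stop assignments: SCHEDULED STOP POINT id -> (STOP PLACE id, QUAY id).
assignments = {
    "ssp:north": ("sp:bus", "q:bus:1"),
    "ssp:south": ("sp:bus", "q:bus:2"),
}


def quay_for(scheduled_stop_point_id, stop_places):
    """Resolve the QUAY assigned to a scheduled stop point, or None."""
    place_id, quay_id = assignments[scheduled_stop_point_id]
    for place in stop_places:
        if place.id == place_id:
            return next((q for q in place.quays if q.id == quay_id), None)
    return None


north_quay = quay_for("ssp:north", [bus_stop, metro_station])
```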
In the modeling of stops, we find the same terminologies (platform, stop, station, stop area, etc) but with different definitions.
There is a differentiation between physical stop and logical stop:
location_type attributes get them closer to a physical description.
Most of them have the notion of a logical stop area, grouping stops.
Transmodel, OpenStreetMap and GTFS also provide means to represent, physically or logically (in some graphical form), the interior of stations, with spaces and links to connect entrances and platforms.
Transmodel and GTFS also propose a notion of boarding point on a platform, which is not found in OpenStreetMap.
Although most of the models studied attempt to propose a unified model, there is a certain difference in modelling between modes, which is physically justified: it is difficult to represent a large metro station and a bus stop identically.
The link between the line and the stops is also modeled in a differentiated way:
This document is made available under the conditions of the CC-BY-SA license.