Transport data models comparative review
published on 20 April 2020 by Jungle Bus
This study is a comparative review of the data models used to store public
transport information. It compares the models from the following sources:
- The GTFS, a common format to describe public
transport objects and schedules.
- NeTEx (Network Timetable Exchange), a European
standard for exchanging public transport data.
- topo.transport.data.gouv.fr, an
experiment from transport.data.gouv.fr, the French national
access point for transit data.
The project has many objectives in common with the Mobility Database.
It consists of a structured database that stores topological transport
data. The purpose is to be able to link datasets from different
operators or from different sources.
- OpenStreetMap (OSM), a collaborative project to
create a geographical database of the physical world under a free
license.
- Wikidata, a collaborative structured knowledge
database released under a free license.
Although not limited to transportation, both collaborative projects
allow the description of public transport objects.
General data format description
GTFS format
A GTFS feed is composed of several text files collected in a .zip
file. Here are the main files and the associated notions:
- agency.txt: a transit brand or transit agency.
- stops.txt: stops where vehicles pick up or drop off riders. This
file also defines stations and station entrances.
- routes.txt: a route is a group of trips that are displayed to
passengers as a single service.
- trips.txt: a trip is a sequence of two or more stops that occur
during a specific time period.
- stop_times.txt, calendar.txt, calendar_dates.txt, frequencies.txt:
these files describe the schedules.
- shapes.txt: the vehicle travel paths.
- and a few other optional files that will not be studied here.
Each text file is a CSV file whose columns describe
the content and attributes of the objects.
The full description of the model can be found in the GTFS reference.
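Here is a minimal sketch of this structure, assuming Python 3.9+ and only the standard library. It opens the .zip archive and parses each .txt file as CSV. The function name read_gtfs and the sample identifiers (R12, A1) are hypothetical, and a real feed would need more validation.

```python
import csv
import io
import zipfile


def read_gtfs(zip_bytes: bytes) -> dict[str, list[dict[str, str]]]:
    """Load each .txt file of a GTFS feed (hypothetical helper) into a list of rows.

    The key is the file name without its extension, e.g. "routes" for routes.txt.
    """
    tables: dict[str, list[dict[str, str]]] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as feed:
        for name in feed.namelist():
            if not name.endswith(".txt"):
                continue
            # "utf-8-sig" decodes UTF-8 and drops a leading byte order mark if present
            text = feed.read(name).decode("utf-8-sig")
            tables[name[: -len(".txt")]] = list(csv.DictReader(io.StringIO(text)))
    return tables


# Usage: build a tiny in-memory feed (hypothetical values) and read it back.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("routes.txt", "route_id,route_short_name,route_type\nR12,12,1\n")
    archive.writestr("agency.txt", "agency_id,agency_name\nA1,METRO\n")

feed = read_gtfs(buffer.getvalue())
print(feed["routes"][0]["route_short_name"])  # prints 12
```

Each table becomes a list of dictionaries keyed by column name, so optional columns that are absent from a file simply do not appear in its rows.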
OpenStreetMap
OpenStreetMap is an open-licensed collaborative geographical database
and has three types of elements:
- node: a single point in space. A stop can be a node.
- way: an ordered list of nodes that defines a linear feature or a polygon area. A train platform may be a way.
- relation: an ordered list of elements. These elements, called members, can be nodes, ways or relations and can have an optional role. A bus route can be a relation.
Each element has a unique id and a set of tags
(key=value) to describe its attributes.
Several public transport data models co-exist in the OpenStreetMap
database. We will mostly consider the one approved by the community in
2011 (sometimes called ptv2, which stands for “Public Transport version
2”), but we will add notes to point out other frequent models
that can also be found in the database.
The full description of the ptv2 model can be found in the OpenStreetMap wiki.
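The three element types can be sketched as plain data structures. This is a simplified Python illustration, not the OpenStreetMap API or file format. The class names, ids and coordinates are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    lat: float
    lon: float
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    id: int
    node_ids: list[int]  # ordered references to nodes
    tags: dict[str, str] = field(default_factory=dict)

    def is_closed(self) -> bool:
        # A closed way (first node == last node) can describe a polygon area.
        return len(self.node_ids) >= 4 and self.node_ids[0] == self.node_ids[-1]


@dataclass
class Member:
    type: str  # "node", "way" or "relation"
    ref: int
    role: str = ""  # roles are optional


@dataclass
class Relation:
    id: int
    members: list[Member]  # ordered
    tags: dict[str, str] = field(default_factory=dict)


# Usage with hypothetical ids and placeholder coordinates:
# a stop node, a platform drawn as an area, and a route relation grouping them.
stop = Node(1, 48.0, 2.0, {"public_transport": "stop_position"})
platform = Way(2, [10, 11, 12, 10], {"public_transport": "platform"})
route = Relation(3, [Member("node", 1, "stop"), Member("way", 2, "platform")], {"type": "route"})
print(platform.is_closed())  # prints True
```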
Wikidata
Wikidata is an open-licensed collaborative structured database that
supports Wikipedia and other projects of the Wikimedia Foundation.
The Wikidata database is made of elements called items,
uniquely identified by an identifier (a Q followed by a number). Each
one can have a label, a description and any number of aliases. Their
attributes are called statements and consist of a property
(uniquely identified by a P followed by a number) and a value.
A value can be a link to another item or a literal value (such as a text
label, a number or coordinates). A property can also link to
an external dataset (it is then called an identifier).
Wikidata contains little information about transport, mainly about the train,
subway and tramway modes. Although it is a structured database, the
model is not very standardized and there are many local divergences in
the modelling of the information. Moreover, unlike the previous
databases, original content such as images (logos, pictograms, line
maps, etc.) or photos can be found thanks to the link with Wikimedia
Commons.
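The following sketch makes the item/statement structure concrete. It reads statements from an entity in the JSON layout that Wikidata returns (for instance through Special:EntityData), where each statement has a "mainsnak" holding a "datavalue". The function name is hypothetical. The sample entity is reduced to one statement taken from the Line 12 example discussed later. We assume that an item link's value carries an "id" field, as in current Wikidata JSON output.

```python
def statement_values(entity: dict, pid: str) -> list:
    """Return the main values of the statements of `entity` for property `pid`.

    Links to other items are returned as their Q-identifier; other datatypes
    (text, quantities, coordinates, external identifiers...) are returned raw.
    """
    values = []
    for statement in entity.get("claims", {}).get(pid, []):
        snak = statement["mainsnak"]
        if snak.get("snaktype") != "value":
            continue  # "somevalue"/"novalue" snaks have no datavalue
        datavalue = snak["datavalue"]
        if datavalue["type"] == "wikibase-entityid":
            values.append(datavalue["value"]["id"])
        else:
            values.append(datavalue["value"])
    return values


# Reduced entity: Q50757 is an instance of (P31) Q15079663.
line_12 = {
    "id": "Q50757",
    "claims": {
        "P31": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P31",
                    "datavalue": {"type": "wikibase-entityid", "value": {"id": "Q15079663"}},
                }
            }
        ]
    },
}
print(statement_values(line_12, "P31"))  # prints ['Q15079663']
```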
topo.transport.data.gouv.fr
topo.transport.data.gouv.fr is also a structured database and relies on the same software as Wikidata.
It already contains all the French GTFS datasets available as open data.
NeTEx format and Transmodel
NeTEx (Network Timetable Exchange) is a European standard for
exchanging public transport data. It relies on Transmodel, another
European standard that is a conceptual data model covering most of the
data domain of public transport.
In practice, we rarely use all the concepts covered by the standard: the
stakeholders who are going to exchange data with each other agree on a
subset of the schema suited to their needs and to their
local context (this is called a profile).
Data in NeTEx format is encoded as XML documents.
The basic concepts of Transmodel are POINTs (the described entities are
represented in capital letters in most normative documents) and LINKs
(an oriented spatial object of dimension 1 with view to the overall
description of a network, describing a connection between two POINTs).
An ordered set of POINTs or LINKs is called a LINK SEQUENCE. These are
the generic building blocks of the Public Transport network model.
To account for the fact that data can come from different sources that
use the same word to represent different things, the Transmodel data
model uses distinct elements to represent the precise semantics of each
separate concept in LAYERs, that can be mapped using PROJECTIONs.
The full description of the model can be found in the Transmodel reference.
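A few Python classes can illustrate the POINT / LINK / LINK SEQUENCE building blocks. This is a simplified sketch of the concepts, not the normative Transmodel or NeTEx structures. The class names, function names and point ids are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    id: str


@dataclass(frozen=True)
class Link:
    """An oriented connection between two POINTs."""
    id: str
    from_point: Point
    to_point: Point


def is_chained(links: list[Link]) -> bool:
    """True if each LINK starts at the POINT where the previous one ends."""
    return all(prev.to_point == nxt.from_point for prev, nxt in zip(links, links[1:]))


def points_of(links: list[Link]) -> list[Point]:
    """Ordered POINTs visited by a chained LINK SEQUENCE."""
    if not links:
        return []
    return [links[0].from_point] + [link.to_point for link in links]


a, b, c = Point("A"), Point("B"), Point("C")
sequence = [Link("AB", a, b), Link("BC", b, c)]
print(is_chained(sequence), [p.id for p in points_of(sequence)])  # prints True ['A', 'B', 'C']
```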
Geographical description of the route
Most public transport systems run along fixed routes with set
embarkation and disembarkation stops.
Let’s take Paris Métro Line 12 as an example.
This line is part of the Paris Métro public transport network. It is
operated by RATP, Paris’s public transport company. It links “Mairie
d’Issy” station in the south to “Front Populaire” station in the north
and is about 15 km long.
GTFS
In the GTFS feed of the Paris region, you will
find it as a row in the routes.txt file.
Its attributes are:
- a route_id, which is an identifier for this object
- a route_short_name: “12”
- a route_long_name: also “12”
- a route_type: “1”. The GTFS reference maps each authorized value to a public transport mode (“1” is Subway or Metro).
- a route_color and a route_text_color. These are used to represent the route in public-facing material: the route_color is the background color, and text printed on top of it uses the route_text_color.
- other optional attributes: route_desc (a description of the route), route_url (the URL of a web page about this route) and route_sort_order (to order the routes in the usual way when presenting them to passengers)
- an agency_id: an identifier referencing an agency object in the agency.txt file
In the agency.txt file, this agency has the following attributes:
- an agency_name: “METRO”
- an agency_url linking to the web page of the agency
- and a few other attributes
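The link between routes.txt and agency.txt is a plain key lookup on agency_id. The sketch below is minimal and assumes both files are available as CSV text. The identifiers (A1, R12, X) and the agency URL are hypothetical placeholders, and ROUTE_TYPES covers only a subset of the values defined in the GTFS reference. GTFS allows agency_id to be omitted when a feed has a single agency, so the sketch falls back to that agency.

```python
import csv
import io

# Subset of the route_type values defined by the GTFS reference.
ROUTE_TYPES = {"0": "Tram", "1": "Subway/Metro", "2": "Rail", "3": "Bus"}

# Hypothetical excerpts: identifiers and URL are placeholders.
AGENCY_TXT = "agency_id,agency_name,agency_url,agency_timezone\nA1,METRO,https://example.org,Europe/Paris\n"
ROUTES_TXT = "route_id,agency_id,route_short_name,route_long_name,route_type\nR12,A1,12,12,1\n"


def describe_routes(routes_txt: str, agency_txt: str) -> list[str]:
    """Return one "<agency_name> <route_short_name> (<mode>)" label per route."""
    agencies = {row.get("agency_id", ""): row for row in csv.DictReader(io.StringIO(agency_txt))}
    descriptions = []
    for route in csv.DictReader(io.StringIO(routes_txt)):
        agency_id = route.get("agency_id") or ""
        if agency_id not in agencies and len(agencies) == 1:
            # agency_id may be omitted when the feed has a single agency
            agency_id = next(iter(agencies))
        agency = agencies[agency_id]
        mode = ROUTE_TYPES.get(route["route_type"], "other")
        descriptions.append(f"{agency['agency_name']} {route['route_short_name']} ({mode})")
    return descriptions


print(describe_routes(ROUTES_TXT, AGENCY_TXT))  # prints ['METRO 12 (Subway/Metro)']
```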
If you want to get the tracks of the line (for instance to draw it on a
map), you will need to use some other GTFS files.
The trips.txt file contains the trips of each route, with a route_id
column to map it to its route.
In the GTFS feed studied here, there are 2024 trips for Paris Métro
Line 12. Among other attributes, a trip has:
- a trip_headsign, which defines the destination of the trip for passenger information
- a direction_id (0 or 1) to indicate the direction of travel in timetables
- a shape_id, which is an identifier of a shape object in the shapes.txt file
The sequence of stops can be found in the stop_times.txt file,
which has identifiers referencing both the trips.txt and stops.txt files.
In the shapes.txt file, you can find, for each shape, the ordered sequence of geographical
points through which the vehicle passes; each trip references its shape through its shape_id.
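Rebuilding the stop sequence of a trip, or the geometry of a shape, amounts to sorting rows by their sequence column. This is a minimal sketch using the standard library. The trip, stop and shape identifiers and the coordinates below are hypothetical, and only the columns needed here are included.

```python
import csv
import io
from collections import defaultdict


def ordered_stops_by_trip(stop_times_txt: str) -> dict[str, list[str]]:
    """Map each trip_id to its stop_ids, ordered by stop_sequence."""
    rows_by_trip = defaultdict(list)
    for row in csv.DictReader(io.StringIO(stop_times_txt)):
        # stop_sequence increases along the trip but values need not be consecutive
        rows_by_trip[row["trip_id"]].append((int(row["stop_sequence"]), row["stop_id"]))
    return {trip: [stop for _, stop in sorted(rows)] for trip, rows in rows_by_trip.items()}


def shape_points(shapes_txt: str, shape_id: str) -> list[tuple[float, float]]:
    """Return the (lat, lon) points of one shape, ordered by shape_pt_sequence."""
    points = [
        (int(row["shape_pt_sequence"]), float(row["shape_pt_lat"]), float(row["shape_pt_lon"]))
        for row in csv.DictReader(io.StringIO(shapes_txt))
        if row["shape_id"] == shape_id
    ]
    return [(lat, lon) for _, lat, lon in sorted(points)]


STOP_TIMES_TXT = "trip_id,stop_id,stop_sequence\nT1,S2,2\nT1,S1,1\nT1,S3,10\n"
SHAPES_TXT = "shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence\nSH1,1.0,2.0,2\nSH1,0.0,0.0,1\n"
print(ordered_stops_by_trip(STOP_TIMES_TXT))  # prints {'T1': ['S1', 'S2', 'S3']}
print(shape_points(SHAPES_TXT, "SH1"))  # prints [(0.0, 0.0), (1.0, 2.0)]
```

Sequence values are converted to integers before sorting, so that 10 correctly comes after 2.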
OpenStreetMap
In OpenStreetMap, a public transport line is a relation of type
“route_master”.
Paris Métro Line 12 is the relation with OpenStreetMap id 7420642.
It has the following tags:
- type=route_master: this is a public transport line
- route_master=subway: its public transport mode is subway. OpenStreetMap defines a set of values that represent the different types of transportation
- ref=12: the reference by which the service or route is known
- name=Métro 12
- network=RATP and operator=RATP: the network tag represents the commercial name of the network displayed to passengers, while the operator tag is the name of the company that operates the service. These two tags often have different values (for instance, a night bus could have network=Noctilien and operator=RATP)
- a background color set in the colour tag
- other additional attributes can be set, such as a Wikidata id (wikidata tag), a URL, a social media account, etc.
A route_master relation contains relations with type=route
as members.
A route relation represents the paths taken repeatedly by people and
vehicles and contains as members stops (where passengers can board) and
roads taken by the vehicles. Route relations are sometimes called route variants or
itineraries.
The closest concept in GTFS is the trip, but unlike the trip,
which has a temporal dimension, the route relation has only a geographical
meaning. The Paris Métro Line 12 route_master relation has two child route
relations, one for each direction.
The route relation inherits many attributes from the route_master
relation: it has a route number (in the ref tag), a public
transport mode (in the route tag), a network name (in the network tag), an operator (in the operator tag), a route color (in the colour tag), etc. It also has
a few additional tags:
- to, which represents the destination of the route variant
- from, which represents its origin
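The OpenStreetMap data itself does not propagate tags from a route_master to its routes: each relation carries its own copy. A data consumer may therefore want to check that the copies agree. The following is a hypothetical consistency check, not an official validation rule. The tag values come from the Line 12 example, and the from/to values use the two terminus names given above.

```python
# Tags expected to be repeated on each route relation (illustrative selection).
SHARED_KEYS = ("ref", "network", "operator", "colour")


def inherited_tag_issues(route_master: dict[str, str], route: dict[str, str]) -> list[str]:
    """List disagreements between a route relation and its parent route_master."""
    issues = []
    # The mode is stored as route_master=* on the parent and route=* on the child.
    if route_master.get("route_master") != route.get("route"):
        issues.append(f"mode: {route_master.get('route_master')!r} vs {route.get('route')!r}")
    for key in SHARED_KEYS:
        if key in route_master and route.get(key) != route_master[key]:
            issues.append(f"{key}: {route_master[key]!r} vs {route.get(key)!r}")
    for key in ("from", "to"):
        if key not in route:
            issues.append(f"{key}: missing on route")
    return issues


master = {"type": "route_master", "route_master": "subway", "ref": "12",
          "name": "Métro 12", "network": "RATP", "operator": "RATP"}
northbound = {"type": "route", "route": "subway", "ref": "12", "network": "RATP",
              "operator": "RATP", "from": "Mairie d’Issy", "to": "Front Populaire"}
print(inherited_tag_issues(master, northbound))  # prints []
```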
The route relation has two kinds of members: stops and “roads”. Inside
the route relation, you will find:
- the ordered list of the stops for this route variant. There are two
kinds of stops: platforms, which represent locations where passengers
wait for the vehicle, and stop_positions, which are spots on the road
or rail where the vehicle actually stops.
- the “roads” followed by the vehicles, as OpenStreetMap ways: these are actual street objects for bus routes, or railway objects for
tramway, subway or train routes
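The two kinds of members can be separated using their roles. This sketch follows the ptv2 conventions as we understand them: the roles stop and platform, with optional _entry_only / _exit_only suffixes, and an empty role for the ways followed by the vehicle. Older roles such as forward/backward on ways are not handled, and the member ids are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Member:
    type: str  # "node", "way" or "relation"
    ref: int
    role: str


def split_members(members: list[Member]) -> tuple[list[Member], list[Member], list[Member]]:
    """Split route members into (stop positions, platforms, ways followed by the vehicle)."""
    stops, platforms, path = [], [], []
    for member in members:
        if member.role.startswith("stop"):  # stop, stop_entry_only, stop_exit_only
            stops.append(member)
        elif member.role.startswith("platform"):  # platform, platform_entry_only, ...
            platforms.append(member)
        elif member.type == "way" and member.role == "":
            path.append(member)
    return stops, platforms, path


members = [
    Member("node", 1, "stop"),
    Member("way", 2, "platform"),
    Member("node", 3, "stop_exit_only"),
    Member("way", 10, ""),
    Member("way", 11, ""),
]
stops, platforms, path = split_members(members)
print([m.ref for m in stops], [m.ref for m in platforms], [m.ref for m in path])  # prints [1, 3] [2] [10, 11]
```

A route that has only platforms or only stop positions, one of the frequent variations listed below, shows up here as an empty list in one of the first two results.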
The modeling of the data in OpenStreetMap is far from homogeneous and
many variations from this model exist:
- It is usual to only have platforms or only have stop positions in a
given route relation, even when both objects actually exist in the
database.
- Sometimes, the route_master relation does not exist. It is indeed a
rather unusual object in the OpenStreetMap database, as it is a grouping of
groupings of physical objects, and it is very confusing, especially for
beginner OpenStreetMap contributors.
- You can still find some route relations containing the objects of all line variants
(stops from both directions and all road or rail ways). Before
the ptv2 adoption, this was a common way of mapping public transport
lines, and some are still to be updated to the new model.
- Although this is not a recommended practice, type=network relations that group all public transport lines of the same network are
sometimes found in addition to, or instead of, the network
tag on route and route_master relations.
Wikidata
Wikidata is a structured database consisting mainly of objects
linked together by relationships. There are few transport lines in
Wikidata and there are many modelling discrepancies.
If we take our example of Paris Métro Line 12:
- it has the unique and stable id Q50757.
- it is an instance of (property P31) a rapid transit railway line (item Q15079663). Going up the semantic hierarchy of instances and subclasses, we find a parent item, transport line, from which most of the types of transport lines are derived.
- the transport operator is an item that is a transport company.
- the notion of network is found in the part of (P361) property.
- terminus stations are found with the terminus (P559) property, with additional qualifiers for the direction.
Finally, many other attributes are available concerning its color
(P465) and its length (P2043), but also its history (date of official
opening (P1619), etc.) or media from the Wikimedia Commons project (route
map (P15), logo image (P154), etc.).
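These statements can be queried on the Wikidata Query Service with SPARQL. The sketch below only builds the queries and the request URL for the public endpoint and does not send them. The helper names are hypothetical. P279 is Wikidata's "subclass of" property, which lets the query walk up the semantic hierarchy mentioned above. The wdt: prefix returns only best-rank values without qualifiers. Reading the direction qualifiers of terminus (P559) would require the p:/ps:/pq: prefixes instead.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint; this sketch builds URLs but sends nothing.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def termini_query(line_qid: str) -> str:
    """List the terminus (P559) values of a line item, with labels."""
    return f"""SELECT ?terminus ?terminusLabel WHERE {{
  wd:{line_qid} wdt:P559 ?terminus .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "fr,en". }}
}}"""


def classes_query(item_qid: str) -> str:
    """Walk up instance of (P31) then any number of subclass of (P279) links."""
    return f"""SELECT DISTINCT ?class ?classLabel WHERE {{
  wd:{item_qid} wdt:P31/wdt:P279* ?class .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


def request_url(query: str) -> str:
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


print(termini_query("Q50757"))
print(request_url(classes_query("Q50757")))
```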
topo.transport.data.gouv.fr
topo.transport.data.gouv.fr is also a structured database that consists of objects linked together
by relationships. The data model is very close to the GTFS one.
Our Paris Métro Line 12:
- it has the unique and stable id Q127581
- it is an instance of (property P3) a route (item Q3).
- it has subway (item Q5) as physical mode (property P8), which is itself an instance of a physical mode (item Q2)
- to take into account the fact that the same network will be updated over time, an additional notion has been introduced: the data source, with the property P10 (imported from). It links to a producer (Q1), which models a GTFS agency.
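Because the model is made of items and properties, simple questions become lookups over (subject, property, value) statements. This is a minimal in-memory sketch that uses only the identifiers quoted above (Q127581, P3, Q3, P8, Q5, Q2). It is not the Wikibase API, and the helper names are hypothetical.

```python
Triple = tuple[str, str, str]

STATEMENTS: list[Triple] = [
    ("Q127581", "P3", "Q3"),  # Paris Métro Line 12 -- instance of --> route
    ("Q127581", "P8", "Q5"),  # Paris Métro Line 12 -- physical mode --> subway
    ("Q5", "P3", "Q2"),       # subway -- instance of --> physical mode
]


def values_of(statements: list[Triple], subject: str, prop: str) -> list[str]:
    return [v for s, p, v in statements if s == subject and p == prop]


def subjects_with(statements: list[Triple], prop: str, value: str) -> set[str]:
    return {s for s, p, v in statements if p == prop and v == value}


def routes_with_mode(statements: list[Triple], mode: str) -> list[str]:
    """Routes (instances of Q3) whose physical mode (P8) is `mode`."""
    routes = subjects_with(statements, "P3", "Q3")
    return sorted(routes & subjects_with(statements, "P8", mode))


print(routes_with_mode(STATEMENTS, "Q5"))  # prints ['Q127581']
print(values_of(STATEMENTS, "Q5", "P3"))  # prints ['Q2']
```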
NeTEx and Transmodel
The main element in the representation of a network in NeTEx and
Transmodel is the ROUTE:
an ordered list of located ROUTE POINTs defining one single logical
path.
The ROUTE entity represents an abstract concept and is independent of
both the infrastructure and the operational concerns.
A group of ROUTEs which is generally known to the public by a similar
name or number is called a LINE.
LINEs may be grouped into GROUPs OF LINES that can be used to define a
public transport network.
The spatial description of the services then relies on the
JOURNEY PATTERN entity: an ordered list of STOP POINTs and TIMING POINTs
on a single ROUTE.
NeTEx also includes some GIS features (ROAD POINT, ROAD ELEMENT, ROAD
RAIL ELEMENT, etc) that can be linked to public transport objects.
PROJECTIONs can also be used to link entities with GIS concepts (for
instance, a LINK SEQUENCE PROJECTION can be used to set the geometry of
the path between two points that are part of a ROUTE).
(Figure: an example representation of a line in NeTEx)
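NeTEx data is XML, so a standard XML parser can read the ordered stops of a journey pattern. The fragment below is simplified and hand-written. Its element names follow NeTEx naming as we understand it, but it omits the NeTEx namespace and has not been validated against the NeTEx schema. Real documents declare a namespace, so tag lookups would need the namespace prefix. All identifiers are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment (no namespace, not schema-validated).
FRAGMENT = """
<PublicationDelivery>
  <Line id="L12"><Name>12</Name></Line>
  <Route id="R12-N"><LineRef ref="L12"/></Route>
  <JourneyPattern id="JP1">
    <RouteRef ref="R12-N"/>
    <pointsInSequence>
      <StopPointInJourneyPattern id="JP1-2" order="2">
        <ScheduledStopPointRef ref="SSP-B"/>
      </StopPointInJourneyPattern>
      <StopPointInJourneyPattern id="JP1-1" order="1">
        <ScheduledStopPointRef ref="SSP-A"/>
      </StopPointInJourneyPattern>
    </pointsInSequence>
  </JourneyPattern>
</PublicationDelivery>
"""


def journey_pattern_stops(xml_text: str) -> dict[str, list[str]]:
    """Map each JourneyPattern id to its ScheduledStopPoint refs, sorted by `order`."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for pattern in root.iter("JourneyPattern"):
        points = pattern.findall("pointsInSequence/StopPointInJourneyPattern")
        points.sort(key=lambda point: int(point.get("order")))
        result[pattern.get("id")] = [point.find("ScheduledStopPointRef").get("ref") for point in points]
    return result


print(journey_pattern_stops(FRAGMENT))  # prints {'JP1': ['SSP-A', 'SSP-B']}
```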
Comparative review
The main modelling difference concerns the agency, which
is a single notion in the GTFS (as well as in topo.transport.data.gouv.fr, whose
model is very similar). Conversely, in the other models, the concept is split
between transport network and transport operator.
However, the treatment of these notions is not uniform: they are notions
in their own right in the GTFS, Transmodel and Wikidata, while they are only
attributes of the transport lines and stops in OpenStreetMap.
The notion of terminus is present in both
OpenStreetMap and Wikidata. It is close to the GTFS notion of trip_headsign,
which is attached to trips rather than directly to the GTFS route (and non-existent in the topo.transport.data.gouv.fr route
model).
Both GTFS and OpenStreetMap use some text for the terminus, while
Wikidata goes further and uses a proper stop entity with an optional text.
The notion of route shape is also very different:
non-existent in Wikidata and topo.transport.data.gouv.fr, made
up of exact crossing points connected to a road or rail network for
OpenStreetMap, or independent of any network and made up of a sequence of
geographical points for the GTFS. Transmodel allows both types of
modelling.
Finally, there is a great diversity of attributes,
reflecting the specificities of each database. The notions of
line number and colors are found in all databases, but Wikidata has a wider
variety of attributes (historical data, logos, photos and images of
signs or vehicles, passenger numbers, etc.). In addition, both
collaborative databases are open-ended and can dynamically introduce new
attributes on their objects according to context and needs.
Geographical description of the stop or station
Wikidata
Wikidata mostly contains railway stations.
A typical railway station:
- has a unique and stable id.
- is an instance of (property P31) a metro station (item Q928830) or a railway station (item Q55488). Going up the semantic hierarchy of instances and subclasses, we find a parent item, transportation stop, from which most of the types of transport stops are derived.
- carries again the notion of network, with the part of (P361) property.
- can have some connecting lines (property P81).
You can also build the ordered list of the stops of a line using the
adjacent station (P197) property. It uses as qualifiers a connecting line
(property P81) and a terminus station with the towards
(P5051) property.
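Rebuilding the order of stations from adjacent station statements is a chain-following problem. This sketch assumes each statement has been flattened into a (station, adjacent station, connecting line, towards) tuple. The station and line identifiers are hypothetical placeholders. Branching lines, where one station has several adjacent stations in the same direction, are not handled.

```python
Adjacency = tuple[str, str, str, str]  # (station, adjacent station, line, towards)


def order_stations(adjacencies: list[Adjacency], line: str, towards: str) -> list[str]:
    """Ordered stations of `line` in the direction of the terminus `towards`."""
    next_station = {
        station: adjacent
        for station, adjacent, stmt_line, stmt_towards in adjacencies
        if stmt_line == line and stmt_towards == towards
    }
    # The first station is the only one that no other station points to.
    starts = set(next_station) - set(next_station.values())
    if len(starts) != 1:
        raise ValueError("statements do not form a single chain")
    order = [starts.pop()]
    while order[-1] in next_station:
        following = next_station[order[-1]]
        if following in order:
            raise ValueError("cycle in adjacent station statements")
        order.append(following)
    return order


# Hypothetical three-station line "L" between termini "A" and "C".
ADJ = [("A", "B", "L", "C"), ("B", "C", "L", "C"), ("C", "B", "L", "A"), ("B", "A", "L", "A")]
print(order_stations(ADJ, "L", "C"))  # prints ['A', 'B', 'C']
print(order_stations(ADJ, "L", "A"))  # prints ['C', 'B', 'A']
```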
OpenStreetMap
In OpenStreetMap, the stop position is the place where the vehicle
usually stops on the rails or on the street. It is an OpenStreetMap node.
The platform is the place where passengers wait for the vehicles.
It is an OpenStreetMap node or way.
The stop area is the logical combination of all elements of a stop,
with one unique name. It can also contain other elements such as taxi
stops or station buildings. It is an OpenStreetMap relation.
In fact, several data models co-exist in OpenStreetMap for describing
stops. The public transport model (also called ptv2) described above was
approved by the community in 2011, but its adoption is still far from
complete several years later.
Historically, there were dedicated tags in OpenStreetMap for stops in
each mode and some were platforms and some were stop positions. As the
public transport model was not intended to replace the “old” one, but
only to clarify it, we now usually find these historical tags, sometimes
in addition to the new ones, and it is frequent to only have platforms
or only stop_positions.
Modeling is highly variable across locations and communities, but the
following is a summary of what these tags represent:
| mode | historical tag | platform | stop_position |
|---|---|---|---|
| bus | highway=bus_stop | frequent | sometimes |
| metro / train | railway=station | usual | sometimes |
| tram | railway=tram_stop | usual | frequent |
| bus boat | amenity=ferry_terminal | rare | frequent |
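Because both tagging schemes coexist, a data consumer often needs to guess what an element represents. This best-effort sketch assumes that the ptv2 public_transport=* values take precedence over the historical tags listed in the table. It is a hypothetical heuristic, not an OpenStreetMap rule.

```python
# Historical tags from the table; on their own, they do not say whether
# the element is a platform or a stop position.
LEGACY_STOP_TAGS = {
    ("highway", "bus_stop"),
    ("railway", "station"),
    ("railway", "tram_stop"),
    ("amenity", "ferry_terminal"),
}

PTV2_VALUES = {"platform", "stop_position", "station", "stop_area"}


def stop_kind(tags: dict[str, str]) -> str:
    """Best-effort classification of an OpenStreetMap stop element from its tags."""
    value = tags.get("public_transport")
    if value in PTV2_VALUES:
        return value
    if any(tags.get(key) == val for key, val in LEGACY_STOP_TAGS):
        return "legacy (ambiguous)"
    return "unknown"


print(stop_kind({"highway": "bus_stop", "public_transport": "platform"}))  # prints platform
print(stop_kind({"railway": "tram_stop"}))  # prints legacy (ambiguous)
```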
The stop area is probably the most contested concept. It sometimes
groups together logical elements representing the same object (the
platform and the stop position of the same stop), and sometimes several
stops.
Stop areas can sometimes be nested to represent a multimodal hub,
although another tag is sometimes used for this purpose (public_transport=stop_area_group).
It should be noted that, unlike the stop area, which is a
logical grouping of stops, there is also a physical grouping, represented
by a polygon (closed way) drawing an area that encompasses the
objects: this is used for the outline of subway and train stations, or
for bus stations.
GTFS
In the GTFS model, the stops are gathered in the stops.txt file. The
kind of stop is represented by the location_type attribute:
- location_type = 0 represents the most basic stop, which is called a platform if it is inside a station
- location_type = 1 is used to represent a station, which is a group of stops
- location_type = 2 represents entrances and exits, usually for train or subway stations
- location_type = 3 is a generic node, which represents a point used in a pathway (a graph representation of a walkway inside a station)
- location_type = 4 is a boarding area (a location on a platform where passengers can board vehicles)
You can link stops to each other in a hierarchy by using the parent_station
attribute, which holds another stop_id: entrances and
platforms are linked to their parent station, while boarding areas link
to a platform.
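The parent_station links form a small tree that can be walked upwards. This sketch uses hypothetical stop ids and assumes stops.txt has already been loaded into a dictionary keyed by stop_id. An empty location_type is treated as 0, as the GTFS reference specifies.

```python
LOCATION_TYPES = {
    "0": "stop/platform",
    "1": "station",
    "2": "entrance/exit",
    "3": "generic node",
    "4": "boarding area",
}


def ancestry(stops: dict[str, dict[str, str]], stop_id: str) -> list[str]:
    """Follow parent_station links from `stop_id` up to the top-level location."""
    chain = [stop_id]
    parent = stops[stop_id].get("parent_station", "")
    while parent:
        if parent in chain:
            raise ValueError(f"parent_station cycle at {parent}")
        chain.append(parent)
        parent = stops[parent].get("parent_station", "")
    return chain


def describe(stops: dict[str, dict[str, str]], stop_id: str) -> str:
    return " -> ".join(
        f"{s} ({LOCATION_TYPES[stops[s].get('location_type') or '0']})" for s in ancestry(stops, stop_id)
    )


STOPS = {
    "STA": {"location_type": "1", "parent_station": ""},
    "PF1": {"location_type": "0", "parent_station": "STA"},
    "BA1": {"location_type": "4", "parent_station": "PF1"},
}
print(describe(STOPS, "BA1"))  # prints BA1 (boarding area) -> PF1 (stop/platform) -> STA (station)
```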
For the stops too, topo.transport.data.gouv.fr’s model is very close
to the GTFS one:
- There are entities that model the different kinds of location_type: Stop Point, Stop Area, Entrance, Generic Nodes, Boarding Points. Each stop is an instance of (property P3) some kind of stop.
- Each stop is also linked to its data source entity with the datasource property P10.
- the hierarchy inside a station is represented using the part of property (P15)
In addition, the entities that are stop points (instance of
Stop Point Q12) are also linked (with the part of P15
property) to their route entities. This is a major difference from the
GTFS, where linking a stop to a route requires merging several
files.
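In the GTFS, finding the routes that serve a stop requires chaining two joins. stop_times.txt gives the trips calling at the stop, and trips.txt gives the route of each trip. This is a minimal sketch with hypothetical identifiers that includes only the columns needed. In topo.transport.data.gouv.fr, the same answer comes from a single property lookup on the stop point.

```python
import csv
import io
from collections import defaultdict


def routes_by_stop(stop_times_txt: str, trips_txt: str) -> dict[str, set[str]]:
    """Map each stop_id to the route_ids serving it (stop_times -> trips -> routes)."""
    route_of_trip = {row["trip_id"]: row["route_id"] for row in csv.DictReader(io.StringIO(trips_txt))}
    routes = defaultdict(set)
    for row in csv.DictReader(io.StringIO(stop_times_txt)):
        routes[row["stop_id"]].add(route_of_trip[row["trip_id"]])
    return dict(routes)


TRIPS_TXT = "route_id,service_id,trip_id\nR12,WK,T1\nR12,WK,T2\nR4,WK,T3\n"
STOP_TIMES_TXT = "trip_id,stop_id,stop_sequence\nT1,S1,1\nT2,S1,1\nT3,S1,1\nT3,S9,2\n"
print(sorted(routes_by_stop(STOP_TIMES_TXT, TRIPS_TXT)["S1"]))  # prints ['R12', 'R4']
```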
NeTEx
and Transmodel
A STOP PLACE represents a station or a physical stop.
Its main component is the QUAY, which is the physical point of access to
transport.
It can also contain ENTRANCEs, which are points where passengers can
access a stop place on foot.
It can also contain ACCESS SPACEs, which can be nested to model complex
stations. QUAYs and ACCESS SPACEs can be connected to each other using
PATH LINKs, and ENTRANCEs can model both external access (for instance a subway
entrance) and internal access (for instance from an entrance concourse to a
platform).
A STOP ASSIGNMENT can be used to connect a STOP PLACE to a SCHEDULED
STOP POINT (the representation of the stop in a timetable).
There is also a concept of STOP AREA, but it represents a group of
SCHEDULED STOP POINTs close to each other.
(Figure: an example representation of a stop)
A simple representation of a pair of bus stops can
be achieved with a STOP PLACE and 2 QUAYs.
A metro station will also use a STOP PLACE and some QUAYs, but also
ENTRANCEs and ACCESS SPACEs.
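The bus stop case can be sketched with plain classes: one STOP PLACE holding two QUAYs, and STOP ASSIGNMENTs that tell which QUAY each SCHEDULED STOP POINT uses. The class names mirror the Transmodel concepts, but they are hypothetical Python structures, not NeTEx XML. All identifiers and names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Quay:
    id: str
    name: str


@dataclass
class StopPlace:
    id: str
    name: str
    quays: list[Quay] = field(default_factory=list)
    entrance_ids: list[str] = field(default_factory=list)  # usually empty for a simple bus stop


@dataclass
class StopAssignment:
    scheduled_stop_point_id: str  # the stop as it appears in timetables
    stop_place_id: str
    quay_id: str


def resolve_quay(point_id: str, assignments: list[StopAssignment], places: dict[str, StopPlace]) -> Quay:
    """Find the physical QUAY assigned to a SCHEDULED STOP POINT."""
    for assignment in assignments:
        if assignment.scheduled_stop_point_id == point_id:
            place = places[assignment.stop_place_id]
            return next(q for q in place.quays if q.id == assignment.quay_id)
    raise KeyError(point_id)


bus_stop = StopPlace("SP1", "Example stop", [Quay("Q-N", "northbound side"), Quay("Q-S", "southbound side")])
places = {bus_stop.id: bus_stop}
assignments = [StopAssignment("SSP-1", "SP1", "Q-N"), StopAssignment("SSP-2", "SP1", "Q-S")]
print(resolve_quay("SSP-2", assignments, places).name)  # prints southbound side
```

A metro station would reuse the same StopPlace with more quays and a non-empty entrance list. ACCESS SPACEs and PATH LINKs are left out of this sketch.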
Comparative review
In the modeling of stops, we find the same terminology (platform,
stop, station, stop area, etc.) but with different definitions.
There is a distinction between physical stops and logical
stops:
- OpenStreetMap has a rather physical description. It has some logical
notions such as stop areas but no real consensus on the
representation of these notions.
- Transmodel offers a logical description and can even, in theory,
represent stops without coordinates.
- the GTFS and topo.transport.data.gouv.fr are rather logical,
even if the precise description of stations allowed by the recently
added location_type values brings them closer to a
physical description.
Most of them have the notion of a logical stop area, grouping stops.
Transmodel, OpenStreetMap and GTFS also provide means to represent,
physically or logically (as a graph), the inside of
stations, with spaces and links to connect entrances and
platforms.
Transmodel and GTFS also propose a notion of boarding point
on a platform, which is not found in OpenStreetMap.
Although most of the models studied attempt to propose a unified model,
there are some differences in modelling between modes,
which are physically justified: it is difficult to represent a large
metro station and a bus stop identically.
The link between the line and the stops is also
modeled differently:
- In OpenStreetMap, the route contains its stops
- In Wikidata, the stops are connected to a line
- In the GTFS, the link between the stop and the route is less direct
because it goes through services and timetables. Transmodel takes a
similar approach. Conversely, topo.transport.data.gouv.fr has
added a link between the stop and its routes, directly accessible via a
property.
Credits
This comparative review was carried out by Jungle Bus for MobilityData as part of the
Mobility Database project.
This document is made available under the conditions of the CC-BY-SA license.
Transport data models comparative review
This study is a comparative review of the data models to store public transport information. It compares the model from the following sources:
The project has many objectives in common with the Mobility Database. It consists of a structured database that stores topological transport data. The purpose is to be able to link datasets from different operators or from different sources.
Although not limited to transportation, both collaborative projects allow the description of public transport objects.
General data format description
GTFS format
A GTFS feed is composed of several text files collected in a
.zip
file. Here are the main files and the associated notions:Each text file is a
csv
and contains columns to describe the content and attributes of the objects.The full description of the model can be found in the GTFS reference.
OpenStreetMap
OpenStreetMap is an open-licensed collaborative geographical database and has three types of element:
node
: a single point in space. A stop can be a node.way
: an ordered list of nodes that defines a linear feature or a polygon area. A train platform may be a way.relation
: an ordered list of elements. These elements can be nodes, ways or relations, are called members and can have an optional role. A bus route can be a relation.Each element has a unique id and a set of
tags
(key
=value
) to describe its attributes.Several public transport data models co-exists in OpenStreetMap database. We will mostly consider the one approved by the community in 2011 (sometimes called ptv2 which stands for “Public Transport version 2”) but we will add special notes to point out other frequent models that can also be found in the database.
The full description of the ptv2 model can be found in OpenStreetMap wiki.
Wikidata
Wikidata is an open-licensed collaborative structured database that supports Wikipedia and other projects of the Wikimedia Foundation.
The Wikidata database is made of elements called
items
uniquely identified by an identifier (a Q followed by a number). Each one can have a label, a description and any number of aliases. Their attributes are calledstatements
and consist of aproperty
(uniquely identified by a P followed by a number) and a value.A value can be a link to another item, or a proper value (such as text label, number or coordinates for instance). A property can also link to an external dataset (and it is called
identifier
).Wikidata contains few information about transport, mainly about train, subway and tramway modes. Although it is a structured database, the model is not very standardized and there are many local divergences on the modelling of the information. Moreover, on the contrary to previous databases, original contents such as images (logo, pictograms, line maps, etc.) or photos can be found thanks to the link with Wikimedia Commons.
topo.transport.data.gouv.fr
topo.transport.data.gouv.fr is also a structured database and rely on the same software as Wikidata. It already contains all the French GTFS datasets available in open data.
NeTEx format and Transmodel
NeTEx (Network Timetable Exchange) is an European standard for exchanging public transport data. It relies on Transmodel, another European standard that is a conceptual data model covering most of the data domain of public transport.
In practice, we rarely use all the concepts covered by the standard: the stakeholders who are going to exchange data with each other agree on a subset of the schema useful to their needs and in accordance with their local context (this is called a profile).
Data in NeTEx format is encoded as XML documents.
The basic concepts of Transmodel are POINTs (the described entities are represented in capital letters in most normative documents) and LINKs (an oriented spatial object of dimension 1 with view to the overall description of a network, describing a connection between two POINTs). An ordered set of POINTs or LINKs is called a LINK SEQUENCE. These are the generic building blocks of the Public Transport network model.
To account for the fact that data can come from different sources that use the same word to represent different things, the Transmodel data model uses distinct elements to represent the precise semantics of each separate concept in LAYERs, that can be mapped using PROJECTIONs.
The full description of the model can be found in the Transmodel reference.
Geographical description of the route
Most public transport systems run along fixed routes with set embarkation and disembarkation stops.
Let’s take Paris Métro Line 12 as an example.
This line is part of the Paris Métro public transport network. It is operated by RATP, Paris’s public transport company. It links “Mairie d’Issy” station at the South to “Front Populaire” station at the North and is about 15km long.
GTFS
In the GTFS of the Paris region, you will find it as a line in the routes.txt file.
Its attributes are:
route_id
, which is an identifier for this objectroute_short_name
: “12”route_long_name
: also “12”route_type
: “1”. The GTFS reference maps each authorized values to a public transport mode or type of transportation (“1” is Subway or Metro).route_color
and aroute_text_color
. These are used to represent the route in public facing material, theroute_color
is the background color while the text is printed on top of it with theroute_text_color
color.route_desc
(a description of a route),route_url
(an URL of a web page about this route),route_sort_order
(to allow to order the routes in an usual way for presentation to the passengers)agency_id
: This is an identifier referencing an agency object in the agency.txtIn the agency.txt file, this agency has the following attributes:
agency_name
: “METRO”agency_url
to link to the web page of the agencyIf you want to get the tracks of the line (for instance to draw it on a map), you will need to use some other GTFS files.
The trips.txt file contains the trips of each route, with a
route_id
column to map it to its route.In the GTFS file studied here, we have 2024 trips for the Paris Métro Line 12. Among other attributes, a trip has:
trip_headsign
which defines the destination of the trip for passenger informationdirection_id
(0 or 1) to indicate the direction of travel in timetablesshape_id
which is an identifier to a shape object in the shapes.txt fileThe sequence of the stops can be find using the stop_times.txt file, which has identifiers to both the trips.txt and stops.txt files.
In the shapes.txt file, you can find the sequence of geographical points through which the vehicle passes in order for each trip.
OpenStreetMap
In OpenStreetMap, a public transport line is a relation of type “route_master”.
Paris Métro Line 12 is the relation with OpenStreetMap id 7420642.
It has the following tags:
type = route_master
: this is a public transport lineroute_master = subway
: its public transport mode is subway. OpenStreetMap defines a set of values that represents the different types of transportationref = 12
: the reference by which the service or route is knownname = Métro 12
network = RATP
andoperator = RATP
: the network tag represents the commercial name displayed for passengers for this network while the operator is the name of the company that operates the service. Most of the time, these tags have different values (for instance, a night bus could havenetwork=Noctilien
andoperator=RATP
)colour
tagwikidata
tag), an URL, a social media account, etcA route_master relation contains relations with
type=route
as members.A route relation represents the paths taken repeatedly by people and vehicles and contains as members stops (where passengers can embark) and roads taken by the vehicles. They are sometimes called route variants or itinerary.
The closest concept in GTFS is the trip, but as opposed to the trip which has a temporal vision, the route relation has only a geographical meaning. Paris Métro Line 12 route_master relation has two child route relations, one for each direction.
The route relation inherits a lots of attributes from the route_master relation: it has a route number (in the
ref
tag), a public transport mode (in theroute
tag), a network name (in thenetwork
tag), an operator (in theoperator
tag), a route color (in thecolour
tag), etc. It has also a few additional tags:to
that represents the destination of the route variantfrom
to represent its originThe route relation has two kinds of members: stops and “roads”. Inside the route relation, you will find:
ways
: it will be actual street objects for bus route or railway objects for the tramway, subway or train routesThe modeling of the data in OpenStreetMap is far from homogeneous and many variations from this model exist:
type=network
relation that groups all public transport lines of a same network are sometimes found in addition to or instead of thenetwork
tag on route and route_master relations.Wikidata
Wikidata data is a structured database and consists mainly of objects linked together by relationship. There are few transport lines in Wikidata and there are many modelling discrepancies.
Taking our example of Paris Métro Line 12:
- its identifier is Q50757.
- it is an instance of (property P31) a rapid transit railway line (item Q15079663). If we go up the semantic hierarchy of instances and subclasses, we find a parent item, transport line, from which most types of transport lines are derived.
- it is linked to the larger entity it belongs to (such as its network) with the part of (P361) property.
- its terminus stations are given with the terminus (P559) property, with additional qualifiers for the direction.
Finally, many other attributes are available concerning its color (P465), its length (P2043), but also its history (date of official opening (P1619), etc.) or media from the Wikimedia Commons project (route map (P15), logo image (P154), etc.).
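These properties can be read programmatically from an item's JSON export (for instance https://www.wikidata.org/wiki/Special:EntityData/Q50757.json), where statements are grouped under claims by property ID. Below is a minimal Python sketch working on a hand-written, heavily trimmed fragment in that format: it only keeps the P31 value quoted above, and the helper names are our own.

```python
import json
from typing import Any

# Trimmed, hand-written fragment in the Wikidata JSON format; a real export holds many more statements.
SAMPLE = """
{"entities": {"Q50757": {"id": "Q50757", "claims": {
  "P31": [{"type": "statement", "rank": "normal",
           "mainsnak": {"snaktype": "value", "property": "P31",
                        "datavalue": {"type": "wikibase-entityid",
                                      "value": {"entity-type": "item", "id": "Q15079663"}}}}]
}}}}
"""


def item_values(entity: dict[str, Any], prop: str) -> list[str]:
    """Item IDs referenced by the statements of `prop`, ignoring deprecated ones."""
    ids = []
    for statement in entity.get("claims", {}).get(prop, []):
        if statement.get("rank") == "deprecated":
            continue
        snak = statement["mainsnak"]
        # "novalue" and "somevalue" snaks carry no datavalue
        if snak.get("snaktype") == "value" and snak["datavalue"]["type"] == "wikibase-entityid":
            ids.append(snak["datavalue"]["value"]["id"])
    return ids


def qualifier_items(statement: dict[str, Any], prop: str) -> list[str]:
    """Item IDs of the `prop` qualifiers of one statement (e.g. on a P559 terminus)."""
    return [q["datavalue"]["value"]["id"]
            for q in statement.get("qualifiers", {}).get(prop, [])
            if q.get("snaktype") == "value" and q["datavalue"]["type"] == "wikibase-entityid"]


line = json.loads(SAMPLE)["entities"]["Q50757"]
print(item_values(line, "P31"))   # ['Q15079663']
print(item_values(line, "P559"))  # [] in this trimmed fragment
```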
topo.transport.data.gouv.fr
topo.transport.data.gouv.fr is also a structured database consisting of objects linked together by relationships. Its data model is very close to the GTFS one.
Our Paris Métro Line 12:
- its identifier is Q127581.
- it is an instance of (property P3) a route (item Q3).
- it has a physical mode (property P8), which is itself an instance of a physical mode (item Q2).
- it keeps track of the dataset it was imported from. It links to a producer (Q1), which models a GTFS agency.
NeTEx and Transmodel
The main element in the representation of a network in NeTEx and Transmodel is the ROUTE:
an ordered list of located ROUTE POINTs defining one single logical path.
The ROUTE entity represents an abstract concept and is independent of both the infrastructure and the operational concerns.
A group of ROUTEs which is generally known to the public by a similar name or number is called a LINE.
LINEs may be grouped into GROUPs OF LINES that can be used to define a public transport network.
The services are then described spatially using the JOURNEY PATTERN entity: an ordered list of STOP POINTs and TIMING POINTs on a single ROUTE.
NeTEx also includes some GIS features (ROAD POINT, ROAD ELEMENT, ROAD RAIL ELEMENT, etc) that can be linked to public transport objects. PROJECTIONs can also be used to link entities with GIS concepts (for instance, a LINK SEQUENCE PROJECTION can be used to set the geometry of the path between two points that are part of a ROUTE).
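As a rough illustration of how these entities relate, here is a Python sketch with hypothetical classes (they are not NeTEx XML elements): a LINE groups ROUTEs, and a JOURNEY PATTERN is checked to visit points of its ROUTE in order. Treating stop points as the route points themselves is a simplification made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    """Ordered ROUTE POINTs defining one logical path (hypothetical simplification)."""
    id: str
    points: list[str]


@dataclass
class JourneyPattern:
    """Ordered STOP POINTs / TIMING POINTs on a single ROUTE."""
    id: str
    route: Route
    points: list[str]

    def follows_route(self) -> bool:
        # True if the pattern's points appear on the route in the same order.
        # `point in remaining` consumes the iterator, which enforces the ordering.
        remaining = iter(self.route.points)
        return all(point in remaining for point in self.points)


@dataclass
class Line:
    """A group of ROUTEs known to the public by a similar name or number."""
    name: str
    routes: list[Route] = field(default_factory=list)


route = Route("R1", ["A", "B", "C", "D"])
line = Line("12", [route])
express = JourneyPattern("JP1", route, ["A", "C", "D"])  # skips B: still valid
broken = JourneyPattern("JP2", route, ["C", "A"])        # goes backwards: invalid
print(express.follows_route(), broken.follows_route())  # True False
```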
Comparative review
The main modelling difference concerns the agency, which is a single notion in the GTFS (and also in topo.transport.data.gouv.fr, whose model is very similar). Conversely, the other models split this concept between transport network and transport operator.
However, these notions are not treated uniformly: they are entities in their own right in the GTFS, Transmodel and Wikidata, while they are only attributes of the transport lines and stops in OpenStreetMap.
The notion of terminus is present in both OpenStreetMap and Wikidata. It is close to the GTFS notion of trip_headsign, which is less visible at the level of the GTFS route, since it is defined on trips (and it has no equivalent in the topo.transport.data.gouv.fr route model). Both GTFS and OpenStreetMap use text for the terminus, while Wikidata goes further and uses a proper stop entity, with an optional text.
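In the GTFS, a route-level view of the termini can be approximated by collecting the trip_headsign values of each route's trips. A minimal Python sketch, assuming a trips.txt with the standard route_id, trip_id, trip_headsign and direction_id columns (the last two are optional in the specification); the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def route_headsigns(trips_file) -> dict[tuple[str, str], set[str]]:
    """Map (route_id, direction_id) to the distinct trip_headsign values of its trips."""
    headsigns = defaultdict(set)
    for row in csv.DictReader(trips_file):
        headsign = (row.get("trip_headsign") or "").strip()
        if headsign:  # trip_headsign is optional and may be empty
            headsigns[(row["route_id"], row.get("direction_id") or "")].add(headsign)
    return dict(headsigns)


# For a real feed, open trips.txt with newline="" and encoding="utf-8-sig".
SAMPLE_TRIPS = """route_id,service_id,trip_id,trip_headsign,direction_id
M12,WD,t1,Terminus A,0
M12,WD,t2,Terminus A,0
M12,WD,t3,Terminus B,1
"""
print(route_headsigns(io.StringIO(SAMPLE_TRIPS)))
# {('M12', '0'): {'Terminus A'}, ('M12', '1'): {'Terminus B'}}
```

Unlike a Wikidata terminus, these values are free text, so two trips ending at the same stop may still carry different headsigns.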
The notion of route shape also differs widely: it does not exist in Wikidata and topo.transport.data.gouv.fr; in OpenStreetMap, it is made up of the exact sections of the road or rail network that the vehicles travel; in the GTFS, it is completely decoupled from any network and made up of a sequence of geographical points. Transmodel allows both types of modelling.
Finally, there is a great diversity of attributes, reflecting the specificities of each database. We find the notions of line number and color in all databases, but Wikidata has a wider variety of attributes (historical data, logos, photos and images of signs or vehicles, passenger numbers, etc.). In addition, both collaborative databases are open-ended and can dynamically propose new attributes on their objects according to the context and needs.
Geographical description of the stop or station
Wikidata
Wikidata mostly contains railway stations.
A typical railway station:
- is an instance of (property P31) a metro station (item Q928830) or a railway station (item Q55488). If we go up the semantic hierarchy of instances and subclasses, we find a parent item, transportation stop, from which most types of transport stops are derived.
- can be linked to a larger entity with the part of (P361) property.
- lists the lines that serve it with the connecting line property (P81).
You can also build the ordered list of stops along a line using the adjacent station (P197) property. It takes as qualifiers the connecting line (property P81) and a terminus station, given with the towards (P5051) property.
OpenStreetMap
In OpenStreetMap, the stop position is the place where the vehicle usually stops on the rails or on the street. It is an OpenStreetMap node.
The platform is the place where passengers wait for the vehicles. It is an OpenStreetMap node or way.
The stop area is the logical combination of all the elements of a stop, with one unique name. It can also contain other elements such as taxi stops or station buildings. It is an OpenStreetMap relation.
In fact, several data models co-exist in OpenStreetMap for describing stops. The public transport model (also called ptv2) described above was approved by the community in 2011, but several years later its adoption is still far from complete.
Historically, there were dedicated tags in OpenStreetMap for stops in each mode: some were platforms and some were stop positions. As the public transport model was not intended to replace the “old” one, but only to clarify it, we now usually find these historical tags, sometimes in addition to the new ones, and it is common to find only platforms or only stop positions.
Modeling is highly variable across locations and communities, but the following is a summary of what these tags represent:
- highway=bus_stop
- railway=station
- railway=tram_stop
- amenity=ferry_terminal
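Because both tagging schemes coexist, software reading OpenStreetMap stops generally has to check the ptv2 tags first and fall back on the historical ones. A minimal Python sketch of that fallback: public_transport=stop_position, platform and stop_area are the ptv2 tags for the three notions described earlier, while the role of each historical tag is deliberately left unresolved because, as noted above, it varies locally. The function name is our own.

```python
# Historical, mode-specific tags listed above; whether they mark a platform or a
# stop position depends on local mapping practice, so they are not resolved here.
LEGACY_TAGS = (
    ("highway", "bus_stop"),
    ("railway", "station"),
    ("railway", "tram_stop"),
    ("amenity", "ferry_terminal"),
)
PTV2_VALUES = ("stop_position", "platform", "stop_area")


def classify_stop(tags: dict[str, str]) -> str:
    """Return the ptv2 role of an element, else the legacy tag it carries, else 'unknown'."""
    value = tags.get("public_transport")
    if value in PTV2_VALUES:
        return value  # ptv2 tags take precedence when both schemes are present
    for key, legacy_value in LEGACY_TAGS:
        if tags.get(key) == legacy_value:
            return f"legacy:{key}={legacy_value}"
    return "unknown"


print(classify_stop({"highway": "bus_stop", "public_transport": "platform"}))  # platform
print(classify_stop({"railway": "tram_stop"}))  # legacy:railway=tram_stop
```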
The stop area is probably the most contested concept. It sometimes groups together logical elements representing the same object (the platform and the stop position of the same stop), and sometimes several stops.
Stop areas are sometimes nested to represent multimodal hubs, although another tag is sometimes used for this purpose (public_transport=stop_area_group).
It should be noted that, as opposed to the stop area, which is a logical grouping of stops, there is also a physical grouping, represented by a polygon (a closed way) that outlines an area and encompasses objects: this is used for the outline of subway and train stations, or for bus stations.
GTFS
In the GTFS model, the stops are gathered in the stops.txt file. The kind of stop is represented by the location_type attribute:
- location_type = 0 represents the most basic stop, which is called a platform if it is inside a station
- location_type = 1 is used to represent a station, which is a group of stops
- location_type = 2 represents entrances and exits, usually for train or subway stations
- location_type = 3 is a generic node, which represents a point used in a pathway (a graph representation of a walkway inside a station)
- location_type = 4 is a boarding area (a location on a platform where passengers can board vehicles)
Stops can be linked to each other in a hierarchy using the parent_station attribute, which holds another stop_id: entrances and platforms are linked to their parent station, while boarding areas link to a platform.
topo.transport.data.gouv.fr
For the stops too, topo.transport.data.gouv.fr’s model is very close to the GTFS one:
- each GTFS location_type has a counterpart: Stop Point, Stop Area, Entrance, Generic Node, Boarding Point. Each stop is an instance of (property P3) one of these kinds of stop
- the data source of each stop is given with the datasource property (P10)
- the hierarchy between stops is expressed with the part of property (P15)
In addition, the entities that are stop points (instance of Stop Point, item Q12) are also linked (with the part of property, P15) to their route entities. This is a major difference from the GTFS, where linking a stop to a route requires joining several files (stop_times.txt, trips.txt and routes.txt).
NeTEx and Transmodel
A STOP PLACE represents a station or a physical stop.
Its main component is the QUAY, which is the physical point of access to transport.
It can also contain ENTRANCEs, which are points where passengers can access a stop place on foot.
It can also contain ACCESS SPACEs, which can be nested to model complex stations. QUAYs and ACCESS SPACEs can be connected to each other using PATH LINKs, and ENTRANCEs can model both external access (for instance a subway entrance) and internal access (for instance from an entrance concourse to a platform).
A STOP ASSIGNMENT can be used to connect a STOP PLACE to a SCHEDULED STOP POINT (the representation of the stop in a timetable).
There is also a concept of STOP AREA, but it represents a group of SCHEDULED STOP POINTs close to each other.
A simple pair of bus stops on a route can be represented with a STOP PLACE and two QUAYs.
A metro station will also use a STOP PLACE and some QUAYs, but also ENTRANCEs and ACCESS SPACEs.
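These two cases can be sketched as plain Python data classes. The classes below are a hypothetical, simplified model of the Transmodel concepts, not the NeTEx XML schema, and all identifiers are invented. The only logic is a check that every PATH LINK connects components declared in the STOP PLACE, including nested ACCESS SPACEs.

```python
from dataclasses import dataclass, field


@dataclass
class Quay:
    id: str


@dataclass
class Entrance:
    id: str
    external: bool = True  # False for an internal entrance (e.g. concourse to platform)


@dataclass
class AccessSpace:
    id: str
    children: list["AccessSpace"] = field(default_factory=list)  # nested spaces


@dataclass
class PathLink:
    from_id: str
    to_id: str


@dataclass
class StopPlace:
    id: str
    quays: list[Quay]
    entrances: list[Entrance] = field(default_factory=list)
    access_spaces: list[AccessSpace] = field(default_factory=list)
    path_links: list[PathLink] = field(default_factory=list)

    def component_ids(self) -> set[str]:
        ids = {q.id for q in self.quays} | {e.id for e in self.entrances}
        stack = list(self.access_spaces)
        while stack:  # walk nested access spaces depth-first
            space = stack.pop()
            ids.add(space.id)
            stack.extend(space.children)
        return ids

    def dangling_links(self) -> list[PathLink]:
        """PATH LINKs whose ends are not components of this STOP PLACE."""
        known = self.component_ids()
        return [link for link in self.path_links
                if link.from_id not in known or link.to_id not in known]


bus_stop = StopPlace("SP-bus", quays=[Quay("Q-dir1"), Quay("Q-dir2")])
metro = StopPlace(
    "SP-metro",
    quays=[Quay("Q-1"), Quay("Q-2")],
    entrances=[Entrance("E-street")],
    access_spaces=[AccessSpace("AS-hall", children=[AccessSpace("AS-corridor")])],
    path_links=[PathLink("E-street", "AS-hall"), PathLink("AS-corridor", "Q-1"),
                PathLink("AS-corridor", "Q-3")],  # Q-3 is not declared
)
print(bus_stop.dangling_links())  # []
print([(link.from_id, link.to_id) for link in metro.dangling_links()])  # [('AS-corridor', 'Q-3')]
```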
Comparative review
In the modeling of stops, we find the same terminology (platform, stop, station, stop area, etc.) but with different definitions.
There is a differentiation between physical stop and logical stop:
- the location_type attribute (in the GTFS and topo.transport.data.gouv.fr) brings these models closer to a physical description.
Most of the models have the notion of a logical stop area, grouping stops.
Transmodel, OpenStreetMap and GTFS also provide means to represent the interior of stations, physically or logically (as a graph), with spaces and links connecting entrances and platforms.
Transmodel and GTFS also propose a notion of boarding point on a platform, which is not found in OpenStreetMap.
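In the GTFS, this station structure is carried by location_type and parent_station in stops.txt. Below is a minimal Python sketch that rebuilds the hierarchy; the sample rows are hypothetical and trimmed to the relevant columns (a valid feed also needs coordinates for most stops).

```python
import csv
import io
from collections import defaultdict

LOCATION_TYPES = {0: "stop/platform", 1: "station", 2: "entrance/exit",
                  3: "generic node", 4: "boarding area"}


def load_stops(stops_file) -> dict[str, dict[str, str]]:
    """Read stops.txt into a dict keyed by stop_id."""
    return {row["stop_id"]: row for row in csv.DictReader(stops_file)}


def location_type(row: dict[str, str]) -> int:
    # An empty or missing location_type means 0 (stop or platform).
    return int(row.get("location_type") or 0)


def children_by_parent(stops: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    children = defaultdict(list)
    for stop_id, row in stops.items():
        parent = row.get("parent_station") or ""
        if parent:
            children[parent].append(stop_id)
    return dict(children)


def print_tree(stops, children, stop_id, depth=0) -> None:
    print("  " * depth + f"{stop_id} ({LOCATION_TYPES[location_type(stops[stop_id])]})")
    for child in children.get(stop_id, []):
        print_tree(stops, children, child, depth + 1)


SAMPLE_STOPS = """stop_id,stop_name,location_type,parent_station
S1,Example station,1,
P1,Platform 1,0,S1
E1,North entrance,2,S1
B1,Front of train,4,P1
"""
stops = load_stops(io.StringIO(SAMPLE_STOPS))
children = children_by_parent(stops)
for stop_id, row in stops.items():
    if not row.get("parent_station"):  # roots of the hierarchy
        print_tree(stops, children, stop_id)
# S1 (station)
#   P1 (stop/platform)
#     B1 (boarding area)
#   E1 (entrance/exit)
```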
Although most of the models studied attempt to propose a unified model, there are some differences in modelling between modes, which are physically justified: it is difficult to represent a large metro station and a bus stop identically.
The link between the line and the stops is also modeled in a differentiated way:
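In the GTFS, for instance, there is no direct stop-to-route field: the routes serving a stop are found by going from stop_times.txt to trips.txt, then to routes.txt. A minimal Python sketch with hypothetical sample data; note that stop_times.txt references stops or platforms (location_type 0), so the routes of a station would be obtained by applying this to its child stops.

```python
import csv
import io


def routes_serving_stop(stop_id: str, stop_times_file, trips_file, routes_file) -> list[str]:
    """Names of the routes whose trips call at stop_id (stop_times -> trips -> routes)."""
    trip_ids = {row["trip_id"] for row in csv.DictReader(stop_times_file)
                if row["stop_id"] == stop_id}
    route_ids = {row["route_id"] for row in csv.DictReader(trips_file)
                 if row["trip_id"] in trip_ids}
    return sorted(row.get("route_short_name") or row["route_id"]
                  for row in csv.DictReader(routes_file) if row["route_id"] in route_ids)


STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
t1,08:00:00,08:00:00,P1,1
t2,08:05:00,08:05:00,P2,1
t3,08:10:00,08:10:00,P1,1
"""
TRIPS = """route_id,service_id,trip_id
R12,WD,t1
R7,WD,t2
R4,WD,t3
"""
ROUTES = """route_id,route_short_name,route_type
R12,12,1
R7,7,1
R4,4,1
"""
print(routes_serving_stop("P1", io.StringIO(STOP_TIMES), io.StringIO(TRIPS), io.StringIO(ROUTES)))
# ['12', '4']
```

The names are sorted as strings, which is why "12" comes before "4"; in topo.transport.data.gouv.fr, the same information is a single part of link.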
Credits
This comparative review was carried out by Jungle Bus for MobilityData as part of the Mobility Database project.
This document is made available under the conditions of the CC-BY-SA license.