Dataset API Discovery 0.3

Abstract

This document specifies a dataset site and embedded JSON-LD document that together describe an open data dataset and define related APIs that are available to manipulate it.

5. Definitions

5.1 Dataset Sites

A Dataset Site is a human and machine readable web page ("Dataset Page") that describes a dataset and the APIs available to interact with it, with associated functionality that allows for feedback to be provided about the dataset.

5.2 Data Catalogs

A Data Catalog is a JSON structure that supports and enables the discoverability of Dataset Sites. It does so by providing metadata and links, either to Dataset Sites directly or to other Data Catalogs.

5.3 Purpose

5.3.1 Dataset Sites

The purpose of a Dataset Site is to provide:

A web page that can be referenced when discussing the dataset.
A human and machine readable licence associated with the data (the Dataset Page contains invisible metadata which allows its details to be read automatically).
A human and machine readable rights statement specifying how dataset users (innovators who want to build on top of, or use, the data) should attribute the data.
An accessible "single point of truth" that explains where the data can be found.
Details ("documentation") and historical record ("changelog") relating to the format of the data, including the specifications it follows, and the data fields it contains.
A place where the community can contribute with comments, and raise issues.
A mechanism by which Data Consumers can subscribe to get updates about changes to the data format, specifications and fields.
A human and machine readable description of any APIs that can be used to manipulate the data, and the process to gain access to such APIs.

5.3.2 Data Catalogs

The purpose of a Data Catalog is to provide:

machine-readable metadata about dataset sites
URL(s) pointing to dataset sites
machine-readable metadata about, and URLs pointing to, other Data Catalogs where appropriate
a start-point for spidering of data collections

5.4 Dataset Sites

5.4.1 HTML Content

5.4.1.1 Human-readable content

With the exception of licensing information, there are no strong requirements for the human-readable content of dataset pages, and implementers may provide whatever information they see fit here. For the convenience of end-users, however, it is normally expected (and is RECOMMENDED) that a Dataset Page will provide at least the following information and markup:

the name of the organisation publishing the data
the standards to which the published data conforms (e.g., the Opportunity standard)
the version of each of these standards (e.g., '2.0')
where a link to a data feed is provided, its text should refer to the entity types the feed contains (e.g. SessionSeries, Slots)
an appropriately-labelled link to documentation relevant to the data feed(s)
an appropriately-labelled link to a discussion channel for the data feed(s)
licensing information

Note that, of the list above, only licensing information is REQUIRED to be available in human-readable form, and this license MUST be a Creative Commons Attribution 4.0 International License (often abbreviated as 'cc-by').

Note further that, in the event that you are republishing OpenActive data from another source, the original publisher must be credited as per the terms of this license.

5.4.1.2 HTML `meta` tags

In addition to the directly-readable content of the HTML body, information contained in the HTML head may sometimes be used by search engines and social-media platforms to aid findability and provide snippets.

It is accordingly RECOMMENDED that the following <meta> tags be supplied.

Property	Value
title	The name of the publishing organisation, followed by the string ' Open Data'.
identifier	The URL of the dataset site.
keywords	Short, descriptive words or phrases to aid discoverability
description	A human-readable description of the dataset.
language	The language of the dataset site.
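
As a rough illustration of the table above, the following Python sketch renders the five recommended tags. It assumes that each property is carried in the `name` attribute of a `<meta>` element (the table does not say which attribute is used), and the function name `render_meta_tags` and the sample values are hypothetical.

```python
from html import escape


def render_meta_tags(org_name, site_url, keywords, description, language):
    """Return the recommended <meta> elements as a list of HTML strings.

    Assumption: properties are expressed through the `name` attribute.
    """
    values = {
        "title": f"{org_name} Open Data",
        "identifier": site_url,
        "keywords": ", ".join(keywords),
        "description": description,
        "language": language,
    }
    # Escape attribute values so quotes and ampersands cannot break the markup.
    return [
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in values.items()
    ]


if __name__ == "__main__":
    for tag in render_meta_tags(
        "Example.com",
        "https://data.example.com/",
        ["Sessions", "Events"],
        "Sessions & events available from Example.com",
        "en-GB",
    ):
        print(tag)
```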

5.4.1.3 OpenGraph `<meta>` tags

OpenGraph is a protocol created by Facebook that allows useful snippets to be extracted on social media platforms, including LinkedIn and Twitter.

The following OpenGraph properties are RECOMMENDED for use in <meta> elements in the HTML head of Dataset Pages.

Property	Value
`og:title`	The name of the publishing organisation, followed by the string ' Open Data'.
`og:description`	A human-readable description of the dataset.
`og:locale`	For publishers within the UK, this should be 'en_GB'.
`og:url`	The URL of the dataset site.
`og:image`	The logo of the publishing organisation
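
A publisher may want to check a Dataset Page against the table above. The sketch below, which uses only the Python standard library, lists the recommended OpenGraph properties that a page does not supply. It assumes the usual OpenGraph convention of `<meta property="og:..." content="...">`; the names `OpenGraphCollector` and `missing_og_properties` are hypothetical.

```python
from html.parser import HTMLParser

RECOMMENDED_OG = ("og:title", "og:description", "og:locale", "og:url", "og:image")


class OpenGraphCollector(HTMLParser):
    """Collect og:* properties from <meta property="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property")
        if prop and prop.startswith("og:"):
            self.properties[prop] = attributes.get("content")


def missing_og_properties(html_text):
    """Return the recommended OpenGraph properties absent from the page."""
    collector = OpenGraphCollector()
    collector.feed(html_text)
    return [name for name in RECOMMENDED_OG if name not in collector.properties]


if __name__ == "__main__":
    page = (
        '<html><head>'
        '<meta property="og:title" content="Example.com Open Data">'
        '<meta property="og:url" content="https://data.example.com/">'
        '</head><body></body></html>'
    )
    print(missing_og_properties(page))
```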

5.4.2 Embedded JSON

Dataset Sites must be machine-readable via embedded JSON-LD.

Property	Status	Type	Notes
`@context`	REQUIRED	Array of URL values	Note that, in conformity with RFC3986, trailing slashes MUST be supplied.
`@type`	REQUIRED	Text	`Dataset`
`@id`	REQUIRED	URL	A URL uniquely identifying the dataset site resource. May be the URL of the Dataset Site itself
`schema:url`	REQUIRED	URL	Typically the URL of the dataset site itself.
`schema:name`	REQUIRED	Text	The name of the collection of datasets referenced by the site. Often this will simply be the name of the publishing organisation.
`schema:description`	RECOMMENDED	Text	A human-readable description of the datasets referenced by the site.
`schema:keywords`	OPTIONAL	Array of Text	Short descriptive metadata tags for the dataset collection.
`schema:license`	REQUIRED	URL	A URL reference to the license under which the dataset site is published. For OpenActive dataset sites this should be `https://creativecommons.org/licenses/by/4.0/`.
`schema:distribution`	REQUIRED	Array of `dcat:Distribution` objects	See below, Describing Individual Feeds
`schema:discussionUrl`	RECOMMENDED	URL	A link to a resource for discussing and raising issues with the published datasets. Typically, although not necessarily, this will be a link to a GitHub repository.
`schema:documentation`	RECOMMENDED	Array of URL	Link(s) to further resources concerning the dataset site and its referenced datasets - e.g., GitHub READMEs or status summaries.
`schema:inLanguage`	RECOMMENDED	String	The language of the dataset. Should be expressed as an ISO 639-2 language code.
`schema:publisher`	REQUIRED	schema:Organization	The organization responsible for publishing the collection of datasets linked to by the dataset site. For further information, see below, Describing Organizations.
`schema:datePublished`	REQUIRED	schema:Date	The date the dataset site was published.
`schema:schemaVersion`	REQUIRED	URL	The version of the dataset site specification to which the site conforms.

The MIME-type of this JSON object MUST be defined on the enclosing HTML script tag as application/ld+json.
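
From a consumer's point of view, the requirement above means the JSON-LD can be located by looking for `script` elements whose `type` is `application/ld+json`. The following is a minimal sketch using Python's standard `html.parser`; the class and function names are hypothetical, and a real consumer would also need to fetch the page and handle malformed JSON.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._chunks)))
            self._in_jsonld = False


def extract_jsonld(html_text):
    """Return every embedded JSON-LD document found in the HTML text."""
    extractor = JsonLdExtractor()
    extractor.feed(html_text)
    extractor.close()
    return extractor.documents


if __name__ == "__main__":
    page = (
        '<html><head><script type="application/ld+json">'
        '{"@type": "Dataset", "name": "Example Sessions and Events"}'
        '</script></head></html>'
    )
    print(extract_jsonld(page))
```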

Note: Trailing slashes and @context

It is common practice to reference https://schema.org without a trailing / within @context. However, to be consistent with the OpenActive Modelling Opportunity Data specification, which uses the full URI of https://openactive.io/ (including a path, as per RFC 3986), this specification requires the schema.org context to be referenced with a trailing slash, i.e. https://schema.org/.
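
One way to check this requirement mechanically is sketched below. It rests on an interpretation that is an assumption rather than something the text states: a context URL is treated as missing its trailing slash when it has an empty path component (so `https://schema.org` is flagged, while `https://openactive.io/ns-beta` is not). The function name is hypothetical.

```python
from urllib.parse import urlsplit


def contexts_missing_trailing_slash(context):
    """Return the @context entries whose URL has no path at all.

    Assumption: only bare-authority URLs such as "https://schema.org"
    are considered to be missing the trailing slash.
    """
    entries = [context] if isinstance(context, str) else list(context)
    return [
        url
        for url in entries
        if isinstance(url, str) and urlsplit(url).path == ""
    ]


if __name__ == "__main__":
    print(contexts_missing_trailing_slash(
        ["https://schema.org", "https://openactive.io/", "https://openactive.io/ns-beta"]
    ))
```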

5.4.2.1 Describing Individual Feeds (`dcat:Distribution` objects)

Property	Status	Type	Notes
`@type`	REQUIRED	Text	`DataDownload`
`schema:name`	REQUIRED	Text	A human-readable name for the dataset.
`schema:additionalType`	RECOMMENDED	URL	A link to a definition of the type of the feed, e.g. of `ScheduledSessions` or `CourseInstances`
`schema:encodingFormat`	RECOMMENDED	Text or URL	The MIME-type of the data accessible via the `contentUrl`
`schema:contentUrl`	REQUIRED	URL	The URL of the feed containing the dataset.
`schema:totalItems`	RECOMMENDED	Integer	The total number of items (whether `updated` or `deleted`) that are available from the beginning of the feed. Note that this number will often be approximate only, given the rapidity with which updates may be made to backend datastores.
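
The REQUIRED rows of the table above can be checked with a few lines of code. The sketch below assumes the un-prefixed JSON keys used in this document's worked example (`name`, `contentUrl`); the function name `feed_problems` is hypothetical, and RECOMMENDED properties are deliberately not checked.

```python
REQUIRED_FEED_PROPERTIES = ("@type", "name", "contentUrl")


def feed_problems(feed):
    """Return human-readable problems for one distribution object."""
    problems = [
        f"missing {prop}" for prop in REQUIRED_FEED_PROPERTIES if prop not in feed
    ]
    # The table fixes the value of @type for feed descriptions.
    if "@type" in feed and feed["@type"] != "DataDownload":
        problems.append("@type should be DataDownload")
    return problems


if __name__ == "__main__":
    feed = {
        "@type": "DataDownload",
        "name": "SessionSeries",
        "contentUrl": "https://example.com/api/openactive/sessionseries",
    }
    print(feed_problems(feed))
    print(feed_problems({"@type": "Dataset", "name": "Event"}))
```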

5.4.2.2 Supporting Booking (`schema:WebAPI`)

In addition to the above markup for discoverability, dataset sites that support Open Booking API functionality MUST indicate this with markup enabling discovery and use of the relevant API endpoints.

Property	Status	Type	Notes
`@type`	REQUIRED	Text	`WebAPI`
`schema:name`	RECOMMENDED	Text	A human-readable name for the API.
`schema:description`	OPTIONAL	Text	A human-readable description of the API
`schema:documentation`	RECOMMENDED	URL or schema:CreativeWork	Human-readable API documentation. See Describing API Endpoints, below.
`schema:termsOfService`	REQUIRED	Text or URL	Human-readable terms of service documentation.
`schema:provider`	REQUIRED	schema:Organization	The Organization providing the API endpoint.
`schema:endpointUrl`	REQUIRED	URL	The root location or primary endpoint of the API.
`schema:conformsTo`	RECOMMENDED	URL	The URL reference of an established standard to which the described API conforms.
`schema:license`	REQUIRED	URL	A URL reference to the license under which the dataset site is published. For OpenActive dataset sites this should be `https://creativecommons.org/licenses/by/4.0/`.
`schema:endpointDescription`	RECOMMENDED	`schema:EntryPoint`	A machine-readable description of the API. See Describing API Endpoints, below
`schema:bookingService`	RECOMMENDED	`schema:SoftwareApplication`	The software system responsible for handling booking over the Open Booking API.
`oa:authenticationAuthority`		`schema:URL`	The location of the OpenID Provider or other relevant authentication authority that must be used to access the API, e.g. `https://auth.bookingsystem.com`

5.4.2.3 Describing API Endpoints (`schema:EntryPoint`)

Supporting documentation is crucial for the successful uptake and use of APIs. Ideally, both human-readable freetext and machine-readable structured data are made available.

The schema.org objects for human- and machine-readable documents are largely identical in terms of content and structure. However, the MIME-type associated with each will normally differ.

Property	Status	Type	Notes
`@type`	REQUIRED	Text	`EntryPoint`
`schema:url`	REQUIRED	URL	A URL pointing to supporting documentation for the API.
`schema:encodingFormat`	RECOMMENDED	Text	The MIME-type delivered by the url. For human-readable documentation (`schema:documentation`) this will normally be `text/html`; for machine-readable documentation (`schema:endpointDescription`), `application/json` or a more-specific subtype of this.
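
A consumer could use `encodingFormat` to decide whether an `EntryPoint` is meant for people or for software. The sketch below is one possible reading of the row above: it treats any media type ending in `+json` as a "more-specific subtype" of `application/json`, which is an assumption, and the function name is hypothetical.

```python
def classify_entry_point(entry_point):
    """Classify an EntryPoint by its encodingFormat value."""
    media_type = entry_point.get("encodingFormat", "")
    # Drop parameters such as "; version=1" before comparing.
    essence = media_type.split(";", 1)[0].strip().lower()
    if essence == "text/html":
        return "human-readable"
    if essence == "application/json" or essence.endswith("+json"):
        return "machine-readable"
    return "unknown"


if __name__ == "__main__":
    print(classify_entry_point({
        "@type": "EntryPoint",
        "url": "https://example.com/api/booking/documentation",
        "encodingFormat": "text/html",
    }))
```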

5.4.2.4 Describing Booking Services (`schema:SoftwareApplication`)

Property	Status	Type	Notes
`@type`	REQUIRED	Text	`SoftwareApplication`
`schema:name`	REQUIRED	Text	The name of the software application
`schema:url`	OPTIONAL	`schema:URL`	The URL of a human-readable web-page providing further information about the software.
`schema:featureList`	RECOMMENDED	`schema:URL`	A URL pointing to a machine-readable description of the Open Booking API features implemented by the system, e.g. as generated by the OpenActive Test Suite.
`schema:softwareVersion`	RECOMMENDED	Text	Version of the software instance.

Note

The schema:WebAPI specification has been assigned Pending status by the schema.org organisation, and is scheduled for release in schema version 10.0. Although schema:WebAPI is relatively stable, points of detail are still subject to review, and this specification may change at short notice.

5.4.2.4.1 Worked Example

The example below illustrates a Dataset Site pointing to feeds consisting of ScheduledSessions, SessionSeries, and Events. As the presence of the accessService property (of type WebAPI) indicates, data items from these feeds are bookable.

<script type="application/ld+json">
{
   "@context":[
      "https://schema.org/",
      "https://openactive.io/",
      "https://openactive.io/ns-beta"
   ],
   "@type":"Dataset",
   "@id":"https://data.example.com/",
   "name":"Example Sessions and Events",
   "description":"Near real-time availability and rich descriptions relating to sessions and events available from Example.com",
   "url":"https://data.example.com/",
   "dateModified":"2019-08-25T11:23:27+00:00",
   "keywords":[
      "Courses",
      "Sessions",
      "Events",
      "Activities",
      "Sports",
      "Physical Activity",
      "OpenActive"
   ],
   "schemaVersion":"https://www.openactive.io/modelling-opportunity-data/2.0/",
   "license":"https://creativecommons.org/licenses/by/4.0/",
   "publisher":{
      "@type":"Organization",
      "name":"Example.com",
      "description":"Example.com makes it easy to get active!",
      "url":"https://example.com/home",
      "legalName":"Example Ltd",
      "logo":{
         "@type":"ImageObject",
         "url":"https://cdn.example.com/assets/logo.png"
      },
      "email":"[email protected]"
   },
   "discussionUrl":"https://github.com/example/repo/issues",
   "datePublished":"2019-07-11T00:00:00+00:00",
   "inLanguage":[
      "en-GB"
   ],
   "distribution":[
      {
         "@type":"DataDownload",
         "name":"ScheduledSession",
         "additionalType":"https://openactive.io/ScheduledSession",
         "encodingFormat":"application/vnd.openactive.rpde+json; version=1",
         "contentUrl":"https://example.com/api/openactive/scheduledsessions",
         "totalItems": 1852
      },
      {
         "@type":"DataDownload",
         "name":"SessionSeries",
         "additionalType":"https://openactive.io/SessionSeries",
         "encodingFormat":"application/vnd.openactive.rpde+json; version=1",
         "contentUrl":"https://example.com/api/openactive/sessionseries",
         "totalItems": 361
      },
      {
         "@type":"DataDownload",
         "name":"Event",
         "additionalType":"https://schema.org/Event",
         "encodingFormat":"application/vnd.openactive.rpde+json; version=1",
         "contentUrl":"https://example.com/api/openactive/events",
         "totalItems": 1906
      }
   ],
   "backgroundImage":{
      "@type":"ImageObject",
      "url":"https://cdn.example.com/images/background.jpg"
   },
   "documentation":"https://developer.openactive.io/",
   "accessService":{
      "@type":"WebAPI",
      "name":"Open Booking API",
      "description":"The Open Booking API lets you book OpenActive Opportunities. The API uses standard schema.org types and is compliant with the JSON-LD specification.",
      "documentation":"https://openactive.io/open-booking-api/EditorsDraft",
      "termsOfService":"https://example.com/api/booking/documentation/terms-of-service",
      "provider": {
        "@type": "Organization",
        "name":"examplebooking.com",
        "description":"examplebooking.com makes it easy to get booking!",
        "url":"https://examplebooking.com/home",
        "email":"[email protected]"
      },
      "endpointUrl":"https://example.com/api/booking/",
      "conformsTo":[
         "https://www.openactive.io/open-booking-api/2.0/"
      ],
      "endpointDescription":"https://www.openactive.io/open-booking-api/2.0/swagger.json",
      "bookingService": {
        "@type": "SoftwareApplication",
        "name": "myExampleBookingPlatform",
        "softwareVersion": "1.2",
        "url": "https://www.example.com/myExampleBookingPlatform",
        "featureList": "https://www.example.com"
      }
   }
}
</script>
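
Once a document like the one above has been parsed into a dictionary, a consumer typically wants the feed URLs and, where present, the booking endpoint. The following sketch shows one way to pull these out; it assumes the key names used in the worked example (`distribution`, `accessService`, `endpointUrl`), and the function name `summarise_dataset` is hypothetical.

```python
def summarise_dataset(dataset):
    """Return (feeds, booking_endpoint) for a parsed Dataset document.

    feeds maps each feed name to its contentUrl; booking_endpoint is the
    WebAPI endpointUrl, or None when no WebAPI is advertised.
    """
    feeds = {
        feed["name"]: feed["contentUrl"]
        for feed in dataset.get("distribution", [])
    }
    service = dataset.get("accessService") or {}
    is_web_api = service.get("@type") == "WebAPI"
    booking_endpoint = service.get("endpointUrl") if is_web_api else None
    return feeds, booking_endpoint


if __name__ == "__main__":
    dataset = {
        "@type": "Dataset",
        "distribution": [
            {
                "@type": "DataDownload",
                "name": "Event",
                "contentUrl": "https://example.com/api/openactive/events",
            }
        ],
        "accessService": {
            "@type": "WebAPI",
            "endpointUrl": "https://example.com/api/booking/",
        },
    }
    print(summarise_dataset(dataset))
```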

5.4.3 Discoverability and Dataset Sites (using `schema:DataCatalog`)

Data Catalogs will normally be published as JSON-LD objects accessible via a URL.

Property	Status	Type	Notes
`@context`	REQUIRED	Array of URL values	Will normally consist only of the value `https://schema.org/`. Note that, in conformity with RFC3986, trailing slashes MUST be supplied where appropriate.
`@type`	REQUIRED	String	`DataCatalog`
`@id`	RECOMMENDED	URL	A unique identifier for the DataCatalog, often identical to the URL at which the DataCatalog is found.
`schema:datePublished`	RECOMMENDED	`schema:Date`	The date the `DataCatalog` was published.
`schema:publisher`	RECOMMENDED	`schema:Organization`	The `Organization` responsible for publishing the DataCatalog.
`schema:license`	REQUIRED	URL	A URL reference to the license under which the dataset site is published. For OpenActive dataset sites this should be `https://creativecommons.org/licenses/by/4.0/`.
`schema:dataset`	REQUIRED if `hasPart` is absent, OPTIONAL otherwise.	Array of URL	One or more URLs pointing to OpenActive Dataset Sites.
`schema:hasPart`	REQUIRED if `dataset` is absent, OPTIONAL otherwise.	Array of URL	One or more URLs pointing to other OpenActive DataCatalogs.
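
The conditional requirement on `dataset` and `hasPart` is easy to get wrong, so a small check may help. The sketch below covers only that rule together with `@type` and `license`; it assumes un-prefixed JSON keys as in this document's examples, and the function name `catalog_problems` is hypothetical.

```python
def catalog_problems(catalog):
    """Return problems found in a parsed DataCatalog object."""
    problems = []
    if catalog.get("@type") != "DataCatalog":
        problems.append("@type must be DataCatalog")
    if "license" not in catalog:
        problems.append("missing license")
    # dataset is REQUIRED if hasPart is absent, and vice versa,
    # so at least one of the two has to be present and non-empty.
    if not catalog.get("dataset") and not catalog.get("hasPart"):
        problems.append("at least one of dataset or hasPart is required")
    return problems


if __name__ == "__main__":
    print(catalog_problems({
        "@type": "DataCatalog",
        "license": "https://creativecommons.org/licenses/by/4.0/",
        "hasPart": ["https://opendata.example.live/api/datacatalog"],
    }))
```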

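Note: Trailing slashes and @context

As in the note on embedded JSON above, the schema.org context is referenced with a trailing slash, i.e. https://schema.org/.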

5.4.3.1 schema.org to DCAT mapping

The W3C DCAT 2.0 standard is widely used to publish Data Catalogs. In order to make the semantics of OpenActive Data Catalogs clear, and to assist developers and organisations more familiar with DCAT, a mapping from DCAT 2.0 to OpenActive schema.org-based Data Catalog elements is provided here.

DCAT 2.0 Element	schema.org target element
`dcat:issued`	`schema:datePublished`
`dcat:publisher`	`schema:publisher`
`dcat:license`	`schema:license`
`dcat:dataset`	`schema:dataset`
`dcat:hasPart`	`schema:hasPart`
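
For developers arriving from DCAT, the table above can be applied as a simple key translation. The sketch below copies the mapping exactly as listed and renames matching keys in a flat record; keys with no listed equivalent are passed through unchanged, which is a design choice of this illustration rather than a rule of the specification. The function name is hypothetical.

```python
# Mapping copied from the table above (DCAT 2.0 element -> schema.org element).
DCAT_TO_SCHEMA = {
    "dcat:issued": "schema:datePublished",
    "dcat:publisher": "schema:publisher",
    "dcat:license": "schema:license",
    "dcat:dataset": "schema:dataset",
    "dcat:hasPart": "schema:hasPart",
}


def dcat_to_schema(record):
    """Rename the top-level keys of a record according to the mapping."""
    return {DCAT_TO_SCHEMA.get(key, key): value for key, value in record.items()}


if __name__ == "__main__":
    print(dcat_to_schema({
        "dcat:issued": "2020-10-21",
        "dcat:dataset": ["https://api.example.org.uk/OpenActive/"],
    }))
```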

5.4.3.1.1 Worked example

The below is an example of a DataCatalog JSON object.

{
     "@context": "https://schema.org/",
     "@type": "DataCatalog",
     "@id": "https://opendata.example.live/api/datacatalog",
     "dataset": [
          "https://api.example.org.uk/OpenActive/",
          "https://booking.example.co.uk/OpenActive/",
          "https://active.example.net/OpenActive/",
          "https://camp.example.net/OpenActive/"
     ],
     "datePublished": "2020-10-21T12:28:09.7981681+00:00",
     "publisher": {
          "@type": "Organization",
          "name": "Example.com",
          "url": "https://www.example.com/systems"
     },
     "license": "https://creativecommons.org/licenses/by/4.0/"
}
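
Because a Data Catalog may point either to Dataset Sites (`dataset`) or to further Data Catalogs (`hasPart`), a spider has to walk the catalogs recursively. The sketch below shows one way to do this. The `fetch` argument is a hypothetical caller-supplied function that returns the parsed catalog for a URL, so no particular HTTP library is implied, and the `seen` set is an added safeguard against catalogs that reference each other.

```python
def collect_dataset_sites(catalog_url, fetch, seen=None):
    """Return the Dataset Site URLs reachable from a Data Catalog.

    fetch(url) must return the catalog at that URL as a dictionary.
    """
    seen = set() if seen is None else seen
    if catalog_url in seen:
        return []  # already visited; avoids infinite loops on cycles
    seen.add(catalog_url)
    catalog = fetch(catalog_url)
    sites = list(catalog.get("dataset", []))
    for child_url in catalog.get("hasPart", []):
        sites.extend(collect_dataset_sites(child_url, fetch, seen))
    return sites


if __name__ == "__main__":
    # In-memory stand-in for the web, using illustrative URLs.
    catalogs = {
        "https://opendata.example.live/root": {
            "@type": "DataCatalog",
            "hasPart": ["https://opendata.example.live/api/datacatalog"],
        },
        "https://opendata.example.live/api/datacatalog": {
            "@type": "DataCatalog",
            "dataset": ["https://api.example.org.uk/OpenActive/"],
        },
    }
    print(collect_dataset_sites("https://opendata.example.live/root", catalogs.__getitem__))
```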

5.4.4 Describing Publishers (`schema:Organization`)

Property	Status	Type	Notes
`@type`	REQUIRED	Text	`Organization`
`schema:name`	RECOMMENDED	Text	The name of the `Organization` publishing the datasets.
`schema:logo`	OPTIONAL	URL	A link to the publishing `Organization`'s logo.
`schema:url`	RECOMMENDED	URL	A link to the publishing `Organization`'s website.

5.4.5 Removing data feeds

5.4.5.1 Data Publishers

In the event that a feed is to be removed permanently, publishers MUST:

Remove the link to the feed from their dataset site
Ensure that the feed URL returns a 404 ('Not Found') status code. This response should be returned for a period of at least seven (7) days from the date of initial removal, in order to ensure that regularly-consuming applications receive an explicit indication of removal within a reasonable timeframe.

In the event that all data feeds are to be removed permanently and the publisher is ceasing to publish OpenActive feeds entirely, the dataset site as a whole should be removed and its URL should return a 404.

5.4.5.2 Data Consumers

In the event that a consuming application receives a 404 response from a previously-harvested feed URL, all records associated with that feed MUST be purged from its datastore. This is to ensure data privacy and compliance with related legislation, such as the General Data Protection Regulation (GDPR).
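
The consumer-side rule can be sketched as follows. The datastore here is a plain dictionary keyed by feed URL, which is purely an assumption for illustration; a real consumer would delete rows from whatever store it uses. The function name `handle_feed_status` is hypothetical.

```python
def handle_feed_status(feed_url, status_code, datastore):
    """Purge all records for a feed when its URL returns 404.

    datastore maps feed URL -> list of harvested records.
    Returns the number of records removed.
    """
    if status_code != 404:
        return 0
    return len(datastore.pop(feed_url, []))


if __name__ == "__main__":
    store = {
        "https://example.com/api/openactive/events": [{"id": 1}, {"id": 2}],
    }
    removed = handle_feed_status("https://example.com/api/openactive/events", 404, store)
    print(removed, store)
```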

Dataset API Discovery 0.3

Draft Community Group Report 01 December 2020

Abstract

Status of This Document

1. Introduction

1.1 Scope and requirements

1.1.1 Functionality that is out of scope

1.2 Audience

2. Conformance

3. Typographical Conventions

4. Key Actors

5. Definitions

5.1 Dataset Sites

5.2 Data Catalogs

5.3 Purpose

5.3.1 Dataset Sites

5.3.2 Data Catalogs

5.4 Dataset Sites

5.4.1 HTML Content

5.4.1.1 Human-readable content

5.4.1.2 HTML `meta` tags

5.4.1.3 OpenGraph `<meta>` tags

5.4.2 Embedded JSON

5.4.2.1 Describing Individual Feeds (`dcat:Distribution` objects)

5.4.2.2 Supporting Booking (`schema:WebAPI`)

5.4.2.3 Describing API Endpoints (`schema:EntryPoint`)

5.4.2.4 Describing Booking Services (`schema:SoftwareApplication`)

5.4.2.4.1 Worked Example

5.4.3 Discoverability and Dataset Sites (using `schema:DataCatalog`)

5.4.3.1 schema.org to DCAT mapping

5.4.3.1.1 Worked example

5.4.4 Describing Publishers (`schema:Organization`)

5.4.5 Removing data feeds

5.4.5.1 Data Publishers

5.4.5.2 Data Consumers

6. Future versions of this API

A. Acknowledgements

B. References

B.1 Normative references
