Collecting data is an essential skill for data curators. This tutorial introduces the concept of an API and offers examples and practical guidelines for accessing data and metadata from public sources through APIs. Specifically, it first teaches you how to work with the SODA API, which is provided by Socrata, a provider of open government data portals. By the end of the tutorial, you will know how to retrieve open government datasets and their metadata through the SODA API, the Discovery API, and the Metadata API.
API stands for Application Programming Interface. An API defines how different applications or different software components talk to each other, providing a way for developers to simplify programming. Just as a graphical user interface makes it easier for people to use programs, APIs make it easier for developers to use functions or data from another application (or another component of an application) without needing to know the implementation details [1][2].
In this tutorial, we will focus on web APIs. Applications such as Facebook, Twitter, and Google Maps provide access to some of their services and data through web APIs, which receive requests and send responses. For example, if you're an iPhone user and you use Yelp to search for a restaurant, you will see a map from Apple Maps embedded in Yelp showing the restaurant's location. Yelp doesn't provide map services itself; instead, it uses Apple's API to integrate maps into its app. Another example of API usage is the ability to use Google or Facebook to log into other platforms. When you sign up for an account on a website not run by Google or Facebook, you may be asked to sign in with your Facebook or Google account. When you do so, the website uses a Facebook or Google API to retrieve your account information.
The APIs used in this tutorial adhere to an architectural style called REST, which stands for Representational State Transfer. As blogger Sarah Maddox explains, "REST services tend to offer an easy-to-parse URL structure consisting of nouns that reflect the logical hierarchical categories of the data on offer" [3]. An example of a REST API web URL is: http://api.us.socrata.com/api/catalog/v1/domains. When using a REST API, you need to structure your query according to the API's rules, which are usually described in the web application's API documentation.
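To make the idea of noun-based URL paths concrete, here is a minimal Python sketch (standard library only) that assembles the example URL from its path segments and fetches it as JSON. The helper names `build_url` and `get_json` are hypothetical choices for illustration, not part of any Socrata library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://api.us.socrata.com"  # host taken from the example URL above


def build_url(base, *segments, **params):
    """Join noun-like path segments onto a base URL and add optional query parameters."""
    url = base.rstrip("/") + "/" + "/".join(s.strip("/") for s in segments)
    if params:
        url += "?" + urlencode(params)
    return url


def get_json(url, timeout=30):
    """Send an HTTP GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


domains_url = build_url(BASE, "api", "catalog", "v1", "domains")
print(domains_url)  # http://api.us.socrata.com/api/catalog/v1/domains
# data = get_json(domains_url)  # uncomment to make the live request
```

Each path segment narrows the hierarchy (api → catalog → version → domains), which is exactly the "nouns" structure Maddox describes.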
Why should we learn about APIs? Software developers use APIs to build well-structured applications, while data scientists and data curators use them to collect large amounts of data automatically. Learning to access data and metadata through an API is therefore the main objective of this tutorial.
We will walk through two examples of retrieving datasets from the City of Seattle's municipal open data portal using an API:
The underlying software, Socrata, provides APIs for accessing, filtering, and downloading data and metadata. To learn more about Socrata's API, visit their API starter guide.
Step 1: Take some time to explore the Seattle Open Data Portal and choose a dataset that interests you, for example, Road Weather Information Stations. The API documentation is in the upper right corner of the dataset's landing page, as shown in the image below.
Click "API" and copy the "API Endpoint" in the default JSON format. Then open a new browser tab or window and paste the link into the address bar. Next, repeat these steps with a different data format, such as CSV. Finally, click "API" again, but this time click "API Docs" instead of copying an endpoint.
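The JSON and CSV endpoints differ only in their file extension. The sketch below assumes the usual SODA endpoint pattern `https://{domain}/resource/{dataset_id}.{format}`; the dataset ID `pu5n-trf4` is the one used in the exercises at the end of this tutorial, so substitute the ID shown in the "API Endpoint" box of the dataset you chose. `soda_endpoint` and `parse_rows` are hypothetical helper names.

```python
import csv
import io
import json


def soda_endpoint(domain, dataset_id, fmt="json"):
    """Build a SODA resource endpoint; only the extension changes between formats."""
    if fmt not in ("json", "csv"):
        raise ValueError("this sketch only handles json and csv")
    return f"https://{domain}/resource/{dataset_id}.{fmt}"


def parse_rows(body, fmt):
    """Turn a response body (already decoded to str) into a list of dicts."""
    if fmt == "json":
        return json.loads(body)  # expected: a JSON array of row objects
    return list(csv.DictReader(io.StringIO(body)))  # first CSV line holds the column names


print(soda_endpoint("data.seattle.gov", "pu5n-trf4"))
print(soda_endpoint("data.seattle.gov", "pu5n-trf4", "csv"))
```

Parsing both formats into a list of dictionaries makes it easy to compare what the two endpoints return for the same dataset.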
Step 2: On the "API Docs" page, scroll down to "Fields." Read through this section, then try different fields and their filters by clicking the "+" next to a field. For example, when you expand "stationname" (see image below), you can click "try it" to run a sample filter and see what type of response (data) you will get.
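The "try it" filters correspond to query parameters appended to the endpoint. Here is a minimal sketch, assuming SODA's simple equality filter (`?fieldname=value`) plus the `$limit` parameter. The dataset ID `xxxx-xxxx` and the station name `ExampleStation` are hypothetical placeholders; replace them with the Road Weather Information Stations ID and a real station name from the "API Docs" page.

```python
from urllib.parse import urlencode


def filtered_url(endpoint, limit=None, **filters):
    """Append field=value equality filters (and an optional $limit) to a SODA endpoint."""
    params = dict(filters)
    if limit is not None:
        params["$limit"] = limit
    # safe="$" keeps SoQL parameter names like $limit readable instead of %24limit
    return endpoint + ("?" + urlencode(params, safe="$") if params else "")


# Hypothetical dataset ID and station name, for illustration only.
endpoint = "https://data.seattle.gov/resource/xxxx-xxxx.json"
print(filtered_url(endpoint, limit=5, stationname="ExampleStation"))
```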
Step 3: Now, go to https://dev.socrata.com/docs/endpoints.html. Review the API endpoint concept, and try the examples on the page.
Step 1: Go to Socrata's Discovery API page, read the introduction, and scroll through the rest of the content. Try a few examples by clicking on the text boxes that begin with "Search by . . ." You can also view different code examples by choosing from the drop-down menu under "Request," which is set to "Raw" by default. Take a look at a few code examples, especially if you are familiar with any of the provided languages.
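As one more code example alongside those in the documentation, here is a Python sketch of a keyword search using the Discovery API's `q` (full-text query) and `limit` parameters. The parsing step assumes each entry in `results` carries `resource.id`, `resource.name`, and `metadata.domain`; check this against a live response before relying on it. The sample response below is made up for illustration.

```python
from urllib.parse import urlencode

CATALOG = "http://api.us.socrata.com/api/catalog/v1"


def search_url(query, limit=10):
    """Full-text search across Socrata-hosted catalogs via the q parameter."""
    return CATALOG + "?" + urlencode({"q": query, "limit": limit})


def summarize(response):
    """Pull (domain, id, name) triples out of a decoded Discovery API response (assumed shape)."""
    return [
        (r["metadata"]["domain"], r["resource"]["id"], r["resource"]["name"])
        for r in response.get("results", [])
    ]


# A trimmed, made-up response with the assumed shape, for illustration only.
sample = {
    "results": [
        {"resource": {"id": "abcd-1234", "name": "Example Dataset"},
         "metadata": {"domain": "data.example.gov"}},
    ],
    "resultSetSize": 1,
}
print(search_url("crime"))
print(summarize(sample))
```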
Step 2: After trying some of the examples in the documentation, try customizing your own API call in a browser window. Here are some examples:
In the image above (and in the link you followed for the API call above), "thing" is the domain of a portal hosted by Socrata, and "count" is the number of resources that portal hosts. You can then search within a particular domain.
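Once the `/domains` response is decoded, the "thing" and "count" pairs can be ranked in a few lines. This sketch assumes the pairs sit in a top-level `results` list; the portal names and counts in `sample` are invented, not real Socrata figures.

```python
def top_domains(response, n=5):
    """Return the n portals with the most resources from a decoded /domains response."""
    pairs = [(r["thing"], r["count"]) for r in response.get("results", [])]
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:n]


# Made-up portals and counts, for illustration only.
sample = {"results": [
    {"thing": "portal-a.example.gov", "count": 300},
    {"thing": "portal-b.example.gov", "count": 1200},
    {"thing": "portal-c.example.org", "count": 45},
]}
print(top_domains(sample, n=2))
```

The highest-ranked domains are good candidates for the domain-restricted searches that follow.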
Here is an example: http://api.us.socrata.com/api/catalog/v1?domains=data.seattle.gov. Results for this API call are displayed below.
Try it yourself with a different domain, such as data.nasa.gov.
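Switching domains only changes the value of the `domains` query parameter, so a short loop can generate the calls for several portals. `domain_search_url` is a hypothetical helper; any extra keyword arguments (such as `limit`) are passed through as additional query parameters.

```python
from urllib.parse import urlencode

CATALOG = "http://api.us.socrata.com/api/catalog/v1"


def domain_search_url(domain, **extra):
    """Restrict a Discovery API search to one portal with the domains parameter."""
    return CATALOG + "?" + urlencode({"domains": domain, **extra})


for d in ("data.seattle.gov", "data.nasa.gov"):
    print(domain_search_url(d))
```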
Try the search again with a different category. You can find a list of all possible categories with this API call: http://api.us.socrata.com/api/catalog/v1/categories.
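Category names often contain spaces, which must be encoded in the URL. Python's `urlencode` encodes spaces as `+` by default; passing `quote_via=quote` produces `%20` instead, matching the `categories=public%20safety` example in the exercises below. `category_search_url` is a hypothetical helper, and combining `categories` with `domains` is shown as an assumption about how the two filters combine.

```python
from urllib.parse import quote, urlencode

CATALOG = "http://api.us.socrata.com/api/catalog/v1"
CATEGORIES_URL = CATALOG + "/categories"  # lists the available categories


def category_search_url(category, domain=None):
    """Filter the catalog by category (optionally within one domain); spaces become %20."""
    params = {"categories": category}
    if domain:
        params["domains"] = domain
    return CATALOG + "?" + urlencode(params, quote_via=quote)


print(category_search_url("public safety"))
print(category_search_url("public safety", "data.seattle.gov"))
```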
Step 1: Go to Socrata’s Metadata API page. Read the introduction and try a few examples. For example, you could choose to retrieve all metadata on a domain (see image below).
Step 2: Customize your API calls by adding different parameters to retrieve the metadata that interests you. A good starting point is to modify the existing examples provided by Socrata.
The API call http://evergreen.data.socrata.com/api/views/metadata/v1?limit=10&page=1 returns up to 10 metadata records (the first page of results) from https://evergreen.data.socrata.com.
Try these queries with a different domain, such as one you discovered in Example 2a, Step 1. For example, we can see the metadata from the Seattle domain with the following API call: http://data.seattle.gov/api/views/metadata/v1/. Notice how we replaced evergreen.data.socrata.com in the URL with data.seattle.gov (see image below).
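Because only the host changes between domains, and `limit` and `page` control paging, the calls above can be generated and walked page by page. This is a sketch under a stated assumption: each page is taken to be a JSON array of metadata records, so an empty array signals the end. Verify that with a live response first. `metadata_url`, `fetch_json`, and `iter_metadata` are hypothetical names, and `max_pages` caps the number of requests.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def metadata_url(domain, limit=10, page=1):
    """Build a Metadata API URL for one page of asset-level metadata on a domain."""
    return f"http://{domain}/api/views/metadata/v1?" + urlencode({"limit": limit, "page": page})


def fetch_json(url):
    """Perform a GET request and decode the JSON body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_metadata(domain, limit=10, max_pages=5, fetch=fetch_json):
    """Yield records page by page, stopping at an empty page or after max_pages.

    Assumes each page is a JSON array of records.
    """
    for page in range(1, max_pages + 1):
        records = fetch(metadata_url(domain, limit, page))
        if not records:
            break
        yield from records


print(metadata_url("evergreen.data.socrata.com"))
print(metadata_url("data.seattle.gov"))
# for record in iter_metadata("data.seattle.gov"): ...  # live requests
```

Passing `fetch` as a parameter lets you swap in a stub for offline testing or add caching without changing the paging logic.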
1. REST API: https://en.wikipedia.org/wiki/Representational_state_transfer
2. Other useful APIs:
3. Data science project using the SODA API: https://github.com/ViDA-NYU/urban-data-study
1. Explore the SODA API and Discovery API documentation.
2. Use the SODA API to access datasets of interest and filter data records according to criteria you select. Submit your API calls.
e.g. Get all "Animal Complaints" records with an event clearance date after 12:00 on April 4, 2018: https://data.seattle.gov/resource/pu5n-trf4.json?$where=event_clearance_date >'2018-04-04T12:00:00.000'&initial_type_group=ANIMAL COMPLAINTS
3. Use the SODA Discovery API or Metadata API to get catalog information or asset-level metadata about open government data portals. Use filters and parameters to refine your queries. Submit your API calls.
e.g. http://api.us.socrata.com/api/catalog/v1?categories=public%20safety
e.g. http://data.seattle.gov/api/views/metadata/v1?limit=10&page=1
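The Animal Complaints example in exercise 2 contains spaces, quotes, and a `>` that a browser encodes for you when you paste the URL. In a script you would encode them yourself. This sketch rebuilds that call with a SoQL `$where` clause plus a simple equality filter; the column names come from the exercise and may change if the dataset is revised.

```python
from urllib.parse import quote, urlencode

ENDPOINT = "https://data.seattle.gov/resource/pu5n-trf4.json"

params = {
    "$where": "event_clearance_date > '2018-04-04T12:00:00.000'",  # SoQL comparison on a timestamp
    "initial_type_group": "ANIMAL COMPLAINTS",  # simple equality filter
}
# safe="$':" keeps $, quotes, and colons readable; spaces become %20 and ">" becomes %3E.
url = ENDPOINT + "?" + urlencode(params, safe="$':", quote_via=quote)
print(url)
```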
This material is part of the Open Data Literacy Project, funded by an IMLS grant.
An Yan: Information School, University of Washington
Bree Norlander: Information School, University of Washington
Carole Palmer: Information School, University of Washington
Kaitlin Throgmorton: Information School, University of Washington
1. MuleSoft. (n.d.). What is an API? (Application Programming Interface). Retrieved March 7, 2019, from https://www.mulesoft.com/cn/resources/api/what-is-an-api
2. Wikipedia. (2019, March 5). Application programming interface. Retrieved March 7, 2019, from https://en.wikipedia.org/wiki/Application_programming_interface#Web_APIs
3. Maddox, S. (2014, February 16). API Types. Retrieved March 7, 2019, from https://ffeathers.wordpress.com/2014/02/16/api-types/