Introduction to the RAQSAPI package

RAQSAPI is a package for R that connects the R programming language environment to the United States Environmental Protection Agency’s (US EPA) Air Quality System (AQS) Data Mart database API for retrieval of ambient air pollution data.

Warning: US EPA’s AQS Data Mart API V2 is currently in the beta phase of development, and the API interface has not been finalized. This means that certain functionality of the API may change or be removed without notice. As a result, this package is also currently marked as beta and may change to reflect any changes made to the Data Mart API or improvements to the design, functionality, quality and documentation of this package. The authors assume no liability for any problems that may occur as a result of using this package, the Data Mart service, or any software, service, hardware, or user accounts that may utilize this package.

This software/application was developed by the U.S. Environmental Protection Agency (USEPA). No warranty expressed or implied is made regarding the accuracy or utility of the system, nor shall the act of distribution constitute any such warranty. The USEPA has relinquished control of the information and no longer has responsibility to protect the integrity, confidentiality or availability of the information. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by the USEPA. The USEPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by the USEPA or the United States Government.

The RAQSAPI package allows an R programming environment to connect to and retrieve data directly from the United States Environmental Protection Agency’s (US EPA) Air Quality System (AQS) Data Mart API v2 interface. The package spares the data user a number of legacy challenges, including coercing data from a JSON object to a usable R object, retrieving multiple years of data, formatting API requests, retrieving results, handling credentials, requesting data for multiple pollutants and rate limiting data requests. All of the basic functionality available from the AQS Data Mart API server has been implemented. The library connects to the AQS Data Mart API via Hypertext Transfer Protocol (HTTP), so there is no need to install external ODBC drivers, configure ODBC connections or deal with the security vulnerabilities associated with them.

Most functions have a parameter, return_header, which is set to FALSE by default. If return_header is set to TRUE, the function returns an AQS_DATAMART_APIv2 S3 object, which is a two-item named list. The first item ($Header) is a tibble containing the header information: the status of the request (success/fail), any error messages returned from the API, the URL used in the request, a date and time stamp noting when the request was received, and other useful information. The second item ($Data) is a tibble containing the actual data requested. With return_header set to FALSE (the default), a simple tibble is returned with just the $Data portion of the request.

After each call to the API, a five second stall is invoked to help prevent overloading the Data Mart API server and to serve as a simple rate limit.
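The shape of the object returned when return_header is TRUE can be illustrated with a small stand-in. The sketch below does not call the API: it builds a mock list by hand, uses base R data frames in place of tibbles, and its column names and values are invented for illustration only.

```r
# Mock of the two-item named list described above; no API call is made.
# Column names and values are illustrative assumptions, not real API output.
mock_result <- structure(
  list(
    Header = data.frame(
      status = "Success",
      url    = "<request URL>",
      rows   = 2L
    ),
    Data = data.frame(
      parameter_code = c("44201", "44201"),
      sample_value   = c(0.031, 0.042)
    )
  ),
  class = "AQS_DATAMART_APIv2"
)

# Inspect the header first, then work with the data portion.
if (mock_result$Header$status == "Success") {
  ozone <- mock_result$Data
  print(nrow(ozone))
}
```

With a real result, the same $Header and $Data accessors would apply; checking the header before using the data is one way to catch a failed request early.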

EPA’s AQS Data Mart API, the service that RAQSAPI retrieves data from, does not host real-time (collected now/today) data. If real-time data is needed, please use the AirNow API and direct all questions about real-time data there. RAQSAPI does not work with AirNow and cannot retrieve real-time data. For more details, see section 7.1 of the About AQS Data page.

RAQSAPI can be installed either as the stable version from CRAN or as the latest development version from GitHub. To install the development version, first install the remotes package and its dependencies if they are not already installed, then use remotes to install RAQSAPI from GitHub in an R session.
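The two installation routes might look like the following. This is a sketch rather than the package's official instructions: it assumes the development version lives in the USEPA/RAQSAPI repository on GitHub, and only one of the two routes is needed.

```r
# Stable release from CRAN.
install.packages("RAQSAPI")

# Development version from GitHub.
# Assumption: the repository is USEPA/RAQSAPI.
install.packages("remotes")
remotes::install_github("USEPA/RAQSAPI")
```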

Using the RAQSAPI library

Load RAQSAPI

After successfully installing the RAQSAPI package, load the RAQSAPI library:

library(RAQSAPI)

Signing up and setting up user credentials with the RAQSAPI library

If you have not already done so, you will need to sign up with AQS Data Mart using the aqs_sign_up function. This function takes one input, “email”, an R character object that represents the email address you want to use as a user credential for the AQS Data Mart service. After a successful call to aqs_sign_up, an email message containing a new Data Mart key will be sent to the address provided; this key is the credential used to access the Data Mart API. The aqs_sign_up function can also regenerate a key for an existing user: simply call aqs_sign_up with the parameter “email” set to the address of an existing account, and a new key will be emailed to that address.

The credentials used to access the Data Mart API service are stored in an R environment variable that must be set every time the RAQSAPI library is attached or the key is changed. Without valid credentials, the Data Mart server will reject any request sent to it. The Data Mart key is not a password, and the RAQSAPI library does not treat it as one: the key is stored in plain text, and no attempt is made to encrypt Data Mart credentials as would be done for a username and password combination. The key is intended only for account monitoring, not for authentication. Each time RAQSAPI is loaded, and before using any of its functions, use the aqs_credentials function to enter the user credentials so that RAQSAPI can access the AQS Data Mart server.

Note: The credentials used to access the AQS Data Mart API are not the same as the credentials used to access AQS. AQS users who do not have access to the AQS Data Mart will need to create new credentials.

(suggested) Use the keyring package to manage credentials

It is highly recommended that users store and retrieve their credentials with a credential manager while using RAQSAPI. One such credential manager is provided by the keyring package, which uses the credential store available on most popular operating systems to store and manage user credentials. This helps avoid hard coding credential information into R scripts.

To use the keyring package with RAQSAPI first install keyring:

install.packages("keyring")

Ensure that your system is supported by the keyring package before proceeding.

keyring::has_keyring_support()

Then set the key used to access AQS Data Mart (make sure to replace the text in the angled brackets with your specific user information):

library("keyring")
keyring::key_set(service = "AQSDatamart", username = "<user email account>")

A popup window will appear for the user to enter their key. Enter the AQS Data Mart credential key associated with the AQS user name provided, then press Enter. The AQS Data Mart user credential is now stored with keyring.

To retrieve the key for use with RAQSAPI, load the keyring package and use the key_get function to return the user credential to RAQSAPI. First define the user name and service:

library(RAQSAPI)
library(keyring)
datamartAPI_user <- "<user email account>"
server <- "AQSDatamart"

Then pass these variables to the aqs_credentials function when using RAQSAPI:

aqs_credentials(username = datamartAPI_user, key = key_get(service = server, username = datamartAPI_user))

To change the key stored with the keyring package, repeat the steps above, calling keyring::key_set again with the new credential information.

To retrieve a list of all keys managed with the keyring package, use the function:

keyring::key_list()

Refer to the keyring package documentation for an in-depth explanation of using the keyring package.

Usage tips and precautions

This section contains suggestions for completing certain data related tasks.

  • Determine if or how much data exists for a time-parameter-geography combination:
    • Retrieve data using the annualdata service.
    • If no records are returned, we do not have the data.
    • If records are returned, use the observation count to determine the temporal and geographic distribution of the data.
  • Monthly averages:
    • AQS does not routinely calculate monthly aggregate statistics.
    • If you need these, you must calculate them yourself.
    • These can be calculated from the sample data or the daily data without loss of fidelity.
  • Determine a single value for a site with collocated monitors:
    • Many sites will have collocated monitors – monitors collecting the same parameter at the same time.
    • The API currently provides only monitor-level values (site-level values will be added in the future).
    • For some criteria pollutants (PM2.5, ozone, lead, and NO2), the regulations define procedures for determining a single site-level value.
    • For other pollutants, determining a single site-level value is left to the investigator.
  • Please adhere to the following when using the AQS Data Mart API:
    • Limit the size of queries. The AQS Data Mart contains billions of values and you may request more than you intend. If you are unsure of the amount of data, start small and work your way up. Please limit queries to 1,000,000 rows of data each. You can use the “observation count” field on the annualdata service to determine how much data exists for a time-parameter-geography combination.
    • Limit the frequency of queries. The AQS Data Mart can process a limited load. Please wait for one request to complete before submitting another and do not make more than 10 requests per minute.
    • Be advised that RAQSAPI is capable of retrieving results for multiple pollutants. This can multiply the amount of data returned by the number of pollutants requested.
    • Be advised that the AQS Data Mart API limits certain data requests to one year of data at a time, with the exception of the Monitor service. To retrieve multiple years of data for these functions, the RAQSAPI library sends multiple API requests to the Data Mart API server, one request for each year. This can multiply the amount of data returned by the number of years requested.
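The effect of the one-request-per-year behavior described above can be estimated before sending anything. The sketch below is not RAQSAPI's internal logic: split_by_year() is a hypothetical helper written for illustration, and the dates and parameter codes are example values.

```r
# Split a date range into one (bdate, edate) pair per calendar year.
# split_by_year() is a hypothetical helper, not part of RAQSAPI.
split_by_year <- function(bdate, edate) {
  stopifnot(inherits(bdate, "Date"), inherits(edate, "Date"), bdate <= edate)
  years  <- seq(as.integer(format(bdate, "%Y")), as.integer(format(edate, "%Y")))
  starts <- as.Date(paste0(years, "-01-01"))
  ends   <- as.Date(paste0(years, "-12-31"))
  # Clip the first and last year to the requested range.
  starts[1] <- bdate
  ends[length(ends)] <- edate
  data.frame(bdate = starts, edate = ends)
}

chunks     <- split_by_year(as.Date("2015-06-01"), as.Date("2017-03-31"))
pollutants <- c("44201", "42602")  # example parameter codes (ozone, NO2)

n_requests  <- nrow(chunks)                       # one request per year
min_seconds <- n_requests * 5                     # five second stall per call
data_factor <- nrow(chunks) * length(pollutants)  # rough data multiplier
print(chunks)
```

Here a range touching three calendar years implies three requests and at least fifteen seconds of stall time, and the returned data may be roughly six times that of a single-year, single-pollutant query.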

The AQS Data Mart administrators may disable accounts without notice for failure to adhere to these terms, though they will contact the offending user via the email address provided.
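For the monthly averages tip above, a calculation from daily data can be sketched in base R. The values below are made up, and the column names date_local and arithmetic_mean are assumptions about the layout of the daily data; adjust them to match the tibble actually returned.

```r
# Made-up daily values for a single monitor; real daily data would come
# from one of the RAQSAPI daily summary functions.
daily <- data.frame(
  date_local      = as.Date(c("2020-01-01", "2020-01-02",
                              "2020-02-01", "2020-02-02")),
  arithmetic_mean = c(10, 14, 8, 6)
)

# Label each record with its year-month, then average within each month.
daily$month <- format(daily$date_local, "%Y-%m")
monthly <- aggregate(arithmetic_mean ~ month, data = daily, FUN = mean)
print(monthly)
```

This is a simple unweighted mean of daily values. With more than one monitor or parameter in the data, the grouping formula would need the corresponding identifier columns as well.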