Tidygeocoder 1.0.4

[This article was first published on Jesse Cambon-R, and kindly contributed to R-bloggers].

Tidygeocoder v1.0.4 is released! This release adds support for the Geoapify geocoding service (thanks, Daniel Possenriede!), a progress bar, more helpful console output, and new functions for combining the results of multiple geocoding queries. A more detailed overview of the changes in this release is available in the changelog.

Progress Bars and Console Output

Progress bars are now displayed for single-input geocoding queries (i.e., not batch queries). Console messages also now show, by default, which geocoding service was used, how many addresses or coordinates were sent to it, and how long the query took to execute.

The progress_bar parameter toggles the progress bar, and the quiet parameter silences the console messages that are displayed by default. See the documentation for geo() or reverse_geo() for details.

The quiet, progress_bar, and verbose parameters can also now be set globally (for the rest of the R session) via options(). For example, options(tidygeocoder.progress_bar = FALSE) disables progress bars for all queries.
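A minimal sketch of setting these options for a session. Only tidygeocoder.progress_bar is named above; tidygeocoder.quiet and tidygeocoder.verbose are assumed here to follow the same naming pattern, so confirm them against the geo() documentation before relying on them. Because options() returns the previous values of whatever it changes, those settings can be restored afterwards.

```r
# Turn off progress bars for every query in this R session
# (option name as given in the release notes)
options(tidygeocoder.progress_bar = FALSE)

# Assumed option names for the quiet and verbose parameters, following the
# same "tidygeocoder.<parameter>" pattern; options() returns the old values
old_opts <- options(tidygeocoder.quiet = TRUE, tidygeocoder.verbose = FALSE)

# ... geo() / geocode() calls made here would use these defaults ...

# Put the quiet/verbose settings back the way they were
options(old_opts)
```

Settings made with options() only last for the current session; putting the same call in an .Rprofile file applies them every time R starts.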

Combining Multiple Queries

In past releases of the package, method = "cascade" could be used in the geo() and geocode() functions to combine the results of geocoding queries from two different services. The “cascade” method is now deprecated in favor of two new, more flexible functions: geocode_combine() and geo_combine(). These functions can execute and combine the results of more than two queries, and each query can be fully customized.

To demonstrate these new functions, I’ve assembled the dataset of addresses below. The first five are street-level addresses in the United States that can be geocoded with the US Census geocoding service. However, three of these addresses will not return results with the US Census batch service (see issue #87 for more information) and must instead be geocoded with the US Census single address geocoder. The last three addresses are cities outside the United States and require a different geocoding service entirely, since the US Census service is limited to the United States.

library(tidyverse)
library(tidygeocoder)

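# Five US street addresses (geocodable with the US Census service),
# followed by three non-US cities with only the city column filled in
mixed_addresses <- tribble(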
  ~street_address, ~city, ~state_cd, ~zip_cd,
  "624 W DAVIS ST #1D",   "BURLINGTON", "NC",  27215,
  "201 E CENTER ST #268", "MEBANE",     "NC",  27302,
  "7833  WOLFE LN",       "SNOW CAMP",  "NC",  27349,
  "202 C St",             "San Diego",  "CA",  92101,
  "121 N Rouse Ave",      "Bozeman",    "MT",  59715
) %>%
  bind_rows(
    tibble(city = c('Taipei', 'Moscow', 'Buenos Aires'))
  )

If we wanted to geocode a large dataset of addresses such as these, we might first try to geocode as many as possible with the US Census batch service, then attempt the remaining addresses with the US Census single address geocoder, and finally send any still-unfound addresses to another service. The code below implements this workflow.

The geocode_combine() function accepts a dataframe input and a list of queries, each provided as a list (i.e., a list of lists). Each list in the queries argument contains parameters that are passed to the geocode() function. Optionally, the query_names argument can be used to give each query's results a label.
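Since queries is an ordinary R list, it can also be assembled ahead of time. The sketch below builds the same three queries used in this post, sharing the address-component arguments between the two US Census queries with c(); the object names census_args, queries, and query_names are just illustrative.

```r
# Address-component arguments shared by both US Census queries
census_args <- list(street = 'street_address', city = 'city',
                    state = 'state_cd', postalcode = 'zip_cd')

# One list per query; each element is passed on to geocode()
queries <- list(
  c(list(method = 'census', mode = 'batch'), census_args),
  c(list(method = 'census', mode = 'single'), census_args),
  list(method = 'arcgis', address = 'city')  # single-line address only
)

# One label per query, in the same order as the queries
query_names <- c('census - batch', 'census - single', 'arcgis')
stopifnot(length(queries) == length(query_names))
```

These objects could then be passed as geocode_combine(mixed_addresses, queries = queries, query_names = query_names). Building the list this way keeps repeated arguments in one place when several queries share the same columns.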

Below, the street, city, state, and postalcode arguments are specified for the first two queries, while the address argument (i.e., a single-line address) is pointed at the city column for the third query. The ArcGIS service only accepts a single-line address argument and doesn’t use address component arguments like city and state.

results <- mixed_addresses %>%
  geocode_combine(
    queries = list(
      # 1. US Census batch geocoder, using address components
      list(method = 'census', mode = 'batch',
           street = 'street_address', city = 'city', state = 'state_cd', postalcode = 'zip_cd'),
      # 2. US Census single address geocoder for addresses the batch query missed
      list(method = 'census', mode = 'single',
           street = 'street_address', city = 'city', state = 'state_cd', postalcode = 'zip_cd'),
      # 3. ArcGIS for anything still unfound (single-line address only)
      list(method = 'arcgis', address = 'city')
    ),
    query_names = c('census - batch', 'census - single', 'arcgis')
  )

## Passing 8 addresses to the US Census batch geocoder
## Query completed in: 1.9 seconds

## Passing 6 addresses to the US Census single address geocoder
## Query completed in: 3.6 seconds

## Passing 3 addresses to the ArcGIS single address geocoder
## Query completed in: 1.4 seconds
street_address        city          state_cd  zip_cd  lat        long        query
624 W DAVIS ST #1D    BURLINGTON    NC        27215   36.09598   -79.44453   census - single
201 E CENTER ST #268  MEBANE        NC        27302   36.09683   -79.26977   census - single
7833  WOLFE LN        SNOW CAMP     NC        27349   35.89866   -79.43713   census - single
202 C St              San Diego     CA        92101   32.71676   -117.16283  census - batch
121 N Rouse Ave       Bozeman       MT        59715   45.68066   -111.03203  census - batch
NA                    Taipei        NA        NA      25.03737   121.56355   arcgis
NA                    Moscow        NA        NA      55.75696   37.61502    arcgis
NA                    Buenos Aires  NA        NA      -34.60849  -58.37344   arcgis

By default, the results of the queries are combined into a single dataframe as shown above, and the query column shows which query produced each result. Alternatively, the results of each query can be returned as separate dataframes in a list by using return_list = TRUE.
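The query column also makes it easy to break the combined output apart after the fact. As a rough base R sketch, the example below splits a few rows copied from the results table into one data frame per query label. This is similar in spirit to return_list = TRUE but is not guaranteed to match that option's exact output; results_demo and by_query are illustrative names.

```r
# A few rows copied from the combined results table
results_demo <- data.frame(
  city  = c('San Diego', 'Bozeman', 'Taipei'),
  lat   = c(32.71676, 45.68066, 25.03737),
  long  = c(-117.16283, -111.03203, 121.56355),
  query = c('census - batch', 'census - batch', 'arcgis')
)

# One data frame per query label, with the list named by that label
by_query <- split(results_demo, results_demo$query)

# Number of results each query contributed
vapply(by_query, nrow, integer(1))
```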

By default, only addresses that are not found in a query are passed to the subsequent query. However, setting cascade = FALSE will pass all addresses to all queries. See the documentation for the geocode_combine() function for more usage details.
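To make the difference between the two settings concrete, here is a toy base R model of the cascading logic. It is a hypothetical illustration, not tidygeocoder's implementation: run_queries() and the mock services (character vectors listing which placeholder addresses each one "finds") are made up for this sketch, and it only tracks which addresses each query receives.

```r
# Mock services: each one "finds" only the addresses listed for it.
# These are placeholders, not real geocoders.
services <- list(
  first  = c('A', 'B'),
  second = c('B', 'C'),
  third  = c('D')
)

# Hypothetical model of running queries in order. With cascade = TRUE only
# addresses that are still unfound go to the next query; with
# cascade = FALSE every query receives all of the addresses.
run_queries <- function(addresses, services, cascade = TRUE) {
  remaining <- addresses
  sent <- list()
  for (svc in names(services)) {
    input <- if (cascade) remaining else addresses
    sent[[svc]] <- input
    found <- intersect(input, services[[svc]])
    remaining <- setdiff(remaining, found)
  }
  list(sent = sent, unfound = remaining)
}

addresses <- c('A', 'B', 'C', 'D', 'E')

cascaded <- run_queries(addresses, services)
cascaded$sent$second   # "C" "D" "E": A and B were already found
cascaded$sent$third    # "D" "E"

all_sent <- run_queries(addresses, services, cascade = FALSE)
all_sent$sent$third    # all five addresses
```

In both modes address E ends up unfound, but with cascade = FALSE address B is sent to (and found by) two services, which costs extra requests and is mainly useful when you want to compare services on the same inputs.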

Package Housekeeping

The return_type, geocodio_v, mapbox_permanent, mapquest_open, iq_region, and here_request_id parameters are now deprecated in favor of the new api_options parameter. For instance, instead of return_type = "geographies" you should now use api_options = list(census_return_type = "geographies").
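For code that builds geo() arguments programmatically, the migration can be scripted. The helper below, migrate_to_api_options(), is hypothetical and not part of tidygeocoder. Only the return_type to census_return_type mapping comes from the release notes; the other five parameters are assumed to keep their names as api_options keys, so check the geo() documentation before using it.

```r
# Hypothetical helper (not part of tidygeocoder): rewrites a named list of
# geo() arguments, moving deprecated parameters into api_options.
migrate_to_api_options <- function(args) {
  # Deprecated parameter -> api_options key. Only the return_type mapping is
  # stated in the release notes; the others are assumed to keep their names.
  key_map <- c(return_type      = 'census_return_type',
               geocodio_v       = 'geocodio_v',
               mapbox_permanent = 'mapbox_permanent',
               mapquest_open    = 'mapquest_open',
               iq_region        = 'iq_region',
               here_request_id  = 'here_request_id')

  # Start from any api_options already present
  api_options <- args[['api_options']]
  if (is.null(api_options)) api_options <- list()

  for (arg in intersect(names(args), names(key_map))) {
    api_options[[key_map[[arg]]]] <- args[[arg]]
    args[[arg]] <- NULL  # drop the deprecated argument
  }

  if (length(api_options) > 0) args[['api_options']] <- api_options
  args
}

old_args <- list(address = '121 N Rouse Ave, Bozeman, MT', method = 'census',
                 return_type = 'geographies')
new_args <- migrate_to_api_options(old_args)
str(new_args)
```

The rewritten list could then be used with do.call(geo, new_args), which would send a request to the US Census service.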

Also, the cascade_order, param_error, and batch_limit_error parameters in geo() are now deprecated, as they were only needed for the deprecated “cascade” method. Refer to the documentation for geo() or reverse_geo() for details.

