inegiR v2

Posted on March 28, 2018 by En El Margen - R-English in R bloggers | 0 Comments

[This article was first published on En El Margen - R-English, and kindly contributed to R-bloggers].

After a lot of slacking off, I finally got around to finishing the upgraded version of the inegiR package on CRAN. This version bundles quite a few changes, which I explain in this post.

New language

The biggest change upfront is the migration to English in both function names and documentation. The rationale is to make the package more accessible to developers around the world (I have received a few emails asking for translations). Also, the non-ASCII characters were not helpful. For the Mexican users, I assume that if you know R, you can probably find your way around an English document.

To avoid breaking existing workflows, I left the legacy functions intact, except that they now warn you to use the English version instead. An example of this is the pair of commercial growth rate functions:

# English
rate_commerce()
# Spanish (old version)
tasa_comercio()
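
A legacy wrapper of this kind can be sketched with base R's .Deprecated(). This is only an illustration of the pattern, not the package's actual source: the body of rate_commerce() below is a stand-in, and the real functions take arguments that are not shown here.

```r
# Stand-in for the new English function (the real one lives in inegiR
# and takes arguments not shown here).
rate_commerce <- function(...) {
  "result from rate_commerce()"
}

# Hypothetical legacy wrapper: warn first, then delegate to the English name.
tasa_comercio <- function(...) {
  .Deprecated(new = "rate_commerce", old = "tasa_comercio")
  rate_commerce(...)
}

# The old call still returns a result; the warning is silenced here
# only to keep the example output clean.
result <- suppressWarnings(tasa_comercio())
```

Because the wrapper only forwards its arguments, old scripts keep running while users are nudged toward the new name.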

Route API

With some help from Arturo Cárdenas and a revamp of the Sakbé API at INEGI, I was able to add functions to access route information.

The two main ones are:

# to search for a destiny id
inegi_destiny()
# to get route information
inegi_route()

The first thing to understand is that INEGI has categorized sites in Mexico according to a "destiny id". For example, the International Airport in Mexico City is destiny id #57. The inegi_destiny() function helps you find a destiny id from a text query, sort of like googling a place and getting an address. Here is an example with a plaza in Monterrey:

# download on CRAN or newest dev version (if not accepted yet)
# install.packages("inegiR")
# or... 
# devtools::install_github("eflores89/inegiR")
library(inegiR)
library(knitr)
# to search for Macroplaza destiny id
token <- "mytoken"
destiny1 <- inegi_destiny("Macroplaza", token = token)
kable(destiny1)

ID	ID_DEST	STATE	NAME	GEO_STRING	TYPE	LAT	LONG
destino	6940	N.L.	Macroplaza, Monterrey	{"type":"Point","coordinates":[-100.309991587,25.668862054]}	Point	-100.3100	25.66886
destino	20237	B.C.	Macroplaza del Valle, Mexicali	{"type":"Point","coordinates":[-115.50790804,32.62128025]}	Point	-115.5079	32.62128
destino	17891	Coah.	Macroplaza, Acuña	{"type":"Point","coordinates":[-100.978421457,29.3299882860001]}	Point	-100.9784	29.32999
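
Since the search can return several matches, you will usually want to narrow the result down to a single id before asking for a route. A minimal sketch, using a hand-built copy of the table above (only three of its columns) so that it runs without a token:

```r
# Hand-built copy of the search result shown above (subset of columns),
# so the snippet runs without a token or network access.
destiny1 <- data.frame(
  ID_DEST = c(6940, 20237, 17891),
  STATE   = c("N.L.", "B.C.", "Coah."),
  NAME    = c("Macroplaza, Monterrey",
              "Macroplaza del Valle, Mexicali",
              "Macroplaza, Acuña"),
  stringsAsFactors = FALSE
)

# Keep only the match in Nuevo León and pull out its destiny id.
monterrey_id <- destiny1$ID_DEST[destiny1$STATE == "N.L."]
monterrey_id
```

With the real result of inegi_destiny() the same subsetting should apply, as long as the column names match the ones printed above.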

Once you know two destiny ids, you can use the API to learn about potential routes between them. The inegi_route() function returns a list with two objects: a data.frame of route information (kilometers, toll cost, etc.) and another data.frame with all the coordinates along the route. If you connect the points in order, you trace the route you would take.

To illustrate, I'm going to use the first result (id 6940) and see what the route would be from there to the U.S. border (id 7426) in a normal car, using a tolled highway. The documentation explains the names and options of the parameters in more detail.

route <- inegi_route(from = 6940, to = 7426, token = token, pref = 1, vehicle = 1)
str(route)
# List of 2
#  $ ROUTE          :'data.frame':	1 obs. of  6 variables:
#   ..$ KMS       : num 222
#   ..$ TIME_MINS : num 151
#   ..$ TIME_HRS  : num 2.52
#   ..$ HAS_TOLL  : logi TRUE
#   ..$ TOLL_COST : num 364
#   ..$ TOTAL_COST: logi NA
#  $ COORDINATE_PATH:'data.frame':	1176 obs. of  2 variables:
#   ..$ V1: num [1:1176] -100 -100 -100 -100 -100 ...
#   ..$ V2: num [1:1176] 25.7 25.7 25.7 25.7 25.7 ...

As you can see, the returned object is a list of two data.frame objects. The first gives basic statistics about the route.

kable(route$ROUTE)

KMS	TIME_MINS	TIME_HRS	HAS_TOLL	TOLL_COST	TOTAL_COST
222.36	151.11	2.5185	TRUE	364	NA

The total cost is NA because the default value for the calc_cost parameter is FALSE. When this is set to TRUE, the function also looks up the price of gasoline in the Sakbé API and estimates the cost of the trip. Be warned: this is very experimental and only a rule of thumb (see the documentation for a further explanation). Once the gasoline cost is estimated, any tolls are added and a total cost is supplied. To do this, just change the parameter.

route2 <- inegi_route(from = 6940, to = 7426, token = token, pref = 1, vehicle = 1, 
                      calc_cost = TRUE)
kable(route2$ROUTE)

KMS	TIME_MINS	TIME_HRS	HAS_TOLL	TOLL_COST	TOTAL_COST
222.36	151.11	2.5185	TRUE	364	757.1729

All prices are reported in Mexican pesos.
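
To see how much of the total comes from fuel, you can back it out of the table above. The sketch below assumes that TOTAL_COST is simply the estimated fuel cost plus TOLL_COST, as described; the variable names are mine, and the numbers are copied from the printed output.

```r
# Values copied from the route2$ROUTE table above (pesos and kilometers).
kms        <- 222.36
toll_cost  <- 364
total_cost <- 757.1729

# Assumption: TOTAL_COST = estimated fuel cost + TOLL_COST.
fuel_cost   <- total_cost - toll_cost  # 393.1729 pesos
fuel_per_km <- fuel_cost / kms         # roughly 1.77 pesos per km
```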

The second element in the list is the data.frame containing all point references in the route. As I said before, just connect the dots. Here is a preview:

kable(head(route$COORDINATE_PATH))

LONGITUD	LATITUD	INDEX
-100.3125	25.66238	1
-100.3125	25.66231	2
-100.3124	25.66225	3
-100.3124	25.66222	4
-100.3124	25.66220	5
-100.3124	25.66215	6
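
As a rough sketch of what "connecting the dots" looks like in code, the snippet below rebuilds the six preview rows by hand, measures the length of that stretch with the haversine formula, and draws it as a line. It assumes the columns are named LONGITUD and LATITUD as in the preview (the str() output earlier shows V1 and V2, so check the names in your version); haversine_km() is a helper written for this example and is not part of inegiR.

```r
# First six points of the path, copied from the preview above.
path <- data.frame(
  LONGITUD = c(-100.3125, -100.3125, -100.3124, -100.3124, -100.3124, -100.3124),
  LATITUD  = c(25.66238, 25.66231, 25.66225, 25.66222, 25.66220, 25.66215)
)

# Great-circle distance in km between two lon/lat points (haversine formula).
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Length of each segment between consecutive points, then the total.
n <- nrow(path)
segments_km <- haversine_km(path$LONGITUD[-n], path$LATITUD[-n],
                            path$LONGITUD[-1], path$LATITUD[-1])
total_km <- sum(segments_km)

# Draw the path as a connected line (only when run interactively).
if (interactive()) {
  plot(path$LONGITUD, path$LATITUD, type = "l",
       xlab = "Longitude", ylab = "Latitude")
}
```

Run over the full COORDINATE_PATH, the summed segments should land in the neighborhood of the KMS value, although small differences are to be expected since this is a straight-line approximation between points.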

For this particular route, I plotted the points on Google Maps to show this better:

[Figure: the route plotted on Google Maps]

New GDP catalog

Another huge issue that users reported was the difficulty of finding relevant indicator ids on the INEGI website. As experienced users know, every economic data series has a unique id in the API. However, there is no catalog that lets you look up the ids you need. I have petitioned INEGI multiple times but got nowhere.

My personal workaround was to look up the series in the BIE application (a web browser version of the API) and download the data as an .iqy file. From there, I would hack my way through the file to find the unique ids being called. This was very time-intensive and error-prone.

So, to help each other out in this endeavour, I created a catalog of ids. This version has all the sub-levels of GDP (down to the 4th level of disaggregation), but I plan to update the catalog on a rolling basis. Any help would be appreciated.

You can see the catalog by calling the dataset like this:

data("inegi_catalog")
kable(head(inegi_catalog[,1:7]))
# for more rows, see docs!

NAME	LEVEL_2	LEVEL_3	LEVEL_4	UNITS	BASE	FREQUENCY
PIB	TOTAL	TOTAL	TOTAL	MILLIONS OF 2008 PESOS	2008	TRIMESTRAL
PIB - IMPUESTOS A PRODUCTOS NETOS	IMPUESTOS A PRODUCTOS NETOS	TOTAL	TOTAL	MILLIONS OF 2008 PESOS	2008	TRIMESTRAL
PIB - VALOR AGREGADO BRUTO	VALOR AGREGADO BRUTO	TOTAL	TOTAL	MILLIONS OF 2008 PESOS	2008	TRIMESTRAL
PIB - ACTIVIDADES PRIMARIAS	ACTIVIDADES PRIMARIAS	TOTAL	TOTAL	MILLIONS OF 2008 PESOS	2008	TRIMESTRAL
PIB - ACTIVIDADES PRIMARIAS - AGRICULTURA	ACTIVIDADES PRIMARIAS	AGRICULTURA	TOTAL	MILLIONS OF 2008 PESOS	2008	TRIMESTRAL
PIB - ACTIVIDADES SECUNDARIAS	ACTIVIDADES SECUNDARIAS	TOTAL	TOTAL	MILLIONS OF 2008 PESOS	2008	TRIMESTRAL
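
Because the catalog is an ordinary data.frame, you can narrow it down with the usual tools. Here is a small sketch on a hand-built copy of the rows above (three columns only), so that it runs without the package; with the real dataset you would filter inegi_catalog in the same way.

```r
# Hand-built copy of the six catalog rows shown above (subset of columns).
catalog_preview <- data.frame(
  NAME = c("PIB",
           "PIB - IMPUESTOS A PRODUCTOS NETOS",
           "PIB - VALOR AGREGADO BRUTO",
           "PIB - ACTIVIDADES PRIMARIAS",
           "PIB - ACTIVIDADES PRIMARIAS - AGRICULTURA",
           "PIB - ACTIVIDADES SECUNDARIAS"),
  LEVEL_2 = c("TOTAL",
              "IMPUESTOS A PRODUCTOS NETOS",
              "VALOR AGREGADO BRUTO",
              "ACTIVIDADES PRIMARIAS",
              "ACTIVIDADES PRIMARIAS",
              "ACTIVIDADES SECUNDARIAS"),
  LEVEL_3 = c("TOTAL", "TOTAL", "TOTAL", "TOTAL", "AGRICULTURA", "TOTAL"),
  stringsAsFactors = FALSE
)

# All series that belong to primary activities, at any level of detail.
primary <- subset(catalog_preview, LEVEL_2 == "ACTIVIDADES PRIMARIAS")
primary$NAME
```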

Compact metadata and series helper

Two other common headaches came up with past versions. First, the inegi_series() function only accepted the full URL, even though most of the time the only thing that changed between calls was the id number. So I added a simple function that pastes together the entire URL string for the call to the API.

GDP_ID <- 381016
inegi_code(GDP_ID)
# "http://www3.inegi.org.mx/sistemas/api/indicadores/v1//Indicador/381016/00000/es/false/xml/"
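
Conceptually, this is just string pasting. The helper below is a hypothetical re-creation that reproduces the URL printed above, including its double slash; it is not the package's source, and the fixed path segments are simply copied from that output.

```r
# Hypothetical stand-in for inegi_code(): paste an indicator id into the
# URL pattern copied from the output above.
build_indicator_url <- function(id) {
  base <- "http://www3.inegi.org.mx/sistemas/api/indicadores/v1/"
  paste0(base, "/Indicador/", id, "/00000/es/false/xml/")
}

url <- build_indicator_url(381016)
url
```

Since paste0() is vectorized, passing a vector of ids to this sketch returns one URL per id.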

The second headache had to do with downloading multiple ids. The list returned by inegi_series() with the metadata parameter set to TRUE is a bit clunky to use in a loop or an apply function. So I added a compact function that returns all the information in a tidy data.frame:

token_inegi <- "mytoken"
df <- compact_inegi_series(inegi_code(381016), token_inegi)
kable(head(df))

Values	Dates	Name	Update	Region	Units	Indicator	Frequency
7945204	1993-01-01	Producto interno bruto, a precios de mercado	2017/08/22	Nacional	Millones de pesos a precios de 2008	381016	Trimestral
7939362	1993-04-01	Producto interno bruto, a precios de mercado	2017/08/22	Nacional	Millones de pesos a precios de 2008	381016	Trimestral
7954943	1993-07-01	Producto interno bruto, a precios de mercado	2017/08/22	Nacional	Millones de pesos a precios de 2008	381016	Trimestral
8268036	1993-10-01	Producto interno bruto, a precios de mercado	2017/08/22	Nacional	Millones de pesos a precios de 2008	381016	Trimestral
8210538	1994-01-01	Producto interno bruto, a precios de mercado	2017/08/22	Nacional	Millones de pesos a precios de 2008	381016	Trimestral
8413362	1994-04-01	Producto interno bruto, a precios de mercado	2017/08/22	Nacional	Millones de pesos a precios de 2008	381016	Trimestral
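
Since every call returns a data.frame with the same columns, downloading several ids comes down to lapply() plus rbind(). The sketch below uses a stub, fetch_series(), in place of compact_inegi_series(inegi_code(id), token_inegi) so that it runs offline; the second id is a made-up placeholder and the values are left as NA on purpose.

```r
# Stub standing in for compact_inegi_series(inegi_code(id), token_inegi).
# It returns a small data.frame with a few of the same column names.
fetch_series <- function(id) {
  data.frame(
    Values    = c(NA_real_, NA_real_),
    Dates     = as.Date(c("1993-01-01", "1993-04-01")),
    Indicator = id
  )
}

# 381016 is the GDP series used above; 999999 is a placeholder id.
ids <- c(381016, 999999)

# One data.frame per id, stacked into a single long table.
all_series <- do.call(rbind, lapply(ids, fetch_series))

# Split back by indicator when you need each series separately.
by_indicator <- split(all_series, all_series$Indicator)
```

To use it for real, swap the body of fetch_series() for the actual call and pass your own token.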

I hope this update is useful to everyone doing data science with Mexican stats. Any new suggestions or questions are welcome via Twitter or a GitHub issue.
