inegiR v2
After a lot of slacking around, I finally finished the upgraded version of the inegiR package on CRAN. This version combines quite a few changes, which I explain in this post.
New language
The biggest change upfront is the migration to English in both function names and documentation. The rationale behind this is to make the package more accessible to developers around the world (I have received a few emails asking for translations). Also, the non-ASCII characters were not helpful. For the Mexican users, I assume that if you know R, you can probably find your way around an English document.
To avoid breaking existing workflows, I left the legacy functions intact except for a warning to use the English version instead. An example of this is the commercial growth rate functions, which are:
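The legacy-wrapper pattern described above can be sketched with base R's `.Deprecated()`. The names `rate_example()` and `tasa_ejemplo()` are hypothetical stand-ins, not the package's actual functions, and the package may implement its warning differently.

```r
# Hypothetical English-named function: period-over-period growth rate in percent.
rate_example <- function(x) {
  c(NA, diff(x) / head(x, -1)) * 100
}

# Hypothetical legacy (Spanish-named) wrapper: it still works,
# but warns the caller to switch to the English version.
tasa_ejemplo <- function(x) {
  .Deprecated("rate_example")
  rate_example(x)
}

sales <- c(100, 110, 121)

# The legacy call returns the same values as the new function;
# the warning is silenced here only to keep the example output clean.
growth <- suppressWarnings(tasa_ejemplo(sales))
```

Calling `tasa_ejemplo(sales)` without `suppressWarnings()` returns the same result and emits a deprecation warning that points to `rate_example()`.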
Route API
With some help from Arturo Cárdenas and a revamp of INEGI's Sakbé API, I was able to add functions to access route information.
The two main ones are:
The first thing to understand is that INEGI has categorized sites in Mexico according to a “destiny id”. For example, the International Airport in Mexico City is destiny id #57. The inegi_destiny() function helps you find a destiny id from a text search, sort of like googling the place and getting an address. Here is an example with a plaza in Monterrey:
ID | ID_DEST | STATE | NAME | GEO_STRING | TYPE | LAT | LONG |
---|---|---|---|---|---|---|---|
destino | 6940 | N.L. | Macroplaza, Monterrey | {"type":"Point","coordinates":[-100.309991587,25.668862054]} | Point | -100.3100 | 25.66886 |
destino | 20237 | B.C. | Macroplaza del Valle, Mexicali | {"type":"Point","coordinates":[-115.50790804,32.62128025]} | Point | -115.5079 | 32.62128 |
destino | 17891 | Coah. | Macroplaza, Acuña | {"type":"Point","coordinates":[-100.978421457,29.3299882860001]} | Point | -100.9784 | 29.32999 |
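GeoJSON lists a point's coordinates as longitude first, then latitude, and the LAT and LONG columns in the output above appear to carry the values in that same order (Monterrey sits near 25.67° N, 100.31° W). If you need the coordinates, reading them from GEO_STRING is a cautious option. Here is a minimal base R sketch using the first GEO_STRING value from the table; the helper name `parse_point()` is hypothetical, and the string is assumed to use straight quotes as in valid JSON.

```r
# Hypothetical helper: pull longitude and latitude out of a GeoJSON Point string.
parse_point <- function(geo_string) {
  # Keep only the text between the square brackets.
  inside <- sub(".*\\[(.*)\\].*", "\\1", geo_string)
  coords <- as.numeric(strsplit(inside, ",")[[1]])
  # GeoJSON order is longitude first, then latitude.
  c(lon = coords[1], lat = coords[2])
}

geo <- '{"type":"Point","coordinates":[-100.309991587,25.668862054]}'
point <- parse_point(geo)
```

For anything beyond a quick check, a proper JSON parser is a safer choice than a regular expression.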
Once you know two destiny ids, you can use the API to learn about potential routes between them. This function returns a list with two objects: a data.frame of route information (kilometers, toll cost, etc.) and another data.frame with all the coordinates along the route. Intuitively, if you join all the dots, you can clearly see the route you would take.
To illustrate, I'm going to use the first result and see what the route would be from there to the U.S. border (which is the other id) with a normal car and with a tolled highway. A further look at the documentation will explain the names and options in the parameters.
As you can see, the returning element is a list of two data.frame objects. The first will give us basic statistics about the route.
KMS | TIME_MINS | TIME_HRS | HAS_TOLL | TOLL_COST | TOTAL_COST |
---|---|---|---|---|---|
222.36 | 151.11 | 2.5185 | TRUE | 364 | NA |
The total cost is NA because the default value for the calc_cost parameter is FALSE. When this is set to TRUE, the function will additionally look for the price of gasoline in the Sakbé API and calculate a cost of the trip. Be warned, this is very experimental and it is just a rule of thumb (you can see the documentation for a further explanation). Once the price of gasoline is calculated, any tolls are added and then a total cost is supplied. To do this, just change the parameter.
KMS | TIME_MINS | TIME_HRS | HAS_TOLL | TOLL_COST | TOTAL_COST |
---|---|---|---|---|---|
222.36 | 151.11 | 2.5185 | TRUE | 364 | 757.1729 |
All prices are reported in Mexican pesos.
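The arithmetic behind a total like this can be sketched as fuel cost plus tolls. The function `trip_cost()`, the fuel price, and the fuel efficiency below are illustrative assumptions only; the package's actual rule of thumb is described in its documentation and may differ.

```r
# Hypothetical helper: rough trip cost in pesos = fuel cost + tolls.
trip_cost <- function(kms, toll_cost, price_per_litre, km_per_litre) {
  litres_needed <- kms / km_per_litre
  fuel_cost <- litres_needed * price_per_litre
  fuel_cost + toll_cost
}

# Distance and toll come from the route table above;
# price_per_litre and km_per_litre are made-up values for illustration.
total <- trip_cost(kms = 222.36, toll_cost = 364,
                   price_per_litre = 16, km_per_litre = 12)
```

Because the price and efficiency here are placeholders, `total` will not match the TOTAL_COST reported by the package.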
The second element in the list is the data.frame containing all point references in the route. As I said before, just connect the dots. Here is a preview:
LONGITUD | LATITUD | INDEX |
---|---|---|
-100.3125 | 25.66238 | 1 |
-100.3125 | 25.66231 | 2 |
-100.3124 | 25.66225 | 3 |
-100.3124 | 25.66222 | 4 |
-100.3124 | 25.66220 | 5 |
-100.3124 | 25.66215 | 6 |
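Connecting the dots can be sketched in base R. The data.frame below simply re-types the six preview rows above (the name `route_points` is hypothetical); with the full set of points, the same call would trace the entire route.

```r
# The six preview points from the table above.
route_points <- data.frame(
  LONGITUD = c(-100.3125, -100.3125, -100.3124, -100.3124, -100.3124, -100.3124),
  LATITUD  = c(25.66238, 25.66231, 25.66225, 25.66222, 25.66220, 25.66215),
  INDEX    = 1:6
)

# Make sure the points are in travel order before drawing the path.
route_points <- route_points[order(route_points$INDEX), ]

# type = "b" draws both the points and the lines that join them.
plot(route_points$LONGITUD, route_points$LATITUD, type = "b",
     xlab = "Longitude", ylab = "Latitude", main = "Route preview")
```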
For this particular route, I added the dots to Google Maps to show this better:
New GDP catalog
Another huge issue that users reported was trying to find relevant indicator ids on the INEGI website. As experienced users know, every economic data series has a unique id on the API. However, there is no catalog that allows you to find the ids you are looking for. I have petitioned INEGI multiple times but got nowhere.
My personal solution was to look up the series in the BIE application (a web browser version of the API) and download the data as a .iqy file. From there, I would hack my way into the file to find the unique ids being called. This was very time-intensive and error-prone.
So, to help each other out in this endeavour, I created a catalog of ids. This version has all the sub-levels of GDP (down to the fourth level of disaggregation), but I plan to update this catalog on a rolling basis. Any help would also be appreciated.
You can see the catalog by calling the dataset like this:
NAME | LEVEL_2 | LEVEL_3 | LEVEL_4 | UNITS | BASE | FREQUENCY |
---|---|---|---|---|---|---|
PIB | TOTAL | TOTAL | TOTAL | MILLIONS OF 2008 PESOS | 2008 | TRIMESTRAL |
PIB - IMPUESTOS A PRODUCTOS NETOS | IMPUESTOS A PRODUCTOS NETOS | TOTAL | TOTAL | MILLIONS OF 2008 PESOS | 2008 | TRIMESTRAL |
PIB - VALOR AGREGADO BRUTO | VALOR AGREGADO BRUTO | TOTAL | TOTAL | MILLIONS OF 2008 PESOS | 2008 | TRIMESTRAL |
PIB - ACTIVIDADES PRIMARIAS | ACTIVIDADES PRIMARIAS | TOTAL | TOTAL | MILLIONS OF 2008 PESOS | 2008 | TRIMESTRAL |
PIB - ACTIVIDADES PRIMARIAS - AGRICULTURA | ACTIVIDADES PRIMARIAS | AGRICULTURA | TOTAL | MILLIONS OF 2008 PESOS | 2008 | TRIMESTRAL |
PIB - ACTIVIDADES SECUNDARIAS | ACTIVIDADES SECUNDARIAS | TOTAL | TOTAL | MILLIONS OF 2008 PESOS | 2008 | TRIMESTRAL |
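Once the catalog is loaded as a data.frame, narrowing it down is ordinary filtering. As a sketch, the snippet below re-types a few of the preview rows into a stand-in object called `catalog_preview` (a hypothetical name, not the package's dataset) and keeps only the primary-activities rows.

```r
# Stand-in for the catalog, built from rows of the preview above.
catalog_preview <- data.frame(
  NAME = c("PIB",
           "PIB - ACTIVIDADES PRIMARIAS",
           "PIB - ACTIVIDADES PRIMARIAS - AGRICULTURA",
           "PIB - ACTIVIDADES SECUNDARIAS"),
  LEVEL_2 = c("TOTAL",
              "ACTIVIDADES PRIMARIAS",
              "ACTIVIDADES PRIMARIAS",
              "ACTIVIDADES SECUNDARIAS"),
  LEVEL_3 = c("TOTAL", "TOTAL", "AGRICULTURA", "TOTAL"),
  stringsAsFactors = FALSE
)

# Keep every series that belongs to primary activities.
primary <- subset(catalog_preview, LEVEL_2 == "ACTIVIDADES PRIMARIAS")

# Or search series names by keyword, ignoring case.
agri <- catalog_preview[grepl("agricultura", catalog_preview$NAME,
                              ignore.case = TRUE), ]
```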
Compact metadata and series helper
Two other common headaches came up with past versions. First, the inegi_series() function only accepted the full URL when, most of the time, the only thing that changed between calls was the id number. So I added a simple function to paste the entire URL string for the call to the API.
The second headache had to do with downloading multiple ids. The list returned when using inegi_series() with the metadata parameter set to TRUE is a bit clunky when used in a loop or apply function. So I added a compact function that returns all the information in a tidy data.frame:
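The idea of building the URL from just the id can be sketched with `paste0()`. Both the helper name `build_series_url()` and the base URL are placeholders for illustration; they are not the package's function or INEGI's real endpoint.

```r
# Hypothetical helper: turn one or more series ids into full request URLs.
# The base URL is a placeholder, not the actual INEGI endpoint.
build_series_url <- function(id, base_url = "https://example.org/indicator/") {
  paste0(base_url, id)
}

# paste0() is vectorised, so several ids can be converted at once.
# "381016" appears in this post; "000000" is a made-up id.
urls <- build_series_url(c("381016", "000000"))
```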
Values | Dates | Name | Update | Region | Units | Indicator | Frequency |
---|---|---|---|---|---|---|---|
7945204 | 1993-01-01 | Producto interno bruto, a precios de mercado | 2017/08/22 | Nacional | Millones de pesos a precios de 2008 | 381016 | Trimestral |
7939362 | 1993-04-01 | Producto interno bruto, a precios de mercado | 2017/08/22 | Nacional | Millones de pesos a precios de 2008 | 381016 | Trimestral |
7954943 | 1993-07-01 | Producto interno bruto, a precios de mercado | 2017/08/22 | Nacional | Millones de pesos a precios de 2008 | 381016 | Trimestral |
8268036 | 1993-10-01 | Producto interno bruto, a precios de mercado | 2017/08/22 | Nacional | Millones de pesos a precios de 2008 | 381016 | Trimestral |
8210538 | 1994-01-01 | Producto interno bruto, a precios de mercado | 2017/08/22 | Nacional | Millones de pesos a precios de 2008 | 381016 | Trimestral |
8413362 | 1994-04-01 | Producto interno bruto, a precios de mercado | 2017/08/22 | Nacional | Millones de pesos a precios de 2008 | 381016 | Trimestral |
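A tidy data.frame per series is what makes looping convenient: each result can be stacked with `rbind()`. In the sketch below, `fetch_series_stub()` is a hypothetical offline stand-in for a real download, and its values are placeholders rather than INEGI data.

```r
# Hypothetical stand-in for a download: returns a small tidy data.frame
# with placeholder values and the id stored in the Indicator column.
fetch_series_stub <- function(id) {
  data.frame(
    Values    = c(100, 101),
    Dates     = as.Date(c("1993-01-01", "1993-04-01")),
    Indicator = id,
    stringsAsFactors = FALSE
  )
}

# "381016" appears in the table above; "000000" is a made-up id.
ids <- c("381016", "000000")

# Fetch each id, then stack the results into one long data.frame.
all_series <- do.call(rbind, lapply(ids, fetch_series_stub))
```

Because every row carries its own Indicator value, the stacked result can be split or grouped by series later on.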
I hope this update is useful to everyone doing data science with Mexican stats. Any new suggestions or questions are welcome via Twitter or a GitHub issue.