New R package: GetCVMData
Package GetCVMData is an alternative to GetDFPData. Both have the same objective, fetching corporate data of Brazilian companies traded on B3, but they differ in their source. While GetDFPData imports data directly from the DFP and FRE systems, GetCVMData uses the CVM ftp site to grab compiled .csv files.
When doing large-scale imports, GetDFPData feels sluggish due to the parsing of large xml files. As an example, building the dataset available on my data page takes around six hours of execution using 10 cores of my home computer.
GetCVMData is lean and fast. Since the data is already parsed into csv files, all the code does is organize the files, download them, and read them. For comparison, all DFP documents (annual financial reports) available from CVM can be imported in less than 1 minute. Additionally, GetCVMData can also parse ITR (quarterly) data, which was not available in GetDFPData.
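To make the "organize, download and read" idea concrete, here is a minimal base R sketch of the organize and read steps. It is not the package's internal code: the download step is skipped, the toy files are written to a temporary folder, and the semicolon separator is an assumption about the file layout. Only the file name pattern is taken from the `source_file` column of the package output; the values are made up.

```r
# Toy stand-ins for yearly files; contents are invented for illustration.
tmp_dir <- tempfile("cvm_sketch_")
dir.create(tmp_dir)

years <- 2015:2016
for (y in years) {
  toy <- data.frame(
    CD_CVM   = 20125,
    DT_REFER = paste0(y, "-12-31"),
    CD_CONTA = c("3.01", "3.02"),
    VL_CONTA = c(100, -40)
  )
  # assumption: files are semicolon-delimited
  write.table(toy,
              file.path(tmp_dir, sprintf("dre_cia_aberta_con_%d.csv", y)),
              sep = ";", row.names = FALSE, quote = FALSE)
}

# 1) organize: build the list of files to read
files <- file.path(tmp_dir, sprintf("dre_cia_aberta_con_%d.csv", years))

# 2) read one file, keeping account codes as text and tagging the origin
read_one <- function(f) {
  df <- read.csv(f, sep = ";", stringsAsFactors = FALSE,
                 colClasses = c(CD_CONTA = "character"))
  df$source_file <- basename(f)
  df
}

# 3) combine all years into a single data frame
df_all <- do.call(rbind, lapply(files, read_one))
print(df_all)
```

Reading `CD_CONTA` as character matters in a sketch like this: read as a number, an account code such as "3.10" would collapse to 3.1.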
However, be aware that the output data is not the same. I kept all original column names from CVM, and some information, such as tickers, is not available in GetCVMData.
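Because the CVM column names are kept, working with the output means filtering on fields such as `CD_CONTA` (account code) and `VL_CONTA` (account value). The sketch below uses a toy data frame with made-up values, not real package output, to show one way of picking an account by its code and, if you prefer, mapping the CVM names to names of your own. The new names (`ref_date`, `account_code`, `value`) are hypothetical.

```r
# Toy data frame mimicking a few CVM columns; values are invented.
df_toy <- data.frame(
  DT_REFER = as.Date(c("2015-12-31", "2015-12-31", "2016-12-31", "2016-12-31")),
  CD_CONTA = c("3.01", "3.02", "3.01", "3.02"),
  VL_CONTA = c(100, -40, 120, -50),
  stringsAsFactors = FALSE
)

# keep a single account, selected by its code
revenue <- df_toy[df_toy$CD_CONTA == "3.01", ]

# optional: rename CVM columns (the new names are hypothetical)
new_names <- c(DT_REFER = "ref_date",
               CD_CONTA = "account_code",
               VL_CONTA = "value")
names(revenue) <- unname(new_names[names(revenue)])

print(revenue)
```

Filtering by code rather than by the text description avoids depending on the exact spelling of the Portuguese labels.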
Here’s an example of usage:
if (!require(devtools)) install.packages('devtools')
if (!require(GetCVMData)) devtools::install_github('msperlin/GetCVMData') # not in CRAN yet

library(GetCVMData)
library(tidyverse)

# fetch information about companies
df_info <- get_info_companies()

# search for companies
df_search <- search_company('odontoprev')

# DFP annual data
id_cvm <- df_search$CD_CVM[1] # use NULL for all companies
df_dfp <- get_dfp_data(companies_cvm_codes = id_cvm,
                       first_year = 2015,
                       last_year = 2019,
                       type_docs = 'DRE|BPA|BPP', # income, assets, liabilities
                       type_format = 'con' # consolidated statements
)

glimpse(df_dfp)

# ITR quarterly data
df_itr <- get_itr_data(companies_cvm_codes = id_cvm,
                       first_year = 2010,
                       last_year = 2020,
                       type_docs = 'DRE|BPA|BPP', # income, assets, liabilities
                       type_format = 'con' # consolidated statements
)

glimpse(df_itr)

# FRE data (not yet implemented..)
#df_fre <- get_fre_data()

## Rows: 897
## Columns: 16
## $ CNPJ_CIA <chr> "58.119.199/0001-51", "58.119.199/0001-51", "58.119.199/…
## $ DT_REFER <date> 2015-12-31, 2015-12-31, 2015-12-31, 2015-12-31, 2015-12…
## $ VERSAO <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ DENOM_CIA <chr> "ODONTOPREV S.A.", "ODONTOPREV S.A.", "ODONTOPREV S.A.",…
## $ CD_CVM <dbl> 20125, 20125, 20125, 20125, 20125, 20125, 20125, 20125, …
## $ GRUPO_DFP <chr> "DF Consolidado - Demonstração do Resultado", "DF Consol…
## $ MOEDA <chr> "REAL", "REAL", "REAL", "REAL", "REAL", "REAL", "REAL", …
## $ ESCALA_MOEDA <chr> "MILHAR", "MILHAR", "MILHAR", "MILHAR", "MILHAR", "MILHA…
## $ ORDEM_EXERC <chr> "ÚLTIMO", "ÚLTIMO", "ÚLTIMO", "ÚLTIMO", "ÚLTIMO", "ÚLTIM…
## $ DT_INI_EXERC <date> 2015-01-01, 2015-01-01, 2015-01-01, 2015-01-01, 2015-01…
## $ DT_FIM_EXERC <date> 2015-12-31, 2015-12-31, 2015-12-31, 2015-12-31, 2015-12…
## $ CD_CONTA <chr> "3.01", "3.02", "3.03", "3.04", "3.04.01", "3.04.02", "3…
## $ DS_CONTA <chr> "Receita de Venda de Bens e/ou Serviços", "Custo dos Ben…
## $ VL_CONTA <dbl> 124969100, -58252800, 66716300, -35850300, -12007400, -1…
## $ cnpj_number <dbl> 5.81192e+13, 5.81192e+13, 5.81192e+13, 5.81192e+13, 5.81…
## $ source_file <chr> "dre_cia_aberta_con_2015.csv", "dre_cia_aberta_con_2015.…

## Rows: 4,868
## Columns: 16
## $ CNPJ_CIA <chr> "58.119.199/0001-51", "58.119.199/0001-51", "58.119.199/…
## $ DT_REFER <date> 2011-03-31, 2011-03-31, 2011-03-31, 2011-03-31, 2011-03…
## $ VERSAO <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ DENOM_CIA <chr> "ODONTOPREV S.A.", "ODONTOPREV S.A.", "ODONTOPREV S.A.",…
## $ CD_CVM <dbl> 20125, 20125, 20125, 20125, 20125, 20125, 20125, 20125, …
## $ GRUPO_DFP <chr> "DF Consolidado - Balanço Patrimonial Ativo", "DF Consol…
## $ MOEDA <chr> "REAL", "REAL", "REAL", "REAL", "REAL", "REAL", "REAL", …
## $ ESCALA_MOEDA <chr> "MIL", "MIL", "MIL", "MIL", "MIL", "MIL", "MIL", "MIL", …
## $ ORDEM_EXERC <chr> "ÚLTIMO", "ÚLTIMO", "ÚLTIMO", "ÚLTIMO", "ÚLTIMO", "ÚLTIM…
## $ DT_FIM_EXERC <date> 2011-03-31, 2011-03-31, 2011-03-31, 2011-03-31, 2011-03…
## $ CD_CONTA <chr> "1", "1.01", "1.01.01", "1.01.02", "1.01.02.01", "1.01.0…
## $ DS_CONTA <chr> "Ativo Total", "Ativo Circulante", "Caixa e Equivalentes…
## $ VL_CONTA <dbl> 9.60803e+15, 2.41649e+15, 8.35200e+13, 1.67052e+15, 1.67…
## $ cnpj_number <dbl> 5.81192e+13, 5.81192e+13, 5.81192e+13, 5.81192e+13, 5.81…
## $ source_file <chr> "itr_cia_aberta_bpa_con_2011.csv", "itr_cia_aberta_bpa_c…
## $ DT_INI_EXERC <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
Let's plot the quarterly profit of ODONTOPREV S.A.:
quarterly_profits <- df_itr %>%
  filter(GRUPO_DFP == 'DF Consolidado - Demonstração do Resultado',
         DS_CONTA == 'Lucro/Prejuízo Consolidado do Período')

# plot it
p <- ggplot(quarterly_profits, aes(x = DT_FIM_EXERC, y = VL_CONTA)) +
  geom_col() +
  theme_bw() +
  labs(title = paste0('Quarterly profits of ', quarterly_profits$DENOM_CIA[1]),
       caption = 'Data from CVM',
       x = '',
       y = 'Consolidated Profits')

p