R code snippet : Transform from long format to wide format
[This article was first published on SH Fintech Modeling, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This post introduces a useful R code snippet for transforming the long format data to the wide format. We occasionally encounter the long format data such as yield curve data since yield curve has two dimensions : maturity and date. For this end, we can use reshape() R built-in function. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Long and Wide formats
What is the long and wide format? A picture paints a thousand words.
We want to transform the long format data to the wide format data for a panel of time series analysis such as a term structure of interest rates. As financial data is usually extracted from database system, we occasionally encounter the long format data. For example, the data in the above figure is a sample of Euro area yield curve which has the long format. To facilitate an empirical analysis, the wide format is appropriate.
Transforming between the long and wide format can be carried out by using reshape() R function. No further explanation is needed. Let’s see the R code below.
R code
The following R code read sample data and transform the long format to the wide format and vice versa. When using the reshape() function, we need to set the direction argument as “long” or “wide”. In particular, we need to add new column name with some delimitator (., _, etc) to the wide format data when we transform it the long format.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 | #========================================================# # Quantitative ALM, Financial Econometrics & Derivatives # ML/DL using R, Python, Tensorflow by Sang-Heon Lee # # https://kiandlee.blogspot.com #——————————————————–# # Long to wide format and vice versa for yield data #========================================================# graphics.off(); rm(list = ls()) # sample data : ECB zero yields str_data <– “term date rate 3M 2021-01-29 -0.625 3M 2021-02-26 -0.612 3M 2021-03-31 -0.636 3M 2021-04-30 -0.628 3M 2021-05-31 -0.632 3M 2021-06-30 -0.650 3M 2021-07-30 -0.663 3M 2021-08-31 -0.676 3M 2021-09-30 -0.712 3M 2021-10-29 -0.736 3M 2021-11-30 -0.895 3M 2021-12-31 -0.731 3Y 2021-01-29 -0.771 3Y 2021-02-26 -0.648 3Y 2021-03-31 -0.711 3Y 2021-04-30 -0.684 3Y 2021-05-31 -0.666 3Y 2021-06-30 -0.672 3Y 2021-07-30 -0.813 3Y 2021-08-31 -0.760 3Y 2021-09-30 -0.677 3Y 2021-10-29 -0.537 3Y 2021-11-30 -0.766 3Y 2021-12-31 -0.620 10Y 2021-01-29 -0.512 10Y 2021-02-26 -0.246 10Y 2021-03-31 -0.279 10Y 2021-04-30 -0.180 10Y 2021-05-31 -0.146 10Y 2021-06-30 -0.203 10Y 2021-07-30 -0.440 10Y 2021-08-31 -0.393 10Y 2021-09-30 -0.170 10Y 2021-10-29 -0.069 10Y 2021-11-30 -0.350 10Y 2021-12-31 -0.188 20Y 2021-01-29 -0.176 20Y 2021-02-26 0.103 20Y 2021-03-31 0.142 20Y 2021-04-30 0.252 20Y 2021-05-31 0.287 20Y 2021-06-30 0.201 20Y 2021-07-30 -0.059 20Y 2021-08-31 -0.033 20Y 2021-09-30 0.195 20Y 2021-10-29 0.103 20Y 2021-11-30 -0.115 20Y 2021-12-31 0.056″ #========================================== # Read a sample of ECB zero coupon yields #========================================== df_long <– read.table(text = str_data, header = TRUE) #========================================== # Transform LONG to WIDE format #========================================== # using “wide” option df_wide <– reshape(df_long, direction = “wide”, idvar = “date”, timevar = “term”) df_wide # initialize row names rownames(df_long) <– NULL # delete a unnecessary prefix in column names colnames(df_wide) <– gsub(“rate.”,“”, colnames(df_wide)) df_wide #========================================== # Transform WIDE to LONG format #========================================== df_wide2 <– df_wide # need to add new column name as a prefix colnames(df_wide2)[–1] <– paste0(“term.”, colnames(df_wide)[–1]) # using “long” option df_long2 <– reshape(df_wide2, direction = “long”, idvar=“date”, varying = colnames(df_wide2)[–1], sep = “.”) # initialize row names rownames(df_long2) <– NULL df_long2 | cs |
Running the above R code produces the following wide format of the yield curve data.
We can also transform the wide format data to the long format conversely.
To leave a comment for the author, please follow the link and comment on their blog: SH Fintech Modeling.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.