R code snippet : Transform from long format to wide format

sang-heon lee

12 mins ago

[This article was first published on SH Fintech Modeling, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

< !--shlee --> This post introduces a useful R code snippet for transforming the long format data to the wide format. We occasionally encounter the long format data such as yield curve data since yield curve has two dimensions : maturity and date. For this end, we can use reshape() R built-in function.

Long and Wide formats

What is the long and wide format? A picture paints a thousand words.

We want to transform the long format data to the wide format data for a panel of time series analysis such as a term structure of interest rates. As financial data is usually extracted from database system, we occasionally encounter the long format data. For example, the data in the above figure is a sample of Euro area yield curve which has the long format. To facilitate an empirical analysis, the wide format is appropriate.

Transforming between the long and wide format can be carried out by using reshape() R function. No further explanation is needed. Let’s see the R code below.

< !--콘텐츠 내 자동 삽입 광고 배치하기 (compact)-->< !--–>

R code

The following R code read sample data and transform the long format to the wide format and vice versa. When using the reshape() function, we need to set the direction argument as “long” or “wide”. In particular, we need to add new column name with some delimitator (., _, etc) to the wide format data when we transform it the long format.

< !--shlee -->

100

101

102

103

#========================================================#

# Quantitative ALM, Financial Econometrics & Derivatives

# ML/DL using R, Python, Tensorflow by Sang-Heon Lee

# https://kiandlee.blogspot.com

#——————————————————–#

# Long to wide format and vice versa for yield data

#========================================================#

graphics.off(); rm(list = ls())

# sample data : ECB zero yields

str_data <– “term date rate

3M 2021-01-29 -0.625

3M 2021-02-26 -0.612

3M 2021-03-31 -0.636

3M 2021-04-30 -0.628

3M 2021-05-31 -0.632

3M 2021-06-30 -0.650

3M 2021-07-30 -0.663

3M 2021-08-31 -0.676

3M 2021-09-30 -0.712

3M 2021-10-29 -0.736

3M 2021-11-30 -0.895

3M 2021-12-31 -0.731

3Y 2021-01-29 -0.771

3Y 2021-02-26 -0.648

3Y 2021-03-31 -0.711

3Y 2021-04-30 -0.684

3Y 2021-05-31 -0.666

3Y 2021-06-30 -0.672

3Y 2021-07-30 -0.813

3Y 2021-08-31 -0.760

3Y 2021-09-30 -0.677

3Y 2021-10-29 -0.537

3Y 2021-11-30 -0.766

3Y 2021-12-31 -0.620

10Y 2021-01-29 -0.512

10Y 2021-02-26 -0.246

10Y 2021-03-31 -0.279

10Y 2021-04-30 -0.180

10Y 2021-05-31 -0.146

10Y 2021-06-30 -0.203

10Y 2021-07-30 -0.440

10Y 2021-08-31 -0.393

10Y 2021-09-30 -0.170

10Y 2021-10-29 -0.069

10Y 2021-11-30 -0.350

10Y 2021-12-31 -0.188

20Y 2021-01-29 -0.176

20Y 2021-02-26 0.103

20Y 2021-03-31 0.142

20Y 2021-04-30 0.252

20Y 2021-05-31 0.287

20Y 2021-06-30 0.201

20Y 2021-07-30 -0.059

20Y 2021-08-31 -0.033

20Y 2021-09-30 0.195

20Y 2021-10-29 0.103

20Y 2021-11-30 -0.115

20Y 2021-12-31 0.056″

#==========================================

# Read a sample of ECB zero coupon yields

#==========================================

df_long <– read.table(text = str_data, header = TRUE)

#==========================================

# Transform LONG to WIDE format

#==========================================

# using “wide” option

df_wide <– reshape(df_long, direction = “wide”,

idvar = “date”,

timevar = “term”)

df_wide

# initialize row names

rownames(df_long) <– NULL

# delete a unnecessary prefix in column names

colnames(df_wide) <– gsub(“rate.”,“”, colnames(df_wide))

df_wide

#==========================================

# Transform WIDE to LONG format

#==========================================

df_wide2 <– df_wide

# need to add new column name as a prefix

colnames(df_wide2)[–1] <–

paste0(“term.”, colnames(df_wide)[–1])

# using “long” option

df_long2 <– reshape(df_wide2, direction = “long”,

idvar=“date”,

varying = colnames(df_wide2)[–1],

sep = “.”)

# initialize row names

rownames(df_long2) <– NULL

df_long2

Colored by Color Scripter

< !--shlee --> Running the above R code produces the following wide format of the yield curve data.

We can also transform the wide format data to the long format conversely.

< !--shlee -->

To leave a comment for the author, please follow the link and comment on their blog: SH Fintech Modeling.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.