Creating a matrix from a long data.frame

[This article was first published on mages' blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

There can never be too many examples for transforming data with R. So, here is another example of reshaping a data.frame into a matrix.

Here I have a data frame that shows incremental claim payments over time for different loss occurrence (origin) years.

n <- 7
claims.df <- data.frame(
originf = factor(rep(2007:2013, n:1)),
dev=sequence(n:1),
paid=
c(3511, 3215, 2266, 1712, 1059, 587,
340, 4001, 3702, 2278, 1180, 956,
629, 4355, 3932, 1946, 1522, 1238,
4295, 3455, 2023, 1320, 4150, 3747,
2320, 5102, 4548, 6283)
)
view raw triangledata.R hosted with ❤ by GitHub

The format of the data frame above is how this kind of data is usually stored in a data base. However, I would like to see the payments of the different origin years in rows of a matrix.

The first idea might be to use the reshape function, but that would return a data.frame. Yet, it is actually much easier with the matrix function itself. Most of the code below is about formatting the dimension names of the matrix. Note that I use the with function to save me a bit of typing.

(claims.triangle <- with(claims.df, {
M <- matrix(nrow=n, ncol=n,
dimnames=list(origin=levels(originf), dev=1:n))
M[cbind(originf, dev)] <- paid
M
}))
## dev
## origin 1 2 3 4 5 6 7
## 2007 3511 3215 2266 1712 1059 587 340
## 2008 4001 3702 2278 1180 956 629 NA
## 2009 4355 3932 1946 1522 1238 NA NA
## 2010 4295 3455 2023 1320 NA NA NA
## 2011 4150 3747 2320 NA NA NA NA
## 2012 5102 4548 NA NA NA NA NA
## 2013 6283 NA NA NA NA NA NA
view raw triangle1.R hosted with ❤ by GitHub

An elegant alternative to matrix provides the acast function of the reshape2 package. It has a nice formula argument and allows me not only to specify the aggregation function, but also to add the margin totals.

library(reshape2)
acast(claims.df, originf ~ dev , value.var='paid',
fun.aggregate=sum, margins=TRUE)
1 2 3 4 5 6 7 (all)
## 2007 3511 3215 2266 1712 1059 587 340 12690
## 2008 4001 3702 2278 1180 956 629 0 12746
## 2009 4355 3932 1946 1522 1238 0 0 12993
## 2010 4295 3455 2023 1320 0 0 0 11093
## 2011 4150 3747 2320 0 0 0 0 10217
## 2012 5102 4548 0 0 0 0 0 9650
## 2013 6283 0 0 0 0 0 0 6283
## (all) 31697 22599 10833 5734 3253 1216 340 75672
view raw triangle2.R hosted with ❤ by GitHub

To leave a comment for the author, please follow the link and comment on their blog: mages' blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)