Creating a matrix from a long data.frame
[This article was first published on mages' blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
There can never be too many examples for transforming data with R. So, here is another example of reshaping a Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
data.frame
into a matrix
.Here I have a data frame that shows incremental claim payments over time for different loss occurrence (origin) years.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
n <- 7 | |
claims.df <- data.frame( | |
originf = factor(rep(2007:2013, n:1)), | |
dev=sequence(n:1), | |
paid= | |
c(3511, 3215, 2266, 1712, 1059, 587, | |
340, 4001, 3702, 2278, 1180, 956, | |
629, 4355, 3932, 1946, 1522, 1238, | |
4295, 3455, 2023, 1320, 4150, 3747, | |
2320, 5102, 4548, 6283) | |
) |
The format of the data frame above is how this kind of data is usually stored in a data base. However, I would like to see the payments of the different origin years in rows of a matrix.
The first idea might be to use the
reshape
function, but that would return a data.frame
. Yet, it is actually much easier with the matrix
function itself. Most of the code below is about formatting the dimension names of the matrix. Note that I use the with
function to save me a bit of typing.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
(claims.triangle <- with(claims.df, { | |
M <- matrix(nrow=n, ncol=n, | |
dimnames=list(origin=levels(originf), dev=1:n)) | |
M[cbind(originf, dev)] <- paid | |
M | |
})) | |
## dev | |
## origin 1 2 3 4 5 6 7 | |
## 2007 3511 3215 2266 1712 1059 587 340 | |
## 2008 4001 3702 2278 1180 956 629 NA | |
## 2009 4355 3932 1946 1522 1238 NA NA | |
## 2010 4295 3455 2023 1320 NA NA NA | |
## 2011 4150 3747 2320 NA NA NA NA | |
## 2012 5102 4548 NA NA NA NA NA | |
## 2013 6283 NA NA NA NA NA NA |
An elegant alternative to
matrix
provides the acast
function of the reshape2
package. It has a nice formula argument and allows me not only to specify the aggregation function, but also to add the margin totals.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
library(reshape2) | |
acast(claims.df, originf ~ dev , value.var='paid', | |
fun.aggregate=sum, margins=TRUE) | |
1 2 3 4 5 6 7 (all) | |
## 2007 3511 3215 2266 1712 1059 587 340 12690 | |
## 2008 4001 3702 2278 1180 956 629 0 12746 | |
## 2009 4355 3932 1946 1522 1238 0 0 12993 | |
## 2010 4295 3455 2023 1320 0 0 0 11093 | |
## 2011 4150 3747 2320 0 0 0 0 10217 | |
## 2012 5102 4548 0 0 0 0 0 9650 | |
## 2013 6283 0 0 0 0 0 0 6283 | |
## (all) 31697 22599 10833 5734 3253 1216 340 75672 |
To leave a comment for the author, please follow the link and comment on their blog: mages' blog.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.