R : a combined usage of split, lapply and do.call

Posted on August 29, 2021 by sang-heon lee in R bloggers | 0 Comments

[This article was first published on K & L Fintech Modeling, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This post explains a combined usage of split(), lapply() and do.call() R functions, so called Split-Apply-Combine approach. These are frequently used for group based calculations such as weighted average or aggregation. It will be useful when complicated user-defined function is applied differently to each group such as currency.

lapply, split, and do.call

lapply R function takes a list as its input and apply built-in or use-defined function to its list members.

1
2
lapply(list, function)
 
cs

Raw data is typically the data.frame not the list. When we want to perform lapply() on data.frame, It is therefore needed to convert this data.frame to the corresponding list. For this purpose, we use split() R function, which take data.frame and a key column as input and return list object separated by key column.

1
2
split(data.frame, key column of data.frame)
 
Colored by Color Scripter
cs

Although lapply() is very useful, it is somewhat annoying to deal with its returning list object. We convert this list object to the corresponding data.frame using do.call() R function in the following way.

1
2
do.call(rbind, list) 
 
cs

We want to aggregate weighted sensitivities (ws) within its currency (currency) from input data.frame by using split() and lapply() R functions. Finally, output data.frame is constructed by using do.call() R function. This overall process is illustrated by the following figure.

Case 1) Single output

The following R code perform a summationof ws by currency group.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
#=========================================================================#
# Financial Econometrics & Derivatives, ML/DL using R, Python, Tensorflow  
# by Sang-Heon Lee 
#
# https://kiandlee.blogspot.com
#————————————————————————-#
# A versatile usage of lapply
#=========================================================================#
 
graphics.off()  # clear all graphs
rm(list = ls()) # remove all files from your workspace
 
#————————————————————————–
# Input data.frame
#————————————————————————–
    
    str.data <– 
        “currency    maturity    ws
         USD        3m            285000000
         USD        3m            456000000
         USD        1y            112000000
         USD        2y            56000000
         EUR        3m            1785000000
         EUR        6m            200000000
         EUR        1y            250000000
         EUR        1y            1855000000
         CNY        6m            84000000
         CNY        6m            42000000
         CNY        6m            144000000
         AUD        6m            213000000
         AUD        2y            106000000
         AUD        2y            214000000″
    df.data <– read.table(text=str.data, header = TRUE)
    print(df.data)
 
 
#————————————————–
# 1) Single output
#————————————————–
 
    # split input data by key columns
    
    lt.data <– split(df.data, df.data$currency)
 
    # single output
    lt.out  <– lapply(
        lt.data, 
        function(x){ data.frame(sum_ws = sum(x$ws))})
    
    # concatenate rows
    df.out <– do.call(rbind,lt.out) 
    rownames(df.out) <– NULL; print(df.out)
 
 
–––––––––––––––––––––––––––––––––––––––––––––––––––
 
> print(df.out)
    sum_ws
1 5.33e+08
2 2.70e+08
3 4.09e+09
4 9.09e+08
 
Colored by Color Scripter
cs

Case 2) Multiple output

The following R code calculate a summation and maximum of ws by currency group with each corresponding currency

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
#————————————————–
# 2) Multiple output
#————————————————–
    
    # split input data by key column and do lapply
    lt.out  <– lapply( 
        
        # split by currency for outer lapply
        split(df.data, df.data$currency), 
        function(x){
            data.frame(curr   = max(x$currency),
                       sum_ws = sum(x$ws),
                       max_ws = max(x$ws))
        })
    
    # concatenate rows
    df.out <– do.call(rbind,lt.out) 
    rownames(df.out) <– NULL; print(df.out)
 
 
–––––––––––––––––––––––––––––––––––––––––––––––––––
 
>  print(df.out)
  curr   sum_ws     max_ws
1  AUD 5.33e+08  214000000
2  CNY 2.70e+08  144000000
3  EUR 4.09e+09 1855000000
4  USD 9.09e+08  456000000
 
Colored by Color Scripter
cs

Case 3) Multiple output while original row is preserved

The following R code calculates a summation of ws by currency group with each corresponding currency and also calculate weight within its currency. This case happens when we need to calculate some variables with both ungrouped and grouped (aggregated) variables such as weights within group. For this purpose, we need to preserve the row of original input data. This can be done by returning key or other column variables without group operation such as curr = x$currency, ws = x$ws except for grouped variables such as sum_ws = sum(x$ws).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
#————————————————–
# 3) Multiple output while original row is preserved
#————————————————–
    
    # split input data by key column and do lapply
    lt.out  <– lapply( 
        
        # split by currency for lapply
        split(df.data, df.data$currency), 
        function(x){
            data.frame(curr   = x$currency,
                       ws     = x$ws,
                       sum_ws = sum(x$ws))
        })
    
    # concatenate rows
    df.out <– do.call(rbind,lt.out)
    rownames(df.out) <– NULL; print(df.out)
    
    # add another group based calculation
    df.out$group_wgt <– df.out$ws/df.out$sum_ws
    print(df.out)
 
 
–––––––––––––––––––––––––––––––––––––––––––––––––––
 
>     print(df.out)
   curr         ws   sum_ws  group_wgt
1   AUD  213000000 5.33e+08 0.39962477
2   AUD  106000000 5.33e+08 0.19887430
3   AUD  214000000 5.33e+08 0.40150094
4   CNY   84000000 2.70e+08 0.31111111
5   CNY   42000000 2.70e+08 0.15555556
6   CNY  144000000 2.70e+08 0.53333333
7   EUR 1785000000 4.09e+09 0.43643032
8   EUR  200000000 4.09e+09 0.04889976
9   EUR  250000000 4.09e+09 0.06112469
10  EUR 1855000000 4.09e+09 0.45354523
11  USD  285000000 9.09e+08 0.31353135
12  USD  456000000 9.09e+08 0.50165017
13  USD  112000000 9.09e+08 0.12321232
14  USD   56000000 9.09e+08 0.06160616
 
Colored by Color Scripter
cs

Case 4) Multiple output with multiple key columns

The following R code calculates a summation of ws by currency and maturity group with each corresponding currency. In this case, nested lapply() is used.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
#————————————————–
# 4) Multiple output with multiple key columns
#————————————————–
    
    # outer lapply
    lt.out  <– lapply( 
        
        # split by currency for outer lapply
        split(df.data, df.data$currency ), 
        function(x){
            
            # inner lapply
            y <– lapply(
                
                # split by maturity for inner lapply
                split(x, x$maturity), 
                function(x) {
                    data.frame(curr = max(x$currency),
                           mat  = max(x$maturity),
                           sum_ws = sum(x$ws),
                           max_ws = max(x$ws))})
            
            # concatenate inner rows
            do.call(rbind,y)
        })
    
    # concatenate outer rows
    df.out <– do.call(rbind,lt.out)
    rownames(df.out) <– NULL; print(df.out)
 
–––––––––––––––––––––––––––––––––––––––––––––––––––
 
> print(df.out)
  curr mat     sum_ws     max_ws
1  AUD  2y  320000000  214000000
2  AUD  6m  213000000  213000000
3  CNY  6m  270000000  144000000
4  EUR  1y 2105000000 1855000000
5  EUR  3m 1785000000 1785000000
6  EUR  6m  200000000  200000000
7  USD  1y  112000000  112000000
8  USD  2y   56000000   56000000
9  USD  3m  741000000  456000000
 
Colored by Color Scripter
cs

Case 5) Multiple output with multiple key columns while original row is preserved

The following R code is the same version of case 3) when original row is preserved.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
#————————————————–
# 5) Multiple output with multiple key columns
#    while original row is preserved
#————————————————–
    
    # outer lapply
    lt.out  <– lapply( 
        
        # split by currency for outer lapply
        split(df.data, df.data$currency ), 
        function(x){
            
            # inner lapply
            y <– lapply(
                
                # split by maturity for inner lapply
                split(x, x$maturity), 
                function(x) {
                    data.frame(curr = x$currency,
                               mat  = x$maturity,
                               ws   = x$ws,
                               sum_ws = sum(x$ws))})
            # concatenate inner rows
            do.call(rbind,y)
        })
    
    # concatenate outer rows
    df.out <– do.call(rbind,lt.out) 
    rownames(df.out) <– NULL
    
    # add another group based calculation
    df.out$group_wgt <– df.out$ws/df.out$sum_ws
    print(df.out)   
 
 
–––––––––––––––––––––––––––––––––––––––––––––––––––
 
>     print(df.out)   
   curr mat         ws     sum_ws group_wgt
1   AUD  2y  106000000  320000000 0.3312500
2   AUD  2y  214000000  320000000 0.6687500
3   AUD  6m  213000000  213000000 1.0000000
4   CNY  6m   84000000  270000000 0.3111111
5   CNY  6m   42000000  270000000 0.1555556
6   CNY  6m  144000000  270000000 0.5333333
7   EUR  1y  250000000 2105000000 0.1187648
8   EUR  1y 1855000000 2105000000 0.8812352
9   EUR  3m 1785000000 1785000000 1.0000000
10  EUR  6m  200000000  200000000 1.0000000
11  USD  1y  112000000  112000000 1.0000000
12  USD  2y   56000000   56000000 1.0000000
13  USD  3m  285000000  741000000 0.3846154
14  USD  3m  456000000  741000000 0.6153846
 
Colored by Color Scripter
cs

From this post, we can find that combination of split, lapply and do.call R runctions deliver output which we want to generate for the group operations. $\blacksquare$

To leave a comment for the author, please follow the link and comment on their blog: K & L Fintech Modeling.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

R : a combined usage of split, lapply and do.call

lapply, split, and do.call

Case 1) Single output

Case 2) Multiple output

Case 3) Multiple output while original row is preserved

Case 4) Multiple output with multiple key columns

Case 5) Multiple output with multiple key columns while original row is preserved

Related

lapply, split, and do.call

Case 1) Single output

Case 2) Multiple output

Case 3) Multiple output while original row is preserved

Case 4) Multiple output with multiple key columns

Case 5) Multiple output with multiple key columns while original row is preserved

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)