Data manipulation in r using data frames – an extensive article of basics part2 – aggregation and sorting
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Welcome to the second part of this two-part series on data manipulation in R. This article aims to present the reader with different ways of data aggregation and sorting. Here is the composition of this article.
- Aggregation in R
- Why aggregate data?
- Data used in this article
- Aggregation using base R
- Aggregation using the aggregate function
- Assigning a name to an aggregated column in aggregate function in R
- Aggregating multiple columns using an aggregate function in R
- Aggregating multiple columns data by multiple functions using the aggregate function in R
- Aggregation using by() function
- Aggregation using sweep() function
- Aggregation using dplyr
- Aggregation using group_by and summarize functions
- Aggregation by group_by and summarize functions with multiple variables and functions
- Aggregation using group_by and summarize functions with a range of variables
- Aggregation using group_by and summarize_if function
- Aggregation using group_by and summarize_at function
- Aggregation using group_by and summarize_all function
- Aggregation using data.table
- Converting data frame to data table
- Aggregating single variable using data table
- Assigning a custom name to aggregated variable in data table
- Aggregating multiple variables using multiple aggregation functions in data table
- Aggregating variables as well as filtering observations in data table
- Counting the number of observations within a group in data table
- Aggregation using sqldf
- Sorting in R
- Why sort data?
- Sorting using base R
- Sorting data using order function
- Sorting data in descending order using order function
- Sorting using dplyr
- Sorting data using arrange function
- Sorting data in descending order using arrange function
- Sorting using data.table
- Sorting data using order function
- Sorting data in descending order using order function
- Sorting data using setorder function
- Sorting data in descending order using setorder function
- Sorting using sqldf
Aggregation in R
Why aggregate data?
Usually, the data has two types, qualitative and quantitative. These two are statistical terms. Qualitative data defines the characteristics of the data. Labels, properties, attributes, and categories are all examples of qualitative data. As the name suggests, data that express the quality is qualitative data. On the other hand, quantitative data represent numbers. In other disciplines like data warehousing or business intelligence, qualitative data is equivalent to dimensions and quantitative data to measures.
Data analysis is a complex process and involves several steps. Some of these steps may include data to be examined by its quality. Usually, the qualitative data’s granularity is higher. Granularity is the level of detail.
For example, a dataset that contains all countries of the world may have multiple variables describing qualitative data. The name of the country would be at the most granular level, as all countries’ names would be unique and, no two countries will have the same name. Whereas, granularity level would rise as we look at the countries by continents and then by the hemisphere.
Similarly, in an analysis, data is examined on various levels by its different qualities. At this point, aggregation comes into the picture. It is required if you want to explore quantitative data elements by its quality that sits higher than the most granular level.
In this article, we will practice different aggregation functions and options offered by base R and other packages like dplyr and data.table. Note that this article does not aim to list all the functions that (if used in a way) can aggregate data, whereas the aim here is to explain the concept of data aggregation. R is a rich language that offers different ways of doing the same thing.
Data used in this article
financials <- read.csv("constituents-financials_csv.csv") str(financials) ## 'data.frame': 505 obs. of 14 variables: ## $ Symbol : chr "MMM" "AOS" "ABT" "ABBV" ... ## $ Name : chr "3M Company" "A.O. Smith Corp" "Abbott Laboratories" "AbbVie Inc." ... ## $ Sector : chr "Industrials" "Industrials" "Health Care" "Health Care" ... ## $ Price : num 222.9 60.2 56.3 108.5 150.5 ... ## $ Price.Earnings: num 24.3 27.8 22.5 19.4 25.5 ... ## $ Dividend.Yield: num 2.33 1.15 1.91 2.5 1.71 ... ## $ Earnings.Share: num 7.92 1.7 0.26 3.29 5.44 1.28 7.43 3.39 6.19 0.03 ... ## $ X52.Week.Low : num 259.8 68.4 64.6 125.9 162.6 ... ## $ X52.Week.High : num 175.5 48.9 42.3 60 114.8 ... ## $ Market.Cap : num 1.39e+11 1.08e+10 1.02e+11 1.81e+11 9.88e+10 ... ## $ EBITDA : num 9.05e+09 6.01e+08 5.74e+09 1.03e+10 5.64e+09 ... ## $ Price.Sales : num 4.39 3.58 3.74 6.29 2.6 ... ## $ Price.Book : num 11.34 6.35 3.19 26.14 10.62 ... ## $ SEC.Filings : chr "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=MMM" "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=AOS" "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=ABT" "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=ABBV" ...
financials <- financials %>% select(Symbol, Sector, Price, X52.Week.Low, X52.Week.High) head(financials,10) ## Symbol Sector Price X52.Week.Low X52.Week.High ## 1 MMM Industrials 222.89 259.770 175.490 ## 2 AOS Industrials 60.24 68.390 48.925 ## 3 ABT Health Care 56.27 64.600 42.280 ## 4 ABBV Health Care 108.48 125.860 60.050 ## 5 ACN Information Technology 150.51 162.600 114.820 ## 6 ATVI Information Technology 65.83 74.945 38.930 ## 7 AYI Industrials 145.41 225.360 142.000 ## 8 ADBE Information Technology 185.16 204.450 114.451 ## 9 AAP Consumer Discretionary 109.63 169.550 78.810 ## 10 AMD Information Technology 11.22 15.650 9.700
Our final data now has five variables and 505 observations.
Aggregation using base R
Aggregation using the aggregate function
Aggregate is a generic function that can be used for both data frames and time series.
aggregate(financials$Price, by = list(financials$Sector), FUN = sum) ## Group.1 x ## 1 Consumer Discretionary 10418.90 ## 2 Consumer Staples 2711.98 ## 3 Energy 1852.40 ## 4 Financials 6055.81 ## 5 Health Care 8083.46 ## 6 Industrials 7831.47 ## 7 Information Technology 8347.00 ## 8 Materials 2559.67 ## 9 Real Estate 2927.52 ## 10 Telecommunication Services 100.81 ## 11 Utilities 1545.45
Assigning a name to an aggregated column in aggregate function in R
The result above shows that the function aggregate split the original data into small subsets by sector variable and applied sum function over the price variable. The result above shows x as the name of the summarised column. If you want to rename it, a little tweak in the code using setName function will do the trick. Here is an example.
setNames(aggregate(financials$Price, by = list(financials$Sector), FUN = sum),c("Sector","Total.Price")) ## Sector Total.Price ## 1 Consumer Discretionary 10418.90 ## 2 Consumer Staples 2711.98 ## 3 Energy 1852.40 ## 4 Financials 6055.81 ## 5 Health Care 8083.46 ## 6 Industrials 7831.47 ## 7 Information Technology 8347.00 ## 8 Materials 2559.67 ## 9 Real Estate 2927.52 ## 10 Telecommunication Services 100.81 ## 11 Utilities 1545.45
Aggregating multiple columns using the aggregate function in R
Now to apply the same function on multiple variables, all you have to do is to supply an expression to subset the required columns from the data frame as an argument. Here is an example, we have used setName function on top to assign meaningful names to the variables.
The scenario here is to "find out the average price, average 52-weeks low price, and the average 52-week high price of all shares by all sectors".
setNames(aggregate(financials[,c("Price","X52.Week.Low","X52.Week.High")], by = list(financials$Sector), FUN = mean), c("Sector","Average.Price","Average.Low","Average.High")) ## Sector Average.Price Average.Low Average.High ## 1 Consumer Discretionary 124.03452 146.93143 96.09236 ## 2 Consumer Staples 79.76412 92.83229 68.92944 ## 3 Energy 57.88750 72.58969 48.14123 ## 4 Financials 89.05603 101.82185 72.69447 ## 5 Health Care 132.51574 160.75853 103.71925 ## 6 Industrials 116.88761 134.57948 90.83702 ## 7 Information Technology 119.24286 138.77864 91.89142 ## 8 Materials 102.38680 118.03885 85.58325 ## 9 Real Estate 88.71273 110.55045 82.87809 ## 10 Telecommunication Services 33.60333 41.69333 29.50367 ## 11 Utilities 55.19464 68.49732 52.80232
Aggregating multiple columns data by multiple functions using the aggregate function in R
Moving on to the next level of the scenario, we would like to apply multiple functions over the different variables. It is possible by defining a custom function. The requirement now is to "find out the minimum and maximum values of price, 52-weeks low price of all shares by all sectors". Here is an example.
aggregate(financials[c("Price","X52.Week.Low")], by = list(financials$Sector), FUN = function(x) c(min(x),max(x))) ## Group.1 Price.1 Price.2 X52.Week.Low.1 X52.Week.Low.2 ## 1 Consumer Discretionary 10.43 1806.06 13.480 2067.990 ## 2 Consumer Staples 19.96 208.73 21.175 229.500 ## 3 Energy 2.82 169.16 6.590 199.830 ## 4 Financials 13.38 509.38 16.530 594.520 ## 5 Health Care 25.20 601.00 29.930 697.260 ## 6 Industrials 14.45 334.30 30.590 361.790 ## 7 Information Technology 11.22 1007.71 15.650 1198.000 ## 8 Materials 17.16 387.65 20.250 435.150 ## 9 Real Estate 14.01 409.98 21.530 495.345 ## 10 Telecommunication Services 16.20 49.04 27.610 54.770 ## 11 Utilities 10.06 145.29 12.050 159.640
The above command can be written in different ways. Here are a couple of examples. Note the use of cbind and formula (~).
aggregate(cbind(financials$Price,financials$X52.Week.Low) ~financials$Sector, FUN = function(x) c(min(x),max(x))) ## financials$Sector V1.1 V1.2 V2.1 V2.2 ## 1 Consumer Discretionary 10.43 1806.06 13.480 2067.990 ## 2 Consumer Staples 19.96 208.73 21.175 229.500 ## 3 Energy 2.82 169.16 6.590 199.830 ## 4 Financials 13.38 509.38 16.530 594.520 ## 5 Health Care 25.20 601.00 29.930 697.260 ## 6 Industrials 14.45 334.30 30.590 361.790 ## 7 Information Technology 11.22 1007.71 15.650 1198.000 ## 8 Materials 17.16 387.65 20.250 435.150 ## 9 Real Estate 14.01 409.98 21.530 495.345 ## 10 Telecommunication Services 16.20 49.04 27.610 54.770 ## 11 Utilities 10.06 145.29 12.050 159.640 aggregate(cbind(Price,X52.Week.Low) ~ Sector, data = financials, FUN = function(x) c(min(x),max(x))) ## Sector Price.1 Price.2 X52.Week.Low.1 X52.Week.Low.2 ## 1 Consumer Discretionary 10.43 1806.06 13.480 2067.990 ## 2 Consumer Staples 19.96 208.73 21.175 229.500 ## 3 Energy 2.82 169.16 6.590 199.830 ## 4 Financials 13.38 509.38 16.530 594.520 ## 5 Health Care 25.20 601.00 29.930 697.260 ## 6 Industrials 14.45 334.30 30.590 361.790 ## 7 Information Technology 11.22 1007.71 15.650 1198.000 ## 8 Materials 17.16 387.65 20.250 435.150 ## 9 Real Estate 14.01 409.98 21.530 495.345 ## 10 Telecommunication Services 16.20 49.04 27.610 54.770 ## 11 Utilities 10.06 145.29 12.050 159.640
With the aggregate function, possibilities are endless. We can do a lot using the aggregate function. What we have seen in the sections above is just a part. Here are some other arguments which you can add to the aggregate function based on your requirements.
Argument | Use |
na.action | Use this to specify if your data contain NA or missing values and how you want to handle them |
simplify | If used results will be presented in vector or matrix form |
formula | Depending on subsetting requirement formulas like y ~ x or cbind(y1,y2) ~ x1 + x2 can be used |
data | The data frame in question which has the variables to subset and aggregate |
drop | Use if you want to drop unused combination of grouping values |
subset | Use it to limit the operation to certain observations |
Aggregation using by() function
by(data=financials$Price, INDICES = financials$Sector, FUN = sum, simplify = TRUE) ## financials$Sector: Consumer Discretionary ## [1] 10418.9 ## ------------------------------------------------------------ ## financials$Sector: Consumer Staples ## [1] 2711.98 ## ------------------------------------------------------------ ## financials$Sector: Energy ## [1] 1852.4 ## ------------------------------------------------------------ ## financials$Sector: Financials ## [1] 6055.81 ## ------------------------------------------------------------ ## financials$Sector: Health Care ## [1] 8083.46 ## ------------------------------------------------------------ ## financials$Sector: Industrials ## [1] 7831.47 ## ------------------------------------------------------------ ## financials$Sector: Information Technology ## [1] 8347 ## ------------------------------------------------------------ ## financials$Sector: Materials ## [1] 2559.67 ## ------------------------------------------------------------ ## financials$Sector: Real Estate ## [1] 2927.52 ## ------------------------------------------------------------ ## financials$Sector: Telecommunication Services ## [1] 100.81 ## ------------------------------------------------------------ ## financials$Sector: Utilities ## [1] 1545.45
Aggregation using sweep() function
sweep(financials["Price"], MARGIN = 1, STATS = 0,FUN = sum) ## [1] 52434.47
Aggregation using dplyr
Aggregation using group_by and summarize functions
financials %>% group_by(Sector) %>% summarize(total.price = sum(Price)) ## `summarise()` ungrouping output (override with `.groups` argument) ## # A tibble: 11 x 2 ## Sector total.price ## <chr> <dbl> ## 1 Consumer Discretionary 10419. ## 2 Consumer Staples 2712. ## 3 Energy 1852. ## 4 Financials 6056. ## 5 Health Care 8083. ## 6 Industrials 7831. ## 7 Information Technology 8347 ## 8 Materials 2560. ## 9 Real Estate 2928. ## 10 Telecommunication Services 101. ## 11 Utilities 1545.
If you were observant, then you may have noticed a warning as well. This warning is a hint to remind you how you want to control the grouping of data. You can suppress this warning by using options(dplyr.summarise.inform = FALSE). For more information, please click this link.
Also, did you notice how the results are printed above?
Nothing wrong with that, it is because the return value is in the form of tibble. You can print.data.frame() function to print is in a nice format. Here is an example.
financials %>% group_by(Sector) %>% summarize(total.price = sum(Price)) %>% print.data.frame() ## `summarise()` ungrouping output (override with `.groups` argument) ## Sector total.price ## 1 Consumer Discretionary 10418.90 ## 2 Consumer Staples 2711.98 ## 3 Energy 1852.40 ## 4 Financials 6055.81 ## 5 Health Care 8083.46 ## 6 Industrials 7831.47 ## 7 Information Technology 8347.00 ## 8 Materials 2559.67 ## 9 Real Estate 2927.52 ## 10 Telecommunication Services 100.81 ## 11 Utilities 1545.45
Lastly, did you notice that the summarised variable is now named total.price? By simply putting the new name in front of the aggregation function will do the trick. It is not mandatory, but I am sure you wouldn't like the default name.
Aggregation by group_by and summarize functions with multiple variables and functions
Let's progress ahead with our scenario and find out the total price, minimum 52-week low and maximum 52-week high price of all shares and by all sectors. The aim here is to show the use of multiple variables with multiple aggregation functions.
financials %>% group_by(Sector) %>% summarize(total.price = sum(Price), min.52.week.low = min(X52.Week.Low), max.52.week.high = max(X52.Week.High)) %>% print.data.frame() ## `summarise()` ungrouping output (override with `.groups` argument) ## Sector total.price min.52.week.low max.52.week.high ## 1 Consumer Discretionary 10418.90 13.480 1589.0000 ## 2 Consumer Staples 2711.98 21.175 152.0100 ## 3 Energy 1852.40 6.590 125.4600 ## 4 Financials 6055.81 16.530 368.0000 ## 5 Health Care 8083.46 29.930 459.3400 ## 6 Industrials 7831.47 30.590 256.4000 ## 7 Information Technology 8347.00 15.650 824.3000 ## 8 Materials 2559.67 20.250 302.0101 ## 9 Real Estate 2927.52 21.530 361.9000 ## 10 Telecommunication Services 100.81 27.610 42.8000 ## 11 Utilities 1545.45 12.050 124.1800
Also, we can supply more than one variable in the group_by function if you want to group data by multiple variables.
Aggregation using group_by and summarize functions with a range of variables
You can use the summarize function in a variety of ways. If you have several variables to aggregate, then the following type of command would help.
financials %>% group_by(Sector) %>% summarize(across(Price:X52.Week.High, ~sum(.x))) %>% print.data.frame() ## `summarise()` ungrouping output (override with `.groups` argument) ## Sector Price X52.Week.Low X52.Week.High ## 1 Consumer Discretionary 10418.90 12342.240 8071.759 ## 2 Consumer Staples 2711.98 3156.298 2343.601 ## 3 Energy 1852.40 2322.870 1540.519 ## 4 Financials 6055.81 6923.886 4943.224 ## 5 Health Care 8083.46 9806.271 6326.874 ## 6 Industrials 7831.47 9016.825 6086.081 ## 7 Information Technology 8347.00 9714.505 6432.399 ## 8 Materials 2559.67 2950.971 2139.581 ## 9 Real Estate 2927.52 3648.165 2734.977 ## 10 Telecommunication Services 100.81 125.080 88.511 ## 11 Utilities 1545.45 1917.925 1478.465
Aggregation using group_by and summarize_if function
financials %>% group_by(Sector) %>% summarize_if(is.numeric,sum) %>% print.data.frame() ## Sector Price X52.Week.Low X52.Week.High ## 1 Consumer Discretionary 10418.90 12342.240 8071.759 ## 2 Consumer Staples 2711.98 3156.298 2343.601 ## 3 Energy 1852.40 2322.870 1540.519 ## 4 Financials 6055.81 6923.886 4943.224 ## 5 Health Care 8083.46 9806.271 6326.874 ## 6 Industrials 7831.47 9016.825 6086.081 ## 7 Information Technology 8347.00 9714.505 6432.399 ## 8 Materials 2559.67 2950.971 2139.581 ## 9 Real Estate 2927.52 3648.165 2734.977 ## 10 Telecommunication Services 100.81 125.080 88.511 ## 11 Utilities 1545.45 1917.925 1478.465
Aggregation using group_by and summarize_at function
Summarize_at, on the other hand, helps you to aggregate more than one variable in one go. The requirement is that the variables are supplied as vector or vars.
financials %>% group_by(Sector) %>% summarize_at(c("Price","X52.Week.High"),sum) %>% print.data.frame() ## Sector Price X52.Week.High ## 1 Consumer Discretionary 10418.90 8071.759 ## 2 Consumer Staples 2711.98 2343.601 ## 3 Energy 1852.40 1540.519 ## 4 Financials 6055.81 4943.224 ## 5 Health Care 8083.46 6326.874 ## 6 Industrials 7831.47 6086.081 ## 7 Information Technology 8347.00 6432.399 ## 8 Materials 2559.67 2139.581 ## 9 Real Estate 2927.52 2734.977 ## 10 Telecommunication Services 100.81 88.511 ## 11 Utilities 1545.45 1478.465
Aggregation using group_by and summarize_all function
Lastly, summarize_all function helps to aggregate all the values from the data frame, and we can use multiple aggregation functions as well. Note in our data frame we have two character variables, let's prepare the data for using summarize_all function and aggregate data to show total and minimum values of all variable across the data frame.
financials %>% select(-Symbol) %>% group_by(Sector) %>% summarize_all(c(sum,min)) ## # A tibble: 11 x 7 ## Sector Price_fn1 X52.Week.Low_fn1 X52.Week.High_f… Price_fn2 X52.Week.Low_fn2 ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 Consu… 10419. 12342. 8072. 10.4 13.5 ## 2 Consu… 2712. 3156. 2344. 20.0 21.2 ## 3 Energy 1852. 2323. 1541. 2.82 6.59 ## 4 Finan… 6056. 6924. 4943. 13.4 16.5 ## 5 Healt… 8083. 9806. 6327. 25.2 29.9 ## 6 Indus… 7831. 9017. 6086. 14.4 30.6 ## 7 Infor… 8347 9715. 6432. 11.2 15.6 ## 8 Mater… 2560. 2951. 2140. 17.2 20.2 ## 9 Real … 2928. 3648. 2735. 14.0 21.5 ## 10 Telec… 101. 125. 88.5 16.2 27.6 ## 11 Utili… 1545. 1918. 1478. 10.1 12.0 ## # … with 1 more variable: X52.Week.High_fn2 <dbl>
Aggregation using data.table
Converting data frame to data table
financials <- setDT(financials) class(financials) ## [1] "data.table" "data.frame"
Aggregating single variable using data table
financials[,sum(Price), by=Sector] ## Sector V1 ## 1: Industrials 7831.47 ## 2: Health Care 8083.46 ## 3: Information Technology 8347.00 ## 4: Consumer Discretionary 10418.90 ## 5: Utilities 1545.45 ## 6: Financials 6055.81 ## 7: Materials 2559.67 ## 8: Real Estate 2927.52 ## 9: Consumer Staples 2711.98 ## 10: Energy 1852.40 ## 11: Telecommunication Services 100.81
Assigning a custom name to an aggregated variable in data table
financials[,list(total.price=sum(Price)), by=Sector]
financials[,.(total.price=sum(Price)), by=Sector] ## Sector total.price ## 1: Industrials 7831.47 ## 2: Health Care 8083.46 ## 3: Information Technology 8347.00 ## 4: Consumer Discretionary 10418.90 ## 5: Utilities 1545.45 ## 6: Financials 6055.81 ## 7: Materials 2559.67 ## 8: Real Estate 2927.52 ## 9: Consumer Staples 2711.98 ## 10: Energy 1852.40 ## 11: Telecommunication Services 100.81
Aggregating multiple variables using multiple aggregation functions in data table
financials[,.(total.price=sum(Price), mim.52.week.low=min(X52.Week.Low)), by=Sector]
financials[,list(total.price=sum(Price), mim.52.week.low=min(X52.Week.Low)), by=Sector] ## Sector total.price mim.52.week.low ## 1: Industrials 7831.47 30.590 ## 2: Health Care 8083.46 29.930 ## 3: Information Technology 8347.00 15.650 ## 4: Consumer Discretionary 10418.90 13.480 ## 5: Utilities 1545.45 12.050 ## 6: Financials 6055.81 16.530 ## 7: Materials 2559.67 20.250 ## 8: Real Estate 2927.52 21.530 ## 9: Consumer Staples 2711.98 21.175 ## 10: Energy 1852.40 6.590 ## 11: Telecommunication Services 100.81 27.610
Aggregating variables as well as filtering observations in data table
financials[Sector == "Industrials",list(total.price=sum(Price), mim.52.week.low=min(X52.Week.Low)), by=Sector] ## Sector total.price mim.52.week.low ## 1: Industrials 7831.47 30.59
Counting the number of observations within a group in data table
financials[,.N, by=Sector] ## Sector N ## 1: Industrials 67 ## 2: Health Care 61 ## 3: Information Technology 70 ## 4: Consumer Discretionary 84 ## 5: Utilities 28 ## 6: Financials 68 ## 7: Materials 25 ## 8: Real Estate 33 ## 9: Consumer Staples 34 ## 10: Energy 32 ## 11: Telecommunication Services 3
by=.(variable to group 1, variable to group 2)
Aggregation using sqldf
sqldf('select Sector, sum(Price), min("X52.Week.Low") from financials group by Sector having sum(Price) > 5000') ## Sector sum(Price) min("X52.Week.Low") ## 1 Consumer Discretionary 10418.90 13.48 ## 2 Financials 6055.81 16.53 ## 3 Health Care 8083.46 29.93 ## 4 Industrials 7831.47 30.59 ## 5 Information Technology 8347.00 15.65
sqldf('select Sector, sum(Price) as "total.price", min("X52.Week.Low") as "min.52.week.low" from financials group by Sector having sum(Price) > 5000') ## Sector total.price min.52.week.low ## 1 Consumer Discretionary 10418.90 13.48 ## 2 Financials 6055.81 16.53 ## 3 Health Care 8083.46 29.93 ## 4 Industrials 7831.47 30.59 ## 5 Information Technology 8347.00 15.65
Sorting in R
Why sort data?
Sorting data using base R
Sorting data using order function
financials[order(Symbol),c("Symbol","Price")] ## Symbol Price ## 1: A 65.05 ## 2: AAL 48.60 ## 3: AAP 109.63 ## 4: AAPL 155.15 ## 5: ABBV 108.48 ## --- ## 501: XYL 70.24 ## 502: YUM 76.30 ## 503: ZBH 115.53 ## 504: ZION 50.71 ## 505: ZTS 71.51
Sorting data in descending order using order function
financials[order(desc(Symbol)),c("Symbol","Price")] ## Symbol Price ## 1: ZTS 71.51 ## 2: ZION 50.71 ## 3: ZBH 115.53 ## 4: YUM 76.30 ## 5: XYL 70.24 ## --- ## 501: ABBV 108.48 ## 502: AAPL 155.15 ## 503: AAP 109.63 ## 504: AAL 48.60 ## 505: A 65.05
Sorting data using dplyr
Sorting data using the arrange function
financials %>% group_by(Sector) %>% summarize(sum(Price)) %>% arrange(Sector) %>% print.data.frame() ## `summarise()` ungrouping output (override with `.groups` argument) ## Sector sum(Price) ## 1 Consumer Discretionary 10418.90 ## 2 Consumer Staples 2711.98 ## 3 Energy 1852.40 ## 4 Financials 6055.81 ## 5 Health Care 8083.46 ## 6 Industrials 7831.47 ## 7 Information Technology 8347.00 ## 8 Materials 2559.67 ## 9 Real Estate 2927.52 ## 10 Telecommunication Services 100.81 ## 11 Utilities 1545.45
Sorting data in descending order using arrange function
financials %>% group_by(Sector) %>% summarize(sum(Price)) %>% arrange(desc(Sector)) %>% print.data.frame() ## `summarise()` ungrouping output (override with `.groups` argument) ## Sector sum(Price) ## 1 Utilities 1545.45 ## 2 Telecommunication Services 100.81 ## 3 Real Estate 2927.52 ## 4 Materials 2559.67 ## 5 Information Technology 8347.00 ## 6 Industrials 7831.47 ## 7 Health Care 8083.46 ## 8 Financials 6055.81 ## 9 Energy 1852.40 ## 10 Consumer Staples 2711.98 ## 11 Consumer Discretionary 10418.90
Sorting data using data table
Sorting data using order function
financials <- setDT(financials) financials[order(Sector),sum(Price), by=Sector] ## Sector V1 ## 1: Consumer Discretionary 10418.90 ## 2: Consumer Staples 2711.98 ## 3: Energy 1852.40 ## 4: Financials 6055.81 ## 5: Health Care 8083.46 ## 6: Industrials 7831.47 ## 7: Information Technology 8347.00 ## 8: Materials 2559.67 ## 9: Real Estate 2927.52 ## 10: Telecommunication Services 100.81 ## 11: Utilities 1545.45
Sorting data in descending order using order function
financials[order(desc(Sector)),sum(Price), by=Sector] ## Sector V1 ## 1: Utilities 1545.45 ## 2: Telecommunication Services 100.81 ## 3: Real Estate 2927.52 ## 4: Materials 2559.67 ## 5: Information Technology 8347.00 ## 6: Industrials 7831.47 ## 7: Health Care 8083.46 ## 8: Financials 6055.81 ## 9: Energy 1852.40 ## 10: Consumer Staples 2711.98 ## 11: Consumer Discretionary 10418.90
Sorting data using setorder function
order.data <- setorder(financials[,c("Symbol","Price")],Symbol) order.data ## Symbol Price ## 1: A 65.05 ## 2: AAL 48.60 ## 3: AAP 109.63 ## 4: AAPL 155.15 ## 5: ABBV 108.48 ## --- ## 501: XYL 70.24 ## 502: YUM 76.30 ## 503: ZBH 115.53 ## 504: ZION 50.71 ## 505: ZTS 71.51
Sorting data in descending order using setorder function
order.data <- setorder(financials[,c("Symbol","Price")],-Symbol) order.data ## Symbol Price ## 1: ZTS 71.51 ## 2: ZION 50.71 ## 3: ZBH 115.53 ## 4: YUM 76.30 ## 5: XYL 70.24 ## --- ## 501: ABBV 108.48 ## 502: AAPL 155.15 ## 503: AAP 109.63 ## 504: AAL 48.60 ## 505: A 65.05
Sorting data using sqldf
financials <- read.csv("constituents-financials_csv.csv") sqldf('select Sector, sum(Price), min("X52.Week.Low") from financials group by Sector order by Sector') ## Sector sum(Price) min("X52.Week.Low") ## 1 Consumer Discretionary 10418.90 13.480 ## 2 Consumer Staples 2711.98 21.175 ## 3 Energy 1852.40 6.590 ## 4 Financials 6055.81 16.530 ## 5 Health Care 8083.46 29.930 ## 6 Industrials 7831.47 30.590 ## 7 Information Technology 8347.00 15.650 ## 8 Materials 2559.67 20.250 ## 9 Real Estate 2927.52 21.530 ## 10 Telecommunication Services 100.81 27.610 ## 11 Utilities 1545.45 12.050
sqldf('select Sector, sum(Price), min("X52.Week.Low") from financials group by Sector order by Sector desc') ## Sector sum(Price) min("X52.Week.Low") ## 1 Utilities 1545.45 12.050 ## 2 Telecommunication Services 100.81 27.610 ## 3 Real Estate 2927.52 21.530 ## 4 Materials 2559.67 20.250 ## 5 Information Technology 8347.00 15.650 ## 6 Industrials 7831.47 30.590 ## 7 Health Care 8083.46 29.930 ## 8 Financials 6055.81 16.530 ## 9 Energy 1852.40 6.590 ## 10 Consumer Staples 2711.98 21.175 ## 11 Consumer Discretionary 10418.90 13.480
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.