R Weekly Bulletin Vol – IV
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This week’s R bulletin will cover topics like removing duplicate rows, finding row number, sorting a data frame in the same order, sorting a data frame in different order, and creating two tabs in an excel workbook. We will also cover functions like Sys.time, julian, and the second & minute function from the lubridate package. Hope you like this R weekly bulletin. Enjoy reading!
Shortcut Keys
1. To open a file in R – Ctrl+O
2. To open a new R script – Ctrl+Shift+N
3. To save an R script – Ctrl+S
Problem Solving Ideas
Remove duplicate rows in R
Consider the following data frame “df” comprising of 3 rows of a 1-minute OHC of a stock.
Example:
open = c(221, 221.25, 221.25) high = c(221.75, 221.45, 221.45) close = c(221.35, 221.2, 221.2) df = data.frame(open, high, close) print(df)
As can be seen from the output, the 2nd and the 3rd rows are duplicates. If we want to retain only the unique rows, we can use the duplicated function in the following manner:
df[!duplicated(df), ]
This will remove the 3rd duplicate row, and retain the first 2 rows. In case we want the duplicate row, we can do so in the following way:
df[duplicated(df), ]
Return row number for a particular value in a column
To find the row number for a particular value in a column in a vector/data frame we can use the “which” function.
Example 1:
day = c("Mon", "Tue", "Wed", "Thurs", "Fri", "Sat", "Sun") row_number = which(day == "Sat") print(row_number)[1] 6
Example 2: Consider a 2 column data frame “df” comprising of the stock symbols and their respective closing price for the day. To find the row number corresponding to the HCL stock we call the “which” function on the Ticker column with its value selected as HCL.
Ticker = c("INFY", "TCS", "HCL") ClosePrice = c(2021, 2294, 910) data = data.frame(Ticker, ClosePrice) row_number = which(data$Ticker == "HCL") print(row_number)[1] 3
Sorting a data frame by two columns in the same order
To sort a data frame by two columns in the same order, we can use the “order” function and the “with” function. Consider a data frame comprising of stock symbols, their categorization, and the percentage change in price. We first sort the data frame based on category and then sort based on the percentage change in price.
The order function by default sorts in an ascending manner. Hence to sort both the columns in descending order we keep the decreasing argument as TRUE.
Example – Sorting a data frame by two columns
# Create a data frame Ticker = c("INFY", "TCS", "HCLTECH", "SBIN") Category = c(1, 1, 1, 2) Percent_Change = c(2.3, -0.25, 0.5, 0.25) df = data.frame(Ticker, Category, Percent_Change) print(df)
# Sorting by Category column first and then the Percent_Change column: df_sort = df[with(df, order(Category, Percent_Change, decreasing = TRUE)), ] print(df_sort)
Sorting a data frame by two columns in different order
To sort a data frame by two columns in different order, we can use the “order” function along with the “with” function.
Consider a data frame comprising of stock symbols, their categorization, and the percentage change in price. Assume that we want to sort first in an ascending order by column “Category”, and then by column “Percent_Change” in a descending order.
The order function by default sorts in an ascending manner. Hence, to sort the “Category” column we mention it as the first variable in the order function without prepending it with any sign. To sort the “Percent_Change” column in a descending order we prepend it with a negative sign.
Example – Sorting a data frame by two columns
# Create a data frame Ticker = c("INFY", "TCS", "HCLTECH", "SBIN") Category = c(1, 1, 1, 2) Percent_Change = c(2.3, -0.25, 0.5, 0.25) df = data.frame(Ticker, Category, Percent_Change) print(df)
# Sort by Category column first and then the Percent_Change column: df_sort = df[with(df, order(Category, -Percent_Change)), ] print(df_sort)
Creating two tabs in the output excel workbook
At times we want to write & save the multiple results generated after running our R script in an excel workbook on separate worksheets. To do so, we can make use of the “append” argument in the write.xlsx function.To write to an excel file, one must first install and load the xlsx package in R.
Example: In this example, we are creating two worksheets in the “Stocks.xlsx” workbook. In the worksheet named, “Top Gainers”, we save the table_1 output, while in the “Top Losers” worksheet we save the table_2 output. To create a second worksheet in the same workbook, we keep the append argument as TRUE in the second line.
write.xlsx(table_1, "Stocks.xlsx", sheetName = "Top Gainers", append = FALSE) write.xlsx(table_2, " Stocks.xlsx", sheetName = "Top Losers", append = TRUE)
Functions Demystified
Sys.time function
The Sys.time function gives the current date and time.
Example:
date_time = Sys.time() print(date_time)[1] “2017-04-15 16:25:38 IST”
The function can be used to find the time required to run a code by placing this function at the start and the end of the code. The difference between the start and the end will give the time taken to execute the code.
julian function
The julian function is used to extract the Julian date, which is the number of days since January 1, 1970. Given a date, the syntax of the function is given as:
julian(date)
Example:
date = as.Date("2010-03-15") julian(date)[1] 14683
attr(,”origin”)
[1] “1970-01-01”
Alternatively one can use the as.integer function to get the same result
second and minute functions
These functions are part of the lubridate package. The second function retrieves/sets the second component of a date-time object, while the minute function retrieves/sets the minute component. It allows Date-time objects like those belonging to the POSIXct, POSIXlt, Date, zoo, xts, and the timeSeries objects.
Example:
library(lubridate) # Retrieving the seconds We have used the ymd_hms function to parse the given object. x = ymd_hms("2016-06-01 12:23:45") second(x)[1] 45
minute function:
library(lubridate) # Retrieving the minute from a date-time object x = ymd_hms("2016-06-01 12:23:45") minute(x)[1] 23
# Retrieving the minute from a time object. We have used the hms function to parse the given object. x = hms("15:29:06") minute(x)[1] 29
Next Step
We hope you liked this bulletin. In the next weekly bulletin, we will list more interesting ways and methods plus R functions for our readers.
The post R Weekly Bulletin Vol – IV appeared first on .
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.