[This article was first published on My Paper Trades, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
My thirst for statistics has been increasing. IV had another requirement, which would eventually be useful to me as well. He currently downloads FII and DII buy and sell values and its impact on Nifty manually in Excel. He suggested me to try and automate this process in R. Wow! Some more learning of R, which would eventually help me in building my strategies!
Unlike in NSE EOD Bhavcopy and BSE EOD Bhavcopy downloaders, I tried a different approach to download FII + DII stats from NSEIndia website (as if I had a choice), along with the index Nifty values and change over previous day. The National Stock Exchange of India has different structures for different pages. To download index values, you have to refer to the link http://www.nseindia.com/content/indices/ind_histvalues.htm whereas for FII and DII stats, you need to visit http://www.nseindia.com/content/equities/eq_fiidii_archives.htm.
Both of these pages are HTML forms, you have to enter parameters like index name, start and end dates. Another challenge is that each of these pages displays a maximum of 100 rows in table and if you need all rows (more than 100), you have to download a csv (dynamically generated in temp location). I faced problem in accessing the csv, as for older dates, the link generated was not valid. Hence, I decided to read the page itself, parse the table to be consumed as data frame. For instances, where the parameters generate more than 100 rows, I decided to use while loop.
OK, now here is the solution for it.
I am using the packages RHTMLForms, RCurl, and XML. While RCurl and XML packages are available at R repositories, RHTMLForms package is maintained by omegahat, and can be installed by using the following command
install.packages(‘RHTMLForms’, repos = “http://www.omegahat.org/R”)
Another peculiar problem faced is that NSE India website has enforced check for useragent, which if not specified explicitly in R, would not allow access to the desired data.
This can be achieved by defining RCurl settings before loading the library. The command is
options(RCurlOptions = list(useragent = “R”))
library(RCurl)
NOTE: 9-Aug-2011, I have modified the code a bit to enable this script with optional input parameters, like Start Date as 16-April-2007, from which both FII and DII stats are available, along with End Date as current date. Another change that has been made is to check, if file already exist; if it does not exist, the script creates the file, if it exists, it reads the last entry in the file, and starts downloading records beyond that to current date.
outputDir = “D:\\FII Stats”
filename = “FII-DII-Nifty.csv”
filename = “FII-DII-Nifty.csv”
startDate = as.Date(“2007-04-16″, order=”ymd”)
endDate =Sys.Date()
The code can be downloaded from here.
#install.packages('RHTMLForms', repos = "http://www.omegahat.org/R") library(RHTMLForms) options(RCurlOptions = list(useragent = "R")) library(RCurl) library(XML) library(timeDate) blnfileExist=FALSE ###################################################################### # User Input outputDir = "D:\\FII Stats" filename = "FII-DII-Nifty.csv" ###################################################################### ###################################################################### # Optional User Input startDate = as.Date("2007-04-16", order="ymd") endDate =Sys.Date() ###################################################################### # If file exists, read the file to retrieve dates for # which information already exists if (file.exists(file.path(outputDir, filename))) { existingStats <- read.csv(file.path(outputDir, filename), header=TRUE, sep=",") # Read Last date in the csv and add 1 to begin with next date range startDate = end(as.timeDate(existingStats$Date)) startDate = as.Date(startDate)+1 blnfileExist = TRUE }else { # User defined startDate existingStats <- NULL blnfileExist = FALSE } # Read the HTML page since we cannot use htmlParse() directly # as it does not specify the user agent or an # Accept:*.* urlNifty <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"; urlFIIDIIEq <- "http://www.nseindia.com/content/equities/eq_fiidii_archives.htm" contentNifty = getURLContent(urlNifty) contentFIIDIIEq = getURLContent(urlFIIDIIEq) # Now that we have the page, parse it and use the RHTMLForms # package to create an R function that will act as an interface # to the form. docNifty = htmlParse(contentNifty, asText = TRUE) docFIIDIIEq = htmlParse(contentFIIDIIEq, asText = TRUE) # need to set the URL for this document since we read it from # text, rather than from the URL directly docName(docNifty) = urlNifty docName(docFIIDIIEq) = urlFIIDIIEq # Create the form description and generate the R # function "call" the formNifty = getHTMLFormDescription(docNifty)[[1]] funNifty = createFunction(formNifty) formFIIDIIEq = getHTMLFormDescription(docFIIDIIEq)[[1]] funFIIDIIEq = createFunction(formFIIDIIEq) # now we can invoke the form from R. We only need 2 # inputs - FromDate and ToDate #Since the NSE URLs only display a maximum of 100 records, # we shall attempt the range in loop myStDt = startDate while (myStDt <= endDate){ if (endDate<= myStDt + 50){ myEnDt = endDate }else { myEnDt = myStDt + 50 } print(paste("Downloading from",myStDt,"to", myEnDt )) Nifty = funNifty(fromDate = as.character(myStDt-5, "%d-%m-%Y"), toDate = as.character(myEnDt, "%d-%m-%Y"), indexType="S&P CNX NIFTY") FIIDIIEq = funFIIDIIEq (fromDate = as.character(myStDt, "%d-%m-%Y"), toDate = as.character(myEnDt, "%d-%m-%Y"), category="all") # Having looked at the tables, I think we want the the 4th one. tableNifty = readHTMLTable(htmlParse(Nifty, asText = TRUE), which = 4, skip.rows = 3, trim=TRUE, as.data.frame = TRUE, header = TRUE, stringsAsFactors = FALSE) # Having looked at the tables, I think we want the the 4th one. tableFIIDIIEq = readHTMLTable(htmlParse(FIIDIIEq, asText = TRUE), which = 4, trim=TRUE, as.data.frame = TRUE, header = TRUE, stringsAsFactors = FALSE) #Select only FII stats #Format Date Column from string to Date type tableFIIDIIEq$Date <- as.Date(tableFIIDIIEq$Date, format="%d-%b-%Y") #Order by Date in Ascending Order tableFIIDIIEq<-tableFIIDIIEq[order(tableFIIDIIEq$Date,decreasing = TRUE),] dfFIIEq <-subset(tableFIIDIIEq, Category=="FII") colnames(dfFIIEq)[3] <- "FIIBuyValue" colnames(dfFIIEq)[4] <- "FIISellValue" colnames(dfFIIEq)[5] <- "FIINetValue" #Select only DII stats dfDIIEq <-subset(tableFIIDIIEq, Category=="DII") colnames(dfDIIEq)[3] <- "DIIBuyValue" colnames(dfDIIEq)[4] <- "DIISellValue" colnames(dfDIIEq)[5] <- "DIINetValue" #Merge FII and DII Stats dfFIIDIIEq<-merge(dfFIIEq,dfDIIEq, by.x="Date", by.y="Date") #Convert into numeric for addition dfFIIDIIEq$DIINetValue <- as.numeric(dfFIIDIIEq$DIINetValue) dfFIIDIIEq$FIINetValue <- as.numeric(dfFIIDIIEq$FIINetValue) # Find Effective Inflow dfFIIDIIEq$EffectiveInflow <- dfFIIDIIEq$FIINetValue + dfFIIDIIEq$DIINetValue #Process Nifty dfNifty <- tableNifty colnames(dfNifty)[6] <- "Volume" colnames(dfNifty)[7] <- "TurnoverInCr" #Format Date Column from string to Date type dfNifty$Date <- as.Date(dfNifty$Date, format="%d-%b-%Y") #Format CLose Column from string to Numeric dfNifty$Close <- as.numeric(dfNifty$Close) #Order by Date in Ascending Order dfNifty <-dfNifty[order(dfNifty$Date,decreasing = TRUE),] #Create a function to find out Change in Nifty Closing over previous day FUN=function(a) c(diff(a), NA) dfNifty$Change<- FUN(dfNifty$Close)*-1 #Store the merged dataframe in temp and then join it with earlier results temp <- merge(dfNifty,dfFIIDIIEq, by.x="Date", by.y="Date") if (startDate == myStDt){ dfNiftyFIIDIIEq = temp }else { dfNiftyFIIDIIEq <- rbind(dfNiftyFIIDIIEq,temp) } myStDt = myStDt + 51 } closeAllConnections() ###################################################################### #Get Rid of unwanted columns dfNiftyFIIDIIEq$Category.x <- NULL dfNiftyFIIDIIEq$Category.y <- NULL # dfNiftyFIIDIIEq$Open <- NULL # dfNiftyFIIDIIEq$High <- NULL # dfNiftyFIIDIIEq$Low <- NULL # dfNiftyFIIDIIEq$Volume <- NULL # dfNiftyFIIDIIEq$TurnoverInCr<- NULL # If file exists, merge the retrieved records if (blnfileExist) { print("Appending to existing file") #existingStats$Date <- as.Date(existingStats$Date, format = "%d-%m-%Y") dfNiftyFIIDIIEq$Date <- as.Date(dfNiftyFIIDIIEq$Date, format = "%d-%m-%Y") dfNiftyFIIDIIEq <- rbind(dfNiftyFIIDIIEq,existingStats) dfNiftyFIIDIIEq <- dfNiftyFIIDIIEq[order(dfNiftyFIIDIIEq$Date, decreasing=TRUE),] }else { print("Writing to a new file") dfNiftyFIIDIIEq <- dfNiftyFIIDIIEq[order(dfNiftyFIIDIIEq$Date, decreasing=TRUE),] } #Finally write the csv file write.csv(dfNiftyFIIDIIEq ,file=file.path(outputDir, filename),row.names = FALSE)
Next steps, add more indices in this list and try to figure out any co-relation.
To leave a comment for the author, please follow the link and comment on their blog: My Paper Trades.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.