Exploratory data analysis on P/E ratio of Indian Stocks
[This article was first published on meet Saptarsi, and kindly contributed to R-bloggers].
The price-to-earnings ratio (P/E) is one of the most popular ratios reported for stocks. Put simply, it is the current market price divided by the earnings per share. An operational definition of earnings per share is total profit divided by the number of shares. I will redirect interested readers for further reading to
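As a minimal worked sketch of the definition above, the snippet below computes the earnings per share and the P/E ratio for an imaginary company. All figures and variable names are hypothetical and chosen only to keep the arithmetic easy to follow.

```r
# Hypothetical figures for an imaginary company (not real market data)
total_profit <- 5000000   # total profit for the period
shares       <- 1000000   # number of shares
market_price <- 100       # current market price per share

# Earnings per share = total profit / number of shares
eps <- total_profit / shares

# P/E = current market price / earnings per share
pe_ratio <- market_price / eps

print(eps)
print(pe_ratio)
```

With these made-up numbers the earnings per share come to 5 and the P/E ratio to 20, i.e. the market pays 20 units for every unit of annual earnings.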
In this post, I would just like to show how we can grab P/E data from the web and create some visualizations of it. My focus right now is Indian stocks, and I intend to use the below website
So my first step is gearing up for the data extraction, which is the most non-trivial task. As shown in the figure below, there are separate pages for each sector, and we need to follow the individual links to reach each page and get the P/E ratios.
Here is something I did outside R: I created a CSV file with the sector names, using delimiters while importing the text and then Paste Special with Transpose. Here is how my CSV file looks. I would never discourage using multiple tools, as this is often required to solve real-world problems.
So now I can import this into a dataset, read one row at a time, and go to the necessary URLs. But God had different plans :) and it is not that straightforward.
Case 1: Single-word sector names
We have the sector ‘Banks’, and the sector link is as below
Again, it is a no-brainer: we can pick up the base URL, append the sector name after a forward slash, and then append the string ‘-Sector’. This is true for most single-word sector names like ‘FMCG’, ‘Tyres’, ‘Healthcare’, etc.
Case 2: Multiple words without ‘-’, ‘&’ or ‘/’
We have the sector ‘Tobacco Products’, and the sector link is as below
This is also not that difficult: apart from adding ‘-Sector’, we need to replace the spaces with a ‘-’.
Case 3: Multiple words with a ‘-’
We have the sector name ‘IT-Software’, where we have to remove any other spaces if they exist. There can be several other cases, but for the sake of discussion I will limit myself to a few.
Case 4: Multiple words with a ‘/’
We have the sector name ‘Stock/ Commodity Brokers’, so the ‘/’ needs to be removed.
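The four cases above can be sketched as one small helper. This is only an illustration under my reading of the rules: the function name sector_to_slug is made up, ‘Oil & Gas’ and the spaced ‘IT - Software’ are hypothetical inputs, and the resulting slugs have not been checked against the live site.

```r
# Hypothetical helper: turns a sector name into the last part of its URL
sector_to_slug <- function(sector) {
  slug <- gsub("&", "and", sector, fixed = TRUE)  # '&' becomes 'and'
  slug <- gsub("/", "", slug, fixed = TRUE)       # Case 4: drop '/'
  slug <- gsub(",", "", slug, fixed = TRUE)       # drop commas
  slug <- gsub(" ", "-", slug, fixed = TRUE)      # Case 2: spaces become '-'
  slug <- gsub("-+", "-", slug)                   # Case 3: collapse runs of '-'
  paste0(slug, "-Sector")                         # Case 1: append the suffix
}

baseurl <- "http://www.indiainfoline.com/MarketStatistics/PE-Ratios/"
sectors <- c("Banks", "Tobacco Products", "IT - Software",
             "Stock/ Commodity Brokers", "Oil & Gas")
slugs <- sapply(sectors, sector_to_slug, USE.NAMES = FALSE)
print(paste0(baseurl, slugs))
```

Because gsub leaves a string untouched when nothing matches, the helper needs no separate check for each case; every substitution is simply applied in turn.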
# Packages used below: XML provides readHTMLTable, RCurl provides url.exists
library(XML)
library(RCurl)
# Reading in the dataset
sectorsv1 <- read.csv("C:/Users/user/Desktop/Datasets/sectorsv1.csv")
# Converting to a matrix; this is a practice I generally follow
sectorvm <- as.matrix(sectorsv1)
# We can access individual sectors with sectorvm[rowno, colno]
pe <- c()
cname <- c()
cnt <- 0
baseurl <- 'http://www.indiainfoline.com/MarketStatistics/PE-Ratios/'
for (i in 1:nrow(sectorvm))
{
  securl <- sectorvm[i, 1]
  # fixed=TRUE indicates that the string is to be matched as is and is not a regular expression
  # The substitutions follow the cases explained above; note that we use gsub instead of sub,
  # otherwise only the first instance would be replaced
  if (length(grep(' ', securl, fixed=TRUE)) != 1)
  {
    # No space in the name: only the suffix is needed
    securl <- paste(securl, '-Sector', sep="")
  }
  else
  {
    securl <- gsub(' ', '-', securl, fixed=TRUE)
    if (length(grep('---', securl, fixed=TRUE)) == 1)
    {
      # A hyphen surrounded by spaces has now become '---'; collapse it to a single '-'
      securl <- gsub('---', '-', securl, fixed=TRUE)
    }
    if (length(grep('&', securl, fixed=TRUE)) == 1)
    {
      securl <- gsub('&', 'and', securl, fixed=TRUE)
    }
    if (length(grep('/', securl, fixed=TRUE)) == 1)
    {
      securl <- gsub('/', '', securl, fixed=TRUE)
    }
    if (length(grep(',', securl, fixed=TRUE)) == 1)
    {
      securl <- gsub(',', '', securl, fixed=TRUE)
    }
    securl <- paste(securl, '-Sector', sep="")
  }
  fullurl <- paste(baseurl, securl, sep="")
  print(fullurl)
  if (url.exists(fullurl))
  {
    petbls <- readHTMLTable(fullurl)
    # Exploring the tables, we found the relevant information in table 2
    # The data is stored as a factor, so a plain as.numeric will not suffice;
    # we need an as.character first and then an as.numeric
    pe <- c(pe, as.numeric(as.character(petbls[[2]]$PE)))
    cname <- c(cname, as.character(petbls[[2]]$Company))
    cnt <- cnt + 1
  }
}
The functions we have used are explained below
readHTMLTable -> Given a URL, this function retrieves the contents of the <table> tags of the HTML page as a list of tables. We need to pick the appropriate table number; in this case we used table no. 2.
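The comments in the code above mention that a plain as.numeric is not enough when the scraped column is stored as a factor. Here is a small sketch of why, using a made-up factor rather than the real table; the variable names are only for illustration.

```r
# A made-up column as it might come back from an HTML table: numbers stored as a factor
pe_factor <- factor(c("20.09", "587.5", "3.2"))

# as.numeric on a factor returns the internal level codes, not the values
wrong <- as.numeric(pe_factor)

# Converting to character first recovers the actual numbers
right <- as.numeric(as.character(pe_factor))

print(wrong)
print(right)
```

The first result contains only small integer codes that say which level each entry belongs to, while the second contains the P/E values we actually want.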
n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
1797 | 59.71 | 76.92 | 20.09 | 46.64 | 29.79 | 0 | 587.5 | 587.5 | 2.15 | 7.25 | 1.81 |
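Some of the columns in the summary table above can be cross-checked with base R alone. The sketch below assumes that se is the standard error of the mean (sd divided by the square root of n) and that range is max minus min; the toy vector x is made up and is not the scraped data.

```r
# Values copied from the summary table above
n  <- 1797
s  <- 76.92    # sd
mn <- 0        # min
mx <- 587.5    # max

se  <- s / sqrt(n)   # assumed definition: standard error of the mean
rng <- mx - mn       # assumed definition: max minus min

print(round(se, 2))
print(rng)

# The same kind of summary on a small made-up vector
x <- c(5, 12, 20, 35, 150)
toy <- c(n = length(x), mean = mean(x), sd = sd(x), median = median(x),
         min = min(x), max = max(x), range = diff(range(x)),
         se = sd(x) / sqrt(length(x)))
print(round(toy, 2))
```

Both recomputed values agree with the table, and the large gap between the mean (59.71) and the median (20.09) already hints at a strong right skew.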
hist(pe,col='blue',main='P/E Distribution')
We get the below histogram for the P/E ratio, which shows that it is nowhere near a normal distribution, with its peakedness and skew, as confirmed by the summary statistics as well.
We will nevertheless do a normality test
shapiro.test(pe)
Shapiro-Wilk normality test
data: pe
W = 0.7496, p-value < 2.2e-16
Basically, the null hypothesis is that the values come from a normal distribution. We see that the p-value is extremely small (below 2.2e-16), and hence we can easily reject the null.
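To get a feel for how the test behaves, here is a small sketch on simulated data rather than the scraped P/E values. The two samples, their parameters, and the seed are arbitrary assumptions made only for illustration.

```r
set.seed(42)

# A right-skewed sample (exponential) and a roughly bell-shaped one (normal)
skewed <- rexp(500, rate = 1 / 60)
normal <- rnorm(500, mean = 60, sd = 10)

res_skewed <- shapiro.test(skewed)
res_normal <- shapiro.test(normal)

# A W statistic close to 1 is consistent with normality; a tiny p-value rejects it
print(res_skewed$statistic)
print(res_skewed$p.value)
print(res_normal$statistic)
print(res_normal$p.value)
```

The skewed sample gives a much lower W and a vanishingly small p-value, which is the same pattern we see for the P/E ratios.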
Drawing a box plot on the P/E ratios
boxplot(pe,col='blue')
Finding the outliers
boxplot.stats(pe)$out
484.33 327.91 587.50
cname[which(pe %in% boxplot.stats(pe)$out)]
[1] "Bajaj Electrical" "BF Utilities" "Ruchi Infrastr."
Of course, no prizes for guessing that we should stay out of these stocks.
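For readers wondering how boxplot.stats decides what an outlier is: by default it flags points lying more than 1.5 times the spread between the hinges beyond either hinge. A minimal sketch on a made-up vector follows; the values and the stock labels are hypothetical.

```r
# Made-up P/E values with one obviously extreme entry
x <- c(10, 12, 14, 15, 18, 20, 22, 25, 300)
names_x <- paste("Stock", LETTERS[1:9])

stats <- boxplot.stats(x)        # default coef = 1.5
hinges <- stats$stats[c(2, 4)]   # lower and upper hinge

# Fences: 1.5 times the hinge spread beyond each hinge
fence_lo <- hinges[1] - 1.5 * diff(hinges)
fence_hi <- hinges[2] + 1.5 * diff(hinges)

out <- stats$out                  # values outside the fences
out_names <- names_x[x %in% out]  # same lookup as used for cname above

print(c(fence_lo, fence_hi))
print(out)
print(out_names)
```

Here the hinges are 14 and 22, so the fences sit at 2 and 34, and only the value 300 is reported as an outlier.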
So, to summarize, this is a kind of exploratory data analysis on the P/E ratios of Indian stocks
· We saw that we can get content out of URLs and HTML tables
· We added them in a data frame
· We looked at summary statistics and a histogram, and did a normality test
· We plotted a box plot and found the outliers