Gini coefficient, concentration measurement: an implementation in R
[This article was first published on The Beginner Programmer, and kindly contributed to R-bloggers].
Another topic we covered in statistics class was the Gini index.
The Gini index (also called the Gini ratio or Gini coefficient) measures how concentrated a transferable quantity, such as income or shares, is among the units that hold it.
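As a reference for the R script in this post, the quantity it computes can be sketched in symbols. The notation here is an assumption chosen for illustration: k classes sorted by increasing value, N the total number of units, p_i the cumulative relative frequency and q_i the cumulative relative amount up to class i, with p_0 = q_0 = 0. The area A under the concentration curve is obtained with the trapezoid rule, and R rescales the gap between A and the area under the equality line by its largest possible value.

```latex
A = \sum_{i=1}^{k} \frac{(p_i - p_{i-1})\,(q_{i-1} + q_i)}{2},
\qquad
R = \frac{\tfrac{1}{2} - A}{\dfrac{N - 1}{2N}}
```

With this normalisation R is 0 when every unit holds the same amount and 1 when a single unit holds everything.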
For example, say you are evaluating a company and you’d like to know more about how the shares are divided among the shareholders. You could use the Gini index for that!
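To make the shareholder example concrete, here is a minimal sketch with made-up holdings. The function name `gini_units` and the numbers are hypothetical and are not part of the original script; the sketch uses the normalised ratio for ungrouped data, which compares the cumulative share of units (p) with the cumulative share of the total (q).

```r
# Hypothetical example: number of shares held by each of five shareholders.
shares <- c(50, 100, 150, 200, 500)

# Normalised Gini concentration ratio for ungrouped data.
# x: non-negative amounts, one per unit (here, one per shareholder).
gini_units <- function(x) {
  x <- sort(x)                      # order units from smallest to largest
  n <- length(x)
  p <- seq_len(n) / n               # cumulative share of units
  q <- cumsum(x) / sum(x)           # cumulative share of the total amount
  # Sum over the first n - 1 units; the last term is always zero (p = q = 1)
  sum(p[-n] - q[-n]) / sum(p[-n])
}

gini_units(shares)                  # 0.5 for these made-up holdings
gini_units(rep(100, 5))             # equal holdings: 0
gini_units(c(0, 0, 0, 0, 1000))     # one holder owns everything: 1
```

A value near 0 means the shares are spread evenly, while a value near 1 means they sit in very few hands.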
I’ve calculated the index in R using random data that you can download here. If you’d like to know more about the Gini index, check here.
Here is my simple R implementation of the index.
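# Load data: column 1 holds the values, column 2 their frequencies
tb <- read.table("C:\\b.txt", header=TRUE, sep=",")
# Add 5 new columns for analysis purposes
# (the result of cbind must be assigned back, otherwise tb is left unchanged)
tb <- cbind(tb, cumFreq=0, relCumFreq=0, amount=0, cumAmount=0, relCumAmount=0)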
# Storing the number of rows and columns
nRow <- nrow(tb)
nCol <- ncol(tb)
# Cumulative frequencies (column 3) and cumulative relative frequencies (column 4)
i <- 1
totalF <- sum(tb[,2])
while(i <= nRow)
{
  if(i == 1)
  {
    tb[1,3] <- tb[1,2]
  }else{
    tb[i,3] <- tb[i-1,3] + tb[i,2]
  }
  # Divide by the total frequency rather than a hard-coded 1000
  tb[i,4] <- tb[i,3]/totalF
  i <- i + 1
}
# Amount held by each class (column 5) and its running total (column 6)
i <- 1
while(i <= nRow)
{
  tb[i,5] <- tb[i,1]*tb[i,2]
  if(i == 1)
  {
    tb[i,6] <- tb[i,5]
  }else{
    tb[i,6] <- tb[i-1,6] + tb[i,5]
  }
  i <- i + 1
}
# Cumulative relative amounts (column 7)
totalAmount <- sum(tb[,5])
i <- 1
while(i <= nRow)
{
  tb[i,7] <- tb[i,6]/totalAmount
  i <- i + 1
}
# Show and plot the data
tb
# Line of perfect equality (green) and concentration curve (red)
a <- c(0,1)
b <- c(0,1)
x <- c(0,tb[,4])
y <- c(0,tb[,7])
# Axis labels belong in plot(); lines() does not use xlab/ylab
plot(a,b,main="Concentration",type="l",col="green",lwd=2,
     xlab="Cumulative relative frequency",ylab="Cumulative relative amount")
lines(x,y,type="b",col="red",lwd=2)
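# Calculate Gini's R concentration index from the table built above
# (column 2: frequencies, column 4: cumulative relative frequencies,
#  column 7: cumulative relative amounts)
getR <- function(mat)
{
  n <- nrow(mat)
  # Area under the concentration curve, using the trapezoid rule
  area <- 0.5*mat[1,4]*mat[1,7]
  i <- 2
  while(i <= n)
  {
    area <- area + 0.5*(mat[i,4]-mat[i-1,4])*(mat[i-1,7]+mat[i,7])
    i <- i + 1
  }
  # Largest possible concentration area for sum(mat[,2]) units
  acmax <- (sum(mat[,2])-1)/(2*sum(mat[,2]))
  # Use the argument mat throughout instead of the global tb
  R <- (0.5 - area)/acmax
  return(R)
}
# Print data
paste("Concentration index R is: ",getR(tb)*100,"%")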
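The loops above can also be written in vectorised form. What follows is only a sketch of the same trapezoid calculation, not a replacement for the script: the function name `gini_grouped` and its argument names are hypothetical, the example data is made up, and the sorting step is an addition (the script above assumes the rows are already ordered by increasing value).

```r
# Gini concentration ratio for grouped data.
# values: the distinct amounts; freq: how many units hold each amount.
# Both argument names are illustrative, not taken from the original script.
gini_grouped <- function(values, freq) {
  ord <- order(values)                    # sort classes by increasing amount
  values <- values[ord]
  freq <- freq[ord]
  n <- sum(freq)                          # total number of units
  amount <- values * freq                 # amount held by each class
  p <- cumsum(freq) / n                   # cumulative relative frequency
  q <- cumsum(amount) / sum(amount)       # cumulative relative amount
  # Trapezoid rule for the area under the concentration curve from (0, 0)
  area <- sum(diff(c(0, p)) * (c(0, q[-length(q)]) + q) / 2)
  max_area <- (n - 1) / (2 * n)           # largest possible concentration area
  (0.5 - area) / max_area
}

# Made-up data: 4 units hold 10, 3 hold 20, 2 hold 50, 1 holds 100
gini_grouped(values = c(10, 20, 50, 100), freq = c(4, 3, 2, 1))
```

Using `cumsum` and `diff` removes the index bookkeeping of the while loops, and passing the data as arguments avoids relying on a global table.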
Here are the results.
It looks like the data I used shows a 24% concentration. Cool!