Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
infomation_schema.TABLES contains the following fields: TABLE_SCHEMA, TABLE_NAME, TABLE_ROWS, AVG_ROW_LENGTH and MAX_DATA_LENGTH (and a few others). We can first have a look at the relative sizes of the schemas with the MySQL query “SELECT TABLE_SCHEMA,SUM(DATA_LENGTH) SCHEMA_LENGTH FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema' GROUP BY TABLE_SCHEMA“.
1 2 3 4 5 6 7 8 9 10 11 12 13 | library("ggplot2") # You'll need ggplot2 0.9 for this. library("reshape2") library("RMySQL") connection<-dbConnect(MySQL(), user="username", password="XXXXXX",host="127.0.0.1",port=3306,dbname='') query<-"SELECT TABLE_SCHEMA,SUM(DATA_LENGTH) SCHEMA_LENGTH FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema' GROUP BY TABLE_SCHEMA" result<-dbGetQuery(connection,query) result$TABLE_SCHEMA<-reorder(result$TABLE_SCHEMA,result$SCHEMA_LENGTH) p<-ggplot(result)+geom_bar(aes(x=TABLE_SCHEMA,y=SCHEMA_LENGTH))+coord_flip() p<-p+xlab("Size")+ylab("") p<-p+opts(title="Schemas' size") print(p) |
And for the whole overview, let’s break each schema down by tables:
1 2 3 4 5 6 7 | query<-"SELECT TABLE_SCHEMA,TABLE_NAME,TABLE_ROWS,DATA_LENGTH FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema'" result<-dbGetQuery(connection,query) result<-within(result,TABLE_NAME<-factor(TABLE_NAME,levels=sort(TABLE_NAME,decreasing=TRUE))) p<-ggplot(result)+geom_bar(aes(x=TABLE_NAME,y=DATA_LENGTH))+coord_flip()+facet_wrap(~TABLE_SCHEMA,scales='free') p<-p+xlab("Size")+ylab("") p<-p+opts(title="Tables' size") print(p) |
Also, using the AVG_ROW_LENGTH and MAX_DATA_LENGTH and assuming a relatively constant row length, we can derive the maximum number of rows that a table can use, which gives us an estimate of how much space there is left:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | query<-"SELECT TABLE_SCHEMA,TABLE_NAME,100*TABLE_ROWS/FLOOR(MAX_DATA_LENGTH/AVG_ROW_LENGTH) AS USED FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema'" # query<-"SELECT TABLE_SCHEMA,TABLE_NAME,RAND(42)*100 AS USED FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema'" result<-dbGetQuery(connection,query) result$LEFTOVER<-100-result$USED result<-within(result,TABLE_NAME<-factor(TABLE_NAME,levels=sort(TABLE_NAME,decreasing=TRUE))) result<-melt(result,id.vars=c("TABLE_SCHEMA","TABLE_NAME"),variable.name='TYPE',value.name='PROPORTION',na.rm=TRUE) p<-ggplot(result) p<-p+geom_bar(aes(x=TABLE_NAME,y=PROPORTION,fill=TYPE),stat='identity') p<-p+coord_flip()+facet_wrap(~TABLE_SCHEMA,scales='free') p<-p+scale_fill_manual(values=c("USED"='#DD0000',LEFTOVER='#AAAAAA')) p<-p+xlab('')+ylab('')+opts(title="Tables' usage") print(p) query<-"SELECT TABLE_SCHEMA, MAX(100*TABLE_ROWS/FLOOR(MAX_DATA_LENGTH/AVG_ROW_LENGTH)) AS USED FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema' GROUP BY TABLE_SCHEMA" # query<-"SELECT TABLE_SCHEMA, MAX(100*RAND(42)) AS USED FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema' GROUP BY TABLE_SCHEMA" result<-dbGetQuery(connection,query) result$LEFTOVER<-100-result$USED result$TABLE_SCHEMA<-reorder(result$TABLE_SCHEMA,result$USED) result<-melt(result,id.vars=c("TABLE_SCHEMA"),variable.name='TYPE',value.name='PROPORTION',na.rm=TRUE) p<-ggplot(result) p<-p+geom_bar(aes(x=TABLE_SCHEMA,y=PROPORTION,fill=TYPE),stat='identity') p<-p+coord_flip() p<-p+scale_fill_manual(values=c("USED"='#DD0000',LEFTOVER='#AAAAAA')) p<-p+xlab("")+ylab("")+opts(title="Largest Usage") print(p) dbDisconnect(connection) |
About dbConnect(): I left it here to make things easier to replicate but I normally call a simple function which is just a wrapper for it, with my username and password in. This way my credentials are in one single place instead of all over my scripts.
PS: This is my first anniveRsary! I’ve been using R for a year now. And I’m certainly planning to carry on.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.