Prediction model with HANA and R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
These days, I have been reading and playing a lot with R, and I really come to love it…of course, I don’t have a clue on those weird statistics formulas, but it doesn’t mean I can’t use R and try do some awesome stuff with it.
So, yesterday I was thinking about doing another integration between HANA and R, my new adopted kids, so I came with the idea of building a prediction model for a flight company. I followed this steps.
1.- First, I need to choose a table, so I picked SNVOICE:
This table offers us, the carrier id, the date and book id, meaning the amount of tickets sold in a particular day. And from here when can do some calculation and determine how many tickets were sold in each month of a particular year.
2.- I needed a table to store my new information, so I created the table TICKETS_BY_YEAR:
3.- I needed a Procedure script to analyse the table, determine the total amount per day of the month and then gave a grand total per month.
CREATE PROCEDURE GetTicketsByMonth (IN var_year NVARCHAR(4),IN var_carrid NVARCHAR(2)) LANGUAGE SQLSCRIPT AS v_found NVARCHAR(2) := 1; sum_bookid INT; v_date NVARCHAR(8) := ''; BEGIN TT_MONTH = select fldate, count(bookid) as "BOOKID" from sflight.snvoice where year(fldate) = VAR_YEAR and carrid = VAR_CARRID group by fldate order by fldate asc; v_date := (:var_year * 10000) + 101; while :v_found <= 12 do select sum(bookid) into sum_bookid from :TT_MONTH where month(fldate) = :v_found; insert into TICKETS_BY_YEAR values(v_date,sum_bookid); v_date := :v_date + 100; v_found := :v_found + 1; end while; END;
4.- Of course...I needed to call my Procedure...
CALL P075400.GetTicketsByMonth('2011','''AA''');
5.- Once finished, I checked my table to see if everything worked as expected...
6.- After realizing that my data was nice and clean, I exported to an .CSV file (Sorry...no pics this time...I already post it in a previous blog)
7.- I went to my R Studio and start coding...
Flight_Tickets=read.csv(file="Flight_Tickets.csv",header=TRUE) period=Flight_Tickets$PERIOD tickets=Flight_Tickets$TICKETS var_year=substr(period[1],1,4) var_year=as.integer(var_year) var_year=var_year+1 var_year=as.character(var_year) new_period=gsub("^\\d{4}",var_year,period) next_year=data.frame(year=new_period,StringsAsFactors=FALSE) prt.lm=lm(tickets ~ period) pred=predict(prt.lm,next_year,interval="none") plot(tickets,type="b", col="red", main="Annual Tickets Sale", xlab="Months",ylab="Tickets") lines(pred,type="b",col="blue") legend("bottomleft",inset=.05,title="Real vs. Predicted", c("Real","Predicted"), lty=c(1,1),col=c("red","blue"))
8.- I watch my generated graphic showing the real tickets sale vs. the predicted tickets sale. The real is for every month of 2011 and the predicted for every month of 2012.
![](https://i2.wp.com/3.bp.blogspot.com/-9Yew7nJVl1A/TxGAmu-RGKI/AAAAAAAABsU/FOhGlP-q448/s400/Annual_Tickets_Sale.jpeg?w=578)
9.- Nothing to do here...it's done -:)
10.- See you next time with more HANA, R or another nice technology.
Greetings,
Blag.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.