How to Plot Observed and Predicted values in R

[This article was first published on finnstats, and kindly contributed to R-bloggers.]

To visualize the discrepancies between the predicted and actual values of a regression model, you can plot the model's predictions against the observed values in R.

This tutorial shows how to make this type of plot using base R and ggplot2.

Approach 1: Plot of observed and predicted values in Base R

The following code shows how to construct a plot of predicted vs. actual values after fitting a multiple linear regression model in R.

Load Library and dataset

library(mlbench)
data("BostonHousing2")
head(BostonHousing2)
        town tract      lon     lat medv cmedv    crim zn indus chas   nox    rm  age
1     Nahant  2011 -70.9550 42.2550 24.0  24.0 0.00632 18  2.31    0 0.538 6.575 65.2
2 Swampscott  2021 -70.9500 42.2875 21.6  21.6 0.02731  0  7.07    0 0.469 6.421 78.9
3 Swampscott  2022 -70.9360 42.2830 34.7  34.7 0.02729  0  7.07    0 0.469 7.185 61.1
4 Marblehead  2031 -70.9280 42.2930 33.4  33.4 0.03237  0  2.18    0 0.458 6.998 45.8
5 Marblehead  2032 -70.9220 42.2980 36.2  36.2 0.06905  0  2.18    0 0.458 7.147 54.2
6 Marblehead  2033 -70.9165 42.3040 28.7  28.7 0.02985  0  2.18    0 0.458 6.430 58.7
     dis rad tax ptratio      b lstat
1 4.0900   1 296    15.3 396.90  4.98
2 4.9671   2 242    17.8 396.90  9.14
3 4.9671   2 242    17.8 392.83  4.03
4 6.0622   3 222    18.7 394.63  2.94
5 6.0622   3 222    18.7 396.90  5.33
6 6.0622   3 222    18.7 394.12  5.21
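
Fit a multiple linear regression model

# The model must be fitted on BostonHousing2 so that predict(model) lines up
# with BostonHousing2$medv. The predictors below are an illustrative choice
# (an assumption); fitted values depend on which predictors are used.
model <- lm(medv ~ crim + rm + tax + lstat, data = BostonHousing2)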

plot predicted vs. actual values

plot(x=predict(model), y= BostonHousing2$medv,
     xlab='Predicted Values',
     ylab='Actual Values',
     main='Predicted vs. Actual Values')
abline(a=0, b=1)

The x-axis shows the model's predicted values, while the y-axis shows the dataset's actual values. The diagonal line is the reference line y = x drawn by abline(a=0, b=1): a point falls exactly on it when the predicted value equals the actual value.

The closer the points lie to this diagonal line, the better the model's predictions match the data. Here most points fall reasonably close to it, which suggests that the regression model fits the data reasonably well.
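
How close the points sit to the diagonal can also be summarised with a number. The following is a minimal sketch, assuming the built-in mtcars data and an illustrative model (neither comes from this post): the vertical distance of each point from the y = x line is the residual, and the root mean squared error (RMSE) gives the typical size of that distance.

```r
# Illustrative model on the built-in mtcars data (an assumption, not the
# Boston housing model used in this post)
fit <- lm(mpg ~ wt + hp, data = mtcars)

predicted <- predict(fit)   # fitted values for the training rows
actual    <- mtcars$mpg

# Vertical distance of each point from the y = x line
discrepancy <- actual - predicted

# For a model fitted with lm(), these distances are the model residuals
all.equal(unname(discrepancy), unname(residuals(fit)))

# Root mean squared error: typical size of a discrepancy, in units of mpg
rmse <- sqrt(mean(discrepancy^2))
rmse
```

A smaller RMSE corresponds to points that hug the diagonal more tightly in the plot.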

We can also make a data frame that displays the actual and predicted values for each data point:

data <- data.frame(actual = BostonHousing2$medv, predicted = predict(model))

View the first rows of the data frame

head(data)

  actual predicted
1   24.0  25.92488
2   21.6  24.62709
3   34.7  31.03788
4   33.4  29.46740
5   36.2  30.70795
6   28.7  24.70194
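
A table like this can be extended to show which observations the model predicts worst. The sketch below is only an illustration under stated assumptions: it uses the built-in mtcars data and an arbitrary model rather than the Boston housing model, and the column name difference and the objects results and worst are hypothetical names chosen for this example.

```r
# Illustrative model on built-in data (an assumption, not this post's model)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Actual and predicted values side by side
results <- data.frame(actual = mtcars$mpg, predicted = predict(fit))

# Signed gap between actual and predicted value for each row
results$difference <- results$actual - results$predicted

# Sort so the rows with the largest absolute gap come first
worst <- results[order(abs(results$difference), decreasing = TRUE), ]
head(worst, 3)
```

The rows at the top of worst are the points that sit farthest from the diagonal line in the predicted vs. actual plot.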

Approach 2: Plot of Predicted vs. Observed Values in ggplot2

Using the ggplot2 data visualization package, the following code explains how to make a plot of predicted vs. actual values.

library(ggplot2)

plot predicted vs. actual values

ggplot(data, aes(x=predicted, y=actual)) +
  geom_point() +
  geom_abline(intercept=0, slope=1) +
  labs(x='Predicted Values', y='Actual Values', title='Predicted vs. Actual Values')

The predicted values from the model are displayed on the x-axis, while the actual values from the dataset are displayed on the y-axis.
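
To make the discrepancies themselves visible, each point can be connected to the diagonal with a vertical segment. This is a hedged sketch rather than part of the original tutorial: it assumes the built-in mtcars data and an illustrative model, and the object names fit, results and p are hypothetical.

```r
library(ggplot2)

# Illustrative model on built-in data (an assumption, not this post's model)
fit <- lm(mpg ~ wt + hp, data = mtcars)
results <- data.frame(actual = mtcars$mpg, predicted = predict(fit))

p <- ggplot(results, aes(x = predicted, y = actual)) +
  # Vertical segment from each point down or up to the y = x line:
  # at x = predicted, the line has height y = predicted
  geom_segment(aes(xend = predicted, yend = predicted), colour = "grey60") +
  geom_point() +
  geom_abline(intercept = 0, slope = 1) +
  labs(x = "Predicted Values", y = "Actual Values",
       title = "Predicted vs. Actual Values")
p
```

The length of each segment is the absolute difference between the actual and predicted value for that observation.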

The post How to Plot Observed and Predicted values in R appeared first on finnstats.
