Unveiling Car Specs with Multidimensional Scaling in R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Introduction
Visualizing similarities between data points can be tricky, especially when dealing with many features. This is where multidimensional scaling (MDS) comes in handy. It allows us to explore these relationships in a lower-dimensional space, typically 2D or 3D for easier interpretation. In R, the cmdscale()
function from base R and is a great tool for performing classical MDS.
cmdscale()
Here’s a breakdown of its arguments:
distance_matrix
: This is the key argument. It represents a matrix containing the pairwise distances between your data points. You can calculate this using thedist()
function.eig
: A logical value indicating whether you want the function to return the eigenvalues (default isFALSE
). Eigenvalues help assess the quality of the dimensionality reduction.k
: This specifies the number of dimensions for the resulting low-dimensional space (default is 2). You can choose higher values for more complex data, but visualization becomes trickier....
: Additional arguments can be used for fine-tuning the MDS process, but these are less common for basic applications.
Car Specs with MDS: A Step-by-Step Example
Let’s use the built-in mtcars
dataset in R to demonstrate the power of MDS. This dataset contains information about various car models, including aspects like horsepower, mileage, and weight. While these features provide valuable insights, visualizing all of them simultaneously can be challenging. MDS will help us explore the relationships between these car specifications in a 2D space.
Here’s the code with explanations:
# Select relevant numerical features (exclude car names) car_features <- mtcars[, c(3:11)] # Calculate pairwise distances between car features distance_matrix <- dist(car_features) head(distance_matrix, 3)
[1] 0.6153251 54.8426385 98.1117059
# Perform MDS to get a 2D representation mds_results <- cmdscale(distance_matrix, k = 2) head(mds_results, 3)
[,1] [,2] Mazda RX4 -79.62307 2.157120 Mazda RX4 Wag -79.62522 2.172370 Datsun 710 -133.87165 -5.033323
# Create a base R plot plot(mds_results[, 1], mds_results[, 2], xlab = "Dimension 1", ylab = "Dimension 2", main = "MDS of Car Specs (mtcars)") # Add text labels for car names (optional) text(mds_results, labels = rownames(mtcars), col = "blue", cex = 0.62, pos = 1)
- We load the
mtcars
dataset usingdata(mtcars)
. - We select relevant numerical features from the dataset (excluding car names) and store them in
car_features
. - The
dist()
function calculates the pairwise distances between data points based on the chosen features and stores them in thedistance_matrix
. - We run
cmdscale()
on the distance matrix, specifying two dimensions (k = 2
) for the output. The results are stored inmds_results
. - Finally, we use the base R
plot()
function to create a scatter plot. We set axis labels and a main title for the plot.
Optional Step:
- We can add text labels for each car model (using car names from
mtcars$mpg
) on the plot using thetext()
function. We set thepos
argument to 1 to position the text labels above the data points and we set thecex
argument to0.62
so the size of the text decreases.
This plot can reveal interesting patterns. Cars closer together might share similar characteristics in terms of horsepower, weight, and other specifications. You might also observe some separation based on fuel efficiency reflected by the optional text labels.
Experiment and Discover!
MDS is a powerful tool for exploring data similarity in R. Now that you’ve seen the basics of cmdscale()
and base R plotting functions, why not try it on your dataset? Remember to calculate the distance matrix appropriately based on the features you’re interested in. Play around with the number of dimensions (k
) to see how it affects the visualization. By experimenting with MDS, you might uncover hidden relationships within your car data or any other dataset you choose to explore!
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.