
Unveiling Car Specs with Multidimensional Scaling in R

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers].

Introduction

Visualizing similarities between data points can be tricky, especially when dealing with many features. This is where multidimensional scaling (MDS) comes in handy. It takes the pairwise distances between observations and places the observations in a lower-dimensional space, typically 2D or 3D, so that those distances are preserved as closely as possible. In R, the cmdscale() function from the stats package, which ships with base R, is a great tool for performing classical MDS.
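For intuition, here is a minimal sketch of what classical MDS computes, written by hand and compared against cmdscale(). It assumes Euclidean distances on a small made-up matrix; the names x, J, B and coords are illustrative and do not come from the original post.

```r
# Toy data: 5 random points in 4 dimensions (made up for illustration)
set.seed(1)
x <- matrix(rnorm(20), nrow = 5)
d <- as.matrix(dist(x))
n <- nrow(d)

# Double-centre the squared distances: B = -1/2 * J %*% D^2 %*% J
J <- diag(n) - matrix(1 / n, n, n)
B <- -0.5 * J %*% d^2 %*% J

# Eigen-decompose B and keep the k largest eigenvalues and their vectors
k <- 2
e <- eigen(B, symmetric = TRUE)
coords <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))

# Compare with cmdscale(); columns may differ in sign, so compare absolute values
ref <- cmdscale(dist(x), k = k)
all.equal(abs(coords), abs(unname(ref)))
```

The sign of each axis is arbitrary in MDS, which is why the comparison uses absolute values.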


cmdscale()

Here’s a breakdown of its arguments:
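As a quick reference, the sketch below restates the call signature and argument meanings as given on the cmdscale() help page, then runs the function on a tiny made-up distance matrix. The toy points and the names toy, toy_dist, toy_points and toy_fit are illustrative assumptions, not part of the original example.

```r
# Signature, with defaults, per ?cmdscale:
#   cmdscale(d, k = 2, eig = FALSE, add = FALSE, x.ret = FALSE,
#            list. = eig || add || x.ret)
#
# d     : a distance structure (e.g. from dist()) or a full symmetric matrix
# k     : maximum dimension of the space the points are placed in
# eig   : if TRUE, the eigenvalues are returned
# add   : if TRUE, an additive constant is computed and added to the
#         non-diagonal dissimilarities so that they become Euclidean
# x.ret : if TRUE, the doubly centred distance matrix is returned
# list. : if TRUE, a list is returned instead of just the coordinate matrix

# Toy example: three labelled points forming a 3-4-5 triangle
toy <- matrix(c(0, 0,
                3, 0,
                0, 4),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), NULL))
toy_dist <- dist(toy)

# Default call: returns an n x k matrix of coordinates
toy_points <- cmdscale(toy_dist, k = 2)
dim(toy_points)

# With eig = TRUE the result is a list (coordinates are in $points)
toy_fit <- cmdscale(toy_dist, k = 2, eig = TRUE)
names(toy_fit)
```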


Car Specs with MDS: A Step-by-Step Example

Let’s use the built-in mtcars dataset in R to demonstrate the power of MDS. This dataset contains information about various car models, including aspects like horsepower, mileage, and weight. While these features provide valuable insights, visualizing all of them simultaneously can be challenging. MDS will help us explore the relationships between these car specifications in a 2D space.

Here’s the code with explanations:

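# Select columns 3 to 11 (disp through carb); this leaves out mpg and cyl.
# Car names are row names in mtcars, not a column, so they are kept as labels.
car_features <- mtcars[, 3:11]

# Calculate pairwise Euclidean distances between cars (rows)
distance_matrix <- dist(car_features)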
head(distance_matrix, 3)
#> [1]  0.6153251 54.8426385 98.1117059

# Perform MDS to get a 2D representation
mds_results <- cmdscale(distance_matrix, k = 2)
head(mds_results, 3)
#>                     [,1]      [,2]
#> Mazda RX4      -79.62307  2.157120
#> Mazda RX4 Wag  -79.62522  2.172370
#> Datsun 710    -133.87165 -5.033323

# Create a base R plot
plot(mds_results[, 1], mds_results[, 2], 
     xlab = "Dimension 1", ylab = "Dimension 2",
     main = "MDS of Car Specs (mtcars)")

# Add text labels for car names (optional)
text(mds_results, labels = rownames(mtcars), col = "blue", cex = 0.62,
     pos = 1)

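  1. The mtcars dataset ships with R and is available without loading, so the code uses it directly; calling data(mtcars) first is optional.
  2. We select columns 3 to 11 (disp through carb) and store them in car_features. This leaves out mpg and cyl; the car names are row names, so they carry through as labels.
  3. The dist() function calculates the pairwise distances (Euclidean by default) between cars based on the chosen features and stores them in distance_matrix.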
  4. We run cmdscale() on the distance matrix, specifying two dimensions (k = 2) for the output. The results are stored in mds_results.
  5. Finally, we use the base R plot() function to create a scatter plot. We set axis labels and a main title for the plot.

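Optional Step:

  6. The text() call adds the car names as labels just below each point (pos = 1), in blue and at a reduced size (cex = 0.62).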

This plot can reveal interesting patterns. Cars that sit close together have similar values for horsepower, weight, and the other selected specifications. With the optional text labels you can identify which models fall near each other and check whether the groups match what you know about them, such as their fuel efficiency.
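To check which cars actually end up close together, one option is to look up nearest neighbours in the 2D layout. The sketch below rebuilds the objects from the example; nearest_in_layout() is a hypothetical helper written for illustration, not a function from any package.

```r
# Rebuild the objects from the example above
car_features <- mtcars[, 3:11]
distance_matrix <- dist(car_features)
mds_results <- cmdscale(distance_matrix, k = 2)

# Distances between cars in the 2D layout, as a labelled square matrix
layout_dist <- as.matrix(dist(mds_results))

# Hypothetical helper: the n cars nearest to `car` in the layout
nearest_in_layout <- function(car, n = 3) {
  d <- layout_dist[car, ]
  d <- d[names(d) != car]   # drop the car itself (distance 0)
  head(sort(d), n)
}

nearest_in_layout("Mazda RX4")

# How closely the 2D distances track the original distances
layout_cor <- cor(as.vector(distance_matrix), as.vector(dist(mds_results)))
layout_cor
```

A correlation near 1 suggests that the two-dimensional picture distorts the original distances very little.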


Experiment and Discover!

MDS is a powerful tool for exploring data similarity in R. Now that you’ve seen the basics of cmdscale() and base R plotting functions, why not try it on your own dataset? Remember to calculate the distance matrix appropriately for the features you’re interested in: dist() uses Euclidean distance by default, so features measured on much larger scales will dominate unless you standardize them first. Play around with the number of dimensions (k) to see how it affects the visualization. By experimenting with MDS, you might uncover hidden relationships within your car data or any other dataset you choose to explore!
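As one possible starting point, the sketch below standardizes the features with scale() before computing distances and then compares the goodness of fit reported by cmdscale() for several values of k. Whether standardizing is appropriate depends on your data; it is an assumption made here for illustration, and scaled_dist and gof_by_k are illustrative names.

```r
car_features <- mtcars[, 3:11]

# Standardize each column (mean 0, sd 1) so no single feature dominates
scaled_dist <- dist(scale(car_features))

# Goodness of fit for k = 1..4; eig = TRUE makes cmdscale() return a list
gof_by_k <- sapply(1:4, function(k) {
  fit <- cmdscale(scaled_dist, k = k, eig = TRUE)
  fit$GOF[1]
})
names(gof_by_k) <- paste0("k=", 1:4)
round(gof_by_k, 3)
```

The goodness of fit can only rise as k grows, so the useful question is where the gains level off.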
