A shiny app for exploratory data analysis
I recently learnt how to build basic R Shiny apps. To practice using Shiny, I created a simple app that you can use to perform simple exploratory data analysis. You can use the app here to play around with the diamonds dataset from the ggplot2 package. To use the app for your own dataset, download the raw R code here (just the app.R file) and assign your dataset to raw_df. In the rest of this post, I outline how to use this app.
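As a rough illustration of the raw_df step, the assignment in app.R might look like the sketch below. Only the name raw_df comes from the app; the choice of mtcars and the factor conversion are assumptions made for this example, and the conversion only matters if the app tests columns with is.numeric.

```r
# Assumption: any data frame can stand in for the default dataset.
# mtcars ships with base R, so no extra packages are needed here.
raw_df <- mtcars

# Columns stored as numbers but meant as categories can be converted to
# factors, so that is.numeric() no longer reports them as numeric.
raw_df$cyl <- factor(raw_df$cyl)

# Quick check of the resulting column types
str(raw_df[, c("mpg", "cyl")])
```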
(Credits: I worked off the source code for the “Diamonds Explorer” app. There are a few versions of this app out there and I can’t find the exact source I used, but it was very close to the source code of this version.)
As you can see from the screenshot below, the main panel (on the right) has four tabs. The last two tabs simply give the output of calling the summary and str functions on the entire dataset; they are not affected by the controls in the side panel. The “Data Snippet” tab prints up to 15 rows of the dataset for a peek into what the dataset looks like. (These 15 rows are the first 15 rows of the dataset used to create the plot on the “Plot” tab.)
The most interesting tab is probably the “Plot” tab. First let me describe how the app selects the data it plots. By default, it picks 1000 random observations, or all the observations if the dataset has fewer than 1000 rows. The user can input the random seed for reproducibility. The user can also control the number of observations using the slider, and choose whether the observations are sampled at random or taken from the top of the dataset.
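The selection rule above can be sketched in a few lines of base R. The function name select_rows and its arguments are hypothetical and chosen for this illustration; the app's actual code may be organised differently.

```r
# Hypothetical helper, not the app's actual code.
select_rows <- function(df, n = 1000, random = TRUE, seed = 1) {
  n <- min(n, nrow(df))              # use every row if the dataset is smaller than n
  if (random) {
    set.seed(seed)                   # a fixed seed makes the random sample reproducible
    idx <- sample.int(nrow(df), n)   # n distinct row indices, in random order
  } else {
    idx <- seq_len(n)                # "top" rows: the first n rows, in order
  }
  df[idx, , drop = FALSE]
}

# Example with a built-in dataset: 10 random rows, seed 42
plot_df <- select_rows(mtcars, n = 10, random = TRUE, seed = 42)

# The "Data Snippet" tab would then show at most the first 15 of these rows
head(plot_df, 15)
```

Calling the function twice with the same seed returns the same rows, which is the point of exposing the seed to the user.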
The type of plot the app makes depends on the types of the variables given to it. In the screenshot above, one numeric variable and one non-numeric variable are given, so the app makes a boxplot. If two numeric variables are given, it makes a scatterplot:
For scatterplots, the user has the option to jitter the points and/or to add a smoothing line:
If two non-numeric variables are given, the app makes a heatmap depicting how often each combination is present in the data:
The plots above depict the joint distribution of two variables in the dataset. If the user wants a visualization for just one variable, the user can set the “Y” variable to “None”. If the “X” variable is numeric, the app plots a histogram:
If the “X” variable is non-numeric, the app plots a bar plot showing counts:
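To summarise the rules described so far, here is a minimal sketch of how the plot type could be chosen from the variable types. The function choose_plot_type and the toy data frame are hypothetical (the column names are borrowed from diamonds), and the sketch assumes that "numeric" means is.numeric returns TRUE.

```r
# Hypothetical helper: returns the name of the plot type for a pair of columns.
choose_plot_type <- function(df, x, y = "None") {
  x_num <- is.numeric(df[[x]])

  # One-variable case: the "Y" control is set to "None"
  if (identical(y, "None")) {
    return(if (x_num) "histogram" else "bar plot")
  }

  y_num <- is.numeric(df[[y]])
  if (x_num && y_num) {
    "scatterplot"   # two numeric variables
  } else if (!x_num && !y_num) {
    "heatmap"       # two non-numeric variables
  } else {
    "boxplot"       # one numeric, one non-numeric
  }
}

# Toy data frame with two numeric and two non-numeric columns
df <- data.frame(
  carat = c(0.2, 0.3, 0.5),
  price = c(300L, 400L, 900L),
  cut   = c("Ideal", "Good", "Ideal"),
  color = c("E", "E", "J")
)

choose_plot_type(df, "carat", "price")  # "scatterplot"
choose_plot_type(df, "cut", "price")    # "boxplot"
choose_plot_type(df, "cut", "color")    # "heatmap"
choose_plot_type(df, "carat")           # "histogram"
choose_plot_type(df, "cut")             # "bar plot"
```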
Finally, let’s talk about color. For simplicity, the app only allows plots to be colored by non-numeric variables. Below is an example of a colored scatterplot:
As the screenshot below shows, color works for boxplots too. (Color does not work for heatmaps.)
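For readers who want to reproduce coloured plots like these outside the app, the sketch below builds a coloured scatterplot and a coloured boxplot with ggplot2 on a random sample of diamonds. Mapping the variable to the colour aesthetic is an assumption made for this example; the app may map it differently (for instance to fill).

```r
library(ggplot2)

# Random sample of 1000 rows, mirroring the app's default sample size
set.seed(1)
plot_df <- diamonds[sample.int(nrow(diamonds), 1000), ]

# Scatterplot of two numeric variables, coloured by a non-numeric one
p_scatter <- ggplot(plot_df, aes(x = carat, y = price, colour = cut)) +
  geom_point()

# Boxplot of a numeric variable by a non-numeric one, coloured by a
# second non-numeric variable (one box per cut/color combination)
p_box <- ggplot(plot_df, aes(x = cut, y = price, colour = color)) +
  geom_boxplot()

# print(p_scatter); print(p_box)  # uncomment to draw the plots
```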
Color can be used for the one-dimensional plots as well:
There are certainly several improvements that can be made to this app. For one, it would be nice if the user could upload their dataset through the app (instead of downloading my source code and assigning their dataset to the raw_df variable). There could also be more controls to let the user change certain aspects of the plot (e.g. transparency of points), but at some point the UI might become too overwhelming for the user.
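As a starting point for the upload idea, here is a minimal sketch using Shiny's fileInput. It is not part of the current app: the input id "file", the output id "snippet", the CSV-only restriction and the fallback to mtcars are all assumptions made for this illustration.

```r
library(shiny)

ui <- fluidPage(
  # Lets the user pick a CSV file from their machine
  fileInput("file", "Upload a CSV file", accept = ".csv"),
  tableOutput("snippet")
)

server <- function(input, output, session) {
  # Reactive replacement for a hard-coded raw_df
  raw_df <- reactive({
    # Until a file is uploaded, fall back to a built-in dataset
    if (is.null(input$file)) {
      return(mtcars)
    }
    # input$file$datapath is the temporary path of the uploaded file
    read.csv(input$file$datapath, stringsAsFactors = TRUE)
  })

  # Show up to 15 rows, like the "Data Snippet" tab
  output$snippet <- renderTable(head(raw_df(), 15))
}

# shinyApp(ui, server)  # uncomment to launch the app
```

Because raw_df becomes a reactive here, the rest of the server code would need to call raw_df() rather than refer to a plain variable.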
Happy exploring!