Create a line graph with ggplot

Use the geom_line() geom to draw line graphs and customize their styling using the color parameter. Specify which observations belong to each line with the group parameter. Create your first line graph using geom_line(). Define how different lines are connected using the group parameter. Change … Continue reading
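As a minimal sketch of the idea in this teaser: the `sales` data frame and its columns below are hypothetical toy data, not taken from the post. The example assumes ggplot2 is installed and shows `group` deciding which rows are joined into one line, with `color` mapped to the same column.

```r
library(ggplot2)

# Hypothetical toy data: five yearly values for each of two regions
sales <- data.frame(
  year   = rep(2015:2019, times = 2),
  value  = c(10, 12, 15, 14, 18, 8, 9, 13, 16, 17),
  region = rep(c("North", "South"), each = 5)
)

# group decides which rows are connected into one line;
# mapping color to the same column gives each line its own color
line_plot <- ggplot(sales, aes(x = year, y = value,
                               group = region, color = region)) +
  geom_line()
```

Typing `line_plot` at the console renders the graph.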

Data Science Conference Austria 2020

Data Science Conference (DSC) Austria is knocking on YOUR door, and it is all for free! DSC Austria will take place on September 8-9, and during the event you will get a chance to listen to over 15 high-quality talks and 8 tech tutorials on the … Continue reading

Specify additional aesthetics for points

ggplot2 implements the grammar of graphics to map attributes from a data set to plot features through aesthetics. This framework can be used to adjust the size, color and transparency (alpha) of points in a scatter plot. Add additional plotting dimensions through aesthetics. Adjust the point … Continue reading
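A short sketch of these aesthetics, assuming ggplot2 is installed. It uses R's built-in `mtcars` data set, which is my choice for illustration and not necessarily the data used in the post. Color and size are mapped to columns inside `aes()`, while alpha is set to a fixed value outside it.

```r
library(ggplot2)

# color and size vary with the data, so they are mapped inside aes();
# alpha is one fixed transparency for all points, so it is set outside aes()
scatter_aes <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl), size = hp), alpha = 0.6)
```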

Create a scatter plot with ggplot

Take your first steps with the ggplot2 package to create a scatter plot. Use the grammar of graphics to map data set attributes to your plot and connect different layers using the + operator. Define a dataset for the plot using the ggplot() function. Specify a geometric layer using the … Continue reading
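The two steps named here, a dataset passed to `ggplot()` and a geometric layer added with `+`, might look like the following sketch. It assumes ggplot2 is installed and uses the built-in `iris` data set purely for illustration.

```r
library(ggplot2)

# Step 1: ggplot() receives the dataset
# Step 2: + adds a point layer whose aesthetics map columns to x and y
basic_scatter <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width))
```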

Why data visualization is important

Data visualization is important not only for communicating results but also as a powerful technique for exploratory data analysis. Each plot type, such as scatter plots, line graphs, bar charts and histograms, has its own purpose and can be put to good use with the ggplot2 package. Understand … Continue reading

Create a data transformation pipeline

All data transformation functions in dplyr can be connected through the pipe operator %>% to create powerful yet expressive data transformation pipelines. Use the pipe operator %>% to combine multiple dplyr functions into one pipeline. %>% filter(___) %>% select(___) %>% … Continue reading
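One way the filter/select template above could be filled in, as a sketch only: the built-in `mtcars` data set and the chosen columns are assumptions for illustration, and dplyr must be installed.

```r
library(dplyr)

# Each step receives the result of the previous one through %>%
efficient_cars <- mtcars %>%
  filter(cyl == 4) %>%        # keep four-cylinder cars only
  select(mpg, cyl, wt) %>%    # keep three columns
  arrange(desc(mpg))          # most fuel-efficient first
```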

Sort data frames by columns

To find areas of interest in a data frame, it often helps to order its rows by specific columns. The dplyr arrange() function sorts data frames by multiple columns in ascending or descending order. Use the arrange() function to sort data frames. Sort data frames by multiple columns … Continue reading
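A brief sketch of sorting by several columns, assuming dplyr is installed and using the built-in `mtcars` data set as a stand-in. `arrange()` sorts ascending by default, and wrapping a column in `desc()` reverses the order for that column.

```r
library(dplyr)

# Sort by cylinder count (ascending), then by mpg (descending) within each count
sorted_cars <- mtcars %>%
  arrange(cyl, desc(mpg))
```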

Filter data frame rows

We often want to operate only on a specific subset of rows of a data frame. The dplyr filter() function provides a flexible way to extract the rows of interest based on multiple conditions. Use the filter() function to extract the rows of a data frame that fulfill a specified condition. Filter a … Continue reading
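As an illustrative sketch (the thresholds and the built-in `mtcars` data set are my assumptions, and dplyr must be installed), conditions separated by commas in `filter()` must all hold for a row to be kept:

```r
library(dplyr)

# Keep rows where both conditions are TRUE (commas combine them with AND)
light_efficient <- mtcars %>%
  filter(mpg > 25, wt < 2)
```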

Select columns from a data frame

To select only a specific set of interesting data frame columns, dplyr offers the select() function to extract columns by names, indices and ranges. You can even rename extracted columns with select(). Learn to use the select() function. Select columns from a data frame by name or index. Rename … Continue reading
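The selection styles mentioned here can be sketched as follows, assuming dplyr is installed. The built-in `mtcars` data set and the new column names are chosen only for illustration.

```r
library(dplyr)

by_name  <- mtcars %>% select(mpg, hp)     # columns by name
by_index <- mtcars %>% select(1, 4)        # columns by position
by_range <- mtcars %>% select(mpg:disp)    # a contiguous range of columns

# new_name = old_name renames a column while selecting it
renamed <- mtcars %>%
  select(miles_per_gallon = mpg, horsepower = hp)
```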

Introduction to dplyr

Learn what dplyr does. Get an overview of Select, Filter and Sort. Learn what Joins, Aggregations and Pipelines are. What is dplyr? There’s the joke that 80 percent of data science is cleaning the data and 20 percent is complaining about cleaning the data. Anthony Goldbloom, Founder and CEO of … Continue reading

Select first or last rows of a data frame

We often do not need to look at the entire contents of a data frame in the console. Often a part is sufficient, such as the top or bottom rows retrieved with the head() and tail() functions. Select the top of a data frame. Select the bottom of a data frame. Specify the number of lines to … Continue reading
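A small base R sketch of both functions, using the built-in `mtcars` data set as an assumed example. Both return six rows unless the `n` argument says otherwise.

```r
# head() and tail() return 6 rows by default
first_rows <- head(mtcars)

# n sets how many rows to return
last_rows <- tail(mtcars, n = 3)
```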

Determine the size of a data frame

The size of a data frame, like the number of rows or columns, is often required and can be determined in various ways. Get number of rows of a data frame. Get number of columns of a data frame. Get dimensions of a data frame. nrow(___) ncol(___) dim(___) length(___) Data Frame … Continue reading
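The four functions listed in the template can be tried on any data frame. The sketch below uses the built-in `mtcars` data set as an assumed example. Note that `length()` on a data frame counts its columns, not its rows.

```r
n_rows <- nrow(mtcars)    # number of rows: 32
n_cols <- ncol(mtcars)    # number of columns: 11
size   <- dim(mtcars)     # rows and columns together: 32 11
n_vars <- length(mtcars)  # a data frame is a list of columns, so this is 11
```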