Hi there. In this post, I feature some functions from R’s ggvis package for data visualization. This work is based on some trial and error in RStudio/RMarkdown.
When it comes to data visualization in R, ggplot2 usually comes to mind. The ggvis package in R provides a good alternative to ggplot2 and it also includes some interactive plot features.
The screenshot below is from http://ggvis.rstudio.com/ and it gives a brief explanation of what ggvis is about.
To install ggvis in R, use the code:
# Installation:
install.packages("ggvis")
To load the ggvis package, use the code:
# Load the package into R:
library(ggvis)
In this histogram example, I simulate 10000 standard uniform random variables and display the results.
##### Histogram (Standard Uniform Distribution Samples):
unifs <- runif(n = 10000, min = 0, max = 1)
unifs <- data.frame(unifs)
# Check/Preview dataframe:
head(unifs)
## unifs
## 1 0.4698736
## 2 0.3627675
## 3 0.1290401
## 4 0.6071039
## 5 0.7514981
## 6 0.2334543
# ggvis Histogram Plot:
# Source For Plot Title: https://stackoverflow.com/questions/25018598/add-a-plot-title-to-ggvis
unifs %>% ggvis(~unifs) %>%
layer_histograms(boundary = 0) %>%
add_axis("x", title = "\n Values", title_offset = 50) %>%
add_axis("y", title = "Counts \n", title_offset = 50) %>%
add_axis("x", orient = "top", ticks = 0,
title = "Histogram Of Simulated Standard Uniforms \n",
properties = axis_props(axis = list(stroke = "white"), labels = list(fontSize = 0)))
## Guessing width = 0.05 # range / 20
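The layer_histograms() part draws the histogram, while the add_axis() parts give the axis labels and the plot title. ggvis has no dedicated title argument, so a workaround (see the Stack Overflow link in the code comment above) was used: a second x axis is placed on top with its line and tick labels hidden, and its title serves as the plot title. The title_offset argument adds spacing between the axis labels and the axes.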
Our histogram of simulated uniform random variables is not a perfect rectangle, but it is close to one, which is what the flat density function of the uniform distribution predicts.
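One way to check how close the histogram is to a rectangle is to look at the count in each bin. The sketch below is an addition to the original post: the seed and the object names (bin_counts, expected_count) are assumptions chosen for illustration, and it uses base R's hist() instead of ggvis so that the counts can be inspected directly.

```r
# Assumed seed so the draws are reproducible (the original post sets none)
set.seed(123)
unifs <- runif(n = 10000, min = 0, max = 1)

# 20 bins of width 0.05, the same width that ggvis guessed above
breaks <- seq(from = 0, to = 1, by = 0.05)
bin_counts <- hist(unifs, breaks = breaks, plot = FALSE)$counts

# A perfectly flat histogram would have the same count in every bin
expected_count <- length(unifs) / length(bin_counts)

# Largest gap between an observed bin count and the flat value
max(abs(bin_counts - expected_count))
```

With 10000 draws and 20 bins, a perfectly flat histogram would have 500 values per bin, so the gap printed at the end shows how far the tallest or shortest bar strays from that.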
This bar graph example is based on a simulation of 10000 dice rolls. (The die is six sided.)
The sample() function in R randomly selects elements from a vector of values or strings.
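As a small illustration of sample() before the full simulation, here is a sketch with made-up inputs. The seed and the object names (rolls, flips, shuffled) are assumptions and do not come from the original post.

```r
set.seed(42)  # assumed seed, only so the example is repeatable

# Draw five numbers from 1 to 6 with replacement (values can repeat)
rolls <- sample(x = 1:6, size = 5, replace = TRUE)

# sample() works the same way on character strings
flips <- sample(x = c("Heads", "Tails"), size = 5, replace = TRUE)

# Without replacement, each value appears at most once (a random ordering)
shuffled <- sample(x = 1:6, size = 6, replace = FALSE)

rolls
flips
shuffled
```

The replace = TRUE setting is what makes a dice simulation possible, since the same face has to be able to come up more than once.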
##### Bar Graph:
# Simulate dice rolls (1, 2, 3, 4, 5 and 6)
dice_sim <- sample(x = seq(from = 1, to = 6, by = 1), size = 10000, replace = TRUE)
dice_sim <- data.frame(dice_sim)
# Preview the data:
head(dice_sim)
## dice_sim
## 1 6
## 2 5
## 3 3
## 4 4
## 5 5
## 6 1
str(dice_sim)
## 'data.frame': 10000 obs. of 1 variable:
## $ dice_sim: num 6 5 3 4 5 1 4 3 3 1 ...
From the str(dice_sim) output, the variable dice_sim comes out as numeric. I want the dice_sim variable to be a factor variable with levels 1 to 6.
# Convert to factors:
dice_sim$dice_sim <- as.factor(dice_sim$dice_sim)
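One detail worth knowing about as.factor() is that it only creates levels for values that actually occur. With 10000 rolls every face will almost certainly appear, but in a small sample a face can be missing. The sketch below uses a made-up vector (small_rolls is a hypothetical name) to show how factor() with explicit levels keeps all six faces.

```r
# A small hypothetical sample in which the faces 2 and 6 never appear
small_rolls <- c(1, 3, 3, 4, 5, 1)

# as.factor() builds levels only from the values it sees
levels(as.factor(small_rolls))

# factor() with explicit levels keeps all six faces, including unseen ones
small_factor <- factor(small_rolls, levels = 1:6)
levels(small_factor)

# table() then reports a zero count for the missing faces
table(small_factor)
```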
To obtain bar graphs in R’s ggvis package, you need the layer_bars() function.
# Bar Graph:
dice_sim %>% ggvis(~dice_sim) %>%
layer_bars() %>%
add_axis("x", title = "\n Outcome", title_offset = 50) %>%
add_axis("y", title = "Counts \n", title_offset = 50) %>%
add_axis("x", orient = "top", ticks = 0,
title = "Dice Roll Simulation Results \n",
properties = axis_props(axis = list(stroke = "white"),
labels = list(fontSize = 0)))
From our simulation results, the number 2 appears the most often (it is the mode). The counts do not exactly match the theoretical/expected count of \(10000/6 \approx 1667\) for each outcome. Remember that simulated and real-life results vary randomly around the theoretical values, so an exact match is not expected.
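To put numbers on the gap between the observed and expected counts, the counts can be tabulated next to the expected value. This is a sketch added for illustration: the seed is an assumption (so the counts will differ from the plot above), and the names observed, expected and count_sd are hypothetical. The standard deviation used here is that of a binomial count with 10000 trials and success probability 1/6, which works out to about 37.

```r
set.seed(2024)  # assumed seed; the original post does not set one
dice_sim <- sample(x = 1:6, size = 10000, replace = TRUE)

# Observed count for each face
observed <- table(factor(dice_sim, levels = 1:6))

# Expected count per face for a fair die
expected <- length(dice_sim) / 6

# Typical size of the random fluctuation in one face's count:
# the standard deviation of a Binomial(n, 1/6) count
count_sd <- sqrt(length(dice_sim) * (1 / 6) * (5 / 6))

# Side-by-side comparison of observed and expected counts
data.frame(face = 1:6,
           observed = as.vector(observed),
           expected = round(expected),
           difference = as.vector(observed) - round(expected))
count_sd
```

Differences of a few dozen rolls per face are therefore ordinary sampling variation and not a sign that the simulated die is unfair.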
The title_offset argument in add_axis() is used so that the tick labels do not overlap with the axis titles (e.g. the title Counts and the tick labels 800 and 1000).
For this section, the cats dataset from the MASS library is used.
##### Scatterplot Example Of cats Data:
library(MASS)
data(cats)
# Preview the data:
head(cats)
## Sex Bwt Hwt
## 1 F 2.0 7.0
## 2 F 2.0 7.4
## 3 F 2.0 9.5
## 4 F 2.1 7.2
## 5 F 2.1 7.3
## 6 F 2.1 7.6
# Rename column names:
colnames(cats) <- c("Sex", "Body_Wt", "Heart_Wt")
For the scatterplot, I want to take a look at body weight versus heart weight for the cats in the dataset. In ggvis, you need to specify which variables are x and y respectively. Also, you need to use layer_points() to draw the data points.
# ggvis scatterplot:
cats %>% ggvis(x = ~Body_Wt, y = ~ Heart_Wt) %>%
layer_points(fill = ~Sex) %>%
add_axis("x", title = "\n Body Weight", title_offset = 50) %>%
add_axis("y", title = "Heart Weight \n", title_offset = 50) %>%
add_axis("x", orient = "top", ticks = 0,
title = "Body Weight Vs Heart Weight Of Cats \n",
properties = axis_props(axis = list(stroke = "white"),
labels = list(fontSize = 0)))
In layer_points(), the fill = ~Sex option is used to indicate which points are for males and which points are for females. It appears that there are more male cats than female cats in this dataset. (Counts would need to be obtained to check.)
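The counts mentioned above can be obtained with base R's table(). This short sketch is an addition to the post: it reads the dataset as MASS::cats so that the renamed cats data frame in the session is left alone, and sex_counts is a hypothetical name.

```r
# Number of cats of each sex (MASS::cats avoids overwriting the renamed copy)
sex_counts <- table(MASS::cats$Sex)
sex_counts

# The same counts as proportions of all cats
prop.table(sex_counts)
```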
Here is an example of a regression line through the points regardless of gender. In statistics, a regression line is another name for a line of best fit. We want to fit a line through the points such that the sum of the squared vertical distances between the points and the line is minimized.
## Regression Lines:
# ggvis Regression Line (Ordinary Least Squares Regression)
# Line of best fit regardless of gender
# Another option is LOESS (local regression)
cats %>% ggvis(x = ~Body_Wt, y = ~ Heart_Wt) %>%
layer_points(fill = ~Sex) %>%
layer_model_predictions(model = "lm", se = TRUE) %>%
add_axis("x", title = "\n Body Weight", title_offset = 50) %>%
add_axis("y", title = "Heart Weight \n", title_offset = 50) %>%
add_axis("x", orient = "top", ticks = 0,
title = "Body Weight Vs Heart Weight Of Cats \n",
properties = axis_props(axis = list(stroke = "white"),
labels = list(fontSize = 0)))
## Guessing formula = Heart_Wt ~ Body_Wt
You may notice that the code for a linear regression line is not much different from the code for the scatterplot. We just add layer_model_predictions(model = "lm", se = TRUE). In model = "lm", lm means linear model, and se = TRUE adds a confidence band around the line. (I think it is a 95% confidence interval.)
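To see the numbers behind the fitted line, the same kind of model can be fitted directly with base R's lm(). This is only a sketch of a base R analogue, not a description of what ggvis does internally: it assumes the band is a 95% confidence interval for the mean response, and the names cats_df, fit and new_cats are hypothetical.

```r
# Work on a copy so the renamed cats data frame from the post is untouched
cats_df <- MASS::cats
colnames(cats_df) <- c("Sex", "Body_Wt", "Heart_Wt")

# Ordinary least squares fit of heart weight on body weight
fit <- lm(Heart_Wt ~ Body_Wt, data = cats_df)
coef(fit)

# Fitted values with a 95% confidence interval for the mean response,
# evaluated at a few hypothetical body weights
new_cats <- data.frame(Body_Wt = c(2.0, 2.5, 3.0, 3.5))
predict(fit, newdata = new_cats, interval = "confidence", level = 0.95)
```

The columns fit, lwr and upr in the result correspond to the line and to the lower and upper edges of a band like the one drawn by se = TRUE.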
Regression Line For Each Gender
# ggvis Regression Line (Ordinary Least Squares Regression):
# Line of best fit for each gender
# Done by adding in group_by(Sex)
cats %>% ggvis(x = ~Body_Wt, y = ~ Heart_Wt) %>%
layer_points(fill = ~Sex) %>%
group_by(Sex) %>%
layer_model_predictions(model = "lm", se = TRUE) %>%
add_axis("x", title = "\n Body Weight", title_offset = 50) %>%
add_axis("y", title = "Heart Weight \n", title_offset = 50) %>%
add_axis("x", orient = "top", ticks = 0,
title = "Body Weight Vs Heart Weight Of Cats \n",
properties = axis_props(axis = list(stroke = "white"),
labels = list(fontSize = 0)))
## Guessing formula = Heart_Wt ~ Body_Wt
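The group_by(Sex) step above gives one fitted line per sex. A rough base R equivalent, added here as a sketch, is to split the data by sex and fit a separate linear model to each part. The names cats_df and fits_by_sex are hypothetical.

```r
# Work on a copy so the renamed cats data frame from the post is untouched
cats_df <- MASS::cats
colnames(cats_df) <- c("Sex", "Body_Wt", "Heart_Wt")

# Split the data by sex and fit a separate least squares line to each part
fits_by_sex <- lapply(split(cats_df, cats_df$Sex),
                      function(d) lm(Heart_Wt ~ Body_Wt, data = d))

# Intercept and slope for each group, one row per sex
t(sapply(fits_by_sex, coef))
```

Comparing the two slopes gives a quick numerical check on whether the lines in the plot really differ between females and males.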
Unlike ggplot2, ggvis is capable of creating interactive plots. Interactive plots allow the user to change parameter values, colours, and other visual settings.
This section features two examples of interactive plots in ggvis. The faraway library in R is used here.
Example One
This first example looks at a USA wages dataset.
## ggvis Interactive Plots:
# Some Interactive Controls
library(faraway) # Dataset package
data(uswages)
head(uswages)
## wage educ exper race smsa ne mw so we pt
## 6085 771.60 18 18 0 1 1 0 0 0 0
## 23701 617.28 15 20 0 1 0 0 0 1 0
## 16208 957.83 16 9 0 1 0 0 1 0 0
## 2720 617.28 12 24 0 1 1 0 0 0 0
## 9723 902.18 14 12 0 1 0 1 0 0 0
## 22239 299.15 12 33 0 1 0 0 0 1 0
The head() function allows for previewing the dataset. We have the variables (columns) wage, educ, exper, race, smsa, ne, mw, so, we, and pt. In the interactive scatterplot, I want to compare years of experience with the weekly wages.
uswages %>% ggvis(x = ~exper, y = ~wage) %>%
layer_points(fill := input_select(c("red", "blue", "green", "black"),
label = "Colour"),
size := input_slider(10, 50, value = 30, label = "Size")) %>%
add_axis("x", title = "\n Years Of Experience", title_offset = 50) %>%
add_axis("y", title = "Weekly Wage (US) \n", title_offset = 50) %>%
add_axis("x", orient = "top", ticks = 0,
title = "Weekly Wages Of US Males In 1988 \n",
properties = axis_props(axis = list(stroke = "white"),
labels = list(fontSize = 0)))
## Warning: Can't output dynamic/interactive ggvis plots in a knitr document.
## Generating a static (non-dynamic, non-interactive) version of the plot.
The R code is not much different from the code for scatterplots earlier. What is new is adding the input_select and input_slider options to fill and size. The user can choose between the colours red, blue, green and black from the colour list. In the slider for size, the user can adjust the size of the points by dragging the slider left or right. Moving the slider to the right increases the size of the points while moving the slider to the left decreases the size of the points. I have set the initial size value at 30 as indicated by value = 30 in the code.
(The screenshot image above shows the red points with a size of 30.)
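The interactive code above uses := where the earlier scatterplots used =. In ggvis, = maps a data column to a visual property through a scale, while := sets the property to a raw value, and that is why the values coming from input_select() and input_slider() are passed with :=. The sketch below shows the difference without any interactive controls. It is an addition to the post and the object names (cats_df, mapped, fixed) are hypothetical.

```r
library(ggvis)

cats_df <- MASS::cats

# "=" maps a data column to a property through a scale:
# each level of Sex gets its own colour and a legend is drawn
mapped <- cats_df %>%
  ggvis(x = ~Bwt, y = ~Hwt) %>%
  layer_points(fill = ~Sex)

# ":=" sets the property to a raw value, with no scale or legend:
# every point is drawn with the same fixed colour, size and opacity
fixed <- cats_df %>%
  ggvis(x = ~Bwt, y = ~Hwt) %>%
  layer_points(fill := "red", size := 30, opacity := 0.6)

# Print either object (for example, type fixed at the console) to draw it
```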
Example Two
This second example features the star data from R’s faraway dataset library. I compare star temperature with light intensity.
## Second example:
data(star)
head(star) # Preview data
## index temp light
## 1 1 4.37 5.23
## 2 2 4.56 5.74
## 3 3 4.26 4.93
## 4 4 4.56 5.74
## 5 5 4.30 5.19
## 6 6 4.46 5.46
star %>% ggvis(x = ~temp, y = ~light) %>%
layer_points(fill := input_select(c("red", "darkgreen", "black"),
label = "Colour"),
size := input_slider(10, 80, value = 45, label = "Size"),
opacity := input_slider(0.2, 1, value = 0.6, label = "Opacity")) %>%
add_axis("x", title = "\n Temperature", title_offset = 50) %>%
add_axis("y", title = "Light Intensity \n", title_offset = 50) %>%
add_axis("x", orient = "top", ticks = 0,
title = "Star Temperature Vs Light Intensity Scatterplot \n",
properties = axis_props(axis = list(stroke = "white"),
labels = list(fontSize = 0)))
## Warning: Can't output dynamic/interactive ggvis plots in a knitr document.
## Generating a static (non-dynamic, non-interactive) version of the plot.
In this screenshot, I show the colour of the points as darkgreen, the size of the points as 45, and the opacity as 0.6. (Opacity measures how opaque the points are: 1 is fully solid, and lower values make the points more see-through.)
To include the opacity feature I add in:
opacity := input_slider(0.2, 1, value = 0.6, label = "Opacity")