In the last lesson we introduced ggplot2, a powerful package for creating publication-quality plots. However, ggplot2 produces static plots only: the reader cannot manipulate the plot to better explore the data.
At least two solutions are available in R for producing interactive plots: plotly and shiny.
We now introduce the plotly package (https://plot.ly). The package was developed and is maintained by the company of the same name, which also provides similar libraries for JavaScript, Python and MATLAB. The key feature of plotly is that it produces plots the user can manipulate in real time. Let’s start with the volcano plot we saw in the previous lesson.
#loading the packages
library(ggplot2)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
#remaking the volcano plot
#simulate data
numGenes <- 1000;
numSamples <- 20;
outcome <- c(rep(1, 10), rep(0, 10));
dataset <- matrix(rnorm(numGenes * numSamples),
numGenes, numSamples);
dataset <- dataset + abs(min(dataset)) + 1;
#rows are genes (G), columns are samples (S)
rownames(dataset) <- paste('G', 1:numGenes, sep = '')
colnames(dataset) <- paste('S', 1:numSamples, sep = '')
#significance
pvalues <- apply(dataset, 1,
function(x){t.test(x[outcome == 1], x[outcome == 0])$p.value})
logPvalues <- -1 * log10(pvalues)
logFoldChanges <- apply(dataset, 1,
function(x){log2(mean(x[outcome == 1]) / mean(x[outcome == 0]))})
#labelling each gene according to the p-value threshold (p <= 0.01)
significant <- ifelse(logPvalues >= 2, 'significant', 'non-significant')
#gene characteristics
pathway <- c(rep('Wnt signaling', numGenes/2), rep('MAPK signaling', numGenes/2));
transcriptLength <- 10 * runif(numGenes);
#storing all info in a dataset
geneNames <- rownames(dataset)
toPlot <- data.frame(geneNames,
logFoldChanges,
logPvalues,
significant,
pathway,
transcriptLength)
#actual volcano plot
p <- ggplot() +
geom_point(data = toPlot,
mapping = aes(x = logFoldChanges,
y = logPvalues,
color = significant,
shape = pathway,
size = transcriptLength),
alpha = 0.7) +
ggtitle('Volcano plot') +
xlab('Log2 Fold Changes') +
ylab('-Log10 p-values') +
scale_color_manual(values = c('non-significant' = 'blue', 'significant' = 'red')) +
scale_size_continuous(name = 'transcript length') +
theme_bw()
plot(p)
Time now to make our volcano plot interactive. Let’s “plotlyfy” it!
ggplotly(p)
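By default the tooltip shown on hover lists the mapped aesthetics. A common refinement is to show the gene name as well. The following is a minimal sketch under stated assumptions: demoData, pDemo and interactiveDemo are hypothetical names, and the data are a small simulated stand-in so that the snippet runs on its own.

```r
library(ggplot2)
library(plotly)

# Small simulated stand-in data set (hypothetical, for illustration only)
set.seed(1)
demoData <- data.frame(
  geneNames = paste0('G', 1:50),
  logFoldChanges = rnorm(50, sd = 0.1),
  logPvalues = -log10(runif(50)))

# 'text' is not a ggplot2 aesthetic, so ggplot2 itself does not draw it,
# but ggplotly() can use it to fill the tooltip
pDemo <- ggplot(demoData,
                aes(x = logFoldChanges, y = logPvalues, text = geneNames)) +
  geom_point(alpha = 0.7) +
  theme_bw()

# tooltip selects which aesthetics appear when hovering over a point
interactiveDemo <- ggplotly(pDemo, tooltip = c('text', 'x', 'y'))

# Display the widget only when working interactively
if (interactive()) print(interactiveDemo)
```

The same idea should carry over to the volcano plot above by adding text = geneNames to its aes() call.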
We can also build the same plot directly with plotly commands. The syntax is quite similar. A complete reference guide on the plotly syntax can be found at http://plot.ly
plot_ly(toPlot, x = ~logFoldChanges,
y = ~logPvalues,
size = ~transcriptLength,
symbol = ~pathway,
symbols = c('circle','triangle-up'),
color = ~significant,
colors = c('#0C4B8E', '#BF382A')) %>%
add_markers() %>%
#in 2D plots the axes are set directly in layout(); 'scene' is only used for 3D plots
layout(xaxis = list(title = 'log2 fold-change'),
yaxis = list(title = '-log10 p-values'))
Creating a 3D scatterplot with fixed size and transcript length as third dimension
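The code for this plot is not shown here, so the following is only a possible sketch of such a 3D scatterplot, not necessarily the original one. It assumes the toPlot data frame built above; if it is not in the workspace, a small simulated stand-in is created so that the snippet runs on its own. The name fig3d and the marker size of 3 are arbitrary choices.

```r
library(plotly)

# Stand-in for toPlot (hypothetical values), used only if the data frame
# created earlier in the lesson is not available
if (!exists('toPlot')) {
  set.seed(1)
  toPlot <- data.frame(
    logFoldChanges = rnorm(50, sd = 0.1),
    logPvalues = -log10(runif(50)),
    transcriptLength = 10 * runif(50),
    significant = sample(c('significant', 'non-significant'), 50, replace = TRUE))
}

# Supplying z turns the marker trace into a 3D scatterplot;
# the point size is fixed instead of being mapped to transcript length
fig3d <- plot_ly(toPlot,
                 x = ~logFoldChanges,
                 y = ~logPvalues,
                 z = ~transcriptLength,
                 color = ~significant,
                 colors = c('#0C4B8E', '#BF382A')) %>%
  add_markers(marker = list(size = 3)) %>%
  # for 3D plots the axes are configured inside 'scene'
  layout(scene = list(xaxis = list(title = 'log fold-change'),
                      yaxis = list(title = '-log10 p-values'),
                      zaxis = list(title = 'transcript length')))

# Display the widget only when working interactively
if (interactive()) print(fig3d)
```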
Interactive plots are useful; however, they need to be embedded in a website to be widely accessible. Furthermore, plotly supports only some types of interaction, while in principle the user may want to use the plot to investigate the data in several other ways. For example, the user may want to set the significance threshold to a different value and see which genes are significant accordingly. This requires re-running some computations and regenerating the picture. The user may also ask for more computationally demanding changes, for example computing the p-values with a different, possibly permutation-based, statistical test.
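In summary, we want the user to be able to provide input through the browser, have the corresponding computations re-run on a server, and receive the updated plot back. This is a client / server architecture:
[Figure: Client / Server architecture]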
Shiny makes it possible to create such an architecture directly from an R script. This means that no knowledge of HTML, CSS, JavaScript or PHP is required. Let’s see what a simple shiny application looks like.
library(shiny)
ui <- fluidPage(
sliderInput(inputId = 'multiplier', label = 'Simple scale parameter',
min = 0, max = 100, value = 50),
plotOutput(outputId = 'p')
)
server <- function(input, output){
output$p <- renderPlot({
hist(rnorm(1000) * input$multiplier, xlab = 'Values', main = 'Example plot')
});
}
shinyApp(ui = ui, server = server,
options = list(port = 8080, launch.browser = TRUE))
There are two main components in a shiny application:
ui: the component defining the user interface. The interface is usually composed of one or more input components (e.g., sliderInput) and one or more output components (e.g., plotOutput). Components are stacked vertically in the user interface, unless specified otherwise.
server: the function defining the server-side computations. This function is never run in the browser. Its arguments are fixed: input holds the values provided by the user, and output holds the results computed on the server side.
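As an illustration of "unless specified otherwise", the default vertical stacking can be replaced with one of the layout functions shipped with shiny. The sketch below uses sidebarLayout to place the slider next to the plot; uiSidebar and serverSidebar are hypothetical names, and the widgets are the same as in the example above.

```r
library(shiny)

# Same widgets as in the simple example, arranged side by side
uiSidebar <- fluidPage(
  sidebarLayout(
    # inputs go in the side panel
    sidebarPanel(
      sliderInput(inputId = 'multiplier', label = 'Simple scale parameter',
                  min = 0, max = 100, value = 50)
    ),
    # outputs go in the main panel
    mainPanel(
      plotOutput(outputId = 'p')
    )
  )
)

# The server function keeps the fixed (input, output) arguments
serverSidebar <- function(input, output){
  output$p <- renderPlot({
    hist(rnorm(1000) * input$multiplier, xlab = 'Values', main = 'Example plot')
  })
}

# Launch the app only when working interactively
if (interactive()) runApp(shinyApp(ui = uiSidebar, server = serverSidebar))
```

Only the ui changes here; the server function is identical to the one used with the vertical layout.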
This simple example is not particularly interesting. Let’s see how we can modify the volcano plot on the fly by using shiny.
rm(list = ls())
library(shiny)
library(ggplot2)
#encapsulating the plot in a function
plotFunction <- function(input){
set.seed(12345)
numGenes <- 1000;
numSamples <- 20;
outcome <- c(rep(1, 10), rep(0, 10));
dataset <- matrix(rnorm(numGenes * numSamples),
numGenes, numSamples);
dataset <- dataset + abs(min(dataset)) + 1;
#rows are genes (G), columns are samples (S)
rownames(dataset) <- paste('G', 1:numGenes, sep = '')
colnames(dataset) <- paste('S', 1:numSamples, sep = '')
#significance
pvalues <- apply(dataset, 1,
function(x){t.test(x[outcome == 1], x[outcome == 0])$p.value})
logPvalues <- -1 * log10(pvalues)
logFoldChanges <- apply(dataset, 1,
function(x){log2(mean(x[outcome == 1]) / mean(x[outcome == 0]))})
#labelling each gene according to the threshold chosen by the user
significant <- ifelse(logPvalues >= -1 * log10(input$threshold),
'significant', 'non-significant')
#gene characteristics
pathway <- c(rep('Wnt signaling', numGenes/2), rep('MAPK signaling', numGenes/2));
transcriptLength <- 10 * runif(numGenes);
#storing all info in a dataset
geneNames <- rownames(dataset)
toPlot <- data.frame(geneNames,
logFoldChanges,
logPvalues,
significant,
pathway,
transcriptLength)
#volcano plot
p <- ggplot() +
geom_point(data = toPlot,
mapping = aes(x = logFoldChanges,
y = logPvalues,
color = significant,
shape = pathway,
size = transcriptLength),
alpha = 0.7) +
ggtitle('Volcano plot') +
xlab('Log2 Fold Changes') +
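ylab('-Log10 p-values') +
#named values keep the colors stable even when only one class is present
scale_color_manual(values = c('non-significant' = 'blue', 'significant' = 'red')) +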
scale_size_continuous(name = 'transcript length') +
theme(legend.position = ifelse(input$legendPresent == 'yes', 'right', 'none'))
p
}
#user interface
ui <- fluidPage(
sliderInput(inputId = 'threshold', label = 'p-value threshold',
min = 0, max = 1, value = 0.05),
radioButtons(inputId = 'legendPresent', label = 'Plot legend',
choices = list('yes', 'no'), inline = TRUE),
plotOutput(outputId = 'p')
)
#server side function
server <- function(input, output){
#making the plot reactive
p <- reactive(plotFunction(input));
#preparing the plot
output$p <- renderPlot(p(), width = 400);
}
#launching the app
shinyApp(ui = ui, server = server,
options = list(port = 8080, launch.browser = TRUE))
The key aspect of the previous code is the reactive function on the server side. Since we want to modify the volcano plot on the fly, we must indicate that it has to be generated again each time the user provides a new input. The reactive function ensures this behavior: it wraps the plotting code in a reactive expression that is re-evaluated whenever an input it reads changes.
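This re-evaluation can also be observed outside a full application. The following is a minimal sketch, assuming the shiny package is installed: threshold, cutoff, first and second are hypothetical names, and the reactiveVal stands in for input$threshold, which only exists inside a running app.

```r
library(shiny)

# A reactive value playing the role of input$threshold
threshold <- reactiveVal(0.05)

# A reactive expression depending on the threshold, like the -log10 cutoff
# used in plotFunction
cutoff <- reactive(-1 * log10(threshold()))

# isolate() allows reading a reactive expression from the console
first <- isolate(cutoff())

# Changing the value invalidates cutoff, which is recomputed on the next read
threshold(0.01)
second <- isolate(cutoff())

print(c(first, second))
```

In the app the same mechanism is at work: moving the slider changes input$threshold, which invalidates p and triggers a new call to plotFunction.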