This guide shows how to calculate differential construct abundance
and visualize the results. To calculate changes in barcode abundance, we
use the DESeq2
package, commonly used in differential
expression analysis. Finally, we provide a utility function to visualize
the results as an MA-plot (base abundance on the x axis, log2 fold
change on the y axis).
Preparation
In order to perform this analysis, we need the
SummarizedExperiment
(dset
) object from the
Quality Control results, and we need to know which comparisons
we want to perform.
From the Quality Control step, we know that we have two
repeats each of Mock
and Sample
for a
patient-specific (pat1
) and a common library
(common
). These two repeats are important, as we need to
estimate within-condition variability as well as between-condition
variability.
However, we need to make sure to perform the screen analysis only on relevant samples, otherwise the model may mis-estimate the variability:
dset = dset[,grepl("pat1", dset$patient)]
colData(dset)
#> DataFrame with 4 rows and 10 columns
#> sample_id patient rep origin barcode total_reads
#> <character> <factor> <numeric> <character> <character> <numeric>
#> mock1 mock1 pat1+common 1 Mock TGAGTCC 224687
#> mock2 mock2 pat1+common 2 Mock CAAGATG 223935
#> screen1 screen1 pat1+common 1 Sample AACCGAC 454355
#> screen2 screen2 pat1+common 2 Sample AGAATCG 450164
#> mapped_reads smp short label
#> <numeric> <character> <character> <character>
#> mock1 224687 Mock-1 pat1+common Mock-1 pat1+common Mock-1 (..
#> mock2 223935 Mock-2 pat1+common Mock-2 pat1+common Mock-2 (..
#> screen1 454355 Sample-1 pat1+common Sample-1 pat1+common Sample-1..
#> screen2 450164 Sample-2 pat1+common Sample-2 pat1+common Sample-2..
Calculating differential abundance
When we want to perform a comparison between two conditions, we refer
in this comparison to the origin
column of the sample
sheet. In our case, we have the origin Mock
and
Sample
, which describe an experiment of mock-transfected
and co-cultured cells and cells transfected with the actualy construct
library, respectively.
Hence, our comparison here is that we want to see the changes of
Sample
over the Mock
condition, which we
indicate as a character vector of c(sample, reference)
or a
list thereof.
In case of a single comparison (character vector), a
data.frame
will be returned. If there are more comparisons
supplied in a list of character vectors, the result will be a list of
data.frame
s:
res = screen_calc(dset, list(c("Sample", "Mock")))
#> converting counts to integer mode
#> Warning in DESeq2::DESeqDataSet(dset, ~rep + origin): some variables in design
#> formula are characters, converting to factors
#> the design formula contains one or more numeric variables with integer values,
#> specifying a model with increasing fold change for higher values.
#> did you mean for this to be a factor? if so, first convert
#> this variable to a factor using the factor() function
#> using pre-existing size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> Joining with `by = join_by(barcode)`
The result will look like the following:
res[[1]]
#> # A tibble: 212 × 19
#> barcode baseMean log2FoldChange lfcSE stat pvalue padj bc_type var_id
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <chr>
#> 1 AACAAC… 884. -3.27 0.300 -10.9 1.63e-27 3.45e-25 pat1 chr1:…
#> 2 AACACC… 1015. -2.09 0.352 -5.94 2.80e- 9 2.97e- 7 pat1 chr1:…
#> 3 AACAAC… 1766. -0.708 0.239 -2.96 3.09e- 3 2.19e- 1 pat1 chr7:…
#> 4 AACGCG… 2802. 0.565 0.236 2.40 1.66e- 2 8.13e- 1 common chr18…
#> 5 AACAAC… 961. -0.605 0.261 -2.31 2.07e- 2 8.13e- 1 pat1 chr7:…
#> 6 AACAAC… 693. -0.633 0.278 -2.27 2.30e- 2 8.13e- 1 pat1 chr2:…
#> 7 AACGCT… 1838. -0.534 0.244 -2.19 2.87e- 2 8.59e- 1 common CD74-…
#> 8 AACAGT… 1463. -0.527 0.246 -2.14 3.24e- 2 8.59e- 1 pat1 RET--…
#> 9 AACGCG… 1638. -0.472 0.242 -1.95 5.09e- 2 9.27e- 1 common chr18…
#> 10 AACGAC… 613. -0.540 0.285 -1.89 5.86e- 2 9.27e- 1 common chr7:…
#> # ℹ 202 more rows
#> # ℹ 10 more variables: mut_id <chr>, pep_id <chr>, pep_type <chr>,
#> # gene_name <chr>, gene_id <chr>, tx_id <chr>, tiled <chr>, n_tiles <dbl>,
#> # nt <dbl>, peptide <chr>
Plotting the screen results
We can plot the result of this differential abundance analysis using
the plot_screen
function. The only argument we need supply
is a results table, but we can fine-tune the plot with the following
additional arguments:
-
sample
– which library (or libraries) to plot (default: all) -
links
– whether to draw arrows between ref and significant alt peptides -
labs
– whether to label constructs in less dense areas of the plot -
cap_fc
– a maximum (and negative minimum) fold change value to bound data points to
plot_screen(res$`Sample vs Mock`)
#> Joining with `by = join_by(bc_type, mut_id)`
#> Registered S3 methods overwritten by 'ggpp': method from
#> heightDetails.titleGrob ggplot2 widthDetails.titleGrob ggplot2
#> Loading required package: ggplot2
As with the Quality Control plots already, we can also
create and interactive plot. This will not display links between
ref
and alt
peptides, but instead highlight a
contruct group (by mut_id
) when interacting with a data
point.