Synopsis

chip - tee - snee

This package is in active development.

Future updates will be focused on decreasing RAM usage and providing direct support for more analysis tasks.

For now I recommend reducing RAM usage by decreasing signal profile resolution, view size, and the number of regions analyzed.
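To see why these three settings matter, a rough back-of-the-envelope estimate helps. The sketch below assumes one numeric value (8 bytes) per window, per region, per file. `estimate_signal_size` is a hypothetical helper written for illustration, not part of chiptsne or ssvQC, and the region and file counts are made up. Real memory use will be higher because of intermediate objects and metadata.

```r
# Hypothetical helper (not part of chiptsne or ssvQC): rough lower bound on
# the size of the signal profiles, assuming one double per window.
estimate_signal_size <- function(n_regions, view_size, window_size, n_files) {
    windows_per_region <- ceiling(view_size / window_size)
    n_values <- n_regions * windows_per_region * n_files
    list(
        windows_per_region = windows_per_region,
        n_values = n_values,
        megabytes = n_values * 8 / 1024^2
    )
}

# Made-up scenario: 50,000 regions, 3 kb view, 10 bp windows, 12 files
big <- estimate_signal_size(5e4, 3000, 10, 12)
# Same files with coarser windows, a smaller view, and fewer regions
small <- estimate_signal_size(5e3, 1600, 50, 12)

# Ratio of values held in memory between the two scenarios
big$n_values / small$n_values
```

Because the three settings multiply together, modest reductions in each one compound quickly.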

The following should be workable on most people’s computers.

For a better understanding of the considerations involved in running and interpreting t-SNE, see this hands-on explanation to get oriented.

Features

Primary Functions

Installation and Loading

From github

chiptsne relies on ssvQC, another package from our group. ssvQC must be installed from GitHub before chiptsne. devtools is required for installing R packages from GitHub.

These installation instructions have been verified for R 4.2.0 and Bioconductor version 3.15.

if(!require("devtools")) install.packages("devtools")
if(!require("ssvQC")) devtools::install_github("FrietzeLabUVM/ssvQC")
if(!require("chiptsne")) devtools::install_github("FrietzeLabUVM/chiptsne")

if(!require("magrittr")) install.packages("magrittr")
if(!require("tidyverse")) install.packages("tidyverse")

Load the library

suppressPackageStartupMessages({
    library(chiptsne)
    library(magrittr)
    library(tidyverse)    
})

Set options

theme_set(theme_classic())

Running t-SNE

Set parameters

These parameters will determine how multiple functions behave.

“Signal” means a read pileup profile from a bam file or a score from a bigwig file.

options("mc.cores" = 20)
color_mapping = c("H3K4me3" = "forestgreen",
                  "H3K27me3" = "firebrick1",
                  "input" = "gray")

bam_files = system.file("extdata", package = "chiptsne") %>%
    dir(pattern = "bam$", full.names = TRUE)
bam_cfg_df = data.frame(file = bam_files)
bam_cfg_df = bam_cfg_df %>% 
    mutate(name = sub("_pooled.+", "", basename(file))) %>%
    separate(name, sep = "_", into = c("cell", "mark"), remove = FALSE)
bam_cfg_df = arrange(bam_cfg_df, mark, cell)
bam_cfg_df$name = factor(bam_cfg_df$name, levels = bam_cfg_df$name)
bam_cfg_df$name_split = bam_cfg_df$name
levels(bam_cfg_df$name_split) = gsub("_", "\n", bam_cfg_df$name_split)
cfg_signal = ConfigSignal(bam_cfg_df, 
                          run_by = "All", 
                          color_by = "mark", 
                          color_mapping = color_mapping, 
                          window_size = 50, 
                          view_size = 1.6e3, 
                          center_signal_at_max = TRUE, 
                          flip_signal_mode = "high_on_left", 
                          cluster_value = "raw", 
                          sort_value = "raw", 
                          plot_value = "raw")
## read_mode has been guessed as bam_SE
## Currently ssvQC cannot guess whether a bam file is SE or PE.  Please manually specify bam_PE if appropriate.
# because this example relies on small subsets of larger bam files we use
# cluster_value and plot_value of "raw".  This is not recommended for real data.
# Either "RPM" or "linearQuartile" are recommended.
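The file-name parsing used to build the configuration table above can be checked in isolation before any signal is fetched. The following base R sketch mirrors the `sub()` and `separate()` steps. The file names are hypothetical and only illustrate the assumed `cell_mark_pooled...` naming pattern; the bundled example files may be named differently.

```r
# Hypothetical file names; the bundled example files may be named differently.
files <- c("/data/MCF10A_H3K27me3_pooled_subset.bam",
           "/data/MCF7_H3K4me3_pooled_subset.bam")

# Drop everything from "_pooled" onward, as in the mutate() step
name <- sub("_pooled.+", "", basename(files))

# Split "cell_mark" on the underscore, as separate() does
parts <- do.call(rbind, strsplit(name, "_", fixed = TRUE))
cfg <- data.frame(file = files, name = name,
                  cell = parts[, 1], mark = parts[, 2])

# Replace "_" with a newline in the factor levels for compact plot labels
cfg$name_split <- factor(cfg$name, levels = cfg$name)
levels(cfg$name_split) <- gsub("_", "\n", levels(cfg$name_split))
cfg
```

If a file name contains extra underscores before `_pooled`, the split yields more than two pieces, so it is worth inspecting the resulting table before passing it on.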

“Features” are intervals provided in one of several ways. In this example we load a bed file to a GRanges and use that. Other alternatives would be loading peak call files and merging them as appropriate or even using TSS sites of interest.

query_file = system.file("extdata/query_gr.bed", package = "chiptsne") 
query_gr = rtracklayer::import.bed(query_file)
cfg_features = ConfigFeatures.GRanges(query_gr = query_gr, n_peaks = length(query_gr))

Run t-SNE

With valid configuration objects for signal and features, we can call ChIPtSNE.

ct = ChIPtSNE(
    features_config = cfg_features, 
    signal_config = cfg_signal, 
    n_glyphs_x = 3, 
    n_glyphs_y = 3, 
    n_heatmap_pixels_x = 5, 
    n_heatmap_pixels_y = 5
)
ct = ChIPtSNE.runTSNE(ct)
## Warning in run_tsne(object@signal_data, sts_parent@perplexity, y_var =
## ssvQC:::val2var[sts_parent@signal_config@cluster_value], : Reducing perplexity to 7 to accommodate data of 30
## rows.
## making plot...
## Decreasing nearest-neighbors to 6.  Original value was too high for dataset.
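The perplexity warning above reflects a general constraint: perplexity must be small relative to the number of rows being embedded. As a sketch, assuming the check used by the Rtsne package (which requires `3 * perplexity <= nrow - 1`), the upper bound for a dataset can be computed directly. `max_rtsne_perplexity` is a hypothetical helper, and the exact rule chiptsne applies when it lowers the value is not shown here.

```r
# Hypothetical helper: largest perplexity accepted under the Rtsne check
# 3 * perplexity <= n_rows - 1.
max_rtsne_perplexity <- function(n_rows) {
    floor((n_rows - 1) / 3)
}

# Upper bound for the 30-row example data
max_rtsne_perplexity(30)

# The reduced perplexity of 7 reported in the warning satisfies the bound
7 <= max_rtsne_perplexity(30)
```

With real data containing thousands of regions this bound is rarely reached, so the warning is mostly a feature of the small example set.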

Auto-generate plots

chiptsne uses the ssvQC framework for analyzing sequencing data. As a result some plots are generated by default. These can be a useful starting point but the various ctPlot* functions are more flexible and powerful.

A basic heatmap is still worth looking at.

ct$plots$signal$heatmaps$query_features$All_signal 

“Regional glyphs” summarize profile patterns across the t-SNE space by splitting the space into bins and averaging the profiles in each bin.

ct$plots$TSNE$regional_glyphs$query_features$All_signal + coord_fixed()
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

“Regional heatmaps” reduce each profile to a single value (max profile value for instance). These reduced profiles are then averaged in binned regions across the t-SNE space. In contrast to “regional glyphs” this method can represent spatial patterns at much higher resolution but nuances of profile shape are lost.

ct$plots$TSNE$regional_heatmap$query_features$All_signal + coord_fixed()
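The idea behind a regional heatmap can be sketched in base R. This is an illustration of the binning and averaging described above, not the code chiptsne runs. The coordinates and signal values are simulated, and `n_bins` is assumed to be analogous to the `n_heatmap_pixels_x` and `n_heatmap_pixels_y` arguments.

```r
set.seed(1)
n_regions <- 200

# Simulated stand-ins for t-SNE coordinates and the per-region reduced value
# (for example, the maximum of each profile)
pts <- data.frame(tx = runif(n_regions, -0.5, 0.5),
                  ty = runif(n_regions, -0.5, 0.5),
                  max_signal = rexp(n_regions))

# Split each axis into equal-width bins
n_bins <- 5
breaks <- seq(-0.5, 0.5, length.out = n_bins + 1)
pts$bin_x <- cut(pts$tx, breaks, labels = FALSE, include.lowest = TRUE)
pts$bin_y <- cut(pts$ty, breaks, labels = FALSE, include.lowest = TRUE)

# Average the reduced value within each occupied bin
bin_means <- aggregate(max_signal ~ bin_x + bin_y, data = pts, FUN = mean)
head(bin_means)
```

Each row of `bin_means` corresponds to one heatmap pixel. Increasing `n_bins` raises spatial resolution but leaves fewer regions per bin, so the averages become noisier.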

ctPlot* functions

Mapping the signal for each region to the t-SNE is the most obvious way to begin thinking about this type of data.

Here we see the dominant trend is high H3K4me3 at one end and high H3K27me3 at the other. None of these regions have high input signal so it is unlikely we need to worry about the presence of artifact peaks.

ctPlotPoints(ct) + 
    scale_color_viridis_c(limits = c(0, 20), na.value = "yellow") +
    coord_fixed()
## Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the existing
## scale.
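The `limits` and `na.value` arguments in the call above work together. By default, continuous ggplot2 scales convert values outside `limits` to `NA` (the behavior of `scales::censor`), and `NA` is then drawn in the `na.value` color. The base R sketch below mimics that censoring next to the clamping alternative (the behavior of `scales::squish`). The signal values are made up for illustration.

```r
signal <- c(2, 8, 19, 25, 40)  # made-up signal values
limits <- c(0, 20)

# Censoring: out-of-range values become NA (drawn with na.value)
censored <- ifelse(signal < limits[1] | signal > limits[2], NA, signal)

# Clamping: out-of-range values are pulled to the nearest limit
squished <- pmin(pmax(signal, limits[1]), limits[2])

data.frame(signal, censored, squished)
```

Choosing yellow for `na.value` presumably makes regions above the limit blend with the bright end of the viridis palette instead of appearing as missing data.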

By tweaking the ChIPtSNE object we can restructure our point plot to separate the two cell lines. In our previous plot the values for the two cell lines were combined by averaging them.

ct$signal_config$run_by = "cell"
## Updating to_run to all items in 'cell'.
ctPlotPoints(ct) + 
    scale_color_viridis_c(limits = c(0, 20), na.value = "yellow") +
    coord_fixed()
## Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the existing
## scale.

This is the same type of plot as the “regional heatmap”, but we can tweak parameters and facet differently for a more useful image.

ctPlotBinAggregates(ct, xbins = 4, min_size = 2) +
    coord_fixed()
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

This is the same type of plot as the “regional glyphs”, but again we have much more control.

ctPlotSummaryProfiles(ct)