Scoring fly enhancers using NRLB models

Chaitanya Rastogi and Harmen J. Bussemaker

2020-02-26

This vignette shows how to use the NRLBtools package. It is related to the paper on the No Read Left Behind (NRLB) algorithm by Rastogi et al., PNAS (2018).

Working with NRLB models for DNA binding specificity

The package makes it easy to use the DNA binding specicity models that were built from SELEX in vitro binding data using the NRLB algorithm and as described in the Rastogi et al., PNAS (2018) paper. The available models are:

library(NRLBtools)
## Loading required package: ggplot2
## Loading required package: ggseqlogo
## Loading required package: scales
## Welcome to the NRLBtools package
NRLBtools::model.info()
##       Protein  Platform        Info GenomicMaximum
## 1        AbdB SELEX-seq  HoxMonomer      1.0000000
## 2         Exd SELEX-seq  HoxMonomer      1.0000000
## 3         Lab SELEX-seq  HoxMonomer      1.0000000
## 4          Pb SELEX-seq  HoxMonomer      1.0000000
## 5         Scr SELEX-seq  HoxMonomer      1.0000000
## 6       UbxIa SELEX-seq  HoxMonomer      1.0000000
## 7      UbxIVa SELEX-seq  HoxMonomer      1.0000000
## 8     ExdAbdA SELEX-seq ExdHoxDimer      0.7926055
## 9     ExdAbdB SELEX-seq ExdHoxDimer      0.8906533
## 10    ExdAntp SELEX-seq ExdHoxDimer      0.9175093
## 11     ExdDfd SELEX-seq ExdHoxDimer      0.8553881
## 12     ExdLab SELEX-seq ExdHoxDimer      1.0000000
## 13      ExdPb SELEX-seq ExdHoxDimer      0.8217055
## 14     ExdScr SELEX-seq ExdHoxDimer      0.9197844
## 15   ExdUbxIa SELEX-seq ExdHoxDimer      0.8262734
## 16  ExdUbxIVa SELEX-seq ExdHoxDimer      0.8429623
## 17       AbdA SELEX-seq  HoxMonomer      1.0000000
## 18        Dfd SELEX-seq  HoxMonomer      1.0000000
## 19      CEBPb  HT-SELEX     NucOnly             NA
## 20      CEBPd  HT-SELEX     NucOnly             NA
## 21       E2F1  HT-SELEX     NucOnly             NA
## 22       EBF1  HT-SELEX     NucOnly             NA
## 23       EGR1  HT-SELEX     NucOnly             NA
## 24       ELF1  HT-SELEX     NucOnly             NA
## 25       ELK1  HT-SELEX     NucOnly             NA
## 26       ETS1  HT-SELEX     NucOnly             NA
## 27      GABPA  HT-SELEX     NucOnly             NA
## 28      GATA3  HT-SELEX     NucOnly             NA
## 29       IRF4  HT-SELEX     NucOnly             NA
## 30       MAFF  HT-SELEX     NucOnly             NA
## 31       MAFK  HT-SELEX     NucOnly             NA
## 32      MEF2A  HT-SELEX     NucOnly             NA
## 33     NFATC1  HT-SELEX     NucOnly             NA
## 34       NFE2  HT-SELEX     NucOnly             NA
## 35      NR2C2  HT-SELEX     NucOnly             NA
## 36       NRF1  HT-SELEX     NucOnly             NA
## 37       PAX5  HT-SELEX     NucOnly             NA
## 38     POU2F2  HT-SELEX     NucOnly             NA
## 39   POU5F1P1  HT-SELEX     NucOnly             NA
## 40       RFX5  HT-SELEX     NucOnly             NA
## 41      RUNX3  HT-SELEX     NucOnly             NA
## 42        SP1  HT-SELEX     NucOnly             NA
## 43     TFAP2A  HT-SELEX     NucOnly             NA
## 44     TFAP2C  HT-SELEX     NucOnly             NA
## 45       USF1  HT-SELEX     NucOnly             NA
## 46        YY1  HT-SELEX     NucOnly             NA
## 47     ZNF143  HT-SELEX     NucOnly             NA
## 48        MAX SELEX-seq     NucOnly             NA
## 49        MAX SELEX-seq    NucDinuc             NA
## 50        MAX  HT-SELEX     NucOnly             NA
## 51        MAX  HT-SELEX    NucDinuc             NA
## 52        MAX SMiLE-seq     NucOnly             NA
## 53        MAX SMiLE-seq    NucDinuc             NA
## 54        p53 SELEX-seq    WildType             NA
## 55        p53 SELEX-seq         Δ30             NA
## 56       ATF4 SELEX-seq    ATF4Only             NA
## 57      CEBPb SELEX-seq   CEBPbOnly             NA
## 58       ATF4 SELEX-seq  ATF4-CEBPb             NA
## 59      CEBPb SELEX-seq  ATF4-CEBPb             NA
## 60 ATF4-CEBPb SELEX-seq  ATF4-CEBPb             NA
##                             Genome
## 1  BSgenome.Dmelanogaster.UCSC.dm3
## 2  BSgenome.Dmelanogaster.UCSC.dm3
## 3  BSgenome.Dmelanogaster.UCSC.dm3
## 4  BSgenome.Dmelanogaster.UCSC.dm3
## 5  BSgenome.Dmelanogaster.UCSC.dm3
## 6  BSgenome.Dmelanogaster.UCSC.dm3
## 7  BSgenome.Dmelanogaster.UCSC.dm3
## 8  BSgenome.Dmelanogaster.UCSC.dm3
## 9  BSgenome.Dmelanogaster.UCSC.dm3
## 10 BSgenome.Dmelanogaster.UCSC.dm3
## 11 BSgenome.Dmelanogaster.UCSC.dm3
## 12 BSgenome.Dmelanogaster.UCSC.dm3
## 13 BSgenome.Dmelanogaster.UCSC.dm3
## 14 BSgenome.Dmelanogaster.UCSC.dm3
## 15 BSgenome.Dmelanogaster.UCSC.dm3
## 16 BSgenome.Dmelanogaster.UCSC.dm3
## 17 BSgenome.Dmelanogaster.UCSC.dm3
## 18 BSgenome.Dmelanogaster.UCSC.dm3
## 19                            <NA>
## 20                            <NA>
## 21                            <NA>
## 22                            <NA>
## 23                            <NA>
## 24                            <NA>
## 25                            <NA>
## 26                            <NA>
## 27                            <NA>
## 28                            <NA>
## 29                            <NA>
## 30                            <NA>
## 31                            <NA>
## 32                            <NA>
## 33                            <NA>
## 34                            <NA>
## 35                            <NA>
## 36                            <NA>
## 37                            <NA>
## 38                            <NA>
## 39                            <NA>
## 40                            <NA>
## 41                            <NA>
## 42                            <NA>
## 43                            <NA>
## 44                            <NA>
## 45                            <NA>
## 46                            <NA>
## 47                            <NA>
## 48                            <NA>
## 49                            <NA>
## 50                            <NA>
## 51                            <NA>
## 52                            <NA>
## 53                            <NA>
## 54                            <NA>
## 55                            <NA>
## 56                            <NA>
## 57                            <NA>
## 58                            <NA>
## 59                            <NA>
## 60                            <NA>

Visualizing affinity profiles for enhancers

We start by extracting the sequence underlying the genomic region of interest. Here, we will only use the E3N enhancer element (as shown in the paper) from the dm3 Drosophila genome:

enh = as.character(BSgenome.Dmelanogaster.UCSC.dm3::BSgenome.Dmelanogaster.UCSC.dm3$chrX[4915195:4915486])

An affinity profile along the E3N enhancer can be created as follows:

NRLBtools::score.seq(sequence = enh, nrlb.model = 16, plot = TRUE, nPeaks = 10, annotate = TRUE)
## ExdUbxIVa scores in enh
##        Affinity Position           Sequence
## 1  2.100306e-03       84 ACATAATTTGTAGTTTTT
## 2  1.733473e-03      185 GGTTTTTTTATTGCTTTT
## 3  1.149797e-03       32 TGCTGATTTGTTGACCCG
## 4  9.752449e-04      -79 CATGCACATAATTTGTAG
## 5  9.206493e-04       48 CGATAAAAAATGGGACTT
## 6  7.306726e-04     -228 GGAGTTCATAATTCCTGA
## 7  4.345313e-04      258 TTCTTCTTTATTACGCGA
## 8  3.288965e-04      -43 TGACCCGATAAAAAATGG
## 9  5.154725e-05        3 TTTTTTTTTATCCGGTGC
## 10 4.078117e-05      173 GTTCGATTGAGAGGTTTT

Working with new models

The NRLBtools package can also be used to interpret new NRLB models fit to SELEX data. To do so, one first needs to download NRLB from https://github.com/BussemakerLab/NRLB. For the following example, we will be using the precomputed results of running the default fit detailed in the README file. We begin by loading the CSV model results file, which contains all the relevant model information in a machine-readable format:

max = NRLBtools::load.models(fileName = system.file("extdata", "MAX-NRLBConfig.csv", package = "NRLBtools"))

Next, we can use model info to explore the types of models that are stored in this object:

NRLBtools::model.info(models = max)
##                  Time FitTime FitSteps FncCalls  k Flank   NS    Di    MG    PT
## 1 2018-06-26 18:27:04   0.836       56       68  8     0 TRUE FALSE FALSE FALSE
## 2 2018-06-26 18:27:03   0.779       58       67  8     0 TRUE FALSE FALSE FALSE
## 3 2018-06-26 18:27:05   0.517       38       42  8     0 TRUE FALSE FALSE FALSE
## 4 2018-06-26 18:27:06   0.620       41       50  9     0 TRUE FALSE FALSE FALSE
## 5 2018-06-26 18:27:05   0.917       65       76  9     0 TRUE FALSE FALSE FALSE
## 6 2018-06-26 18:27:07   0.543       35       44  9     0 TRUE FALSE FALSE FALSE
## 7 2018-06-26 18:27:09   0.654       51       62 10     0 TRUE FALSE FALSE FALSE
## 8 2018-06-26 18:27:08   1.507       97      117 10     0 TRUE FALSE FALSE FALSE
## 9 2018-06-26 18:27:10   0.439       40       44 10     0 TRUE FALSE FALSE FALSE
##      HT    RO                    NuSym DiSym Shift   NSBind   TrainL
## 1 FALSE FALSE      1.1.1.1.-1.-1.-1.-1  <NA>    -1 3.792424 55073.60
## 2 FALSE FALSE      1.1.1.1.-1.-1.-1.-1  <NA>     0 4.802052 55026.71
## 3 FALSE FALSE      1.1.1.1.-1.-1.-1.-1  <NA>     1 3.555617 55073.61
## 4 FALSE FALSE    1.1.1.1.*.-1.-1.-1.-1  <NA>    -1 2.639300 55143.52
## 5 FALSE FALSE    1.1.1.1.*.-1.-1.-1.-1  <NA>     0 2.718206 55134.65
## 6 FALSE FALSE    1.1.1.1.*.-1.-1.-1.-1  <NA>     1 2.642630 55143.53
## 7 FALSE FALSE 1.1.1.1.1.-1.-1.-1.-1.-1  <NA>    -1 3.022300 55086.39
## 8 FALSE FALSE 1.1.1.1.1.-1.-1.-1.-1.-1  <NA>     0 3.810979 55058.55
## 9 FALSE FALSE 1.1.1.1.1.-1.-1.-1.-1.-1  <NA>     1 3.023704 55086.40
##   TrainLPerRead AdjTrainLPerRead TestLPerRead AdjTestLPerRead Lambda
## 1      22.02944         14.21260           NA              NA 0.0025
## 2      22.01069         14.19385           NA              NA 0.0025
## 3      22.02944         14.21261           NA              NA 0.0025
## 4      22.05741         14.24057           NA              NA 0.0025
## 5      22.05386         14.23702           NA              NA 0.0025
## 6      22.05741         14.24058           NA              NA 0.0025
## 7      22.03455         14.21772           NA              NA 0.0025
## 8      22.02342         14.20658           NA              NA 0.0025
## 9      22.03456         14.21772           NA              NA 0.0025
##   NullVectors       PSAM
## 1           4   CACATGTG
## 2           4   CCACGTGG
## 3           4   CACATGTG
## 4           5  ACCACTGGT
## 5           5  CACCAGGTG
## 6           5  ACCACTGGT
## 7           5 ACACATGTGT
## 8           5 AACACGTGTT
## 9           5 ACACATGTGT

We can get more information about a particular model by providing the model’s row index:

NRLBtools::model.info(models = max, index = 8)
##   Mode  k RelativeAffinity BestSequence
## 1    1 10         0.000000   AACACGTGTT
## 2  NSB  -        -3.153965            -

We can also visualize our new model:

NRLBtools::logo(models = max, index = 8)

And use the model to score a sequence:

NRLBtools::score.seq(sequence = enh, models = max, index = 8, plot = T)