This vignette shows how to use the NRLBtools package. It is related to the paper on the No Read Left Behind (NRLB) algorithm by Rastogi et al., PNAS (2018).
The package makes it easy to use the DNA binding specicity models that were built from SELEX in vitro binding data using the NRLB algorithm and as described in the Rastogi et al., PNAS (2018) paper. The available models are:
## Loading required package: ggplot2
## Loading required package: ggseqlogo
## Loading required package: scales
## Welcome to the NRLBtools package
## Protein Platform Info GenomicMaximum
## 1 AbdB SELEX-seq HoxMonomer 1.0000000
## 2 Exd SELEX-seq HoxMonomer 1.0000000
## 3 Lab SELEX-seq HoxMonomer 1.0000000
## 4 Pb SELEX-seq HoxMonomer 1.0000000
## 5 Scr SELEX-seq HoxMonomer 1.0000000
## 6 UbxIa SELEX-seq HoxMonomer 1.0000000
## 7 UbxIVa SELEX-seq HoxMonomer 1.0000000
## 8 ExdAbdA SELEX-seq ExdHoxDimer 0.7926055
## 9 ExdAbdB SELEX-seq ExdHoxDimer 0.8906533
## 10 ExdAntp SELEX-seq ExdHoxDimer 0.9175093
## 11 ExdDfd SELEX-seq ExdHoxDimer 0.8553881
## 12 ExdLab SELEX-seq ExdHoxDimer 1.0000000
## 13 ExdPb SELEX-seq ExdHoxDimer 0.8217055
## 14 ExdScr SELEX-seq ExdHoxDimer 0.9197844
## 15 ExdUbxIa SELEX-seq ExdHoxDimer 0.8262734
## 16 ExdUbxIVa SELEX-seq ExdHoxDimer 0.8429623
## 17 AbdA SELEX-seq HoxMonomer 1.0000000
## 18 Dfd SELEX-seq HoxMonomer 1.0000000
## 19 CEBPb HT-SELEX NucOnly NA
## 20 CEBPd HT-SELEX NucOnly NA
## 21 E2F1 HT-SELEX NucOnly NA
## 22 EBF1 HT-SELEX NucOnly NA
## 23 EGR1 HT-SELEX NucOnly NA
## 24 ELF1 HT-SELEX NucOnly NA
## 25 ELK1 HT-SELEX NucOnly NA
## 26 ETS1 HT-SELEX NucOnly NA
## 27 GABPA HT-SELEX NucOnly NA
## 28 GATA3 HT-SELEX NucOnly NA
## 29 IRF4 HT-SELEX NucOnly NA
## 30 MAFF HT-SELEX NucOnly NA
## 31 MAFK HT-SELEX NucOnly NA
## 32 MEF2A HT-SELEX NucOnly NA
## 33 NFATC1 HT-SELEX NucOnly NA
## 34 NFE2 HT-SELEX NucOnly NA
## 35 NR2C2 HT-SELEX NucOnly NA
## 36 NRF1 HT-SELEX NucOnly NA
## 37 PAX5 HT-SELEX NucOnly NA
## 38 POU2F2 HT-SELEX NucOnly NA
## 39 POU5F1P1 HT-SELEX NucOnly NA
## 40 RFX5 HT-SELEX NucOnly NA
## 41 RUNX3 HT-SELEX NucOnly NA
## 42 SP1 HT-SELEX NucOnly NA
## 43 TFAP2A HT-SELEX NucOnly NA
## 44 TFAP2C HT-SELEX NucOnly NA
## 45 USF1 HT-SELEX NucOnly NA
## 46 YY1 HT-SELEX NucOnly NA
## 47 ZNF143 HT-SELEX NucOnly NA
## 48 MAX SELEX-seq NucOnly NA
## 49 MAX SELEX-seq NucDinuc NA
## 50 MAX HT-SELEX NucOnly NA
## 51 MAX HT-SELEX NucDinuc NA
## 52 MAX SMiLE-seq NucOnly NA
## 53 MAX SMiLE-seq NucDinuc NA
## 54 p53 SELEX-seq WildType NA
## 55 p53 SELEX-seq Δ30 NA
## 56 ATF4 SELEX-seq ATF4Only NA
## 57 CEBPb SELEX-seq CEBPbOnly NA
## 58 ATF4 SELEX-seq ATF4-CEBPb NA
## 59 CEBPb SELEX-seq ATF4-CEBPb NA
## 60 ATF4-CEBPb SELEX-seq ATF4-CEBPb NA
## Genome
## 1 BSgenome.Dmelanogaster.UCSC.dm3
## 2 BSgenome.Dmelanogaster.UCSC.dm3
## 3 BSgenome.Dmelanogaster.UCSC.dm3
## 4 BSgenome.Dmelanogaster.UCSC.dm3
## 5 BSgenome.Dmelanogaster.UCSC.dm3
## 6 BSgenome.Dmelanogaster.UCSC.dm3
## 7 BSgenome.Dmelanogaster.UCSC.dm3
## 8 BSgenome.Dmelanogaster.UCSC.dm3
## 9 BSgenome.Dmelanogaster.UCSC.dm3
## 10 BSgenome.Dmelanogaster.UCSC.dm3
## 11 BSgenome.Dmelanogaster.UCSC.dm3
## 12 BSgenome.Dmelanogaster.UCSC.dm3
## 13 BSgenome.Dmelanogaster.UCSC.dm3
## 14 BSgenome.Dmelanogaster.UCSC.dm3
## 15 BSgenome.Dmelanogaster.UCSC.dm3
## 16 BSgenome.Dmelanogaster.UCSC.dm3
## 17 BSgenome.Dmelanogaster.UCSC.dm3
## 18 BSgenome.Dmelanogaster.UCSC.dm3
## 19 <NA>
## 20 <NA>
## 21 <NA>
## 22 <NA>
## 23 <NA>
## 24 <NA>
## 25 <NA>
## 26 <NA>
## 27 <NA>
## 28 <NA>
## 29 <NA>
## 30 <NA>
## 31 <NA>
## 32 <NA>
## 33 <NA>
## 34 <NA>
## 35 <NA>
## 36 <NA>
## 37 <NA>
## 38 <NA>
## 39 <NA>
## 40 <NA>
## 41 <NA>
## 42 <NA>
## 43 <NA>
## 44 <NA>
## 45 <NA>
## 46 <NA>
## 47 <NA>
## 48 <NA>
## 49 <NA>
## 50 <NA>
## 51 <NA>
## 52 <NA>
## 53 <NA>
## 54 <NA>
## 55 <NA>
## 56 <NA>
## 57 <NA>
## 58 <NA>
## 59 <NA>
## 60 <NA>
It is always a good idea to visualize the model as an energy logo before using it. We can visualize the Exd-UbxIVa model used in the paper by using the logo function and its model number (16 in this case):
We can actually use model.info to provide more information about the NRLB models that come with this package. For example, we can see that model number 16 is a multi-mode model with two modes:
## Mode k RelativeAffinity BestSequence
## 1 1 18 0.000000 GGATGATTTATGACCTTT
## 2 2 10 -7.057776 ATCATTAAAA
## 3 NSB - -9.559147 -
We start by extracting the sequence underlying the genomic region of interest. Here, we will only use the E3N enhancer element (as shown in the paper) from the dm3 Drosophila genome:
An affinity profile along the E3N enhancer can be created as follows:
## ExdUbxIVa scores in enh
## Affinity Position Sequence
## 1 2.100306e-03 84 ACATAATTTGTAGTTTTT
## 2 1.733473e-03 185 GGTTTTTTTATTGCTTTT
## 3 1.149797e-03 32 TGCTGATTTGTTGACCCG
## 4 9.752449e-04 -79 CATGCACATAATTTGTAG
## 5 9.206493e-04 48 CGATAAAAAATGGGACTT
## 6 7.306726e-04 -228 GGAGTTCATAATTCCTGA
## 7 4.345313e-04 258 TTCTTCTTTATTACGCGA
## 8 3.288965e-04 -43 TGACCCGATAAAAAATGG
## 9 5.154725e-05 3 TTTTTTTTTATCCGGTGC
## 10 4.078117e-05 173 GTTCGATTGAGAGGTTTT
The NRLBtools package can also be used to interpret new NRLB models fit to SELEX data. To do so, one first needs to download NRLB from https://github.com/BussemakerLab/NRLB. For the following example, we will be using the precomputed results of running the default fit detailed in the README file. We begin by loading the CSV model results file, which contains all the relevant model information in a machine-readable format:
Next, we can use model info to explore the types of models that are stored in this object:
## Time FitTime FitSteps FncCalls k Flank NS Di MG PT
## 1 2018-06-26 18:27:04 0.836 56 68 8 0 TRUE FALSE FALSE FALSE
## 2 2018-06-26 18:27:03 0.779 58 67 8 0 TRUE FALSE FALSE FALSE
## 3 2018-06-26 18:27:05 0.517 38 42 8 0 TRUE FALSE FALSE FALSE
## 4 2018-06-26 18:27:06 0.620 41 50 9 0 TRUE FALSE FALSE FALSE
## 5 2018-06-26 18:27:05 0.917 65 76 9 0 TRUE FALSE FALSE FALSE
## 6 2018-06-26 18:27:07 0.543 35 44 9 0 TRUE FALSE FALSE FALSE
## 7 2018-06-26 18:27:09 0.654 51 62 10 0 TRUE FALSE FALSE FALSE
## 8 2018-06-26 18:27:08 1.507 97 117 10 0 TRUE FALSE FALSE FALSE
## 9 2018-06-26 18:27:10 0.439 40 44 10 0 TRUE FALSE FALSE FALSE
## HT RO NuSym DiSym Shift NSBind TrainL
## 1 FALSE FALSE 1.1.1.1.-1.-1.-1.-1 <NA> -1 3.792424 55073.60
## 2 FALSE FALSE 1.1.1.1.-1.-1.-1.-1 <NA> 0 4.802052 55026.71
## 3 FALSE FALSE 1.1.1.1.-1.-1.-1.-1 <NA> 1 3.555617 55073.61
## 4 FALSE FALSE 1.1.1.1.*.-1.-1.-1.-1 <NA> -1 2.639300 55143.52
## 5 FALSE FALSE 1.1.1.1.*.-1.-1.-1.-1 <NA> 0 2.718206 55134.65
## 6 FALSE FALSE 1.1.1.1.*.-1.-1.-1.-1 <NA> 1 2.642630 55143.53
## 7 FALSE FALSE 1.1.1.1.1.-1.-1.-1.-1.-1 <NA> -1 3.022300 55086.39
## 8 FALSE FALSE 1.1.1.1.1.-1.-1.-1.-1.-1 <NA> 0 3.810979 55058.55
## 9 FALSE FALSE 1.1.1.1.1.-1.-1.-1.-1.-1 <NA> 1 3.023704 55086.40
## TrainLPerRead AdjTrainLPerRead TestLPerRead AdjTestLPerRead Lambda
## 1 22.02944 14.21260 NA NA 0.0025
## 2 22.01069 14.19385 NA NA 0.0025
## 3 22.02944 14.21261 NA NA 0.0025
## 4 22.05741 14.24057 NA NA 0.0025
## 5 22.05386 14.23702 NA NA 0.0025
## 6 22.05741 14.24058 NA NA 0.0025
## 7 22.03455 14.21772 NA NA 0.0025
## 8 22.02342 14.20658 NA NA 0.0025
## 9 22.03456 14.21772 NA NA 0.0025
## NullVectors PSAM
## 1 4 CACATGTG
## 2 4 CCACGTGG
## 3 4 CACATGTG
## 4 5 ACCACTGGT
## 5 5 CACCAGGTG
## 6 5 ACCACTGGT
## 7 5 ACACATGTGT
## 8 5 AACACGTGTT
## 9 5 ACACATGTGT
We can get more information about a particular model by providing the model’s row index:
## Mode k RelativeAffinity BestSequence
## 1 1 10 0.000000 AACACGTGTT
## 2 NSB - -3.153965 -
We can also visualize our new model:
And use the model to score a sequence: