Cross-species TF scanner

This is the Bond Lab cross-species TF scanner. You enter a gene symbol, pick two or three species, and we fetch promoter DNA from NCBI, align the regions 5′ of the TSS, scan the reference for TF binding motifs, and flag hits where the underlying sequence still looks similar in the other species. Everything runs in your browser.

WHAT IT’S FOR

If you’ve ever run ConTra (Kreft et al., NAR 2017) and thought “yes, that’s the idea, but I only want promoters for one gene and I don’t want to babysit three file uploads”, this is basically our browser version of that workflow.

You pick a gene symbol, up to three species, and we pull promoter sequence from NCBI (same logic as Promoter Grabber). The sequences are aligned 5′ of the TSS so you’re comparing orthologous promoter regions, not random chunks of upstream DNA. We then scan the reference species for TF binding sites using the same JASPAR/MEME PWM library as the TF binding scanner, and ask a simple follow-up question: does the DNA under that hit look similar in the other species?

That second step is what separates “there’s an Sp1 site in human” from “there’s an Sp1-shaped bit of sequence that’s actually still there in mouse and rat”. We use it when we’re sanity-checking a promoter before cloning a reporter, arguing about whether a ChIP peak is evolutionarily plausible, or just trying to narrow a list of motifs worth testing.

It is not a substitute for checking orthology properly, and it won’t tell you whether your gene symbol resolves to the right transcript in every species. Treat the output as a hypothesis generator, then go and verify.

HOW IT WORKS

1. Fetch promoters

We query NCBI for your symbol in each species, work out transcript TSS positions, and group transcripts whose TSS falls within 10 bp of each other. You’ll usually see one main TSS group per species; pick the one that matches the biology you care about (often the dominant isoform).
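
To make the grouping rule concrete, here is a minimal Python sketch. It is not the tool's own code (the tool runs in the browser). It assumes "within 10 bp of each other" means single-linkage on sorted positions, so a new group starts whenever the gap to the previous TSS exceeds 10 bp. The representative position follows the hint shown in the TSS panels (most common position, or the average if tied), and we assume the average is taken over every transcript in the group. The function names are ours, for illustration only.

```python
from collections import Counter


def group_tss(positions, max_gap=10):
    """Group TSS positions; a gap larger than max_gap starts a new group."""
    groups = []
    for pos in sorted(positions):
        if groups and pos - groups[-1][-1] <= max_gap:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    return groups


def representative_tss(group):
    """Most common position in the group, or the rounded mean if tied."""
    counts = Counter(group).most_common()
    top_count = counts[0][1]
    winners = [pos for pos, n in counts if n == top_count]
    if len(winners) == 1:
        return winners[0]
    # Assumption: on a tie, average over every transcript in the group.
    return round(sum(group) / len(group))
```

A stricter reading (every pair of TSSs within 10 bp) would split long chains of closely spaced starts into more groups, so check which behaviour matches what you see in the panels.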

2. Align promoters

Default is TBA-lite (threaded blockset alignment, in the spirit of TBA/MULTIZ [Blanchette et al., 2004]). Short version: we find local homology blocks with k-mer seeds, chain them on the reference sequence, and align each block with Needleman–Wunsch. Gappy bits in the alignment view are intentional — we’re not forcing a full global alignment across a promoter that may have inserted repeats or lost homology at the edges. There’s a progressive global option if you want the old behaviour, but for typical mammalian promoters we’d start with TBA-lite.
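
The seed-and-chain idea can be sketched in a few lines of Python. This is an illustration under our own assumptions, not the TBA-lite implementation: exact k-mer matches as seeds, a simple quadratic dynamic programme that keeps the longest chain of non-overlapping seeds in the same order in both sequences, and hypothetical function names. In a real pipeline, the stretches between chained seeds would then go to Needleman–Wunsch.

```python
from collections import defaultdict


def kmer_seeds(ref, other, k=8):
    """Return sorted (ref_pos, other_pos) pairs for every shared k-mer."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    seeds = []
    for j in range(len(other) - k + 1):
        for i in index.get(other[j:j + k], ()):
            seeds.append((i, j))
    return sorted(seeds)


def chain_seeds(seeds, k=8):
    """Longest chain of seeds that are colinear and non-overlapping.

    Expects seeds sorted by reference position, as kmer_seeds returns them.
    """
    if not seeds:
        return []
    best = [1] * len(seeds)   # best chain length ending at each seed
    prev = [-1] * len(seeds)  # back-pointer to the previous seed in the chain
    for b in range(len(seeds)):
        for a in range(b):
            follows = (seeds[a][0] + k <= seeds[b][0]
                       and seeds[a][1] + k <= seeds[b][1])
            if follows and best[a] + 1 > best[b]:
                best[b] = best[a] + 1
                prev[b] = a
    end = max(range(len(seeds)), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(seeds[end])
        end = prev[end]
    return chain[::-1]
```

Anything in the other species that falls outside the chain is simply left unaligned, which is where the gappy stretches in the alignment view come from.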

3. Scan the reference only

PWM scoring runs on the ungapped reference promoter sequence (same stringency slider idea as the TF binding scanner: minimum relative strength %). Only hits above your threshold are carried forward.
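
For readers who want to see what a "relative strength" threshold means in practice, here is a hedged Python sketch. It assumes the common definition of relative score, (score - min) / (max - min) over the matrix, with log2-odds weights against a uniform background. The scanner's exact formula, pseudocount and strand handling are not stated here, so treat those details (and the function names) as our assumptions. For brevity the sketch scans the forward strand only.

```python
import math

BASES = "ACGT"


def log_odds(counts, pseudocount=0.8, background=0.25):
    """Turn per-position base counts (list of dicts) into log2-odds weights."""
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + pseudocount
        pwm.append({
            b: math.log2(((col[b] + pseudocount * background) / total) / background)
            for b in BASES
        })
    return pwm


def scan(seq, pwm, min_rel=0.85):
    """Return (start, site, relative_score) for windows at or above min_rel."""
    lowest = sum(min(col.values()) for col in pwm)
    highest = sum(max(col.values()) for col in pwm)
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        site = seq[i:i + width]
        if any(base not in BASES for base in site):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[base] for col, base in zip(pwm, site))
        rel = (score - lowest) / (highest - lowest)
        if rel >= min_rel:
            hits.append((i, site, rel))
    return hits
```

With this definition, 85% means the site scores 85% of the way from the worst possible sequence to the perfect consensus for that matrix, which is why the same slider value is not equally strict for short and long motifs.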

4. Conservation test

For each hit, we map the binding interval into the alignment (plus a small ± column window you can set) and compute % identity in species 2 (and species 3 if you’re running three-way). A hit is marked Conserved if every non-reference species clears your min % identity cutoff. Gaps in the alignment count like mismatches in that window — which is harsh but honest.
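
The conservation test described above can be sketched as follows. This is a minimal Python illustration, assuming aligned sequences are equal-length uppercase strings with "-" for gaps and that the hit is given as a half-open interval on the ungapped reference. The function names are hypothetical. Any column with a gap in either sequence counts as a mismatch, as described.

```python
def ref_to_columns(aligned_ref):
    """Alignment column index for each ungapped reference position."""
    return [col for col, ch in enumerate(aligned_ref) if ch != "-"]


def percent_identity(aligned_ref, aligned_other, ref_start, ref_end, flank=0):
    """% identity over the columns under ref[ref_start:ref_end], plus flank.

    Gap columns (in either sequence) count as mismatches.
    """
    cols = ref_to_columns(aligned_ref)
    first = max(0, cols[ref_start] - flank)
    last = min(len(aligned_ref) - 1, cols[ref_end - 1] + flank)
    window = range(first, last + 1)
    matches = sum(
        1 for c in window
        if aligned_ref[c] != "-" and aligned_ref[c] == aligned_other[c]
    )
    return 100.0 * matches / len(window)


def is_conserved(aligned_ref, aligned_others, ref_start, ref_end,
                 flank=0, min_identity=70.0):
    """True only if every non-reference species clears the cutoff."""
    return all(
        percent_identity(aligned_ref, other, ref_start, ref_end, flank) >= min_identity
        for other in aligned_others
    )
```

Note that the window is counted in alignment columns, not reference bases, so an insertion in another species inside the hit widens the window and drags the percentage down.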

Motifs are filtered to matrices that exist for your reference taxon in the bundled library. If no motifs are scanned at all, try another reference species or check that JASPAR actually has matrices for that taxon.

HOW TO USE IT

Typical human / mouse / rat run:

  1. Enter a gene symbol (e.g. TP53, Nfya, Gapdh — whatever NCBI recognises for that symbol in each organism).
  2. Set reference species — this is where PWM scanning happens. Human as reference is the usual choice for our work.
  3. Pick species 2 and optionally species 3. Leave species 3 on “None” for a two-species comparison.
  4. Set bases 5′ of TSS (default 800 bp) and the 3′ boundary relative to the TSS if you need a sliver of downstream sequence. Total promoter length per species is capped at 1,000 bp (5′ bases plus any bases into the gene).
  5. Adjust min PWM strength — we often start around 85% and loosen only if the table is empty and we’re fishing.
  6. Set conservation window (± alignment columns around the hit) and min % identity in the other species (default 70% is a reasonable first pass; tighten for stringency).
  7. Click Load promoters & TSS groups. Wait — NCBI is rate-limited (~3 requests/s), so this isn’t instant.
  8. In the TSS panels, choose one group per species. Read the hints: the promoter TSS for a group is the most common TSS position in it, or the average position if there is a tie.
  9. Click Align & scan.
  10. Read the alignment first (block boundaries marked; yellow gaps = no homology in that species for that reference interval). Then the conserved TF binding elements table. Copy table as TSV if you want it in Excel.
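
If you want to reproduce the promoter window from step 4 by hand, the arithmetic looks roughly like this. It is a sketch under our own conventions, not the tool's code: 1-based inclusive genomic coordinates, the TSS base counted as the first "into gene" base, and minus-strand genes handled by flipping the window and reverse-complementing the sequence. The function names and the exact off-by-one choices are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a DNA string (needed for minus-strand genes)."""
    return seq.translate(_COMPLEMENT)[::-1]


def promoter_window(tss, strand, upstream=800, downstream=0, max_len=1000):
    """Return 1-based inclusive (start, end) genomic coordinates.

    upstream   = bases 5' of the TSS
    downstream = bases into the gene, starting with the TSS base itself
    """
    if upstream < 0 or downstream < 0:
        raise ValueError("window sizes must be non-negative")
    if upstream + downstream > max_len:
        raise ValueError(f"promoter longer than the {max_len} bp cap")
    if strand == "+":
        return tss - upstream, tss + downstream - 1
    if strand == "-":
        return tss - downstream + 1, tss + upstream
    raise ValueError("strand must be '+' or '-'")
```

For a minus-strand gene, the sequence fetched for that window is then reverse-complemented so that every species reads 5′ to 3′ towards the TSS before alignment.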

For a deep dive on one species — exon map, different window, more transcripts — switch to Promoter Grabber and come back here once you’re happy with the TSS story.

READING THE RESULTS

  • Ref pos / sequence / strength: the motif call on the reference promoter, same spirit as the single-species scanner.
  • Sp2 %id / Sp3 %id: identity in the aligned columns under that hit (including your flank window).
  • Conserved: passes your % cutoff in every non-reference species you included.
  • Ref label: which reference transcript label we attached (handy when you go back to NCBI).

A strong PWM score with low cross-species identity is still useful: it might be a gained motif, a false positive, or just a region that didn’t align well. A modest score with high identity is sometimes more interesting for follow-up.

SETTINGS WORTH FIDDLING WITH

  • TBA-lite vs progressive global: TBA-lite for divergent promoters; progressive if you know the sequences are nearly identical.
  • PWM strength: higher = fewer, sharper hits.
  • Conservation ± columns: wider window = more forgiving (and more noise).
  • Min % identity: 70% is a starting point; 80–85% if you only want obvious conservation.
  • Two vs three species: the third species is optional; if you include it, conservation requires both other species to pass.

LIMITATIONS

Read this before the grant reviewer does:

  • Same symbol ≠ guaranteed ortholog. We assume you’ve chosen sensible species and symbols. For publication work, confirm orthology and genome build yourself.
  • NCBI transcript choice matters. Alternative TSS isoforms can completely change the promoter story.
  • PWM scanning is not experimental validation. No motif score proves a factor binds in your cell type.
  • Alignment quality caps biology. If the alignment is mostly gaps in a region, conservation scores there are meaningless.
  • Library coverage — not every TF has a matrix for every species; scanning is on the reference only.
  • Runs in your browser; very long promoters or huge combined sequence lengths hit hard limits (see error messages if you push it).

WHEN TO USE SOMETHING ELSE

QUICK EXAMPLE

Gene: Nfya · Reference: human · Species 2: mouse · Species 3: rat · 800 bp upstream · TBA-lite · 85% PWM · 70% identity · ±5 columns.

Load → pick the dominant TSS group in each species → align & scan → sort the table by Conserved → eyeball the alignment under the top hits → shortlist motifs for EMSA/reporter work or for comparing with your ChIP tracks.
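
If you copy the table out as TSV, the "sort by Conserved, then shortlist" step is easy to script. The sketch below is Python and assumes hypothetical column headers (motif, strength, conserved) and a yes/no value in the conserved column. Check the header row of your own export and adjust the names, since we are not quoting the tool's actual headers here.

```python
import csv
import io


def load_hits(tsv_text):
    """Parse a TSV export into dicts with typed strength and conserved fields."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    for row in rows:
        row["strength"] = float(row["strength"])  # assumed column name
        row["conserved"] = row["conserved"].strip().lower() == "yes"  # assumed yes/no
    return rows


def shortlist(rows):
    """Conserved hits only, strongest PWM match first."""
    conserved = [row for row in rows if row["conserved"]]
    return sorted(conserved, key=lambda row: row["strength"], reverse=True)
```

Sorting by strength within the conserved set is just one reasonable ordering; sorting by identity instead will surface the modest-score, high-identity hits first.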

About the author: This page was written by Dr Mark Bond from The Bond Lab at the University of Bristol. These notes reflect the methodology used in our cardiovascular and cell-signalling research. Questions about these methods: contact us or email mark.bond@bristol.ac.uk.