Sequence aligner

Multi-sequence alignment

This is The Bond Lab multi-sequence aligner. It aligns three or more DNA or protein sequences in one shared column layout so you can compare them residue by residue. It differs from our pairwise tool, which is built for exactly two sequences: if you only have two sequences, please use the pairwise aligner. Once you have a small panel of orthologs, isoforms, amplicons or protein homologues, the multi-sequence tool is the better choice.

WHAT MULTIPLE SEQUENCE ALIGNMENT IS FOR

In the lab and in silico, we rarely care about a single sequence in isolation. We often want to know which positions are conserved, or where insertions, deletions or mutations have occurred. It can also be helpful to know whether PCR primers bind in variable or conserved regions.

A multiple sequence alignment (MSA) places every sequence in the same coordinate system: column n is meant to represent “the same” biological position across rows, with gap characters (-) inserted where one sequence lacks a residue that others have.
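To make the shared coordinate system concrete, here is a minimal Python sketch (not the tool's own code). The helper name column_to_positions and the three toy rows are hypothetical. For one alignment column it reports each row's residue and that residue's 1-based position in the original, ungapped sequence.

```python
def column_to_positions(rows, column):
    """For one alignment column (0-based), return each row's residue and its
    1-based position in the ungapped sequence (None where the row has a gap)."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    result = []
    for row in rows:
        residue = row[column]
        if residue == "-":
            result.append(("-", None))
        else:
            # Count the non-gap characters up to and including this column.
            position = len(row[: column + 1].replace("-", ""))
            result.append((residue, position))
    return result


# Hypothetical toy alignment: three rows, seven columns.
rows = [
    "ATGC-TA",
    "ATGCGTA",
    "AT-CGTA",
]
print(column_to_positions(rows, 4))
# → [('-', None), ('G', 5), ('G', 4)]
```

The same column holds residue 5 of the second sequence and residue 4 of the third, which is why positions should be quoted per sequence and not per column.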

Typical uses include:

  • Comparing PCR products or Sanger reads from several clones or species
  • Inspecting promoter or UTR sequences from different sources
  • Comparing protein homologues to identify domains or active sites
  • Preparing a quick conservation snapshot for a presentation figure, supplementary table, or lab notebook
  • Teaching or demonstrating alignment ideas without installing desktop software

The Bond Lab aligner is aimed at quick, interpretable checks in the browser. It is not designed to replace heavy pipelines such as full Clustal Omega jobs on hundreds of long genomes.

HOW TO USE IT

Paste your sequences in FASTA format or upload a multi-FASTA file (please use standard >header lines, each followed by its sequence on a new line). Choose DNA or Protein, then click Align.

The results show one aligned row per input sequence, with sequence IDs preserved from the FASTA headers. Below the sequences, a consensus / comparison row is drawn so that you can easily scan for sequence conservation. You can export the alignment as plain text, Rich Text Format (RTF), PNG or JPG, or print it to PDF.
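For readers unfamiliar with the multi-FASTA layout, the following Python sketch shows one way such input could be read into (ID, sequence) pairs. It is an illustration only: the function name parse_fasta, the choice to use the first word of the header as the ID, and the fallback label for an empty header are assumptions, not a description of the tool's parser.

```python
def parse_fasta(text):
    """Parse multi-FASTA text into a list of (sequence_id, sequence) tuples."""
    records = []
    current_id = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if current_id is not None:
                records.append((current_id, "".join(chunks)))
            header = line[1:].strip()
            # Assumption: the first word of the header is used as the row label.
            current_id = header.split()[0] if header else f"seq{len(records) + 1}"
            chunks = []
        elif current_id is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequences may span several lines; drop whitespace, use upper case.
            chunks.append("".join(line.split()).upper())
    if current_id is not None:
        records.append((current_id, "".join(chunks)))
    return records


example = "\n".join([
    ">cloneA human",
    "ATGCGT",
    "ACGT",
    ">cloneB",
    "atgcgtac",
])
records = parse_fasta(example)
print(records)
# → [('cloneA', 'ATGCGTACGT'), ('cloneB', 'ATGCGTAC')]
```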

All parsing and alignment run in your browser. Sequences are not sent to a server for alignment, so sensitive or unpublished data stays on your own computer.

The aligner accepts up to 30 sequences, each up to 4000 residues long, with a combined total of no more than 80,000 residues across all sequences. These caps keep memory and runtime reasonable on typical laptops.
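If you prepare batches with a script, the three caps can be checked before pasting. The sketch below is a hypothetical pre-flight check in Python; the constant and function names are made up for illustration, and only the numeric limits (plus the three-sequence minimum) come from this page.

```python
MIN_SEQUENCES = 3       # the multi-sequence tool expects three or more
MAX_SEQUENCES = 30
MAX_LENGTH = 4000       # residues per sequence
MAX_TOTAL = 80_000      # residues across all sequences


def check_limits(sequences):
    """Return a list of problems; an empty list means the batch fits the caps."""
    problems = []
    if len(sequences) < MIN_SEQUENCES:
        problems.append(f"only {len(sequences)} sequences; need at least {MIN_SEQUENCES}")
    if len(sequences) > MAX_SEQUENCES:
        problems.append(f"{len(sequences)} sequences; limit is {MAX_SEQUENCES}")
    for number, seq in enumerate(sequences, start=1):
        if len(seq) > MAX_LENGTH:
            problems.append(f"sequence {number} has {len(seq)} residues; limit is {MAX_LENGTH}")
    total = sum(len(seq) for seq in sequences)
    if total > MAX_TOTAL:
        problems.append(f"{total} residues in total; limit is {MAX_TOTAL}")
    return problems


print(check_limits(["A" * 4000] * 20))  # 20 x 4000 = 80,000 residues: fits
print(check_limits(["A" * 4000] * 21))  # 84,000 residues: over the combined cap
```

Note that the combined cap is the tighter one for large batches: 30 sequences of 4000 residues would total 120,000, so a full set of 30 sequences can average at most about 2666 residues each.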

HOW IT WORKS

The implementation uses a simple progressive alignment strategy. It does not build a guide tree from a full pairwise distance matrix, as Clustal does. The tool works in three steps:

  • Parse. FASTA headers become row labels. Sequence letters are normalised (DNA to A/C/G/T; protein to the standard amino acids, with selenocysteine U read as cysteine C for scoring).
  • Decide the merge order (greedy). The first sequence in your file is the starting anchor. The tool repeatedly picks the not-yet-aligned sequence that is most similar to the current anchor (by a fast pairwise comparison), adds it to the order and uses that sequence as the next anchor. This simple nearest-neighbour ordering is not guaranteed to be optimal, but it is fast and easy to follow.
  • Merge with global pairwise alignment. Each new sequence is aligned to the growing block using the Needleman–Wunsch algorithm (global alignment: gaps are allowed at the ends as well as internally). When a new gap is introduced in the existing rows, that gap is propagated to every sequence already in the alignment, so column counts stay matched.

Scoring is as follows:

  • DNA: match +2, mismatch −1, gap −2
  • Protein: BLOSUM62 matrix, gap −8
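As an illustration of the global alignment step, here is a minimal Needleman–Wunsch sketch in Python for two DNA sequences, using the DNA scores quoted above. It is not the tool's code. It assumes a linear gap penalty that also applies to end gaps, and it aligns one sequence to one sequence, not a sequence to a growing block.

```python
def needleman_wunsch(a, b, match=2, mismatch=-1, gap=-2):
    """Globally align two sequences; return (aligned_a, aligned_b, score)."""
    def pair_score(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and row: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the dynamic-programming table.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # align pair
                score[i - 1][j] + gap,                                 # gap in b
                score[i][j - 1] + gap,                                 # gap in a
            )
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[len(a)][len(b)]


print(needleman_wunsch("ACGT", "AGT"))
# → ('ACGT', 'A-GT', 4)
```

In the example, three matches (+2 each) and one gap (−2) give a score of 4. A progressive aligner repeats a step like this for each new sequence and copies any newly opened gap into every row already in the block.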

Finally, the alignment is displayed. Long alignments are shown in blocks 54 columns wide, in a monospace layout. The consensus / comparison row marks columns where all non-gap residues agree (|). For protein, the average BLOSUM62 score across pairs in a column is shown as : (favourable) or . (unfavourable); for DNA, only exact matches are marked.
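The DNA marking rule and the block layout can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's renderer: the function names are hypothetical, the comparison row is placed below the sequences, and a column containing a single non-gap residue is treated as agreeing (a literal reading of the rule above). The protein symbols (: and .) are not covered.

```python
BLOCK_WIDTH = 54  # columns per display block, as described above


def comparison_row(rows):
    """Return '|' for columns where all non-gap residues agree, ' ' otherwise."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    marks = []
    for column in zip(*rows):
        residues = {c for c in column if c != "-"}
        marks.append("|" if len(residues) == 1 else " ")
    return "".join(marks)


def format_blocks(ids, rows, width=BLOCK_WIDTH):
    """Lay the alignment out in fixed-width blocks with labels on the left."""
    label_width = max(len(seq_id) for seq_id in ids)
    marks = comparison_row(rows)
    blocks = []
    for start in range(0, len(marks), width):
        lines = [
            f"{seq_id:<{label_width}}  {row[start:start + width]}"
            for seq_id, row in zip(ids, rows)
        ]
        lines.append(" " * (label_width + 2) + marks[start:start + width])
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


ids = ["a", "b", "c"]
rows = ["ATGC-TA", "ATGCGTA", "AT-CGAA"]
print(comparison_row(rows))  # → '||||| |'
print(format_blocks(ids, rows, width=4))
```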

This is progressive MSA, not iterative refinement: order depends on your first sequence and greedy choices, so re-ordering FASTA entries can change the result. For publication-grade phylogenetic MSAs, use dedicated tools; for a simple “do these three promoters line up?” or “are these five ORFs in frame with each other?”, this approach is usually enough.
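The order dependence is easy to reproduce with a small Python sketch of greedy nearest-neighbour ordering. The page does not say which fast pairwise comparison the tool uses, so the similarity measure here (the fraction of shared 3-letter words) is purely an assumption for illustration, as are the function names and toy sequences.

```python
def kmer_set(seq, k=3):
    """All distinct words of length k in the sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Assumed stand-in similarity: Jaccard index of the two k-mer sets."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def greedy_order(sequences):
    """Return input indices in merge order, starting from the first sequence."""
    order = [0]
    remaining = list(range(1, len(sequences)))
    while remaining:
        anchor = sequences[order[-1]]
        # Pick the unaligned sequence most similar to the current anchor;
        # on ties, the one that appears earliest in the input wins.
        best = max(remaining, key=lambda i: similarity(anchor, sequences[i]))
        order.append(best)
        remaining.remove(best)
    return order


seqs = ["AAAACCCC", "GGGGTTTT", "AAAACCCG", "GGGGTTTA"]
print(greedy_order(seqs))
# → [0, 2, 1, 3]

# Same sequences with a different first entry: the merge order changes.
reordered = ["GGGGTTTT", "AAAACCCC", "AAAACCCG", "GGGGTTTA"]
print([reordered[i] for i in greedy_order(reordered)])
# → ['GGGGTTTT', 'GGGGTTTA', 'AAAACCCC', 'AAAACCCG']
```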

DNA vs PROTEIN MODE

Select the DNA/PROTEIN mode before you align the sequences. DNA mode works strictly with A, C, G and T. Protein mode uses BLOSUM62 for amino acid substitution scoring and richer consensus symbols. Mixing types in one run is not supported, i.e. you cannot align DNA and protein in the same job.
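A hypothetical Python sketch of mode-specific normalisation is shown below. The alphabets and the U-to-C substitution follow the description on this page. Rejecting unexpected letters with an error is an assumption, since the page does not say how the tool treats them.

```python
DNA_ALPHABET = set("ACGT")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def normalise(seq, mode):
    """Upper-case a sequence, strip whitespace and check it against the mode."""
    seq = "".join(seq.split()).upper()
    if mode == "dna":
        allowed = DNA_ALPHABET
    elif mode == "protein":
        # Selenocysteine (U) is read as cysteine (C) for scoring.
        seq = seq.replace("U", "C")
        allowed = PROTEIN_ALPHABET
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    unexpected = sorted(set(seq) - allowed)
    if unexpected:
        # Assumption: unexpected letters are reported, not silently dropped.
        raise ValueError(f"letters not valid in {mode} mode: {', '.join(unexpected)}")
    return seq


print(normalise("acg t", "dna"))    # → ACGT
print(normalise("MUK", "protein"))  # → MCK
```

Because A, C, G and T are also valid amino acid letters, a short DNA sequence cannot be told apart from a peptide by its letters alone, which is one reason the mode is chosen explicitly.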

PRACTICAL TIPS

  • Put your reference sequence, or the longest or most central sequence, first if you want the greedy order to stay close to a biologically meaningful backbone.
  • For two sequences only, use the pairwise aligner instead.
  • If alignment fails with a size error, shorten the sequences, reduce the number of sequences, or split them into batches.
  • Use text or RTF export when you need columns to stay aligned in Word; PNG/JPG are better for slides.

About the author: This page was written by Dr Mark Bond from The Bond Lab at the University of Bristol. These notes reflect the methodology used in our cardiovascular and cell-signalling research. Questions about these methods: contact us or email mark.bond@bristol.ac.uk.