Protein phosphorylation predictor

Predict serine, threonine, and tyrosine phosphorylation sites from protein sequence. Inspired by PhosphoLingo and compatible with its FASTA notation: append # or @ immediately after a residue to mark it for scoring (in PhosphoLingo, # denotes a positive label and @ a negative label). All analysis runs in your browser.

Choose ST or Y mode, adjust the probability threshold, and export the results as a CSV table. Each site is scored over a 65-residue receptive field and matched against kinase motif classes, which are ranked to give a best class (pro-directed, PKA/AKT-like, CK2, or MAPK docking for ST; Src, RTK, JAK/STAT, or ITAM-like for Y).

How this tool works

Protein phosphorylation adds a phosphate group to serine (S), threonine (T), or tyrosine (Y) residues, often changing protein activity, localisation, or interactions. This predictor scores candidate sites from amino acid sequence alone by testing whether the local sequence context resembles known kinase substrate motifs. It is inspired by PhosphoLingo (protein language models plus convolutional neural networks) but runs entirely in your browser using lightweight motif rules for instant results.

Input format

Paste a one-letter amino acid sequence or FASTA record. Standard letters (ACDEFGHIKLMNPQRSTVWY) are accepted; whitespace and line breaks are ignored.

The tool understands PhosphoLingo FASTA notation. Place a marker immediately after the residue you want to annotate or score:

# — positive training label in PhosphoLingo; here, marks a site to score (or a known phospho site).
@ — negative label in PhosphoLingo; here, also marks a site to score (labels are ignored for prediction).

Example: MKPWETDY#MGPFRKIY@ marks the first Y with # and the second Y with @. If any markers are present in a sequence, only marked S/T or Y sites (depending on mode) are scored. If there are no markers, every S and T (ST mode) or every Y (Y mode) in the sequence is evaluated.
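The marker rules above can be sketched as follows. This is a minimal illustration, not the tool's source: parseMarkedSequence, candidateSites, and the Mode type are hypothetical names, and treating lines that start with > as FASTA headers is an assumption about how records are read.

```typescript
// Hypothetical sketch of PhosphoLingo-style marker parsing; names are illustrative.
type Mode = "ST" | "Y";
type Marker = "#" | "@";

interface ParsedSequence {
  sequence: string;                   // residues only, markers removed
  markers: { [pos: number]: Marker }; // 1-based position -> marker
}

function parseMarkedSequence(input: string): ParsedSequence {
  // Drop FASTA header lines (">..."), then remove all whitespace.
  const body = input
    .split(/\r?\n/)
    .filter((line) => line.trim().charAt(0) !== ">")
    .join("")
    .replace(/\s+/g, "");

  let sequence = "";
  const markers: { [pos: number]: Marker } = {};
  for (let i = 0; i < body.length; i++) {
    const ch = body.charAt(i);
    if (ch === "#" || ch === "@") {
      // A marker annotates the residue immediately before it.
      if (sequence.length > 0) markers[sequence.length] = ch;
    } else {
      sequence += ch.toUpperCase();
    }
  }
  return { sequence, markers };
}

// Returns the 1-based positions to score in the given mode.
function candidateSites(parsed: ParsedSequence, mode: Mode): number[] {
  const acceptors = mode === "ST" ? "ST" : "Y";
  const anyMarkers = Object.keys(parsed.markers).length > 0;
  const sites: number[] = [];
  for (let pos = 1; pos <= parsed.sequence.length; pos++) {
    const aa = parsed.sequence.charAt(pos - 1);
    if (acceptors.indexOf(aa) < 0) continue;
    // When any marker is present, unmarked residues are not scored.
    if (anyMarkers && parsed.markers[pos] === undefined) continue;
    sites.push(pos);
  }
  return sites;
}
```

For the example sequence, the markers land on positions 8 and 16 of the parsed sequence, so Y mode selects both tyrosines while ST mode selects nothing (the unmarked T at position 6 is ignored because markers are present). The flank rule described next would still exclude the C-terminal Y.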

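Sequences shorter than 15 amino acids are rejected, because 15 (7 + 1 + 7) is the minimum length that can give a site seven flanking residues on each side. Sites with fewer than seven residues between them and either terminus are skipped because insufficient flanking context is available.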
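The length and flank rules reduce to two checks. In this sketch MIN_FLANK, MIN_LENGTH, validateLength, and hasEnoughFlank are hypothetical names, and the seven-residue flank is read as at least seven residues on each side, the reading under which 15 is exactly the minimum usable length.

```typescript
// Constants follow the text; helper names are hypothetical.
const MIN_FLANK = 7;                  // residues required on each side of a site
const MIN_LENGTH = 2 * MIN_FLANK + 1; // 7 + 1 + 7 = 15

function validateLength(sequence: string): void {
  if (sequence.length < MIN_LENGTH) {
    throw new Error(
      "Sequence has " + sequence.length + " residues; at least " + MIN_LENGTH + " are required."
    );
  }
}

// pos is 1-based: pos - 1 residues lie upstream and length - pos downstream.
function hasEnoughFlank(pos: number, length: number): boolean {
  return pos - 1 >= MIN_FLANK && length - pos >= MIN_FLANK;
}
```

In a 15-residue sequence only position 8 passes hasEnoughFlank; every longer sequence gains one more eligible position per extra residue.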

ST and Y predictor modes

PhosphoLingo trains separate models for serine/threonine and tyrosine phosphorylation. This tool mirrors that split:

Serine / Threonine (ST) — scores each S and T (or only marked S/T if FASTA markers are used). Serine/threonine kinases often recognise short linear motifs: proline-directed sites (CDK/MAPK family), basic residues upstream (PKA/AKT-like), acidic residues downstream (CK2), or upstream basic patches that support MAPK docking.
Tyrosine (Y) — scores each Y (or only marked Y residues). Tyrosine kinases favour contexts such as acidic residues C-terminal to the phospho-acceptor (Src and receptor tyrosine kinases), hydrophobic residues at −2 (growth-factor receptors), YxxL/I motifs in immune receptors (ITAM-like), or JAK/STAT recruitment patterns.

Run ST and Y analyses separately. A protein can contain both types of site, but the biochemical rules and kinase families differ enough that they are not combined into a single score.

65-residue receptive field

PhosphoLingo uses a default receptive field of 65 residues centred on each candidate site (32 upstream, the acceptor, 32 downstream). This tool uses the same window size. For each site, the results table shows the full 65-character context; dots (.) pad positions beyond the protein termini.

The phospho-acceptor residue is highlighted in red at the centre of that window and matches the Pos and aa columns. Near a terminus, fewer than 65 real residues fall inside the window, so confidence is scaled down by a context quality factor even when the minimum seven-residue flank requirement is met.
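A minimal sketch of the window extraction, assuming 32 positions on each side of the acceptor and a dot for every position beyond a terminus. The text only says the factor scales confidence down near termini; defining it as the fraction of window positions that hold real residues is an assumption, and extractWindow and contextQuality are hypothetical names.

```typescript
const HALF_WINDOW = 32; // 32 upstream + acceptor + 32 downstream = 65

// pos is 1-based; positions outside the protein are padded with ".".
function extractWindow(sequence: string, pos: number): string {
  const center = pos - 1; // 0-based index of the acceptor
  let ctx = "";
  for (let i = center - HALF_WINDOW; i <= center + HALF_WINDOW; i++) {
    ctx += i >= 0 && i < sequence.length ? sequence.charAt(i) : ".";
  }
  return ctx;
}

// Assumed quality factor: share of the 65 positions that are real residues.
function contextQuality(ctx: string): number {
  let real = 0;
  for (let i = 0; i < ctx.length; i++) {
    if (ctx.charAt(i) !== ".") real++;
  }
  return real / ctx.length;
}
```

For position 8 of the 16-residue example sequence, the window starts with 25 dots, puts the Y at index 32, and has a quality of 16/65 under this assumed definition.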

Kinase motif scoring

For each candidate site, the tool extracts the 65-residue window and evaluates a set of kinase-class motif profiles. Each profile returns a raw match score based on local sequence patterns; scores are multiplied by a class weight and summed.

ST mode profiles

Pro-directed (CDK/MAPK) — strong bonus when proline follows the acceptor (S/T-P).
Basophilic (PKA/AKT-like) — arginine/lysine-rich segment upstream (−3 to −1).
Acidic (CK2) — aspartate/glutamate-rich segment downstream (+1 to +3).
MAPK docking — upstream basic patch (≥2–3 R/K) with optional proline-directed boost.

Y mode profiles

Src-family — hydrophobic residue at −2, acidic downstream, optional basic upstream.
Growth-factor receptor (EGFR/FGFR-like) — acidic and small-residue patterns around Y.
JAK/STAT recruitment — aliphatic bracketing around tyrosine.
Immune receptor (YxxL/I) — leucine or isoleucine at +3, as in the Y-x-x-L/I pattern, with a small-residue preference downstream.

The profile with the highest weighted contribution is reported as Best kinase class. This is a sequence-motif hypothesis, not proof of which kinase phosphorylates the site in vivo.
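The weighted-sum scoring and best-class selection described in this section can be sketched as below. Every weight, offset range, and raw-score formula is a hypothetical placeholder chosen only to mirror the profile descriptions; the tool's actual values are not given here. The sketch assumes a 65-character window with the acceptor at index 32 and dots as padding, and it returns the raw weighted sum before any context quality adjustment.

```typescript
// Hypothetical motif profiles: rules loosely paraphrase the ST and Y lists
// above, but every weight, offset range, and raw-score formula is assumed.
type Mode = "ST" | "Y";
type At = (offset: number) => string; // residue at an offset from the acceptor

interface MotifProfile {
  name: string;
  weight: number;          // assumed class weight
  raw: (at: At) => number; // assumed raw match score in [0, 1]
}

const CENTER = 32; // acceptor index in a 65-residue window

// Count residues drawn from `letters` between two offsets, inclusive.
function countIn(at: At, from: number, to: number, letters: string): number {
  let n = 0;
  for (let o = from; o <= to; o++) {
    if (letters.indexOf(at(o)) >= 0) n++;
  }
  return n;
}

// 1 if the residue at `offset` is one of `letters`, else 0.
function has(at: At, offset: number, letters: string): number {
  return letters.indexOf(at(offset)) >= 0 ? 1 : 0;
}

const PROFILES: Record<Mode, MotifProfile[]> = {
  ST: [
    { name: "Pro-directed (CDK/MAPK)", weight: 1.5, raw: (at) => has(at, 1, "P") },
    { name: "Basophilic (PKA/AKT-like)", weight: 1.2, raw: (at) => countIn(at, -3, -1, "RK") / 3 },
    { name: "Acidic (CK2)", weight: 1.0, raw: (at) => countIn(at, 1, 3, "DE") / 3 },
    {
      // Assumed docking region: offsets -12 to -4.
      name: "MAPK docking",
      weight: 0.8,
      raw: (at) => (countIn(at, -12, -4, "RK") >= 2 ? 0.5 + 0.5 * has(at, 1, "P") : 0),
    },
  ],
  Y: [
    {
      name: "Src-family",
      weight: 1.3,
      raw: (at) =>
        0.5 * has(at, -2, "ILVFM") +
        0.3 * Math.min(1, countIn(at, 1, 3, "DE")) +
        0.2 * Math.min(1, countIn(at, -5, -3, "RK")),
    },
    {
      name: "Growth-factor receptor (EGFR/FGFR-like)",
      weight: 1.0,
      raw: (at) => 0.5 * (countIn(at, -3, 3, "DE") >= 2 ? 1 : 0) + 0.5 * has(at, 1, "GAS"),
    },
    { name: "JAK/STAT recruitment", weight: 0.9, raw: (at) => (has(at, -1, "ILV") + has(at, 1, "ILV")) / 2 },
    // L/I three residues after the tyrosine, as the YxxL/I name implies.
    { name: "Immune receptor (YxxL/I)", weight: 1.1, raw: (at) => has(at, 3, "LI") },
  ],
};

// Sum weighted contributions; the largest single contribution names the best class.
function scoreWindow(ctx: string, mode: Mode): { score: number; best: string } {
  // Padding dots (and any out-of-range offset) read as "." and never match.
  const at: At = (o) => ctx.charAt(CENTER + o) || ".";
  let score = 0;
  let best = "none";
  let bestContribution = 0;
  for (const p of PROFILES[mode]) {
    const contribution = p.weight * p.raw(at);
    score += contribution;
    if (contribution > bestContribution) {
      bestContribution = contribution;
      best = p.name;
    }
  }
  return { score, best };
}
```

With these placeholder weights, an S followed by P with no other motif features scores 1.5 and reports Pro-directed (CDK/MAPK). Reporting the largest single contribution, rather than the largest raw match, means a class weight can decide the winner when two profiles match equally well.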

Score, probability, and threshold

The raw Score is the sum of weighted motif contributions, adjusted by context quality near termini. That value is mapped to a Probability with a sigmoid function so that stronger motif combinations produce higher values. Sites with probability at or above the threshold slider (default 0.55) are labelled Phospho site; others are labelled Low.
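The text does not give the sigmoid's parameters, so MIDPOINT and SLOPE below are placeholders, and applying the context quality factor as a plain multiplication is an assumption consistent with "scaled down". adjustedScore, toProbability, and callSite are hypothetical names.

```typescript
// Placeholder calibration; the tool's actual sigmoid parameters are not documented.
const MIDPOINT = 1.0; // score that maps to probability 0.5
const SLOPE = 3.0;    // steepness of the logistic curve

// Assumed multiplicative adjustment by the context quality factor (0 to 1).
function adjustedScore(weightedSum: number, contextQuality: number): number {
  return weightedSum * contextQuality;
}

// Logistic (sigmoid) mapping: higher scores give higher probabilities.
function toProbability(score: number): number {
  return 1 / (1 + Math.exp(-SLOPE * (score - MIDPOINT)));
}

type Call = "Phospho site" | "Low";

// Sites at or above the threshold are called; the default mirrors the slider.
function callSite(probability: number, threshold: number = 0.55): Call {
  return probability >= threshold ? "Phospho site" : "Low";
}
```

Under these placeholder parameters, a weighted sum of 1.5 with full context maps to a probability of about 0.82 and is called a Phospho site. Moving the threshold changes only the result of callSite; the score and probability are computed before it is applied.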

Lower the threshold to flag more candidate sites (higher sensitivity, more false positives). Raise it for a stricter shortlist. The threshold does not change the underlying motif scores—only the binary call.

Reading the results table

Pos — 1-based position of the acceptor in the parsed sequence (after removing #/@ markers).
aa — acceptor residue (S, T, or Y). FASTA markers appear as small pills when present.
Context — 65-aa window with the phospho residue centred and highlighted.
Score / Probability — continuous confidence measures; sort order is by score.
Best kinase class — dominant motif family for that site.
Prediction — binary call from the threshold.

Use Export CSV to download all scored sites for spreadsheets or downstream pipelines. The export includes protein name, position, residue, probability, score, prediction, best kinase class, and context sequence.
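A sketch of the CSV export, assuming comma-separated output with RFC 4180-style quoting so that protein names containing commas or quotes survive a round trip. The column names, the SiteRow interface, and the three-decimal rounding are illustrative assumptions; the text only says which fields are exported.

```typescript
// Hypothetical row shape for one scored site.
interface SiteRow {
  protein: string;
  pos: number;
  aa: string;
  probability: number;
  score: number;
  prediction: string;
  bestKinaseClass: string;
  context: string;
}

// Quote a field if it contains a comma, quote, or line break; double inner quotes.
function csvField(value: string | number): string {
  const s = String(value);
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

function toCsv(rows: SiteRow[]): string {
  const header = ["protein", "position", "residue", "probability", "score",
                  "prediction", "best_kinase_class", "context"];
  const lines = [header.join(",")];
  for (const r of rows) {
    const fields: (string | number)[] = [
      r.protein, r.pos, r.aa, r.probability.toFixed(3), r.score.toFixed(3),
      r.prediction, r.bestKinaseClass, r.context,
    ];
    lines.push(fields.map(csvField).join(","));
  }
  return lines.join("\r\n");
}
```

In a browser, the resulting string could be wrapped in a Blob with type text/csv and offered for download through URL.createObjectURL, which keeps the export local like the rest of the analysis.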

Relation to PhosphoLingo

PhosphoLingo learns phosphorylation patterns from large training sets using protein language model embeddings (e.g. ProtTrans T5, ESM) and convolutional layers. That approach captures subtle sequence dependencies that motif rules miss. This Bond Lab tool reuses PhosphoLingo’s FASTA format and 65-aa window but replaces the neural model with explicit kinase-motif heuristics, so predictions run instantly without Python, a GPU, or uploading sequence data.

For publication-quality or high-stakes prioritisation, validate top hits with the full PhosphoLingo package, phosphoproteomics, kinase assays, or mutagenesis.

Limitations and privacy

Motif scores approximate kinase preference, not cellular context: localisation, kinase expression, priming phosphorylation, and 3D structure are not modelled. Many true sites lack canonical motifs; many motif matches are never phosphorylated in vivo.

All parsing and scoring run locally in your browser. Sequences are not sent to The Bond Lab servers. JavaScript must be enabled.