
URL: https://www.openprot.org/


Explore the extended proteome


DOWNLOAD


PROTEIN LIBRARIES

Protein libraries in FASTA format. BED files and tabular data on protein
characteristics are also available.
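
As a rough illustration of working with a downloaded FASTA library, the sketch
below reads FASTA-formatted lines into a dictionary. The function name
`read_fasta` and the two records are hypothetical; the header layout of the
actual OpenProt files is not described here and may differ.

```python
from typing import Dict, Iterable, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may span several lines
    return records


# Hypothetical two-record library, used only for illustration.
example = """>protein_A
MKTAYIAK
QRQISFVK
>protein_B
MSHHWGYG
"""

library = read_fasta(example.splitlines())
for name, sequence in library.items():
    print(name, len(sequence))
```

To read a file instead, an open file handle can be passed in place of
`example.splitlines()`.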


GENETIC VARIANTS


OPENVAR

OpenVar annotates the effect of genetic variants on proteins. Use your
VCF-formatted variant data to see how reference and alternative protein
sequences are affected.
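
For readers unfamiliar with VCF input, the sketch below pulls the chromosome,
position, reference allele and alternate alleles out of VCF data lines. It
only illustrates the input format and is not how OpenVar itself processes
files; the names `Variant` and `read_vcf` and the example records are
hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def read_vcf(lines: Iterable[str]) -> List[Variant]:
    """Parse VCF data lines, emitting one Variant per alternate allele."""
    variants: List[Variant] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines (##) and the column header (#)
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alts = fields[:5]
        for alt in alts.split(","):  # ALT may list several alleles
            variants.append(Variant(chrom, int(pos), ref, alt))
    return variants


# Made-up records, used only for illustration.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t12345\t.\tA\tG\t.\tPASS\t.",
    "chr2\t67890\t.\tC\tT,G\t.\tPASS\t.",
]

variants = read_vcf(vcf_lines)
for variant in variants:
    print(variant)
```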


PROTEOGENOMICS TOOL


OPENCUSTOMDB

OpenCustomDB produces customized protein libraries that take RNA sequencing
results into account. Use your VCF-formatted variant data to obtain customized
protein libraries for mass spectrometry analysis.


THE CONCEPT BEHIND OPENPROT


CURRENT ANNOTATIONS

Current genome annotations apply restrictive criteria to Open Reading Frames
(ORFs), including a minimum ORF length of 100 codons and a single ORF per
transcript. Transcripts that do not meet these criteria are labeled non-coding
RNAs (ncRNAs), and transcripts from unprocessed pseudogenes are also
systematically annotated as non-coding.



OPENPROT ANNOTATIONS

OpenProt relaxes traditional annotation criteria by including all ORFs longer
than 30 codons and by allowing multiple ORFs per transcript, including ORFs
encoded in ncRNAs and in pseudogene transcripts. OpenProt thus offers a deeper
description of the proteome, and a more realistic and biologically relevant
perspective on it.
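
The difference between the two sets of criteria can be sketched as two filters
over the same list of candidate ORFs. This is a minimal illustration, not
OpenProt code: the `Orf` record and both function names are hypothetical, and
keeping the longest ORF as the single ORF per transcript is an assumption made
here for the example.

```python
from typing import Dict, List, NamedTuple


class Orf(NamedTuple):
    transcript: str
    start: int
    codons: int


def conventional_filter(orfs: List[Orf], min_codons: int = 100) -> List[Orf]:
    """Keep at most one ORF per transcript, and only if it has >= min_codons.

    Assumption: the single ORF kept per transcript is the longest one.
    """
    best: Dict[str, Orf] = {}
    for orf in orfs:
        if orf.codons < min_codons:
            continue
        current = best.get(orf.transcript)
        if current is None or orf.codons > current.codons:
            best[orf.transcript] = orf
    return list(best.values())


def relaxed_filter(orfs: List[Orf], min_codons: int = 30) -> List[Orf]:
    """Keep every ORF longer than min_codons, several per transcript if needed."""
    return [orf for orf in orfs if orf.codons > min_codons]


# Made-up candidate ORFs on two transcripts.
candidates = [
    Orf("tx1", 10, 250),
    Orf("tx1", 400, 45),
    Orf("tx2", 5, 60),
    Orf("tx2", 90, 20),
]

print(len(conventional_filter(candidates)), "ORF(s) under conventional criteria")
print(len(relaxed_filter(candidates)), "ORF(s) under relaxed criteria")
```

With these made-up candidates, the second transcript yields no ORF under the
conventional criteria and would be treated as non-coding, while the relaxed
criteria retain one ORF from it.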



OPENPROT DISCOVERIES: RE-INTERPRET ALREADY ACQUIRED DATA

The annotation of sequences is central to current research in biomolecular
sciences. Adding previously unannotated protein sequences to the OpenProt
protein library has led to many important discoveries in the human proteome
through the re-analysis of publicly available data. Many of these have been
selected for further investigation:

 * The FUS gene is dual-coding with both proteins contributing to FUS-mediated
   toxicity. https://doi.org/10.15252/embr.202050640

 * The Protein Coded by a Short Open Reading Frame, Not by the Annotated Coding
   Sequence, Is the Main Gene Product of the Dual-Coding Gene MIEF1.
   https://doi.org/10.1074/mcp.RA118.000593

 * Potentiation of B2 receptor signaling by AltB2R, a newly identified
   alternative protein encoded in the human bradykinin B2 receptor gene.
   https://doi.org/10.1016/j.jbc.2021.100329

 * UBB pseudogene 4 encodes functional ubiquitin variants.
   https://doi.org/10.1038/s41467-020-15090-6

 * An overlapping reading frame in the PRNP gene encodes a novel polypeptide
   distinct from the prion protein. https://doi.org/10.1096/fj.10-173815

 * An out-of-frame overlapping reading frame in the ataxin-1 coding sequence
   encodes a novel ataxin-1 interacting protein.
   https://doi.org/10.1074/jbc.M113.472654


THE OPENPROT PIPELINE


PREDICTION PIPELINE

The OpenProt ORF prediction pipeline starts from an exhaustive description of
the transcriptome consisting of all RNA transcripts reported by both Ensembl and
NCBI RefSeq. A 3-frame in silico translation then yields the ORFeome: any ORF
longer than 30 codons in any frame of any transcript. This ORFeome is then
filtered to categorize the predicted ORFs. The first filter retrieves all known
proteins, or reference proteins (refProts): all ORFs already annotated in
Ensembl, NCBI RefSeq, and/or UniProtKB. The second filter is based on the
homology of currently unannotated ORFs with the refProt of the same gene (if
applicable), and retrieves novel predicted isoforms. The remaining ORFs encode
novel proteins, called alternative proteins (altProts).
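
The 3-frame in silico translation step can be sketched as follows. This is a
simplified illustration under stated assumptions, not the OpenProt
implementation: it assumes ORFs start at ATG and run to the first in-frame stop
codon, counts codons without the stop, uses the standard genetic code, and
reports only the first ATG of each open stretch. The function name `find_orfs`
and the example transcript are hypothetical.

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code, bases in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def find_orfs(transcript: str, min_codons: int = 30) -> List[Tuple[int, int, str]]:
    """Return (frame, start, protein) for each ORF longer than min_codons.

    Assumptions: ATG start codon, first in-frame stop ends the ORF,
    and the stop codon is not counted in the ORF length.
    """
    seq = transcript.upper().replace("U", "T")
    orfs: List[Tuple[int, int, str]] = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # open a new ORF at the first ATG
            elif CODON_TABLE.get(codon, "X") == "*":
                n_codons = (i - start) // 3
                if n_codons > min_codons:
                    protein = "".join(
                        CODON_TABLE.get(seq[j:j + 3], "X")  # 'X' for unknown codons
                        for j in range(start, i, 3)
                    )
                    orfs.append((frame, start, protein))
                start = None  # close the ORF and keep scanning this frame
    return orfs


# Made-up transcript: a 36-codon ORF in the third frame.
transcript = "CC" + "ATG" + "GCT" * 35 + "TAA" + "G"
for frame, start, protein in find_orfs(transcript):
    print(frame, start, len(protein))
```

The categorization filters described above (reference proteins, novel
isoforms, altProts) would then be applied to the ORFs returned by such a scan;
they depend on external annotations and homology searches and are not sketched
here.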



EVIDENCE PIPELINE

 * Conservation evidence: for every annotated ORF, OpenProt identifies orthologs
   and paralogs (across the 10 species currently supported by OpenProt).
 * Translation evidence: publicly available ribosome profiling datasets are
   re-analysed using the PRICE algorithm. This gathers translation evidence for
   any ORF annotated in OpenProt.
 * Expression evidence: publicly available mass spectrometry datasets are
   re-analysed using multiple search engines. This gathers expression evidence
   for any ORF annotated in OpenProt.


ACKNOWLEDGEMENTS

We would like to thank the tools and servers that made it possible to create the
new version of OpenProt:
 * MMseqs2: for enabling large-scale multiple sequence alignments
 * AlphaFold: for three-dimensional structure prediction of proteins with an MSA
   greater than 30
 * OmegaFold: for predicting the three-dimensional structure of proteins from
   their protein sequences
 * flDPnn: for predicting intrinsically disordered regions (IDRs)
 * The Eukaryotic Linear Motif Resource (ELM): for finding short linear motifs
   in our protein sequences
 * DeepLoc 2.0: for predicting the subcellular localization of proteins from
   their sequences
 * CompOmics: for their tools SearchGUI, PeptideShaker, and MS2Rescore, used at
   the core of our pipeline
 * InterPro: for InterProScan, which enabled us to predict domains in the
   proteins
 * GTEx: for their expression profiles in different tissues
 * PRICE: used to generate the RIBO score of our proteins
 * InParanoid: for the identification of ortholog and paralog groups
 * JBrowse: for the genome browser in the summary tab
 * PRIDE: used to find and download most of our MS studies

OLDER VERSIONS OF OPENPROT

OpenProt 1.6
©Xavier Roucou & Marie Brunet, Université de Sherbrooke