www.openprot.org
204.19.23.132
Public Scan
URL:
https://www.openprot.org/
Submission: On January 12 via api from US — Scanned from US
Form analysis
0 forms found in the DOM
Text Content
Explore the extended proteome

DOWNLOAD: PROTEIN LIBRARIES
Protein libraries in FASTA format. BED files and tabular data on protein characteristics are also available.

GENETIC VARIANTS: OPENVAR
OpenVar annotates the effect of genetic variants on proteins. Use your VCF-formatted variant data to see how reference and alternative protein sequences are affected.

PROTEOGENOMICS TOOL: OPENCUSTOMDB
OpenCustomDB produces customized protein libraries that take RNA sequencing results into account. Use your VCF-formatted variant data to obtain customized protein libraries for mass spectrometry analysis.

THE CONCEPT BEHIND OPENPROT

CURRENT ANNOTATIONS
Current genome annotations apply restrictive criteria to Open Reading Frames (ORFs), including a minimum ORF length of 100 codons and a single ORF per transcript. Transcripts that do not meet these criteria are labeled non-coding (ncRNAs), and transcripts from unprocessed pseudogenes are also systematically annotated as non-coding.

OPENPROT ANNOTATIONS
OpenProt relaxes traditional annotation criteria by including all ORFs longer than 30 codons and allowing multiple ORFs per transcript, including those encoded in ncRNAs and in pseudogene transcripts. OpenProt offers a deeper description, and thus a more realistic and biologically relevant view, of the proteome.

OPENPROT DISCOVERIES: RE-INTERPRET ALREADY ACQUIRED DATA
The annotation of sequences is central to current research in biomolecular sciences. Adding unannotated protein sequences to the OpenProt protein library has led to many important discoveries in the human proteome through the re-analysis of publicly available data. Many of these have been selected for further investigation:
* The FUS gene is dual-coding with both proteins contributing to FUS-mediated toxicity.
  https://doi.org/10.15252/embr.202050640
* The Protein Coded by a Short Open Reading Frame, Not by the Annotated Coding Sequence, Is the Main Gene Product of the Dual-Coding Gene MIEF1.
  https://doi.org/10.1074/mcp.RA118.000593
* Potentiation of B2 receptor signaling by AltB2R, a newly identified alternative protein encoded in the human bradykinin B2 receptor gene.
  https://doi.org/10.1016/j.jbc.2021.100329
* UBB pseudogene 4 encodes functional ubiquitin variants.
  https://doi.org/10.1038/s41467-020-15090-6
* An overlapping reading frame in the PRNP gene encodes a novel polypeptide distinct from the prion protein.
  https://doi.org/10.1096/fj.10-173815
* An out-of-frame overlapping reading frame in the ataxin-1 coding sequence encodes a novel ataxin-1 interacting protein.
  https://doi.org/10.1074/jbc.M113.472654

THE OPENPROT PIPELINE

PREDICTION PIPELINE
The OpenProt ORF prediction pipeline starts from an exhaustive description of the transcriptome, consisting of all RNA transcripts reported by both Ensembl and NCBI RefSeq. A 3-frame in silico translation then yields the ORFeome: any ORF longer than 30 codons in any frame of any transcript. This ORFeome is then filtered to categorize the predicted ORFs. The first filter retrieves all known proteins, or reference proteins (refProts): all ORFs already annotated in Ensembl, NCBI RefSeq, and/or UniProtKB. The second filter is based on the homology of currently unannotated ORFs with the refProt of the same gene (if applicable), and retrieves novel predicted isoforms. The remaining ORFs encode novel proteins, called alternative proteins (altProts).

EVIDENCE PIPELINE
* Conservation evidence: for every annotated ORF, OpenProt identifies orthologs and paralogs (across the 10 species currently supported by OpenProt).
* Translation evidence: publicly available ribosome profiling datasets are re-analysed using the PRICE algorithm. This gathers translation evidence for any ORF annotated in OpenProt.
* Expression evidence: publicly available mass spectrometry datasets are re-analysed using multiple search engines. This gathers expression evidence for any ORF annotated in OpenProt.

ACKNOWLEDGEMENTS
We would like to thank the tools and servers that made it possible to create the new version of OpenProt:
* MMseqs2: for enabling large-scale multiple sequence alignments
* AlphaFold: for three-dimensional structure prediction of proteins with an MSA greater than 30
* OmegaFold: for predicting the three-dimensional structure of proteins from their sequences
* flDPnn: for predicting intrinsically disordered regions (IDRs)
* The Eukaryotic Linear Motif Resource (ELM): for finding short linear motifs in our protein sequences
* DeepLoc 2.0: for predicting subcellular localization from protein sequences
* CompOmics: for their tools SearchGUI, PeptideShaker, and MS2Rescore, used at the core of our pipeline
* InterPro: for InterProScan, which enabled us to predict domains in the proteins
* GTEx: for their expression profiles in different tissues
* PRICE: used to generate the RIBO score of our proteins
* InParanoid: for the identification of ortholog and paralog groups
* JBrowse: for the genome browser in the summary tab
* PRIDE: used to find and download most of our MS studies

OLDER VERSIONS OF OPENPROT
OpenProt 1.6

©Xavier Roucou & Marie Brunet, Université de Sherbrooke
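
As an illustration of the prediction step described under PREDICTION PIPELINE (a 3-frame in silico translation that keeps ORFs above a length threshold), here is a minimal Python sketch. It is not OpenProt's code. The names `find_orfs` and `Orf`, the use of ATG as the only start codon, the three standard stop codons, and the inclusive `min_codons` threshold (counting the start codon but not the stop codon) are all assumptions made for illustration; the page itself only states "longer than 30 codons".

```python
from typing import Iterator, NamedTuple

# Assumption: standard genetic code, ATG as the only start codon.
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


class Orf(NamedTuple):
    """Hypothetical record for one predicted ORF on a transcript."""
    frame: int   # reading frame on the transcript: 0, 1 or 2
    start: int   # 0-based index of the first base of the start codon
    end: int     # index one past the last base of the stop codon
    codons: int  # codons from start codon up to, not including, the stop


def find_orfs(transcript: str, min_codons: int = 30) -> Iterator[Orf]:
    """Scan the three forward frames and yield ORFs of at least min_codons.

    For each stop codon, the ORF starts at the first in-frame ATG seen
    since the previous stop. An ORF with no in-frame stop is ignored.
    """
    seq = transcript.upper().replace("U", "T")  # accept RNA or DNA letters
    for frame in range(3):
        start = None  # position of the currently open start codon, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == START_CODON:
                    start = i
            elif codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    yield Orf(frame, start, i + 3, n_codons)
                start = None  # close this ORF and look for the next start


if __name__ == "__main__":
    # Toy transcript: 2 leading bases, ATG, 40 alanine codons, stop codon.
    toy = "CC" + "ATG" + "GCT" * 40 + "TAA"
    for orf in find_orfs(toy):
        print(orf)
```

Calling `find_orfs(seq, min_codons=100)` instead mimics the stricter length criterion described under CURRENT ANNOTATIONS, though not its single-ORF-per-transcript rule. The later filtering into reference proteins, novel isoforms, and altProts depends on external annotations and is not covered by this sketch.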