URL:
https://string-db.org/cgi/info
STRING Version: 11.5

Interaction Scores

THE BASIC PRINCIPLE

In STRING, each protein-protein interaction is annotated with one or more 'scores'. Importantly, these scores do not indicate the strength or the specificity of the interaction. Instead, they are indicators of confidence, i.e. how likely STRING judges an interaction to be true, given the available evidence. All scores range from 0 to 1, with 1 being the highest possible confidence. A score of 0.5 would indicate that roughly every second interaction might be erroneous (i.e., a false positive).

TRANSFER SCORES

For most types of evidence, there are two types of scores: the 'normal' score, and the 'transferred' score. The latter is computed from data that is not originally observed in the organism of interest, but instead in some other organism and then transferred via homology/orthology. All potential source organisms are searched for evidence, but the actual transfers to the receiving organism are made non-redundant (according to 'clades' of closely related organisms in the tree of life).
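The confidence reading described under THE BASIC PRINCIPLE can be made concrete with a little arithmetic. The sketch below is an illustration only, not STRING code: it assumes each score can be read as the probability that the interaction is true, so the expected number of false positives in a list is the sum of (1 - score). The function names and the example data are hypothetical; the default cutoff of 0.400 is the 'medium' confidence threshold quoted on this page.

```python
def filter_by_confidence(interactions, cutoff=0.400):
    """Keep (protein_a, protein_b, score) tuples whose score is >= cutoff."""
    return [row for row in interactions if row[2] >= cutoff]


def expected_false_positives(scores):
    """Expected number of erroneous interactions, reading each score as P(true)."""
    return sum(1.0 - s for s in scores)


# Hypothetical example data, not taken from STRING.
interactions = [
    ("protA", "protB", 0.75),
    ("protA", "protC", 0.5),
    ("protB", "protD", 0.25),
]

kept = filter_by_confidence(interactions)
print(len(kept), "interactions at medium confidence or better")
print(expected_false_positives([score for _, _, score in kept]),
      "of them expected to be false positives")
```

Under this reading, raising the cutoff trades coverage for a lower expected error rate; the scores say nothing about how strong or specific an interaction is.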
A TYPICAL ORGANISM

As an example, the numbers below are for the model organism 'Escherichia coli K12 MG1655': the number of interactions per score type, at a confidence of 'medium' or better (score >= 0.400).

• gene neighborhood, normal: 7851 interactions
• gene neighborhood, transferred: 11177 interactions
• gene fusion: 514 interactions
• gene cooccurrence: 35497 interactions
• gene coexpression, normal: 12376 interactions
• gene coexpression, transferred: 3154 interactions
• experiments/biochemistry, normal: 5301 interactions
• experiments/biochemistry, transferred: 4113 interactions
• annotated pathways, normal: 6726 interactions
• annotated pathways, transferred: 1727 interactions
• textmining, normal: 27445 interactions
• textmining, transferred: 7119 interactions
• combined-score, total: 210914 interactions

Scientific Use Scenarios

Below is a selection of published examples of large-scale scientific use of STRING network data. Apart from the ad-hoc use of the website (in order to learn about individual proteins or to find out about functional enrichments), the large-scale use cases below signify another important benefit of STRING: the availability of unified, scored, genome-wide interaction data, for a number of organisms.

1.) Researching protein-networks in the context of early immune system establishment

In this study, the impact of post-natal colonization of the body with microbes is researched by transiently colonizing pregnant female mice. It is shown that the maternal microbiota shapes the immune system of the offspring. After performing RNA-seq of whole small intestinal mucosal RNA from neonates at day 14 (control and gestation-only colonized dams) and identifying differentially expressed genes, the authors use STRING to deduce the protein networks involved.
(Ganal-vonArburg, SC et al. Science. 2017: "The maternal microbiota drives early postnatal innate immune development.")

2.) Highly connected proteins have stable steady-state distribution of gene expression

This study develops a thermodynamic-like theoretical framework to analyze protein networks and gene expression patterns. Using this methodology, the authors find a dependence between the steady-state stability of transcript levels and the connectivity in STRING networks. The findings agree with the observation that essential genes have a low variability of expression and emphasize the role of stochasticity and robustness in the control of expression. The authors suggest that genes can be grouped into two categories, high and low expression, which are stable, versus adaptable to biological stimuli.
(Kravchenko-Balasha, N et al. PNAS. 2012: "On a fundamental structure of gene networks in living cells")

3.) Searching for candidate genes involved in the immune response to gluten

Celiac disease (CD) is an auto-immune condition which may cause gastrointestinal and nutritional problems. The authors of this review article use STRING to look for interactions of genes that are known to be involved in CD. This results in 40 candidate genes that are likely to be involved in the progression of the disease. Since the levels of the marker genes of CD are heterogeneous, several different genes may be the cause of the condition.
(Abadie, V et al. Annu. Rev. Immunol. 2011: "Integration of genetic and immunological insights into a model of celiac disease pathogenesis")

4.) Identifying candidates for unknown enzyme in a pathway

Bacillithiol (BSH) is a low-molecular-weight thiol in bacteria (Bacilli family). It is synthesized by a pathway that is not fully characterized. The authors used STRING to identify candidates for an unknown enzyme, using known components of the pathway as the input query. The co-occurrence and fusion channels revealed a potential candidate for the enzyme. Experiments then confirmed that its function was indeed essential for the pathway.
(Gaballa, A et al. PNAS. 2010: "Biosynthesis and functions of bacillithiol, a major low-molecular-weight thiol in Bacilli")

5.) Using STRING to narrow the search space for two-locus epistasis

The aim of this study was to search for pairs of SNPs that cause disease in combination (two-locus epistasis). Testing all combinations is computationally expensive. By limiting the search to known protein-protein interactions from STRING, the search space was drastically reduced. Furthermore, by testing only likely candidate interactions, the loss of significance caused by correcting for multiple comparisons is alleviated.
(Emily, M et al. Eur J Hum Genet. 2009: "Using biological networks to search for interacting loci in genome-wide association studies")

6.) Using STRING to show network connectivity

Lysine acetylation is a post-translational modification that regulates gene expression. This study shows that lysine acetylation preferentially targets large macro-molecular complexes and has a broad regulatory scope comparable with other post-translational modifications. Using STRING, the authors show that the acetylome has significantly higher network connectivity than random: roughly six interactions per node, whereas the random expectation would be less than three.
(Choudhary, C et al. Science. 2009: "Lysine acetylation targets protein complexes and co-regulates major cellular functions")

7.) STRING as a general purpose database

In this study the evolutionary history of the CDC25 homology domain was investigated. The STRING database was used to acquire the sequence information for a number of genomes, showing how STRING can be used as a general database. This is particularly useful if the user downloads the entire dataset by signing the academic license agreement.
(van Dam, TJ et al. Cell Signal. 2009: "Phylogeny of the CDC25 homology domain reveals rapid differentiation of Ras pathways between early animals and fungi")

8.) STRING to guide experiments

This study is a characterization of the Rod-derived Cone Viability Factor (RdCVFL) signaling pathway involved in neuronal cell death mediated by oxidative stress. STRING was used to identify 90 proteins interacting with RdCVFL. These were examined for interactions using a cell-based assay. The authors show that RdCVFL inhibits the phosphorylation of the microtubule binding protein Tau. In vitro, RdCVFL protects Tau from oxidative damage, which is implicated in retinal degeneration.
(Fridlich, R et al. Mol Cell Proteomics. 2009: "The thioredoxin-like protein rod-derived cone viability factor (RdCVFL) interacts with TAU and inhibits its phosphorylation in the retina")

9.) Prioritizing functional assignments in RNAi screens using interaction network data

RNA interference (RNAi) screening can be used to infer the function of genes in an organism. The results from such screens often contain errors. Wang et al. suggest a method, based on a scoring function that integrates STRING network information, to flag false positives and false negatives in RNAi screens. The method thereby suggests optimal candidates for follow-up experimental validation.
(Wang, L et al. BMC Genomics. 2009: "A network-based integrative approach to prioritize reliable hits from multiple genome-wide RNAi screens in Drosophila")

Frequently Asked Questions

Q: How can I obtain the complete data set?
STRING has recently changed its licensing model, at the request of the ELIXIR initiative. This means that all its data is now freely available, from the Download section. The licensing model is CC BY 4.0, requesting proper attribution as the only condition for usage.

Q: How are scores computed?
The 'combined scores' are computed by integrating the probabilities from the various different types of evidence ('evidence channels'), while correcting for the probability of randomly observing an interaction.
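A minimal sketch of this kind of integration follows, under stated assumptions rather than as STRING's actual implementation. It assumes the evidence channels are independent, so that the combined probability is 1 minus the product of (1 - score) over all channels, and it assumes a fixed prior probability of randomly observing an interaction, which is removed from each channel score before combining and added back once at the end. The prior value of 0.041 and the function names are assumptions made for illustration, and any channel-specific corrections are left out.

```python
PRIOR = 0.041  # assumed probability of randomly observing an interaction


def remove_prior(score, prior=PRIOR):
    """Rescale a channel score so that the prior alone maps to 0."""
    score = max(score, prior)  # a score below the prior carries no evidence
    return (score - prior) / (1.0 - prior)


def combine_scores(channel_scores, prior=PRIOR):
    """Combine per-channel probabilities, counting the prior only once."""
    prob_no_evidence = 1.0
    for score in channel_scores:
        prob_no_evidence *= 1.0 - remove_prior(score, prior)
    combined_without_prior = 1.0 - prob_no_evidence
    # Add the prior back exactly once.
    return combined_without_prior * (1.0 - prior) + prior


# Hypothetical channel scores for one protein pair.
print(combine_scores([0.6, 0.5]))
print(combine_scores([0.7]))  # a single channel is returned unchanged
```

With this scheme a pair supported by a single channel keeps that channel's score, while agreement between channels raises the combined score above any individual one.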
For a more detailed description, please refer to von Mering et al., NAR 2005.

Q: I am interested in downloading a limited set of interactions, for one or a few proteins only. How can I do that?
There are basically two options for this:
a) Enter the protein(s) as usual into STRING and proceed to the network, then select the 'Tables / Exports' button below the network. From there, you can download the interactions in your current network, in a number of formats.
b) Alternatively, you can use one of our Application Programming Interfaces (APIs), especially if this might become a recurring task in the future. Using the APIs, you can create a small script/program that will download the interactions for you. For more details, see here.

Q: How can I save a certain network?
Below any given STRING network in the browser window, there is always a button labeled 'Tables/Exports'. There, you can save your current network in a variety of formats (Bitmap Images, Scalable Vector Graphics, XML Summary (Proteomics Standards Initiative), Graph Layout Coordinates, Protein sequences in FASTA format, and Textual Summaries of interaction scores).

Q: For my latest manuscript, I would like to use a network image produced by STRING. Must I ask for permission?
No, permission is not required. But we do appreciate it if you cite us; please choose from among any of our published references (see here).

Q: How can I trace the origin of the different evidences for a given interaction?
Most of this information is available upon clicking on an edge of the graph in the network view. Furthermore, below each network you will find the button 'Evidence'; from there you can proceed to evidence views that each summarize evidence of a single type, for your current network.

Q: How can I cite STRING?
We do appreciate citations very much. As for many other online databases, citations are the main benchmark by which our funders decide whether we are 'worth the money'. So, yes, please cite us, using any of the references here.

Q: Which databases does STRING extract experimental/biochemical data from?
Currently, these are: DIP, BioGRID, HPRD, IntAct, MINT, and PDB.

Q: From which databases does STRING extract curated data?
Currently, these are: Biocarta, BioCyc, Gene Ontology, KEGG, and Reactome.

Q: How do I extract purely experimental data?
Below each network, there is a button labeled 'Data Settings'. There, you can specify which types of evidence you want to contribute to your network. By un-checking all boxes except 'Experiments', you would get a network based purely on experimental evidence.

Q: I need PPIs for a given species, but only from experimental data and not transferred from other species.
For that, you will need to download a file with the full score details, and parse out the information you need. First, you should sign the license agreement, wait for the password and then download the file 'protein.links.full.v10.txt.gz'. Use the file to get the direct experimental evidence, for example by printing the columns for protein1, protein2 and experiments (i.e., columns 1, 2 and 10) and grepping for the 'species_id' (e.g., 9606 for human):

    zgrep ^"9606\." protein.links.full.v10.txt.gz | awk '($10 != 0) { print $1, $2, $10 }' > ~/result.txt

Q: I want to differentiate physical interactions from functional ones within STRING.
For this, you would have to use the database dumps (after academic licensing). You can use the table 'network.actions' to query for records that have a direct physical interaction annotated in their 'mode' column. If the mode is 'binding', you can be fairly sure there is a direct physical interaction. If it is something other than 'binding', it could still be a direct, physical interaction, but it is more likely an indirect, functional one. Note that the 'actions' annotations in STRING are work in progress; there will likely be a fairly large fraction of interactions missing (false negatives).
    SELECT * FROM network.actions WHERE mode = 'binding';

Q: STRING is said to be 'locus-based' and only a single translated protein per locus is stored. What does this mean?
STRING represents each protein-coding gene locus by only a single, representative protein. If there is more than one isoform per gene annotated, we usually select the longest isoform, unless we have information to suggest that another isoform is better supported (e.g., proteins selected in the CCDS database).

Q: Does STRING contain any pathway or Gene Ontology information? I see that there is a table called 'funcats' ... ?
The 'funcats' database table contains the functional categories as defined for the COG database. We do import the Gene Ontology annotations and use these for inferring interactions and for reporting enrichments. However, they are not yet represented in a database table; this will likely come in a future version.

Q: Is there any phenotype or disease information contained in STRING?
Not directly, but the fulltext-search capabilities at the start page will often turn up proteins which have already been annotated for a certain function, phenotype or disease. For example, searching for the word "wing" in Drosophila will return proteins that have been annotated/described as having a functional role in the wing.

Q: Does the database provide a PubMed Reference ID for each interaction?
Interactions that stem only from computational predictions do not have a PMID. Text-mining evidence may also stem from other sources, such as OMIM. Apart from the above exceptions, interactions mostly do come with at least one PubMed reference ID. Some cases have several different PMIDs, and yet others share the same PMID (e.g., for external repositories, the interactions share the PMID of the publication of the database).

Q: Regarding the 'sets', pathways and complexes ... what is the difference between a "set" and a "collection"?
The different types of "sets" in STRING describe annotated pathways, complexes, and PDB structures having more than one protein. The "sets_items" describe memberships in the evidence sets. An interaction exists if two proteins share at least one set_id. The "sets" contain information about the set_ids, for example, which "collection" they originate from. The "collections" are the different resources from which STRING imports data (for the channels 'experiments' and 'databases').

Cookies and Privacy

HOW WE USE COOKIES

Cookies are small text files that are stored on your computer when you visit websites. We use cookies to help identify you once you have logged in, and also to remember any parameter settings that you might have changed in STRING. You can delete/disable cookies already stored on your computer at any time (here's how); this will simply restore all settings to default and the STRING website will continue working.

HOW WE COUNT USAGE

We allow Google to count, anonymously and in aggregated form, how many users are visiting the STRING pages. This is required by our funders, who need to make sure that STRING is actually used by real users. The tracking is done via the Google Analytics framework. Note: we do not use the "UserId" feature offered by Google Analytics; hence, we do not track users across different devices or websites.

WHAT THE SERVER STORES

• The server stores standard web-logs, including IP addresses, for technical purposes and debugging.
• The server allows you to log in using credentials that you already have elsewhere ('Social Login', via auth0.com).
• Once you are logged in, the server will store the pages you visited, and also allows you to upload gene sets and other data.
• We will not share any of the above information with third parties, apart from confidential reporting to our funders.
HOW TO DELETE YOUR USER DATA

Once you have logged onto STRING, there are two ways to delete your user data:

• you can revoke the permissions for the social login connection
• you can delete all user data stored at the STRING server itself

For both options, refer to the subsection 'Your Privacy Controls', at the bottom of the 'My Data' page, here.

© STRING CONSORTIUM 2022
• SIB - Swiss Institute of Bioinformatics
• CPR - Novo Nordisk Foundation Center for Protein Research
• EMBL - European Molecular Biology Laboratory

STRING is part of the ELIXIR infrastructure: it is one of ELIXIR's Core Data Resources.