
URL: https://string-db.org/cgi/info

 * Version:
 * 11.5





INFO

Interaction Scores


THE BASIC PRINCIPLE

In STRING, each protein-protein interaction is annotated with one or more
'scores'.
Importantly, these scores do not indicate the strength or the specificity of the
interaction. Instead, they are indicators of confidence, i.e. how likely STRING
judges an interaction to be true, given the available evidence. All scores range
from 0 to 1, with 1 being the highest possible confidence. A score of 0.5 would
indicate that roughly every second interaction might be erroneous (i.e., a false
positive).
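
As a rough illustration of this reading of the scores, the following Python sketch estimates how many false positives to expect in a set of interactions. It assumes that each score can be treated as the probability that the interaction is true; the function name and the sample scores are hypothetical and not part of STRING.

```python
def expected_false_positives(scores):
    """Return the expected number of false positives among the interactions.

    Assumption: each score is the probability that the interaction is true,
    so (1 - score) is the probability that it is a false positive.
    """
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range [0, 1]: {score}")
    return sum(1.0 - score for score in scores)


# Four hypothetical interactions, each with a score of 0.5:
# roughly every second one is expected to be erroneous.
sample_scores = [0.5, 0.5, 0.5, 0.5]
print(expected_false_positives(sample_scores))  # 2.0
```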


TRANSFER SCORES

For most types of evidence, there are two types of scores: the 'normal' score,
and the 'transferred' score. The latter is computed from data that is not
originally observed in the organism of interest, but instead in some other
organism and then transferred via homology/orthology. All potential source
organisms are searched for evidence, but the actual transfers to the receiving
organism are made non-redundant (according to 'clades' of closely related
organisms in the tree of life).


A TYPICAL ORGANISM

As an example, the model organism 'Escherichia coli K12 MG1655' is shown below,
with the number of interactions per score type at a confidence of 'medium' or
better (score >= 0.400):


gene neighborhood, normal:  7851 interactions
gene neighborhood, transferred:  11177 interactions
gene fusion:  514 interactions
gene cooccurrence:  35497 interactions
gene coexpression, normal:  12376 interactions
gene coexpression, transferred:  3154 interactions
experiments/biochemistry, normal:  5301 interactions
experiments/biochemistry, transferred:  4113 interactions
annotated pathways, normal:  6726 interactions
annotated pathways, transferred:  1727 interactions
textmining, normal:  27445 interactions
textmining, transferred:  7119 interactions
combined-score, total:  210914 interactions

Scientific Use Scenarios
Below is a selection of published examples of large-scale scientific use of
STRING network data. Apart from the ad-hoc use of the website (in order to learn
about individual proteins or to find out about functional enrichments), the
large-scale use cases below signify another important benefit of STRING: the
availability of unified, scored, genome-wide interaction data, for a number of
organisms.
 

1.) Researching protein-networks in the context of early immune system
establishment

In this study, the impact of post-natal colonization of the body with microbes
is researched by transiently colonizing pregnant female mice. It is shown that
the maternal microbiota shapes the immune system of the offspring. After
performing RNA-seq of whole small intestinal mucosal RNA from neonates at day 14
(control and gestation-only colonized dams) and identification of differentially
expressed genes, the authors use STRING to deduce involved protein networks.
(Ganal-vonArburg, SC et al. Science. 2017: "The maternal microbiota drives early
postnatal innate immune development.")

2.) Highly connected proteins have stable steady-state distribution of gene
expression

This study develops a thermodynamic-like theoretical framework to analyze
protein networks and gene expression patterns. Using this methodology, they find
a dependence of the steady-state stability of transcript levels and the
connectivity in STRING networks. The findings agree with the observation that
essential genes have a low variability of expression and emphasize the role of
stochasticity and robustness in the control of expression. The authors suggest
that genes can be grouped into two categories, high and low expression, which
are stable, versus adaptable to biological stimuli. (Kravchenko-Balasha, N et
al. PNAS. 2012: "On a fundamental structure of gene networks in living
cells")

3.) Searching for candidate genes involved in the immune response to gluten

Celiac disease (CD) is an auto-immune condition which may cause gastrointestinal
and nutritional problems. The authors of this review article use STRING to look
for interactions of genes that are known to be involved in CD. This results in
40 candidate genes that are likely to be involved in the progression of the
disease. Since the levels of the marker genes of CD are heterogeneous, several
different genes may be the cause of the condition. (Abadie, V et al. Annu. Rev.
Immunol. 2011: "Integration of genetic and immunological insights into a model
of celiac disease pathogenesis")

4.) Identifying candidates for an unknown enzyme in a pathway

Bacillithiol (BSH) is a low-molecular-weight thiol in bacteria (Bacilli family).
It is synthesized by a not fully characterized pathway. The authors used STRING
to identify candidates for an unknown enzyme using known components of the
pathway as input query. The co-occurrence and the fusion channel revealed a
potential candidate for the enzyme. Experiments could then confirm that the
functionality indeed was essential for the pathway. (Gaballa, A et al. PNAS.
2010: "Biosynthesis and functions of bacillithiol, a major low-molecular-weight
thiol in Bacilli")

5.) Using STRING to narrow the search space for two-locus epistasis

The aim of this study was to search for pairs of SNPs that jointly cause
disease (two-locus epistasis). Testing all combinations is computationally
expensive. By limiting the search to known protein-protein interactions from
STRING, the search space was drastically reduced. Furthermore, because only
likely candidate pairs are tested, the loss of significance caused by
correcting for multiple comparisons is alleviated. (Emily, M et al. Eur J Hum
Genet. 2009: "Using biological networks to search for interacting loci in
genome-wide association studies")

6.) Using STRING to show network connectivity

Lysine acetylation is a post-translational modification that regulates gene
expression. This study shows that lysine acetylation preferentially targets large
macro-molecular complexes and has a broad regulatory scope comparable with other
post-translational modifications. By using STRING the authors show that the
acetylome has significantly higher network connectivity than random: namely
roughly six interactions per node, whereas the random expectation would be less
than three. (Choudhary, C et al. Science. 2009: "Lysine acetylation targets
protein complexes and co-regulates major cellular functions")

7.) STRING as a general purpose database

In this study the evolutionary history of the CDC25 homology domain was
investigated. The STRING database was used to acquire the sequence information
for a number of genomes, showing how STRING can be used as a general-purpose
database. This is particularly useful if the user downloads the entire dataset
by signing the academic license agreement. (van Dam, TJ et al. Cell Signal.
2009: "Phylogeny of the CDC25 homology domain reveals rapid differentiation of
Ras pathways between early animals and fungi")

8.) STRING to guide experiments

This study is a characterization of the Rod-derived Cone Viability Factor
(RdCVFL) signaling pathway involved in neuronal cell death mediated by oxidative
stress. STRING was used to identify 90 proteins interacting with RdCVFL. These
were examined for interactions using a cell-based assay. The authors show that
RdCVFL inhibits the phosphorylation of the microtubule binding protein Tau. In
vitro, RdCVFL protects Tau from oxidative damage, which is implicated in retinal
degeneration. (Fridlich, R et al. Mol Cell Proteomics. 2009: "The
thioredoxin-like protein rod-derived cone viability factor (RdCVFL) interacts
with TAU and inhibits its phosphorylation in the retina")

9.) Prioritizing functional assignments in RNAi screens using interaction
network data

RNA interference (RNAi) screening can be used to infer the function of genes
in an organism. The results from such screens often contain errors. Wang et al.
suggest a method based on a scoring function for integrating STRING network
information to indicate false positives and false negatives associated with RNAi
screens, thereby suggesting optimal candidates for follow-up experimental
validation. (Wang, L et al. BMC Genomics. 2009: "A network-based integrative
approach to prioritize reliable hits from multiple genome-wide RNAi screens in
Drosophila")

Frequently Asked Questions
Q: How can I obtain the complete data set?

STRING has recently changed its licensing model, at the request of the ELIXIR
initiative. This means that all its data is now freely available, from the
Download section. The licensing model is CC BY 4.0, requesting proper
attribution as the only condition for usage.
Q: How are scores computed?

The 'combined scores' are computed by integrating the probabilities from the
various different types of evidence ('evidence channels'), while correcting for
the probability of randomly observing an interaction. For a more detailed
description, please refer to von Mering et al., NAR 2005.
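
The integration described above can be sketched in Python as follows. This is a minimal illustration, not STRING's actual implementation: it assumes a prior of 0.041 for randomly observing an interaction, treats the evidence channels as independent, and leaves out any channel-specific corrections (for example for homology). The function names are hypothetical.

```python
PRIOR = 0.041  # assumed probability of randomly observing an interaction


def remove_prior(score, prior=PRIOR):
    """Strip the prior from a single channel score (both on a 0..1 scale)."""
    score = max(score, prior)  # scores below the prior carry no evidence
    return (score - prior) / (1.0 - prior)


def combine_channel_scores(channel_scores, prior=PRIOR):
    """Combine per-channel scores into one score, assuming independence."""
    p_no_evidence = 1.0
    for score in channel_scores:
        # Probability that this channel does NOT support the interaction
        p_no_evidence *= 1.0 - remove_prior(score, prior)
    combined = 1.0 - p_no_evidence
    # Add the prior back exactly once at the end
    return combined + prior * (1.0 - combined)


# Hypothetical channel scores for one protein pair
print(combine_channel_scores([0.5, 0.5]))
```

With this scheme, a pair supported by a single channel keeps that channel's score, while agreement between several channels raises the combined score above any individual one.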
Q: I am interested in downloading a limited set of interactions, for one or a
few proteins only. How can I do that?

There are basically two options for this:
a) enter the protein(s) as usual into STRING and proceed to the network, then
select the 'Tables / Exports' button below the network. From there, you can
download the interactions in your current network, in a number of formats.
b) alternatively, you can use one of our Application Programming Interfaces
(APIs), especially if this might become a recurring task in the future. Using
the APIs, you can create a small script/program that will download the
interactions for you. For more details, see here.
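
A small script for option b) might look like the following Python sketch. It assumes a REST layout of the form https://string-db.org/api/<format>/network with the parameters 'identifiers', 'species' and 'required_score'; these names, like the helper functions themselves, are assumptions that should be checked against the current API documentation before use.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://string-db.org/api"  # assumed base URL of the STRING API


def build_network_url(identifiers, species, required_score=400, output_format="tsv"):
    """Build a request URL for the interactions among the given proteins."""
    params = {
        # Multiple identifiers are assumed to be separated by a carriage return
        "identifiers": "\r".join(identifiers),
        "species": species,                # NCBI taxonomy id, e.g. 9606 for human
        "required_score": required_score,  # assumed scale 0..1000 (400 = 'medium')
    }
    return f"{API_BASE}/{output_format}/network?{urlencode(params)}"


def fetch_network(identifiers, species, required_score=400):
    """Download the network as TSV and return one dict per interaction row."""
    url = build_network_url(identifiers, species, required_score)
    with urlopen(url, timeout=30) as response:
        text = response.read().decode("utf-8")
    lines = text.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split("\t")
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]
```

Calling fetch_network(["TP53", "MDM2"], 9606) would then request the interactions between these two example human proteins; the column names in the returned rows depend on what the server sends back.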
Q: How can I save a certain network?

Below any given STRING network in the browser window, there is always a button
labeled 'Tables/Exports'. There, you can save your current network in a variety
of formats. (Bitmap Images, Scalable Vector Graphics, XML Summary (Proteomics
Standards Initiative), Graph Layout Coordinates, Protein sequences in FASTA
format, and Textual Summaries of interaction scores).
Q: For my latest manuscript, I would like to use a network image produced by
STRING. Must I ask for permission?

No, permission is not required. But we do appreciate it if you cite us;
please choose from among any of our published references (see here).
Q: How can I trace the origin of the different evidences for a given
interaction?

Most of this information is available upon clicking on an edge of the graph in
the network view. Furthermore, below each network you will find the button
'Evidence'; from there you can proceed to evidence views that each summarize
evidence of a single type, for your current network.
Q: How can I cite STRING?

We do appreciate citations very much — as for many other online databases,
citations are the main benchmark by which our funders decide whether we are
'worth the money'. So, yes, please cite us ... using any of the references here.
Q: Which databases does STRING extract experimental/biochemical data from?

Currently, these are: DIP, BioGRID, HPRD, IntAct, MINT, and PDB.
Q: From which databases does STRING extract curated data?

Currently, these are: Biocarta, BioCyc, Gene Ontology, KEGG, and Reactome.
Q: How do I extract purely experimental data?

Below each network, there is a button labeled 'Data Settings'. There, you can
specify which type of evidence you want to contribute to your network. By
un-checking all boxes except 'Experiments', you would get a network based purely
on experimental evidence.
Q: I need PPIs for a given species, but only from experimental data and not
transferred from other species.

For that, you will need to download a file with the full score details, and
parse out the information you need. First, you should sign the license
agreement, wait for the password and then download the file:
'protein.links.full.v10.txt.gz'. Use the file to get the direct experimental
evidence, for example by printing the columns for protein1, protein2 and
experiments (i.e., columns 1, 2 and 10) and filtering with grep for the
'species_id' (e.g., 9606 for human):
zgrep '^9606\.' protein.links.full.v10.txt.gz | awk '($10 != 0) { print $1, $2, $10 }' > ~/result.txt
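
The same filtering can also be sketched in Python, which may be easier to extend than a shell one-liner. The sketch assumes that the file is whitespace-separated and starts with a header line that contains a column named 'experiments'; the function name is hypothetical.

```python
import gzip


def direct_experimental_links(path, species_id):
    """Return (protein1, protein2, experiments) for pairs with direct evidence.

    Assumptions: the gzipped file is whitespace-separated, its first line is a
    header naming the columns, and identifiers look like '<species_id>.<name>'.
    """
    prefix = f"{species_id}."
    results = []
    with gzip.open(path, "rt") as handle:
        header = handle.readline().split()
        exp_col = header.index("experiments")  # direct, non-transferred evidence
        for line in handle:
            fields = line.split()
            if not fields or not fields[0].startswith(prefix):
                continue
            score = int(fields[exp_col])
            if score != 0:
                results.append((fields[0], fields[1], score))
    return results
```

Looking the column up by name rather than by position keeps the sketch working if the column order differs between file versions.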
Q: I want to differentiate physical interactions from functional ones within
STRING.

For this, you would have to use the database dumps (after academic licensing).
You can use the table 'network.actions' to query for records that have a direct
physical interaction annotated in their 'mode' column. If this is 'binding' then
you can be fairly sure there is a direct physical interaction. If it is
something other than 'binding', it could still be direct and physical, but more
likely it is an indirect, functional interaction. Note that the 'actions'
annotations in STRING are work in progress; there will likely be a fairly large
fraction of interactions missing (false negatives).
SELECT * FROM network.actions WHERE mode = 'binding';
Q: STRING is said to be 'locus-based' and only a single translated protein per
locus is stored. What does this mean?

STRING represents each protein-coding gene locus by only a single,
representative protein. If there is more than one isoform per gene annotated, we
usually select the longest isoform, unless we have information to suggest that
another isoform is better supported (e.g., proteins selected in the CCDS
database).
Q: Does STRING contain any pathway or Gene Ontology information? I see that
there is a table called 'funcats' ... ?

The 'funcats' database table contains the functional categories as defined for
the COG database. We do import the Gene Ontology annotations and use these for
inferring interactions and for reporting enrichments. However, they are not yet
represented in a database table — this will likely come in a future version.
Q: Is there any phenotype or disease information contained in STRING?

Not directly, but the fulltext-search capabilities at the start page will often
turn up proteins which have already been annotated for a certain function,
phenotype or disease. For example, searching for the word "wing" in Drosophila
will return proteins that have been annotated/described as having a functional
role in the wing.
Q: Does the database provide a PubMed Reference ID for each interaction?

Interactions that stem only from computational predictions do not have a PMID.
Text-mining evidence may also stem from other sources, such as OMIM. Apart from
the above exceptions, interactions mostly do come with at least one PubMed
reference ID. Some cases have several different PMIDs, and yet others share the
same PMID (e.g., for external repositories, the interactions share the PMID of
the publication of the database).
Q: Regarding the 'sets', pathways and complexes ... what is the difference
between a "set" and a "collection"?

The different types of "sets" in STRING describe annotated pathways, complexes,
and PDB structures having more than one protein. The "sets_items" describe
memberships in the evidence sets. An interaction exists if two proteins share at
least one set_id. The "sets" contain information about the set_ids, for example,
the "collection" from which they originate. The "collections" are the different
resources of data from which STRING imports data (for the channels 'experiments'
and 'databases').
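
The rule that an interaction exists if two proteins share at least one set_id can be sketched in Python as follows. The function name and the (set_id, protein_id) input format are hypothetical and do not reflect the actual table layout in the database dumps.

```python
from collections import defaultdict
from itertools import combinations


def interactions_from_sets(memberships):
    """Derive protein pairs from (set_id, protein_id) membership records.

    Two proteins are considered to interact if they share at least one set_id.
    """
    members = defaultdict(set)
    for set_id, protein_id in memberships:
        members[set_id].add(protein_id)

    pairs = set()
    for proteins in members.values():
        # Sorting gives each unordered pair one canonical orientation
        for protein_a, protein_b in combinations(sorted(proteins), 2):
            pairs.add((protein_a, protein_b))
    return pairs


# Hypothetical memberships: set 's1' has three proteins, 's2' has two
example = [("s1", "P1"), ("s1", "P2"), ("s1", "P3"), ("s2", "P3"), ("s2", "P4")]
print(sorted(interactions_from_sets(example)))
```

Sets with a single member produce no pairs, and a pair that shares several sets is reported only once.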
Cookies and Privacy


HOW WE USE COOKIES

Cookies are small text files that are stored on your computer when you visit
websites. We use cookies to help identify you once you have logged in, and also
to remember any parameter settings that you might have changed in STRING. You
can delete/disable cookies already stored on your computer at any time (here's
how); this will simply restore all settings to default and the STRING website
will continue working.


HOW WE COUNT USAGE

We allow Google to count, anonymously and in aggregated form, how many users are
visiting the STRING pages. This is required by our funders, who need to make
sure that STRING is actually used by real users. The tracking is done via the
Google Analytics framework. Note: we do not use the "UserId" feature offered by
GoogleAnalytics; hence, we do not track users across different devices or
websites.


WHAT THE SERVER STORES

  • The server stores standard web-logs, including IP numbers, for technical
purposes and debugging.
  • The server allows you to login, using credentials that you already have
elsewhere ('Social Login', via auth0.com)
  • Once logged in, the server will store the pages you visited, and also allows
you to upload gene sets and other data.
  • We will not share any of the above information with third parties, apart
from confidential reporting to our funders.



HOW TO DELETE YOUR USER DATA

Once you have logged onto STRING, there are two ways to delete your user data:
  • you can revoke the permissions for the social login connection
  • you can delete all user data stored at the STRING server itself
  • for both options, refer to the subsection 'Your Privacy Controls', at the
bottom of the 'My Data' page, here.



© STRING CONSORTIUM 2022

 * SIB - Swiss Institute of Bioinformatics
 * CPR - Novo Nordisk Foundation Center Protein Research
 * EMBL - European Molecular Biology Laboratory



STRING is part of the ELIXIR infrastructure: it is one of ELIXIR's Core Data
Resources.