golgi.sandbox.google.com
2a00:1450:400c:c0b::451 Public Scan

Submitted URL:
http://alphafoldserver.com/
Effective URL:
https://golgi.sandbox.google.com/welcome
Submission: On May 10 via api (May 10th 2024, 2:28:21 pm UTC) from US — Scanned from DE
Form analysis
0 forms found in the DOM

Text Content

ALPHAFOLD SERVER

BETA

POWERED BY ALPHAFOLD 3

AlphaFold 3 model is a Google DeepMind and Isomorphic Labs collaboration


HOW DOES ALPHAFOLD SERVER WORK?

AlphaFold Server is a web-service that can generate highly accurate biomolecular
structure predictions containing proteins, DNA, RNA, ligands, ions, and also
model chemical modifications for proteins and nucleic acids in one platform.
It’s powered by the newest AlphaFold 3 model.




TAKE A LOOK AT SOME EXAMPLES

Protein-RNA-Ion: PDB 8AW3

Protein-Glycan-Ion: PDB 7BBV

Protein-DNA-Ion: PDB 7RCE


TERMS OF USE AND ATTRIBUTION

AlphaFold Server is for non-commercial use only, subject to AlphaFold Server
Terms of Service. AlphaFold Server output cannot be used in docking or screening
tools or to train machine learning models or related technology for biomolecular
structure prediction similar to AlphaFold Server.

If you use an AlphaFold Server prediction, please cite our paper: Abramson, J et
al. Accurate structure prediction of biomolecular interactions with AlphaFold 3.
Nature (2024).


FREQUENTLY ASKED QUESTIONS


WHAT BIOLOGICAL MOLECULE TYPES CAN BE MODELED WITH ALPHAFOLD SERVER?

You can model a structure consisting of one or more of the following biological
molecule types:

 * Proteins
 * DNA
 * RNA
 * Biologically common ligands: ATP, ADP, AMP, GTP, GDP, FAD, NADP, NADPH, NDP,
   heme, heme C, myristic acid, oleic acid, palmitic acid, citric acid,
   chlorophylls A and B, bacteriochlorophylls A and B
 * Biologically common ions: Ca2+, Co2+, Cu2+, Fe3+, K+, Mg2+, Mn2+, Na+, Zn2+,
   Cl-

 * Biologically common post-translational modifications (PTMs) of amino acid
   residues
   
   * Phosphorylation of serine, threonine, tyrosine and histidine residues
   * Acetylation of lysine residues
   * Methylation of lysine and arginine residues
   * Malonylation of cysteine residues
   * Hydroxylation of proline, lysine and asparagine residues
   * Palmitoylation of cysteine residues
   * Succinylation of asparagine residues
   * S-nitrosylation of cysteine residues
   * Formylation of tryptophan residues
   * Crotonylation of lysine residues
   * Citrullination of lysine and arginine residues
   * Glycan chains (including branched chains) composed of certain sugars:
     alpha/beta-D-glucose, alpha/beta-D-mannose, alpha-L-fucose,
     beta-D-galactose, N-acetyl-beta-D-glucosamine

 * Biologically common chemical modifications of the nucleic acids:
   
   * DNA
     
     * Methylation of cytosine, guanine, and adenine
     * Carboxylation of cytosine
     * Oxidation of guanine
     * Formylation of cytosine
   
   * RNA
     
     * Methylation of cytosine, guanine, adenine, and uracil
     * Isomerisation of uridine into pseudouridine
     * Formylation of cytosine

The modeled structure can be composed of multiple proteins, nucleic acids,
ligands, and ions. Each protein and nucleic acid chain can have any number of
chemical modifications, subject to the token limit.


WHAT IS THE MAXIMUM JOB SIZE ALLOWED?

The total size of the job is limited by the number of 'tokens' in the structure
- the limit is 5,000 tokens. Tokens are counted in the following way:

 * Proteins: 1 token per standard amino acid residue
 * DNA, RNA: 1 token per input nucleotide base
 * Ligands: 1 token per atom in the ligand
 * Ions: 1 token per ion
 * Modifications (excluding glycans): 1 token per atom for all atoms of the
   modified amino acid residue or nucleotide
 * Glycan PTMs: 1 token per atom in the glycan (in addition to the 1 token for
   the residue the glycan is attached to)

Note that each protein chain and nucleotide chain must contain at least 4 amino
acids or nucleotides, respectively.
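
The counting rules above can be sketched as a small helper. This is only an illustration, not the Server's own implementation: the function name and parameters are hypothetical, atom counts must be supplied by the caller, and it assumes that a modified residue is counted per atom instead of as a single token.

```python
TOKEN_LIMIT = 5000      # job-size limit stated in the FAQ
MIN_CHAIN_LENGTH = 4    # minimum residues or nucleotides per polymer chain


def count_tokens(polymer_chains, ligand_atom_counts=(), num_ions=0,
                 modified_residue_atom_counts=(), glycan_atom_counts=()):
    """Estimate the token count of a job (hypothetical helper).

    polymer_chains: protein, DNA, or RNA sequences, one string per chain copy.
    ligand_atom_counts: atoms in each ligand copy.
    modified_residue_atom_counts: atoms in each modified residue or nucleotide.
    glycan_atom_counts: atoms in each attached glycan.
    """
    for seq in polymer_chains:
        if len(seq) < MIN_CHAIN_LENGTH:
            raise ValueError(f"chain {seq!r} is shorter than {MIN_CHAIN_LENGTH}")

    total_residues = sum(len(seq) for seq in polymer_chains)
    # Assumption: each modified residue is tokenized per atom, replacing its
    # usual single token.
    unmodified = total_residues - len(modified_residue_atom_counts)
    if unmodified < 0:
        raise ValueError("more modified residues than residues")

    return (unmodified
            + sum(modified_residue_atom_counts)
            + sum(ligand_atom_counts)
            + num_ions
            # Glycan atoms are added on top of the residue's own token.
            + sum(glycan_atom_counts))


def fits_token_limit(num_tokens):
    """True if the job is within the 5,000-token limit."""
    return num_tokens <= TOKEN_LIMIT
```

For example, a 9-residue protein with one 31-atom ligand and two ions would come to 9 + 31 + 2 = 42 tokens under these rules.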


HOW MANY JOBS CAN I RUN ON ALPHAFOLD SERVER?

10 jobs per day. If you don’t have enough quota you can save your job and submit
it when your quota refreshes. We plan to explore other approaches for quota
allocation in the future, including weekly or monthly allocations.


HOW SHOULD I DEFINE INPUTS FOR ALPHAFOLD SERVER?

 * For a protein, enter the single-letter amino acid sequence or paste in the
   contents of a FASTA file including a comment line(s). Use only standard
   single letter codes (nonstandard ones like B, J, O, U and X are unsupported).
 * For DNA, enter the single-letter nucleotide sequence in standard notation
   (5’-3’). Use only standard single letter codes (A, C, T, and G). For
   double-stranded DNA, please select the “+ Reverse complement” option from
   the more_vert (three-dot) menu of your DNA entry to add the complementary
   strand.
 * For RNA, similarly enter the single-letter nucleotide sequence in standard
   notation (5’-3’). Use only standard single letter codes for RNA nucleotides
   (A, C, U, and G).
 * For ligands, ions, and post-translational modifications, select the desired
   entity from the list of supported types. The three letter codes displayed in
   the UI come from Protein Data Bank’s Chemical Component Dictionary.
 * If multiple copies of an entity are present (for example, a homomultimeric
   protein), indicate this by setting the number of copies in the corresponding
   field.
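
Before pasting sequences into the Server, it can help to check them locally. The sketch below follows the input rules above; the helper names are hypothetical, the protein alphabet is assumed to be the 20 standard amino acids, and upper-casing the input is a convenience of this sketch rather than documented Server behaviour.

```python
PROTEIN_CODES = set("ACDEFGHIKLMNPQRSTVWY")  # assumed: the 20 standard amino acids
DNA_CODES = set("ACGT")
RNA_CODES = set("ACGU")
ALPHABETS = {"protein": PROTEIN_CODES, "dna": DNA_CODES, "rna": RNA_CODES}


def strip_fasta(text):
    """Drop FASTA comment lines (starting with '>') and join the sequence lines."""
    lines = [line.strip() for line in text.splitlines()]
    return "".join(l for l in lines if l and not l.startswith(">")).upper()


def invalid_characters(sequence, kind):
    """Return the sorted characters of `sequence` that are not allowed for `kind`."""
    return sorted(set(sequence) - ALPHABETS[kind])


def reverse_complement(dna):
    """Return the complementary DNA strand, written 5'-3'."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(dna))
```

An empty list from `invalid_characters` means the sequence uses only supported codes. `reverse_complement` mirrors what the “+ Reverse complement” option is described as adding, which is useful when preparing both strands by hand.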


ARE THERE ANY RESTRICTIONS ON THE PROTEIN SEQUENCES THAT ARE ALLOWED?

 * Yes, we are currently restricting sequences from a small number of viral
   pathogens. If you run a job that encounters the filter and have any
   questions, please get in touch with the AlphaFold team via the feedback
   button.
 * Based on our detailed external consultations we consider the release of
   AlphaFold 3 capabilities through the Server to be low risk, and we are using
   AlphaFold Server as a test-bed to explore how to develop robust filters for
   future biological AI models.
 * The current restricted sequences are not meant to cover a comprehensive set
   of all possible pathogens; instead, they are a sample used to develop a
   filter.
   We plan to evolve this, including possible changes to what we restrict in the
   Server, through our active engagement with experts and the community. We
   share more details on Google DeepMind’s commitment to biosafety and
   responsible deployment of AlphaFold in our blog.


HOW CAN I INTERPRET CONFIDENCE METRICS TO CHECK THE ACCURACY OF STRUCTURES?

Similar to AlphaFold2 and AlphaFold-Multimer, outputs include confidence
metrics. The main metrics are:

 * pLDDT: a per-atom confidence estimate on a 0-100 scale where a higher value
   indicates higher confidence. pLDDT aims to predict a modified LDDT score that
   only considers distances to polymers. For proteins this is similar to the
   lDDT-Cα metric but with more granularity as it can vary per atom not just per
   residue. For ligand atoms the modified LDDT considers the errors only between
   the ligand atom and polymers, not other ligand atoms, and for DNA/RNA a wider
   radius of 30 Å is used for the modified LDDT instead of 15 Å. The pLDDT is
   shown as color outputs in the image of the structure, using the same value to
   color mapping as in AFDB.
 * PAE (predicted aligned error): estimate of the error in the relative position
   and orientation between two tokens in the predicted structure. Higher values
   indicate higher predicted error and therefore lower confidence. For proteins
   and nucleic acids, PAE score is essentially the same as AlphaFold2, where the
   error is measured relative to frames constructed from the protein backbone.
   For small molecules and post-translational modifications, a frame is
   constructed for each atom from its closest neighbors from a reference
   conformer.
 * pTM and ipTM scores: the predicted template modeling (pTM) score and the
   interface predicted template modeling (ipTM) score are both derived from a
   measure called the template modeling (TM) score. This measures the accuracy
   of the entire structure (Zhang and Skolnick, 2004; Xu and Zhang, 2010). A pTM
   score above 0.5 means the overall predicted fold for the complex might be
   similar to the true structure. ipTM measures the accuracy of the predicted
   relative positions of the subunits within the complex. Values higher than 0.8
   represent confident high-quality predictions, while values below 0.6 suggest
   likely a failed prediction. ipTM values between 0.6 and 0.8 are a gray zone
   where predictions could be correct or incorrect. TM score is very strict for
   small structures or short chains, so pTM assigns values less than 0.05 when
   fewer than 20 tokens are involved; for these cases PAE or pLDDT may be more
   indicative of prediction quality.
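
As a rough illustration, the ipTM and pTM thresholds quoted above can be written as a small lookup. The function names and returned labels are hypothetical, and the boundary values 0.6 and 0.8 themselves are treated here as part of the gray zone.

```python
def interpret_iptm(iptm):
    """Map an ipTM value to the qualitative bands described in the FAQ."""
    if not 0.0 <= iptm <= 1.0:
        raise ValueError("ipTM must lie in the range 0-1")
    if iptm > 0.8:
        return "confident"
    if iptm < 0.6:
        return "likely failed"
    return "gray zone"


def ptm_is_informative(num_tokens):
    """pTM is assigned values below 0.05 when fewer than 20 tokens are involved,
    so PAE or pLDDT may be more indicative in that case."""
    return num_tokens >= 20


def fold_may_be_similar(ptm):
    """A pTM above 0.5 suggests the overall fold might resemble the true structure."""
    return ptm > 0.5
```

These bands are guidance for reading the scores, not hard pass/fail criteria.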

For a detailed description of these confidence metrics, see our paper. For
protein components, the “AlphaFold: A Practical guide” course for structures
provides additional tutorials on the confidence metrics.

If you are interested in a specific entity or interaction, then there are
confidences available in the downloadable outputs that are specific to each
chain or chain-pair, as opposed to the full complex. See the JSON section for
more details on all the confidence metrics that are returned.


HOW MANY PREDICTIONS ARE RETURNED WHEN I RUN A JOB?

The model samples five predictions per seed. The top ranked prediction is
displayed on the result page and all samples along with their associated
confidences are available to download as a zip file, via the Download button.

For ranking of the full complex use the ranking_score (higher is better). This
score uses overall structure confidences (pTM and ipTM), but also includes terms
that penalize clashes and encourage disordered regions not to have spurious
helices - these extra terms mean the score should only be used to rank
structures.

If you are interested in a specific entity or interaction, you may want to rank
by a metric specific to that chain or chain-pair, as opposed to the full
complex. In that case, use the per chain or per chain-pair confidence metrics
described in the json section for ranking.
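
A minimal sketch of both ranking strategies, assuming each sample's summary JSON has already been loaded into a Python dict. The field names ranking_score and chain_pair_iptm are those used in the downloadable summaries; the function names and the "name" key in the usage example are hypothetical.

```python
def best_by_ranking_score(samples):
    """Pick the sample with the highest full-complex ranking_score."""
    return max(samples, key=lambda s: s["ranking_score"])


def best_by_chain_pair_iptm(samples, i, j):
    """Pick the sample with the highest ipTM for the interface between chains i and j."""
    return max(samples, key=lambda s: s["chain_pair_iptm"][i][j])
```

Usage, with made-up numbers for two samples of a two-chain job:

```python
samples = [
    {"name": "sample_0", "ranking_score": 0.71,
     "chain_pair_iptm": [[0.80, 0.55], [0.55, 0.78]]},
    {"name": "sample_1", "ranking_score": 0.66,
     "chain_pair_iptm": [[0.79, 0.62], [0.62, 0.75]]},
]
print(best_by_ranking_score(samples)["name"])          # sample_0
print(best_by_chain_pair_iptm(samples, 0, 1)["name"])  # sample_1
```

The two criteria can disagree, as here, which is why the choice of metric should follow the question being asked.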


HOW DO I INTERPRET ALL THE OUTPUTS IN THE DOWNLOADED JSON FILES?

For each predicted sample we provide two JSON files. One contains summary
metrics - summaries for either the whole structure, per chain or per chain-pair
- and the other contains full 1D or 2D arrays.

Summary outputs:

 * ptm: A scalar in the range 0-1 indicating the predicted TM-score for the full
   structure.
 * iptm: A scalar in the range 0-1 indicating predicted interface TM-score
   (confidence in the predicted interfaces) for all interfaces in the structure.
 * fraction_disordered: A scalar in the range 0-1 that indicates what fraction
   of the predicted structure is disordered, as measured by accessible surface
   area (see our paper for details).
 * has_clash: A boolean indicating if the structure has a significant number of
   clashing atoms (more than 50% of a chain, or a chain with more than 100
   clashing atoms).
 * ranking_score: A scalar in the range [-100, 1.5] that can be used for ranking
   predictions. It incorporates ptm, iptm, fraction_disordered and has_clash
   into a single number with the following equation:
   
   0.8 × ipTM + 0.2 × pTM + 0.5 × disorder − 100 × has_clash

 * chain_pair_pae_min: A [num_chains, num_chains] array. Element (i, j) of the
   array contains the lowest PAE value across rows restricted to chain i and
   columns restricted to chain j. This has been found to correlate with whether
   two chains interact or not, and in some cases can be used to distinguish
   binders from non-binders.
 * chain_pair_iptm: A [num_chains, num_chains] array. Off-diagonal element (i,
   j) of the array contains the ipTM restricted to tokens from chains i and j.
   Diagonal element (i, i) contains the pTM restricted to chain i. Can be used
   for ranking a specific interface between two chains, when you know that they
   interact, e.g. for antibody-antigen interactions.
 * chain_ptm: A [num_chains] array. Element i contains the pTM restricted to
   chain i. Can be used for ranking individual chains when the structure of that
   chain is most of interest, rather than the cross-chain interactions it is
   involved with.
 * chain_iptm: A [num_chains] array that gives the average confidence (interface
   pTM) in the interface between each chain and all other chains. Can be used
   for ranking a specific chain, when you care about where the chain binds to
   the rest of the complex and you do not know which other chains you expect it
   to interact with. This is often the case with ligands.
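
The ranking_score equation can be reproduced from the other summary values. This sketch assumes that the "disorder" term in the equation is the fraction_disordered value and that has_clash contributes 1 when true and 0 when false; the function name is hypothetical.

```python
def ranking_score(iptm, ptm, fraction_disordered, has_clash):
    """0.8 * ipTM + 0.2 * pTM + 0.5 * disorder - 100 * has_clash."""
    return (0.8 * iptm
            + 0.2 * ptm
            + 0.5 * fraction_disordered
            - 100.0 * (1.0 if has_clash else 0.0))
```

With all three scalars at their maximum of 1 and no clash the result is 1.5, and a clash subtracts 100, which is consistent with the stated range of [-100, 1.5].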

Full array outputs:

 * full_pae: A [num_tokens, num_tokens] array. Element (i, j) indicates the
   predicted error in the position of token j, when the prediction is aligned to
   the ground truth using the frame of token i.
 * atom_plddts: A [num_atoms] array, element i indicates the predicted local
   distance difference test (pLDDT) for atom i in the prediction.
 * contact_probs: A [num_tokens, num_tokens] array. Element (i, j) indicates the
   predicted probability that token i and token j are in contact (8 Å between
   the representative atom for each token); see our paper for details.
 * token_chain_ids: A [num_tokens] array indicating the chain ids corresponding
   to each token in the prediction.
 * atom_chain_ids: A [num_atoms] array indicating the chain ids corresponding to
   each atom in the prediction.
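
To show how the full arrays relate to the summary outputs, the sketch below derives a chain-pair minimum PAE matrix from full_pae and token_chain_ids, following the description of chain_pair_pae_min. It works on plain Python lists (for example, after loading the full-array JSON file with the standard json module). The function name and the choice to order chains by first appearance are assumptions of this sketch.

```python
def chain_pair_pae_min(full_pae, token_chain_ids):
    """Lowest PAE over rows in chain i and columns in chain j, for every chain pair.

    Returns (chains, matrix), where chains lists the chain ids in order of
    first appearance and matrix[a][b] is the minimum for chains[a], chains[b].
    """
    chains = list(dict.fromkeys(token_chain_ids))   # unique ids, order preserved
    index = {chain: k for k, chain in enumerate(chains)}
    n = len(chains)
    matrix = [[float("inf")] * n for _ in range(n)]

    for i, row in enumerate(full_pae):
        ci = index[token_chain_ids[i]]
        for j, value in enumerate(row):
            cj = index[token_chain_ids[j]]
            if value < matrix[ci][cj]:
                matrix[ci][cj] = value
    return chains, matrix
```

Note that the result need not be symmetric, because the PAE of token j in the frame of token i can differ from the reverse.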


WHAT SHOULD I DO IF I HAVE UNKNOWN RESIDUES OR NUCLEOTIDES IN MY PROTEIN, DNA OR
RNA SEQUENCE?

AlphaFold Server was not designed to model unknown residues or nucleotides (e.g.
X for unknown residues and N for unknown nucleotides). Please substitute them
with one of the standard residues or nucleotides that is appropriate for your
particular case. In general, consider the following substitutions:

 * Proteins: replace unknown protein residues with alanine (A)
 * DNA: replace unknown nucleotides by poly-T (T), but other nucleotides are
   also suitable
 * RNA: replace unknown nucleotides by poly-U (U), but other nucleotides are
   also suitable
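
The default substitutions listed above can be applied with a one-line replacement. The helper below is a hypothetical convenience, not part of the Server; it also upper-cases the sequence.

```python
# Unknown-symbol and replacement for each sequence type, as suggested above.
UNKNOWN_SUBSTITUTIONS = {
    "protein": ("X", "A"),  # unknown residue -> alanine
    "dna": ("N", "T"),      # unknown nucleotide -> T
    "rna": ("N", "U"),      # unknown nucleotide -> U
}


def substitute_unknowns(sequence, kind):
    """Replace unknown residues or nucleotides with the suggested default."""
    unknown, replacement = UNKNOWN_SUBSTITUTIONS[kind]
    return sequence.upper().replace(unknown, replacement)
```

For example, `substitute_unknowns("MKXXLV", "protein")` gives `"MKAALV"`. If another standard residue or nucleotide suits your case better, substitute that instead.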


WHAT ARE SEEDS AND HOW ARE THEY SET?

The model uses a 'seed' for internal random number generation. Normally this
seed is sampled automatically, and will be resampled when cloning a job. Running
multiple different seeds of the model and ranking over all the predictions can
lead to improved accuracy. The seed used is saved into the output information
per run.

To set a specific seed, turn off auto seed selection in the preview screen
(after clicking the 'Continue and Preview job' button). The seed can be any
integer between 0 and 4,294,967,295. When cloning a job where the seed was set,
the seed will return to being automatically chosen by default.
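
The allowed seed range corresponds to an unsigned 32-bit integer (0 to 2^32 − 1). If you want to pick several distinct seeds yourself, for example to rank over multiple runs, a sketch such as the following could be used; the function names are hypothetical and the seeds still have to be entered in the preview screen by hand.

```python
import random

MAX_SEED = 4_294_967_295  # 2**32 - 1, the largest seed the Server accepts


def is_valid_seed(seed):
    """True if `seed` is an integer in the accepted range."""
    return isinstance(seed, int) and not isinstance(seed, bool) and 0 <= seed <= MAX_SEED


def draw_seeds(count, rng=None):
    """Draw `count` distinct seeds uniformly from the accepted range."""
    rng = rng or random.Random()
    return rng.sample(range(MAX_SEED + 1), count)
```

Passing a fixed `random.Random(...)` instance makes the drawn list repeatable, which helps when recording which seeds were used for a set of jobs.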


CAN I IMPORT JOB FILES INTO ALPHAFOLD SERVER?

Yes, we support efficiently importing multiple draft jobs by uploading JSON
files with up to 100 jobs per file. Please note that you have a storage capacity
of up to 500 saved drafts in your history, so be mindful to manage your uploads
to stay within the limit.

To create a JSON file: please refer to this example for the JSON file syntax.
Inside each .zip file with modeling results, you'll find a JSON file named
“job_name_job_request.json” containing the job inputs. These files offer a
convenient starting point for generating new jobs as they are easily editable in
standard text editors or in a programming system like Google Colab notebooks.

Once your file is prepared, click the “Upload JSON” button to upload your JSON
files. Imported jobs will appear as saved drafts in your job history, and you
can use the more_vert (three-dot) menu of a job to edit or run it.
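
When preparing many jobs programmatically, the 100-jobs-per-file limit can be handled by splitting the job list before upload. The sketch below treats each job as an opaque dict and assumes that an upload file is a JSON list of such job objects; consult the example file referenced above for the actual job syntax. The function name and the "name" key in the usage example are hypothetical.

```python
import json

MAX_JOBS_PER_FILE = 100   # upload limit stated in the FAQ
MAX_SAVED_DRAFTS = 500    # draft storage capacity stated in the FAQ


def split_into_upload_files(jobs, max_per_file=MAX_JOBS_PER_FILE):
    """Return a list of JSON strings, each holding at most `max_per_file` jobs."""
    if len(jobs) > MAX_SAVED_DRAFTS:
        raise ValueError("more jobs than the saved-draft capacity")
    return [
        json.dumps(jobs[start:start + max_per_file], indent=2)
        for start in range(0, len(jobs), max_per_file)
    ]
```

Usage:

```python
jobs = [{"name": f"job_{i}"} for i in range(250)]   # placeholder job objects
files = split_into_upload_files(jobs)
print(len(files))   # 3 files, holding 100, 100 and 50 jobs
```

The draft check only looks at the new jobs; drafts already saved in your history also count towards the 500 limit.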


HOW CAN I RUN A MODELING JOB AGAIN?

Select the “Clone and reuse” option in the more_vert (three-dot) menu of the
job in your job history. This
option also allows further modification of the job before running.

Or, alternatively, upload the JSON file “job_name_job_request.json” that is part
of the .zip file containing modeling results. Press the “Upload JSON” button and
specify the corresponding JSON file; the imported job will appear as a saved
draft in your job history. The JSON files can be shared with other users who
want to reproduce your job on the Server.

Note that exact reproducibility is not guaranteed over time, due to changes in
underlying compiler optimisations.


WHAT IS DIFFERENT ABOUT THE NEW ALPHAFOLD 3 MODEL COMPARED TO ALPHAFOLD2?

AlphaFold 3 can predict many biomolecules in addition to proteins. AlphaFold 2
predicts structures of proteins and protein-protein complexes. AlphaFold 3 can
generate predictions containing proteins, DNA, RNA, ions, ligands, and chemical
modifications. The new model also improves the protein complex modelling
accuracy. Please refer to our paper for more information on performance
improvements.

AlphaFold 2 generally produces looping “ribbon-like” predictions for disordered
regions. AlphaFold 3 also does this, but will occasionally output segments with
secondary structure within disordered regions instead, mostly spurious alpha
helices with very low confidence (pLDDT) and inconsistent position across
predictions.


WHAT ARE SOME LIMITATIONS OF THE ALPHAFOLD 3 MODEL?

The accuracy of the model varies across biomolecules and interface types; model
confidence outputs are correlated with prediction accuracy, and the strength of
the correlation varies per molecule type. In some cases, optimal model
performance can only be achieved by running multiple seeds and taking the top
ranked sample; this is particularly the case for antibody-antigen interactions.
See our paper for more details on the model and its limitations.

The model occasionally produces overlapping atoms in the predictions, and in
some homomers entire chains have been observed to overlap. Clashes
mostly occur for protein-nucleic acid complexes with both greater than 100
nucleotides and greater than 2,000 residues in total.

The model can produce spurious structural order in disordered regions. These
regions are typically marked as very low confidence, but they can lack the
distinctive ribbon-like appearance that AlphaFold 2 produces in disordered
regions. The presence of disordered regions affects nearby pLDDT - removing
disordered tails can give a clearer picture of confidence in ordered regions.

Model outputs do not always have the correct chirality but this will vary across
predictions, making it possible to select predictions with correct chirality in
most cases.


WHAT MOLECULE TYPES ARE NOT SUPPORTED VIA ALPHAFOLD SERVER?

AlphaFold Server does not support ligands, ions and modifications that are not
in the molecule list section above. Additionally, AlphaFold Server is not
capable of predicting water molecules or hydrogen atoms, and is not aware of
membrane planes for membrane proteins.


WHAT TERMS OF USE APPLY TO ALPHAFOLD SERVER PREDICTIONS?

AlphaFold Server predictions are provided for non-commercial use only, under and
subject to AlphaFold Server Output Terms of Use.

 * You cannot use AlphaFold Server outputs in docking or screening tools or to
   train machine learning models or related technology for biomolecular
   structure prediction.
 * You can publish, share and adapt AlphaFold Server output in accordance with
   AlphaFold Server Terms of Service, including the requirement to provide clear
   notice that ongoing use is subject to AlphaFold Server Output Terms of Use
   and of any modifications you make.


HOW SHOULD I CITE ALPHAFOLD SERVER?

Please reference our paper: Abramson, J et al. Accurate structure prediction of
biomolecular interactions with AlphaFold 3. Nature (2024).


WHO SHOULD I CONTACT WITH ENQUIRIES AND FEEDBACK?

Please get in touch with the AlphaFold Server team via the feedback button and
we’ll be happy to assist you with questions. Reporting an issue from the result
page automatically includes the associated job ID.

We're working hard to answer all inquiries but there may be a short delay in our
response due to the high volume we are receiving.


WHAT IS THE DIFFERENCE BETWEEN ALPHAFOLD SERVER AND THE ALPHAFOLD DATABASE?

AlphaFold Database is a large collection of precomputed protein predictions,
generated with the AlphaFold2 model. It covers a significant proportion of the
proteins in UniProt, and entries can be quickly downloaded including in bulk.
However, the predictions are always single chains (even if the protein forms
multimers in nature) and contain only the protein part (no ligands or
co-factors). AlphaFold Database is free for commercial and non-commercial use,
and requires no registration to access the structures.

AlphaFold Server is a web-service that offers customized biomolecular structure
prediction. It makes several newer AlphaFold 3 capabilities available, including
support for a wider range of molecule types (DNA, RNA, ions, ligands, chemical
modifications). The service is free for non-commercial use and requires a simple
sign up involving accepting non-commercial use terms.


HOW CAN I INCREASE THE DIVERSITY OF MY PREDICTIONS?

Run again with different seeds (a seed is chosen automatically if not set).
Users of AlphaFold2 have had success in generating diverse predictions by
customizing MSA and/or template inputs to the model - this is not currently
possible in the server but we hope to provide the ability to do similar
customisations soon.


CAN I MODEL GLYCOSYLATED PROTEINS?

To describe the glycan chains, we are using 3-letter CCD codes (Chemical
Components in the PDB) of the corresponding glycans. Please note that
stereoisomers are described by different CCD codes, e.g. mannose (C6H12O6) could
be described as MAN for alpha-D-mannose and BMA for beta-D-mannose.

 * The Server supports the following glycan residues to be attached to a protein
   residue
   
   * N (Asparagine): BGC, BMA, GLC, MAN, NAG
   * T (Threonine): BGC, BMA, FUC, GLC, MAN, NAG
   * S (Serine): BGC, BMA, FUC, GLC, MAN, NAG

 * Branched glycans can be constructed in the form of a tree with either one or
   two downstream connections per glycan, attached to a protein residue. Up to 8
   glycan residues in total are supported. Here are some examples that
   demonstrate how to input branching glycans:
   
   * NAG: NAG is a single glycan residue.
   * NAG(BMA): NAG has a single child which is BMA.
   * NAG(BMA(BGC)): NAG has 1 child which is BMA; BMA has one child which is
     BGC.
   * NAG(FUC)(NAG): NAG has 2 children which are FUC and NAG.
   * NAG(NAG(MAN(MAN(MAN)))): linear glycan chain.
   * NAG(NAG(MAN(MAN(MAN)(MAN(MAN)(MAN))))): branched glycan chain.
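
The branching notation in the examples above can be read as a tree: a three-letter code followed by zero, one or two parenthesised children. The parser below is a minimal sketch of that reading, with hypothetical function names. It checks only the syntax, the two-child limit and the eight-residue limit; it does not check that the codes are supported or that the glycosidic bonds are chemically valid.

```python
MAX_GLYCAN_RESIDUES = 8
MAX_CHILDREN = 2


def parse_glycan(text):
    """Parse e.g. 'NAG(FUC)(NAG)' into a (code, [children]) tree."""
    pos = 0

    def parse_node():
        nonlocal pos
        code = text[pos:pos + 3]
        if len(code) != 3 or not code.isalnum() or code != code.upper():
            raise ValueError(f"expected a 3-letter code at position {pos}")
        pos += 3
        children = []
        while pos < len(text) and text[pos] == "(":
            pos += 1                       # consume '('
            children.append(parse_node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                       # consume ')'
        if len(children) > MAX_CHILDREN:
            raise ValueError(f"{code} has more than {MAX_CHILDREN} children")
        return (code, children)

    tree = parse_node()
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    if count_residues(tree) > MAX_GLYCAN_RESIDUES:
        raise ValueError(f"more than {MAX_GLYCAN_RESIDUES} glycan residues")
    return tree


def count_residues(tree):
    """Number of glycan residues in a parsed tree."""
    _, children = tree
    return 1 + sum(count_residues(child) for child in children)
```

For instance, `parse_glycan("NAG(BMA)")` returns `("NAG", [("BMA", [])])`, and the last example in the list above parses to a tree with exactly eight residues.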

Glycan - glycan connections should also be chemically valid. For example,
GLC(NAG)(MAN) is not a valid branched glycan because NAG and MAN can’t form
glycosidic bonds to GLC.

The Server assumes that glycosidic bonds are formed between atoms at positions
that have the highest frequency of occurrence in similar bonds from the PDB -
this might lead to different bond positions in the modeled structure than
expected. Specifying exact atoms for the glycosidic bond is not currently
supported.


RELATED POSTS


DOMAIN-SPECIFIC TECHNOLOGY


ALPHAFOLD 3 PREDICTS THE STRUCTURE AND INTERACTIONS OF ALL OF LIFE’S MOLECULES

Introducing AlphaFold 3, an AI model developed by Google DeepMind and Isomorphic
Labs. By accurately predicting the structure of proteins, DNA, RNA, ligands and
more, and how they interact, we expect it to transform our understanding of the
biological world and drug discovery.


TECHNOLOGY


ACCELERATING RESEARCH IN NEARLY EVERY FIELD OF BIOLOGY

By solving a decades-old scientific challenge, our AI system is helping to solve
crucial problems like treatments for disease or breaking down single-use
plastics. One day, it might even help unlock the mysteries of how life itself
works.