Journal of Molecular Biology
A Comparative Study of the Relationship Between Protein Structure and β-Aggregation in Globular and Intrinsically Disordered Proteins
Introduction
Protein aggregation has long been thought of as an unspecific process caused by the formation of non-native contacts between protein folding intermediates. Recent work, however, shows that aggregation is often a much more specific process than previously expected and that, accordingly, it can be reliably correlated with a combination of simple physico-chemical parameters.1, 2, 3 In particular, several models for aggregation have been postulated that all involve the formation of an intermolecular β-sheet initiated by amino acid sequences that act as nuclei for β-aggregation.4, 5, 6, 7, 8 According to these models, aggregation is initiated when amino acid segments having a high hydrophobicity, a good β-sheet propensity and a low net charge are solvent-exposed so that they can associate. One would therefore expect aggregating protein segments to be buried in the folded state and not exposed to the solvent. This is confirmed by the experimental finding that in many globular proteins, aggregation occurs during refolding or under conditions in which denatured or partially folded states are significantly populated, i.e. at high concentration or as a result of destabilizing conditions or mutations.9 Building on these findings, we recently developed the computer programme TANGO10 to predict β-aggregating stretches in proteins, using a statistical mechanics algorithm that considers the physico-chemical parameters described above as well as competition between different structural conformations: β-turn, α-helix, β-sheet aggregates and the folded state. The algorithm assumes that in the ordered β-aggregates the nucleating regions end up fully buried, paying the maximal desolvation energy and entropy costs while satisfying their H-bonding potential. The energy contributions are derived from the FOLD-X force field.11 In a blind test involving 174 peptides from over 20 proteins, TANGO achieved an accuracy of 95% in predicting aggregating sequences, as well as the effect of point mutations on the aggregation tendency of proteins.10
Many intrinsically disordered proteins (IDPs) have been discovered in all kingdoms of life, but especially in higher eukaryotes.12, 13, 14 These are proteins or domains that, in their native state, are either completely disordered or contain large disordered regions.15, 16 More than 180 such proteins are known to date, including prions, CREB, Tau, MAPs and p53.16 These polypeptides perform important regulatory functions and are widespread in eukaryotic cells and tissues. Some acquire structure upon binding to another protein or DNA; others act as structural anchors in large protein–protein and protein–RNA complexes, making use of extended interaction surfaces that are simply not available in more compact conformations.12 Furthermore, many globular proteins contain disordered segments acting as functional modules, e.g. post-translational modification sites and domain ligands. Importantly, many IDPs are involved in key cellular processes and some of them are related to major protein conformational diseases, e.g. prions (BSE), Tau (Alzheimer's disease), and synuclein (Parkinson's disease). The factor linking these proteins to their disease states is a high degree of aggregation or amyloidogenicity. Amyloidogenicity is not itself a direct result of β-aggregation, but it is often found in association with β-aggregation and can be strongly promoted by it.17 On the other hand, it is often found experimentally that unstructured proteins are resistant to aggregation, even under harsh treatments such as incubation at high temperature.16 In fact, heat-exposure of cell-extracts is an effective protocol for purification of several recombinantly expressed unstructured proteins.16 It is therefore important to investigate the relationship between intrinsic disorder and aggregation to gain further insight into the potential of IDPs to be implicated in protein conformational diseases. The TANGO algorithm offers the opportunity to compare the aggregation propensities of IDPs and globular proteins, not only by considering average aggregation-related physico-chemical properties, but also by directly comparing the nature and frequency of aggregation-promoting nucleation stretches. This analysis should therefore allow us to test whether disorder correlates with aggregation, as some cases of disease association suggest, or whether it anti-correlates with aggregation, as the residue compositional biases of IDPs suggest.
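As a rough illustration of what comparing segment-level physico-chemical properties can look like, the following Python sketch slides a window along a sequence and reports its mean hydropathy and net charge. It is not the TANGO algorithm: the Kyte-Doolittle scale, the window width, the thresholds and the function names are all assumptions chosen for the example, and β-sheet propensity is left out entirely.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale, not TANGO's).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charges: His treated as neutral, termini ignored.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def window_properties(seq, width=7):
    """Return (start, segment, mean hydropathy, net charge) for every window."""
    results = []
    for start in range(len(seq) - width + 1):
        segment = seq[start:start + width]
        mean_hydropathy = sum(KD[aa] for aa in segment) / width
        net_charge = sum(CHARGE.get(aa, 0) for aa in segment)
        results.append((start, segment, mean_hydropathy, net_charge))
    return results


def candidate_segments(seq, width=7, min_hydropathy=1.5, max_abs_charge=1):
    """Keep windows that are hydrophobic and carry little net charge.

    Both thresholds are hypothetical values picked for illustration only.
    """
    return [
        w for w in window_properties(seq, width)
        if w[2] >= min_hydropathy and abs(w[3]) <= max_abs_charge
    ]


if __name__ == "__main__":
    for start, segment, hyd, charge in candidate_segments("KKKKVVIVIVVKKKK"):
        print(start, segment, round(hyd, 2), charge)
```

A scan of this kind only flags hydrophobic, weakly charged stretches; it says nothing about the competition with other conformations that the text describes.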
To address this question, we used TANGO to compare the aggregation tendency of a non-redundant set of globular proteins derived from the SCOP database (the ASTRAL40 set, see Materials and Methods),18 a set of proteins that were experimentally shown to be unstructured,16, 19 and a set of predicted disordered protein sequences. Data sets of experimentally verified disordered proteins are scarce and rather error-prone; hence we have collected and curated a set of 296 experimentally verified, published IDP sequences. This is, to our knowledge, the largest dataset available to the community. The datasets of predicted disordered segments or proteins were generated with the DisEMBL20 and GlobPlot21 algorithms and divided into sequences of low (∼50%) and average sequence complexity.
Our analysis clearly shows that aggregation-prone segments are much less frequent in IDPs than in globular proteins, thus accounting for their good solubility. Although more frequent in globular proteins, β-aggregating segments are generally part of the hydrophobic core. These observations show that the compositional bias observed in IDPs reduces secondary and tertiary structure as well as aggregation because both structure and aggregation rely on similar physico-chemical properties. As previously observed,12, 16 IDPs are not completely devoid of structure, as should be expected if some degree of functional specificity has to be obtained, but they perform their particular cellular functions by achieving a low degree of order, retaining only structural propensities that are devoid of aggregation-promoting features.
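The Introduction describes TANGO as a statistical mechanics algorithm in which a segment partitions between competing conformational states. The general idea can be sketched with a Boltzmann-weighted partition function, as below. This is a generic textbook construction, not the published TANGO implementation: the free-energy values are hypothetical placeholders rather than FOLD-X terms, and the function and state names are invented for the example.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def state_populations(free_energies, temperature=298.0):
    """Boltzmann populations for competing states.

    free_energies maps a state name to its free energy in kcal/mol.
    Energies are shifted by their minimum before exponentiation for
    numerical stability; this does not change the resulting populations.
    """
    rt = R_KCAL * temperature
    g_min = min(free_energies.values())
    weights = {
        state: math.exp(-(g - g_min) / rt)
        for state, g in free_energies.items()
    }
    partition = sum(weights.values())
    return {state: w / partition for state, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical free energies for one segment (illustrative numbers only).
    segment_energies = {
        "beta_turn": 1.5,
        "alpha_helix": 1.0,
        "beta_aggregate": -0.5,
        "folded": 0.0,
    }
    for state, pop in state_populations(segment_energies).items():
        print(f"{state}: {pop:.3f}")
```

In such a scheme, a segment is aggregation-prone only when the aggregated state is populated in preference to the alternatives, which is why a stable folded state or a strong helical propensity can mask an otherwise aggregation-prone stretch.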
TANGO score for aggregation and accuracy of the TANGO algorithm
The TANGO algorithm was calibrated using data found in the scientific literature on the aggregation of 174 peptides corresponding to sequence fragments of 21 different proteins, studied by various research groups using circular dichroism (CD) or nuclear magnetic resonance (NMR). Of the peptides in our set, 70 were experimentally observed to aggregate in the concentration range between 100 μM and 1 mM, while the others remained soluble in this concentration range. A detailed description of our
Conclusions
TANGO is an algorithm to predict β-aggregation nucleating regions in proteins. Here, we used TANGO to compare the β-aggregation propensities of globular proteins and intrinsically unstructured proteins. In globular proteins we found similar amounts of β-aggregating nucleation regions in all-α, all-β and mixed α/β proteins. This demonstrates that globular proteins do display a certain degree of structural frustration and can at the same time display propensities for both α and β conformations
Datasets
Here we have used datasets that cover both globular proteins and IDPs. Both predicted and experimentally verified datasets are described, and each dataset is split into a low-complexity and a normal-complexity set.
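The preview does not state how sequence complexity was measured. As one possible reading, the sketch below splits a set of sequences by the Shannon entropy of their amino acid composition. The entropy measure, the 3.0-bit threshold and the function names are assumptions made for illustration, not the procedure used to build the datasets.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits per residue) of the residue composition."""
    counts = Counter(seq)
    total = len(seq)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def split_by_complexity(sequences, threshold_bits=3.0):
    """Split sequences into (low_complexity, normal_complexity) lists.

    threshold_bits is a hypothetical cut-off chosen for this example.
    """
    low, normal = [], []
    for seq in sequences:
        if shannon_entropy(seq) < threshold_bits:
            low.append(seq)
        else:
            normal.append(seq)
    return low, normal


if __name__ == "__main__":
    low, normal = split_by_complexity(["QQQQQQQQ", "ACDEFGHIKLMNPQRSTVWY"])
    print("low:", low)
    print("normal:", normal)
```

Composition entropy ignores residue order, so windowed or repeat-aware measures may be more appropriate for real datasets.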
Acknowledgements
This work was supported, in part, by EU grant QLRI-CT2-2002-00241. J.W.H.S. and F.R. were supported by International Prize Traveling Fellowships from the Wellcome Trust. Thanks to Sara Quirk for reading this manuscript. We are grateful to Lars Juhl Jensen for the human proteome set.
References (39)
- et al. Synchrotron X-ray studies suggest that the core of the transthyretin amyloid fibril is a continuous beta-sheet helix. Structure (1996)
- et al. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J. Mol. Biol. (2002)
- et al. Intrinsically disordered protein. J. Mol. Graph. Model. (2001)
- et al. Coupling of folding and binding for unstructured proteins. Curr. Opin. Struct. Biol. (2002)
- Intrinsically unstructured proteins. Trends Biochem. Sci. (2002)
- et al. Protein disorder prediction: implications for structural proteomics. Structure (Cambridge) (2003)
- et al. Protein folding funnels: the nature of the transition state ensemble. Fold. Des. (1996)
- Protein misfolding, evolution and disease. Trends Biochem. Sci. (1999)
- et al. A neuronal isoform of the aplysia CPEB has prion-like properties. Cell (2003)
- et al. Insights into the origin of the tendency of the PI3-SH3 domain to form amyloid fibrils. J. Mol. Biol. (2002)
- Conformational analysis of peptides corresponding to beta-hairpins and a beta-sheet that represent the entire sequence of the alpha-spectrin SH3 domain. J. Mol. Biol.
- Elucidating the folding problem of helical peptides using empirical parameters. II. Helix macrodipole effects and rational modification of the helical content of natural peptides. J. Mol. Biol.
- Protein-misfolding diseases: getting out of shape. Nature
- Kinetic partitioning of protein folding and aggregation. Nature Struct. Biol.
- Rationalization of the effects of mutations on peptide and protein aggregation rates. Nature
- A molecular model of the amyloid fibril. Ciba Found. Symp.
- The molecular basis of amyloidosis. Cell Mol. Life Sci.
- Molecular structure of a fibrillar Alzheimer's A beta fragment. Biochemistry
- De novo designed peptide-based amyloid fibrils. Proc. Natl Acad. Sci. USA
† R.L., J.S. and F.R. contributed equally to this work.