

Figure 1. An outline of the ResProx algorithm. ResProx starts by assessing multiple parameters of protein quality using sub-programs such as VADAR (Willard et al. 2003), MolProbity (Chen et al. 2010), RosettaHoles (Sheffler and Baker 2009), and PROSESS (Berjanskii et al. 2010). The resulting quality scores are used to predict equivalent resolution with a support vector regression (SVR) model, which was trained on a set of high-quality X-ray structures. In addition, the mean values and standard deviations of the quality parameters for a database of high-resolution structures are used to generate Z-scores, which are subsequently converted to an equivalent resolution value via a Z-Mean protocol. Finally, a decision-making module selects one of the two equivalent resolution values as the final result, based on the difference between the predicted values and the raw protein quality scores.
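The Z-Mean branch of this outline can be illustrated with a short NumPy sketch. It is not the ResProx code: the reference means and standard deviations, the per-score sign rules (mirroring the "Z-score for Z-Mean" column of Table 3), and the slope and intercept of the linear resolution conversion are all supplied by the caller as placeholders. Skipping a Z-score whose sign does not match its rule is an assumption, and the function name `z_mean_resolution` is hypothetical.

```python
import numpy as np


def z_mean_resolution(scores, ref_mean, ref_sd, sign_rules, slope, intercept):
    """Hypothetical sketch of a Z-Mean conversion from quality scores to resolution.

    scores, ref_mean, ref_sd: one value per quality score (ref_* would come from
        a database of high-resolution X-ray structures).
    sign_rules: per-score "Both", "Positive", "Negative" or "Not used".
    slope, intercept: placeholder coefficients of a linear fit of resolution
        against mean |Z|; the fitted ResProx values are not given here.
    """
    z = (np.asarray(scores, dtype=float) - np.asarray(ref_mean, dtype=float)) / np.asarray(
        ref_sd, dtype=float
    )

    kept = []
    for zi, rule in zip(z, sign_rules):
        if rule == "Both" or (rule == "Positive" and zi > 0) or (rule == "Negative" and zi < 0):
            kept.append(abs(zi))
        # Assumption: "Not used" scores and Z-scores of the wrong sign are skipped.

    mean_abs_z = float(np.mean(kept)) if kept else 0.0
    return intercept + slope * mean_abs_z, mean_abs_z
```

For example, `z_mean_resolution([3.0, 1.0, 5.0], [1.0, 1.0, 1.0], [1.0, 1.0, 2.0], ["Both", "Positive", "Negative"], slope=1.5, intercept=1.0)` yields Z-scores of 2, 0 and 2; only the first passes its sign rule, so the mean |Z| is 2.0 and the returned value is 4.0 with these placeholder coefficients.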


Figure 2. Correlation between ResProx equivalent resolution and X-ray experimental resolution for the ResProx training and testing sets. (A) Final ResProx values for the ResProx training set. (B) Final ResProx values for the ResProx testing set. (C) Z-Mean equivalent resolution for the ResProx training set. (D) Z-Mean equivalent resolution for the ResProx testing set. (E) SVR predictions for the ResProx training set. (F) SVR predictions for the ResProx testing set. R and Err denote the Pearson correlation coefficient and the mean absolute error of resolution prediction, respectively.
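The two statistics reported in each panel can be computed as follows. This is a generic NumPy sketch with a hypothetical function name, not the authors' analysis script.

```python
import numpy as np


def r_and_err(predicted, experimental):
    """Pearson correlation (R) and mean absolute error (Err) between two resolution lists."""
    predicted = np.asarray(predicted, dtype=float)
    experimental = np.asarray(experimental, dtype=float)
    r = float(np.corrcoef(predicted, experimental)[0, 1])
    err = float(np.mean(np.abs(predicted - experimental)))
    return r, err
```

Note that R only measures linear association: predictions that are consistently 2 Å too high can still give R = 1, which is why Err is reported alongside it.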


Figure 3. Correlation between equivalent resolution and X-ray experimental resolution as calculated by Procheck-NMR, MolProbity, and RosettaHoles2. (A) Procheck-NMR equivalent resolution for the ResProx training set. (B) Procheck-NMR equivalent resolution for the ResProx testing set. (C) RosettaHoles2 SRESL equivalent resolution for the ResProx training set. (D) RosettaHoles2 SRESL equivalent resolution for the ResProx testing set. (E) MolProbity score for the ResProx training set. (F) MolProbity score for the ResProx testing set. R and Err denote the Pearson correlation coefficient and the mean absolute error of resolution prediction, respectively.


Figure 4. Correlation between the completeness of experimental information (distance restraints) and the equivalent resolution of ubiquitin. (A) ResProx score. (B) Procheck-NMR equivalent resolution. (C) RosettaHoles2 SRESL. (D) MolProbity score. Different levels of distance-restraint completeness were obtained by randomly removing 5 distance restraints from the total restraint set. The distance restraints consisted of the NOE-based and hydrogen-bond distance restraints of the ubiquitin NMR ensemble 1D3Z.
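A minimal sketch of one way to generate restraint sets of decreasing completeness. It assumes, as one reading of the caption, that 5 restraints are removed at random at each step and that each smaller set is drawn from the previous one. Restraints are treated as opaque items, and the function name `thin_restraints` is hypothetical.

```python
import random


def thin_restraints(restraints, step=5, seed=0):
    """Yield progressively smaller restraint sets (hypothetical helper).

    Assumption: each set is obtained from the previous one by removing
    `step` randomly chosen restraints; the full set is yielded first.
    """
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    current = list(restraints)
    yield list(current)
    while len(current) >= step:
        for _ in range(step):
            current.pop(rng.randrange(len(current)))
        yield list(current)
```

The completeness of each set can then be expressed as `len(subset) / len(full_set)` before the set is used for structure calculation.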


Figure 5. Correlation between equivalent resolution and the ensemble precision of ubiquitin. (A) ResProx score. (B) Procheck-NMR equivalent resolution. (C) RosettaHoles2 SRESL. (D) MolProbity score. Ensemble precision was assessed by calculating the backbone RMSD of the ubiquitin NMR ensembles with MolMol (Koradi et al. 1996). The Spearman rank-order correlation coefficients are 0.95, 0.69, 0.84, and 0.90 for ResProx, Procheck-NMR, MolProbity, and RosettaHoles2, respectively.
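Spearman's coefficient is the Pearson correlation of the ranks of the two variables, with tied values sharing their mean rank. A NumPy-only sketch with hypothetical function names (`scipy.stats.spearmanr` computes the same statistic):

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])
```

Because only ranks are used, a coefficient of 0.95 indicates a nearly monotonic relationship even if the relationship between equivalent resolution and RMSD is not linear.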


Figure 6. Correlation of equivalent resolution with backbone proton chemical shifts. (A) ResProx score. (B) Procheck-NMR equivalent resolution. (C) RosettaHoles2 SRESL. (D) MolProbity score. The agreement between ubiquitin models and backbone proton chemical shifts was assessed by predicting chemical shifts from the different NMR models with ShiftX2 (Han et al. 2011) and calculating the mean absolute difference between predicted and experimentally measured chemical shifts. The Spearman rank-order correlation coefficients are 0.95, 0.73, 0.85, and 0.95 for ResProx, Procheck-NMR, MolProbity, and RosettaHoles2, respectively.


Figure 7. Correlation between the equivalent resolution of ubiquitin and the number of distance violations. (A) ResProx score. (B) Procheck-NMR equivalent resolution. (C) RosettaHoles2 SRESL. (D) MolProbity score.


Figure 8. Correlation between the equivalent resolution of ubiquitin and model accuracy. (A) ResProx resolution. (B) Procheck-NMR equivalent resolution. (C) RosettaHoles2 SRESL. (D) MolProbity score. Model accuracy was measured by calculating the backbone RMSD of the ubiquitin models with respect to the ubiquitin X-ray structure 1UBQ. NMR models of ubiquitin with different levels of distance restraint violations were analyzed (see text for details).
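Backbone RMSD against a reference structure such as 1UBQ is normally computed after optimal rigid-body superposition. The sketch below uses the Kabsch (SVD) algorithm. It assumes that the matching backbone atoms (for example N, Cα and C) have already been extracted from both structures in the same order, and it is a generic illustration rather than the MolMol procedure used for the figures; the function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both sets on their centroids
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Correct for a possible reflection so that R is a proper rotation
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and average the squared per-atom deviations
    diff = P @ R.T - Q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```

The reflection check matters: without it, the SVD can return an improper rotation (a mirror image), which would give an artificially low RMSD.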


Table 1. Equivalent resolution of obsolete and current PDB entries of NMR structures as calculated by ResProx, Procheck-NMR, MolProbity, and RosettaHoles2.

 

| Protein | Version | PDB | ResProx (Å) | Procheck-NMR (Å) | MolProbity (Å) | RosettaHoles2 (Å) |
|---|---|---|---|---|---|---|
| AbrB N-terminal domain | Obsolete | 1EKT | 5.14 | 3.20 | 4.73 | 3.58 |
| | Current | 1Z0R | 2.68 | 1.95 | 3.76 | 2.62 |
| Ets-1 | Obsolete | 1ETC | 6.29 | 3.00 | 5.03 | 3.63 |
| | Current | 1R36 | 2.77 | 1.78 | 3.53 | 2.74 |
| CcmE | Obsolete | 1LIZ | 4.91 | 2.42 | 3.24 | 2.20 |
| | Current | 1SR3 | 2.22 | 2.40 | 2.94 | 2.14 |
| Domain IV from the YbbR | Obsolete | 2KPS | 3.22 | 2.05 | 2.90 | 2.66 |
| | Current | 2L3U | 2.86 | 1.75 | 2.78 | 2.62 |
| SH3 of phospholipase C-gamma | Obsolete | 1HSP | 5.80 | 2.90 | 4.40 | 2.94 |
| | Current | 2HSP | 4.78 | 3.13 | 4.16 | 2.85 |
| MRF-2 DNA-Binding Domain | Obsolete | 1BMY | 5.05 | 3.13 | 4.62 | 3.47 |
| | Current | 1IG6 | 1.80 | 1.60 | 1.88 | 2.34 |
| E. coli thioredoxin | Obsolete | 1TRX | 2.03 | 1.50 | 2.07 | 2.25 |
| | Current | 1XOB | 1.41 | 1.35 | 1.27 | 1.8 |



Table 2. Improvement in the quality of water-refined models: comparison between ResProx values and DRESS Z-scores.

 

 

| Protein | PDB | Refined | DRESS Z-score | ResProx (Å) |
|---|---|---|---|---|
| Intestinal fatty acid-binding protein | 1A57 | - | -4.46 | 5.50 |
| | | + | -2.72 | 2.56 |
| Designed protein G core variant | 1FD6 | - | -1.4 | 2.49 |
| | | + | 0.33 | 1.42 |
| Rho GDP-dissociation inhibitor | 1AJW | - | -2.79 | 2.90 |
| | | + | -1.24 | 2.03 |
| Nudix enzyme hydrolase | 1F3Y | - | -2.27 | 3.19 |
| | | + | -1.31 | 2.29 |
| MTH1175 | 1EO1 | - | -3.31 | 3.69 |
| | | + | -1.63 | 2.42 |



Table 3. Structure quality parameters used in the calculation of the ResProx equivalent resolution.

 

 

| Score name | Correlation coefficient¹ | Logarithm form² | Lower bound³ | Upper bound⁴ | Z-score for Z-Mean⁵ | Source | Description⁶ |
|---|---|---|---|---|---|---|---|
| Standard deviation of χ1 pooled | 0.78 | Yes | 0 | 25 | Both | VADAR | Standard deviation of the χ1 angles pooled over all three (gauche-, gauche+, and trans) conformations. |
| Clash score | 0.77 | Yes | 0 | 250 | Positive | MolProbity | Number of non-hydrogen-bonded atomic overlaps > 0.4 Å per thousand atoms. |
| Percentage of < 1% side-chain rotamer outliers | 0.77 | Yes | 0 | 1200 | Not used | MolProbity | Percentage of residues with side-chain rotamers that lie outside 99% of the side-chain rotamer distribution in the Richardson penultimate rotamer library (Lovell et al. 2000). |
| Ramachandran outside most favored | 0.77 | Yes* | 0.5 | None | Positive | GeNMR | Percentage of residues outside the most favored regions of the Ramachandran plot. |
| Ramachandran outliers | 0.75 | Yes | 0 | 500 | Positive | MolProbity | Fraction of residues in the Ramachandran plot that are stereochemically disallowed or not observed in high-quality structures. |
| RosettaHoles score | 0.71 | No | None | None | Negative | Rosetta | A measure of underpacking in the protein core. |
| Mean trans χ1 angle | 0.68 | Yes | 145 | 180 | Not used | VADAR | Average of the χ1 angles in the trans conformation. |
| Deviation of Θ angles | 0.68 | Yes | 10 | 35 | Positive | PROSESS | Standard deviation of the angle between the C-O bond vector of the H-bond acceptor and the O-H(N) bond vector. |
| Rama score | 0.65 | Yes | 0 | 10 | Not used | GeNMR | Fraction of residues in the most favored regions of the Ramachandran plot, multiplied by a weighting coefficient. |
| Radius of gyration score | 0.53 | Yes | 0 | 900 | Positive | GeNMR | Scaled difference between the expected and the observed radius of gyration. The expected radius of gyration is calculated as Rg = 0.395·N^0.6 + 7.257. |
| χ1 score | 0.43 | No | -1.3 | -0.6 | Positive | PROSESS | Scaled difference between the standard deviation of the observed χ1 angles and the expected value obtained from high-quality protein structures. |
| Percentage of 95% buried residues | 0.42 | No | 0 | 2 | Both | VADAR | Percentage of residues with fractional accessible areas < 0.05, divided by the expected value. This score reports the extent of residue burial; most globular proteins must have a fraction > 0.05 to be stable. |
| Bump score | 0.35 | Yes | 0 | 1 | Positive | GeNMR | Total number of non-bonded atom contacts below 1.3 Å, divided by the total number of non-bonded contacts in the protein. |
| Mean gau- χ1 angle | 0.34 | Yes | 40 | 90 | Not used | VADAR | Average χ1 angle for residues (excluding proline) whose χ1 angles are closest to -60° (the gauche- conformation). Higher-quality structures have χ1 angles very close to the canonical -60°, +60°, and 180° values. |
| Mean H-bond energy | 0.34 | No | -2.5 | -0.5 | Both | VADAR | Average hydrogen bond energy, calculated with the H-bond energy function used in the DSSP program. |
| Mean κ (kappa) angle | 0.33 | Yes | 15 | 50 | Not used | PROSESS | Angle between the plane of the C=O peptide bond of the H-bond acceptor and the vector formed by the H-O bond of the H-bond donor. The closer the kappa angle is to 25°, the better. |
| Percentage of packing defects | 0.33 | Yes | 0 | 800 | Positive | VADAR | Percentage of residues with fractional residue volumes greater than 1.20 or less than 0.80. Packing defects indicate unnatural cavities or compressions. |
| Percentage of bad bond angles | 0.29 | Yes | 0 | 45 | Positive⁷ | MolProbity | Number of bond angles that deviate by more than 5 standard deviations from the typical bond angles seen in high-resolution, high-quality structures, divided by the total number of bond angles in the polypeptide. |
| Mean gau+ χ1 angle | 0.28 | No | -80 | -50 | Not used | VADAR | Average χ1 angle for residues (excluding proline) whose χ1 angles are closest to 60° (the gauche+ conformation). Higher-quality structures have χ1 angles very close to the canonical -60°, +60°, and 180° values. |
| Percentage of generously allowed Ω angles | 0.25 | Yes | 0 | 30 | Positive | VADAR | Percentage of residues with Ω (omega) angles within 15° to 20° of the ideal trans (180°) or cis (0°) values. |
| Percentage of buried charges | 0.15 | Yes | 0 | 45 | Not used | VADAR | Percentage of charged residues with fractional accessible areas below 0.05. |
| Deviation of κ (kappa) angles | 0.08 | Yes | 0 | 14 | Not used | PROSESS | Standard deviation of the angle between the plane of the C=O peptide bond of the H-bond acceptor and the vector formed by the H-O bond of the H-bond donor. |
| Percentage of disallowed Ω angles | 0.08 | Yes | 0 | 20 | Not used | VADAR | Percentage of residues with Ω (omega) angles more than 20° from the ideal trans (180°) and cis (0°) values. Structures with a high proportion of disallowed omega angles have poor geometry and stereochemistry. |
| Percentage of bad bond lengths | 0.01 | Yes | 0 | 30 | Positive⁷ | MolProbity | Number of backbone bond lengths that deviate by more than 5 standard deviations from the typical bond lengths seen in high-resolution, high-quality structures, divided by the total number of backbone bond lengths in the polypeptide. |
| Percentage of Ω angles < 90° | 0.01 | Yes | 0 | 12 | Not used | VADAR | Percentage of Ω (omega) angles below 90°, which identifies the fraction of residues that have a cis-peptide bond. |

1 - Correlation coefficient between the score and X-ray resolution for the ResProx training set.

2 - This column specifies whether a score was used in its logarithmic form ("Yes") or not ("No"). An asterisk (*) indicates scores whose logarithm was taken 16 times.

3,4 - Lower and upper bounds indicate the minimal and maximal values, respectively, that scores were allowed to have in ResProx calculations.

5 - This column specifies whether a score's Z-value was used in Z-Mean calculations and, if so, which Z-values were considered: only positive, only negative, or both positive and negative (see text for more details).

6 - More information about the scores can be found in the corresponding publications and/or on the websites of RosettaHoles (Sheffler and Baker 2009), PROSESS (Berjanskii et al. 2010), GeNMR (Berjanskii et al. 2009), and MolProbity (Chen et al. 2010; Davis et al. 2007).

7 - The percentages of bad bond lengths and bad bond angles are used only when their values exceed 4 standard deviations.
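Footnotes 2 to 4 imply a simple preprocessing step for each raw score: clamp it to its lower and upper bounds, then take the logarithm when the table says "Yes". The sketch below assumes natural logarithms and adds a hypothetical offset of 1 so that scores clamped to a lower bound of 0 stay finite; the base and offset actually used by ResProx are not stated, and the repeated logarithm marked with an asterisk is not modelled. The expected radius of gyration formula from the table is included as a second helper, assuming N is the number of residues. Both function names are hypothetical.

```python
import math


def prepare_score(value, lower=None, upper=None, use_log=False, log_offset=1.0):
    """Clamp a raw quality score to its Table 3 bounds, then optionally log it.

    A bound of None means "no bound". `log_offset` is a hypothetical constant
    that keeps the logarithm finite when a score is clamped to 0.
    """
    if lower is not None:
        value = max(value, lower)
    if upper is not None:
        value = min(value, upper)
    if use_log:
        value = math.log(value + log_offset)
    return value


def expected_radius_of_gyration(n_residues):
    """Expected Rg from the Table 3 formula Rg = 0.395 * N**0.6 + 7.257.

    Assumption: N is the number of residues in the protein.
    """
    return 0.395 * n_residues ** 0.6 + 7.257
```

For example, a clash score of 300 is clamped to its upper bound: `prepare_score(300, lower=0, upper=250)` returns 250, and `prepare_score(0, lower=0, upper=250, use_log=True)` returns 0.0 because log(0 + 1) = 0.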



Figure 9. Resolution histogram of the ResProx training/testing set. Proteins were grouped into 0.25 Å bins spanning the range from 1.0 Å to 3.75 Å, with at least 100 structures in each bin.


Figure 10. Relationship between X-ray resolution and several ResProx protein quality scores for the ResProx training set. (A) Standard deviation of χ1 pooled from VADAR. (B) Clash score from MolProbity. (C) Percentage of < 1% side-chain rotamer outliers from MolProbity. (D) Rama score from GeNMR. (E) Ramachandran outliers from MolProbity. (F) RosettaHoles score. (G) Deviation of kappa angles from PROSESS. (H) Percentage of disallowed Ω angles from VADAR.


Figure 11. Curve fitting of X-ray resolution versus mean absolute Z-score. Only the linear part of the plot, spanning mean absolute Z-scores from 0 to 1.2, was used for curve fitting. The curve fitting was done with QtiPlot (Vasilief 2011).
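Restricting the fit to the linear region can be sketched as a least-squares straight line in NumPy, used here in place of QtiPlot. The straight-line model is an assumption based on the phrase "linear part of the plot"; the function name is hypothetical, and the `z_max` default mirrors the 1.2 cut-off in the caption.

```python
import numpy as np


def fit_linear_region(mean_abs_z, resolution, z_max=1.2):
    """Fit resolution = slope * mean|Z| + intercept using only 0 <= mean|Z| <= z_max."""
    x = np.asarray(mean_abs_z, dtype=float)
    y = np.asarray(resolution, dtype=float)
    mask = (x >= 0) & (x <= z_max)
    # np.polyfit returns coefficients from the highest degree down: [slope, intercept]
    slope, intercept = np.polyfit(x[mask], y[mask], 1)
    return float(slope), float(intercept)
```

Points beyond the cut-off are ignored entirely, so a non-linear tail at large mean |Z| cannot distort the fitted slope.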


Figure 12. GeNMR-based threshold for detecting poor-quality protein structures. The total GeNMR knowledge-based score, excluding the radius of gyration score, is shown with blue diamonds for 50,000 protein structures from the PDB. The solid line indicates the selected threshold, which separates 99.9% of the structures from a few poor-quality outliers.
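A percentile cut-off of this kind can be sketched as follows. It assumes that higher total GeNMR scores indicate worse structures; if the score is oriented the other way, the quantile and the comparison would need to be flipped. The function name is hypothetical, and this is not necessarily how the threshold in the figure was chosen.

```python
import numpy as np


def outlier_threshold(scores, keep_fraction=0.999):
    """Cut-off below which `keep_fraction` of the scores fall.

    Returns the threshold and a boolean mask of structures above it.
    Assumption: larger scores indicate poorer structures.
    """
    scores = np.asarray(scores, dtype=float)
    threshold = float(np.quantile(scores, keep_fraction))
    return threshold, scores > threshold
```

With `keep_fraction=0.999`, about 0.1% of the structures fall above the cut-off and would be flagged as poor-quality outliers.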



Figure 13. Equivalent resolution of "intact" and "broken" models of the obsolete NMR ensemble of the E. coli heme chaperone CcmE (1LIZ). (A) "Intact" model 1 of 1LIZ. (B) "Broken" model 3 of 1LIZ. The misplaced Glu105 residue is colored green. Vectors of broken bonds between Glu105 and adjacent residues are shown as red lines. The figure was generated using MolMol (Koradi et al. 1996).



Figure 14. Histogram of ResProx equivalent resolution for NMR models and experimental resolution for X-ray structures. 500 NMR ensembles and 500 X-ray structures were randomly selected from the PDB.




References:

 

Berjanskii M, Liang Y, Zhou J, Tang P, Stothard P, Zhou Y, Cruz J, MacDonell C, Lin G, Lu P, Wishart DS (2010) PROSESS: a protein structure evaluation suite and server. Nucleic Acids Res 38 (Web Server issue):W633-640

Berjanskii M, Tang P, Liang J, Cruz JA, Zhou J, Zhou Y, Bassett E, MacDonell C, Lu P, Lin G, Wishart DS (2009) GeNMR: a web server for rapid NMR-based protein structure determination. Nucleic Acids Res 37 (Web Server issue):W670-677

Chen VB, Arendall WB, 3rd, Headd JJ, Keedy DA, Immormino RM, Kapral GJ, Murray LW, Richardson JS, Richardson DC (2010) MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr D Biol Crystallogr 66 (Pt 1):12-21

Davis IW, Leaver-Fay A, Chen VB, Block JN, Kapral GJ, Wang X, Murray LW, Arendall WB, 3rd, Snoeyink J, Richardson JS, Richardson DC (2007) MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res 35 (Web Server issue):W375-383

Koradi R, Billeter M, Wuthrich K (1996) MOLMOL: a program for display and analysis of macromolecular structures. J Mol Graph 14 (1):51-55, 29-32

Lovell SC, Word JM, Richardson JS, Richardson DC (2000) The penultimate rotamer library. Proteins 40 (3):389-408

Sheffler W, Baker D (2009) RosettaHoles: rapid assessment of protein core packing for structure prediction, refinement, design, and validation. Protein Sci 18 (1):229-239

Vasilief I (2011) QtiPlot: Data Analysis and Scientific Visualisation, version 0.9.8.4. http://soft.proindependent.com/qtiplot.html

Willard L, Ranjan A, Zhang H, Monzavi H, Boyko RF, Sykes BD, Wishart DS (2003) VADAR: a web server for quantitative evaluation of protein structure quality. Nucleic Acids Res 31 (13):3316-3319