From the Bax Group at the National Institutes of Health ...
TALOS+: Prediction of Protein Backbone Torsion Angles from NMR Chemical Shifts
As described in the paper:
TALOS+: A hybrid method for predicting protein backbone torsion angles from NMR chemical shifts. Yang Shen, Frank Delaglio, Gabriel Cornilescu, and Ad Bax, J. Biomol. NMR, 44, 213-223 (2009)
TALOS+ was developed by Dr. Yang Shen in the Ad Bax group, and is now installed as part of the NMRPipe System. Detailed download and installation instructions can be found at the NMRPipe download page. There is now a web-based version of TALOS+, which can be used directly without installing NMRPipe. A Java viewer (JRAMA+) is also available to display TALOS+ results without installing NMRPipe. You can access this web-based system, along with other facilities for manipulating chemical shifts, dipolar couplings, and molecular structures, at the Bax Group NMR Server site:
TALOS+ replaces all earlier versions of TALOS, but is used in much the same way. The original TALOS web page can be found here.
What is TALOS+?
TALOS+ is a hybrid system for empirical prediction of protein phi and psi backbone torsion angles using a combination of six kinds (HN, HA, CA, CB, CO, N) of chemical shift assignments for a given residue sequence. TALOS+ is an enhanced version of the earlier TALOS system which improves upon the original TALOS database mining approach by including a neural network classification scheme, as well as a larger database of 200 proteins. This improved approach allows TALOS+ to make a larger number of useful backbone angle predictions, for 88% of the residues in a given protein on average.
The original TALOS approach is an extension of the well-known observation that many kinds of secondary chemical shifts (i.e. differences between chemical shifts and their corresponding random coil values) are highly correlated with aspects of protein secondary structure. The goal of TALOS+ is to use secondary shift and sequence information in order to make quantitative predictions for the protein backbone angles phi and psi, and to provide a measure of the uncertainties in these predictions.
In the original TALOS approach, we search a high-resolution structural database for the 10 best matches to the secondary chemical shifts of a given residue in a target protein along with its two flanking neighbors (a residue triplet). If there is a consensus of phi and psi angles among the 10 best database matches, then we use these database triplet structures to form a prediction for the backbone angles of the target residue.
The TALOS+ approach adds an artificial neural network (ANN) classification scheme to this database mining approach. The neural network analyzes the chemical shifts and sequence to estimate the likelihood of a given residue being in a sheet, helix, or loop conformation. This ANN classification information is combined with the database mining results to increase the number of residues where useful backbone angle predictions can be made.
In addition, TALOS+ offers several new features compared to the original TALOS program:
A flowchart for the TALOS+ database search procedure is shown below:
As with TALOS, the reliability of the TALOS+ approach was tested by a cross-validation "leave-one-out" procedure where each protein was removed from the database, and its phi and psi angles were predicted using the remaining protein data. For the purposes of testing, a prediction was considered "Good" if it fell in the same well-populated region of the Ramachandran map as the phi and psi values from the crystal structure. Conversely, a prediction was considered "Bad" or incorrect if it greatly deviated from the observed phi or psi angles from the crystal structure (see definition here). According to the tests:
As noted in (2) above, it must be remembered that TALOS+ will produce a small number of predictions which seem to be valid (because the best matches from the database are consistent) but which are nevertheless in error. It should also be noted that the tests above included only the most well-defined parts of each protein; roughly 6% of the residues had first been removed because they had high B-factors (exceeding 1.5 times the average B-factor for that protein) in the crystal structure or (for the original 78 TALOS proteins) because they were known to be highly mobile in solution. Evaluation of the results indicates that many of the "erroneous" predictions occur outside of regions of secondary structure, where the X-ray and solution structures may actually differ from one another, as evidenced by large differences between X-ray structures when multiple such structures are available for the same protein. Therefore, the accuracy of TALOS+ will vary from protein to protein, and tends to be lower for proteins with large flexible regions. A partial remedy is to increase the S2 threshold for "dynamic" residues to 0.65, but this will decrease the number of consensus predictions made.
Components of the TALOS+ System
The TALOS+ core database search system is implemented in the C++ language, and includes a graphical interface to inspect the prediction results. The graphical interface, called RAMA+, is implemented in the TCL/TK scripting language via the NMRPipe TCL interpreter called nmrWish.
The TALOS+ files are installed into a
There are two major scripts comprising the TALOS+ system:
-help command-line argument to generate a complete list of options. For backward compatibility, the script names talos+, talos+.tcl, and talos.tcl can all be used to run TALOS+, and the script names rama+, rama+.tcl, and rama.tcl can all be used to run RAMA+.
Other files of the TALOS+ system include:
Use of TALOS+ is much the same as for earlier versions of TALOS:
Chemical shift data pre-check
TALOS+ includes a feature that pre-checks chemical shift referencing and flags possible chemical shift errors:
    talos+ -in myshifts.tab -check
It checks the referencing of the 13CA, 13CB, 1HA and 13C' chemical shifts, using the empirical correlation between certain sets of chemical shift data (Wang et al., 2005, J. Biomol. NMR, 32:13-22). The estimated chemical shift referencing offsets, as well as the chemical shifts that deviate strongly from their expected ranges, are printed in the following format:
    Chemical shift outlier checking...
    ...
    64 E CB Secondary Shift: -3.800 Limit: -3.765
    76 G C Secondary Shift: 4.250 Limit: 1.925 !
    Chemical shift referencing checking...
    Estimated Referencing Offset for CA/CB: 0.795 +/- 0.104 ppm (Size: 66)
Note that (1) a chemical shift referencing correction is likely required whenever the estimated referencing error approaches the average uncertainty in the database chemical shifts (~1.0 ppm for 13CA/CB and 13C' shifts; ~0.3 ppm for 1HA shifts), and/or the estimated referencing error is larger than five times the average fitting error; (2) chemical shift outliers which fall far outside (>2-3 times) the expected range of secondary chemical shifts (and are marked by "!") are unlikely to be correct (or, as in the above example, correspond to a C-terminal carboxylate instead of a backbone carbonyl) and need to be checked carefully.
TALOS+ provides the option "-offset" to automatically apply the chemical shift offset correction if needed:
    talos+ -in myshifts.tab -offset
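The pre-check report shown above follows a regular layout, so it can be screened by a script. The following is a minimal Python sketch, assuming the report lines look exactly like the sample output in this section; the function names parse_check_report and offset_is_significant are hypothetical and are not part of TALOS+. The significance test treats the "+/-" value as the fitting error and applies the factor of five from note (1), which is one reading of that note rather than the rule TALOS+ itself uses.

```python
import re

# Patterns modeled only on the sample report lines shown in this section;
# the layout of real talos+ output may differ (assumption).
OUTLIER_RE = re.compile(
    r"^\s*(\d+)\s+([A-Za-z])\s+(\S+)\s+Secondary Shift:\s*(-?\d+\.?\d*)"
    r"\s+Limit:\s*(-?\d+\.?\d*)\s*(!?)\s*$"
)
OFFSET_RE = re.compile(
    r"Estimated Referencing Offset for (\S+):\s*(-?\d+\.?\d*)"
    r"\s*\+/-\s*(\d+\.?\d*)\s*ppm"
)


def parse_check_report(text):
    """Return (outliers, offsets) extracted from the text of a pre-check report."""
    outliers, offsets = [], []
    for line in text.splitlines():
        match = OUTLIER_RE.match(line)
        if match:
            resid, resname, atom, shift, limit, flag = match.groups()
            outliers.append({
                "resid": int(resid),
                "resname": resname,
                "atom": atom,
                "secondary_shift": float(shift),
                "limit": float(limit),
                "flagged": flag == "!",  # "!" marks shifts far outside the limit
            })
            continue
        match = OFFSET_RE.search(line)
        if match:
            offsets.append({
                "atoms": match.group(1),
                "offset": float(match.group(2)),
                "error": float(match.group(3)),
            })
    return outliers, offsets


def offset_is_significant(offset, error, factor=5.0):
    """True if |offset| exceeds factor times the reported error (hypothetical rule)."""
    return abs(offset) > factor * error
```

For the sample report, this would list residue 76 (atom C) as flagged and report the CA/CB offset of 0.795 ppm as significant, since 0.795 is more than five times 0.104.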
talos+ -in myshifts.tab -iso
Exclusion of proteins from the database
One or more proteins can be excluded from the database during the TALOS+ database search with a command line such as:
    talos+ -in myshifts.tab -excl name1 name2 ...
where "name1", "name2", etc. are the names of the proteins to be excluded (see the valid protein names in the database "talos.tab").
Chemical Shift Input Format Used by TALOS
An example portion of the required shift table format is shown below. Full Example: ubiq.tab. Other examples can be found in the talosplus/shifts and talosplus/demo directories of an NMRPipe installation, or at the TALOS Server site. Specifically:
Example shift table (excerpt):
    REMARK Ubiquitin input for TALOS, HA2/HA3 assignments arbitrary.
    DATA FIRST_RESID 1
    DATA SEQUENCE MQIFVKTLTG KTITLEVEPS DTIENVKAKI QDKEGIPPDQ QRLIFAGKQL
    DATA SEQUENCE EDGRTLSDYN IQKESTLHLV LRLRGG
    VARS   RESID RESNAME ATOMNAME SHIFT
    FORMAT %4d %1s %4s %8.3f
       1 M   HA     4.23
       1 M    C   170.54
       1 M   CA    54.45
       1 M   CB    33.27
       2 Q   HN     8.90
       2 Q    N   123.22
       2 Q   HA     5.25
       2 Q    C   175.92
       2 Q   CA    55.08
       2 Q   CB    30.76
    ...
Inspecting and Refining the Prediction Results
The final step in interpreting the results of the TALOS+ database search is to inspect and classify the matches so that useful predictions can be formed; however, in most cases, the initial automated classifications performed by the current version of the TALOS+ program should be acceptable with no manual adjustment needed. Refinement of predictions can be done via the graphical interface rama+, which is included in the package, or via a web-based Java version of the RAMA+ Viewer (JRAMA+).
The simplest invocation of rama+ is:
    rama+ -in myshifts.tab
If a proposed structure is available, first run TALOS+ with it to generate a prediction summary:
    talos+ -in myshifts.tab -ref mystruct.pdb
Then, invoke RAMA+ so that the reference structure is included in the display of prediction data:
    rama+ -in myshifts.tab -ref mystruct.pdb
The various windows displayed by rama+ are shown below.
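Before running a prediction, it can be useful to check a shift table by script. The following is a minimal Python sketch of a reader for the shift table layout shown in the excerpt earlier in this section (REMARK, DATA FIRST_RESID, DATA SEQUENCE, VARS, and FORMAT lines followed by data rows). It assumes whitespace-separated columns named as in that excerpt; the function names parse_shift_table and mismatched_residues are hypothetical and are not part of TALOS+ or NMRPipe.

```python
def parse_shift_table(text):
    """Parse a TALOS-style shift table into (first_resid, sequence, rows).

    rows is a list of (resid, resname, atomname, shift) tuples.
    """
    first_resid = 1          # assumed default when no FIRST_RESID line is present
    sequence_parts = []
    columns = []
    rows = []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields or fields[0] in ("REMARK", "FORMAT"):
            continue  # comments and the printf-style format line carry no data
        if fields[0] == "DATA":
            if len(fields) >= 3 and fields[1] == "FIRST_RESID":
                first_resid = int(fields[2])
            elif len(fields) >= 2 and fields[1] == "SEQUENCE":
                # Sequence lines are written in blocks separated by spaces
                sequence_parts.append("".join(fields[2:]))
            continue
        if fields[0] == "VARS":
            columns = fields[1:]
            continue
        # Data rows: only accepted once VARS has named the columns
        if columns and len(fields) == len(columns):
            row = dict(zip(columns, fields))
            rows.append((int(row["RESID"]), row["RESNAME"],
                         row["ATOMNAME"], float(row["SHIFT"])))
    return first_resid, "".join(sequence_parts), rows


def mismatched_residues(first_resid, sequence, rows):
    """List (resid, resname, atomname) entries that disagree with the sequence."""
    bad = []
    for resid, resname, atomname, _shift in rows:
        index = resid - first_resid
        if not 0 <= index < len(sequence) or sequence[index] != resname:
            bad.append((resid, resname, atomname))
    return bad
```

Comparing each row's residue name with the DATA SEQUENCE entry at position RESID minus FIRST_RESID is a simple way to catch numbering mistakes before the table is passed to talos+.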
Sequence Window: displays the target protein sequence, with each residue colored according to its classification. Clicking on a residue with the mouse will select that residue for display and analysis in the other windows. The residues are colored according to this scheme:
Prediction Window: lists the statistics of the 10 best database matches for the currently selected residue in the target protein. The individual entries in this window can be toggled by a mouse click, to include or remove a particular match from the prediction.
Ramachandran window: graphs the phi/psi distributions of the 10 best database matches for the currently selected residue. It also displays the average and standard deviation of phi and psi for those matches which are selected (i.e. included in the prediction), as well as the ANN-predicted probability of finding the residue in the Alpha, Beta, or Positive-phi region. The shaded region of the map shows the most populated regions of the TALOS+ database for the residue type in question. In the graph, each match from the database is drawn as a small square at a particular phi/psi coordinate. The individual squares can be toggled by a mouse click, to include or remove the corresponding match from the prediction. The squares are colored according to this scheme:
The Ramachandran window also includes buttons to reclassify the overall prediction as "Good", "Ambiguous", etc., and to move to the next or previous residue in the sequence.
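Because phi and psi are periodic, the average and standard deviation of a set of matches should be computed with angle wrapping in mind; for example, +170 and -170 degrees are only 20 degrees apart. The following is a minimal Python sketch of one standard way to do this (a circular mean from summed sines and cosines, followed by the RMS of the wrapped deviations). It is an illustration only; whether TALOS+ or RAMA+ compute their statistics in exactly this way is an assumption, and the function names are hypothetical.

```python
import math


def wrap_degrees(angle):
    """Map an angle in degrees into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def circular_mean_std(angles_deg):
    """Return (mean, std) in degrees for a list of periodic angles.

    The mean is the direction of the summed unit vectors; the standard
    deviation is the RMS of the wrapped deviations from that mean
    (population form, dividing by n).
    """
    if not angles_deg:
        raise ValueError("need at least one angle")
    n = len(angles_deg)
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    mean = math.degrees(math.atan2(sin_sum, cos_sum))
    deviations = [wrap_degrees(a - mean) for a in angles_deg]
    std = math.sqrt(sum(d * d for d in deviations) / n)
    return mean, std
```

With this approach, matches at +170 and -170 degrees average to the map edge (180 degrees, equivalently -180) with a spread of 10 degrees, whereas a plain arithmetic mean would give 0 degrees.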
Secondary Structure and RCI-S2 Prediction Window: graphs the predicted order parameter S2 (upper panel) and ANN-predicted secondary structure (lower panel; aqua, beta-sheet; red, helix) for all residues. The height of the bars reflects the probability of the neural network secondary structure prediction. The RCI-S2 value and the probabilities of the 3-state [helix|sheet|loop] secondary structure prediction for the current residue (indicated by yellow vertical lines) are labeled above the corresponding panel, followed by the S2 and secondary structure probabilities for the "cursor-activated" residue (indicated by white vertical lines, not visible in this figure). For more about the RCI method for predicting order parameters from chemical shifts, see:
Secondary Shift Window: (optional with the "-sd" argument) graphs the secondary shift distributions of the 10 best database matches for the currently selected residue.
Molecular Viewer Window: (optional with the "-ras" argument) Displays the three-dimensional structure given by the "-ref" argument, colorized according to the residue classification scheme above. This option assumes that the program RasMol is available as a viewer. For more about the RasMol molecular viewer program, see:
How to Select Consistent Predictions
The old TALOS rules for defining consistent ("Good") predictions are based on clustering of at least 9 out of the 10 best database matches in the same region of the Ramachandran map. The TALOS+ rules for defining consistent ("Good") predictions are similar but slightly more strict:
All the cases with predicted S2 value <0.5 are likely to be "Dynamic", and will not be considered as unambiguous predictions. All other cases are considered "Ambiguous". When a reference structure is available, predictions will be flagged as "Bad" (automatically by talos+) if either of the following conditions applies:
Cases where |Phi(obs) - Phi(pred) + Psi(obs) - Psi(pred)| < 60 cause the peptide chain to continue in roughly the correct direction, and larger tolerance limits (up to +/-90 degrees) are accepted for phi and psi in these cases. In practice, this usually means that the standard deviation of phi and psi for the selected group of matches will be 35 degrees or less (12-13 degrees on average). When inspecting the phi/psi graphs to decide if matches are in a consistent region, keep in mind their "periodic" nature; i.e. angles at one edge of the graph are actually close to angles at the opposite edge.
What About the Older, Original Version of TALOS?
The original version of TALOS is still installed along with NMRPipe for backward compatibility reasons. The TALOS files are installed in the
    talos.tcl -old -in csObs.tab
    vina.tcl -old -in csObs.tab -ref xray.pdb -AUTO
    rama.tcl -old -in csObs.tab -ref xray.pdb -ras -sd
The original TALOS web page can be found here: http://spin.niddk.nih.gov/bax/software/TALOSORIG