TALOS+ Protein Backbone Dihedral Angle Prediction Program

From the Bax Group at the National Institutes of Health ...
TALOS+: Prediction of Protein Backbone Torsion Angles from NMR Chemical Shifts

As described in the paper:
TALOS+: A hybrid method for predicting protein backbone torsion angles from NMR chemical shifts
Yang Shen, Frank Delaglio, Gabriel Cornilescu, and Ad Bax, J. Biomol. NMR, 44, 213-223 (2009)

Contact:	shenYang@niddk.nih.gov bax@nih.gov
Web:	https://spin.niddk.nih.gov/bax-apps/software/TALOS
Server:	https://spin.niddk.nih.gov/bax-apps/nmrserver/talos
JRAMA+ Viewer:	https://spin.niddk.nih.gov/bax-apps/software/TALOS+/JRAMA+/

How to Get TALOS+

TALOS+ was developed by Dr. Yang Shen in the Ad Bax group, and is now installed as part of the NMRPipe system. Detailed download and installation instructions can be found on the NMRPipe download page.

There is now a web-based version of TALOS+ that can be used directly without installing NMRPipe. A Java viewer (JRAMA+) is also available to display TALOS+ results without installing NMRPipe. You can access this web-based system, along with other facilities for manipulating chemical shifts, dipolar couplings, and molecular structures, at the Bax Group NMR Server site:

https://spin.niddk.nih.gov/bax-apps/nmrserver

TALOS+ replaces all earlier versions of TALOS, but is used in much the same way. The original TALOS web page can be found at https://spin.niddk.nih.gov/bax-apps/software/TALOSORIG.

Contents

What is TALOS+?
Reliability of TALOS+
Components of the TALOS+ System
How to Use TALOS+
Chemical Shift Input Format Used by TALOS
Inspecting and Refining the Prediction Results
How to Select Consistent Predictions
What About the Older, Original Version of TALOS?

What is TALOS+?

TALOS+ is a hybrid system for empirical prediction of protein phi and psi backbone torsion angles from a combination of six kinds of chemical shift assignments (HN, HA, CA, CB, CO, N) for a given protein sequence.

TALOS+ is an enhanced version of the earlier TALOS system. It improves upon the original TALOS database mining approach by adding a neural network classification scheme and by using a larger database of 200 proteins. This improved approach allows TALOS+ to make useful backbone angle predictions for a larger number of residues: 88% of the residues in a given protein, on average.

The original TALOS approach is an extension of the well-known observation that many kinds of secondary chemical shifts (i.e. differences between chemical shifts and their corresponding random coil values) are highly correlated with aspects of protein secondary structure. The goal of TALOS+ is to use secondary shift and sequence information in order to make quantitative predictions for the protein backbone angles phi and psi, and to provide a measure of the uncertainties in these predictions. In the original TALOS approach, we search a high-resolution structural database for the 10 best matches to the secondary chemical shifts of a given residue in a target protein along with its two flanking neighbors (a residue triplet). If there is a consensus of phi and psi angles among the 10 best database matches, then we use these database triplet structures to form a prediction for the backbone angles of the target residue.
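The triplet search described above can be illustrated with a minimal Python sketch. It assumes a simplified match score, a weighted sum of squared secondary-shift differences over the three residues, and leaves out the residue-type homology factors (homology.tab) and the actual weights (weight.tab) that TALOS+ uses. The function names, the weights, and the toy database are hypothetical.

```python
# Hypothetical, simplified sketch of TALOS-style triplet database mining.
# Not the TALOS+ scoring function: homology factors and the real
# weight.tab values are omitted; the weights here are placeholders.

def secondary_shifts(observed, random_coil):
    """Secondary shift = observed shift minus its random-coil value (ppm)."""
    return {atom: shift - random_coil[atom]
            for atom, shift in observed.items() if atom in random_coil}

def triplet_score(query, entry, weights):
    """Weighted sum of squared secondary-shift differences over a triplet.

    query, entry: lists of three dicts (residues i-1, i, i+1) mapping
    atom name -> secondary shift. Atoms missing on either side are skipped.
    """
    score = 0.0
    for q_res, e_res in zip(query, entry):
        for atom, q_val in q_res.items():
            if atom in e_res:
                score += weights.get(atom, 1.0) * (q_val - e_res[atom]) ** 2
    return score

def best_matches(query, database, weights, n_best=10):
    """Return the n_best database entries, lowest (best) score first.

    Each entry is a dict with 'shifts' (a triplet) and the phi/psi
    of its center residue.
    """
    return sorted(database,
                  key=lambda e: triplet_score(query, e["shifts"], weights))[:n_best]

# Usage with made-up numbers: a helix-like query (positive CA secondary shifts).
query = [{"CA": 2.5}, {"CA": 3.0}, {"CA": 2.8}]
database = [
    {"name": "h1", "shifts": [{"CA": 2.4}, {"CA": 3.1}, {"CA": 2.7}], "phi": -63.0, "psi": -41.0},
    {"name": "b1", "shifts": [{"CA": -1.5}, {"CA": -1.8}, {"CA": -1.2}], "phi": -120.0, "psi": 130.0},
]
top = best_matches(query, database, weights={"CA": 1.0}, n_best=1)
print(top[0]["name"])  # h1
```

In the real program the phi/psi values of the center residues of the best matches are then checked for consensus; that step is not shown here.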

The TALOS+ approach adds an artificial neural network (ANN) classification scheme to this database mining approach. The neural network analyzes the chemical shifts and sequence to estimate the likelihood of a given residue being in a sheet, helix, or loop conformation. This ANN classification information is combined with the database mining results to increase the number of residues where useful backbone angle predictions can be made.

In addition, TALOS+ offers several new features compared to the original TALOS program:

In order to expand the program's ability to predict backbone torsion angles, TALOS+ now also considers the frequently encountered cases where residue assignments are lacking. Although the fraction of such residues for which unambiguous predictions can be made tends to be significantly lower, the reliability of such predictions remains high.
For convenience, and in order to prevent assignment of backbone torsion angles to regions that are dynamically disordered, TALOS+ also reports an estimated backbone order parameter S² derived from the chemical shifts in a way recently described by Berjanskii and Wishart (J. Am. Chem. Soc. 127: 14970-14971).
TALOS+ also provides ANN-predicted secondary structure information from the chemical shifts, with about 89% prediction accuracy.

A flowchart of the TALOS+ database search procedure is shown below:

[Figure: TALOS flowchart]


Reliability of TALOS+

As with TALOS, the reliability of the TALOS+ approach was tested by a cross-validation "leave-one-out" procedure, in which each protein was removed from the database in turn and its phi and psi angles were predicted using the remaining protein data. For the purposes of testing, a prediction was considered "Good" if it fell in the same well-populated region of the Ramachandran map as the phi and psi values from the crystal structure. Conversely, a prediction was considered "Bad" (incorrect) if it deviated greatly from the phi or psi angles observed in the crystal structure (see the definition in the section "How to Select Consistent Predictions" below). According to the tests:

On average, TALOS+ makes consistent predictions for about 88% of the residues.
(IMPORTANT!) Over all 200 database proteins, about 2.5% of the unambiguous predictions made by TALOS+ were incorrect relative to the corresponding crystal structure. However, a substantial fraction of this 2.5% appears to reflect genuine differences relative to the crystalline state, and the true error rate therefore is believed to be below 2.5%.
On average, the uncertainty as reported by TALOS+ for the consensus predictions was 12.6 degrees for phi, and 12.3 degrees for psi.
The actual RMSD of the "correct" predictions relative to the crystal structures was about 13.5 degrees for phi, and 12.9 degrees for psi.
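As a sketch of how an RMSD like those quoted above can be computed for torsion angles, the Python below wraps each predicted-minus-reference difference into [-180, 180) before squaring, so that 175 and -175 degrees count as 10 degrees apart rather than 350. The wrapping step and the function names are assumptions for illustration, not a description of the published analysis.

```python
import math

def angle_diff(a, b):
    """Smallest signed difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def rmsd(pred, ref):
    """Root-mean-square deviation between predicted and reference angles (degrees)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("need two equal-length, non-empty lists")
    squared = [angle_diff(p, r) ** 2 for p, r in zip(pred, ref)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up values: both predictions are 10 degrees off, one across the +/-180 edge.
print(rmsd([-60.0, 175.0], [-70.0, -175.0]))  # 10.0
```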

As noted above, it must be remembered that TALOS+ will produce a small number of predictions that seem to be valid (because the best matches from the database are consistent) but are nevertheless in error.

It should also be noted that the tests above included only the most well-defined parts of each protein; roughly 6% of the residues had first been removed because they had high B factors (exceeding 1.5 times the average B factor for that protein) in the crystal structure or (for the original 78 TALOS proteins) because they were known to be highly mobile in solution. Evaluation of the results indicates that many of the "erroneous" predictions occur outside regions of secondary structure, where the X-ray and solution structures may actually differ from one another, as evidenced by large differences between X-ray structures when multiple such structures are available for the same protein. Therefore, the accuracy of TALOS+ will vary from protein to protein, and tends to be lower for proteins with large flexible regions. A partial remedy is to increase the S² threshold for "dynamic" residues to 0.65, but this will decrease the number of consensus predictions made.
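The B-factor filter mentioned above (residues exceeding 1.5 times the protein's average B factor were removed) can be sketched as follows. The input is assumed to be one B factor per residue; which atoms were averaged to obtain that value is not stated here, and the function name is hypothetical.

```python
def well_ordered_residues(b_factors, factor=1.5):
    """Return sorted residue IDs whose B factor does not exceed factor x the mean.

    b_factors: dict mapping residue ID -> per-residue B factor (assumed input).
    """
    mean_b = sum(b_factors.values()) / len(b_factors)
    return sorted(res for res, b in b_factors.items() if b <= factor * mean_b)

# Made-up B factors: mean is 30, so the cutoff is 45 and residue 4 is dropped.
print(well_ordered_residues({1: 20.0, 2: 22.0, 3: 18.0, 4: 60.0}))  # [1, 2, 3]
```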


Components of the TALOS+ System

The TALOS+ core database search system is implemented in C++, and includes a graphical interface to inspect the prediction results. The graphical interface, called RAMA+, is implemented in the TCL/TK scripting language via the NMRPipe TCL interpreter called nmrWish.

The TALOS+ files are installed into a talosplus subdirectory of an NMRPipe installation. The NMRPipe initialization commands will establish an environment variable TALOSP_DIR which will give the full path to the talosplus directory.

There are two major scripts comprising the TALOS+ system:

TALOS+ (script name: talos+)
Performs the database search and secondary structure classifications, then summarizes the predictions.
RAMA+ (script name: rama+)
Interactive display and refinement of the predictions.

Both of these scripts can be invoked with the -help command-line argument to generate a complete list of options. For backward compatibility, the script names talos+, talos+.tcl, and talos.tcl can all be used to run TALOS+, and the script names rama+, rama+.tcl, and rama.tcl can all be used to run RAMA+.

Other files of the TALOS+ system include:

talosplus/demo
A directory with example chemical shift input data and scripts for a demo of TALOS+.

talosplus/tab/talos.tab
The compiled database of residue triplets with their corresponding secondary shifts and PHI/PSI values.

talosplus/tab/randcoil.tab
The table of random coil shifts used in the prediction process.

talosplus/tab/homology.tab
The residue type homology factors used in the prediction process.

talosplus/tab/weight.tab
The weighting factors of the 18 secondary shifts used in the prediction process.

talosplus/tab/*level*.tab
The weighting factors and biases of the neural network used in the prediction process.

talosplus/bin/TALOS+.*
The compiled TALOS+ binary files for multiple platforms, such as Linux (TALOS+.linux), MacOS (TALOS+.mac), SGI (TALOS+.sgi6x) and WindowsXP (TALOS+.winxp).

talosplus/rama.gif
Image of the populated regions of the TALOS/TALOS+ database, used as a background for the Ramachandran plot display.

talosplus/com
Contains some example utility scripts for file format conversion, which can be copied to a working directory and used as needed:

talos2dyana.com
This script generates a Dyana/Cyana format torsion angle restraint file from a standard TALOS/TALOS+ output file.
talos2xplor.com
This script generates an XPLOR format torsion angle restraint file from a standard TALOS/TALOS+ output file.

The standard NMRPipe installation also includes the scripts star2cs.tcl and shift2tab.tcl for converting NMR-Star and PIPP format shifts to TALOS input format, as well as talos2xplor.tcl and talos2dyn.tcl for converting TALOS+ prediction output to XPLOR and DYNAMO torsion restraint formats.


How to Use TALOS+

Use of TALOS+ is much the same as for earlier versions of TALOS:

Create a directory for the prediction session; all subsequent commands will be executed from this directory.
Prepare the input table of shift assignments (for example "myshifts.tab"), according to the format given below.
Run TALOS+ (talos+) to perform the database searches. Most commonly, this will simply require a command such as:
```
talos+ -in myshifts.tab
```
During the database search, a summary file "predAll.tab" will be created to store the 10 best database matches for all residues in the target protein. Before exiting, a file "pred.tab" will also be created, which includes an initial summary of the prediction results. Additionally, three files "predAdjCS.tab", "predABP.tab" and "predSS.tab" will be created to store the calculated secondary chemical shifts used for prediction, the ANN-predicted 3-state phi/psi distribution (Alpha, Beta and Positive-Phi) information and the predicted secondary structure, respectively. The database search will typically take about 15-20 sec per 100 residues.

In the original TALOS System, the classification step was performed by the VINA application (vina.tcl). This classification is now part of the TALOS+ database search procedure, and the VINA application is no longer used.
Run RAMA+ (rama+) or JRAMA+ to inspect and adjust the predictions. The simplest RAMA+ invocations are:
```
rama+ -in myshifts.tab
rama+ -in myshifts.tab -ref mystruct.pdb
```
During this inspection, you will:
- Examine the phi/psi distributions of the center residues of the best 10 database matches for a given query residue, and decide which ones should be included in the prediction, and which are "outliers". (NOTE: in the vast majority of cases, the initial automated classifications performed by the current version of the TALOS+ program should be acceptable with no manual adjustment needed).
- Classify the results for a given residue as "Good", "Ambiguous", or (if a reference structure is known) "Bad".
The file "predAll.tab" will be adjusted along the way to reflect any changes made interactively, and a new "pred.tab" summary file will be created on exiting. When the above steps are completed, the final "pred.tab" file will include the classification ("Good" etc) and predictions (averages and standard deviations) for phi and psi at each residue.
Convert TALOS+ results to other formats, for use as structural restraints, etc. The TALOS+ package includes shell scripts such as "talos2dyana.com" and "talos2xplor.com" for this purpose; for example:
```
$TALOSP_DIR/com/talos2dyana.com pred.tab > talos.aco
$TALOSP_DIR/com/talos2xplor.com pred.tab > talos.tbl 
```
JRAMA+ offers similar features in its menu bar ("Tools").
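To show what such a conversion involves, here is a hypothetical Python sketch that reads a pred.tab-style NMRPipe table and writes XPLOR-style dihedral restraints (four atom selections, force constant, target angle, range, exponent) for residues classified "Good". It assumes the table's VARS line names the columns RESID, PHI, PSI, DPHI, DPSI, and CLASS, and it uses twice the reported standard deviation as the restraint range; both are assumptions, and talos2xplor.com may format its output differently, so use the supplied scripts for real work.

```python
# Hypothetical sketch: pred.tab-style table -> XPLOR-style dihedral restraints.
# Column names (RESID, PHI, PSI, DPHI, DPSI, CLASS) and the 2 x SD range
# are illustrative assumptions, not taken from talos2xplor.com.

def read_nmrpipe_table(text):
    """Parse an NMRPipe-style table into a list of dicts keyed by VARS names.

    REMARK, DATA, FORMAT, and blank lines are skipped; data rows whose field
    count does not match the VARS line are ignored.
    """
    columns, rows = None, []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("REMARK", "DATA", "FORMAT"):
            continue
        if fields[0] == "VARS":
            columns = fields[1:]
        elif columns is not None and len(fields) == len(columns):
            rows.append(dict(zip(columns, fields)))
    return rows

def xplor_restraints(rows, range_scale=2.0):
    """Return phi and psi restraint lines for rows whose CLASS is "Good"."""
    lines = []
    for row in rows:
        if row.get("CLASS") != "Good":
            continue
        i = int(row["RESID"])
        phi, psi = float(row["PHI"]), float(row["PSI"])
        dphi, dpsi = float(row["DPHI"]), float(row["DPSI"])
        # phi: C(i-1) - N(i) - CA(i) - C(i)
        lines.append(f"assign (resid {i - 1} and name C) (resid {i} and name N) "
                     f"(resid {i} and name CA) (resid {i} and name C) "
                     f"1.0 {phi:.1f} {range_scale * dphi:.1f} 2")
        # psi: N(i) - CA(i) - C(i) - N(i+1)
        lines.append(f"assign (resid {i} and name N) (resid {i} and name CA) "
                     f"(resid {i} and name C) (resid {i + 1} and name N) "
                     f"1.0 {psi:.1f} {range_scale * dpsi:.1f} 2")
    return lines

sample = """REMARK made-up pred.tab-style excerpt
VARS   RESID RESNAME PHI PSI DPHI DPSI CLASS
FORMAT %4d %1s %8.3f %8.3f %8.3f %8.3f %s
   2 Q  -95.0  130.0  12.0  10.0  Good
   3 I  -60.0  -40.0  30.0  25.0  Ambiguous
"""
for line in xplor_restraints(read_nmrpipe_table(sample)):
    print(line)
```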

Chemical shift data pre-check

TALOS+ includes a feature that pre-checks chemical shift referencing and possible chemical shift errors:

   talos+ -in myshifts.tab -check

It checks the referencing of 13CA, 13CB, 1HA and 13C' chemical shifts, using the empirical correlations between certain sets of chemical shift data (Wang et al., 2005, J. Biomol. NMR, 32:13-22). The estimated chemical shift referencing offsets, as well as chemical shifts that deviate strongly from their expected ranges, are printed in the following format:


   Chemical shift outlier checking...
     ...
     64 E CB Secondary Shift: -3.800 Limit: -3.765
     76 G  C Secondary Shift:  4.250 Limit:  1.925 !

   Chemical shift referencing checking...
      Estimated Referencing Offset for CA/CB: 0.795 +/- 0.104 ppm (Size: 66)

Note that (1) a chemical shift referencing correction is likely required whenever the estimated referencing error approaches the average uncertainty in the database chemical shifts (~1.0 ppm for 13CA/CB and 13C' shifts; ~0.3 ppm for 1HA shifts), and/or the estimated referencing error is larger than five times the average fitting error; (2) chemical shift outliers that fall far outside (more than 2-3 times) the expected range of secondary chemical shifts, and are marked by "!", are unlikely to be correct (or, as in the example above, correspond to a C-terminal carboxylate instead of a backbone carbonyl) and should be checked carefully.
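The two rules of thumb above can be written as simple checks. In this Python sketch, the database uncertainties and the factor of five come from the text; the `approach` cutoff (0.8 of the database uncertainty) is a hypothetical reading of "approaches", and the outlier factor of 2 is the low end of the quoted 2-3 range. The function names are illustrative, and the inputs in the usage lines are taken from the example output above.

```python
# Average database shift uncertainties quoted in the text (ppm).
DB_UNCERTAINTY = {"CA": 1.0, "CB": 1.0, "C": 1.0, "HA": 0.3}

def referencing_correction_advised(offset, fit_error, nucleus, approach=0.8):
    """Rule (1): advise a correction if |offset| approaches the database
    uncertainty (here: >= approach x uncertainty, a hypothetical cutoff)
    or exceeds five times the fitting error."""
    return (abs(offset) >= approach * DB_UNCERTAINTY[nucleus]
            or abs(offset) > 5.0 * fit_error)

def is_outlier(secondary_shift, limit, factor=2.0):
    """Rule (2): flag a secondary shift beyond factor x its expected limit
    (the text gives 2-3 times, so factor is adjustable)."""
    return abs(secondary_shift) > factor * abs(limit)

print(referencing_correction_advised(0.795, 0.104, "CA"))  # True: 0.795 > 5 x 0.104
print(is_outlier(4.250, 1.925))    # True: about 2.2 x the limit (marked "!")
print(is_outlier(-3.800, -3.765))  # False: barely past the limit
```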

The option "-offset" automatically applies a chemical shift offset correction if needed:

   talos+ -in myshifts.tab -offset

and the option "-iso" applies a 2H isotope correction to CA/CB chemical shifts collected from a perdeuterated protein sample:

   talos+ -in myshifts.tab -iso

Exclusion of proteins from the database

One or more proteins can be excluded from the database during the TALOS+ database search with a command line such as:

   talos+ -in myshifts.tab -excl name1 name2 ...

where "name1", "name2", etc. are the names of the proteins to be excluded (see the valid protein names in the database file "talos.tab").


Chemical Shift Input Format Used by TALOS

An example portion of the required shift table format is shown below. Full Example: ubiq.tab. Other examples can be found in the talosplus/shifts and talosplus/demo directories of an NMRPipe installation, or at the TALOS Server site. Specifically:

The TALOS chemical shift table uses the general-purpose NMRPipe table format.
13C chemical shifts for CA, CB, and CO used as input for TALOS/TALOS+ should be referenced relative to TSP. The 15N chemical shifts used as input for TALOS/TALOS+ should be referenced relative to liquid ammonia at 25 degrees C.
Use the optional DATA FIRST_RESID line to specify the first residue ID number of the sequence. By default, residue numbering is assumed to begin at 1.
The protein sequence should be given as shown, using one or more DATA SEQUENCE lines. Space characters in the sequence will be ignored. Use "c" for oxidized CYS (CB ~ 42.5 ppm) and "C" for reduced CYS (CB ~ 28 ppm), and "h" for protonated HIS and "H" for unprotonated HIS, in both the sequence header and the shift table. Use X for residues other than the usual 20 amino acids.
The table must include columns for residue ID, one-character residue name, atom name, and chemical shift.
The table must include a "VARS" line which labels the corresponding columns of the table.
The table must include a "FORMAT" line which defines the data type of the corresponding columns of the table.

Atom names are always given exactly as:

HA for H-alpha of all residues except glycine

HA2 for the first H-alpha of glycine residues

HA3 for the second H-alpha

C for C' (CO)

CA for C-alpha

CB for C-beta

N for N-amide

HN for H-amide

As noted, there is an exception for naming glycine assignments, which should use HA2 and HA3 instead of HA. In the case of glycine HA2/HA3 assignments, TALOS/TALOS+ will use the average value of the two, so it is not necessary to assign them stereospecifically; for use with TALOS/TALOS+, the assignment can be arbitrary. Note, however, that the assignment must be given exactly as either "HA2" or "HA3" rather than "HA2|HA3" etc.
Other types of assignments may be present in the shift table; they will be ignored.
TALOS now also has the option to use chemical shift input in the BMRB NMR-Star format. If NMR-Star format input is used, the input must contain shifts for a single protein chain only. It must also contain complete sequence information for the protein. Specifically, the NMR-Star format table must contain a sequence section with _Residue_seq_code and _Residue_label values, and a chemical shift section with values for _Residue_seq_code _Residue_label _Atom_name _Atom_type and _Chem_shift_value. Example: ubiq_bmr6457_1D3Z.str.

Example shift table (excerpt):

   REMARK Ubiquitin input for TALOS, HA2/HA3 assignments arbitrary.

   DATA FIRST_RESID 1

   DATA SEQUENCE MQIFVKTLTG KTITLEVEPS DTIENVKAKI QDKEGIPPDQ QRLIFAGKQL
   DATA SEQUENCE EDGRTLSDYN IQKESTLHLV LRLRGG

   VARS   RESID RESNAME ATOMNAME SHIFT
   FORMAT %4d   %1s     %4s      %8.3f

     1 M           HA                  4.23
     1 M           C                 170.54
     1 M           CA                 54.45
     1 M           CB                 33.27
     2 Q           HN                  8.90
     2 Q           N                 123.22
     2 Q           HA                  5.25
     2 Q           C                 175.92
     2 Q           CA                 55.08
     2 Q           CB                 30.76
   ...
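A minimal Python reader for the table format above may help make the rules concrete. It takes the sequence from DATA SEQUENCE lines (ignoring spaces), honors DATA FIRST_RESID, maps columns by the VARS line, keeps only the backbone atom names listed earlier, and averages glycine HA2/HA3 into a single value. This is a sketch under those assumptions, not the parser used by TALOS+. It does not handle NMR-Star input, and using a lone HA2 or HA3 when only one is present is an assumption.

```python
# Atom names TALOS uses (from the list above); anything else is ignored.
BACKBONE_ATOMS = {"HN", "HA", "HA2", "HA3", "C", "CA", "CB", "N"}

def parse_talos_shifts(text):
    """Return (first_resid, sequence, shifts); shifts maps (resid, atom) -> ppm."""
    first_resid, seq_parts, columns, raw = 1, [], None, {}
    for line in text.splitlines():
        f = line.split()
        if not f or f[0] in ("REMARK", "FORMAT"):
            continue
        if f[0] == "DATA" and len(f) > 2:
            if f[1] == "FIRST_RESID":
                first_resid = int(f[2])
            elif f[1] == "SEQUENCE":
                seq_parts.extend(f[2:])  # spaces inside the sequence are dropped
            continue
        if f[0] == "VARS":
            columns = f[1:]
            continue
        if columns is None or len(f) != len(columns):
            continue  # e.g. the "..." line of an excerpt
        row = dict(zip(columns, f))
        if row["ATOMNAME"] in BACKBONE_ATOMS:
            raw[(int(row["RESID"]), row["ATOMNAME"])] = float(row["SHIFT"])

    shifts, gly_ha = {}, {}
    for (resid, atom), ppm in raw.items():
        if atom in ("HA2", "HA3"):
            gly_ha.setdefault(resid, []).append(ppm)
        else:
            shifts[(resid, atom)] = ppm
    for resid, values in gly_ha.items():
        shifts[(resid, "HA")] = sum(values) / len(values)  # glycine HA average
    return first_resid, "".join(seq_parts), shifts

sample = """DATA FIRST_RESID 1
DATA SEQUENCE MQIFVKTLTG KTITLEVEPS
VARS   RESID RESNAME ATOMNAME SHIFT
FORMAT %4d   %1s     %4s      %8.3f
   1 M HA   4.23
   2 Q HB2  2.00
  10 G HA2  3.90
  10 G HA3  4.10
   ...
"""
first, seq, shifts = parse_talos_shifts(sample)
print(first, len(seq), shifts[(10, "HA")])
```

The side-chain HB2 entry is dropped, matching the rule that other assignment types are ignored.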

 

Inspecting and Refining the Prediction Results

The final step in interpreting the results of the TALOS+ database search is to inspect and classify the matches so that useful predictions can be formed; however, in most cases, the initial automated classifications performed by the current version of the TALOS+ program should be acceptable with no manual adjustment needed.

Refinement of predictions can be done via the graphical interface rama+, which is included in the package, or via the web-based Java version of the RAMA+ viewer (JRAMA+). The simplest invocation of rama+ is:

   rama+ -in myshifts.tab

If a proposed structure is available, first run TALOS+ with it to generate a prediction summary:

   talos+ -in myshifts.tab -ref mystruct.pdb

Then, invoke RAMA+ so that the reference structure is included in the display of prediction data:

   rama+ -in myshifts.tab -ref mystruct.pdb

The various windows displayed by rama+ are shown below.

[Figure: TALOS Sequence Window]

Sequence Window: displays the target protein sequence, with each residue colored according to its classification. Clicking on a residue with the mouse will select that residue for display and analysis in the other windows. The residues are colored according to this scheme:

Green Unambiguous/Good prediction (no outlier)

Yellow Ambiguous; no prediction

Blue Dynamic; no prediction

Red Bad prediction relative to a known structure

Gray No classification yet

[Figure: TALOS Prediction Window]

Prediction Window: lists the statistics of the 10 best database matches for the currently selected residue in the target protein. The individual entries in this window can be toggled by a mouse click, to include or remove a particular match from the prediction.

[Figure: TALOS Ramachandran window]

Ramachandran window: graphs the phi/psi distributions of the 10 best database matches for the currently selected residue. It also displays the average and standard deviation of phi and psi for those matches which are selected (i.e. included in the prediction), as well as the ANN-predicted probability of finding the residue in the Alpha, Beta, or Positive-Phi region. The shaded region of the map shows the most populated regions of the TALOS+ database for the residue type in question.

In the graph, each match from the database is drawn as a small square at a particular phi/psi coordinate. The individual squares can be toggled by a mouse click, to include or remove the corresponding match from the prediction. The squares are colored according to this scheme:

Green This match is included in the prediction.

Red Outlier; not included in the prediction

Blue Reference (phi/psi taken from "-ref" structure)

The Ramachandran window also includes buttons to reclassify the overall prediction as "Good", "Ambiguous", etc., and to move to the next or previous residue in the sequence.

[Figure: TALOS Secondary Structure Prediction Window]

Secondary Structure and RCI-S² Prediction Window: graphs the predicted order parameter S² (upper panel) and ANN-predicted secondary structure (lower panel; aqua, beta-sheet; red, helix) for all residues. The height of the bars reflects the probability of the neural network secondary structure prediction. The RCI-S² value and the probabilities of the 3-state [helix|sheet|loop] secondary structure prediction for the current residue (indicated by yellow vertical lines) are labeled above the corresponding panel, followed by the S² and secondary structure probabilities for the "cursor-activated" residue (indicated by white vertical lines, not visible in this figure).

For more about the RCI method for predicting order parameter from chemical shifts, see:
Berjanskii MV and Wishart DS (2005) A simple method to predict protein flexibility using secondary chemical shifts. J. Am. Chem. Soc. 127: 14970-14971

[Figure: TALOS Secondary Shift Window]

Secondary Shift Window: (optional with the "-sd" argument) graphs the secondary shift distributions of the 10 best database matches for the currently selected residue.

[Figure: TALOS Molecular Viewer Window]

Molecular Viewer Window: (optional with the "-ras" argument) Displays the three-dimensional structure given by the "-ref" argument, colorized according to the residue classification scheme above. This option assumes that the program RasMol is available as a viewer.

For more about the RasMol molecular viewer program, see:

Roger Sayle and E. James Milner-White
RasMol: Biomolecular graphics for all
Trends in Biochemical Sciences (TIBS), September 1995, Vol. 20, No. 9, p. 374.
http://www.umass.edu/microbio/rasmol


How to Select Consistent Predictions

The old TALOS rules for defining consistent ("Good") predictions are based on clustering of at least 9 out of the 10 best database matches in the same region of the Ramachandran map. The TALOS+ rules for defining consistent ("Good") predictions are similar but slightly more strict:

All 10 best database matches fall in a "consistent" region of the Ramachandran map, i.e., in a consistent Alpha, Beta or Positive-Phi region, and
The confidence of the ANN 3-state Phi/Psi distribution prediction for this residue (defined as the difference between the probabilities of the two most favored predicted states) must be above 0.6 (0.7 for residues with a "Positive-Phi" prediction), and
The RCI-predicted order parameter S² must be greater than 0.5.

All cases with a predicted S² value below 0.5 are likely to be "Dynamic", and are not considered unambiguous predictions.

All other cases are considered "Ambiguous".
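The "Good" / "Dynamic" / "Ambiguous" rules above can be summarized in a short Python function. It assumes each of the 10 best matches has already been labeled with its Ramachandran region ("Alpha", "Beta", or "Positive-Phi"), and it reads "Positive-Phi prediction" as the ANN's top-ranked state; both readings, and the function name, are assumptions. The `s2_threshold` parameter can be raised (for example to 0.65, as suggested earlier in this page) to be stricter about dynamic residues.

```python
def classify_residue(match_regions, ann_probs, s2, s2_threshold=0.5):
    """Apply the TALOS+ consistency rules (as described above) to one residue.

    match_regions: region labels of the 10 best database matches (assumed
                   to be precomputed; the phi/psi -> region mapping is not shown).
    ann_probs:     ANN 3-state probabilities, e.g. {"Alpha": 0.9, ...}.
    s2:            RCI-predicted order parameter.
    """
    if s2 < s2_threshold:
        return "Dynamic"
    ranked = sorted(ann_probs.items(), key=lambda kv: kv[1], reverse=True)
    top_state, p1 = ranked[0]
    p2 = ranked[1][1]
    confidence = p1 - p2  # difference between the two most favored states
    min_confidence = 0.7 if top_state == "Positive-Phi" else 0.6
    all_consistent = len(match_regions) > 0 and len(set(match_regions)) == 1
    if all_consistent and confidence > min_confidence and s2 > s2_threshold:
        return "Good"
    return "Ambiguous"

helix = {"Alpha": 0.9, "Beta": 0.05, "Positive-Phi": 0.05}
print(classify_residue(["Alpha"] * 10, helix, s2=0.85))               # Good
print(classify_residue(["Alpha"] * 9 + ["Beta"], helix, s2=0.85))     # Ambiguous
print(classify_residue(["Alpha"] * 10, helix, s2=0.40))               # Dynamic
```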

When a reference structure is available, predictions will be flagged as "Bad" (automatically by talos+) if either of the following conditions applies:

[ |Phi(obs) - Phi(pred)| > 60 or |Psi(obs) - Psi(pred)| > 60] and [ |Phi(obs) - Phi(pred) + Psi(obs) - Psi(pred)| > 60]
|Phi(obs) - Phi(pred)| > 90 or |Psi(obs) - Psi(pred)| > 90

Cases where |Phi(obs) - Phi(pred) + Psi(obs) - Psi(pred)| < 60 cause the peptide chain to continue in roughly the correct direction, and larger tolerance limits (up to +/-90 degrees) are accepted for phi and psi in these cases.
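The two "Bad" conditions can be checked mechanically. This Python sketch wraps each difference into [-180, 180) to respect the periodicity of the angles, and also wraps the phi + psi sum; the text does not spell out how wraparound is handled, so that part is an assumption, as are the function names.

```python
def wrap(angle):
    """Wrap an angle in degrees to [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def is_bad(phi_obs, psi_obs, phi_pred, psi_pred):
    """True if a prediction meets either "Bad" condition listed above."""
    dphi = wrap(phi_obs - phi_pred)
    dpsi = wrap(psi_obs - psi_pred)
    dsum = wrap(dphi + dpsi)  # net change in chain direction
    cond1 = (abs(dphi) > 60 or abs(dpsi) > 60) and abs(dsum) > 60
    cond2 = abs(dphi) > 90 or abs(dpsi) > 90
    return cond1 or cond2

print(is_bad(-60, -40, -130, 30))   # False: errors of +70/-70 compensate
print(is_bad(-60, -40, -130, -40))  # True: phi off by 70, not compensated
```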

In practice, for consistent ("Good") predictions this usually means that the standard deviation of phi and psi for the selected group of matches will be 35 degrees or less (12-13 degrees on average).

When inspecting the phi/psi graphs to decide if matches are in a consistent region, keep in mind their "periodic" nature; i.e. angles at one edge of the graph are actually close to angles at the opposite edge.
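The periodicity also matters when averaging: a naive arithmetic mean of 170 and -170 degrees gives 0, on the opposite side of the map. One standard remedy is circular statistics, sketched below in Python: the mean direction comes from the averaged sine and cosine, and the spread from the resultant length R as sqrt(-2 ln R). Whether RAMA+ computes its averages and standard deviations this way is not stated here; the function is illustrative only.

```python
import math

def circular_mean_sd(angles_deg):
    """Circular mean and circular standard deviation of angles, in degrees."""
    n = len(angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    r = min(math.hypot(s, c), 1.0)  # resultant length, guarded against rounding
    mean = math.degrees(math.atan2(s, c))
    sd = math.degrees(math.sqrt(max(0.0, -2.0 * math.log(r)))) if r > 0 else float("inf")
    return mean, sd

# Two matches on either side of the +/-180 edge: mean is 180 (not 0), SD about 10.
print(circular_mean_sd([170.0, -170.0]))
```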


What About the Older, Original Version of TALOS?

The original version of TALOS is still installed along with NMRPipe for backward compatibility. The TALOS files are installed in the talos directory, which is specified by the TALOS_DIR environment variable. The components of the original TALOS (TALOS, VINA, and RAMA) can be accessed by including the -old command-line flag, for example:


   talos.tcl -old -in csObs.tab
   vina.tcl  -old -in csObs.tab -ref xray.pdb -AUTO
   rama.tcl  -old -in csObs.tab -ref xray.pdb -ras -sd

If the -old command-line flag is not used, the newer TALOS+ versions will be executed instead. Note that the VINA application is no longer used in the TALOS+ system.

The original TALOS web page can be found here: https://spin.niddk.nih.gov/bax-apps/software/TALOSORIG


last update: May 23 2012 / sy