Important Note: This web page describes the original version of TALOS, which has been replaced by the improved TALOS+ system, now the default version. The older version of TALOS is still provided along with TALOS+ for backward compatibility. For information about the newer, improved version, TALOS+, see:

      https://spin.niddk.nih.gov/bax-apps/software/TALOS

The components of the older version of TALOS can now be accessed by including the -old option on the command line; if the -old option is not included, components from the newer TALOS+ will be used instead:


     talos.tcl -old ...
     vina.tcl  -old ...
     rama.tcl  -old ...

TALOS: Torsion Angle Likelihood Obtained from Shift and sequence similarity

As described in the paper:

Backbone angle restraints from searching a database for chemical shift and sequence homology
Gabriel Cornilescu, Frank Delaglio, and Ad Bax
J. Biomol. NMR, 13, 289-302 (1999).

The TALOS software is part of the NMRPipe package; download information can be found at: https://spin.niddk.nih.gov/bax-apps/NMRPipe

The current TALOS database includes 186 proteins, providing more than 24,000 residue triplets.

What is TALOS?
Reliability of TALOS
Components of the TALOS System
How to Use TALOS
Preparing the Input Shift Data
Inspecting and Refining the Prediction Results
How to Select "Good" Predictions
Adding New Proteins to the Database
About the Name TALOS

What is TALOS?

TALOS is a database system for empirical prediction of phi and psi backbone torsion angles using a combination of five kinds (HA, CA, CB, CO, N) of chemical shift assignments for a given protein sequence. The TALOS approach is an extension of the well-known observation that many kinds of secondary chemical shifts (i.e. differences between chemical shifts and their corresponding random coil values) are highly correlated with aspects of protein secondary structure. The goal of TALOS is to use secondary shift and sequence information in order to make quantitative predictions for the protein backbone angles phi and psi, and to provide a measure of the uncertainties in these predictions.

[Figure: CA and CB chemical shift distributions]

TALOS uses the secondary shifts of a given residue to predict phi and psi angles for that residue. TALOS also includes the information from the next and previous residues when making predictions for a given residue. So, in practice, TALOS uses data for three consecutive residues simultaneously (i.e. 15 total secondary shifts and 3 residue types) to make predictions for the central residue in a triplet.
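
The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration: the random coil values are rounded placeholders rather than the contents of randcoil.tab, the observed shifts are made up, and the function names are hypothetical.

```python
# Placeholder random coil shifts in ppm, for illustration only;
# these are NOT the values stored in randcoil.tab.
RANDOM_COIL = {
    "A": {"HA": 4.3, "CA": 52.5, "CB": 19.0, "C": 177.5, "N": 124.0},
    "L": {"HA": 4.3, "CA": 55.0, "CB": 42.0, "C": 177.0, "N": 122.0},
    "V": {"HA": 4.1, "CA": 62.0, "CB": 33.0, "C": 176.0, "N": 120.0},
}
ATOMS = ("HA", "CA", "CB", "C", "N")  # the five shift types used per residue


def secondary_shifts(resname, observed):
    """Observed minus random coil shift, for each atom type that was assigned."""
    coil = RANDOM_COIL[resname]
    return {atom: observed[atom] - coil[atom] for atom in ATOMS if atom in observed}


def triplet_vector(residues):
    """Flatten three consecutive residues into 15 secondary shifts.

    Unassigned shifts are represented by None.
    """
    if len(residues) != 3:
        raise ValueError("a triplet needs exactly three residues")
    vector = []
    for resname, observed in residues:
        sec = secondary_shifts(resname, observed)
        vector.extend(sec.get(atom) for atom in ATOMS)
    return vector


# A made-up A-L-V triplet; the N shift of the central residue is unassigned.
triplet = [
    ("A", {"HA": 4.1, "CA": 54.5, "CB": 18.5, "C": 179.5, "N": 122.0}),
    ("L", {"HA": 4.0, "CA": 57.5, "CB": 41.5, "C": 178.5}),
    ("V", {"HA": 3.6, "CA": 65.0, "CB": 31.5, "C": 177.5, "N": 119.0}),
]
vector = triplet_vector(triplet)
residue_types = "".join(name for name, _ in triplet)
print(residue_types, vector)
```

The 15-element vector together with the three residue types is the information that is compared against each database triplet.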

The idea behind TALOS is that if one can find some triplet of residues in a protein of known structure with similar secondary shifts and sequence to a triplet in a target protein, then the phi and psi angles in the known structure will be useful predictors for the angles in the target.

[Figure: protein backbone cartoon]

The similarity is measured with a score based on the weighted sum of squared differences between the shifts in the target protein and the database entries, so that lower scores indicate higher similarity. In order to take advantage of the correlations between residue type and secondary structure, the score also includes a small, qualitative residue type term which biases the matching towards roughly similar sequences.
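
A minimal sketch of a score of this general form is given below. It is not the TALOS implementation: the weights, the residue type penalty of 0.1, the decision to skip unassigned shifts, and all names are assumptions made for illustration. The actual weighting and homology factors are read from the TALOS tables.

```python
def simple_type_penalty(a, b):
    """Hypothetical residue type term: zero for identical types, small otherwise."""
    return 0.0 if a == b else 0.1


def similarity_score(target, candidate, weights, target_types, candidate_types,
                     type_penalty=simple_type_penalty):
    """Weighted sum of squared shift differences plus a residue type term.

    target and candidate are equal-length lists of secondary shifts, with
    None for unassigned values. Lower scores mean higher similarity.
    """
    shift_term = 0.0
    for t, c, w in zip(target, candidate, weights):
        if t is None or c is None:
            continue  # assumption: a missing shift contributes nothing
        shift_term += w * (t - c) ** 2
    type_term = sum(type_penalty(a, b) for a, b in zip(target_types, candidate_types))
    return shift_term + type_term


# Short toy vectors instead of the full 15 secondary shifts.
score = similarity_score(
    target=[1.0, None, 2.0],
    candidate=[0.0, 5.0, 4.0],
    weights=[1.0, 1.0, 0.5],
    target_types="AL",
    candidate_types="AV",
)
print(round(score, 3))
```

Here the shift term is 1.0*(1.0)^2 + 0.5*(2.0)^2 = 3.0, and the single residue type mismatch adds 0.1.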

In practice, TALOS searches a database for the 10 best matches to a given triplet in the target protein. If these 10 matches indicate consistent values for phi and psi, then their averages and standard deviations are used as a prediction. However, if the 10 best matches have mutually inconsistent values of phi and psi, the matches are declared ambiguous, and no prediction is made for the central residue. In the TALOS approach, an initial classification of good vs ambiguous is performed automatically, and the classifications are then adjusted interactively through a graphical interface which is part of the TALOS system.
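
The search for the 10 best matches amounts to ranking every database triplet by its score and keeping the lowest ones. The sketch below shows this with a toy database and a plain squared distance standing in for the real score; the entry layout and the names are hypothetical.

```python
import heapq


def squared_distance(a, b):
    """Stand-in score: unweighted sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matches(target_shifts, database, count=10):
    """Return the `count` database entries with the lowest (best) scores."""
    return heapq.nsmallest(
        count, database,
        key=lambda entry: squared_distance(target_shifts, entry["shifts"]),
    )


# Toy database of 25 entries; the shifts and angles are made up.
database = [
    {"id": i, "shifts": [0.1 * i, -0.05 * i], "phi": -60.0 - i, "psi": -45.0 + i}
    for i in range(25)
]
matches = best_matches([0.0, 0.0], database)
match_ids = [entry["id"] for entry in matches]
print(match_ids)
```

The phi and psi values carried by the returned entries are what is then inspected for consistency.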

The TALOS database, while small, was constructed using the most well-defined parts of high resolution (2.2 Angstroms or better) X-ray crystal structures to define the phi and psi angles. It originally included data from 21 proteins, representing around 3,000 triplets. The current database includes data from 186 proteins, representing over 24,000 triplets.

Reliability of TALOS

The reliability of the TALOS approach was tested by a cross-validation procedure where each protein was temporarily removed from the database, and its phi and psi angles were predicted using the remaining protein data. For the purposes of testing, a prediction was considered "good" if it fell in the same well-populated region of the Ramachandran map as the phi and psi values from the crystal structure. Conversely, a prediction was considered "bad" or incorrect if it fell in a different region from the crystal structure. According to the tests:

1. TALOS makes no predictions for 20% to 45% of the residues in a protein.

2. TALOS makes predictions for about 72% of the residues on average.

3. In 45 out of 186 proteins studied, the TALOS results included no bad predictions ("bad" meaning substantially different from the crystal structure).

4. (IMPORTANT!) Over all 186 proteins, about 1.8% of the predictions made by TALOS were incorrect relative to the corresponding crystal structure.

5. On average, the uncertainty as reported by TALOS for the consensus predictions was 13.5 degrees for phi, and 12.2 degrees for psi.
The actual RMSD of the "correct" predictions relative to the crystal structures was about 12.9 degrees for phi, and 12.4 degrees for psi.

As noted in (4) above, it must be remembered that TALOS will produce a small number of predictions which seem to be valid (because the best matches from the database are consistent) but which are nevertheless in error.
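
To put these averages in perspective, the short calculation below estimates how many predictions, and how many wrong predictions, might be expected for a protein of a given size. It simply multiplies the average figures quoted above, so it is a rough guide only; the function name is hypothetical.

```python
def expected_counts(n_residues, predicted_fraction=0.72, error_rate=0.018):
    """Rough expectation from the cross-validation averages quoted above.

    Returns (expected number of predictions, expected number of wrong ones).
    """
    predicted = n_residues * predicted_fraction
    wrong = predicted * error_rate
    return predicted, wrong


# Ubiquitin has 76 residues: roughly 55 predictions, about 1 of them wrong.
predicted, wrong = expected_counts(76)
print(round(predicted, 1), round(wrong, 2))
```

Even a low error rate therefore translates into about one erroneous restraint for a small protein, which is why the predictions should be inspected before use.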

It should also be noted that the tests above included only the most well-defined parts of each protein; roughly 6% of the residues had first been discounted because they had high B factors in the crystal structure or because they were known to be mobile in solution. Therefore, TALOS results for other cases may be less reliable than indicated by the tests, especially for proteins with flexible regions.

Components of the TALOS System

The TALOS system is implemented using NMRWish, a version of the Tcl/Tk window shell "wish" which has been customized to include spectral display and analysis facilities as well as a database engine for manipulation of spectral information and molecular coordinates. NMRWish is a companion to the NMRPipe System for multidimensional spectral processing and analysis.

There are three NMRWish Tcl scripts comprising the TALOS system:

TALOS (talos.tcl -old)

Searches the database for shift matches.

VINA (vina.tcl -old)

Performs automated initial summary and classification.

RAMA (rama.tcl -old)

Used to display and refine the predictions.

Any of the scripts can be invoked with the "-help" command-line argument to generate a complete list of options, for example:


   talos.tcl -old -help

The TALOS files are installed into a talos subdirectory of an NMRPipe installation. The NMRPipe initialization commands will establish an environment variable TALOS_DIR which will give the full path to the talos directory. Some files of the TALOS system include:

talos/test/*
    A directory with example chemical shift input data and scripts for a demo of TALOS.

talos/tab/talos.tab
    The compiled database of residue triplets with their corresponding secondary shifts and PHI/PSI values.

talos/tab/randcoil.tab
    The table of random coil shifts used in the prediction process.

talos/tab/homology.tab
    The residue type homology factors used in the prediction process.

talos/tab/weight.tab
    The weighting factors of the 15 secondary shifts used in the prediction process.

talos/rama.gif
    Image of the populated regions of the TALOS database, used as a background for the RAMA Ramachandran plot display.

talos/vina.dat
    NMRPipe format data file outlining populated regions of the TALOS database, used by VINA to determine whether a collection of predictions falls in a consistent region.

talos/shifts/*.tab
    The chemical shift tables for the proteins in the database. The shift table format is the same used for prediction input, as described below. The sequence and residue numbering in the shift tables must be exactly consistent with the corresponding structures in the TALOS pdb directory. Furthermore, the names of these shift files must be exactly consistent with the corresponding structures in the TALOS pdb directory. The files in this directory are only used when compiling a new database (e.g. adding new proteins into the database). When compiling a new database, only shift tables ending with the ".tab" extension will be used.

talos/pdb/*.pdb
    The PDB structures for the proteins in the database. The sequence and residue numbering must be exactly consistent with the corresponding assignments in the TALOS shifts directory. Furthermore, the names of these files must be exactly consistent with the corresponding chemical shift tables in the TALOS shifts directory. The files in this directory are only used when compiling a new database (e.g. adding new proteins into the database).

talos/bin/TALOS.*
    The binary executables used for "fast" versions of the TALOS database search. If no suitable executable is available, TALOS will perform its database search via Tcl script, which takes longer, but produces similar results.
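
As a convenience, a quick check that an installation contains some of the files listed above can be sketched as follows. The helper is hypothetical and is not part of TALOS; it assumes that TALOS_DIR points at the talos directory itself, as described above, and the demonstration runs against a temporary directory rather than a real installation.

```python
import os
import tempfile
from pathlib import Path

# A subset of the files listed above, relative to the talos directory.
EXPECTED = ("tab/talos.tab", "tab/randcoil.tab", "tab/homology.tab",
            "rama.gif", "vina.dat")


def missing_talos_files(talos_dir=None):
    """Return the expected files that are absent from the talos directory."""
    if talos_dir is None:
        talos_dir = os.environ.get("TALOS_DIR")
        if not talos_dir:
            raise RuntimeError("TALOS_DIR is not set; run the NMRPipe initialization first")
    root = Path(talos_dir)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]


# Demonstration on a temporary directory containing only tab/talos.tab.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "tab").mkdir()
    (Path(tmp) / "tab" / "talos.tab").touch()
    demo_missing = missing_talos_files(tmp)
print(demo_missing)
```

Calling missing_talos_files() with no argument checks the directory named by TALOS_DIR instead.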

How to Use TALOS

Use of TALOS to predict phi and psi angles involves the following steps:

1. Create a directory for the prediction session; all subsequent commands will be executed from this directory.

2. Prepare the input table of shift assignments (for example "myshifts.tab"), according to the format given below.

3. Run TALOS (talos.tcl -old) to perform the database searches. Most commonly, this will simply require a command such as:

   talos.tcl -old -in myshifts.tab

During the database search, a series of files "pred/res*.tab" will be created. Each one of these files tallies the 10 best database matches for a given residue in the target protein. Before exiting, a file "pred.tab" will also be created, which includes an initial summary of the prediction results. The database search will typically take about 10-60 sec per residue in the target.

4. Run VINA (vina.tcl -old) to summarize the results. Most commonly, this will be done with one of the following commands, depending on whether a proposed structure is available:

   vina.tcl -old -in myshifts.tab -auto
   vina.tcl -old -in myshifts.tab -ref mystruct.pdb -auto

This will adjust the individual files "pred/res*.tab" to identify outliers in the database matches. It will also prepare a new summary file "pred.tab". Note that this step is optional, since the TALOS database search already produces an initial summary using VINA.

5. Run RAMA (rama.tcl -old) to inspect and adjust the predictions. The simplest invocations are:

   rama.tcl -old -in myshifts.tab
   rama.tcl -old -in myshifts.tab -ref mystruct.pdb

During this inspection, you will:

Examine the phi/psi distributions of the best 10 database matches for a given residue, and decide which ones should be included in the prediction, and which are "outliers". (NOTE: in most cases, the initial automated classifications performed by the current version of the VINA program should be acceptable with no manual adjustment needed).

Classify the results for a given residue as "Good", "Ambiguous", or (if a reference structure is known) "Bad".

The files "pred/res*.tab" will be adjusted along the way to reflect any changes made interactively, and a new "pred.tab" summary file will be created on exiting. When the above steps are completed, the final "pred.tab" file will include the classification ("Good" etc) and predictions (averages and standard deviations) for phi and psi at each residue.

6. Convert TALOS results to other formats, for use as structural restraints, etc. NMRPipe includes scripts such as "talos2dyn.tcl" for this purpose.
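
Steps 3 to 5 can be collected in a small driver script. The sketch below only assembles the command lines, using the options shown in the steps above and nothing else; the helper name is hypothetical. To actually run a command, pass it to subprocess.run(cmd, check=True) from the session directory, keeping in mind that RAMA is interactive.

```python
import shlex


def talos_commands(shift_table, ref_pdb=None):
    """Build the command lines for steps 3 to 5 as argument lists.

    Only options that appear in the steps above are used.
    """
    ref = ["-ref", ref_pdb] if ref_pdb else []
    return [
        ["talos.tcl", "-old", "-in", shift_table],
        ["vina.tcl", "-old", "-in", shift_table, *ref, "-auto"],
        ["rama.tcl", "-old", "-in", shift_table, *ref],
    ]


cmds = talos_commands("myshifts.tab", "mystruct.pdb")
for cmd in cmds:
    print(shlex.join(cmd))  # shlex.join requires Python 3.8 or later
```

Omitting the second argument produces the variants without the "-ref" option.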

Preparing the Input Shift Data

NOTE WELL!

The input shift table should be prepared carefully, so that it has the proper format, naming conventions, and shift referencing.

The 13C chemical shifts for CA, CB, and CO used as input for TALOS should be referenced relative to TSP.

The 15N chemical shifts used as input for TALOS should be referenced relative to liquid ammonia at 25 degrees C.

An example of the required shift table format is shown below. Complete examples can be found in the talos/shifts and talos/test directories. Specifically:

In the current version of TALOS, residue numbering must begin at 1.

The protein sequence should be given as shown by one or more "DATA SEQUENCE" lines. Space characters in the sequence will be ignored. Use "c" for oxidized CYS (CB ~ 42.5 ppm) and "C" for reduced CYS (CB ~ 28 ppm).

The table must include columns for residue ID, one-character residue name, atom name, and chemical shift.

The table must include a "VARS" line which labels the corresponding columns of the table.

The table must include a "FORMAT" line which defines the data type of the corresponding columns of the table.

Atom names are always given exactly as:

HA for H-alpha of all residues except glycine

HA2 for the first H-alpha of glycine residues

HA3 for the second H-alpha

C for C' (CO)

CA for C-alpha

CB for C-beta

N for N-amide

As noted, there is an exception for naming glycine assignments, which should use HA2 and HA3 instead of HA. In the case of glycine HA2/HA3 assignments, TALOS will use the average value of the two, so that it is not necessary to have these specifically assigned; for use of TALOS, the assignment can be arbitrary. Note however that the assignment must be given exactly as either "HA2" or "HA3" rather than "HA2|HA3" etc.

Other types of assignments may be present in the shift table; they will be ignored.
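
The atom naming rules above, including the glycine HA2/HA3 averaging and the dropping of other assignments, can be illustrated as follows. This is a sketch of the stated rules rather than TALOS code: the function name is hypothetical, and the use of a single value when only one of HA2/HA3 is present is an assumption.

```python
USED_ATOMS = {"HA", "HA2", "HA3", "C", "CA", "CB", "N"}


def talos_shifts(assignments):
    """Reduce one residue's {atom name: shift} map to the shifts TALOS uses.

    Other atom names are dropped. Glycine HA2/HA3 are replaced by their
    average under the key "HA" (assumption: if only one of the two is
    present, that value is used on its own).
    """
    used = {name: shift for name, shift in assignments.items() if name in USED_ATOMS}
    gly = [used.pop(name) for name in ("HA2", "HA3") if name in used]
    if gly and "HA" not in used:
        used["HA"] = sum(gly) / len(gly)
    return used


# Gly 10 of the ubiquitin example, plus a hypothetical amide proton entry
# ("H") that is not one of the atom names used and is therefore dropped.
gly10 = {"N": 108.89, "HA2": 4.35, "HA3": 3.61, "C": 174.07, "CA": 45.46, "H": 7.80}
reduced = talos_shifts(gly10)
print(sorted(reduced), round(reduced["HA"], 2))
```

Because only the average is used, swapping the HA2 and HA3 assignments gives the same result.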

Example shift table (excerpts):

   REMARK Ubiquitin input for TALOS, HA2/HA3 assignments arbitrary.

   DATA SEQUENCE MQIFVKTLTG KTITLEVEPS DTIENVKAKI QDKEGIPPDQ QRLIFAGKQL
   DATA SEQUENCE EDGRTLSDYN IQKESTLHLV LRLRGG

   VARS   RESID RESNAME ATOMNAME SHIFT
   FORMAT %4d   %1s     %4s      %8.3f

     1 M           HA                  4.23
     1 M           C                 170.54
     1 M           CA                 54.45
     1 M           CB                 33.27
     2 Q           N                 123.22
     2 Q           HA                  5.25
     2 Q           C                 175.92
     2 Q           CA                 55.08
     2 Q           CB                 30.76
   ...
    10 G           N                 108.89
    10 G           HA2                 4.35
    10 G           HA3                 3.61
    10 G           C                 174.07
    10 G           CA                 45.46
   ...
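
For checking a table before running TALOS, a simplified reader for this format can be sketched in Python. It is not the NMRPipe table parser: it assumes whitespace-separated fields, ignores the FORMAT line, and uses hypothetical function names. The embedded text repeats the excerpt above without the elided lines.

```python
EXAMPLE = """\
REMARK Ubiquitin input for TALOS, HA2/HA3 assignments arbitrary.

DATA SEQUENCE MQIFVKTLTG KTITLEVEPS DTIENVKAKI QDKEGIPPDQ QRLIFAGKQL
DATA SEQUENCE EDGRTLSDYN IQKESTLHLV LRLRGG

VARS   RESID RESNAME ATOMNAME SHIFT
FORMAT %4d   %1s     %4s      %8.3f

  1 M           HA                  4.23
  1 M           C                 170.54
  2 Q           N                 123.22
 10 G           HA2                 4.35
 10 G           HA3                 3.61
"""


def parse_shift_table(text):
    """Return (sequence, rows); rows are (resid, resname, atomname, shift)."""
    sequence, columns, rows = "", None, []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("REMARK", "FORMAT"):
            continue
        if fields[:2] == ["DATA", "SEQUENCE"]:
            sequence += "".join(fields[2:])  # spaces in the sequence are ignored
        elif fields[0] == "DATA":
            continue  # other DATA lines are not needed here
        elif fields[0] == "VARS":
            columns = fields[1:]
        else:
            if columns is None:
                raise ValueError("VARS line must come before the data rows")
            rec = dict(zip(columns, fields))
            rows.append((int(rec["RESID"]), rec["RESNAME"],
                         rec["ATOMNAME"], float(rec["SHIFT"])))
    return sequence, rows


def check_rows(sequence, rows):
    """List rows whose residue number or name disagrees with the sequence."""
    problems = []
    for resid, resname, atom, _ in rows:
        if not 1 <= resid <= len(sequence):  # numbering must begin at 1
            problems.append(f"{resid} {atom}: residue number out of range")
        elif sequence[resid - 1] != resname:
            problems.append(f"{resid} {atom}: {resname} != {sequence[resid - 1]}")
    return problems


sequence, rows = parse_shift_table(EXAMPLE)
problems = check_rows(sequence, rows)
print(len(sequence), len(rows), problems)
```

An empty problem list means that every row agrees with the sequence given on the "DATA SEQUENCE" lines.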

Inspecting and Refining the Prediction Results

The final step in interpreting the results of the TALOS database search is to inspect and classify the matches so that useful predictions can be formed; however, in most cases, the initial automated classifications performed by the current version of the VINA program should be acceptable with no manual adjustment needed.

The refinement of predictions is done via the graphical interface RAMA. The simplest invocation of RAMA is:

   rama.tcl -old -in myshifts.tab

If a proposed structure is available, first invoke VINA to update the prediction summary:

   vina.tcl -old -in myshifts.tab -ref mystruct.pdb -auto

Then, invoke RAMA so that the reference structure is included in the display of prediction data:

   rama.tcl -old -in myshifts.tab -ref mystruct.pdb

The various windows displayed by RAMA are shown below.

[Figure: TALOS Sequence Window]

Sequence Window: displays the target protein sequence, with each residue colorized according to its classification. Clicking on a residue with the mouse will select that residue for display and analysis in the other windows. The residues are colored according to this scheme:

Green Good prediction (at most one outlier)

Yellow Ambiguous; no prediction

Red Bad prediction relative to a known structure

Gray No classification yet

[Figure: TALOS Prediction Window]

Prediction Window: lists the statistics of the 10 best database matches for the currently selected residue in the target protein. The individual entries in this window can be toggled by a mouse click, to include or remove a particular match from the prediction.

[Figure: TALOS Ramachandran Window]

Ramachandran Window: graphs the phi/psi distributions of the 10 best database matches for the currently selected residue. It also graphs the average and standard deviation of phi and psi for those matches which are selected (i.e. included in the prediction). The graph is shaded to show the most populated regions of the TALOS database.

In the graph, each match from the database is drawn as a small square at a particular phi/psi coordinate. The individual squares can be toggled by a mouse click, to include or remove the corresponding match from the prediction. The squares are colored according to this scheme:

Green This match is included in the prediction.

Red Outlier; not included in the prediction

Blue Reference (phi/psi taken from "-ref" structure)

The Ramachandran window also includes buttons to reclassify the overall prediction as "Good", "Ambiguous", etc., and to move to the next or previous residue in the sequence.

[Figure: TALOS Secondary Chemical Shift Window]

Secondary Shift Window: (optional with the "-sd" argument) graphs the secondary shift distributions of the 10 best database matches for the currently selected residue.

[Figure: RasMol Window]

Molecular Viewer Window: (optional with the "-ras" argument) Displays the three-dimensional structure given by the "-ref" argument, colorized according to the residue classification scheme above. This option assumes that the program RasMol is available to use as a viewer.

For more about the RasMol molecular viewer program, see:

Roger Sayle and E. James Milner-White
RasMol: Biomolecular graphics for all
Trends in Biochemical Sciences (TIBS), September 1995, Vol. 20, No. 9, p. 374.
http://www.umass.edu/microbio/rasmol

How to Select "Good" Predictions

The empirical "rules of thumb" for defining "Good" predictions follow; all other cases are considered "Ambiguous":

If all 10 best database matches fall in a "consistent" region of the Ramachandran map, the prediction is classified as "Good".

If 9 out of 10 of the best database matches fall in a consistent region with phi < 0, and the one outlier also lies in the phi < 0 half of the map, the prediction is classified as "Good".

If 9 out of 10 of the best database matches fall in a consistent region with phi > 0, the prediction is classified as "Good".

In the above rules, "consistent" means in or near the same well-populated region of the Ramachandran map. In 99% of the cases tested, this means that the standard deviations of phi and psi for the selected group of matches are 35 degrees or less.
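
The three rules can be written down as a small function. The sketch below follows the rules exactly as stated above and takes the decision about which matches lie in the consistent region as an input, since that decision depends on the populated regions of the database; the function and argument names are hypothetical.

```python
def classify(matches, in_region):
    """Apply the rules of thumb above to the 10 best matches.

    matches   : list of 10 (phi, psi) pairs in degrees
    in_region : parallel list of booleans, True where the match lies in the
                consistent region (supplied by the caller in this sketch)
    """
    if len(matches) != 10 or len(in_region) != 10:
        raise ValueError("expected exactly 10 matches")
    inside = [m for m, ok in zip(matches, in_region) if ok]
    outside = [m for m, ok in zip(matches, in_region) if not ok]
    if not outside:
        return "Good"  # rule 1: all 10 matches are consistent
    if len(outside) == 1:
        outlier_phi = outside[0][0]
        if all(phi < 0 for phi, _ in inside) and outlier_phi < 0:
            return "Good"  # rule 2: region and outlier both have phi < 0
        if all(phi > 0 for phi, _ in inside):
            return "Good"  # rule 3: consistent region has phi > 0
    return "Ambiguous"


helix = [(-60.0, -45.0)] * 9
nine_true = [True] * 9 + [False]
print(classify(helix + [(-58.0, -47.0)], [True] * 10))   # all consistent
print(classify(helix + [(-120.0, 130.0)], nine_true))    # outlier with phi < 0
print(classify(helix + [(60.0, 45.0)], nine_true))       # outlier with phi > 0
```

With the made-up angles above, the first two cases are classified "Good" and the third "Ambiguous".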

NOTE WELL!

When inspecting the phi/psi graphs to decide if matches are in a consistent region, keep in mind their "periodic" nature; i.e. angles at one edge of the graph are actually close to angles at the opposite edge.
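
The effect of periodicity on averages and standard deviations can be seen with a short calculation. The sketch below uses a standard circular mean and wraps each deviation into the range -180 to 180 degrees; it illustrates the point and is not necessarily how TALOS computes its statistics.

```python
import math


def circular_mean_deg(angles):
    """Mean direction of angles in degrees, returned in (-180, 180]."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))


def angular_std_deg(angles):
    """RMS deviation from the circular mean, each deviation wrapped to [-180, 180)."""
    mean = circular_mean_deg(angles)
    devs = [(a - mean + 180.0) % 360.0 - 180.0 for a in angles]
    return math.sqrt(sum(d * d for d in devs) / len(devs))


# Made-up psi values that straddle the +180/-180 edge of the map.
psi = [175.0, -175.0, 180.0, -170.0]
naive_mean = sum(psi) / len(psi)   # 2.5, far from every individual value
mean = circular_mean_deg(psi)      # close to -177.5
spread = angular_std_deg(psi)      # a few degrees
print(naive_mean, round(mean, 1), round(spread, 1))
```

A plain arithmetic mean places these four tightly clustered angles on the opposite side of the map, while the circular treatment shows a spread well below the 35 degree guideline.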

Adding New Proteins to the Database

New protein chemical shift and structure data can be added to the database. Note well that this should be done with great care and caution, to ensure that only reliable phi/psi data with consistently referenced and correct chemical shifts are included. Given this, the procedure for adding new proteins to the TALOS database is simple:

Create a chemical shift table for the new protein according to the format listed above. Copy the table to the "talos/shifts" directory; it must have a ".tab" extension in order to be used.

Place the corresponding PDB structure file into the "talos/pdb" directory; it must have a ".pdb" extension, and its file name, sequence, and residue numbering must correspond exactly with the shift table.

In the "talos" directory, execute the following command to compile a new database:

   talos.tcl -old -compile
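
Before compiling, it can be worth confirming that every shift table has a matching structure file. The helper below is a hypothetical sketch that assumes "exactly consistent" names means the same file name apart from the extension; it does not compare sequences or residue numbering. The demonstration uses a temporary directory with made-up file names.

```python
import tempfile
from pathlib import Path


def unmatched_entries(talos_dir):
    """Compare base names of shifts/*.tab and pdb/*.pdb.

    Returns (shift tables without a structure, structures without a table).
    """
    root = Path(talos_dir)
    shifts = {p.stem for p in (root / "shifts").glob("*.tab")}
    pdbs = {p.stem for p in (root / "pdb").glob("*.pdb")}
    return sorted(shifts - pdbs), sorted(pdbs - shifts)


# Demonstration with made-up file names in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "shifts").mkdir()
    (root / "pdb").mkdir()
    for name in ("shifts/ubq.tab", "shifts/extra.tab", "shifts/notes.txt",
                 "pdb/ubq.pdb", "pdb/orphan.pdb"):
        (root / name).touch()
    no_structure, no_shifts = unmatched_entries(root)
print(no_structure, no_shifts)
```

Files without the ".tab" or ".pdb" extension, such as notes.txt here, are not considered, in line with the extension rules above.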

About the Name TALOS

In Greek mythology, Talos was a powerful artificial man of bronze forged by the god Hephaestus to guard the island of Crete. According to myth, Talos would hurl boulders at passing ships, and would destroy offenders by clutching them to his breast and then jumping into a fire.

The illustration used in the TALOS program window is of course unrelated to the Greek myth. It shows Star Trek's Capt. Christopher Pike on a visit to the planet Talos IV. In the illustration, the inhabitants of Talos IV are using their considerable mental powers to make Capt. Pike experience an unpleasant scene from his own imagination.

last update: Sep 13 2010 / fd
