Important Note: This web page describes the original version of TALOS, which has
been replaced by the improved TALOS+ system, now the
default version. The older version of TALOS is provided
along with TALOS+ for backward compatibility. This web page
describes the use of the older version of TALOS.
For information about the newer, improved version, TALOS+, see:
https://spin.niddk.nih.gov/bax-apps/software/TALOS
The components of the older version of TALOS can now be accessed by including the -old option on the command line; if the -old
option is not included, components from the newer TALOS+ will be used
instead:
talos.tcl -old ...
vina.tcl -old ...
rama.tcl -old ...
TALOS: Torsion Angle Likelihood Obtained from Shift and sequence similarity
As described in the paper: "Backbone angle restraints from searching a database for chemical shift and sequence homology".
The TALOS software is part of the NMRPipe package; download information can be found at:
https://spin.niddk.nih.gov/bax-apps/NMRPipe
The current TALOS database
includes 186 proteins, providing more than 24,000 residue triplets.
TALOS uses the secondary shifts of a given residue to predict phi and psi angles for that residue. TALOS also includes the information from the next and previous residues when making predictions for a given residue. So, in practice, TALOS uses data for three consecutive residues simultaneously (i.e. 15 total secondary shifts and 3 residue types) to make predictions for the central residue in a triplet.
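As a rough illustration of the triplet idea, the Python sketch below forms the 15-value secondary-shift vector for one triplet: observed minus random coil shift for N, HA, CA, CB, and C' of the previous, central, and next residues. The random coil values are placeholders invented for this example, not the contents of talos/tab/randcoil.tab, and representing missing assignments as None is an assumption about how gaps might be handled.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Five backbone shifts per residue; "C" is C' (CO), following the shift-table atom names.
ATOMS = ("N", "HA", "CA", "CB", "C")

# Placeholder random coil shifts (ppm) for illustration only; TALOS itself reads
# its own values from talos/tab/randcoil.tab, which are not reproduced here.
RANDOM_COIL: Dict[str, Dict[str, float]] = {
    "M": {"N": 120.0, "HA": 4.50, "CA": 55.0, "CB": 32.5, "C": 176.0},
    "Q": {"N": 120.0, "HA": 4.30, "CA": 56.0, "CB": 29.0, "C": 176.0},
    "I": {"N": 120.0, "HA": 4.20, "CA": 61.0, "CB": 38.5, "C": 176.0},
}

Residue = Tuple[str, Dict[str, float]]  # (one-letter name, {atom: observed shift})


def secondary_shifts(resname: str, observed: Dict[str, float]) -> List[Optional[float]]:
    """Observed minus random coil shift for each atom; None when either value is missing."""
    rc = RANDOM_COIL.get(resname, {})
    result: List[Optional[float]] = []
    for atom in ATOMS:
        if atom in observed and atom in rc:
            result.append(round(observed[atom] - rc[atom], 3))
        else:
            result.append(None)
    return result


def triplet_vector(triplet: Sequence[Residue]) -> List[Optional[float]]:
    """Concatenate the secondary shifts of (previous, central, next): 3 x 5 = 15 values."""
    if len(triplet) != 3:
        raise ValueError("expected exactly three residues")
    vec: List[Optional[float]] = []
    for resname, observed in triplet:
        vec.extend(secondary_shifts(resname, observed))
    return vec


# Residues 1 and 2 use shifts from the ubiquitin example table; residue 3 (I) has none here.
triplet = [
    ("M", {"HA": 4.23, "C": 170.54, "CA": 54.45, "CB": 33.27}),
    ("Q", {"N": 123.22, "HA": 5.25, "C": 175.92, "CA": 55.08, "CB": 30.76}),
    ("I", {}),
]
vec = triplet_vector(triplet)
print(len(vec))  # 15
print(vec[5:10])  # secondary shifts of the central residue (Q)
```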
The idea behind TALOS is that if one can
find some triplet of residues in a protein of known structure with similar
secondary shifts and sequence to a triplet in a target protein, then the
phi and psi angles in the known structure will be useful predictors for
the angles in the target.
The similarity is measured with a score based on the weighted sum of squared differences between the shifts in the target protein and the database entries, so that lower scores indicate higher similarity. In order to take advantage of the correlations between residue type and secondary structure, the score also includes a small, qualitative residue type term which biases the matching towards roughly similar sequences.
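The score can be sketched in Python as below. This is an illustration, not the TALOS code: the per-shift weights would correspond to talos/tab/weight.tab and the residue-type term to talos/tab/homology.tab, but the simple_penalty function and its 0.2 value are hypothetical stand-ins, and skipping missing shifts is an assumption.

```python
from typing import Callable, Optional, Sequence


def triplet_score(
    target_shifts: Sequence[Optional[float]],  # 15 secondary shifts of the target triplet
    db_shifts: Sequence[Optional[float]],      # 15 secondary shifts of a database triplet
    weights: Sequence[float],                  # 15 weighting factors
    target_seq: str,                           # e.g. "MQI"
    db_seq: str,
    residue_penalty: Callable[[str, str], float],
) -> float:
    """Weighted sum of squared shift differences plus a residue-type term; lower = more similar."""
    if not (len(target_shifts) == len(db_shifts) == len(weights) == 15):
        raise ValueError("expected 15 shifts and 15 weights")
    shift_term = 0.0
    for t, d, w in zip(target_shifts, db_shifts, weights):
        if t is not None and d is not None:  # skip missing assignments (assumption)
            shift_term += w * (t - d) ** 2
    residue_term = sum(residue_penalty(a, b) for a, b in zip(target_seq, db_seq))
    return shift_term + residue_term


def simple_penalty(a: str, b: str) -> float:
    """Hypothetical residue-type term: 0 for identical residues, a small constant otherwise."""
    return 0.0 if a == b else 0.2


# One shift differs by 1 ppm (weight 1) and one residue differs (I vs V).
print(triplet_score([0.5] * 15, [0.5] * 14 + [1.5], [1.0] * 15, "MQI", "MQV", simple_penalty))
```

Keeping the residue term small relative to the shift term reflects the description above: sequence only biases the matching, while the shifts dominate.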
In practice, TALOS searches a database for the 10 best matches to a given triplet in the target protein. If these 10 matches indicate consistent values for phi and psi, then their averages and standard deviations are used as a prediction. However, if the 10 best matches have mutually inconsistent values of phi and psi, the matches are declared ambiguous, and no prediction is made for the central residue. In the TALOS approach, an initial classification of good vs ambiguous is performed automatically, and the classifications are then adjusted interactively through a graphical interface which is part of the TALOS system.
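To make the consensus step concrete, here is a minimal Python sketch: keep the 10 lowest-scoring matches, average phi and psi, and declare the residue ambiguous if the matches disagree. The (score, phi, psi) tuple layout is an assumption, and "consistent" is approximated by the 35-degree standard deviation rule of thumb from the "How to Select "Good" Predictions" section; the actual TALOS/VINA classification, which also considers outliers and populated regions of the Ramachandran map, is more involved. Angles are averaged as unit vectors so that values near +180 and -180 degrees are treated as neighbors.

```python
import heapq
import math
from typing import List, Optional, Sequence, Tuple

Match = Tuple[float, float, float]  # (score, phi, psi), angles in degrees


def wrap(angle: float) -> float:
    """Map an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def circular_mean_std(angles: Sequence[float]) -> Tuple[float, float]:
    """Mean direction from summed unit vectors; spread as the RMS of wrapped deviations."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    mean = math.degrees(math.atan2(s, c))
    deviations = [wrap(a - mean) for a in angles]
    std = math.sqrt(sum(d * d for d in deviations) / len(deviations))
    return mean, std


def predict(matches: Sequence[Match], k: int = 10, max_std: float = 35.0
            ) -> Optional[Tuple[float, float, float, float]]:
    """Return (phi, phi_std, psi, psi_std) from the k best matches, or None if ambiguous."""
    best: List[Match] = heapq.nsmallest(k, matches, key=lambda m: m[0])
    if not best:
        return None
    phi, phi_std = circular_mean_std([m[1] for m in best])
    psi, psi_std = circular_mean_std([m[2] for m in best])
    if phi_std > max_std or psi_std > max_std:
        return None  # mutually inconsistent matches: no prediction
    return phi, phi_std, psi, psi_std


# Hypothetical matches: a helical cluster, then helical hits mixed with beta-like hits.
helix = [(float(i), -63.0 + (i % 3), -42.0) for i in range(12)]
mixed = helix[:5] + [(0.5, -120.0, 130.0)] * 5
print(predict(helix))  # a helical phi/psi prediction
print(predict(mixed))  # None
```

Averaging the raw numbers instead would place a cluster at 179 and -179 degrees near 0, on the opposite side of the map, which is why the sketch uses circular statistics.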
The TALOS database, while small, was constructed using the most well-defined parts of high resolution (2.2 Angstroms or better) X-ray crystal structures to define the phi and psi angles. It originally included data from 21 proteins, representing around 3,000 triplets. The current database includes data from 186 proteins, representing over 24,000 triplets.
It should also be noted that the tests above included only the most well-defined parts of each protein; roughly 6% of the residues had first been discounted because they had high B factors in the crystal structure or because they were known to be mobile in solution. Therefore, TALOS results for other cases may be less reliable than indicated by the tests, especially for proteins with flexible regions.
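There are three NMRWish Tcl scripts comprising the TALOS system:
talos.tcl: TALOS, which performs the database search
vina.tcl: VINA, which summarizes the search results and identifies outliers
rama.tcl: RAMA, the graphical interface for inspecting and adjusting the predictions
For a summary of a script's command-line arguments, use the -help option, for example:
talos.tcl -old -help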
The TALOS files are installed into a talos
subdirectory of
an NMRPipe installation. The NMRPipe initialization commands will
establish an environment variable TALOS_DIR which will give the full
path to the talos
directory.
Some of the files in the TALOS system:
talos/test/*
A directory with example chemical shift input data and scripts for a demo of TALOS.
talos/tab/talos.tab
The compiled database of residue triplets with their corresponding secondary shifts and PHI/PSI values.
talos/tab/randcoil.tab
The table of random coil shifts used in the prediction process.
talos/tab/homology.tab
The residue type homology factors used in the prediction process.
talos/tab/weight.tab
The weighting factors of the 15 secondary shifts used in the prediction process.
talos/rama.gif
Image of the populated regions of the TALOS database, used as a background for the RAMA Ramachandran plot display.
talos/vina.dat
NMRPipe format data file outlining the populated regions of the TALOS database, used by VINA to determine whether a collection of predictions falls in a consistent region.
talos/shifts/*.tab
The chemical shift tables for the proteins in the database. The shift table format is the same as that used for prediction input, as described below. The sequence and residue numbering in the shift tables must be exactly consistent with the corresponding structures in the TALOS pdb directory, and the name of each shift file must be exactly consistent with the name of its corresponding structure file. The files in this directory are only used when compiling a new database (e.g. when adding new proteins to the database); in that case, only shift tables ending with the ".tab" extension will be used.
talos/pdb/*.pdb
The PDB structures for the proteins in the database. The sequence and residue numbering must be exactly consistent with the corresponding assignments in the TALOS shifts directory, and the names of these files must be exactly consistent with the corresponding chemical shift tables in the TALOS shifts directory. The files in this directory are only used when compiling a new database (e.g. when adding new proteins to the database).
talos/bin/TALOS.*
The binary executables used for "fast" versions of the TALOS database search. If no suitable executable is available, TALOS will perform its database search via a Tcl script, which takes longer but produces similar results.
1. Create a directory for the prediction session; all subsequent commands will be executed from this directory.
2. Prepare the input table of shift assignments (for example "myshifts.tab"), according to the format given below.
3. Run TALOS (talos.tcl -old) to perform the database searches. Most commonly, this will simply require a command such as:
talos.tcl -old -in myshifts.tab
During the database search, a series of files "pred/res*.tab" will be created. Each of these files tallies the 10 best database matches for a given residue in the target protein. Before exiting, TALOS also creates a file "pred.tab", which includes an initial summary of the prediction results. The database search typically takes about 10-60 seconds per residue in the target.
4. Run VINA (vina.tcl -old) to summarize the results. Most commonly, this will be done with one of the following commands, depending on whether a proposed structure is available:
vina.tcl -old -in myshifts.tab -auto
vina.tcl -old -in myshifts.tab -ref mystruct.pdb -auto
This will adjust the individual files "pred/res*.tab" to identify outliers in the database matches, and will also prepare a new summary file "pred.tab". Note that this step is optional, since the TALOS database search already produces an initial summary using VINA.
5. Run RAMA (rama.tcl -old) to inspect and adjust the predictions. The simplest invocations are:
rama.tcl -old -in myshifts.tab
rama.tcl -old -in myshifts.tab -ref mystruct.pdb
During this inspection, you will:
6. Convert TALOS results to other formats,
for use as structural restraints, etc. NMRPipe includes scripts such
as "talos2dyn.tcl" for this purpose.
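For scripted runs, steps 3 and 4 can be chained from Python as sketched below. The command lines are exactly those shown in the steps above; the talos_commands and run helpers are hypothetical, the sketch assumes talos.tcl and vina.tcl are on the PATH, and it only prints the commands unless dry_run=False is passed. RAMA (step 5) is an interactive display, so it is left out.

```python
import shlex
import subprocess
from typing import List, Optional


def talos_commands(shift_table: str, ref_pdb: Optional[str] = None) -> List[List[str]]:
    """Build the step 3 (TALOS) and step 4 (VINA) command lines."""
    vina = ["vina.tcl", "-old", "-in", shift_table]
    if ref_pdb:
        vina += ["-ref", ref_pdb]  # include a proposed structure when one is available
    vina.append("-auto")
    return [["talos.tcl", "-old", "-in", shift_table], vina]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False, stopping on failure."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(talos_commands("myshifts.tab", "mystruct.pdb"))
```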
The input shift table should be prepared
carefully, so that it has the proper format, naming conventions, and shift
referencing.
The 13C chemical shifts for CA, CB, and
CO used as input for TALOS should be referenced relative to TSP.
The 15N chemical shifts used as input for
TALOS should be referenced relative to liquid ammonia at 25 degrees C.
An example of the required shift table
format is shown below. Complete examples can be found in the talos/shifts
and talos/test directories. Specifically:
The refinement of predictions is done via
the graphical interface RAMA. The simplest invocation of RAMA is:
Sequence Window: displays the target
protein sequence, with each residue colorized according to its classification.
Clicking on a residue with the mouse will select that residue for display
and analysis in the other windows. The residues are colored according to
this scheme:
Prediction Window: lists the statistics
of the 10 best database matches for the currently selected residue in the
target protein. The individual entries in this window can be toggled by
a mouse click, to include or remove a particular match from the prediction.
Ramachandran Window: graphs the
phi/psi distributions of the 10 best database matches for the currently
selected residue. It also graphs the average and standard deviation of
phi and psi for those matches which are selected (i.e. included in the
prediction). The graph is shaded to show the most populated regions of
the TALOS database.
In the graph, each match from the database
is drawn as a small square at a particular phi/psi coordinate. The individual
squares can be toggled by a mouse click, to include or remove the corresponding
match from the prediction. The squares are colored according to this scheme:
Secondary Shift Window: (optional
with the "-sd" argument) graphs
the secondary shift distributions of the 10 best database matches for the
currently selected residue.
Molecular Viewer Window: (optional
with the "-ras" argument)
Displays the three-dimensional structure given by the "-ref"
argument, colorized according to the residue classification scheme above.
This option assumes that the program RasMol is available to use
as a viewer.
For more about the RasMol molecular
viewer program, see:
NOTE WELL!
When inspecting
the phi/psi graphs to decide
if matches are in a consistent region, keep in mind their "periodic" nature;
i.e. angles at one edge of the graph are actually close to angles at the
opposite edge. For example, phi values of -175 and 175 degrees are only 10 degrees apart.
Preparing the Input Shift Table Data
Example shift table (excerpts):
HA     for H-alpha of all residues except glycine
HA2    for the first H-alpha of glycine residues
HA3    for the second H-alpha of glycine residues
C      for C' (CO)
CA     for C-alpha
CB     for C-beta
N      for N-amide
REMARK Ubiquitin input for TALOS, HA2/HA3 assignments arbitrary.
DATA SEQUENCE MQIFVKTLTG KTITLEVEPS DTIENVKAKI QDKEGIPPDQ QRLIFAGKQL
DATA SEQUENCE EDGRTLSDYN IQKESTLHLV LRLRGG
VARS RESID RESNAME ATOMNAME SHIFT
FORMAT %4d %1s %4s %8.3f
1 M HA 4.23
1 M C 170.54
1 M CA 54.45
1 M CB 33.27
2 Q N 123.22
2 Q HA 5.25
2 Q C 175.92
2 Q CA 55.08
2 Q CB 30.76
...
10 G N 108.89
10 G HA2 4.35
10 G HA3 3.61
10 G C 174.07
10 G CA 45.46
...
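A minimal Python reader for tables in this format is sketched below, using the rows shown in the example as input. It handles only the REMARK, DATA SEQUENCE, VARS, and FORMAT lines that appear in the example and assumes residue numbering starts at 1; it is not a complete implementation of the NMRPipe table format. The check_against_sequence helper (a hypothetical name) flags rows whose residue name disagrees with the DATA SEQUENCE record.

```python
from typing import List, NamedTuple


class Shift(NamedTuple):
    resid: int
    resname: str
    atom: str
    shift: float


class ShiftTable(NamedTuple):
    sequence: str
    shifts: List[Shift]


def parse_shift_table(text: str) -> ShiftTable:
    sequence = ""
    columns: List[str] = []
    shifts: List[Shift] = []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields or fields[0] in ("REMARK", "FORMAT", "..."):
            continue
        if fields[:2] == ["DATA", "SEQUENCE"]:
            sequence += "".join(fields[2:])  # spaces between blocks of 10 are not residues
        elif fields[0] == "VARS":
            columns = fields[1:]
        elif fields[0] == "DATA":
            continue  # other DATA records are ignored in this sketch
        else:
            if not columns:
                raise ValueError("data row found before the VARS line")
            row = dict(zip(columns, fields))
            shifts.append(Shift(int(row["RESID"]), row["RESNAME"],
                                row["ATOMNAME"], float(row["SHIFT"])))
    return ShiftTable(sequence, shifts)


def check_against_sequence(table: ShiftTable) -> List[Shift]:
    """Rows whose residue name does not match DATA SEQUENCE (numbering assumed to start at 1)."""
    bad = []
    for s in table.shifts:
        i = s.resid - 1
        if not (0 <= i < len(table.sequence)) or table.sequence[i] != s.resname:
            bad.append(s)
    return bad


SAMPLE = """\
REMARK Ubiquitin input for TALOS, HA2/HA3 assignments arbitrary.
DATA SEQUENCE MQIFVKTLTG KTITLEVEPS DTIENVKAKI QDKEGIPPDQ QRLIFAGKQL
DATA SEQUENCE EDGRTLSDYN IQKESTLHLV LRLRGG
VARS RESID RESNAME ATOMNAME SHIFT
FORMAT %4d %1s %4s %8.3f
1 M HA 4.23
1 M C 170.54
1 M CA 54.45
1 M CB 33.27
2 Q N 123.22
2 Q HA 5.25
2 Q C 175.92
2 Q CA 55.08
2 Q CB 30.76
10 G N 108.89
10 G HA2 4.35
10 G HA3 3.61
10 G C 174.07
10 G CA 45.46
"""

table = parse_shift_table(SAMPLE)
print(len(table.sequence), len(table.shifts))  # 76 14
print(check_against_sequence(table))  # []
```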
Inspecting and Refining the Prediction Results
The final step in interpreting the results
of the TALOS database search is to inspect and classify the matches so
that useful predictions can be formed; however, in most cases,
the initial automated classifications performed
by the current version of the VINA program should be acceptable
with no manual adjustment needed.
rama.tcl -old -in myshifts.tab
If a proposed structure is available, first
invoke VINA to update the prediction summary:
vina.tcl -old -in myshifts.tab -ref mystruct.pdb -auto
Then, invoke RAMA so that the reference structure
is included in the display of prediction data:
rama.tcl -old -in myshifts.tab -ref mystruct.pdb
The various windows displayed by RAMA are
shown below.
Green     Good prediction (at most one outlier)
Yellow    Ambiguous; no prediction
Red       Bad prediction relative to a known structure
Gray      No classification yet
The Ramachandran window also includes buttons
to reclassify the overall prediction as "Good", "Ambiguous", etc., and
to move to the next or previous residue in the sequence.
Green     This match is included in the prediction.
Red       Outlier; not included in the prediction.
Blue      Reference (phi/psi taken from the "-ref" structure)
Roger Sayle and E. James Milner-White, "RasMol: Biomolecular graphics for all", Trends in Biochemical Sciences (TIBS), September 1995, Vol. 20, No. 9, p. 374.
The empirical "rules of thumb" for defining
"Good" predictions follow; all other cases are considered "Ambiguous":
How to Select "Good" Predictions
In the above rules, "consistent" means in
or near the same well-populated region of the Ramachandran map. In practice,
this usually means that the standard deviation of phi and psi for the
selected group of matches will be 35 degrees or less; this held in 99% of
the cases tested.
New protein chemical shift and structure data
can be added to the database. Note well that this should be done with great
care and caution, to ensure that only reliable phi/psi data with consistently
referenced and correct chemical shifts are included. With that caution in mind, the procedure
for adding new proteins to the TALOS database is simple:
Adding New Proteins to the Database
talos.tcl -old -compile
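Because compiling depends on the shift tables and PDB files having matching names, a quick pre-check can help before running the command above. The Python sketch below assumes that "exactly consistent" names means identical base names (for example a hypothetical ubiq.tab paired with ubiq.pdb) and uses the TALOS_DIR environment variable set by the NMRPipe initialization commands. It does not check the sequence or residue numbering, which must also match.

```python
import os
from pathlib import Path
from typing import Dict, List


def unmatched_entries(talos_dir: Path) -> Dict[str, List[str]]:
    """Base names present in only one of talos/shifts (*.tab) and talos/pdb (*.pdb)."""
    shifts = {p.stem for p in (talos_dir / "shifts").glob("*.tab")}  # only .tab files are compiled
    pdbs = {p.stem for p in (talos_dir / "pdb").glob("*.pdb")}
    return {
        "shifts_without_pdb": sorted(shifts - pdbs),
        "pdb_without_shifts": sorted(pdbs - shifts),
    }


if __name__ == "__main__":
    talos_dir = os.environ.get("TALOS_DIR")
    if talos_dir is None:
        print("TALOS_DIR is not set; run the NMRPipe initialization commands first")
    else:
        for kind, names in unmatched_entries(Path(talos_dir)).items():
            for name in names:
                print(f"{kind}: {name}")
```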
About the Name TALOS
The illustration used in the TALOS program
window is of course unrelated to the Greek myth. It shows Star Trek's Capt.
Christopher Pike on a visit to the planet Talos IV. In the illustration,
the inhabitants of Talos IV are using their considerable mental powers
to make Capt. Pike experience an unpleasant scene from his own imagination.