From the Bax Group at the National Institutes of Health ...
TALOS-N: Prediction of Protein Backbone and Sidechain Torsion Angles from NMR Chemical Shifts
The most common TALOS-N installation procedure on a Unix system (linux, linux9, and mac) will involve:
There is also a web-based version of TALOS-N, which can be used directly without installing TALOS-N. A Java viewer (jRAMA) is also available to display TALOS-N results without installing TALOS-N. You can access this web-based system, along with other facilities for manipulating chemical shifts, dipolar couplings, and molecular structures, at the Bax Group NMR Server site:
What is TALOS-N?
TALOS-N is an artificial neural network (ANN) based hybrid system for empirical prediction of protein backbone φ/ψ torsion angles, sidechain χ1 torsion angles and secondary structure using a combination of six kinds (HN, Hα, Cα, Cβ, CO, N) of chemical shift assignments for a given residue sequence.
The original TALOS approach and its successor TALOS+ are extensions of the well-known observation that many kinds of secondary chemical shifts (i.e. differences between chemical shifts and their corresponding random coil values) are highly correlated with aspects of protein secondary structure. The goal of TALOS-N is again to use secondary chemical shift and sequence information to make quantitative predictions for the protein backbone angles φ/ψ, and to provide a measure of the uncertainties in these predictions. In the original TALOS approach, we search a high-resolution structural database (for which experimental chemical shifts are available) for the 10 best matches to the secondary chemical shifts of a given residue in a target protein along with its two flanking neighbors (a residue triplet). If there is a consensus of φ and ψ angles among the 10 best database matches, then we use these database triplet structures to form a prediction for the backbone angles of the target residue. The later TALOS+ approach added an ANN classification scheme to this database mining approach. This ANN analyzed the chemical shifts and sequence to estimate the likelihood of a given residue being in an α, β, or positive-φ conformation. This ANN classification information was then combined with the database mining results, thereby increasing the number of residues for which useful backbone angle predictions can be made.
TALOS-N relies far more extensively on trained ANNs than TALOS+ does. In the TALOS-N method, the ANN that correlates chemical shifts with backbone conformation is built on a division of the Ramachandran map into 324 voxels, rather than the three groupings used by TALOS+. TALOS-N also improves upon the original TALOS and TALOS+ database mining approaches by relying on (1) a large database of over 9500 high-quality X-ray structures to which chemical shift assignments were added by SPARTA+, and (2) an optimized database search procedure for the 25 best-matched database heptapeptides (rather than the 10 best-matched database tripeptides). The far greater reliance on ANN algorithms, together with the optimized database mining approach, allows TALOS-N to predict backbone torsion angles for a larger fraction (~90%) of residues in a given protein at improved precision.
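To illustrate what dividing the Ramachandran map into 324 voxels can look like, the sketch below maps a (φ, ψ) pair onto an 18 × 18 grid of 20° × 20° cells. The grid layout, the voxel numbering, and the function names are assumptions made for illustration only; this text does not specify how TALOS-N itself indexes its voxels.

```python
VOXEL_SIZE = 20.0                        # degrees; assumed so that 18 x 18 = 324
BINS_PER_AXIS = int(360 / VOXEL_SIZE)    # 18 bins along phi and along psi


def wrap_angle(angle):
    """Map any angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def voxel_index(phi, psi):
    """Return a hypothetical voxel number in 0..323 for a (phi, psi) pair."""
    col = int((wrap_angle(phi) + 180.0) // VOXEL_SIZE)   # phi bin, 0..17
    row = int((wrap_angle(psi) + 180.0) // VOXEL_SIZE)   # psi bin, 0..17
    return row * BINS_PER_AXIS + col


if __name__ == "__main__":
    # A typical alpha-helical pair and a typical beta-strand pair.
    print(voxel_index(-60.0, -45.0))
    print(voxel_index(-120.0, 130.0))
```

Wrapping the angles first means that +180° and -180° fall into the same voxel, which matches the periodic nature of the map.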
TALOS-N also includes an ANN component to derive sidechain χ1 angle information, as the χ1 value is known to impact the backbone chemical shifts.
In addition, TALOS-N offers several important features:
It should also be noted that the tests above included only
the most well-defined parts of each protein; roughly 6% of the residues had
first been removed because they had high B factors (exceeding 1.5 times
the average B-factor for that protein) in the crystal structure or because
they were known to be highly mobile in solution. Evaluation of the results
indicates that many of the "Bad" predictions occur outside of
regions of secondary structure, where the X-ray and solution structures may
actually differ from one another, as evidenced by large differences between
X-ray structures when multiple such structures are available for the same protein.
Therefore, the accuracy of TALOS-N will vary from protein to protein,
and tends to be lower for proteins with large flexible regions. A partial remedy
is to increase the S2 threshold for "dynamic" residues to 0.65, but
this will decrease the number of consensus predictions made.
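The B-factor filter mentioned above (residues whose B-factor exceeds 1.5 times the protein average) can be sketched as follows. The function name, the input dictionary, and the example values are hypothetical; the sketch only shows the cutoff arithmetic and is not part of TALOS-N.

```python
from statistics import mean


def high_b_factor_residues(b_factors, ratio=1.5):
    """Return residue numbers whose B-factor exceeds ratio * the mean B-factor.

    b_factors maps residue number -> B-factor (hypothetical input layout).
    """
    cutoff = ratio * mean(b_factors.values())
    return sorted(resid for resid, b in b_factors.items() if b > cutoff)


if __name__ == "__main__":
    # Made-up B-factors for four residues, purely for illustration.
    example = {1: 10.0, 2: 12.0, 3: 11.0, 4: 40.0}
    print(high_b_factor_residues(example))
```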
The TALOS-N core system is implemented in the C++ language, and includes a graphical interface to inspect the prediction results. The graphical interface, called jRAMA, is implemented in Java.
There are two major scripts comprising the TALOS-N system:
Other files of the TALOS-N system include:
The standard NMRPipe installation also includes scripts
Use of TALOS-N is much the same as for TALOS and TALOS+:
Chemical shift data pre-check
Similar to TALOS+, TALOS-N includes a feature that pre-checks chemical shift referencing and possible chemical shift errors:
talosn -in myshifts.tab -check
It checks the referencing for 13Cα, 13Cβ, 1Hα and 13C' chemical shifts, using the empirical correlation between certain sets of chemical shift data (Wang et al., 2005 J Biomol NMR, 32:13-22). The estimated chemical shift referencing offsets, as well as the chemical shifts that deviate substantially from their expected ranges, are printed in the following format:
Chemical shift outlier checking...
...
64 E CB Secondary Shift: -3.800 Limit: -3.765
76 G C Secondary Shift: 4.250 Limit: 1.925 !
Chemical shift referencing checking...
Estimated Referencing Offset for CA/CB: 0.795 +/- 0.104 ppm (Size: 66)
Note that (1) a chemical shift referencing correction is likely required whenever the estimated referencing error approaches the average uncertainty in the database chemical shifts (~1.0 ppm for 13Cα/Cβ and 13C' shifts; ~0.3 ppm for 1Hα shifts), and/or the estimated referencing error is larger than five times the average fitting error; (2) chemical shift outliers that fall far outside (more than 2-3 times) the expected range of secondary chemical shifts (marked by "!") are unlikely to be correct (or, as in the above example, correspond to a C-terminal carboxylate instead of a backbone carbonyl) and need to be checked carefully.
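When the pre-check report needs to be screened for many proteins, the outlier and offset lines can be pulled out with a short script. The sketch below is only an illustration based on the example report shown above: the regular expressions, the function name, and the returned field names are assumptions, and the actual report may contain lines that this pattern does not cover.

```python
import re

# Patterns inferred from the example report only (an assumption, not a spec).
OUTLIER_RE = re.compile(
    r"(\d+)\s+([A-Z])\s+(\w+)\s+Secondary Shift:\s*(-?\d+\.\d+)"
    r"\s+Limit:\s*(-?\d+\.\d+)(\s*!)?"
)
OFFSET_RE = re.compile(
    r"Estimated Referencing Offset for (\S+):\s*(-?\d+\.\d+)"
    r"\s*\+/-\s*(\d+\.\d+)\s*ppm"
)


def parse_check_report(text):
    """Return (outliers, offsets) extracted from a pre-check report."""
    outliers = [
        {
            "resid": int(m.group(1)),
            "resname": m.group(2),
            "atom": m.group(3),
            "secondary_shift": float(m.group(4)),
            "limit": float(m.group(5)),
            "flagged": m.group(6) is not None,   # True when marked with "!"
        }
        for m in OUTLIER_RE.finditer(text)
    ]
    offsets = {
        m.group(1): (float(m.group(2)), float(m.group(3)))
        for m in OFFSET_RE.finditer(text)
    }
    return outliers, offsets


REPORT = """\
Chemical shift outlier checking...
64 E CB Secondary Shift: -3.800 Limit: -3.765
76 G C Secondary Shift: 4.250 Limit: 1.925 !
Chemical shift referencing checking...
Estimated Referencing Offset for CA/CB: 0.795 +/- 0.104 ppm (Size: 66)
"""

if __name__ == "__main__":
    found, ref_offsets = parse_check_report(REPORT)
    for entry in found:
        print(entry)
    print(ref_offsets)
```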
TALOS-N uses an option "
talosn -in myshifts.tab -offset
and an option "
talosn -in myshifts.tab -iso
Exclusion of proteins from the database
Excluding one or more proteins from the database during the TALOS-N database search can be performed by a command line such as:
talosn -in myshifts.tab -excl name1 name2 ...
Amino acid sequence based protein secondary structure prediction
By default, the amino acid sequence based protein secondary structure prediction module is integrated into TALOS-N as a complement to the chemical shift based module, and can bridge stretches in proteins that lack chemical shifts. This amino acid sequence based module can also be run separately by a command line such as:
TALOS-N requires an input chemical shift table of standard nmrPipe/TALOS format. An example portion of the required chemical shift table format is shown below (full example: ubiq.tab). Other examples can be found in the talosn/demo directories, or at the TALOS-N Server site. Specifically:
Example shift table (excerpt):
REMARK Ubiquitin input for TALOS, HA2/HA3 assignments arbitrary.

DATA FIRST_RESID 1

DATA SEQUENCE MQIFVKTLTG KTITLEVEPS DTIENVKAKI QDKEGIPPDQ QRLIFAGKQL
DATA SEQUENCE EDGRTLSDYN IQKESTLHLV LRLRGG

VARS   RESID RESNAME ATOMNAME SHIFT
FORMAT %4d %1s %4s %8.3f

   1 M   HA     4.23
   1 M    C   170.54
   1 M   CA    54.45
   1 M   CB    33.27
   2 Q   HN     8.90
   2 Q    N   123.22
   2 Q   HA     5.25
   2 Q    C   175.92
   2 Q   CA    55.08
   2 Q   CB    30.76
...
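For readers who want to inspect or sanity-check a shift table before running TALOS-N, the following minimal sketch reads the keywords visible in the excerpt above (REMARK, DATA FIRST_RESID, DATA SEQUENCE, VARS, FORMAT) and collects the shifts. It assumes one record per line, as in the actual table files, and handles only what appears in this excerpt; the function name and the returned layout are hypothetical, and this is not the parser used by TALOS-N or NMRPipe.

```python
def parse_shift_table(text):
    """Parse a TALOS-style shift table into a plain dictionary."""
    first_resid = 1
    sequence_parts = []
    columns = []
    shifts = {}
    for raw in text.splitlines():
        fields = raw.split()
        if not fields or fields[0] in ("REMARK", "FORMAT"):
            continue                                  # comments and format line
        if fields[0] == "DATA":
            if fields[1] == "FIRST_RESID":
                first_resid = int(fields[2])
            elif fields[1] == "SEQUENCE":
                sequence_parts.append("".join(fields[2:]))
            continue
        if fields[0] == "VARS":
            columns = fields[1:]                      # column names for data rows
            continue
        if len(fields) != len(columns):
            continue                                  # skip anything unexpected
        record = dict(zip(columns, fields))
        key = (int(record["RESID"]), record["ATOMNAME"])
        shifts[key] = float(record["SHIFT"])
    return {
        "first_resid": first_resid,
        "sequence": "".join(sequence_parts),
        "shifts": shifts,
    }


SAMPLE = """\
REMARK Ubiquitin input for TALOS, HA2/HA3 assignments arbitrary.
DATA FIRST_RESID 1
DATA SEQUENCE MQIFVKTLTG KTITLEVEPS DTIENVKAKI QDKEGIPPDQ QRLIFAGKQL
DATA SEQUENCE EDGRTLSDYN IQKESTLHLV LRLRGG
VARS   RESID RESNAME ATOMNAME SHIFT
FORMAT %4d %1s %4s %8.3f
   1 M   HA     4.23
   1 M   CA    54.45
   2 Q   HN     8.90
   2 Q   CA    55.08
"""

if __name__ == "__main__":
    table = parse_shift_table(SAMPLE)
    print(len(table["sequence"]), table["shifts"][(2, "CA")])
```

Keying the shifts by (residue number, atom name) makes it easy to check, for example, which residues lack a Cα or Cβ assignment.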
The final step in interpreting the results of the TALOS-N database search is to inspect and classify the matches so that useful predictions can be formed; however, in most cases, the initial automated classifications performed by the current version of the TALOS-N program should be acceptable with no manual adjustment needed.
Refinement of predictions can be made via the RAMA graphical interface
jrama -in pred.tab
If a proposed structure is available, first run TALOS-N with it to generate a prediction summary:
talosn -in myshifts.tab -ref mystruct.pdb
Then, invoke RAMA so that the reference structure is included in the display of prediction data:
jrama -in pred.tab -ref mystruct.pdb
The various windows displayed by
The original TALOS rules for defining consistent ("Good")
predictions were based on clustering of at least 9 out of the 10 best database matches
in the same region of the Ramachandran map, while the TALOS+ rules were based on clustering
of all 10 of the 10 best database matches in the same region of the Ramachandran map.
TALOS-N now searches for the 25 best database matches and makes two different types of
consistent predictions, called "Strong" and "Generous" predictions.
The TALOS-N rules for defining consistent predictions are:
All other cases are considered "Ambiguous". Note that all the cases with predicted S2 value <0.6 are likely to be "Dynamic", and will not be considered as unambiguous predictions.
When a reference structure is available, predictions will be flagged as "Bad" (automatically by TALOS-N) if either of the following conditions applies:
|Phi(obs) - Phi(pred)|^2 + |Psi(obs) - Psi(pred)|^2 > 60^2
In practice, this usually means that the standard deviation of φ and ψ for the selected group of matches will be 35 degrees or less (12-13 degrees on average).
When inspecting the φ/ψ graphs to decide if matches are in a consistent region, keep in mind their "periodic" nature; i.e. angles at one edge of the graph are actually close to angles at the opposite edge.
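Because of this periodicity, angle differences should be wrapped into the range -180° to 180° before they are compared. The sketch below shows one way to do that and applies it to the 60° distance condition given above for flagging a prediction as "Bad". The function names are hypothetical, only the one condition shown in this text is covered, and this is not the TALOS-N source code.

```python
def angle_diff(a, b):
    """Smallest signed difference a - b in degrees, in the range [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def exceeds_bad_cutoff(phi_obs, psi_obs, phi_pred, psi_pred, cutoff=60.0):
    """True when the squared phi/psi distance is larger than cutoff squared."""
    d_phi = angle_diff(phi_obs, phi_pred)
    d_psi = angle_diff(psi_obs, psi_pred)
    return d_phi ** 2 + d_psi ** 2 > cutoff ** 2


if __name__ == "__main__":
    # Angles on opposite edges of the map are actually close together.
    print(angle_diff(175.0, -175.0))
    print(exceeds_bad_cutoff(175.0, 170.0, -175.0, -170.0))
    # Observed helix-like angles against a very different prediction.
    print(exceeds_bad_cutoff(-60.0, -45.0, 60.0, 45.0))
```

Without the wrapping step, the first pair would appear to differ by 350° in φ and would be wrongly treated as inconsistent.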