From the Bax Group at the National Institutes of Health ...
SPARTA:
Shifts Predicted from Analogy in Residue type and Torsion Angle


As described in the paper:

Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology
Yang Shen and Ad Bax
J. Biomol. NMR, 38, 289-302 (2007)

Contact:       shenyang@niddk.nih.gov; bax@nih.gov

Web:       http://spin.niddk.nih.gov/bax



DOWNLOAD

RedHat Linux (Fedora Core)/Mac/Win32 version (v1.01, last updated Oct 28, 2009, change log)

The download archive can be unpacked in unix with a command like the following:

   tar -zxvf sparta.tar.Z 

The archive can also be unpacked with standard Windows zip software.

A script "install.com" in the package can be used to set up the program.

NOTE that SPARTA+ now replaces all earlier versions of SPARTA. [Go to SPARTA+]


What is SPARTA?
Reliability of SPARTA
Components of the SPARTA Package
How to Use SPARTA
Preparing the Input PDB Coordinates
Adding New Proteins to the Database
Compile the Source Code
About the Name SPARTA


What is SPARTA?

SPARTA is a database system for empirical prediction of backbone chemical shifts (N, HN, HA, CA, CB, CO) using a combination of backbone phi, psi torsion angles and sidechain chi1 angles from a given protein with known PDB coordinates. The SPARTA approach is an extension of the well-known observation that many kinds of secondary chemical shifts (i.e. differences between chemical shifts and their corresponding random coil values) are highly correlated with aspects of protein secondary structure. The goal of SPARTA is to use phi, psi, chi1 torsion angles and sequence information from a protein structure to make quantitative predictions for the backbone chemical shifts.

SPARTA uses the phi, psi and chi1 angles of a given residue to predict secondary shifts for that residue. SPARTA also includes the information from the next and previous residues when making predictions for a given residue. So, in practice, SPARTA uses data for three consecutive residues simultaneously (i.e. 9 torsion angles and 3 residue types) to make predictions for the central residue in a triplet.

The idea behind SPARTA is that if one can find a triplet of residues in a protein of known structure with similar structure and sequence to a triplet in a target protein, then the backbone secondary chemical shifts of that database triplet will be useful predictors for the backbone secondary chemical shifts in the target.

The similarity is measured with a score based on the weighted sum of squared differences between the torsion angles in the target protein and those of the database entries, so that lower scores indicate higher similarity. In order to take advantage of the correlations between residue type and secondary structure, the score also includes a small, qualitative residue type term which biases the matching towards roughly similar sequences.
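The scoring idea can be illustrated with a minimal Python sketch. It is not the SPARTA implementation: the weight values, the function names and the `dissimilarity` callback are hypothetical placeholders (the actual weighting and homology factors are read from the files in $SPARTA_DIR/tab), and the only firm ingredients are the weighted sum of squared torsion angle differences and the additional residue type term.

```python
# Hypothetical weights for illustration only; the values SPARTA uses are
# stored in $SPARTA_DIR/tab/weight.tab and are not reproduced here.
ANGLE_WEIGHT = 1.0
HOMOLOGY_WEIGHT = 0.1


def angle_diff(a, b):
    """Smallest signed difference between two angles, in degrees."""
    return (a - b + 180.0) % 360.0 - 180.0


def triplet_score(target_angles, db_angles, target_seq, db_seq, dissimilarity):
    """Return a similarity score; lower means a closer match.

    target_angles, db_angles: 9 torsion angles in degrees (phi, psi, chi1
    for each of the three residues of a triplet).
    target_seq, db_seq: the three residue types of each triplet.
    dissimilarity: hypothetical function giving a residue type penalty
    (0 for identical types).
    """
    score = 0.0
    for t, d in zip(target_angles, db_angles):
        score += ANGLE_WEIGHT * angle_diff(t, d) ** 2
    for rt, rd in zip(target_seq, db_seq):
        score += HOMOLOGY_WEIGHT * dissimilarity(rt, rd)
    return score
```

The angle difference is wrapped into the range -180 to 180 degrees so that, for example, 179 and -179 degrees are treated as 2 degrees apart rather than 358.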

In practice, SPARTA searches a database for the 20 best matches to a given triplet in the target protein. The weighted average of the secondary chemical shifts (obtained from the chemical shifts by subtracting the corresponding random coil values and the adjustment values arising from the effects of neighboring residues) of the central residues of these 20 matches is used as a prediction for the secondary shift of the central residue. The SPARTA database was constructed using the most well-defined parts of high-resolution (2.4 Angstroms or better) X-ray crystal structures to define the phi, psi and chi1 angles, as well as other structural information, such as hydrogen bonding and ring current shifts, which is used to quantitatively correct the raw predicted shifts from the database search. This database currently includes data from 200 proteins, representing 24,166 triplets.
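The selection and averaging step can be sketched in Python as follows. This is an illustration under stated assumptions, not the SPARTA code: the function name is hypothetical, and the weighting 1 / (1 + score) is an arbitrary choice made for the example, since the actual weighting scheme is not described here.

```python
import math


def predict_secondary_shift(matches, n_best=20):
    """Weighted average over the best-scoring database matches.

    matches: list of (score, secondary_shift) pairs for one target triplet,
    where a lower score means a closer match.
    Returns (weighted mean, weighted standard deviation) over the n_best
    lowest-scoring matches. The weight 1 / (1 + score) is an assumption
    made for illustration only.
    """
    best = sorted(matches, key=lambda m: m[0])[:n_best]
    weights = [1.0 / (1.0 + score) for score, _ in best]
    total = sum(weights)
    mean = sum(w * shift for w, (_, shift) in zip(weights, best)) / total
    var = sum(w * (shift - mean) ** 2
              for w, (_, shift) in zip(weights, best)) / total
    return mean, math.sqrt(var)
```

The spread of the matched shifts is returned alongside the mean because it comes at almost no extra cost and describes how consistent the selected matches are.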


Reliability of SPARTA

The reliability of the SPARTA approach was tested by a cross-validation procedure in which each protein was temporarily removed from the database, and its backbone chemical shifts (N, HN, HA, CA, CB and CO) were predicted using the remaining protein data. The RMS deviations between the predicted and experimental shifts are 2.36, 0.46, 0.25, 0.88, 0.97 and 1.01 ppm, respectively. The same prediction accuracy is also obtained for proteins of known structure that are not contained in the database.

Importantly, the standard deviation of the secondary shifts of the center residues in the 20 matches is found to be correlated with the shift prediction errors. Checking the standard deviations in the prediction summary file (pred/pred.tab) therefore gives an idea of the prediction reliability.

It should be noted that global structural information, such as ring current shifts and hydrogen bonding, is also carefully considered in SPARTA. The secondary shifts in the SPARTA database are actually shifts that have been corrected using ring current shifts calculated from the PDB coordinates. To compensate, the SPARTA-predicted shifts for the target protein are also corrected by adding the ring current shifts calculated from the target protein. For HA and HN, the SPARTA-predicted secondary shifts are further corrected using their hydrogen bond lengths and the relationship between these lengths and the prediction errors, which was derived from the cross-validation above. Therefore, accurate target coordinates are critical for obtaining reliable hydrogen bond information and ring current shifts, and hence reliable final predicted shifts. The calculated hydrogen bond and ring current shift information is stored in an input summary file (pred/protein_in.tab).

It should also be noted that protein backbone chemical shifts are extremely sensitive to the local conformation; therefore, SPARTA results for residues in flexible regions or with very large ring current shift contributions may be less reliable, as was also indicated by the test.


Components of the SPARTA Package

The SPARTA program is implemented in C++. The compiled executable files ($SPARTA_DIR/src/SPARTA for Linux, $SPARTA_DIR/src/SPARTA.exe for Windows, $SPARTA_DIR/src/SPARTA.mac for Mac) or the starting script ("$SPARTA_DIR/sparta" for Linux/Mac) can be invoked with "TALOS-like" command-line arguments. A complete list of options can be displayed with the "-help" command-line argument.

Use of SPARTA requires either the environment variable "SPARTA_DIR" or the command-line argument "-spartaDir" to specify the SPARTA installation directory. The environment variable is set automatically if SPARTA is run from the starting script ("$SPARTA_DIR/sparta" in Linux/Mac), which includes the following lines:

   setenv SPARTA_DIR /disk1/SPARTA
   $SPARTA_DIR/src/SPARTA $argv[1-$#argv]

Note that the definition of $SPARTA_DIR in the starting script MUST correspond to the SPARTA installation directory in order to run the program.
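For users who prefer to drive SPARTA from another program rather than from the starting script, the same setup can be sketched in Python. The function name `build_sparta_command` is hypothetical; the sketch only sets the "SPARTA_DIR" environment variable and builds the argument list for the Linux executable, using the "-in" and "-ref" arguments documented on this page.

```python
import os


def build_sparta_command(sparta_dir, pdb_file, ref_file=None):
    """Return (argv, env) for running the Linux executable directly.

    sparta_dir must be the SPARTA installation directory; it is exported
    as SPARTA_DIR in the returned environment.
    """
    env = dict(os.environ, SPARTA_DIR=sparta_dir)
    argv = [os.path.join(sparta_dir, "src", "SPARTA"), "-in", pdb_file]
    if ref_file is not None:
        argv += ["-ref", ref_file]
    return argv, env
```

The returned values can then be passed to, for example, `subprocess.run(argv, env=env, check=True)` from the prediction session directory.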

Other files of the SPARTA package include:

$SPARTA_DIR/tab/sparta.tab
The compiled database of residue triplets with their corresponding PHI/PSI/CHI1 angles and secondary shifts.

$SPARTA_DIR/tab/randcoil.tab, rcadj.tab, rcprev.tab, rcnext.tab
The tables of random coil shifts and of the adjustment values from neighboring residues used in the shift prediction process. (The same tables as used in TALOS, http://spin.niddk.nih.gov/NMRPipe/talos/)

$SPARTA_DIR/tab/homology.tab
The residue type homology factors used in the prediction process.

$SPARTA_DIR/tab/weight.tab
The weighting factors of PHI, PSI and CHI1 angles, and residue type homology used in the prediction process.

$SPARTA_DIR/tab/fitting.tab
The fitting parameters between the prediction accuracy and precision, which will be used after the prediction process to calculate the estimated prediction error.

$SPARTA_DIR/shifts/*.tab
The files in this directory are only used when compiling a new database. When compiling a new database, only shift tables ending with the ".tab" extension will be used. The files in this directory are the chemical shift tables in TALOS format for the proteins in the database and must be exactly consistent with the corresponding structures in the SPARTA pdb directory.

$SPARTA_DIR/pdb/*.pdb
The PDB coordinates files in this directory are only used along with the files in the SPARTA shifts directory when compiling a new database (e.g. adding new proteins into the database). The sequence and residue numbering must be exactly consistent with the corresponding assignments in the SPARTA shifts directory. Furthermore, the names of these files must be exactly consistent with those of the corresponding chemical shift tables in the SPARTA shifts directory.

$SPARTA_DIR/test/*
The contents of this "test" directory include the input files and results for a sample SPARTA analysis.


How to Use SPARTA

Use of SPARTA to predict backbone chemical shifts involves the following steps:

  1. Create a directory for the prediction session; all subsequent commands will be executed from this directory.

  2. Prepare an input PDB coordinate file (for example "protein.pdb"), according to the format given below.

  3. Run SPARTA ("$SPARTA_DIR/sparta") to perform the database search. Most commonly, this will simply require a command such as:
       sparta -in protein.pdb
    SPARTA will first generate a "pred/protein_in.tab" file from the PDB coordinates, which contains the phi, psi and chi1 angles, H-bonding information and ring current shifts. During the database search, a series of files "pred/X/res*.tab" (X = N, H, HA, CA, CB and C) will be created. Each of these files tallies the 20 best database matches for a given residue in the target protein. Before exiting, SPARTA creates a file "pred.tab" (defined by the "-sum" option) in a "pred" directory (defined by the "-predDir" option); this file contains a summary of the prediction results. The database search typically takes about 12 seconds for a 100-residue protein on a Linux PC with a 2.8GHz CPU.

Use of SPARTA to correct chemical shift referencing problems:

If the experimental chemical shifts for a target protein are available (in TALOS format, in a file named "ref.tab", for example), SPARTA can be run with a command such as:

   sparta -in protein.pdb -ref ref.tab

SPARTA will compare the predicted and experimental chemical shifts and create a prediction summary file "pred/pred.tab", which contains the experimental shifts, the SPARTA-predicted shifts and the prediction errors. If the average prediction error for a given chemical shift type exceeds 3 times the expected error (the standard deviation of the prediction errors divided by the square root of the number of shifts), a warning will be printed and a reference correction will be applied to the experimental chemical shifts. The corrected experimental chemical shifts are stored in a new file "pred/ref.tab".
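The referencing test described above amounts to comparing the mean prediction error with its standard error. A minimal Python sketch, assuming that the errors are defined as experimental minus predicted shift and that the sample standard deviation is used (neither convention is stated on this page, and the function name is hypothetical):

```python
import math
import statistics


def reference_correction(errors, threshold=3.0):
    """Return the offset to subtract from the experimental shifts, or 0.0.

    errors: experimental minus predicted shift for every residue of one
    chemical shift type (sign convention assumed for this sketch).
    A correction is suggested only if the mean error exceeds `threshold`
    times the expected error, i.e. stdev(errors) / sqrt(number of shifts).
    """
    n = len(errors)
    if n < 2:
        return 0.0  # not enough data to estimate a standard deviation
    mean_error = statistics.mean(errors)
    expected = statistics.stdev(errors) / math.sqrt(n)
    if abs(mean_error) > threshold * expected:
        return mean_error
    return 0.0
```

For example, a set of CA errors that are all close to +0.5 ppm would yield an offset of about 0.5 ppm, whereas errors scattered symmetrically around zero would yield no correction.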


Preparing the Input PDB Coordinates

The input PDB coordinates should be prepared carefully, so that they have the proper format and naming conventions. SPARTA accepts a standard PDB coordinate file, but ONLY the FIRST conformer/chain will be used if more than one exists. For PDB coordinates without hydrogen atoms, the hydrogen atoms must be added (using programs such as DYNAMO, REDUCE, MOLMOL, or similar) in order to obtain the hydrogen bonding information and ring current shifts. The standard "HA2/HA3" names are required for the GLY HA atoms.

Examples of the PDB coordinate format can be found in the "$SPARTA_DIR/pdb" and "$SPARTA_DIR/test" directories.
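Two of the requirements above (hydrogen atoms present, GLY HA atoms named HA2/HA3) can be checked before running SPARTA. The Python sketch below is a rough pre-check and not part of the SPARTA package; the function name is hypothetical, it relies only on the standard fixed-column layout of PDB ATOM records, it stops at the first ENDMDL record, and it does not separate multiple chains.

```python
def check_pdb_lines(lines):
    """Return a list of warnings for the first model of a PDB file."""
    has_hydrogen = False
    gly_ha = {}  # residue number -> set of HA-type atom names
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # only the first conformer is inspected
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()     # atom name, columns 13-16
        resname = line[17:20].strip()  # residue name, columns 18-20
        resseq = line[22:26].strip()   # residue number, columns 23-26
        if name.startswith("H") or (name[:1].isdigit() and name[1:2] == "H"):
            has_hydrogen = True
        if resname == "GLY" and "HA" in name:
            gly_ha.setdefault(resseq, set()).add(name)
    warnings = []
    if not has_hydrogen:
        warnings.append("no hydrogen atoms found")
    for resseq, names in sorted(gly_ha.items()):
        if names != {"HA2", "HA3"}:
            warnings.append("GLY %s has HA names %s" % (resseq, sorted(names)))
    return warnings
```

A typical use would be `check_pdb_lines(open("protein.pdb"))`; an empty list means that neither problem was detected.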


Adding New Proteins to the Database

New protein chemical shift and structure data can be added to the SPARTA database. Note well that this should be done with great care and caution, to ensure that only reliable phi/psi/chi1 data with consistently referenced and correct chemical shifts are included. It is suggested that:

  1. The chemical shift assignments for each candidate protein should be validated by running the following SPARTA shift prediction using the PDB coordinates:
       sparta -in protein.pdb -ref ref.tab
  2. Check the prediction summary table (pred/pred.tab) and remove the experimental shifts that deviate from the SPARTA-predicted shifts by more than five standard deviations. In addition, HA shift outliers that have calculated ring current contributions larger than 1.5 ppm and/or deviate from the SPARTA-predicted HA shifts by more than three standard deviations are best removed.

  3. The chemical shifts should be referenced correctly. A quick check can be made by running the SPARTA prediction above and inspecting the average SPARTA prediction errors, which are listed in the header of the prediction summary table (pred/pred.tab). By default, SPARTA will apply a shift referencing correction if the average prediction error is larger than 3 times the expected error (i.e., the standard deviation of the prediction errors divided by the square root of the number of shifts), and store the "referencing-corrected" shifts in a new file "pred/ref.tab".
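The outlier criteria in step 2 can be written down as a small filter. The Python sketch below is only one possible reading of those criteria: the function name is hypothetical, and which standard deviation to pass as `sigma` (for example the error estimate listed for each shift in pred/pred.tab) is an assumption that should be checked against the actual table contents.

```python
def keep_shift(atom, observed, predicted, sigma, ring_current=0.0):
    """Return True if an experimental shift passes the suggested filters.

    atom: chemical shift type, e.g. "HA" or "CA".
    observed, predicted: experimental and SPARTA-predicted shifts (ppm).
    sigma: standard deviation used to judge the deviation (assumed input).
    ring_current: calculated ring current contribution (ppm).
    """
    deviation = abs(observed - predicted)
    if deviation > 5.0 * sigma:
        return False  # general five-standard-deviation rule
    if atom == "HA" and (abs(ring_current) > 1.5 or deviation > 3.0 * sigma):
        return False  # stricter rule suggested for HA shifts
    return True
```

Shifts for which `keep_shift` returns False would be removed from the shift table before it is added to the database.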

Given this, the procedure for adding new proteins to the SPARTA database is as simple as:

  1. Create a chemical shift table for the new protein according to the TALOS format (http://spin.niddk.nih.gov/NMRPipe/talos/). Copy this shift table file to the "$SPARTA_DIR/shifts" directory; it must have a ".tab" extension in order to be used.

  2. Place the corresponding PDB structure file into the "$SPARTA_DIR/pdb" directory; it must have a ".pdb" extension, and its file name, sequence, and residue numbering must correspond exactly with the shift table.

  3. Prepare a table file, for example named "list.tab", which contains only the names of the proteins to be added to the database. This table must follow the example below:

          VARS   PDB_NAME
          FORMAT %24s
          bpti
          ubiquitin
          profilin
          ...
    

    Note that the "PDB_NAME" entries in this table file must be consistent with the file names (without the ".tab" and ".pdb" extensions) in the SPARTA pdb and shifts directories.

  4. In the "SPARTA" directory, execute the following command to compile a new database:

       sparta -compile -pdbDir ./pdb -pdbList list.tab

  5. A new database "$SPARTA_DIR/tab/sparta.tab" will be generated using the files in the SPARTA pdb and shifts directories. Please back up the old database, which will be overwritten.
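Because the file names in the pdb and shifts directories must match exactly, the list table from step 3 can be generated rather than typed by hand. The Python sketch below is a convenience under stated assumptions, not part of SPARTA: both function names are hypothetical, and the names are written one per line without padding, following the example table above.

```python
import os


def shared_protein_names(pdb_dir, shifts_dir):
    """Names that have both NAME.pdb in pdb_dir and NAME.tab in shifts_dir."""
    pdb = {f[:-4] for f in os.listdir(pdb_dir) if f.endswith(".pdb")}
    tab = {f[:-4] for f in os.listdir(shifts_dir) if f.endswith(".tab")}
    return sorted(pdb & tab)


def format_list_table(names):
    """Return the text of a list table with one protein name per line."""
    lines = ["VARS   PDB_NAME", "FORMAT %24s"]
    lines += names
    return "\n".join(lines) + "\n"
```

Writing the result of `format_list_table(shared_protein_names("pdb", "shifts"))` to "list.tab" lists only proteins for which both files exist; it does not verify that sequence and residue numbering agree between the two files.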


Compile the Source Code

The SPARTA program was implemented in standard C++ using the Standard Template Library (STL). To compile the source code (in the /src directory), your system must have a compatible C++ compiler and STL library. Given this, compiling the SPARTA executable file is as simple as:

   cd $SPARTA_DIR/src
   make -f Makefile.$YOUR_SYSTEM

Compiling of the SPARTA program has been tested on Linux (Linux 9 or newer), Windows (XP) and Mac OS X 10.3.4. The compiled executable files ("$SPARTA_DIR/src/SPARTA" for Linux, "$SPARTA_DIR/src/SPARTA.exe" for Windows, "$SPARTA_DIR/src/SPARTA.app" for Mac) are contained in the distributed SPARTA package.


About the Name SPARTA

SPARTA Statue

In antiquity Sparta was a Dorian Greek military state, originally centered in Laconia. As a city-state devoted to military training, Sparta possessed the most formidable army in the Greek world and regarded itself as the natural protector of Greece. The figure above shows a marble statue of a helmed hoplite (5th century BC), which is possibly Leonidas, a king of Sparta from 489 BC or 488 BC to 480 BC. (-- Wikipedia)



last update: Apr 2 2012 / ys