As described in the paper:
Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology
Yang Shen and Ad Bax
Contact: shenyang@niddk.nih.gov; bax@nih.gov
submitted
RedHat Linux /Fedora Core version
Win32 version
The downloaded Unix archive can be unpacked with a command like the following:
zcat sparta.linux.tar.Z | tar xvf -
The Win32 archive can be unpacked with any standard Windows zip utility.
Users are encouraged to email the author to be informed about updates and related software.
What is SPARTA?
Reliability of SPARTA
Components of the SPARTA Package
How to Use SPARTA
Preparing the PDB Coordinates
Adding New Proteins to the Database
Compile the Source Code
About the Name SPARTA
SPARTA uses the phi, psi and chi1 angles of a given residue to predict secondary shifts for that residue. SPARTA also includes the information from the next and previous residues when making predictions for a given residue. So, in practice, SPARTA uses data for three consecutive residues simultaneously (i.e. 9 torsion angles and 3 residue types) to make predictions for the central residue in a triplet.
The idea behind SPARTA is that if one can find a triplet of residues in a protein of known structure whose structure and sequence are similar to those of a triplet in a target protein, then the backbone secondary chemical shifts of the known triplet will be useful predictors for the backbone secondary chemical shifts in the target.
The similarity is measured with a score based on the weighted sum of squared differences between the torsion angles in the target protein and those of the database entries, so that lower scores indicate higher similarity. To take advantage of the correlations between residue type and secondary structure, the score also includes a small, qualitative residue type term that biases the matching towards roughly similar sequences.
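The score described above can be sketched as follows. This is only an illustration under stated assumptions, not SPARTA's actual code: the weight values, the simple 0/1 residue type penalty, and the wrap-around handling of angle differences are all hypothetical. The real weighting factors and homology factors are those stored in weight.tab and homology.tab.

```python
# Hypothetical weights for illustration; SPARTA's real values are not reproduced here.
W_PHI, W_PSI, W_CHI1, W_SEQ = 1.0, 1.0, 0.5, 0.1

def angle_diff(a, b):
    """Signed difference between two angles in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def residue_penalty(res_a, res_b):
    """Placeholder residue type term: 0 for identical types, 1 otherwise (assumption)."""
    return 0.0 if res_a == res_b else 1.0

def triplet_score(target, candidate):
    """Score two triplets, each a list of three (residue, phi, psi, chi1) tuples.

    Lower scores mean higher similarity.
    """
    score = 0.0
    for (res_t, phi_t, psi_t, chi_t), (res_c, phi_c, psi_c, chi_c) in zip(target, candidate):
        score += W_PHI * angle_diff(phi_t, phi_c) ** 2
        score += W_PSI * angle_diff(psi_t, psi_c) ** 2
        score += W_CHI1 * angle_diff(chi_t, chi_c) ** 2
        score += W_SEQ * residue_penalty(res_t, res_c)
    return score
```

Wrapping the angle difference matters because torsion angles are periodic: 179 and -179 degrees are only 2 degrees apart.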
In practice, SPARTA searches a database for the 20 best matches to a given triplet in the target protein. The weighted average of the secondary chemical shifts of the central residues of these 20 matches is used as a prediction for the secondary shift of the central residue. (Secondary shifts are obtained by subtracting the corresponding random coil chemical shift values and the adjustment values arising from the effects of neighboring residues.) The SPARTA database was constructed using the best-defined parts of high-resolution (2.4 Angstroms or better) X-ray crystal structures to define the phi, psi and chi1 angles, as well as other structural information, such as hydrogen bonding and ring current shifts, which is used to quantitatively correct the raw shifts predicted from the database search. This database currently includes data from 200 proteins, representing 24,166 triplets.
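The selection and averaging step can be illustrated with a short sketch. The weighting scheme is not specified in this document, so the inverse-score weights used here are purely an assumption, and the function name is hypothetical.

```python
import heapq

def predict_secondary_shift(scored_matches, n_best=20):
    """Average the secondary shifts of the n_best lowest-scoring matches.

    scored_matches: iterable of (score, secondary_shift) pairs, one per
    database triplet. Weights of 1 / (1 + score) are an assumption made
    for this example, not SPARTA's actual weighting.
    """
    best = heapq.nsmallest(n_best, scored_matches, key=lambda m: m[0])
    if not best:
        raise ValueError("no database matches supplied")
    weights = [1.0 / (1.0 + score) for score, _ in best]
    weighted_sum = sum(w * shift for w, (_, shift) in zip(weights, best))
    return weighted_sum / sum(weights)
```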
The reliability of the SPARTA approach was tested by a cross-validation procedure in which each protein was temporarily removed from the database, and its backbone chemical shifts (N, HN, HA, CA, CB and C’) were predicted using the remaining protein data. The RMS deviations between the predicted and experimental shifts are 2.36, 0.46, 0.25, 0.88, 0.97 and 1.01 ppm, respectively. The same prediction accuracy is also obtained for proteins of known structure that are not contained in the database.
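As an illustration of the bookkeeping behind such a test, the sketch below shows a leave-one-out split and an RMS deviation calculation. Both helper names are hypothetical, and the prediction step itself is omitted.

```python
import math

def rmsd(predicted, experimental):
    """Root-mean-square deviation between two equal-length lists of shifts (ppm)."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))

def leave_one_out(proteins):
    """Yield (held_out, remaining) pairs, removing one protein at a time."""
    for i, held_out in enumerate(proteins):
        yield held_out, proteins[:i] + proteins[i + 1:]
```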
Importantly, the test also showed that the standard deviation of the shifts from the central residues of the 20 matches is correlated with the shift prediction error. Checking the standard deviations in the prediction summary file (pred/pred.tab) therefore gives an idea of the prediction reliability.
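The spread of the matched shifts is simple to compute. The sketch below uses the population standard deviation; whether SPARTA uses the population or the sample form is not stated here, so that choice is an assumption.

```python
import statistics

def match_spread(matched_shifts):
    """Population standard deviation of the shifts from the best matches (ppm).

    A larger spread suggests a less reliable prediction for that residue.
    """
    return statistics.pstdev(matched_shifts)
```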
It should be noted that global structural information, such as ring current shifts and hydrogen bonding, is also considered in SPARTA. The secondary shifts in the SPARTA database have been corrected for ring current shifts. To compensate, the SPARTA predicted shifts for the target protein are corrected by adding the ring current shifts calculated from the target protein. For HA and HN, the predicted secondary shifts are further corrected using the hydrogen bond length and its relationship with the prediction errors, which was derived from the cross-validation above. The accuracy of the target protein coordinates is therefore critical for obtaining reliable hydrogen bond information and ring current shifts, and hence reliable final predicted shifts. The calculated hydrogen bond and ring current shift information is stored in the input summary file (pred/protein_in.tab).
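The arithmetic of these corrections can be sketched as below. This is an interpretation of the description above rather than SPARTA's code: it assumes the random coil and neighbor adjustment values are added back to obtain the final shift, and it leaves out the hydrogen bond correction for HA and HN because its functional form is not given here. The function names are hypothetical.

```python
def database_secondary_shift(observed, random_coil, neighbor_adjustment, ring_current):
    """Secondary shift as stored for a database residue (ring current removed)."""
    return observed - random_coil - neighbor_adjustment - ring_current

def predicted_shift(predicted_secondary, random_coil, neighbor_adjustment, ring_current):
    """Final predicted shift for a target residue (ring current added back)."""
    return predicted_secondary + random_coil + neighbor_adjustment + ring_current
```

Applying the two functions in sequence with the same corrections returns the original observed shift, which is the sense in which the ring current term is removed from the database and restored for the target.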
It should also be noted that protein backbone chemical shifts are extremely sensitive to the local conformation; SPARTA results for residues in flexible regions, or for residues with very large ring current shift contributions, may therefore be less reliable, as was also indicated by the test.
SPARTA is implemented in C++. The compiled executable files ($SPARTA_DIR/src/SPARTA for Linux, $SPARTA_DIR/src/SPARTA.exe for Windows) or the starting script ("$SPARTA_DIR/sparta" for Linux) are invoked with "TALOS-like" command-line arguments. A complete list of options is printed when the "-help" command-line argument is given, or when an executable file or the starting script is run without any command-line arguments.
Running SPARTA requires the environment variable "SPARTA_DIR" to be defined; this is done automatically by the starting script ("$SPARTA_DIR/sparta" on Linux):
setenv SPARTA_DIR /disk1/SPARTA
$SPARTA_DIR/src/SPARTA $argv[1-$#argv]
Note that "$SPARTA_DIR" defaults to the current directory if not specified.
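For users who prefer to launch SPARTA from another program, the behavior of the starting script can be mimicked as in the sketch below. The helper name is hypothetical; it only builds the command and environment, using the documented default of the current directory when SPARTA_DIR is unset.

```python
import os

def sparta_command(args, env=None):
    """Return (command, environment) for running the Linux SPARTA executable.

    SPARTA_DIR falls back to the current directory when it is not set.
    """
    env = dict(os.environ if env is None else env)
    env.setdefault("SPARTA_DIR", os.getcwd())
    executable = os.path.join(env["SPARTA_DIR"], "src", "SPARTA")
    return [executable, *args], env
```

The returned pair can be passed to subprocess.run(command, env=environment) to start a prediction.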
Other files of the SPARTA package include:
$SPARTA_DIR/tab/sparta.tab
The compiled database of residue triplets with their corresponding PHI/PSI/CHI1 angles and secondary shifts.
$SPARTA_DIR/tab/randcoil.tab, rcadj.tab, rcprev.tab, rcnext.tab
The tables of random coil shifts and of adjustment values from neighboring residues used in the shift prediction process. (The same tables as used in TALOS, http://spin.niddk.nih.gov/NMRPipe/talos/)
$SPARTA_DIR/tab/homology.tab
The residue type homology factors used in the prediction process, similar to the table used by TALOS.
$SPARTA_DIR/tab/weight.tab
The weighting factors for the PHI, PSI and CHI1 angles and for residue type homology used in the prediction process.
$SPARTA_DIR/tab/fitting.tab
The fitting parameters relating prediction accuracy to precision, which are used after the prediction process to calculate the estimated prediction error.
$SPARTA_DIR/shifts/*.tab
The chemical shift tables for the proteins in the database, in the same format as TALOS shift tables. They must be exactly consistent with the corresponding structures in the SPARTA pdb directory. These files are only used when compiling a new database, and only shift tables ending with the ".tab" extension are used.
$SPARTA_DIR/pdb/*.pdb
The PDB coordinate files in this directory are only used, along with the files in the SPARTA shifts directory, when compiling a new database (e.g. adding new proteins to the database). The sequence and residue numbering must be exactly consistent with the corresponding assignments in the SPARTA shifts directory. Furthermore, the names of these files must be exactly consistent with the corresponding chemical shift tables in the SPARTA shifts directory.
$SPARTA_DIR/test/*
The contents of this "test" directory include the input files and results for a sample SPARTA analysis.
Use of SPARTA to predict backbone chemical shifts involves the following steps:
sparta -in protein.pdb
SPARTA will first generate an input file "pred/protein_in.tab" from the PDB coordinates, which contains the phi, psi and chi1 angles, H-bonding information and ring current shifts. During the database search, a series of files "pred/X/res*.tab" (X = N, H, HA, CA, CB and C) will be created. Each of these files tallies the 20 best database matches for a given residue in the target protein. Before exiting, SPARTA also creates a file "pred.tab" in the "pred" directory, which contains a summary of the prediction results. The database search typically takes about 25 sec for a 100-residue protein on a Linux PC with a 2.8 GHz CPU.
sparta -in protein.pdb -ref ref.tab
SPARTA will compare the predicted and experimental chemical shifts before exiting, and the prediction summary file "pred/pred.tab" will store the comparison between the reference and predicted shifts, as well as the errors. If the average prediction error is larger than 3 times the expected error (standard deviation of the prediction errors / square root of the number of shifts), a warning is printed and a reference correction is applied to the experimental chemical shifts. The corrected reference chemical shifts are stored in a new file "pred/ref.tab".
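The referencing test described above amounts to comparing the mean error with three standard errors of the mean. The sketch below illustrates that criterion; the sign convention (experimental minus predicted), the use of the sample standard deviation, and the function name are assumptions.

```python
import math
import statistics

def reference_check(predicted, experimental, factor=3.0):
    """Return (mean_error, needs_correction) for one nucleus type.

    The expected error is the standard deviation of the errors divided by
    the square root of the number of shifts. At least two shifts are needed.
    """
    errors = [e - p for p, e in zip(predicted, experimental)]
    mean_error = statistics.mean(errors)
    expected = statistics.stdev(errors) / math.sqrt(len(errors))
    return mean_error, abs(mean_error) > factor * expected
```

For example, experimental shifts that sit uniformly about 2 ppm above the predictions give a mean error far larger than three standard errors, so a correction would be flagged.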
The input PDB coordinates should be prepared carefully, so that they have the proper format and naming conventions. SPARTA accepts standard PDB coordinate files, but uses ONLY the FIRST conformer/chain if more than one exists. For PDB coordinates without hydrogen atoms, the hydrogen atoms must be added (using DYNAMO, REDUCE, MOLMOL, or a similar program) in order to obtain the hydrogen bonding information and ring current shifts. For the HA atoms of Gly, please use the atom names "HA1/HA2".
Examples of the required PDB coordinate format can be found in the "$SPARTA_DIR/pdb" and "$SPARTA_DIR/test" directories.
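A quick pre-flight check of an input file can catch the most common problems named above. The sketch below is not part of SPARTA; it assumes standard fixed-column PDB ATOM records (atom name in columns 13-16, residue name in 18-20, chain in 22, residue number in 23-26), and the function name is hypothetical.

```python
def check_pdb_lines(lines):
    """Return warning strings for the first chain of the first model."""
    warnings = []
    first_chain = None
    saw_hydrogen = False
    for raw in lines:
        if raw.startswith("ENDMDL"):
            break  # ignore later models; only the first conformer is used
        if not raw.startswith("ATOM"):
            continue
        line = raw.ljust(26)
        atom = line[12:16].strip()
        residue = line[17:20].strip()
        chain = line[21]
        number = line[22:26].strip()
        if first_chain is None:
            first_chain = chain
        if chain != first_chain:
            continue  # only the first chain is considered
        # Older files may write hydrogens with a leading digit, e.g. "1HA".
        if atom.lstrip("0123456789").startswith("H"):
            saw_hydrogen = True
        if residue == "GLY" and "HA" in atom and atom not in ("HA1", "HA2"):
            warnings.append(f"GLY {number}: atom {atom} should be named HA1 or HA2")
    if not saw_hydrogen:
        warnings.append("no hydrogen atoms found; add them before running SPARTA")
    return warnings
```

It can be run on a file with check_pdb_lines(open("protein.pdb")), and an empty list means none of these particular problems were found.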
New protein chemical shift and structure data can be added to the database. Note well that this should be done with great care and caution, to ensure that only reliable phi/psi/chi1 data with consistently referenced and correct chemical shifts are included. It is suggested that the new data first be checked against a SPARTA prediction, for example with:
sparta -in protein.pdb -ref ref.tab
Given this, the procedure for adding new proteins to the SPARTA database is as simple as:
VARS PDB_NAME
FORMAT %24s
bpti
ubiquitin
profilin
...
Note that each "PDB_NAME" in the table file must be consistent with the file names (with ".tab" and ".pdb" extensions) in the SPARTA pdb and shifts directories.
sparta -compile -pdbDir ./pdb -pdbList list.tab
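Because the names in the list must match the files in both directories, it can be convenient to generate the list from the directories themselves. The sketch below is a hypothetical helper, not part of SPARTA; it assumes one name per line after the VARS and FORMAT header lines, as in the example above.

```python
from pathlib import Path

def build_pdb_list(pdb_dir, shifts_dir):
    """Return the text of a list table naming every protein found in both directories.

    Raises ValueError if a .pdb file has no matching .tab file or vice versa.
    """
    pdb_names = {p.stem for p in Path(pdb_dir).glob("*.pdb")}
    tab_names = {p.stem for p in Path(shifts_dir).glob("*.tab")}
    unmatched = sorted(pdb_names ^ tab_names)
    if unmatched:
        raise ValueError("no matching pdb/shift pair for: " + ", ".join(unmatched))
    lines = ["VARS PDB_NAME", "FORMAT %24s"] + sorted(pdb_names)
    return "\n".join(lines) + "\n"
```

The returned text can be written to list.tab before running the compile command.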
SPARTA is implemented in standard C++ using the Standard Template Library (STL). To compile the source code (in the $SPARTA_DIR/src directory), your system must have a compatible C++ compiler and STL implementation. Given this, compiling the SPARTA executable is as simple as:
cd $SPARTA_DIR/src
make
Compilation of SPARTA has been tested on Windows (XP) and Linux (Linux 9 or newer). The compiled executable files ("$SPARTA_DIR/src/SPARTA" for Linux, or "$SPARTA_DIR/src/SPARTA.exe" for Windows) are included in the distributed SPARTA package.