Contents
What is CS-ROSETTA?
Components of the CS-ROSETTA System
How to Use CS-ROSETTA
Chemical Shift Input Format Used by CS-ROSETTA
How to Select Consistent Predictions in CS-ROSETTA
CS-ROSETTA: System for chemical shifts based protein structure prediction using ROSETTA
To date, interpretation of isotropic chemical shifts in structural terms is largely based on empirical correlations gained from mining the protein chemical shifts deposited in the BMRB, in conjunction with the known corresponding 3D structures. Chemical-Shift-ROSETTA (CS-ROSETTA) is a robust protocol that exploits this relation for de novo protein structure generation, using as input parameters the 13Cα, 13Cβ, 13C', 15N, 1Hα and 1HN NMR chemical shifts. These shifts are generally available at an early stage of the traditional NMR structure determination procedure, prior to the collection and analysis of structural restraints. The CS-ROSETTA approach utilizes SPARTA-based selection of protein fragments from the PDB, in conjunction with a regular ROSETTA Monte Carlo assembly and relaxation procedure. Evaluation of 16 proteins, varying in size from 56 to 129 residues, yielded full-atom models that deviate by 0.7-1.8 Å backbone rmsd from the experimentally determined X-ray or NMR structures. The strategy has also been successfully applied in a blind manner to a set of structural genomics targets with molecular weights up to 16 kDa, whose conventional NMR structure determination was conducted in parallel.
The CS-ROSETTA system is designed to prepare and run CS-ROSETTA structure generation, relying mainly on the backbone and 13Cβ chemical shifts. To use these features, users need to follow the procedures below to properly prepare and inspect their NMR input data. CS-ROSETTA requires an input chemical shift table in the standard NMRPipe/TALOS format. A full example of the required chemical shift table is provided in ubiq.tab. Other examples can be found in the CSROSETTA/demo directory, or at the CS-ROSETTA Server site.
CS-ROSETTA can also use chemical shift input in the BMRB NMR-Star format. Two conversion Unix shell scripts, bmrb2talos_v21.com and bmrb2talos_v31.com, are included with the POMONA package and can be used to convert an NMR-Star format (V2.1 and V3.1, respectively) chemical shift table to TALOS format. Example command lines for using these scripts are:

    bmrb2talos_v21.com bmrb_v21.str > inCS.tab
    bmrb2talos_v31.com bmrb_v31.str

See also here for more details regarding the NMRPipe/TALOS format and NMR-Star format.
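As a rough illustration of how a table in this format can be read programmatically, the sketch below parses the DATA SEQUENCE, VARS and data rows of a TALOS-style table. The column names (RESID, RESNAME, ATOMNAME, SHIFT) are assumed to follow the usual NMRPipe/TALOS layout; the function name parse_talos_table and the shift values in the example are hypothetical and are not part of the CS-ROSETTA package.

```python
def parse_talos_table(text):
    """Parse a TALOS-style shift table into (sequence, shifts).

    Assumed layout (not taken from the CS-ROSETTA distribution):
      DATA SEQUENCE <one-letter codes, possibly in blocks>
      VARS   RESID RESNAME ATOMNAME SHIFT
      FORMAT ...
      <resid> <resname> <atomname> <shift>
    """
    sequence_parts = []
    columns = None
    shifts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("REMARK", "#")):
            continue  # skip blank lines and comments
        fields = line.split()
        if fields[0] == "DATA":
            # Only the SEQUENCE record is used here; other DATA records are ignored.
            if len(fields) > 1 and fields[1] == "SEQUENCE":
                sequence_parts.extend(fields[2:])
            continue
        if fields[0] == "VARS":
            columns = fields[1:]
            continue
        if fields[0] == "FORMAT":
            continue
        if columns is None:
            raise ValueError("data row found before the VARS line")
        row = dict(zip(columns, fields))
        shifts[(int(row["RESID"]), row["ATOMNAME"])] = float(row["SHIFT"])
    return "".join(sequence_parts), shifts


# Illustrative input only; the shift values are made up for this example.
example_table = """\
DATA SEQUENCE MQIFV KTLTG

VARS   RESID RESNAME ATOMNAME SHIFT
FORMAT %4d   %1s     %4s      %8.3f

   1 M   CA   54.450
   1 M   CB   33.270
   2 Q   HN    8.900
"""

sequence, shifts = parse_talos_table(example_table)
```

A dictionary keyed by (residue number, atom name) makes it easy to check, for instance, which residues lack a 13Cα or 13Cβ entry before the table is passed to CS-ROSETTA.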
Use of NOE constraints in CS-ROSETTA is limited to the Rosetta structure generation step; the NOE constraints must therefore be prepared in a Rosetta-compatible format (see here for all formats allowed by Rosetta). In summary, a general NOE constraint can be defined by AtomPair, such as:
    #AtomPair: Atom1_Name Atom1_ResNum Atom2_Name Atom2_ResNum Func_Type Func_Def
    AtomPair H 3 H 112 BOUNDED 1.500 2.910 0.300
    AtomPair H 7 H 108 BOUNDED 1.500 2.720 0.300
    AtomPair H 9 H 106 BOUNDED 1.500 3.070 0.300

Ambiguous NOE constraints can be defined by AmbiguousNMRDistance, such as:
    #AmbiguousNMRDistance: Atom1_Name Atom1_ResNum Atom2_Name Atom2_ResNum Func_Type Func_Def
    # Ambiguous Distance between Atom1 and Atom2. The difference from AtomPair Constraint is that
    # atom names are specially parsed to detect ambiguous hydrogens, which are either experimentally
    # ambiguous or rotationally identical (like methyl hydrogens). The constraint applies to any
    # hydrogens equivalent to the named hydrogen. The logic for determining which hydrogens are which
    # is in src/core/scoring/constraints/AmbiguousNMRDistanceConstraints.cc:parse_NMR_name

Because the chemical shifts are the major input to the CS-ROSETTA approach, their quality is critical for achieving the expected performance. The pre-check module of the TALOS-N/TALOS+ program can be used to inspect the quality of the chemical shift inputs:
TALOS-N/TALOS+ can identify possible referencing problems with the 13Cα, 13Cβ, 13C' and 1Hα chemical shift inputs, as well as possible chemical shift outliers, when a typical TALOS-N/TALOS+ command is run with the additional -check option:

    talosn -in inCS.tab -check

This module first converts the chemical shifts of each residue to secondary chemical shifts, and subsequently evaluates these by correlating 13Cα, 13Cβ, 13C' and 1Hα to the reference-free entity, 13Cα-13Cβ. The estimated chemical shift referencing offsets, as well as their corresponding fitting errors, are printed for 13Cα, 13Cβ, 13C' and 1Hα. The pre-check module also identifies residues with unusual chemical shifts, for which the secondary chemical shifts fall outside the expected range. An example output of this module has the following format:

    Chemical shift outlier checking...
    ...
    64 E CB Secondary Shift: -3.800 Limit: -3.765
    76 G C Secondary Shift: 4.250 Limit: 1.925 !

    Chemical shift referencing checking...
    Estimated Referencing Offset for CA/CB: 0.795 +/- 0.104 ppm (Size: 66)

Note that:
    applyOffsetCorrection.com inCS.tab

A 2H isotope correction for 13Cα/13Cβ chemical shifts (Maltsev et al., J. Biomol. NMR, 2012, 54, 181-191) is also required for chemical shifts measured on perdeuterated protein samples. To do this, the script applyIsotopeCorrection2CACB.com, included in the POMONA package, can be used with the following syntax:

    applyIsotopeCorrection2CACB.com inCS.tab

Note that scripts
CS-Rosetta has a default option (
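The pre-check report described in this section can also be scanned automatically, for example to collect all flagged residues before deciding whether an offset correction is needed. The sketch below is a minimal illustration that assumes the line layout of the example output quoted above (whitespace-separated fields, an optional trailing "!"); the helper name parse_precheck and the dictionary keys are hypothetical and are not part of TALOS-N/TALOS+.

```python
import re

# Pattern for lines such as:
#   64 E CB Secondary Shift: -3.800 Limit: -3.765
#   76 G C Secondary Shift: 4.250 Limit: 1.925 !
OUTLIER_RE = re.compile(
    r"^\s*(\d+)\s+([A-Za-z])\s+(\S+)\s+Secondary Shift:\s*(-?\d+\.\d+)"
    r"\s+Limit:\s*(-?\d+\.\d+)\s*(!?)\s*$"
)

# Pattern for lines such as:
#   Estimated Referencing Offset for CA/CB: 0.795 +/- 0.104 ppm (Size: 66)
OFFSET_RE = re.compile(
    r"Estimated Referencing Offset for\s+(\S+):\s*(-?\d+\.\d+)"
    r"\s*\+/-\s*(\d+\.\d+)\s*ppm"
)


def parse_precheck(text):
    """Return (outliers, offsets) extracted from a pre-check report."""
    outliers = []
    offsets = {}
    for line in text.splitlines():
        match = OUTLIER_RE.match(line)
        if match:
            outliers.append({
                "resid": int(match.group(1)),
                "resname": match.group(2),
                "atom": match.group(3),
                "secondary_shift": float(match.group(4)),
                "limit": float(match.group(5)),
                # True when the line carries the trailing "!" marker.
                "marked": match.group(6) == "!",
            })
            continue
        match = OFFSET_RE.search(line)
        if match:
            # Map nucleus label -> (offset in ppm, fitting error in ppm).
            offsets[match.group(1)] = (float(match.group(2)), float(match.group(3)))
    return outliers, offsets


report = """\
Chemical shift outlier checking...
  64 E   CB  Secondary Shift: -3.800  Limit: -3.765
  76 G    C  Secondary Shift:  4.250  Limit:  1.925 !
Chemical shift referencing checking...
Estimated Referencing Offset for CA/CB: 0.795 +/- 0.104 ppm (Size: 66)
"""

outliers, offsets = parse_precheck(report)
```

Lines that match neither pattern are ignored, so the same function can be applied to the complete console output of a pre-check run.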
For a query protein with known backbone and 13Cβ chemical shifts, CS-ROSETTA is designed to (1) search the selected protein structural database for the best-matched 3-residue and 9-residue protein fragments, (2) run a ROSETTA structure generation procedure, and (3) evaluate and select the generated Rosetta structures. These steps can be performed with the master script runFragPick_Rosetta3.com, which simply follows the procedure listed below:
To perform the Rosetta structure modeling, users need to run the generated script runCSRosetta in the runCSRosetta directory.
Note that the script runCSRosetta generally requires manual modification, either to match the Rosetta installation environment defined at the beginning of the script
or to run parallel jobs on computing clusters.
    #!/bin/csh
    #
    # ====== ***PLEASE verify & modify below definitions*** ======
    set rosettaBinDir = /Your_Rosetta_Directory/main/source/bin/
    set rosettaBin = $rosettaBinDir/minirosetta.default.linuxgccrelease
    set rosettaMPIBin = $rosettaBinDir/minirosetta.mpi.linuxgccrelease
    set rosettaDB = /Your_Rosetta_Directory/main/database
    # ===========================================================
    $rosettaBin @flags -database $rosettaDB -out:file:silent default.out -out:file:scorefile default.sc
    #MPI command
    #mpirun -np 10 $rosettaMPIBin @flags -database $rosettaDB -out:file:silent default.out -out:file:scorefile default.sc

After the CS-ROSETTA structure generation job is done, users need to run the analysis script analyzeCSRosetta.
A directory ExtractedPDBs will be generated, which includes:
More under construction ...
Criteria for convergence and accepting models
After finishing CS-ROSETTA structure generation, users have to decide whether
the ROSETTA models are acceptable. For this purpose, it is convenient to plot
the "landscape" of (re-scored) ROSETTA full-atom energies of all models with
respect to their Cα RMSD values relative to the lowest-energy model, using
the data stored in a file "
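As a complement to plotting the landscape, the spread of the lowest-energy models can be summarized numerically. The sketch below is only an illustration: it assumes the energies and Cα RMSD values have already been read into (energy, rmsd) pairs, the function name summarize_landscape and the choice of 10 lowest-energy models are hypothetical, and no acceptance threshold is implied because none is stated in this section.

```python
def summarize_landscape(models, n_lowest=10):
    """Summarize the RMSD spread of the lowest-energy models.

    models   : iterable of (energy, ca_rmsd) pairs, where ca_rmsd is measured
               relative to the lowest-energy model (so that model has RMSD 0).
    n_lowest : number of lowest-energy models to inspect (illustrative default).
    """
    ranked = sorted(models, key=lambda pair: pair[0])  # most negative energy first
    if not ranked:
        raise ValueError("no models supplied")
    lowest = ranked[:n_lowest]
    rmsds = [rmsd for _, rmsd in lowest]
    return {
        "lowest_energy": ranked[0][0],
        "n_considered": len(lowest),
        "max_rmsd": max(rmsds),
        "mean_rmsd": sum(rmsds) / len(rmsds),
    }


# Made-up values, for illustration only.
demo_models = [(-150.0, 0.0), (-148.5, 1.2), (-120.0, 8.5), (-149.0, 0.8)]
summary = summarize_landscape(demo_models, n_lowest=3)
```

A small maximum RMSD among the lowest-energy models corresponds to a funnel-shaped landscape, whereas a large spread suggests that more models may be needed.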
Number of models required
With the current method implemented in the CS-ROSETTA package, 5,000 to 20,000 CS-ROSETTA models are generally required to obtain convergence. For small proteins (<= 90-100 amino acids), 1,000 to 5,000 CS-ROSETTA models often suffice. ROSETTA takes about 5-10 minutes to calculate one all-atom model on a single 2.4 GHz CPU. The number of Rosetta models to be generated can be specified by modifying the -nstruct option listed in the flags file.
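The figures above translate directly into a compute budget. The sketch below is a back-of-the-envelope estimate that assumes the per-model time is constant and that the work divides evenly over the available CPUs (ideal scaling, which real cluster runs will only approximate); the function name estimate_wall_hours is hypothetical.

```python
def estimate_wall_hours(n_models, minutes_per_model, n_cpus=1):
    """Estimate wall-clock hours for n_models, assuming ideal scaling over n_cpus."""
    if n_models < 0 or minutes_per_model <= 0 or n_cpus < 1:
        raise ValueError("invalid input")
    return n_models * minutes_per_model / 60.0 / n_cpus


# 5,000 models at 5-10 minutes per model on a single CPU
single_cpu_low = estimate_wall_hours(5000, 5)     # about 417 hours
single_cpu_high = estimate_wall_hours(5000, 10)   # about 833 hours

# The same job spread over 100 CPUs
cluster_low = estimate_wall_hours(5000, 5, n_cpus=100)    # about 4.2 hours
cluster_high = estimate_wall_hours(5000, 10, n_cpus=100)  # about 8.3 hours
```

Even at the low end of the recommended range, a single CPU needs several weeks, which is why the structure generation step is usually run in parallel.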