How to Get the CS-ROSETTA Package

A stable version of the CS-ROSETTA software package (version 2.x) can be downloaded below. This version is implemented differently from the previous version (see here) in order to apply recent patches and to include new developments. When downloading software from this website, you are agreeing to our Terms of Use, including the terms that there is no right to privacy on this system, and that the software from this website is not to be redistributed without permission from the authors. The CS-ROSETTA package provides binaries for the linux9 and mac hardware & OS versions (see here for a definition of those hardware & OS versions by the NMRPipe system). It requires several Unix programs and other external software packages in order to use all its features and to run efficiently (see details below).

The most common CS-ROSETTA installation procedure in a Unix environment (linux9 and mac) involves:

  1. Create a directory for the CS-ROSETTA installation [for example, type mkdir /disk1/CSROSETTA in an "xterm" terminal window].
  2. Go to the selected install directory [cd /disk1/CSROSETTA].
  3. Download and store the CS-ROSETTA installation files (csrosetta.tZ, install.com) into the selected install directory:
    • Via a web browser: right-click the download links below and select "Save Target As", "Save Link As" or "Download Linked File (As)" (depending on the browser type), and save the files into the selected install directory (be sure to retain the exact file name shown below).
    • Or via the Unix command "wget":
      wget http://spin.niddk.nih.gov/bax/software/CSROSETTA/csrosetta.tZ
  4. Uncompress the package csrosetta.tZ:

    tar -zxvf csrosetta.tZ

  5. This will generate a "stand-alone" initialization script, csrosetta_init.com, which stores all required and optional environment variables for running the program. A common way to apply the initialization is to add the following lines to the ~/.cshrc file:
    if (-e /disk1/CSROSETTA/csrosetta_init.com) then
       source /disk1/CSROSETTA/csrosetta_init.com
    endif
    Note: The environment variables in the csrosetta_init.com script must be filled in and inspected manually by the user before running the program. Please check here to see a full list of the environment variables defined in this initialization script.

There is also a web-based version of CS-ROSETTA, which can be used directly without installing CS-ROSETTA. However, due to our limited computing resources, the most time-consuming procedure, Rosetta structure generation, is not provided by our server. The CS-ROSETTA server provides all required inputs and scripts for running the CS-ROSETTA structure generation, but users have to run the ROSETTA structure generation on their own. You can access this web-based system, along with other facilities for manipulating chemical shifts, dipolar couplings, and molecular structures, at the Bax Group NMR Server site:

CS-ROSETTA Installation Files
(Version 2.01 Rev 2019.06)
csrosetta.tZ [size: 2MB]

CS-ROSETTA Web Server


Other Software Programs

To run the CS-ROSETTA structure generation procedure, several external software programs are needed, as listed below:

Software: TALOS-N
Note: Required to check and prepare inputs for CS-ROSETTA. TALOS-N provides its outputs as the required inputs for generating fragment candidates.
URL: http://spin.niddk.nih.gov/bax/software/TALOS-N/

Software: ROSETTA
Note: Required to prepare and run CS-ROSETTA. Only Rosetta 3.5 and later versions are supported by the current CS-Rosetta package. Also see here for a tutorial on how to install and use the Rosetta program.
URL: https://www.rosettacommons.org/

Software: BLAST
Note: Required to prepare de novo fragment candidates for CS-ROSETTA. BLAST generates the amino acid sequence profile from sequence alignments.
  Note 1: The C++ version, BLAST+, is not currently supported; please use the legacy versions.
  Note 2: The nr database is recommended.
URL: https://ftp.ncbi.nlm.nih.gov/blast/executables/
     https://ftp.ncbi.nlm.nih.gov/blast/db/

 
About CS-ROSETTA

To date, interpretation of isotropic chemical shifts in structural terms is largely based on empirical correlations gained from the mining of protein chemical shifts deposited in the BMRB, in conjunction with the known corresponding 3D structures. Chemical-Shift-ROSETTA (CS-ROSETTA) is a robust protocol to exploit this relation for de novo protein structure generation, using as input parameters the 13Cα, 13Cβ, 13C', 15N, 1Hα and 1HN NMR chemical shifts. These shifts are generally available at the early stage of the traditional NMR structure determination procedure, prior to the collection and analysis of structural restraints. The CS-ROSETTA approach, as shown below, utilizes SPARTA-based selection of protein fragments from the PDB, in conjunction with a regular ROSETTA Monte Carlo assembly and relaxation procedure. Evaluation of 16 proteins, varying in size from 56 to 129 residues, yielded full-atom models that deviate by 0.7-1.8 Å backbone rmsd from the experimentally determined X-ray or NMR structures. The strategy has also been successfully applied in a blind manner to a set of structural genomics targets with molecular weights up to 16 kDa, whose conventional NMR structure determination was conducted in parallel.

CS-ROSETTA Flowchart
[Figure: CS-ROSETTA flowchart]

Components of the CS-ROSETTA System
The CS-ROSETTA core system is implemented in the C++ language. In addition, multiple Unix shell scripts are provided in the CS-ROSETTA package to evaluate and prepare the inputs, analyze the output, and so on. The files, scripts, and directories of the CS-ROSETTA system include:
csrosetta
    Master script to run CS-ROSETTA.
csrosetta_init.com
    Initialization script that defines all required and optional environment variables for running the CS-ROSETTA master script csrosetta (must be filled in by users after installation).
bin/
    Directory for all compiled binary files for Linux (*.linux9, *.static.linux9) and MacOS (*.mac).
scripts/
    Directory for all required utility scripts of CS-ROSETTA.
demo/
    Directory with example chemical shift input data and scripts for a demo of CS-ROSETTA.


NMR Input Data for CS-ROSETTA

The CS-ROSETTA system is designed to prepare and run CS-ROSETTA structure generation, relying mainly on the backbone and 13Cβ chemical shifts. To use these features, users need to follow the procedures below to properly prepare and inspect their NMR input data.

Chemical Shift Data Format and Requirements
CS-ROSETTA requires an input chemical shift table in the standard NMRPipe/TALOS format. An example portion of the required chemical shift table format is shown below (full example: ubiq.tab). Other examples can be found in the CSROSETTA/demo directory, or at the CS-ROSETTA Server site.


CS-ROSETTA can also use chemical shift input in the BMRB NMR-Star format. Two Unix shell conversion scripts, bmrb2talos_v21.com and bmrb2talos_v31.com, are included with the POMONA package and can be used to convert an NMR-Star format (V2.1 and V3.1, respectively) chemical shift table to TALOS format. Example command lines for using these scripts are:
bmrb2talos_v21.com bmrb_v21.str > inCS.tab
bmrb2talos_v31.com bmrb_v31.str

See also here for more details regarding the NMRPipe/TALOS format and the NMR-Star format.

NOE Constraint Data Format

RDC Constraint Data Format

Chemical Shift Data Inspection

Because chemical shifts are the major inputs to the CS-ROSETTA approach, their quality is critical for achieving the expected performance. The pre-check module of the TALOS-N/TALOS+ program can be used to inspect the quality of the chemical shift inputs:

TALOS-N/TALOS+ can identify possible referencing problems with the 13Cα, 13Cβ, 13C' and 1Hα chemical shift inputs, as well as possible chemical shift outliers, when a typical TALOS-N/TALOS+ command is run with the additional -check option, for example:

talosn -in inCS.tab -check
This module first converts the chemical shifts of each residue to secondary chemical shifts, and subsequently evaluates these by correlating 13Cα, 13Cβ, 13C' and 1Hα to the reference-free entity, 13Cα-13Cβ. The estimated chemical shift referencing offsets, as well as their corresponding fitting errors, will be printed for 13Cα, 13Cβ, 13C' and 1Hα. This pre-check module will also identify residues with unusual chemical shifts, for which the secondary chemical shifts fall outside the expected range. Example output of this module has the following format:
   Chemical shift outlier checking...
     ...
     64 E CB Secondary Shift: -3.800 Limit: -3.765
     76 G  C Secondary Shift:  4.250 Limit:  1.925 !

   Chemical shift referencing checking...
      Estimated Referencing Offset for CA/CB: 0.795 +/- 0.104 ppm (Size: 66)

Note that:

  • An offset correction generally is only needed when the estimated referencing offset exceeds the average fitting error by more than about five standard deviations. To apply the offset correction, the script applyOffsetCorrection.com, included in the POMONA package, can be used with the following syntax:
      applyOffsetCorrection.com inCS.tab

  • Chemical shift outliers, especially those with highly unusual chemical shifts, for which the secondary chemical shifts deviate from the expected range by more than twice the normal range of secondary chemical shifts, may correspond to experimental errors and need to be inspected carefully before they are used. For example, in the output above, the chemical shift outlier identified for residue 76 corresponds to a C-terminal carboxylate instead of a backbone carbonyl.
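The notes above can be turned into a small screening helper. The following Python sketch is not part of the CS-ROSETTA or TALOS-N distributions; it only parses text laid out like the example pre-check output shown above. The function names (parse_precheck, needs_offset_correction) are hypothetical, the trailing "!" is simply recorded as it appears in the output, and the five-sigma rule is interpreted here, as an assumption, to mean that the magnitude of the offset is larger than five times its reported fitting error.

```python
import re

# Example lines copied from the pre-check output format shown in this section.
EXAMPLE_OUTPUT = """\
   Chemical shift outlier checking...
     64 E CB Secondary Shift: -3.800 Limit: -3.765
     76 G  C Secondary Shift:  4.250 Limit:  1.925 !

   Chemical shift referencing checking...
      Estimated Referencing Offset for CA/CB: 0.795 +/- 0.104 ppm (Size: 66)
"""

# Assumed line layouts, inferred only from the example output above.
OUTLIER_RE = re.compile(
    r"^\s*(\d+)\s+([A-Za-z])\s+(\S+)\s+Secondary Shift:\s*(-?\d+\.?\d*)"
    r"\s+Limit:\s*(-?\d+\.?\d*)\s*(!?)\s*$"
)
OFFSET_RE = re.compile(
    r"Estimated Referencing Offset for\s+(\S+):\s*(-?\d+\.?\d*)"
    r"\s*\+/-\s*(\d+\.?\d*)\s*ppm"
)


def parse_precheck(text):
    """Return (outliers, offsets) parsed from pre-check style text."""
    outliers, offsets = [], []
    for line in text.splitlines():
        m = OUTLIER_RE.match(line)
        if m:
            outliers.append({
                "residue": int(m.group(1)),
                "aa": m.group(2),
                "atom": m.group(3),
                "secondary_shift": float(m.group(4)),
                "limit": float(m.group(5)),
                "flagged": m.group(6) == "!",  # marker copied from the output
            })
            continue
        m = OFFSET_RE.search(line)
        if m:
            offsets.append({
                "nuclei": m.group(1),
                "offset": float(m.group(2)),
                "error": float(m.group(3)),
            })
    return outliers, offsets


def needs_offset_correction(offset, error, n_sigma=5.0):
    """Assumed reading of the rule: |offset| larger than n_sigma * error."""
    return abs(offset) > n_sigma * error


outliers, offsets = parse_precheck(EXAMPLE_OUTPUT)
for entry in offsets:
    decision = needs_offset_correction(entry["offset"], entry["error"])
    print(entry["nuclei"], entry["offset"], entry["error"], decision)
for entry in outliers:
    if entry["flagged"]:
        print("inspect residue", entry["residue"], entry["aa"], entry["atom"])
```

For the example values, 0.795 ppm is roughly 7.6 times the fitting error of 0.104 ppm, so under this reading a correction would be indicated. The flagged residue 76 still needs manual inspection, as described above.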

A 2H isotope correction for the 13Cα/13Cβ chemical shifts (Maltsev et al. J. Biomol. NMR, 2012, 54, 181-191) is also required for chemical shifts measured on perdeuterated protein samples. To do this, the script applyIsotopeCorrection2CACB.com, included in the POMONA package, can be used with the following syntax:
applyIsotopeCorrection2CACB.com inCS.tab

Note that the scripts applyOffsetCorrection.com and applyIsotopeCorrection2CACB.com apply the corrections to the original chemical shift input file, while the original input file is renamed with a .orig suffix.

CS-Rosetta has a default option (-offset) to check the referencing offset and apply a possible correction to the chemical shift input, as well as an option (-iso) to apply the 2H isotope correction on the fly. However, users are still advised to properly prepare and carefully inspect their chemical shifts before using them as input to CS-Rosetta.


Handling Flexible Tails/Loops


How to Use CS-ROSETTA

For a query protein with known backbone and 13Cβ chemical shifts, CS-ROSETTA is designed for (1) searching the selected protein structural database for the best-matched 3-residue and 9-residue protein fragments, (2) running a ROSETTA structure generation procedure, and (3) evaluating and selecting the generated Rosetta structures. These steps are performed with the master script csrosetta and the scripts generated by it. The most common procedure in a Unix environment involves:

  1. Create a directory for the prediction session; all subsequent commands will be executed from this directory.
  2. Prepare the input table of chemical shift assignments (for example "myshifts.tab") with a proper format (see the previous section); please also carefully inspect the chemical shifts for the possible referencing offset and outliers.
  3. Run the CS-Rosetta master script csrosetta. Most commonly, this will simply require a command line such as:
    csrosetta -in myshifts.tab
    This will perform fragment generation and prepare the inputs and scripts for running the Rosetta structure generation and the structure analysis; the details are listed below:
I. Protein Fragment Generation
Originally, the MFR program was used by CS-Rosetta to find matching short fragments in a selected structural database. This step is now performed in CS-Rosetta by a newer fragment picker integrated in the Rosetta Software Suite (Version 3.0 and newer) (see ref). To run the fragment search from the CS-Rosetta master script, the script runFragPick_Rosetta3.com, which is based on the Rosetta 3.0 fragment picker, is used. It follows the procedure listed below:
    • A TALOS+ or TALOS-N prediction is first performed to predict various structural factors, such as the backbone torsion angles (pred.tab) and the secondary structure (predSS.tab), which are used as additional inputs for the subsequent fragment picking step.
    • The script makeBlastCheckpoint.com is then executed; it runs the PSI-BLAST program to generate the amino acid sequence profile information required as input for the Rosetta fragment picking procedure. It uses a FASTA sequence file t000_.fasta as input and generates a sequence profile file t000_.checkpoint and a homology file t000_.homologs.
    • A Rosetta command file, flags, is generated to run the chemical shift based fragment picking:

      -database                 /Rosetta/database
      -in:file:vall             /Rosetta/vall.apr24.2008.extended
      -frags:n_frags            200
      -frags:frag_sizes         3 9
      -frags:scoring:config     scores.cfg
      -in:file:checkpoint       t000_.checkpoint
      -frags:denied_pdb         t000_.homologs
      -in:file:talos_cs         ubiq.tab
      -in:file:fasta            t000_.fasta
      -frags:ss_pred            predSS.tab talos
      -in:file:talos_phi_psi    pred.tab
      -frags:sigmoid_cs_A       2
      -frags:sigmoid_cs_B       4
      
      -frags:describe_fragments frags.fsc.score
      -out:file:frag_prefix     t000_
      
      A Rosetta scoring definition file, scores.cfg, is also generated to define the weights and priority of the various inputs for the chemical shift based fragment picking:

      # score name       priority    wght   min_allowed  extras
      CSScore             400        1.5            -          
      ProfileScoreL1      300        1.5            -          
      TalosSSSimilarity   200        0.25           -    talos 
      RamaScore           100        1              -    talos 
      PhiPsiSquareWell     50        0.15           -          
      
      Rosetta fragment picking is then performed to generate two sets of de novo fragments, t000_.200.3mers.gz and t000_.200.9mers.gz.
    • After fragment generation, the files and scripts required to run the subsequent CS-ROSETTA structure generation procedure are generated and stored in a directory csRosetta, which includes:
      csRosetta/

    • More under construction ...
CS-ROSETTA Server
The CS-ROSETTA Server can be used to generate the fragments and all required inputs for running CS-ROSETTA structure generation. Users submit their input chemical shift file to the server, and the server sends the results back via email.


II. CS-ROSETTA Structure Generation
To perform the Rosetta structure modeling, users need to run the generated script runCSRosetta in the runCSRosetta directory. Note that the script runCSRosetta generally requires manual modification, either to set the Rosetta installation environment defined at the beginning of the script or to run parallel jobs on computing clusters.

Script runCSRosetta
#!/bin/csh
#

# ====== ***PLEASE verify & modify below definitions*** ======
set rosettaBinDir = /Your_Rosetta_Directory/main/source/bin/
set rosettaBin    = $rosettaBinDir/minirosetta.default.linuxgccrelease
set rosettaMPIBin = $rosettaBinDir/minirosetta.mpi.linuxgccrelease
set rosettaDB     = /Your_Rosetta_Directory/main/database
# ===========================================================

$rosettaBin @flags -database $rosettaDB -out:file:silent default.out -out:file:scorefile default.sc

#MPI command
#mpirun -np 10 $rosettaMPIBin @flags -database $rosettaDB -out:file:silent default.out -out:file:scorefile default.sc


After the CS-ROSETTA structure generation job is done, users need to run the analysis script analyzeCSRosetta. A directory ExtractedPDBs will be generated, which includes:
ExtractedPDBs/

More under construction ...



How to Select Consistent CS-ROSETTA Predictions
Criteria for convergence and accepting models

After finishing CS-ROSETTA structure generation, users have to decide whether the ROSETTA models are acceptable. For this purpose, it is convenient to plot the "landscape" of (re-scored) ROSETTA full-atom energies of all models against their Cα-RMSD values relative to the lowest-energy model, using the data stored in the file "name.rms.rescore.txt".

  1. Converged:
    If the 10 lowest energy models all differ by less than 2 Å Cα-RMSD from the model with the lowest (re-scored) energy (see the following example plot from the structure prediction of protein GB3), the structure prediction is deemed successful and the 10 lowest energy models are accepted. Although results where clustering around the lowest energy structure is less tight than 2 Å may still be useful for further analysis, such results should not be over-interpreted and could be in error.
    [Figure: convergence plot for protein GB3]
  2. Not Converged:
    If no clustering around the low energy models is observed (see the following example plot generated for protein nsp1), the structure prediction has not converged and the low energy models cannot be accepted at this stage.
    [Figure: convergence plot for protein nsp1]
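The acceptance criterion above can also be checked numerically. The following Python sketch is an illustration only and is not part of the CS-ROSETTA package. It assumes that the (re-scored) energy and the Cα-RMSD to the lowest-energy model have already been read into a list of (energy, rmsd) pairs; the layout of "name.rms.rescore.txt" is not described here, so reading that file is deliberately left out. The function name is_converged and its parameters are hypothetical.

```python
def is_converged(models, n_lowest=10, rmsd_cutoff=2.0):
    """Check the convergence criterion described in this section.

    models      : list of (energy, ca_rmsd_to_lowest_energy_model) pairs
                  (assumed to be extracted beforehand by the user)
    n_lowest    : number of lowest-energy models to examine (10 in the text)
    rmsd_cutoff : Calpha-RMSD limit in Angstrom (2 in the text)
    """
    if len(models) < n_lowest:
        raise ValueError("need at least %d models" % n_lowest)
    # Rank by energy, lowest first, and keep the n_lowest best models.
    lowest = sorted(models, key=lambda m: m[0])[:n_lowest]
    # Converged only if every one of them lies within the RMSD cutoff.
    return all(rmsd < rmsd_cutoff for _energy, rmsd in lowest)


# Made-up numbers for illustration only (not real CS-ROSETTA results).
tight = [(-200.0 + i, 0.1 * i) for i in range(20)]
scattered = [(-200.0 + i, 3.0 * i) for i in range(20)]
print(is_converged(tight))
print(is_converged(scattered))
```

A plot of energy versus Cα-RMSD remains the more informative check, since it also shows how tightly the remaining low-energy models cluster.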


Number of models required

With the current method implemented in the CS-ROSETTA package, 5,000 to 20,000 predicted CS-ROSETTA models are generally required to obtain convergence. For small proteins (up to about 90-100 amino acids), 1,000 to 5,000 CS-ROSETTA models often suffice. ROSETTA takes about 5-10 minutes to calculate one all-atom model on a single 2.4 GHz CPU. The number of Rosetta models to be generated can be specified by modifying the -nstruct option listed in the flags file.
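To get a feeling for the computing time these numbers imply, the following Python sketch multiplies the model counts by the per-model timings quoted above. It is a rough estimate only: the function names are hypothetical, perfect parallel scaling is assumed, and the 10 processes used in the example are taken from the commented MPI command in the runCSRosetta script.

```python
def cpu_hours(n_models, minutes_per_model):
    """Total single-CPU time in hours for n_models all-atom models."""
    return n_models * minutes_per_model / 60.0


def wall_clock_hours(n_models, minutes_per_model, n_processes):
    """Elapsed time in hours, assuming ideal scaling over n_processes."""
    return cpu_hours(n_models, minutes_per_model) / n_processes


# Ranges quoted in the text: 5,000-20,000 models at 5-10 minutes each.
low = cpu_hours(5000, 5)
high = cpu_hours(20000, 10)
print("CPU hours: %.0f to %.0f" % (low, high))

# Same range spread over 10 processes (as in "mpirun -np 10").
print("Wall-clock hours on 10 processes: %.0f to %.0f"
      % (wall_clock_hours(5000, 5, 10), wall_clock_hours(20000, 10, 10)))
```

Under these assumptions, a full-size run corresponds to roughly 400 to 3,300 CPU hours, which is why running on a computing cluster is usually necessary.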





last update: Jun 6 2020 / sy