How to Get the POMONA Package

A stable version of the POMONA software package can be downloaded below. By downloading software from this website, you agree to our Terms of Use, including the terms that there is no right to privacy on this system and that the software from this website is not to be redistributed without permission from the authors. The POMONA package is provided for the hardware & OS versions linux9 and mac (see here for a definition of those hardware & OS versions by the NMRPipe system). It requires several Unix programs and other external software packages in order to use all of its features and to run efficiently (see details below).

The most common POMONA installation procedure in a Unix environment (linux9 and mac) involves:

  1. Create a directory for the POMONA installation [for example, type mkdir /disk1/POMONA in an "xterm" terminal window].
  2. Go to the selected install directory [cd /disk1/POMONA].
  3. Download and store the POMONA installation files (pomona.tZ, install.com) and database file (pomonaData.tZ) into the selected install directory
    • Via a web browser: Right-click the download links below and select "Save Target As", "Save Link As" or "Download Linked File (As)" (depending on the browser type), and save the files into the selected install directory (Be sure to retain the exact file name shown below).
    • Or via the unix command "wget":
      wget http://spin.niddk.nih.gov/bax/software/POMONA/pomona.tZ
      wget http://spin.niddk.nih.gov/bax/software/POMONA/pomonaData.tZ
      wget http://spin.niddk.nih.gov/bax/software/POMONA/install.com
  4. Execute the install.com script:
    • In most cases, no arguments will be required.
    • It will be sufficient to make the install script executable (by typing "chmod +x install.com"), then run the install.com script.
    • Use the command ./install.com +help to generate a list of install command-line options.
  5. The installation will generate a "stand-alone" initialization script pomona_init.com, which stores all required and optional environment variables for running the program. It also recommends a common way to apply the initialization, i.e., adding the following lines to the ~/.cshrc file:
    if (-e /disk1/POMONA/pomona_init.com) then
       source /disk1/POMONA/pomona_init.com
    endif
    Note: All required environment variables in the pomona_init.com script are set during the installation procedure, while the optional environment variables must be entered manually by the user after the installation. Please check here to see a full list of the environment variables defined in this initialization script.

There is also a Web-based version of POMONA, which can be used directly without installing POMONA. However, due to our limited computing resources, the CS-RosettaCM structure generation, the most time-consuming procedure of POMONA/CS-RosettaCM, is not provided by our server. The POMONA server provides all required inputs/scripts for running the CS-RosettaCM structure generation, but users have to run CS-RosettaCM on their own. You can access this Web-based system, along with other facilities for manipulating chemical shifts, dipolar couplings, and molecular structures, at the Bax Group NMR Server site:

POMONA Web Server

POMONA/CS-RosettaCM Installation Files (Version 1.00 Rev 2014.301.14.56)
  • install.com [size: 9KB]
  • pomona.tZ [size: 24MB]
  • pomonaData.tZ [size: 2.8GB]


Other Software Programs

By default, only the TALOS-N program is required by POMONA for searching structure alignments from the PDB. However, in order to run other optional modules and the CS-RosettaCM approach, multiple external software programs are needed, which are listed below:

Software: TALOS-N
  Note: REQUIRED to prepare inputs for all POMONA alignments. TALOS-N provides its outputs as the required inputs for running POMONA alignments.
  URL: http://spin.niddk.nih.gov/bax/software/TALOS-N/

Software: ROSETTA
  Note: Required to prepare and run the CS-RosettaCM and CS-Rosetta approaches. POMONA provides all required inputs/scripts for running the CS-RosettaCM comparative modeling, for which the RosettaCM module from the Rosetta software suite is needed.
  Note: Rosetta versions of 2014 or newer ONLY.
  Note: See here also for a tutorial on how to install and use the Rosetta program.
  URL: https://www.rosettacommons.org/

Software: DSSP
  Note: Required to prepare the database for pairwise POMONA alignments. DSSP assigns secondary structure from PDB coordinates.
  URL: http://swift.cmbi.ru.nl/gv/dssp/

Software: REDUCE
  Note: Required to prepare the database for pairwise POMONA alignments. REDUCE adds hydrogens to PDB coordinates.
  URL: http://kinemage.biochem.duke.edu/software/reduce.php

Software: BLAST
  Note: Required to prepare de novo fragment candidates for CS-Rosetta/CS-RosettaCM. BLAST generates an amino acid sequence profile from sequence alignments.
  Note 1: Do NOT use the C++ version BLAST+.
  Note 2: The nr database is recommended.
  URL: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/
  URL: ftp://ftp.ncbi.nlm.nih.gov/blast/db/

 
About POMONA and CS-RosettaCM

POMONA is a newly designed NMR chemical shift guided protein structure alignment method to generate protein templates with the best matched local structure from the Protein Data Bank. It is based on experimental input data comprising 13Cα, 13Cβ, 13C', 15N, 1Hα, and 1HN NMR chemical shifts, plus sparse NOEs if available, and directly exploits the powerful bioinformatics algorithms previously developed for sequence-based homology modeling. POMONA is shown to identify structural homologues in the absence of significant sequence similarity.

When combined with a subsequent chemical shift based Rosetta comparative modeling step (CS-RosettaCM), the well-selected POMONA structural templates enable the generation of full-atom models that are demonstrated to match well with the corresponding structures experimentally derived from X-ray diffraction or NMR data. The POMONA/CS-RosettaCM protocol has been shown to be an alternative approach to protein structure determination from a minimal set of NMR input data, and it is applicable to larger proteins representing a wide variety of folds.

POMONA/CS-RosettaCM Flowchart
[Figure: POMONA/CS-RosettaCM flowchart]

Components of the POMONA System
The POMONA core system is implemented in the C++ language. In addition, multiple Unix shell scripts are provided in the POMONA package to evaluate and prepare the inputs, analyze the output, and so on. The files, scripts, and directories of the POMONA system include:
  • pomona: master script to run POMONA/CS-RosettaCM.
  • pomona_init.com: initialization script that defines all required and optional environment variables for running the POMONA master script pomona.
  • bin/: directory for all compiled binary files for Linux (*.linux9, *.static.linux9) and MacOS (*.mac).
  • scripts/: directory for all required utility scripts of POMONA.
  • data/: directory for required data files of POMONA.
  • demo/: directory with example chemical shift input data and scripts for a demo of POMONA.


NMR Input Data for POMONA/CS-RosettaCM

The POMONA system is designed to use mainly the backbone and 13Cβ chemical shifts for (1) identifying all possible structural homologues from the PDB, (2) generating pairwise structural alignments to specific protein(s), and (3) preparing and applying CS-RosettaCM comparative modeling. To use these features of POMONA, users need to follow the procedures below to properly prepare and inspect their NMR input data.

Chemical Shift Data Format and Requirements
POMONA requires an input chemical shift table in the standard NMRPipe/TALOS format. An example portion of the required chemical shift table format is shown below (full example: ubiq.tab). Other examples can be found in the POMONA/demo directory, or at the POMONA Server site.


POMONA can also use chemical shift input in the BMRB NMR-Star format. Two Unix shell conversion scripts, bmrb2talos_v21.com and bmrb2talos_v31.com, are included with the POMONA package and can be used to convert an NMR-Star format (V2.1 and V3.1, respectively) chemical shift table to TALOS format. Example command lines for using these scripts are:
bmrb2talos_v21.com  bmrb_v21.str > inCS.tab
bmrb2talos_v31.com  bmrb_v31.str

See also here for more details regarding the NMRPipe/TALOS format and NMRStar format.

NOE Constraint Data Format

Chemical Shift Data Inspection

Because chemical shifts are the major inputs to the POMONA/CS-Rosetta approach, their quality is critical for achieving the expected performance. The pre-check module of the TALOS-N program can be used to inspect the quality of the chemical shift inputs:

TALOS-N can identify possible referencing problems with the 13Cα, 13Cβ, 13C' and 1Hα chemical shift inputs and possible chemical shift outliers when running a typical TALOS-N command with an additional -check option, for example by using the command line input argument:

talosn -in inCS.tab -check
This module first converts the chemical shifts of each residue to secondary chemical shifts, and subsequently evaluates these by correlating 13Cα, 13Cβ, 13C' and 1Hα to the reference-free entity, 13Cα-13Cβ. The estimated chemical shift referencing offsets, as well as their corresponding fitting errors, will be printed for 13Cα, 13Cβ, 13C' and 1Hα. This pre-check module will also identify residues with unusual chemical shifts, i.e., those whose secondary chemical shifts fall outside the expected range. An example output of this module has the following format:
   Chemical shift outlier checking...
     ...
     64 E CB Secondary Shift: -3.800 Limit: -3.765
     76 G  C Secondary Shift:  4.250 Limit:  1.925 !

   Chemical shift referencing checking...
      Estimated Referencing Offset for CA/CB: 0.795 +/- 0.104 ppm (Size: 66)

Note that:

  • An offset correction generally is only needed when the estimated referencing offset exceeds the average fitting error by more than about five standard deviations. To apply the offset correction, the script applyOffsetCorrection.com included in the POMONA package can be used with the following syntax:
      applyOffsetCorrection.com inCS.tab

  • The chemical shift outliers, especially those with highly unusual chemical shifts whose secondary chemical shifts deviate from the expected range by more than twice the normal range of secondary chemical shifts, may correspond to experimental errors and need to be inspected carefully before they are used. For example, in the output above, the chemical shift outlier identified for residue 76 corresponds to a C-terminal carboxylate instead of a backbone carbonyl.
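
The five-standard-deviation rule of thumb above can be checked mechanically. The following is a minimal Python sketch and is not part of the POMONA or TALOS-N packages: the function names parse_offset_line and needs_offset_correction are hypothetical, the regular expression assumes the referencing line looks exactly like the sample output shown above, and the rule is interpreted as |offset| > 5 x fitting error.

```python
import re

# Pattern for a line such as:
#   Estimated Referencing Offset for CA/CB: 0.795 +/- 0.104 ppm (Size: 66)
# The layout is assumed from the sample output on this page.
OFFSET_LINE = re.compile(
    r"Estimated Referencing Offset for (\S+):\s*"
    r"(-?\d+(?:\.\d+)?)\s*\+/-\s*(\d+(?:\.\d+)?)\s*ppm"
)


def parse_offset_line(line):
    """Return (nuclei, offset_ppm, error_ppm), or None if the line does not match."""
    match = OFFSET_LINE.search(line)
    if match is None:
        return None
    return match.group(1), float(match.group(2)), float(match.group(3))


def needs_offset_correction(offset_ppm, error_ppm, n_sigma=5.0):
    """Rule of thumb (assumed reading): correct only if |offset| > n_sigma * error."""
    return abs(offset_ppm) > n_sigma * error_ppm


sample = "      Estimated Referencing Offset for CA/CB: 0.795 +/- 0.104 ppm (Size: 66)"
nuclei, offset, error = parse_offset_line(sample)
print(nuclei, offset, error, needs_offset_correction(offset, error))
```

For the sample line, the 0.795 ppm offset is roughly 7.6 times the 0.104 ppm fitting error, so under this reading a correction would be indicated.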

A 2H isotope correction for 13Cα/13Cβ chemical shifts (Maltsev et al., J. Biomol. NMR, 2012, 54, 181-191) is also required for chemical shifts measured on perdeuterated protein samples. To apply it, the script applyIsotopeCorrection2CACB.com included in the POMONA package can be used with the following syntax:
applyIsotopeCorrection2CACB.com inCS.tab

Note that the scripts applyOffsetCorrection.com and applyIsotopeCorrection2CACB.com write the corrected chemical shifts under the original input file name, while the original input file is kept with a .orig suffix.

POMONA has a default option (-offset) to check the referencing offset and apply a possible correction to the chemical shift input, as well as an option (-iso) to apply the 2H isotope correction on the fly. However, users are still advised to properly prepare and carefully inspect their chemical shifts before using them as input to POMONA.


Handling Flexible Tails/Loops


How to Use POMONA/CS-RosettaCM

For a query protein with known backbone and 13Cβ chemical shifts, POMONA is designed for (1) searching the Protein Data Bank (PDB) for proteins with the best matched (local) structure, (2) aligning to any specific protein(s) according to their local structures, and (3) preparing all data and scripts for running a CS-RosettaCM structure generation procedure.

Protein Structure Database Searching by POMONA
To use POMONA for searching structure alignments from the PDB, users can simply follow the procedure listed below:
  1. Create a directory for the prediction session; all subsequent commands will be executed from this directory.
  2. Prepare the input table of chemical shift assignments (for example "myshifts.tab") with a proper format (see the previous section); please also carefully inspect the chemical shifts for the possible referencing offset and outliers.
  3. Run the POMONA master script (pomona) to perform the PDB database search. Most commonly, this will simply require a command line such as:
    pomona -in myshifts.tab
    • It first runs the TALOS-N program to predict various structural factors, such as the backbone torsion angles and the secondary structure, which are used as inputs for the subsequent POMONA database search.
    • All protein chains (~225,000) in the PDB are divided into 11 subsets, each of which is used to construct a pre-defined POMONA PDB database (named pdb_?? and stored in $POMONA_DIR/data/pdb/). During the PDB database search, a summary file "pomona.pdb_??.tab" is created to store the best aligned structures obtained from searching a given PDB subset.
    • After all PDB subsets are searched, a final clustering and summarizing step is performed for all identified alignments. By default, the top 1000 alignments with the highest total alignment score are kept; this number can be changed with the option -count (default: 1000). For the selected alignments, a clustering procedure identifies aligned structures with similar global conformation, using a distance cutoff on the normalized Cα-RMSD calculated for the aligned residues. The default cutoff is 4 Å and can be changed with the option -rmsdCut (default: 4 Å). Two final files are generated: pomona_sum.tab, the default POMONA summary file, which stores the alignment summary of the top selected alignments, and pomona_aln.tab, the default POMONA alignment file, which stores the details of the top selected alignments. Excerpts of both files for the test protein ubiquitin are shown below.
      Sample POMONA Summary File "pomona_sum.tab"
      VARS   INDEX PDB R1 RN D_R1 D_RN SCORE LEN GAP_SIZE MISMATCH IDENTITY DB_VOL DB_R1 DB_RN CLUST_ID CLUST_MEMCNT 
      FORMAT %4d %5s %4d %4d %4d %4d %6.2f %3d %3d %3d %3d  %s %8d %8d  %3d %3d
      
         1 3ehvC    1   71    1   71 264.13  71   0   1 100   pdb_06   582555   582626    1 915
         2 2mjbA    1   72    1   72 263.33  72   0   1 100   pdb_10   602798   602873    1 915
         3 4k7uA    1   72    1   72 262.22  72   0   0 100   pdb_09  1994953  1995024    1 915
         4 2qhoG    1   72    1   72 261.35  72   0   0 100   pdb_04  4300621  4300692    1 915
         5 2peaB    1   72    1   72 260.43  72   0   0 100   pdb_04  3603777  3603848    1 915
         6 2mbqB    1   72    1   72 260.27  72   0   0 100   pdb_10   592249   592320    1 915
         ...
      

      Sample POMONA Alignment File "pomona_aln.tab"
      VARS   INDEX PDB TAG R1 ALIGNMENT_STR 
      FORMAT %4d %5s %5s %4d %s
      
         1 3ehvC ALN_Q    0 0999999999999999999967678787777779999889999999999980089678799999999996m
         1 3ehvC SS_CS    1 LEEEEEELLLLEEEEEELLLLLHHHHHHHHHHHLLLLHHLLEEEELLEELLLLLLHHHHLLLLLLEEEEEE
         1 3ehvC QUERY    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVL
         1 3ehvC IDENT    0 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
         1 3ehvC SBJCT    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVL
         1 3ehvC  DSSP    1 LEEEEELLLLLEEEEELLLLLEHHHHHHHHHHHHLLLHHHEEEEELLEELLLLLELHHHLLLLLLEEEEEE
         ...
      

    • The PDB database search by POMONA typically takes hours to days for a protein of normal size; therefore, the parallel searching mode is highly recommended when multiple cores are available on a computer or a computer cluster. It can be specified with the option -np (number of total processes for POMONA alignment; default: 10).
  4. Note that the default alignment settings will fulfill the needs of most users; however, the alignment settings can be customized.
  5. More under construction ...
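
Because pomona_sum.tab is a plain whitespace-separated table with a VARS header line, it is straightforward to post-process. A minimal Python sketch follows; it is not part of the POMONA package, the function name read_pomona_table is hypothetical, and it assumes that every data row holds one whitespace-free value per VARS column, as in the excerpt above.

```python
def read_pomona_table(text):
    """Parse an NMRPipe-style table: the VARS line names the columns,
    the FORMAT line is skipped, and each matching line becomes a dict."""
    columns = []
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "...":
            continue
        if fields[0] == "VARS":
            columns = fields[1:]
        elif fields[0] == "FORMAT":
            continue
        elif columns and len(fields) == len(columns):
            rows.append(dict(zip(columns, fields)))
    return rows


# First three rows of the ubiquitin excerpt shown above.
excerpt = """\
VARS   INDEX PDB R1 RN D_R1 D_RN SCORE LEN GAP_SIZE MISMATCH IDENTITY DB_VOL DB_R1 DB_RN CLUST_ID CLUST_MEMCNT
FORMAT %4d %5s %4d %4d %4d %4d %6.2f %3d %3d %3d %3d  %s %8d %8d  %3d %3d

   1 3ehvC    1   71    1   71 264.13  71   0   1 100   pdb_06   582555   582626    1 915
   2 2mjbA    1   72    1   72 263.33  72   0   1 100   pdb_10   602798   602873    1 915
   3 4k7uA    1   72    1   72 262.22  72   0   0 100   pdb_09  1994953  1995024    1 915
"""

rows = read_pomona_table(excerpt)
best = max(rows, key=lambda row: float(row["SCORE"]))
print(best["PDB"], best["SCORE"], "cluster", best["CLUST_ID"])
```

All values are kept as strings so that identifiers such as pdb_06 pass through unchanged; numeric columns can be converted on demand, as done for SCORE here.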
POMONA Server
POMONA Server can be used for searching structure alignments from the PDB. Users need to submit their input chemical shift file to the server with the default All PDB Proteins option. The server will send the results to the users via email. Note that only the final alignment files pomona_sum.tab and pomona_aln.tab will be sent.


Pairwise Comparison of Protein Structures by POMONA
To use POMONA to search for an optimal alignment to a known protein or a set of known proteins, users can simply follow the procedure listed below:
  1. Create a directory for the prediction session; all subsequent commands will be executed from this directory.
  2. Prepare the input table of chemical shift assignments (for example "myshifts.tab") with a proper format (see the previous section); please also carefully inspect the chemical shifts for the possible referencing offset and outliers.
  3. Run the POMONA master script (pomona) to perform the structure alignment:
    • Most commonly, if the proteins to be aligned have structures deposited in the PDB, this will simply require a command such as:
      pomona -in myshifts.tab -db 1d3zA 1ubqA 1ubi
      where the name(s) after the option -db are the PDB identifier(s) of the protein(s) to be aligned; each should be a standard 4-character PDB identifier, optionally followed by one character for the chain ID. If multiple proteins are used, their names should be separated by a space character. Note that because the script needs to download file(s) from the PDB website, network access is required, as well as the Unix/Linux program wget to perform the download (Mac OS may need a manual installation of the wget program).

    • If users want to apply the structure alignment to proteins with a local PDB file, a command such as listed below should be used:
      pomona -in myshifts.tab -db 1d3zA.pdb 1ubqA.pdb 1ubi.pdb
      where the names after the option -db are the PDB files of the protein(s) to be aligned. Note that each PDB file should follow the standard format of files from the PDB database.
    The program first generates a POMONA database from the PDB files (downloaded from the PDB website or provided by the user), then applies a clustering and summarization procedure, and generates a final summary output file pomona_sum.tab, which stores the alignment summary of the top selected alignments. An excerpt of this file from a pairwise alignment between the test protein ubiquitin and several of its homologues is shown below (this file can also be found at $POMONA_DIR/demo/pairwise/pomona_sum.tab):
    VARS INDEX PDB R1 RN D_R1 D_RN SCORE LEN GAP_SIZE MISMATCH IDENTITY DB_VOL DB_R1 DB_RN CLUST_ID CLUST_MEMCNT 
    FORMAT %4d %5s %4d %4d %4d %4d %6.2f %3d %3d %3d %3d  %s %8d %8d  %3d %3d
    
       1 1ubiA    1   71    1   71 255.83  72   0   1 100   pdb_00      153      228    1   3 
       2 1d3zA    1   71    1   71 254.64  72   0   1 100   pdb_00        1       76    1   3 
       3 1ubqA    1   71    1   71 254.44  72   0   1 100   pdb_00       77      152    1   3 
    
    It also generates an alignment output file pomona_aln.tab, which stores the details of the top selected alignments. An excerpt of this file from a pairwise alignment between the protein ubiquitin and its homologues is shown below (this file can also be found at $POMONA_DIR/demo/pairwise/pomona_aln.tab):
    VARS   INDEX PDB TAG R1 ALIGNMENT_STR 
    FORMAT %4d %5s %5s %4d %s
    
       1 1ubiA ALN_Q    0 09999999999999999998675888888887699898898799999999900997777999999999999m
       1 1ubiA SS_CS    1 LEEEEEELLLLEEEEEELLLLLHHHHHHHHHHHLLLLHHLLEEEELLEELLLLLLHHHHLLLLLLEEEEEEE
       1 1ubiA QUERY    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR
       1 1ubiA IDENT    0 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
       1 1ubiA SBJCT    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR
       1 1ubiA  DSSP    1 LEEEEEELLLLEEEEELLLLLEHHHHHHHHHHHHLLLHHHEEEEELLEELLLLLEHHHHLLLLLLEEEEEEL
    
       2 1d3zA ALN_Q    0 09999999999999999999776888777776699998787899999999900996787999899999998m
       2 1d3zA SS_CS    1 LEEEEEELLLLEEEEEELLLLLHHHHHHHHHHHLLLLHHLLEEEELLEELLLLLLHHHHLLLLLLEEEEEEE
       2 1d3zA QUERY    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR
       2 1d3zA IDENT    0 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
       2 1d3zA SBJCT    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR
       2 1d3zA  DSSP    1 LEEEEELLLLLEEEEELLLLLEHHHHHHHHHHHHLLLHHHEEEEELLEELLLLLELHHHLLLLLLEEEEEEL
    
       3 1ubqA ALN_Q    0 09999999999999999999675888877777598998898999999999900996677999999999999m
       3 1ubqA SS_CS    1 LEEEEEELLLLEEEEEELLLLLHHHHHHHHHHHLLLLHHLLEEEELLEELLLLLLHHHHLLLLLLEEEEEEE
       3 1ubqA QUERY    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR
       3 1ubqA IDENT    0 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
       3 1ubqA SBJCT    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR
       3 1ubqA  DSSP    1 LEEEEEELLLLEEEEELLLLLEHHHHHHHHHHHHLLLHHHEEEEELLEELLLLLELHHHLLLLLLEEEEEEL
    
    
  4. more under construction ...
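
The alignment file can be read in a similar way to inspect individual alignments. The Python sketch below is illustrative only and is not part of the POMONA package: the function names read_alignments and percent_identity are hypothetical, the QUERY and SBJCT strings are assumed to contain no whitespace, and the identity computed here is a simple letter-by-letter comparison that may differ from how POMONA derives its IDENTITY column.

```python
def read_alignments(text):
    """Group pomona_aln.tab rows by (INDEX, PDB); map each TAG to its string."""
    alignments = {}
    for line in text.splitlines():
        fields = line.split(None, 4)
        # Data rows start with an integer index; VARS/FORMAT/"..." lines do not.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        index, pdb, tag, _first_residue, string = fields
        alignments.setdefault((int(index), pdb), {})[tag] = string.rstrip()
    return alignments


def percent_identity(query, subject):
    """Percentage of positions where both strings hold the same letter.
    Positions where either string has a non-letter (e.g. a gap) are ignored."""
    pairs = [(q, s) for q, s in zip(query, subject) if q.isalpha() and s.isalpha()]
    if not pairs:
        return 0.0
    return 100.0 * sum(q == s for q, s in pairs) / len(pairs)


# QUERY and SBJCT rows for 1ubiA, copied from the excerpt above.
excerpt = """\
VARS   INDEX PDB TAG R1 ALIGNMENT_STR
FORMAT %4d %5s %5s %4d %s

   1 1ubiA QUERY    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR
   1 1ubiA SBJCT    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR
"""

alns = read_alignments(excerpt)
for (index, pdb), tags in alns.items():
    print(index, pdb, percent_identity(tags["QUERY"], tags["SBJCT"]))
```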

POMONA Server
POMONA Server can be used for pairwise structure alignment. When submitting the chemical shift input file of the query protein to the server, users need to (1) select the option Selected PDB Protein(s) and (2) type in the PDB identifier of the protein(s) to be aligned. Note that the option to align to a local PDB file is currently not provided by the server.


POMONA/CS-RosettaCM Structure Generation
I. Prepare CS-RosettaCM running data
POMONA generates multiple structure alignments, which can be used as structural templates for running a CS-RosettaCM structure generation procedure. To generate structural templates from POMONA alignments, together with the other inputs and scripts required for running CS-RosettaCM structure generation, users can add the option -rosettaCM to the other options for running a POMONA alignment, for example:
pomona -in myshifts.tab -rosettaCM
which performs the following procedures:
  1. It first runs POMONA to identify the top 1000 structure alignments (or any number defined by the -count option) and clusters them by the similarity of their global conformation; see the sections above for details of running a POMONA alignment.
  2. Next, it selects the top 2 alignments (or any number of alignments specified by the option -NperCluster) from each of the top 10 clusters (or any number of clusters specified by the option -NSelCluster); these are stored in an output file pomona_selected.aln.
  3. It then generates structure templates from the top selected alignments (pomona_selected.aln) and all other required inputs/scripts for running CS-RosettaCM comparative modeling, which are all stored in the directory csRosettaCM/.
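
The selection in step 2 can be illustrated with a short Python sketch. It is not the POMONA implementation: the function name select_templates is hypothetical, the identifiers in the demo list are made-up placeholders, and the sketch assumes that clusters are ranked by their best-scoring member and that alignments within a cluster are ranked by alignment score. This page does not state how POMONA ranks either.

```python
from collections import defaultdict


def select_templates(alignments, n_per_cluster=2, n_clusters=10):
    """Pick the best-scoring alignments from the best-scoring clusters.

    alignments: iterable of (pdb_id, score, cluster_id) tuples.
    Returns a list of pdb_id values, cluster by cluster.
    """
    by_cluster = defaultdict(list)
    for pdb_id, score, cluster_id in alignments:
        by_cluster[cluster_id].append((score, pdb_id))

    # Sort the members of each cluster by descending score.
    for members in by_cluster.values():
        members.sort(reverse=True)

    # Rank clusters by the score of their best member (an assumption).
    ranked = sorted(by_cluster.values(), key=lambda m: m[0][0], reverse=True)

    selected = []
    for members in ranked[:n_clusters]:
        selected.extend(pdb_id for _score, pdb_id in members[:n_per_cluster])
    return selected


# Made-up identifiers and scores, for illustration only.
demo = [
    ("aaaaA", 264.1, 1), ("bbbbA", 263.3, 1), ("ccccA", 262.2, 1),
    ("ddddB", 150.0, 2), ("eeeeA", 120.5, 3), ("ffffA", 119.0, 3),
]
print(select_templates(demo, n_per_cluster=2, n_clusters=2))
```

The default arguments mirror the defaults quoted above (2 alignments per cluster, 10 clusters), which correspond to the -NperCluster and -NSelCluster options.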

II. Run CS-RosettaCM
Users then need to run the generated script runCSRosettaCM in the csRosettaCM directory to start the CS-RosettaCM structure modeling. Note that the script runCSRosettaCM may require manual modification if the Rosetta installation environment is not properly configured, or for running parallel jobs on computing clusters.
Script runCSRosettaCM
#!/bin/csh
#

# ====== ***PLEASE verify & modify below definitions*** ======
set rosettaBinDir = /Your_Rosetta_Directory/main/source/bin/
set rosettaBin    = $rosettaBinDir/rosetta_scripts.default.linuxgccrelease
set rosettaMPIBin = $rosettaBinDir/rosetta_scripts.mpi.linuxgccrelease
set rosettaDB     = /Your_Rosetta_Directory/main/database
# ===========================================================

$rosettaBin -parser:protocol rosetta_cm.xml @flags -database $rosettaDB \
            -out:file:silent default.out -out:file:scorefile default.sc
# other options: -hybridize:starting_template [IntegerVector] 'to define starting templates'

# MPI command (uncomment both lines below to use it)
#mpirun -np 10 $rosettaMPIBin -parser:protocol rosetta_cm.xml @flags -database $rosettaDB \
#                             -out:file:silent default.out -out:file:scorefile default.sc

After the CS-RosettaCM structure modeling job is done, users need to run the analysis script analyzeCSRosettaCM in the csRosettaCM directory (after modifying or confirming the Rosetta installation environment defined in it). It generates a directory named ExtractedPDBs/ that stores all models and the analysis tables.

For more information regarding the evaluation of the generated data, please see the next section, "How to Select Consistent POMONA/CS-RosettaCM Predictions".


More under construction ...

POMONA Server
POMONA Server can be used for generating CS-RosettaCM inputs and scripts. Users need to submit their input chemical shift file to the server and check the Generate CS-RosettaCM Inputs option. The server will send the results to the users via email, including the final alignment files pomona_sum.tab and pomona_aln.tab and a CS-RosettaCM package file csRosettaCM.zip. Users then need to:
  1. Create a local directory for the CS-RosettaCM job; download all server generated files to this directory; unzip csRosettaCM.zip file;
  2. Go to the generated csRosettaCM directory, edit the master scripts runCSRosettaCM and analyzeCSRosettaCM according to the Rosetta installation;
  3. Perform CS-RosettaCM modeling by running the master script runCSRosettaCM;
  4. After CS-RosettaCM job is done, run CS-RosettaCM structure analysis by using the master script analyzeCSRosettaCM.


How to Select Consistent POMONA/CS-RosettaCM Predictions
It is important to carefully evaluate POMONA/CS-RosettaCM generated structures to avoid accepting unconverged and/or incorrect models. As with the CS-Rosetta protocol, stringent acceptance criteria are therefore enforced:
  1. the lowest energy CS-RosettaCM models must be clustered within about 2.5 Å of Cα-RMSD100, AND
  2. the energy of the lowest energy CS-RosettaCM models must be considerably lower than that of the lowest energy models from a standard CS-Rosetta modeling.
1. Convergence criterion

The convergence criterion on the lowest energy CS-RosettaCM models can be inspected by checking the output of the analysis script analyzeCSRosettaCM; see below for an example from a demo POMONA/CS-RosettaCM modeling of the protein ubiquitin. This script automatically finds the 10 lowest energy CS-RosettaCM models and calculates their Cα-RMSD values relative to the mean coordinates of these 10 models. A Cα-RMSD100 value, calculated as Cα-RMSD/(1+ln(N/100)), where N is the number of residues of the protein, is then used to judge the convergence of the CS-RosettaCM modeling: the generated CS-RosettaCM models are considered converged ONLY when the averaged Cα-RMSD100 value is below ~2.5 Å. Note that although results in which the clustering of the lowest energy structures is less tight than 2.5 Å may still be useful for further analysis, such results should not be over-interpreted and could be in error.
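
The normalization can be tried out with a few lines of Python. This is a minimal sketch, not the code used by analyzeCSRosettaCM: the function names ca_rmsd100 and is_converged are hypothetical, and the expression is implemented exactly as written above. Note that the normalization published by Carugo and Pongor is written with the logarithm of the square root of N/100, so it may be worth checking which form the analysis script applies.

```python
import math


def ca_rmsd100(ca_rmsd, n_residues):
    """Normalize a Ca-RMSD (in Angstrom) to a 100-residue protein,
    using the expression quoted on this page: RMSD / (1 + ln(N / 100))."""
    denominator = 1.0 + math.log(n_residues / 100.0)
    if denominator <= 0.0:
        # The expression as written is not usable for very small N.
        raise ValueError("formula is undefined for this protein size")
    return ca_rmsd / denominator


def is_converged(ca_rmsd, n_residues, cutoff=2.5):
    """True when the normalized Ca-RMSD is below the ~2.5 Angstrom cutoff."""
    return ca_rmsd100(ca_rmsd, n_residues) < cutoff


# Hypothetical numbers, chosen only to illustrate the arithmetic.
print(round(ca_rmsd100(3.0, 150), 3), is_converged(3.0, 150))
```

For N = 100 the denominator is 1 and the value is unchanged; for larger proteins the normalized value is smaller than the raw Cα-RMSD, so a raw spread somewhat above 2.5 Å can still satisfy the criterion.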

The convergence can also be visually inspected by plotting the Rosetta energy versus the Cα-RMSD (to the lowest energy model), which can be found in the last two columns of the output file ExtractedPDBs/name.scores.rms2Low.txt. See below for two such plots generated from POMONA/CS-RosettaCM modeling of the proteins s.rhodopsin and mad2; convergence is observed for the models of s.rhodopsin but not for the models of mad2.

Script analyzeCSRosettaCM Output (for protein ubiquitin)
calculatng Ca-rmsd to the lowest energy model ...
checking 10 lowest energy models ...
Ca-rmsd (to the mean structure) for 10 lowest energy models:
(for residues 2 9 11 71)
0.3627   S_0072.pdb
0.3398   S_0027.pdb
0.2350   S_0084.pdb
0.3473   S_0034.pdb
0.4438   S_0038.pdb
0.4112   S_0095.pdb
0.3535   S_0077.pdb
0.5108   S_0053.pdb
0.4297   S_0070.pdb
0.3838   S_0087.pdb
averaged Ca-rmsd:   0.382 +/- 0.074
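
The summary line of this output can be reproduced from the ten listed values. The short Python check below is only an illustration; that the quoted spread is a sample standard deviation is an inference from the numbers, not something stated on this page.

```python
import statistics

# Ca-rmsd values copied from the sample output above.
ca_rmsd = [0.3627, 0.3398, 0.2350, 0.3473, 0.4438,
           0.4112, 0.3535, 0.5108, 0.4297, 0.3838]

mean = statistics.mean(ca_rmsd)
spread = statistics.stdev(ca_rmsd)  # sample standard deviation (n - 1)
print(f"averaged Ca-rmsd: {mean:.3f} +/- {spread:.3f}")
```

Rounded to three decimals this gives the reported 0.382 +/- 0.074; the population standard deviation of the same values would round to 0.070 instead.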
[Figure: Convergence plot for protein s.rhodopsin]
[Figure: Convergence plot for protein mad2]
2. Energy criterion
A second requirement for accepting a CS-RosettaCM structure is that the total Rosetta energy (including the chemical shift scoring term) is significantly lower than the lowest values obtained for CS-Rosetta models. The standard CS-Rosetta de novo structure generation procedure is based on a well-refined fragment assembly and full-atom refinement procedure, which can sample and generate low-energy full-atom structural models for proteins of small and medium size. CS-Rosetta will typically fail for large proteins, but the lowest energy that a CS-Rosetta procedure reaches can be used as a second standard to judge a CS-RosettaCM modeling (for larger proteins). For details regarding the standard CS-Rosetta protocol, please check our CS-Rosetta website.

When preparing the CS-RosettaCM run with the "pomona -in myshifts.tab -rosettaCM" command, a csRosetta package/directory is generated in addition to csRosettaCM. Following a procedure similar to that used to run CS-RosettaCM modeling and analyze CS-RosettaCM models, the reference CS-Rosetta modeling can be performed with the runCSRosetta script, and the generated CS-Rosetta models can be analyzed with the analyzeCSRosetta script. The Rosetta energy of the lowest energy CS-Rosetta models can be found in the output file csRosetta/ExtractedPDBs/name.scores.txt. The two plots below were generated from POMONA/CS-RosettaCM modeling of the proteins s.rhodopsin and mad2. For s.rhodopsin, the converged CS-RosettaCM models (red dots) show lower energy. For mad2, neither the CS-Rosetta models (black dots) nor the CS-RosettaCM models (red dots) are converged, and the lowest energy CS-RosettaCM models show higher energy.
[Figure: Energy plot for protein s.rhodopsin]
[Figure: Energy plot for protein mad2]

More under construction ...





last update: April 6 2015 / sy