POMONA | CS-RosettaCM: Chemical Shift Homology Modeling using Protein alignments Obtained by Matching Of NMR Assignments

How to Get POMONA Package

A stable version of the POMONA software package can be downloaded below. By downloading software from this website, you agree to our Terms of Use, including the terms that there is no right to privacy on this system and that the software from this website is not to be redistributed without permission from the authors. The POMONA package is provided for the linux9 and mac hardware & OS versions (see here for a definition of those hardware & OS versions by the NMRPipe system), and requires multiple Unix programs and other external software packages in order to use all of its features and/or to run efficiently (see details below).

The most common POMONA installation procedure in a Unix environment (linux9 and mac) involves:

Create a directory for the POMONA installation [for example, type mkdir /disk1/POMONA in an "xterm" terminal window].

Go to the selected install directory [cd /disk1/POMONA].

Download and store the POMONA installation files (pomona.tZ, install.com) and database file (pomonaData.tZ) into the selected install directory

Via a web browser: Right-click the download links below and select "Save Target As", "Save Link As" or "Download Linked File (As)" (depending on the browser type), and save the files into the selected install directory (Be sure to retain the exact file name shown below).

Or via the Unix command "wget":
wget https://spin.niddk.nih.gov/bax-apps/software/POMONA/pomona.tZ
wget https://spin.niddk.nih.gov/bax-apps/software/POMONA/pomonaData.tZ
wget https://spin.niddk.nih.gov/bax-apps/software/POMONA/install.com

Execute the install.com script:

In most cases, no arguments will be required.

It will be sufficient to make the install scripts executable (by typing "chmod +x install.com"), then run the install.com script.

Use the command ./install.com +help to generate a list of install command-line options.

The installation will generate a "stand-alone" initialization script pomona_init.com, which stores all required and optional environment variables for running the program; it also recommends a common way to apply the initialization, i.e., adding the following lines to the ~/.cshrc file:
if (-e /disk1/POMONA/pomona_init.com) then
   source /disk1/POMONA/pomona_init.com
endif

Note: All required environment variables in the pomona_init.com script are filled in during the installation procedure, while the optional environment variables must be entered manually by the user after the installation. Please check here to see a full list of the environment variables defined in this initialization script.

There is also a Web-based version of POMONA, which can be used directly without installing POMONA. However, due to our limited computing resources, the CS-RosettaCM structure generation, the most time-consuming procedure of POMONA/CS-RosettaCM, is not provided by our server. The POMONA server provides all required inputs/scripts for running the CS-RosettaCM structure generation, but users have to run CS-RosettaCM on their own. You can access this Web-based system, along with other facilities for manipulating chemical shifts, dipolar couplings, and molecular structures, at the Bax Group NMR Server site:

POMONA/CS-RosettaCM Installation Files

POMONA Web Server

(Version 1.00 Rev 2014.301.14.56)
install.com [size: 9KB]
pomona.tZ [size: 24MB]
pomonaData.tZ [size: 2.8GB]

Other Software Programs

By default, only the TALOS-N program is required by POMONA for searching structure alignments from the PDB. However, in order to run other optional modules and the CS-RosettaCM approach, multiple external software programs are needed, which are listed below:

| Software | Note | URL |
| --- | --- | --- |
| TALOS-N: required to prepare inputs for all POMONA alignments. | TALOS-N provides its outputs as the required inputs for running POMONA alignments. | https://spin.niddk.nih.gov/bax-apps/software/TALOS-N/ |
| ROSETTA: required to prepare and run the CS-RosettaCM and CS-Rosetta approaches. | POMONA provides all required inputs/scripts for running the CS-RosettaCM comparative modeling, for which the RosettaCM module from the Rosetta software suite is needed. Note: Rosetta versions of 2014 or newer ONLY. Note: See here also for a tutorial on how to install and use the Rosetta program. | https://www.rosettacommons.org/ |
| DSSP: required to prepare the database for pairwise POMONA alignments. | DSSP assigns secondary structure from PDB coordinates. | http://swift.cmbi.ru.nl/gv/dssp/ |
| REDUCE: required to prepare the database for pairwise POMONA alignments. | REDUCE adds hydrogens to PDB coordinates. | http://kinemage.biochem.duke.edu/software/reduce.php |
| BLAST: required to prepare de novo fragment candidates for CS-Rosetta/CS-RosettaCM. | BLAST generates an amino acid sequence profile from sequence alignments. Note 1: Do NOT use the C++ version BLAST+. Note 2: The nr database is recommended. | ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/ and ftp://ftp.ncbi.nlm.nih.gov/blast/db/ |

Contents

What is POMONA and CS-RosettaCM?
Components of the POMONA System
How to Use POMONA/CS-RosettaCM
Chemical Shift Input Format Used by POMONA/CS-RosettaCM
Inspecting POMONA Alignments
How to Select Consistent Predictions in POMONA/CS-RosettaCM

About POMONA and CS-RosettaCM

POMONA is a newly designed NMR chemical shift guided protein structure alignment method to generate protein templates with the best matched local structure from the Protein Data Bank. It is based on experimental input data comprising ¹³C^α, ¹³C^β, ¹³C', ¹⁵N, ¹H^α, and ¹H^N NMR chemical shifts, plus sparse NOEs if available, and directly exploits the powerful bioinformatics algorithms previously developed for sequence-based homology modeling. POMONA is shown to identify structural homologues in the absence of significant sequence similarity.

When combined with subsequent chemical shift based Rosetta comparative modeling (CS-RosettaCM), the well-selected POMONA structural templates enable the generation of full-atom models that are demonstrated to match well the corresponding structures derived experimentally from X-ray diffraction or NMR data. The POMONA/CS-RosettaCM protocol has thus been shown to be an alternative approach to protein structure determination from a minimal set of NMR input data, applicable to larger proteins representing a wide variety of folds.


POMONA/CS-RosettaCM Flowchart

Components of the POMONA System

The POMONA core system is implemented in the C++ language. In addition, multiple Unix shell scripts are provided in the POMONA package to evaluate and prepare the inputs, analyze the output, and so on. The files/scripts and directories of the POMONA system include:

pomona

Master script to run POMONA/CS-RosettaCM. All allowed options are listed below.

# Input options
`-in`	[none]	Input Chemical Shift Table.
`-pdb`	[none]	Reference PDB Structure Input.
`-noe`	[none]	NOE Restraint Input File (NMRPipe format).
`-keepTails`	[default]	To Keep Flexible Tails.
`-trimTails`		To Trim Flexible Tails and Use Trimmed As Input.
`-selRes2Trim`	[none]	List of Specified Tail Residues to be Trimmed.

# Chemical Shift (CS) Quality Check and Correction:
`-offset`	[default]	Apply CS Offset Correction If Needed.
`-iso`		Apply 2H Isotope Correction to CA/CB Shifts.
# Database and Homology Options:
`-db`	[PDB]	List of PDB Database (default) or .pdb Files to be Searched.
`-excl`	[none]	List of Proteins (PDB Identifiers) to be Excluded.
`-maxSeqID`	[100]	Max Sequence Identity to be Searched.
# POMONA Alignment Options:
`-wCS`	[1.0]	Weight for Chemical Shift.
`-wAA`	[0.1]	Weight for Amino Acid Sequence.
`-count`	[1000]	Number of Top Alignments.
`-minLen`	[30]	Min. Allowed Alignment Length.
`-maxGapLen`	[30]	Max. Allowed Length for a Single Alignment Gap.
`-gapI`	[5.0]	Initial Gap Opening Penalty.
`-gapE`	[0.3]	Gap Extending Penalty.
`-CaDist0`	[6.5]	Min. Ca Dist to Penalize a Gap Opening in Database Protein.
# Alignment Clustering Options:
`-rmsdCut`	[4.0]	Cluster Distance Cutoff of Ca-RMSD.
# CS-RosettaCM Options:
`-rosettaCM`		Generate CS-RosettaCM Scripts and Inputs.
`-nohoms`		Exclude Blast Identified Homologs from De Novo Fragments.
`-NSelCluster`	[10]	Number of Selected Cluster of POMONA Alignments.
`-NperCluster`	[2]	Number of Selected Alignments per Cluster.
# MPI Options:
`-MPI`	[default]	Use MPI for POMONA Alignment.
`-noMPI`		Do not Use MPI for POMONA Alignment.
`-MPI_np`	[10]	Number of Processes for POMONA Alignment.

pomona_init.com

Initialization script to define all required and optional environment variables to run the POMONA master script pomona.

# Required variables (will be automatically filled during POMONA installation)
`setenv POMONA_DIR`	automatically filled	# POMONA installation path
`setenv TALOSN_DIR`	automatically filled	# TALOS-N installation path
`setenv talosn`	`$TALOSN_DIR/talosn`	# TALOSN master script

# Optional variables (need to be filled by users after POMONA installation)
`setenv dssp`	`/your/DSSP/installation/path/`	# DSSP installation path
`setenv reduce`	`/your/REDUCE/binary/`	# REDUCE binary
`setenv reduceDB`	`/your/REDUCE/library/file/`	# REDUCE library file

`setenv ROSETTA_DIR`	`/your/ROSETTA/installation/path/main`	# ROSETTA installation path
`setenv ROSETTA_DB`	`$ROSETTA_DIR/database/`	# ROSETTA database path
`setenv ROSETTA_BIN`	`$ROSETTA_DIR/source/bin/`	# ROSETTA binary path
`setenv ROSETTA_VALL`	`$ROSETTA_DIR/tools/fragment_tools/vall.apr24.2008.extended`	# ROSETTA vall database file
`setenv ROSETTA_PICKER`	`$ROSETTA_BIN/fragment_picker.linuxgccrelease`	# ROSETTA fragment_picker binary
`setenv ROSETTA_THREAD`	`$ROSETTA_BIN/partial_thread.linuxgccrelease`	# ROSETTA partial_thread binary
`setenv ROSETTA_PDBEXT`	`$ROSETTA_BIN/extract_pdbs.default.linuxgccrelease`	# ROSETTA extract_pdbs binary
`setenv ROSETTA_SCRIPT`	`$ROSETTA_BIN/rosetta_scripts.linuxgccrelease`	# ROSETTA rosetta_scripts binary
`setenv ROSETTA_ABINITIO`	`$ROSETTA_BIN/minirosetta.default.linuxgccrelease`	# ROSETTA minirosetta binary

`setenv PSIBLAST_BIN`	`/your/BLAST/installation/path/bin/blastpgp`	# PSI-BLAST binary
`setenv PSIBLAST_NR`	`/your/BLAST/nr/database/path/nr`	# BLAST nr database
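
After editing pomona_init.com and sourcing it, it can help to confirm which of the variables listed above are actually defined in the current environment. The following is a minimal Python sketch and is not part of the POMONA package; the helper name `missing_variables` is hypothetical, and the split into required and optional names simply follows the listing above.

```python
import os

# Variable names copied from the pomona_init.com listing above.
REQUIRED = ["POMONA_DIR", "TALOSN_DIR", "talosn"]
OPTIONAL = [
    "dssp", "reduce", "reduceDB",
    "ROSETTA_DIR", "ROSETTA_DB", "ROSETTA_BIN", "ROSETTA_VALL",
    "ROSETTA_PICKER", "ROSETTA_THREAD", "ROSETTA_PDBEXT",
    "ROSETTA_SCRIPT", "ROSETTA_ABINITIO",
    "PSIBLAST_BIN", "PSIBLAST_NR",
]


def missing_variables(names, environ=None):
    """Return the names that are unset or empty in the given environment.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to try out without touching the real environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = missing_variables(names)
        print(f"{label}: {len(names) - len(missing)} set, missing: {missing or 'none'}")
```

Missing optional variables only matter for the modules that need them (for example, the ROSETTA_* variables for CS-RosettaCM).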

bin/ directory for all compiled binary files for Linux (*.linux9, *.static.linux9) and MacOS (*.mac)

scripts/ directory for all required utility scripts of POMONA

data/ directory for required data files of POMONA

demo/ directory with example chemical shift input data and scripts for a demo of POMONA.


NMR Input Data for POMONA/CS-RosettaCM

The POMONA system is designed to use mainly the backbone and ¹³C^β chemical shifts for (1) identifying all possible structural homologues from the PDB, (2) generating pairwise structural alignments to specific protein(s), and (3) preparing and applying CS-RosettaCM comparative modeling. To use these features of POMONA, users need to follow the procedures below to properly prepare and inspect their NMR input data.

Chemical Shift Data Format and Requirements
POMONA requires an input chemical shift table of standard nmrPipe/TALOS format. An example portion of the required chemical shift table format is shown below (full example: ubiq.tab). Other examples can be found in the POMONA/demo directory, or at the POMONA Server site.

Full details of the requirements of the chemical shift table format:

The chemical shift input file for POMONA uses the general-purpose NMRPipe table format.
All ¹³C chemical shifts (including ¹³C^α, ¹³C^β, and ¹³C') should be referenced relative to the methyl groups of 4,4-dimethyl-4-silapentane-1-sulfonic acid, or DSS. The ¹⁵N chemical shifts should be referenced relative to liquid ammonia at 25 degrees C.
Use the optional DATA FIRST_RESID line to specify the first residue ID number of the sequence. If it is not specified, residue numbering is assumed to begin at 1.
The protein sequence should be given as shown, using one or more DATA SEQUENCE lines. Space characters in the sequence will be ignored. Use c for oxidized CYS (C^β ~ 42.5 ppm) and C for reduced CYS (C^β ~ 28 ppm), h for protonated HIS and H for deprotonated HIS, in both the sequence header and the shift table. Use X for residues other than the usual 20 amino acids.
The data table must include columns for residue ID (listed as RESID in the VARS header), one-character residue name (RESNAME), atom name (ATOMNAME), and chemical shift (SHIFT).
The table must include a "VARS" line which labels the corresponding columns of the data table.
The table must include a "FORMAT" line which defines the data type of the corresponding columns of the table.

Atom names are always given exactly as:

HA	for Hα of all residues except glycine
HA2	for the first Hα of glycine residues
HA3	for the second Hα
C	for C' (CO)
CA	for Cα
CB	for Cβ
N	for N-amide
HN	for H-amide

As noted, there is an exception for naming Gly assignments, which should use HA2 and HA3 instead of HA. For Gly HA2/HA3 assignments, POMONA will use the average value of the two, so it is not necessary to have these assigned stereospecifically; for use with POMONA, the assignment can be arbitrary. Note, however, that the assignment must be given exactly as either "HA2" or "HA3" rather than "HA2|HA3" etc.
Other types of assignments may be present in the chemical shift table; they will be ignored.
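
The Gly HA2/HA3 rule described above can be illustrated with a short Python sketch. It only restates the averaging rule from the text and is not POMONA's own code; the function name `glycine_ha_shift` and the dictionary layout are assumptions made for the example, as is the choice to return the single available value when only one of the two protons is assigned.

```python
def glycine_ha_shift(shifts):
    """Average the HA2/HA3 shifts of one Gly residue.

    `shifts` maps atom name -> chemical shift (ppm) for a single residue.
    Returns the mean of whichever of HA2/HA3 are present (assumption: a
    single assigned proton is used as-is), or None if neither is assigned.
    """
    values = [shifts[name] for name in ("HA2", "HA3") if name in shifts]
    if not values:
        return None
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical shift values, for illustration only.
    print(glycine_ha_shift({"HA2": 3.90, "HA3": 4.10, "CA": 45.2}))
```

Because the two values are averaged, swapping the HA2 and HA3 labels gives the same result, which is why the stereospecific assignment does not matter here.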

Example NMRPipe/TALOS format chemical shift table (excerpt):

   REMARK Ubiquitin

   DATA FIRST_RESID 1

   DATA SEQUENCE MQIFVKTLTG KTITLEVEPS DTIENVKAKI QDKEGIPPDQ QRLIFAGKQL
   DATA SEQUENCE EDGRTLSDYN IQKESTLHLV LRLRGG

   VARS   RESID RESNAME ATOMNAME SHIFT
   FORMAT %4d   %1s     %4s      %8.3f

     1 M           HA                  4.23
     1 M           C                 170.54
     1 M           CA                 54.45
     1 M           CB                 33.27
     2 Q           HN                  8.90
     2 Q           N                 123.22
     2 Q           HA                  5.25
     2 Q           C                 175.92
     2 Q           CA                 55.08
     2 Q           CB                 30.76
   ...
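
To make the table layout concrete, here is a minimal Python sketch of a reader for this format. It is an illustration written for this page, not the parser used by POMONA or TALOS-N; the function name `parse_shift_table` is hypothetical, and the sketch assumes whitespace-separated columns with RESID and SHIFT present on the VARS line, as required above.

```python
def parse_shift_table(text):
    """Parse an NMRPipe/TALOS-style chemical shift table.

    Returns (first_resid, sequence, rows); rows is a list of dicts keyed
    by the column names given on the VARS line.
    """
    first_resid = 1      # numbering starts at 1 unless DATA FIRST_RESID is given
    sequence_parts = []
    columns = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("REMARK", "FORMAT"):
            continue
        if fields[0] == "DATA":
            if len(fields) >= 3 and fields[1] == "FIRST_RESID":
                first_resid = int(fields[2])
            elif len(fields) >= 2 and fields[1] == "SEQUENCE":
                sequence_parts.extend(fields[2:])  # spaces in the sequence are ignored
            continue
        if fields[0] == "VARS":
            columns = fields[1:]
            continue
        if columns is None or len(fields) != len(columns):
            continue  # skips lines such as the "..." of an excerpt
        row = dict(zip(columns, fields))
        row["RESID"] = int(row["RESID"])
        row["SHIFT"] = float(row["SHIFT"])
        rows.append(row)
    return first_resid, "".join(sequence_parts), rows


EXAMPLE = """\
REMARK Ubiquitin
DATA FIRST_RESID 1
DATA SEQUENCE MQIFVKTLTG KTITLEVEPS
VARS   RESID RESNAME ATOMNAME SHIFT
FORMAT %4d   %1s     %4s      %8.3f
  1 M           HA                  4.23
  1 M           CA                 54.45
  2 Q           HN                  8.90
...
"""

if __name__ == "__main__":
    first, seq, rows = parse_shift_table(EXAMPLE)
    print(first, seq, len(rows))
```

A reader like this can be used to check that every row has the expected columns before the file is passed to TALOS-N or POMONA.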

POMONA can also use chemical shift input in the BMRB NMR-Star format. Two Unix shell conversion scripts, bmrb2talos_v21.com and bmrb2talos_v31.com, are included with the POMONA package and can be used to convert an NMR-Star format (V2.1 and V3.1, respectively) chemical shift table to TALOS format. Example command lines for using these scripts are:

bmrb2talos_v21.com  bmrb_v21.str > inCS.tab
bmrb2talos_v31.com  bmrb_v31.str

See also here for more details regarding the NMRPipe/TALOS format and NMRStar format.

NOE Constraint Data Format

In order to use NOE constraints in CS-RosettaCM during the Rosetta structure generation step, the NOE constraints must be prepared in a Rosetta-compatible format (see here for all formats allowed by Rosetta). In summary, a general NOE constraint can be defined by an AtomPair line such as:

#AtomPair: Atom1_Name Atom1_ResNum Atom2_Name Atom2_ResNum Func_Type Func_Def
AtomPair     H      3     H   112 BOUNDED 1.500 2.910 0.300
AtomPair     H      7     H   108 BOUNDED 1.500 2.720 0.300
AtomPair     H      9     H   106 BOUNDED 1.500 3.070 0.300
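
When NOE distances are available as a list, AtomPair lines like those above can be written programmatically. The Python sketch below is only an illustration; the function name `atompair_line` is hypothetical, and the three numbers after BOUNDED are assumed here to be the lower bound, upper bound, and standard deviation, which should be checked against the Rosetta constraint file documentation.

```python
def atompair_line(atom1, res1, atom2, res2, lower, upper, sd):
    """Format one AtomPair constraint that uses a BOUNDED function.

    Field order follows the header comment above:
    Atom1_Name Atom1_ResNum Atom2_Name Atom2_ResNum Func_Type Func_Def.
    The meaning of (lower, upper, sd) is an assumption, see the lead-in.
    """
    return (f"AtomPair {atom1:>5s} {res1:6d} {atom2:>5s} {res2:5d} "
            f"BOUNDED {lower:.3f} {upper:.3f} {sd:.3f}")


if __name__ == "__main__":
    # The three restraints shown in the example above.
    restraints = [
        ("H", 3, "H", 112, 1.5, 2.91, 0.3),
        ("H", 7, "H", 108, 1.5, 2.72, 0.3),
        ("H", 9, "H", 106, 1.5, 3.07, 0.3),
    ]
    for restraint in restraints:
        print(atompair_line(*restraint))
```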

Ambiguous NOE constraints can be defined by AmbiguousNMRDistance lines, such as:

#AmbiguousNMRDistance: Atom1_Name Atom1_ResNum Atom2_Name Atom2_ResNum Func_Type Func_Def
# Ambiguous Distance between Atom1 and Atom2. The difference from AtomPair Constraint is that 
# atom names are specially parsed to detect ambiguous hydrogens, which are either experimentally 
# ambiguous or rotationally identical (like methyl hydrogens). The constraint applies to any 
# hydrogens equivalent to the named hydrogen. The logic for determining which hydrogens are which 
# is in src/core/scoring/constraints/AmbiguousNMRDistanceConstraints.cc:parse_NMR_name

Chemical Shift Data Inspection

Because chemical shifts are the major inputs to the POMONA/CS-Rosetta approach, their quality is critical for achieving the expected performance. The pre-check module of the TALOS-N program can be used to inspect the quality of the chemical shift inputs:

TALOS-N can identify possible referencing problems with the ¹³C^α, ¹³C^β, ¹³C' and ¹H^α chemical shift inputs, as well as possible chemical shift outliers, when a typical TALOS-N command is run with the additional -check option, for example:

talosn -in inCS.tab -check

This module first converts the chemical shifts of each residue to secondary chemical shifts, and subsequently evaluates these by correlating ¹³C^α, ¹³C^β, ¹³C' and ¹H^α to the reference-free entity, ¹³C^α-¹³C^β. The estimated chemical shift referencing offsets, as well as their corresponding fitting errors, will be printed for ¹³C^α, ¹³C^β, ¹³C' and ¹H^α; this pre-check module will also identify residues with unusual chemical shifts, for which the secondary chemical shifts fall outside the expected range. An example output of this module has the following format:

   Chemical shift outlier checking...
     ...
     64 E CB Secondary Shift: -3.800 Limit: -3.765
     76 G  C Secondary Shift:  4.250 Limit:  1.925 !

   Chemical shift referencing checking...
      Estimated Referencing Offset for CA/CB: 0.795 +/- 0.104 ppm (Size: 66)

Note that:

An offset correction is generally only needed when the estimated referencing offset is larger than about five times its fitting error (i.e., about five standard deviations). To apply the offset correction, the script applyOffsetCorrection.com included in the POMONA package can be used with the following syntax:

applyOffsetCorrection.com inCS.tab

Chemical shift outliers, especially those with highly unusual chemical shifts (secondary chemical shifts that deviate from the expected range by more than twice the normal range of secondary chemical shifts), may correspond to experimental errors and need to be inspected carefully before they are used. For example, in the output shown above, the outlier identified for residue 76 corresponds to a C-terminal carboxylate instead of a backbone carbonyl.

A ²H isotope correction for ¹³C^α/¹³C^β chemical shifts (Maltsev et al. J. Biomol. NMR, 2012, 54, 181-191) is also required for chemical shifts measured for perdeuterated protein samples. To do this, the script applyIsotopeCorrection2CACB.com included in the POMONA package can be used with the following syntax:

applyIsotopeCorrection2CACB.com inCS.tab

Note that the scripts applyOffsetCorrection.com and applyIsotopeCorrection2CACB.com apply the corrections to the original chemical shift input file, while the original input file is renamed with a .orig suffix.

POMONA has a default option (-offset) to check the referencing offset and apply a correction to the chemical shift input if needed, as well as an option (-iso) to apply the ²H isotope correction on the fly. However, it is still recommended that users properly prepare and carefully inspect their chemical shifts prior to using them as input to POMONA.
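
The rule of thumb for deciding whether an offset correction is warranted can be checked directly against the "Estimated Referencing Offset" lines of the TALOS-N -check output shown earlier. The Python sketch below is an illustration only: the function name `offsets_needing_correction` is hypothetical, the regular expression assumes the line layout of the example output above, and the factor of 5 is the approximate threshold quoted in the text.

```python
import re

# Matches lines such as:
#   Estimated Referencing Offset for CA/CB: 0.795 +/- 0.104 ppm (Size: 66)
OFFSET_RE = re.compile(
    r"Estimated Referencing Offset for (\S+):\s*([-+]?\d+\.?\d*)"
    r"\s*\+/-\s*(\d+\.?\d*)\s*ppm"
)


def offsets_needing_correction(report, factor=5.0):
    """Return {nucleus: (offset, error)} where |offset| > factor * error."""
    flagged = {}
    for nucleus, offset, error in OFFSET_RE.findall(report):
        offset, error = float(offset), float(error)
        if abs(offset) > factor * error:
            flagged[nucleus] = (offset, error)
    return flagged


if __name__ == "__main__":
    report = (
        "Chemical shift referencing checking...\n"
        "   Estimated Referencing Offset for CA/CB: 0.795 +/- 0.104 ppm (Size: 66)\n"
    )
    print(offsets_needing_correction(report))
```

For the example output, 0.795 ppm is more than five times 0.104 ppm, so the CA/CB offset would be flagged for correction under this criterion.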

Handling Flexible Tails/Loops

How to Use POMONA/CS-RosettaCM

For a query protein with known backbone and ¹³C^β chemical shifts, POMONA is designed for (1) searching the Protein Data Bank (PDB) for proteins with the best matched (local) structure, (2) aligning to any specific protein(s) according to their local structures, and (3) preparing all data and scripts for running a CS-RosettaCM structure generation procedure.

Protein Structure Database Searching by POMONA

To use POMONA for searching structure alignments from the PDB, users can follow the procedure listed below:

Create a directory for the prediction session; all subsequent commands will be executed from this directory.
Prepare the input table of chemical shift assignments (for example "myshifts.tab") with a proper format (see the previous section); please also carefully inspect the chemical shifts for the possible referencing offset and outliers.
Run POMONA master script (pomona) to perform the PDB database searching. Most commonly, this will simply require a command line such as:
```
pomona -in myshifts.tab
```
- It first runs the TALOS-N program to predict various structural factors, such as the backbone torsion angles and the secondary structure, which are used as inputs for the subsequent POMONA database search.
- All protein chains (~225,000) in the PDB are divided into 11 subsets, each of which is used to construct a pre-defined POMONA PDB database (named pdb_?? and stored at $POMONA_DIR/data/pdb/). During the PDB database search, a summary file "pomona.pdb_??.tab" is created to store the best aligned structures obtained from searching a given PDB subset.
- After all PDB subsets have been searched, a final clustering and summarizing step is performed for all identified alignments. By default, the top 1000 alignments with the highest total alignment score are kept; this number can be changed with the option -count (default: 1000). The selected alignments are then clustered to identify aligned structures with similar global conformation, using a distance cutoff on the normalized C^α-RMSD calculated for the aligned residues; the cutoff is 4 Å by default and can be changed with the option -rmsdCut (default: 4 Å). Two final files are generated: pomona_sum.tab, the default POMONA summary file, which stores the alignment summary of the top selected alignments, and pomona_aln.tab, the default POMONA alignment file, which stores the details of those alignments. Excerpts of both files for the test protein ubiquitin are shown below.
  Sample POMONA Summary File "pomona_sum.tab"
```
VARS   INDEX PDB R1 RN D_R1 D_RN SCORE LEN GAP_SIZE MISMATCH IDENTITY DB_VOL DB_R1 DB_RN CLUST_ID CLUST_MEMCNT 
FORMAT %4d %5s %4d %4d %4d %4d %6.2f %3d %3d %3d %3d  %s %8d %8d  %3d %3d

   1 3ehvC    1   71    1   71 264.13  71   0   1 100   pdb_06   582555   582626    1 915
   2 2mjbA    1   72    1   72 263.33  72   0   1 100   pdb_10   602798   602873    1 915
   3 4k7uA    1   72    1   72 262.22  72   0   0 100   pdb_09  1994953  1995024    1 915
   4 2qhoG    1   72    1   72 261.35  72   0   0 100   pdb_04  4300621  4300692    1 915
   5 2peaB    1   72    1   72 260.43  72   0   0 100   pdb_04  3603777  3603848    1 915
   6 2mbqB    1   72    1   72 260.27  72   0   0 100   pdb_10   592249   592320    1 915
   ...
```
  Sample POMONA Alignment File "pomona_aln.tab"
```
VARS   INDEX PDB TAG R1 ALIGNMENT_STR 
FORMAT %4d %5s %5s %4d %s

   1 3ehvC ALN_Q    0 0999999999999999999967678787777779999889999999999980089678799999999996m
   1 3ehvC SS_CS    1 LEEEEEELLLLEEEEEELLLLLHHHHHHHHHHHLLLLHHLLEEEELLEELLLLLLHHHHLLLLLLEEEEEE
   1 3ehvC QUERY    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVL
   1 3ehvC IDENT    0 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
   1 3ehvC SBJCT    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVL
   1 3ehvC  DSSP    1 LEEEEELLLLLEEEEELLLLLEHHHHHHHHHHHHLLLHHHEEEEELLEELLLLLELHHHLLLLLLEEEEEE
   ...
```
- The PDB database search by POMONA typically takes hours to days for a protein of normal size; the parallel search mode is therefore highly recommended when multiple cores are available on a computer or a computer cluster. The number of processes can be specified with the option -np (number of total processes for POMONA alignment; default: 10).
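
Because pomona_sum.tab carries its column names on the VARS line, it is straightforward to load for further analysis. The following Python sketch is an illustration written for this page, not a POMONA utility; the function names `parse_pomona_table` and `best_per_cluster` are hypothetical, and picking the highest SCORE within each CLUST_ID is just one plausible way to summarize the clusters.

```python
def parse_pomona_table(text):
    """Parse a VARS/FORMAT table into a list of dicts (values kept as strings)."""
    columns, rows = None, []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "FORMAT":
            continue
        if fields[0] == "VARS":
            columns = fields[1:]
        elif columns and len(fields) == len(columns):
            rows.append(dict(zip(columns, fields)))
    return rows


def best_per_cluster(rows):
    """Return {cluster id: row} keeping the highest-scoring row per CLUST_ID."""
    best = {}
    for row in rows:
        cid = int(row["CLUST_ID"])
        if cid not in best or float(row["SCORE"]) > float(best[cid]["SCORE"]):
            best[cid] = row
    return best


# First two data rows of the sample summary file shown above.
SAMPLE = """\
VARS   INDEX PDB R1 RN D_R1 D_RN SCORE LEN GAP_SIZE MISMATCH IDENTITY DB_VOL DB_R1 DB_RN CLUST_ID CLUST_MEMCNT
FORMAT %4d %5s %4d %4d %4d %4d %6.2f %3d %3d %3d %3d  %s %8d %8d  %3d %3d

   1 3ehvC    1   71    1   71 264.13  71   0   1 100   pdb_06   582555   582626    1 915
   2 2mjbA    1   72    1   72 263.33  72   0   1 100   pdb_10   602798   602873    1 915
"""

if __name__ == "__main__":
    for cid, row in best_per_cluster(parse_pomona_table(SAMPLE)).items():
        print(cid, row["PDB"], row["SCORE"], row["IDENTITY"])
```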

Note that the default alignment settings will fulfill the needs of most users; however, the alignment settings can be customized with the following optional parameters:

Input Options:
 -in         [None]   Input Shift Table.
 -inTitle    [None]   Input Shift Table Title.
 -pdb        [None]   Reference PDB Structure Input.
 -pdbTitle   [None]   Reference PDB Structure Input Title.
 -noe        [None]   NOE Restraint Input File (Rosetta3 format).
 -noeTitle   [None]   NOE Restraint Input Title.
Chemical Shift (CS) Quality Check and Correction:
 -offset     [false]  Apply CS Offset Correction If Needed.
 -iso        [false]  Apply 2H Isotope Correction to CA/CB Shifts.
Database and Homology Options:
 -db         [PDB]    List of PDB Database (default) or .pdb Files to be Searched.
 -excl       [None]   List of Proteins (PDB Identifiers) to be Excluded.
 -maxSeqID   [100]    Max Sequence Identity to be Searched.
POMONA Alignment Options:
 -wCS        [1.0]    Weight for Chemical Shift.
 -wAA        [0.1]    Weight for Amino Acid Sequence.
 -count      [1000]   Number of Top Alignments
 -minLen     [30]     Min Allowed Alignment Length.
 -maxGapLen  [30]     Max Allowed Length for a Single Alignment Gap.
 -gapI       [5.0]    Gap Opening Penalty.
 -gapE       [0.3]    Gap Extending Penalty.
 -CaDist0    [6.5]    Min. Ca Dist to penalize a gap opening in database protein.
Alignment Clustering Options:
 -rmsdCut    [4.0]    Cluster Distance Cutoff of Ca-RMSD.
RosettaCM Options:
 -rosettaCM           Generate RosettaCM Scripts and Inputs
 -NSelCluster [10]    Number of Selected Cluster of POMONA Alignments.
 -NperCluster [2]     Number of Selected Alignments per Cluster.
MPI Options:
 -MPI                 Use MPI (default) for POMONA Alignment.
 -noMPI               Do not use MPI for POMONA Alignment.
 -MPI_np      [10]    Number of processes for POMONA Alignment.

More under construction ...

POMONA Server

The POMONA Server can be used to search for structure alignments in the PDB. Users need to submit their input chemical shift file to the server with the default All PDB Proteins option. The server will send the results to the users via email. Note that only the final alignment files pomona_sum.tab and pomona_aln.tab will be sent.

Pairwise Comparison of Protein Structures by POMONA

To use POMONA for searching an optimal alignment to a known protein or a set of known proteins, users can follow the procedure listed below:

Create a directory for the prediction session; all subsequent commands will be executed from this directory.
Prepare the input table of chemical shift assignments (for example "myshifts.tab") with a proper format (see the previous section); please also carefully inspect the chemical shifts for the possible referencing offset and outliers.

Run POMONA master script (pomona) to perform the structure alignment:

Most commonly, if the proteins to be aligned have structure deposited in the PDB, this will simply require a command such as:
```
pomona -in myshifts.tab -db 1d3zA 1ubqA 1ubi
```
where the name(s) after the option -db are the PDB identifier(s) of the protein(s) to be aligned; each should be a standard 4-character PDB identifier with an optional character for the chain ID. If multiple proteins are used, their names should be separated by a space character. Note that, because the script needs to download file(s) from the PDB website, network access is required, as well as the Unix/Linux program wget to perform the download (Mac OS may need a manual installation of wget).
If users want to apply the structure alignment to proteins with a local PDB file, a command such as listed below should be used:
```
pomona -in myshifts.tab -db 1d3zA.pdb 1ubqA.pdb 1ubi.pdb
```
where the names after the option -db are for the PDB file of the protein(s) to be aligned. Note that the PDB file should have a standard format as those from the PDB database.
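
The two kinds of -db arguments described above (PDB identifiers versus local PDB files) can be told apart with a simple check before POMONA is launched. The Python sketch below is an illustration only, not part of POMONA: the function name `classify_db_argument` is hypothetical, and the pattern assumes a classic 4-character PDB identifier starting with a digit, optionally followed by one chain character. It does not cover the default whole-PDB search.

```python
import re

# Four-character PDB identifier with an optional one-character chain ID,
# e.g. "1ubi" or "1d3zA" (pattern is an assumption based on the text above).
PDB_ID_RE = re.compile(r"^[0-9][A-Za-z0-9]{3}[A-Za-z0-9]?$")


def classify_db_argument(arg):
    """Return 'file', 'pdb_id', or 'unknown' for one -db argument."""
    if arg.lower().endswith(".pdb"):
        return "file"
    if PDB_ID_RE.match(arg):
        return "pdb_id"
    return "unknown"


if __name__ == "__main__":
    for arg in ["1d3zA", "1ubqA", "1ubi", "1d3zA.pdb", "ubiquitin"]:
        print(arg, "->", classify_db_argument(arg))
```

Arguments classified as 'pdb_id' are the ones that require network access and wget, since their files are downloaded from the PDB website.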

The program first generates a POMONA database from the PDB files (downloaded from the PDB website or provided by the users), then applies a clustering and summarization procedure, and generates a final summary output file, pomona_sum.tab, which stores the alignment summary of the top selected alignments. An excerpt of this file from a pairwise alignment between the test protein ubiquitin and several of its homologues is shown below; the file can also be found at $POMONA_DIR/demo/pairwise/pomona_sum.tab:

VARS INDEX PDB R1 RN D_R1 D_RN SCORE LEN GAP_SIZE MISMATCH IDENTITY DB_VOL DB_R1 DB_RN CLUST_ID CLUST_MEMCNT 
FORMAT %4d %5s %4d %4d %4d %4d %6.2f %3d %3d %3d %3d  %s %8d %8d  %3d %3d

   1 1ubiA    1   71    1   71 255.83  72   0   1 100   pdb_00      153      228    1   3 
   2 1d3zA    1   71    1   71 254.64  72   0   1 100   pdb_00        1       76    1   3 
   3 1ubqA    1   71    1   71 254.44  72   0   1 100   pdb_00       77      152    1   3

and an alignment output file, pomona_aln.tab, which stores the details of the top selected alignments. An excerpt of this file from a pairwise alignment between ubiquitin and its homologues is shown below; the file can also be found at $POMONA_DIR/demo/pairwise/pomona_aln.tab:

VARS   INDEX PDB TAG R1 ALIGNMENT_STR 
FORMAT %4d %5s %5s %4d %s

   1 1ubiA ALN_Q    0 09999999999999999998675888888887699898898799999999900997777999999999999m
   1 1ubiA SS_CS    1 LEEEEEELLLLEEEEEELLLLLHHHHHHHHHHHLLLLHHLLEEEELLEELLLLLLHHHHLLLLLLEEEEEEE
   1 1ubiA QUERY    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR
   1 1ubiA IDENT    0 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
   1 1ubiA SBJCT    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR
   1 1ubiA  DSSP    1 LEEEEEELLLLEEEEELLLLLEHHHHHHHHHHHHLLLHHHEEEEELLEELLLLLEHHHHLLLLLLEEEEEEL

   2 1d3zA ALN_Q    0 09999999999999999999776888777776699998787899999999900996787999899999998m
   2 1d3zA SS_CS    1 LEEEEEELLLLEEEEEELLLLLHHHHHHHHHHHLLLLHHLLEEEELLEELLLLLLHHHHLLLLLLEEEEEEE
   2 1d3zA QUERY    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR
   2 1d3zA IDENT    0 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
   2 1d3zA SBJCT    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR
   2 1d3zA  DSSP    1 LEEEEELLLLLEEEEELLLLLEHHHHHHHHHHHHLLLHHHEEEEELLEELLLLLELHHHLLLLLLEEEEEEL

   3 1ubqA ALN_Q    0 09999999999999999999675888877777598998898999999999900996677999999999999m
   3 1ubqA SS_CS    1 LEEEEEELLLLEEEEEELLLLLHHHHHHHHHHHLLLLHHLLEEEELLEELLLLLLHHHHLLLLLLEEEEEEE
   3 1ubqA QUERY    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR
   3 1ubqA IDENT    0 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
   3 1ubqA SBJCT    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR
   3 1ubqA  DSSP    1 LEEEEEELLLLEEEEELLLLLEHHHHHHHHHHHHLLLHHHEEEEELLEELLLLLELHHHLLLLLLEEEEEEL
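
The alignment file groups several tagged rows (ALN_Q, SS_CS, QUERY, IDENT, SBJCT, DSSP) under each alignment index, which makes per-alignment comparisons easy to script. The Python sketch below is an illustration, not a POMONA tool; the function names are hypothetical, and it assumes that the alignment strings contain no whitespace and that rows of one alignment have equal length. Interpreting SS_CS and DSSP as the chemical-shift-derived and structure-derived secondary structure strings is likewise an assumption based on the tag names.

```python
def parse_alignment_rows(text):
    """Group pomona_aln.tab rows as {(index, pdb): {tag: string}}."""
    alignments = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly 5 fields; VARS/FORMAT lines and blanks do not.
        if len(fields) != 5 or fields[0] in ("VARS", "FORMAT"):
            continue
        index, pdb, tag, _r1, string = fields
        alignments.setdefault((int(index), pdb), {})[tag] = string
    return alignments


def fraction_matching(a, b):
    """Fraction of positions at which two equal-length strings agree."""
    if len(a) != len(b) or not a:
        raise ValueError("strings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# QUERY and SBJCT rows of the first alignment in the excerpt above.
SAMPLE = """\
VARS   INDEX PDB TAG R1 ALIGNMENT_STR
FORMAT %4d %5s %5s %4d %s

   1 1ubiA QUERY    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR
   1 1ubiA SBJCT    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR
"""

if __name__ == "__main__":
    for key, rows in parse_alignment_rows(SAMPLE).items():
        print(key, "sequence agreement:", fraction_matching(rows["QUERY"], rows["SBJCT"]))
```

The same `fraction_matching` helper can be applied to the SS_CS and DSSP rows to see how closely the two secondary structure strings agree for a given template.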

Sample POMONA Summary File "pomona_sum.tab" for Pairwise Alignment

VARS INDEX PDB R1 RN D_R1 D_RN SCORE LEN GAP_SIZE MISMATCH IDENTITY DB_VOL DB_R1 DB_RN CLUST_ID CLUST_MEMCNT 
FORMAT %4d %5s %4d %4d %4d %4d %6.2f %3d %3d %3d %3d  %s %8d %8d  %3d %3d

   1 1ubiA    1   71    1   71 255.83  72   0   1 100   pdb_00      153      228    1   3 
   2 1d3zA    1   71    1   71 254.64  72   0   1 100   pdb_00        1       76    1   3 
   3 1ubqA    1   71    1   71 254.44  72   0   1 100   pdb_00       77      152    1   3

Sample POMONA Alignment File "pomona_aln.tab" for Pairwise Alignment

VARS   INDEX PDB TAG R1 ALIGNMENT_STR 
FORMAT %4d %5s %5s %4d %s

   1 1ubiA ALN_Q    0 09999999999999999998675888888887699898898799999999900997777999999999999m
   1 1ubiA SS_CS    1 LEEEEEELLLLEEEEEELLLLLHHHHHHHHHHHLLLLHHLLEEEELLEELLLLLLHHHHLLLLLLEEEEEEE
   1 1ubiA QUERY    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR
   1 1ubiA IDENT    0 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
   1 1ubiA SBJCT    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR
   1 1ubiA  DSSP    1 LEEEEEELLLLEEEEELLLLLEHHHHHHHHHHHHLLLHHHEEEEELLEELLLLLEHHHHLLLLLLEEEEEEL
...
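The alignment file can be grouped by entry and tag in a similar way. The sketch below assumes the five-column layout shown above (INDEX, PDB, TAG, R1, ALIGNMENT_STR); the function names are hypothetical, and treating "-" as a gap character is an assumption, since the sample contains no gaps.

```python
def read_alignment_blocks(text):
    """Group rows by (INDEX, PDB); map each TAG to (R1, ALIGNMENT_STR)."""
    blocks = {}
    for line in text.splitlines():
        fields = line.split(None, 4)
        # Data rows have five columns and start with an integer INDEX.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        index, pdb, tag, r1, aln = fields
        blocks.setdefault((int(index), pdb), {})[tag] = (int(r1), aln.strip())
    return blocks


def sequence_identity(query, sbjct, gap="-"):
    """Fraction of aligned positions carrying the same residue letter."""
    if not query or len(query) != len(sbjct):
        raise ValueError("aligned strings must be non-empty and of equal length")
    same = sum(1 for q, s in zip(query, sbjct) if q == s and q != gap)
    return same / len(query)


sample = """\
VARS   INDEX PDB TAG R1 ALIGNMENT_STR
FORMAT %4d %5s %5s %4d %s

   1 1ubiA QUERY    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR
   1 1ubiA SBJCT    1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR
"""

blocks = read_alignment_blocks(sample)
entry = blocks[(1, "1ubiA")]
identity = sequence_identity(entry["QUERY"][1], entry["SBJCT"][1])
print(identity)  # 1.0
```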

more under construction ...

POMONA Server

The POMONA Server can be used for pairwise structure alignment. When submitting the chemical shift input file of the query protein to the server, users need to (1) select the option Selected PDB Protein(s) and (2) type in the PDB identifier of the protein(s) to be aligned. Note that the server does not currently provide an option to align a local PDB file.

POMONA/CS-RosettaCM Structure Generation

I. Prepare CS-RosettaCM running data

POMONA generates multiple structure alignments, which can be used as structural templates for a CS-RosettaCM structure generation procedure. To generate structural templates from POMONA alignments, together with the other inputs and scripts required for running CS-RosettaCM structure generation, users can add the option -rosettaCM to the other options used for running a POMONA alignment, for example:

pomona -in myshifts.tab -rosettaCM

which performs the following procedures:

It first runs POMONA to identify the top 1000 structure alignments (or any number defined by the -count option) and clusters them by the similarity of their global conformation; see the sections above for details of running a POMONA alignment.
Next, it selects the top 2 alignments (or any number specified by the option -NperCluster) from each of the top 10 clusters (or any number of clusters specified by the option -NSelCluster) and stores them in an output file pomona_selected.aln.

It then generates structure templates from the selected alignments (pomona_selected.aln) and all other inputs/scripts required for CS-RosettaCM comparative modeling, which are all stored in a directory named csRosettaCM and include:

csRosettaCM/
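The selection step described above (top alignments per cluster, from the top clusters) can be illustrated with a short sketch. This is not POMONA's implementation: it assumes that alignments are ranked by SCORE and that clusters are ranked by their best-scoring member, neither of which is stated in the text. The function name and the two "hypo" rows are hypothetical; the three ubiquitin rows are taken from the sample summary table.

```python
def select_alignments(rows, n_per_cluster=2, n_sel_cluster=10):
    """Pick up to n_per_cluster rows from each of the first n_sel_cluster clusters.

    rows: iterable of (pdb, score, cluster_id) tuples.
    """
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)
    picked = {}  # cluster_id -> selected rows; dicts keep insertion order
    for row in ranked:
        cluster = row[2]
        if cluster not in picked:
            if len(picked) >= n_sel_cluster:
                continue  # enough clusters already; skip lower-ranked ones
            picked[cluster] = []
        if len(picked[cluster]) < n_per_cluster:
            picked[cluster].append(row)
    return [row for members in picked.values() for row in members]


rows = [
    ("1ubiA", 255.83, 1),
    ("1d3zA", 254.64, 1),
    ("1ubqA", 254.44, 1),
    ("hypoA", 120.00, 2),  # made-up entry for illustration
    ("hypoB", 110.00, 3),  # made-up entry for illustration
]

selected = select_alignments(rows, n_per_cluster=2, n_sel_cluster=2)
print([pdb for pdb, _, _ in selected])  # ['1ubiA', '1d3zA', 'hypoA']
```

The defaults mirror the values quoted above for -NperCluster (2) and -NSelCluster (10).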


II. Run CS-RosettaCM

Users then need to run the generated script runCSRosettaCM in the csRosettaCM directory to start the CS-RosettaCM structure modeling. Note that the script runCSRosettaCM may need manual modification if the Rosetta installation environment is not properly configured, or when running parallel jobs on computing clusters.

Script runCSRosettaCM

#!/bin/csh
#

# ====== ***PLEASE verify & modify below definitions*** ======
set rosettaBinDir = /Your_Rosetta_Directory/main/source/bin/
set rosettaBin    = $rosettaBinDir/rosetta_scripts.default.linuxgccrelease
set rosettaMPIBin = $rosettaBinDir/rosetta_scripts.mpi.linuxgccrelease
set rosettaDB     = /Your_Rosetta_Directory/main/database
# ===========================================================

$rosettaBin -parser:protocol rosetta_cm.xml @flags -database $rosettaDB \
            -out:file:silent default.out -out:file:scorefile default.sc
# other options: -hybridize:starting_template [IntegerVector] 'to define starting templates'

# MPI command (commented out; uncomment both lines to use it)
#mpirun -np 10 $rosettaMPIBin -parser:protocol rosetta_cm.xml @flags -database $rosettaDB \
#                             -out:file:silent default.out -out:file:scorefile default.sc
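Before launching a job, it may help to confirm that the paths edited in the script actually exist. The sketch below assembles the same command line as the script above and reports missing Rosetta paths. It assumes the directory layout used in the script (main/source/bin and main/database under one Rosetta directory); the function names and the /Your_Rosetta_Directory placeholder are illustrative only.

```python
import os

SERIAL_EXE = "rosetta_scripts.default.linuxgccrelease"
MPI_EXE = "rosetta_scripts.mpi.linuxgccrelease"


def rosetta_cm_command(rosetta_dir, mpi_procs=0):
    """Return the argument list for a serial run, or an mpirun run if mpi_procs > 0."""
    bin_dir = os.path.join(rosetta_dir, "main", "source", "bin")
    database = os.path.join(rosetta_dir, "main", "database")
    exe = os.path.join(bin_dir, MPI_EXE if mpi_procs else SERIAL_EXE)
    cmd = [exe, "-parser:protocol", "rosetta_cm.xml", "@flags",
           "-database", database,
           "-out:file:silent", "default.out",
           "-out:file:scorefile", "default.sc"]
    if mpi_procs:
        cmd = ["mpirun", "-np", str(mpi_procs)] + cmd
    return cmd


def missing_rosetta_paths(rosetta_dir):
    """List the serial binary and database paths that cannot be used as-is."""
    cmd = rosetta_cm_command(rosetta_dir)
    exe, database = cmd[0], cmd[cmd.index("-database") + 1]
    problems = []
    if not (os.path.isfile(exe) and os.access(exe, os.X_OK)):
        problems.append(exe)
    if not os.path.isdir(database):
        problems.append(database)
    return problems


serial = rosetta_cm_command("/Your_Rosetta_Directory")
parallel = rosetta_cm_command("/Your_Rosetta_Directory", mpi_procs=10)
print(" ".join(parallel[:3]))  # mpirun -np 10
print(missing_rosetta_paths("/Your_Rosetta_Directory"))
```

The resulting list could be passed to subprocess.run from inside the csRosettaCM directory, where rosetta_cm.xml and the flags file are expected to be.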

After the CS-RosettaCM structure modeling job is done, users need to run the analysis script analyzeCSRosettaCM in the csRosettaCM directory (after modifying or confirming the Rosetta installation environment defined in it). The script generates a directory named ExtractedPDBs that stores all models and the analysis tables, and includes:

ExtractedPDBs/


For more information regarding the evaluation of the generated data, please see the next section, "How to Select Consistent POMONA/CS-RosettaCM Predictions".

More under construction ...

POMONA Server

The POMONA Server can be used for generating CS-RosettaCM inputs and scripts. Users need to submit their input chemical shift file to the server and check the "Generate CS-RosettaCM Inputs" option. The server will send the results to the users via email, including the final alignment files pomona_sum.tab and pomona_aln.tab and a CS-RosettaCM package file csRosettaCM.zip. Users then need to:

Create a local directory for the CS-RosettaCM job, download all server-generated files to this directory, and unzip the csRosettaCM.zip file;
Go to the generated csRosettaCM directory and edit the master scripts runCSRosettaCM and analyzeCSRosettaCM according to the Rosetta installation;
Perform CS-RosettaCM modeling by running the master script runCSRosettaCM;
After the CS-RosettaCM job is done, run the CS-RosettaCM structure analysis using the master script analyzeCSRosettaCM.
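The first of the steps above can be scripted. The following sketch unzips the package into a job directory and reports which master scripts are not found. It assumes that the archive unpacks to a csRosettaCM directory with the two master scripts directly inside it; the function name is hypothetical, and the demo builds a small stand-in archive rather than using a real server result.

```python
import tempfile
import zipfile
from pathlib import Path

MASTER_SCRIPTS = ("runCSRosettaCM", "analyzeCSRosettaCM")


def unpack_server_package(zip_path, job_dir):
    """Unzip the package into job_dir and return the master scripts not found."""
    job_dir = Path(job_dir)
    job_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(job_dir)
    cm_dir = job_dir / "csRosettaCM"
    return [name for name in MASTER_SCRIPTS if not (cm_dir / name).is_file()]


# Demo with a stand-in archive that contains only one of the two scripts.
with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp) / "csRosettaCM.zip"
    with zipfile.ZipFile(package, "w") as archive:
        archive.writestr("csRosettaCM/runCSRosettaCM", "#!/bin/csh\n")
    missing = unpack_server_package(package, Path(tmp) / "job")

print(missing)  # ['analyzeCSRosettaCM']
```

An empty list means both master scripts are in place and ready to be edited for the local Rosetta installation.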


How to Select Consistent POMONA/CS-RosettaCM Predictions
It is important to carefully evaluate POMONA/CS-RosettaCM generated structures to avoid accepting unconverged and/or incorrect models. As with the CS-Rosetta protocol, stringent acceptance criteria are therefore enforced:

the lowest energy CS-RosettaCM models must be clustered within about 2.5 Å of C^α-RMSD₁₀₀, AND
the energy of the lowest energy CS-RosettaCM models must be considerably lower than that of the lowest energy models from a standard CS-Rosetta modeling

1. Convergence criterion

The convergence of the lowest energy CS-RosettaCM models can be inspected in the output of the analysis script `analyzeCSRosettaCM`; see below for an example from a demo POMONA/CS-RosettaCM modeling of the protein ubiquitin. This script automatically finds the 10 lowest energy CS-RosettaCM models and calculates their C^α-RMSD values relative to the mean coordinates of these 10 models. A C^α-RMSD₁₀₀ value, calculated as C^α-RMSD/(1+ln(N/100)), where N is the number of residues of the protein, is then used to judge the convergence of the CS-RosettaCM modeling: the generated CS-RosettaCM models are considered converged ONLY when the averaged C^α-RMSD₁₀₀ value is below ~2.5 Å. Results in which the lowest energy structures cluster less tightly than 2.5 Å may still be useful for further analysis, but they should not be over-interpreted and could be in error.

The convergence can also be inspected visually by plotting the Rosetta energy versus the C^α-RMSD (to the lowest energy model), which are the last two columns in the output file `ExtractedPDBs/name.scores.rms2Low.txt`. See below for two such plots generated from POMONA/CS-RosettaCM modeling of the proteins s.rhodopsin and mad2: convergence is observed for the models of s.rhodopsin but not for those of mad2.
Script analyzeCSRosetta Output (for protein ubiquitin)

calculatng Ca-rmsd to the lowest energy model ...
checking 10 lowest energy models ...
Ca-rmsd (to the mean structure) for 10 lowest energy models: (for residues 2 9 11 71)
0.3627 S_0072.pdb
0.3398 S_0027.pdb
0.2350 S_0084.pdb
0.3473 S_0034.pdb
0.4438 S_0038.pdb
0.4112 S_0095.pdb
0.3535 S_0077.pdb
0.5108 S_0053.pdb
0.4297 S_0070.pdb
0.3838 S_0087.pdb
averaged Ca-rmsd: 0.382 +/- 0.074

Convergence Plot for Protein s.rhodopsin

Convergence Plot for Protein mad2
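The convergence check can be reproduced from the numbers in the ubiquitin output. The sketch below applies the normalization exactly as written in the text, C^α-RMSD/(1+ln(N/100)), and compares the result with the ~2.5 Å cutoff. The function names are hypothetical, and N = 76 (the full length of ubiquitin) is an assumption, because the text does not say which residue count the script uses. As written, the formula is only usable when 1+ln(N/100) is positive, i.e. for proteins longer than about 37 residues.

```python
import math
import statistics


def rmsd100(rmsd, n_residues):
    """Normalize a Ca-RMSD to a 100-residue protein, as stated in the text."""
    scale = 1.0 + math.log(n_residues / 100.0)
    if scale <= 0:
        raise ValueError("formula is undefined for such a short protein")
    return rmsd / scale


def is_converged(rmsds, n_residues, cutoff=2.5):
    """True if the averaged, normalized Ca-RMSD is below the cutoff (in Angstrom)."""
    return rmsd100(statistics.mean(rmsds), n_residues) < cutoff


# Ca-RMSD values of the 10 lowest energy ubiquitin models listed above.
ubiquitin = [0.3627, 0.3398, 0.2350, 0.3473, 0.4438,
             0.4112, 0.3535, 0.5108, 0.4297, 0.3838]

mean = statistics.mean(ubiquitin)
spread = statistics.stdev(ubiquitin)  # sample standard deviation
print(f"averaged Ca-rmsd: {mean:.3f} +/- {spread:.3f}")  # 0.382 +/- 0.074
print(is_converged(ubiquitin, n_residues=76))  # True
```

The sample standard deviation reproduces the "+/- 0.074" reported in the output, which suggests, but does not prove, that the script uses the same estimator.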

2. Energy criterion

A second requirement for accepting a CS-RosettaCM structure is that its total Rosetta energy (including the chemical shift scoring term) is significantly lower than the lowest values obtained for CS-Rosetta models. The standard CS-Rosetta de novo structure generation procedure is based on a well-refined fragment assembly and full-atom refinement procedure, which can sample and generate low-energy full-atom structural models for small and medium-sized proteins. CS-Rosetta will typically fail for large proteins, but the lowest energy that a CS-Rosetta procedure reaches can be used as a second standard to judge a CS-RosettaCM modeling (for larger proteins). For details regarding the standard CS-Rosetta protocol, please check our CS-Rosetta website.

When the CS-RosettaCM inputs are prepared with the "pomona -in myshifts.tab -rosettaCM" command, a csRosetta package/directory is also generated alongside csRosettaCM. Following a procedure similar to that for running CS-RosettaCM modeling and analyzing CS-RosettaCM models, the reference CS-Rosetta modeling can be performed with the runCSRosetta script, and the generated CS-Rosetta models can be analyzed with the analyzeCSRosetta script. The Rosetta energy of the lowest energy CS-Rosetta models can be found in the output file csRosetta/ExtractedPDBs/name.scores.txt. The two energy plots from POMONA/CS-RosettaCM modeling of the proteins s.rhodopsin and mad2 illustrate this criterion: for s.rhodopsin, the converged CS-RosettaCM models (red dots) show lower energy than the CS-Rosetta models (black dots); for mad2, neither the CS-Rosetta models (black dots) nor the CS-RosettaCM models (red dots) are converged, and the lowest energy CS-RosettaCM models show higher energy.

Energy Plot for Protein s.rhodopsin

Energy Plot for Protein mad2
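The energy comparison can be sketched as follows, assuming the Rosetta energies of both runs have already been read into lists of numbers (the layout of name.scores.txt is not described here, so no parser is attempted). The text gives no numerical threshold for "significantly lower", so min_gap is a user-chosen, hypothetical parameter; averaging over the 10 lowest energy models mirrors the convergence criterion and is likewise an assumption. All energies in the demo are made up.

```python
import statistics


def lowest_mean(energies, n_lowest=10):
    """Mean of the n_lowest lowest energies."""
    if not energies:
        raise ValueError("no energies given")
    return statistics.mean(sorted(energies)[:n_lowest])


def passes_energy_criterion(cm_energies, ref_energies, min_gap, n_lowest=10):
    """True if the CS-RosettaCM models are lower in energy by at least min_gap."""
    gap = lowest_mean(ref_energies, n_lowest) - lowest_mean(cm_energies, n_lowest)
    return gap >= min_gap


# Made-up energies for illustration only.
cm_models = [-250.0, -245.0, -240.0]       # CS-RosettaCM
reference_models = [-200.0, -195.0, -190.0]  # standard CS-Rosetta

print(passes_energy_criterion(cm_models, reference_models, min_gap=20.0, n_lowest=2))  # True
print(passes_energy_criterion(reference_models, cm_models, min_gap=20.0, n_lowest=2))  # False
```

A visual comparison of the two energy distributions, as in the plots above, remains the more informative check; this function only condenses it into a single yes/no answer.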

More under construction ...


last update: April 6 2015 / sy