A DYNAMO Cookbook

How to use DYNAMO--A system for structure determination from NMR data

by John Kuszewski

Table of Contents

1. Introduction--what dynamo does & how it does it

constraints into coordinates

types of constraints

MD and simulated annealing

Fragment reconstruction

1A. The DYNAMO energy function

2. dynamo project organization

2A. The GMC Editor

2B. GMC files & their contents

2C. The format of the GMC restraint tables

2D. The simulated annealing GUI and an early peek at dynamo's simulated annealing process

3. Getting started--converting a project from xplor

Step 1: create the GMC file & its covalent geometry tables

Step 2: convert the experimental restraints tables

Step 3: edit the default radius of gyration table

4. Running a structure

4A. A typical example

4B. Simple modifications

4C. Calculating an average structure

4D. Comparing the refined average structure to a known structure

5. Creating new residue types

How dynamo stores information about residues

Step 1--create a new TCL add... procedure for your new residue type

Step 2--Let the GMC editor know about your new residue type

Step 3--update the TCL code index

Step 4--open up the GMC editor and admire your new residue type

1. Introduction

DYNAMO is a computer program that determines structures of proteins and nucleic acids which are consistent with a set of NMR-derived constraints. Structure determination from NMR data is particularly challenging because the number of structural constraints available from NMR spectroscopy is small compared to the number of degrees of freedom in a biological macromolecule. Successful NMR structure determination therefore relies on two approaches: maximizing the number and variety of experimental constraints, and maximizing the number and accuracy of other constraints that can be determined a priori.

DYNAMO is able to use a variety of NMR-derived constraints: NOE distances, torsion angles, 3J couplings, and dipolar couplings. In addition, backbone chemical shifts can be translated into backbone torsion angle constraints through the use of the program TALOS.

In addition to experimental constraints, DYNAMO also incorporates structural restraints that are known a priori. There are two classes of a priori restraints: covalent geometry restraints, and DELPHIC restraints. The covalent geometry of a macromolecule is restrained to conform to the bond lengths, bond angles, chiralities, etc., that are expected from small-molecule xray crystal structures. Larger sections of proteins and nucleic acids, such as groups of related torsion angles, can be restrained to fall into one or more reasonable conformations using DELPHIC constraints.

DYNAMO uses molecular dynamics based simulated annealing to find sets of atomic coordinates that are consistent with the experimental and a priori constraints. Molecular dynamics is a computational technique that uses numerical integration to simulate Newton's equations of motion. Simulated annealing is a computational technique for function optimization. In simulated annealing, a system's temperature begins at a high value and is slowly reduced. Thus, early in a simulated annealing procedure, the system has sufficient kinetic energy to jump over large potential energy barriers. But as the system cools, its movement is gradually restricted to lower and lower energy structures. A sufficiently high starting temperature ensures that all the relevant conformational space is sampled, and a sufficiently slow cooling rate ensures that a very low energy structure is finally chosen.

DYNAMO also includes tools for generating starting structures from dipolar coupling data alone. A database of known protein structure fragments is consulted, and their expected dipolar coupling patterns are calculated. Fragments whose expected dipolar couplings match those observed for a given region of a protein can then be used to predict its structure. This fragment reconstruction approach is able to generate a structure that is reasonably close to the true structure. It can be used as a starting point for molecular dynamics simulated annealing refinement, ensuring that the most relevant parts of conformational space are explored thouroughly.

1A. The DYNAMO energy function

Molecular dynamics uses numerical integration to simulate a system's movement by Newton's equations of motion. The forces on each atom are the partial derivatives of the potential energy function. DYNAMO's potential energy function includes terms that describe the current coordinates' agreement with both experimental and a priori constraints:

Etot = Ebond + Eangle + Eimpr + Evdw + Enoe + Ejcoup + Etorsion + Edipo + Ergyr

where

Etot is the total energy.

Ebond describes the structure's agreement with expected covalent bond lengths:

	Ebond = kbond Σ (expected_distance - actual_distance)2

Eangle describes the structure's agreement with expected covalent bond angles:

	Eangle = kangle Σ (expected_angle - actual_angle)2

Eimpr describes the structure's agreement with expected "improper" torsion angles. Improper torsion angles are constraints on the geometry of groups of four atoms that are used to maintain correct chirality of chiral centers and the planarity of planar groups.

	Eimpr = kimpr Σ (expected_impr - actual_impr)2

Evdw describes the structure's steric overlap, using a simple repulsive potential:

	Evdw = kvdw Σ ((size_scale * expected_min_distance)2 - actual_distance2)2

Enoe describes the structure's agreement with observed NOE distances. It uses a "square well" potential that is zero if the actual distance is within a given range:

	Enoe = knoe Σ ...

Ejcoup describes the structure's agreement with observed 3J couplings. Karplus parameters can be specified for each type of 3J coupling.

	Ejcoup = kjcoup Σ (observed_J - calculated_J)2
	
	calculated_J = A cos2 (phi + phase) + B cos (phi + phase) + C

Etorsion describes the structure's agreement with experimental torsion angle constraints. Like the NOE term, it uses a square well potential:

	Etorsion = ktorsion Σ delta_torsion2

	delta_torsion = actual_torsion - observed_torsion_upperbound 
			if actual_torsion > observed_torsion_upperbound, 
		= actual_torsion - observed_torsion_lowerbound
			if actual_torsion < observed_torsion_lowerbound,
		= 0 otherwise

Edipo describes the structure's agreement with experimentally observed dipolar couplings. Data from two different alignment tensors can be used simultaneously. The expected dipolar couplings are calculated from the structure as described in (???)

	Edipo = kdipo Σ (observed_dipo - calculated_dipo)2

Ergyr describes the structure's agreement with its expected overall size. NMR structures are typically expanded relative to crystal structures, because there are so many forces from the vdw term pushing atoms apart, with relatively few forces from the NOE term pulling them back together. The radius of gyration is a measure of a structure's overall size--specifically, it is the RMS distance of a group of atoms to their centroid. It can be predicted reasonably accurately for globular proteins just from the number of residues:

	expected_Rgyr = 2.2 Nresidues0.38

	Ergyr = krgyr Σ (expected_Rgyr - actual_Rgyr)2

2. DYNAMO project organization

DYNAMO organizes a structure calculation project in a "GMC" (Generalized Molecular Coordinate) file. A GMC file is actually a directory that contains tables of covalent geometry restraints, experimental restraints, and PDB files calculated from them.

2A. The GMC Editor

Dynamo includes a graphical tool to create and edit GMC files. It's called the GMC Editor. The executable is in (NMRBASE)/dynamo/tcl/gmcEdit. To create a new GMC file with it, just type

gmcEdit

To examine or modify an existing GMC file, type

gmcEdit an_existing_GMC_file_name.gmc

On the left, there is a section to create, edit, and delete structure segments. A segment is a single protein or nucleic acid chain, or a group of separate small molecules that you wish to think of as a unit (for example, a bunch of water molecules). To create a new segment, click the "new segment" button, select the line in the segment selection list, and click "edit segment."

Set the segment's type (protein, DNA, RNA, or other) with the radio buttons. The sequence can be typed in directly, or entered by pressing the buttons with residue type labels. Each of the valid residue types for each segment type has a corresponding button. Proteins have additional check boxes to end the segment with an amine group on the N terminus or an acid group on the C terminus (these boxes should ordinarily be checked). Nucleic acids have additional check boxes to add phosphates to their 5' or 3' ends.

When you're done entering the sequence of a given segment, click the button marked "save changes to this segment".

On the top right, there is a section to create, edit, and delete disulfide bonds. Since disulfides can connect residues in different segments, they are edited separately in the lower region of the window. As with segments, start by clicking the "new disulfide" button, select the disulfide in the list, and then click "edit disulfide." Each disulfide simply needs to know the segment names and residue numbers of the cysteines to be connected. It doesn't matter which one is designated the "from" residue and which is the "to" residue. As with the segment editor, when you're done entering a disulfide, click the "save changes to this disulfide" button.

On the bottom right, there is a section to create, edit, and delete molecular symmetry records. These are used to force two segments to take up identical and symmetric structures (as is appropriate for dimers). As with segments, start by clicking the "new symmetry" button, select the symmetry in the list, and then click "edit symmetry." Each symmetry restraint simply needs to know the names of the segments to be connected. It doesn't matter which one is designated the "from" segment and which is the "to" segment. When you're done entering a symmetry restraint, click the "save changes to this symmetry" button.

Once you've entered all your segments and disulfides, click the "generate GMC" button at the bottom. DYNAMO will first make sure that the information you've entered is valid (eg., that the residues selected for a disulfide are actually cys, that each segment has a unique name, etc). Then it will create the GMC file directory and the tables within it that store the covalent geometry restraints. This process can take a minute or two.

Finally, quit the program with the button marked "quit." (of course)

As an example, we will create a GMC file for an imaginary protein/DNA complex called BOB. This complex is made up of a dimer of peptides that are linked by an intermolecular disulfide bond, together with a symmetric double stranded DNA molecule.

  1. Go to the dynamo/demos directory. Type gmcEdit at the prompt to bring up the GMC Editor.
  2. Set the GMC directory name to "bob".
  3. Create a new segment. Edit it. Change its name to "BOBA". Set its starting residue number to 1. Set its type to "protein." Enter the sequence ala gln leu cys ile phe asp. Save the changes to that segment.
  4. Create another segment. Edit it. Change its name to "BOBB". Make all settings and its sequence identical to BOBA. Save the changes.
  5. Create another segment. Edit it. Change its name to "DNAA". Set its starting residue number to 100. Enter the sequence dAde dGua dThy dCyt. Save the changes.
  6. Create another segment. Edit it. Change its name to "DNAB". Make all settings and its sequence identical to DNAA. Save the changes.
  7. Create a disulfide bond. Edit it. Set the "from" segment to BOBA. Set the from residue number to 4. Set the "to" segment to BOBB. Set the to residue number to 4. Save the changes.
  8. Create a symmetry restraint. Edit it. Set the "from" segment to BOBA. Set the "to" segment to "BOBB". Save the changes.
  9. Create a symmetry restraint. Edit it. Set the "from" segment to DNAA. Set the "to" segment to DNAB. Save the changes.
  10. Click the "Generate GMC" button at the bottom right. The information that we've entered is first checked for consistency, and then the contents of the GMC directory are generated. This might take a minute.
  11. Click the "Quit" button.

2B. The files within a GMC

A GMC directory contains several files that define the covalent geometry of the system, together with files that define the experimental constraints (NOE distances, dipolar couplings, 3J couplings, and torsion angles) and other a priori constraints. They are:

gmc_record.txt

The file read by the GMC editor to determine the contents of this GMC directory. Not intended to be edited by hand--it's created and modified by the GMC Editor.

atoms.tab

Defines atomic mass and van der Waals parameters for each atom. Generated automatically from the sequence, using the GMC editor (Section 2C).

bonds.tab

Defines covalent bond length restraints. Generated automatically from the sequence, using the GMC editor.

angles.tab

Defines covalent bond angle restraints. Generated automatically from the sequence, using the GMC editor.

impropers.tab

Defines improper torsion angle restraints, which are used to maintain the correct geometry of planar groups and chiral centers. Generated automatically from the sequence, using the GMC editor.

vdwex.tab

Defines atom pairs for which the van der Waals interaction is not to be calculated. These are typically atoms that are connected by one, two, or three covalent bonds. Generated automatically from the sequence, using the GMC editor.

radgyr.tab

Defines radius-of-gyration restraints. By default, the entire system is included in a single Rgyr restraint, with a target value calculated as described in Section 1A. Users should edit this file to remove disordered residues, or to add multiple restraints for complex structures. Generated automatically from the sequence, using the GMC editor.

ntor.tab

Defines names for the torsion angles that correspond to each rotatable bond in the system. Generated automatically from the sequence, using the GMC editor.

random.pdb

A PDB file that contains a randomized set of starting coordinates. These are reasonably close to an extended strand, but have poor covalent geometry. These coordinates are used as a starting structure by several simulated annealing protocols. Generated automatically from the sequence, using the GMC editor.

noes.tab

Defines the NOE distance restraints. See Section 2C for a detailed description of the format. Can be generated automatically from an xplor format NOE table, or by hand.

torsions.tab

Defines experimental restraints on torsion angles. See Section 2C for a detailed description of the format. Can be generated automatically from an xplor format dihedral restraints table, or by hand.

jcoup.tab

Defines experimental 3J coupling restraints. See Section 2C for a detailed description of the format. Can be generated automatically from an xplor format jcoupling table, or by hand.

dObsA.tab and dObsB.tab

Define experimental dipolar coupling restraints, for two different alignment tensors ("A" and "B"). See Section 2C for a detailed description of the format. Can be generated automatically from an xplor format dipolar coupling table, or by hand.

csObs.tab

Defines experimental chemical shift restraints. These are only used by the fragment reconstruction method right now. See section 2C for a detailed description of the format. Must be generated by hand.

2C. The format of the GMC restraint tables

The constraint files within a GMC directory are all variations on the nmrPipe table format. For example, the table of covalent bond length constraints ("bonds.tab") looks like:

VARS SEGNAME_I RESNAME_I RESID_I ATOMNAME_I SEGNAME_J RESNAME_J RESID_J ATOMNAME_J D FC
FORMAT %8s %3s %4d %4s %8s %3s %4d %4s %8.3f %8.3f
    UBIQ MET    1    C     UBIQ MET    1    O    1.231 1000.000
    UBIQ MET    1   CA     UBIQ MET    1    C    1.525 1000.000
    UBIQ MET    1   CA     UBIQ MET    1   CB    1.530 1000.000
    UBIQ MET    1   CA     UBIQ MET    1   HA    1.080 1000.000
    UBIQ MET    1   CB     UBIQ MET    1   CG    1.530 1000.000

The first two lines identify the contents of each data column. The two atoms involved are identified by their segment name, residue name, residue number, and atom name. Other information relevant to the constraint (the equilibrium bond length in Ångstroms and the force constant in kcal/mol/Å2) are defined in the last two columns.

Similar formats are used to define the other covalent geometry restraints in atoms.tab, angles.tab, and impropers.tab. Since these are generated automatically by the GMC Editor, they rarely need to be examined or edited.

The experimental restraints tables follow a similar format. For example, the torsion angle restraint table looks like

VARS   INDEX  SEGNAME_I RESID_I RESNAME_I ATOMNAME_I SEGNAME_J RESID_J RESNAME_J ATOMNAME_J SEGNAME_K RESID_K RESNAME_K ATOMNAME_K SEGNAME_L RESID_L RESNAME_L ATOMNAME_L  ANGLE_LO  ANGLE_HI FC
FORMAT %4d %4d %4s %4s %4s %4d %4s %4s %4s %4d %4s %4s %4s %4d %4s %4s %4s %8.3f %8.3f %8.3f

   1   UBIQ  1 MET C    UBIQ   2 GLN N    UBIQ   2 GLN CA   UBIQ   2 GLN C   -138.00  -66.00    1.0
   2   UBIQ  2 GLN C    UBIQ   3 ILE N    UBIQ   3 ILE CA   UBIQ   3 ILE C   -173.00 -109.00    1.0
   3   UBIQ  3 ILE C    UBIQ   4 PHE N    UBIQ   4 PHE CA   UBIQ   4 PHE C   -149.00  -93.00    1.0
   4   UBIQ  4 PHE C    UBIQ   5 VAL N    UBIQ   5 VAL CA   UBIQ   5 VAL C   -163.00  -75.00    1.0
   5   UBIQ  5 VAL C    UBIQ   6 LYS N    UBIQ   6 LYS CA   UBIQ   6 LYS C   -133.00  -85.00    1.0
   6   UBIQ  6 LYS C    UBIQ   7 THR N    UBIQ   7 THR CA   UBIQ   7 THR C   -142.00  -58.00    1.0
   7   UBIQ  7 THR C    UBIQ   8 LEU N    UBIQ   8 LEU CA   UBIQ   8 LEU C    -82.00  -42.00    1.0

In this file, an "index" column has been added to the beginning of each line to identify each constraint for later evaluation. Sixteen columns are needed to specify the segment name, residue number, residue name, and atom name of each of the four atoms that define the torsion angle. Finally, two columns give the lower and upper bounds on the angle (in degrees), followed by a force constant (in kcal/mol/rad2).

The 3J couplings restraints table has a format similar to the torsion angle restraints:

VARS INDEX SEGNAME_I RESID_I RESNAME_I ATOMNAME_I SEGNAME_J RESID_J RESNAME_J ATOMNAME_J SEGNAME_K RESID_K RESNAME_K ATOMNAME_K SEGNAME_L RESID_L RESNAME_L ATOMNAME_L A B C PHASE OBSJ FC 
FORMAT %3d %5d %4s %4s %4s %5d %4s %4s %4s %5d %4s %4s %4s %5d %4s %4s %4s %9.5f %9.5f %9.5f %9.5f %9.3f %.2f 

   1    UBIQ  1 MET  C    UBIQ   2 GLN  N    UBIQ   2 GLN  CA   UBIQ   2 GLN  C     1.74000  -0.57000   0.25000  240.00000    2.30 1.0 
   2    UBIQ  3 ILE  C    UBIQ   4 PHE  N    UBIQ   4 PHE  CA   UBIQ   4 PHE  C     1.74000  -0.57000   0.25000  240.00000    1.30 1.0 
   3    UBIQ  4 PHE  C    UBIQ   5 VAL  N    UBIQ   5 VAL  CA   UBIQ   5 VAL  C     1.74000  -0.57000   0.25000  240.00000    0.80 1.0 
   4    UBIQ  5 VAL  C    UBIQ   6 LYS  N    UBIQ   6 LYS  CA   UBIQ   6 LYS  C     1.74000  -0.57000   0.25000  240.00000    1.60 1.0 

Once again, the first column defines an "index" number for each constraint. The next sixteen columns select the four atoms involved in the torsion angle. Three more columns define the Karplus parameters A, B, and C for that 3J coupling (where Jcalc = A cos2 theta + B cos theta + C). The next column allows a phase correction (in degrees) to be applied to the Karplus calculation. This phase correction is necessary in cases where the atoms selected to define the torsion angle (in the above example, C..N..Ca..C atoms are used to define a backbone phi angle) are different from the atoms involved in creating the observed coupling (in this case, HN..Ha). Finally, the observed coupling constant (in Hz) is defined, together with a force constant (in kcal/mol/Hz2).

The format of the dipolar couplings restraints file is:

VARS   SEGNAME_I RESID_I RESNAME_I ATOMNAME_I SEGNAME_J RESID_J RESNAME_J ATOMNAME_J D      DD    W
FORMAT %5d     %6s       %6s        %5d     %6s       %6s    %9.3f   %9.3f %.2f

UBIQ    1    MET      C    UBIQ   2    GLN     HN       3.993     0.333 3.00
UBIQ    2    GLN      C    UBIQ   3    ILE     HN      -5.646     0.333 3.00
UBIQ    3    ILE      C    UBIQ   4    PHE     HN       1.041     0.333 3.00
UBIQ    4    PHE      C    UBIQ   5    VAL     HN       0.835     0.333 3.00

The first eight columns select the two atoms involved in the dipolar coupling. The next column gives the observed value of the coupling (in Hz). The next column gives an indication of the possible range of values for this coupling, relative to the HN..N coupling. The final column defines a force constant (in kcal/mol/Hz2).

The NOE table format is somewhat more complex than the other experimental restraints tables:

VARS INDEX GROUP RESID_I RESNAME_I ATOMNAME_I SEGNAME_I RESID_J RESNAME_J ATOMNAME_J SEGNAME_J D_LO D_HI FC W S
FORMAT %4d %3d  %5d %6s %6s %4s  %5d %6s %6s %4s  %9.3f %9.3f %.2f %.2f %.2f

   1  1    UBIQ MET  1  HA      UBIQ MET  1  HG1     1.800   4.459 1.0 1.0 1.0
   1  2    UBIQ MET  1  HA      UBIQ MET  1  HG2     1.800   4.459 1.0 1.0 1.0
   2  1    UBIQ MET  1  HA      UBIQ MET  1  HG1     1.800   4.060 1.0 1.0 1.0
   2  2    UBIQ MET  1  HA      UBIQ MET  1  HG2     1.800   4.060 1.0 1.0 1.0
   3  1    UBIQ MET  1  HB2     UBIQ MET  1  HE1     1.800   3.565 1.0 1.0 1.0
   3  1    UBIQ MET  1  HB2     UBIQ MET  1  HE2     1.800   3.565 1.0 1.0 1.0
   3  1    UBIQ MET  1  HB2     UBIQ MET  1  HE3     1.800   3.565 1.0 1.0 1.0
   4  1    UBIQ MET  1  HE1     UBIQ MET  1  HG1     1.800   3.155 1.0 1.0 1.0
   4  1    UBIQ MET  1  HE2     UBIQ MET  1  HG1     1.800   3.155 1.0 1.0 1.0
   4  1    UBIQ MET  1  HE3     UBIQ MET  1  HG1     1.800   3.155 1.0 1.0 1.0
   4  2    UBIQ MET  1  HE1     UBIQ MET  1  HG2     1.800   3.155 1.0 1.0 1.0
   4  2    UBIQ MET  1  HE2     UBIQ MET  1  HG2     1.800   3.155 1.0 1.0 1.0
   4  2    UBIQ MET  1  HE3     UBIQ MET  1  HG2     1.800   3.155 1.0 1.0 1.0

As in the 3J coupling restraint file, the first column is an index number to identify a particular constraint. Columns 3-10 identify two atoms whose distance will be used in calculating the NOE energy. The next two columns define that restraint's lower and upper distance bounds (in Ångstroms), and the last column sets a force constant for that restraint (in kcal/mol/Å2).

But there are two differences in the NOE restraints file's format. The first is that more than one line of the file can be given the same index number. Having more than one line with the same index number tells dynamo that these lines are all part of a single NOE constraint. (The meaning of that statement will become clear momentarily.) The second difference in the NOE restraints file format is the addition of the "group" column. Like the "index" column, the group column contains an integer. It is used in combination with the index column to define different possible assignments for a particular NOE restraint. Lines in the NOE restraints file which have the same index number and the same group number are treated as part of a single possible assignment for an NOE. The interatomic distances from each line's atoms are averaged together using an R-6 sum average. Lines (or groups of lines) with the same index number but different group numbers are treated as separate possible assignments for a given NOE.

2D. The Simulated Annealing GUI and an early peek at dynamo's simulated annealing process

Once you've created a GMC file, you'll need to add your experimental constraints tables to it (see sections 2A and 2B for their names and formats). You can type them in with an ordinary text editor. Section 3? will describe how to automatically convert xplor constraints tables into their dynamo equivalents.

Once you have a GMC directory that contains all the experimental constraints you want, it's time to calculate a structure. Structure calculations in dynamo are controlled by short TCL scripts. These automatically create a graphical interface to let you follow the progress of the calculation. We'll discuss the details of the TCL files that calculate structures in section 4.

To run an existing structure calculation script, just type its filename. For example, (NMRBASE)/dynamo/demos/ubiq/ubiqSA.tcl starts the script and brings up the simulated annealing GUI:

The current time step number is reported in the title bar of the window. The panels in the left column report the current temperature (in K), total potential energy (in kcal/mol), total energy gradient (in kcal/mol/Å), the RMS difference between observed and calculated dipolar couplings (in Hz), the RMS van der Waals overlap (calculated only for those vdW pairs that are overlapped at all and expressed in Å), and the RMS difference between the actual and expected covalent bond lengths (in Å). The panels in the right column graph the RMS bond angle violations (in degrees), the RMS improper torsion angle violations (in degrees), the RMS experimental torsion angle violations (in degrees), the RMS 3J coupling violations (in Hz), the RMS NOE distance violation (in Å), and the RMS radius of gyration violation (in Å).

Looking at the resulting graphs of temperatures and violations, several features of dynamo's standard simulated annealing schedule can be discovered. Looking at the temperature graph, it is clear that the annealing schedule has three phases. The initial phase is about 500 dynamics steps long, with very tight control over the system's temperature (of 4000K). During this phase, the structure's agreement with covalent bond length, bond angle, and improper torsion angle restraints all undergo significant improvements. The van der Waals overlap, 3J coupling, and dipolar coupling violations appear to be zero, but this actually indicates that the vdW, 3J, and dipolar coupling energy terms are turned off completely during this phase.

The second phase begins with a brief spike in the temperature, which then returns to roughly the starting point. This phase continues for a longer period--roughly 2000 dynamics steps. During this phase, the NOE restraints continue to improve, but few other structural features change significantly. The van der Waals, 3J, and dipolar coupling terms continue to be turned off. The radius of gyration shows several oscillations.

Finally, the last and longest phase begins. The 3J and dipolar coupling violations and van der Waals overlap begin to be reported and show rapid improvements in their agreement. The improper torsion angles show a sudden improvement in their agreement. As the temperature cools from 4000K down to 0, the calculated values of the experimental torsion angle restraints and 3J couplings finally come into agreement with their target values.

So why does dynamo's simulated annealing schedule behave in this manner? When the trajectory begins from a randomized PDB file (as it does in this example), the "crunched up" covalent structure is like a tightly stretched spring, storing a large amount of potential energy. Starting free dynamics from this structure would cause its potential energy to be converted to kinetic energy, sending the system's temperature to millions of kelvins. By maintaining absolute control over the system's temperature during this initial regularization phase, the covalent geometry can be quickly improved without blowing up the dynamics trajectory. The van der Waals energy is turned off during this phase to improve conformational sampling by allowing atoms to move through each other. Because the 3J and dipolar couplings are so sensitive to the correct structure, they offer little guidance at this early stage, when the coordinates are far from correct.

In the second, high temperature phase, the structure undergoes large motions as seen in the NOE and radius of gyration violations graphs. The structural collapse that began during the regularization phase continues rapidly. Because there is no van der Waals energy, dynamo attempts to satisfy all the NOE distance constraints simultaneously simply by squashing all the atoms on top of each other. The radius of gyration of this overlapped structure is obviously too small, and the radius of gyration restraints attempt to wrench it open against the pull of the NOEs. This is an important process, since it ensures that the structure is globally correct (as shown by the low NOE violations at the end of the high temperature phase) without being too small (as shown by the reduced radius of gyration violations at the end of the phase). The structure remains too far from correct for the 3J or dipolar couplings to do much good, so they remain turned off.

In the final phase, the details of the structure are finally sorted out. The temperature gradually drops, preventing the system from jumping out of the globally correct region that was found at the end of the high temperature phase. The van der Waals overlap energy is turned on gradually, eliminating atomic overlap without causing undue spikes in temperature. The force constant on the improper torsion angle restraints is switched to its final value, resulting in a sudden improvement in the impropers' agreement. 3J and dipolar couplings are finally usable as structural constraints. By the end of the annealing schedule, the system's agreement with all of its restraints--experimental and a priori--is usually excellent.

3. Getting started--converting a project from xplor

Many dynamo users have used xplor in the past, so we will discuss the process of converting a project from xplor's collection of restraint tables to dynamo's GMC format.

Step 1--create the GMC file and its covalent geometry tables

Open the GMC Editor and enter in the structure's covalent geometry as described in section 2A. Save the GMC.

Step 2--convert the experimental restraints tables

Creating the experimental restraints tables can be laborious, but converting xplor tables into dynamo's format is largely automatic. A simple shell script can call all the appropriate conversion scripts in turn:
#!/bin/csh

set pdbName = ../ubiq.gmc/random.pdb

talos2dyn.tcl      -in talosPred.tab         -pdb $pdbName -seg UBIQ > torsions.tab
jConvert.tcl       -in ubiq_expt_jcoup.tbl   -pdb $pdbName -seg UBIQ > jcoup.tab
noeConvert.tcl     -in ubiq_complete_noe.tbl -pdb $pdbName -seg UBIQ > noes.tab
dcConvert.tcl      -in dipolars_expt_A.tbl   -pdb $pdbName -seg UBIQ > dObsA.tab
dcConvert.tcl      -in dipolars_expt_B.tbl   -pdb $pdbName -seg UBIQ > dObsB.tab
torsionConvert.tcl -in expt_torsions.tbl     -pdb $pdbName -seg UBIQ > torsions.tab

The purpose of each line is self-evident. Each one reads in a particular experimental constraint file in xplor format (except for talos2dyn, which reads talos output). Each also reads a corresponding PDB file so that the script knows the appropriate sequences. Since many xplor users never define a segment name in single-domain structure calculations, a default segment name can be defined. Finally, the converted restraints are written out to dynamo tables with the specific names listed in section 2B.

Step 3--edit the default radius of gyration restraints table

The radius of gyration restraints file generated by the GMC editor often requires editing by hand. The proper application of the radius of gyration restraint is described fully in J. Am. Chem. Soc. 121, 2337 (1999). Briefly, there are several rules to follow in defining radius of gyration restraints. First, disordered residues should not be included in a radius of gyration restraint. Second, non-globular structures should be broken up into several roughly globular sections, each with an independent Rgyr restraint. Third, nucleic acids should not have their own radius of gyration restraints. But, fourth, structures of complexes (protein/DNA or protein/protein) should have intermolecular radius of gyration restraints to ensure close packing of interfaces.

Create radius of gyration restraints as appropriate for your structure, following the guidelines above. The file format is described in section 2C.

4. Running a structure

4A. A typical example

Here is an example TCL script that calculates a single structure of ubiquitin (from (NMRBASE)/dynamo/demos/ubiq/ubiqSA.tcl):

#!/bin/sh
# The next line restarts using nmrWish \
exec nmrWish "$0" -- "$@"

set auto_path "[split $env(TCLPATH) :] $auto_path"
set auto_path [linsert $auto_path 0 $env(NMRBASE)/dynamo/tcl]

set gmcDir   ubiq.gmc
set inName   random.pdb
set outName  dyn.pdb
set randSeed 54321 

#
#-------------------------------------------------------------

dynInit

showDynLogo

readGMC $gmcDir $inName gmc

srand $randSeed

simulateAnnealing -gmcName gmc \
       -initTemperature 4000 -numInitSteps  500 \
       -highTemperature 4000 -numHighSteps 2000 \
       -coolStartTemperature 4000 -coolEndTemperature 0 -numCoolSteps 12000 \
       -printEnergyEvery 10 -graph

dynWrite -pdb -src $gmc(pdb) -out $gmcDir/$outName -rem $dynEnergyText

showDynDone

#exit 

The first six lines don't need to be modified. The next set of lines define TCL variables that contain some basic information for dynamo: the name of the GMC directory, the name of the PDB file that will provide the starting coordinates, the name of the output PDB file, and a random number seed. You'll need to modify these lines appropriately. Note that in this example, we're beginning from the randomized coordinates that are automatically generated by the GMC editor (in the file random.pdb).

Since we've specified the complete set of constraints (via the GMC file name), the starting coordinates, and the random number seed, this TCL script will generate exactly the same output coordinates each time it's run.

The TCL code below the dividing line actually performs the calculation. You can modify it to change, for example, the number of structures to be calculated or the details of the annealing schedule. But you don't have to--the standard code works almost all the time without modification.

The first two commands are trivial--they turn on dynamo and show off its logo in a GUI window.

The third command calls a TCL procedure named readGMC (defined in (NMRBASE)/dynamo/tcl/read_GMC.tcl). Given the name of the GMC directory, it looks for each of the standard files within it (listed in section 2A) and reads them in turn. It records the results into a TCL array (called 'gmc' in this case).

The fourth command simply initializes the random number generator with the value given in the randSeed variable.

The fifth command is the heart of the structure calculation. It calls a TCL procedure named simulateAnnealing (defined in (NMRBASE)/dynamo/tcl/simulate_annealing.tcl). Given the parsed results of readGMC (including the starting coordinates) and the correctly initialized random number generator, it applies dynamo's standard three-phase annealing protocol to solve the structure. The protocol is outlined in section 2D. The temperature and length of each phase can be defined with flags to the simulateAnnealing command, as shown in this example. The final two flags to simulateAnnealing define how often the procedure should print out the current temperature, potential energy, and violations (in this example, these statistics will be printed every 10 dynamics steps) and indicate that the results should also be graphed in the simulated annealing GUI (section 2D).

The sixth command writes out the final coordinates to a PDB file. The dynWrite command is given flags that indicate the type of file to write (-pdb), the location of the final coordiantes (-src $gmc(pdb)), the name of the file (-out $gmcDir/$outName), and some text to be placed in REMARKS lines at the head of the file (-rem $dynEnergyText). This last flag uses a variable defined automatically during the simulatedAnnealing command, which contains the text of the final dynamics statistics printout.

The final command brings up a dialog box to remind you that the calculation is finished.

4B. Simple modifications

The example protocol shown above can be easily modified. For example, to calculate several structures, have a look at (NMRBASE)/dynamo/demos/ubiq/ubiqSAN.tcl:

#!/bin/sh
# The next line restarts using nmrWish \
exec nmrWish "$0" -- "$@"

set auto_path "[split $env(TCLPATH) :] $auto_path"
set auto_path [linsert $auto_path 0 $env(NMRBASE)/dynamo/tcl]

set gmcDir      ubiq.gmc
set inName      random.pdb
set outTemplate dynSAN%03d.pdb 
set iseed       54321
set count       10

#
#-------------------------------------------------------------

dynInit

showDynLogo

readGMC $gmcDir $inName gmc

for {set i 1} {$i <= $count} {incr i} \
   {
    srand $iseed

    simulateAnnealing -gmcName gmc \
       -initTemperature 4000 -numInitSteps  500 \
       -highTemperature 4000 -numHighSteps 2000 \
       -coolStartTemperature 4000 -coolEndTemperature 0 -numCoolSteps 12000 \
       -printEnergyEvery 10 -graph

    set outName [format $gmcDir/$outTemplate $i]

    dynWrite -pdb -src $gmc(pdb) -out $outName -rem $dynEnergyText
    dynRead  -pdb -src $gmc(pdb) -in  $gmcDir/$inName

    incr iseed 100
   }

showDynDone

#exit 
The script is nearly identical to the single-structure example shown before. The first difference is that the output file name is replaced with an output file name template. The number of structures to be calculated is defined by the count variable, which will also be used to generate the output filenames. The single call to readGMC remains, but the calls to simulateAnnealing and dynWrite are enclosed in a loop. After writing out each iteration's PDB file, the original starting coordinates are re-read using the dynRead command, so that each structure calculation begins at the same point. The random number seed, though, is modified for each iteration, so that different coordinates are produced each time.

4C. Calculating an average stucture

It has long been known that the average of several structures calculated from a particular set of restraints is more accurate than any of the individual members. Calculating this average structure takes three steps: Filtering out any non-converged structures, calculating the average of the remaining structures' coordinates, and refining that average structure. The first two steps can be accomplished by a single shell script ((NMRBASE)/dynamo/demos/ubiq/ubiqAvgPDB.com):
#!/bin/csh

set goodPDBList = (`noeLimit.com 0.1 ubiq.gmc/dynSAN*.pdb`)

dynAvg.tcl -align None -r1 2 -rN 72 -in $goodPDBList -out ubiq.gmc/avg.pdb 

The first line runs noeLimit.com, a script that evaluates the RMS NOE violations for each of several PDB files, and returns the names of those whose violations are less than a particular cutoff value (0.1 A in this case).

The second line runs dynAvg.tcl, which is given a list of converged PDB file names and a region of residue numbers to be included in the best fitting. It outputs the geometric average coordinates of these files into avg.pdb.

Because this average structure can have "crunched up" covalent geometry in disordered regions, the last step in creating an average structure is to refine it gently. This is accomplished by another simple modification of the ubiqSA.tcl script, called (NMRBASE)/dynamo/demos/ubiq/ubiqRefineAvg.tcl:
#!/bin/sh
# The next line restarts using nmrWish \
exec nmrWish "$0" -- "$@"

set auto_path "[split $env(TCLPATH) :] $auto_path"
set auto_path [linsert $auto_path 0 $env(NMRBASE)/dynamo/tcl]

set gmcDir   ubiq.gmc
set inName   avg.pdb
set outName  refined_avg.pdb
set randSeed 54321 

#
#-------------------------------------------------------------

dynInit

showDynLogo

readGMC $gmcDir $inName gmc

srand $randSeed

simulateAnnealing \
    -gmcName gmc  \
    -initTemperature 300 -numInitSteps 50 \
    -highTemperature 300 -numHighSteps  0 \
    -coolStartTemperature 300 -coolEndTemperature 0 -numCoolSteps 1000 \
    -printEnergyEvery 10 -graph -constantRgyr

dynWrite -pdb -src $gmc(pdb) -out $gmcDir/$outName -rem $dynEnergyText

showDynDone

#exit 

This file is nearly identical to the original ubiqSA.tcl. The only changes are to the starting coordinate file name (avg.pdb, produced by ubiqAvgPDB.com), the output file name, and the temperature and duration of the annealing phases. Because this structure begins with an accurate set of coordinates, we don't want dynamo to move the coordinates too far. Therefore, we set the starting temperature for the various phases to only 300K, and reduce the length of each as well.

The final difference is the -constantRgyr flag to the simulateAnnealing procedure. Since this structure is very accurate to begin with, the radius of gyration restraints do not have to be used to counteract inappropriate overlap during the high temperature phase (see section 2D). This flag informs the simulateAnnealing procedure of this fact and adjusts it accordingly.

4D. Comparing the refined average structure to a known structure

Check the minimized average structures' agreement with the x-ray crystal structure:

rms.tcl 1ubiq.pdb ubiq.gmc/refined_avg.pdb    2 72
rms.tcl compares two PDB files of the same structure, using only those residues between (in this case) positions 2 and 72 in the best fitting and RMSD calculation.

5. Creating new residue types

Dynamo's GMC editor contains knowledge of all 20 standard amino acid residues, all four standard DNA bases, and all four standard RNA bases. Many users, though, work with systems that contain non-natural residues. Fortunately, dynamo is easily extended to work with new residue types you can define.

How dynamo stores information about residues

Take a look at (NMRBASE)/dynamo/tcl/params_protein.tcl It is a long list of TCL procedures, along the lines of

proc add_ala {segmentName resNum} {

   add_Atom $segmentName ALA $resNum C 12.0 0.0903 3.2072 0.48
   add_Atom $segmentName ALA $resNum CA 12.0 0.0903 3.2072 0.22
   add_Atom $segmentName ALA $resNum CB 12.0 0.0903 3.2072 -0.30
   add_Atom $segmentName ALA $resNum HA 1.0 0.0045 2.6157 0.10
   add_Atom $segmentName ALA $resNum HB1 1.0 0.0045 2.6157 0.10
   add_Atom $segmentName ALA $resNum HB2 1.0 0.0045 2.6157 0.10
   add_Atom $segmentName ALA $resNum HB3 1.0 0.0045 2.6157 0.10
   add_Atom $segmentName ALA $resNum HN 1.0 0.0498 1.4254 0.26
   add_Atom $segmentName ALA $resNum N 14.0 0.1592 2.7618 -0.10
   add_Atom $segmentName ALA $resNum O 16.0 0.2342 2.6406 -0.48

   add_Bond_Intrares $segmentName ALA $resNum C O 1.231 1000.000
   add_Bond_Intrares $segmentName ALA $resNum CA C 1.525 1000.000
   add_Bond_Intrares $segmentName ALA $resNum CA CB 1.521 1000.000
   add_Bond_Intrares $segmentName ALA $resNum CA HA 1.080 1000.000
   add_Bond_Intrares $segmentName ALA $resNum CB HB1 1.080 1000.000
   add_Bond_Intrares $segmentName ALA $resNum CB HB2 1.080 1000.000
   add_Bond_Intrares $segmentName ALA $resNum CB HB3 1.080 1000.000
   add_Bond_Intrares $segmentName ALA $resNum N CA 1.458 1000.000
   add_Bond_Intrares $segmentName ALA $resNum N HN 0.98 1000.0

   add_Angle_Intrares $segmentName ALA $resNum CA C O 120.80 500.000
   add_Angle_Intrares $segmentName ALA $resNum CA CB HB1 109.500 500.000
   add_Angle_Intrares $segmentName ALA $resNum CA CB HB2 109.500 500.000
   add_Angle_Intrares $segmentName ALA $resNum CA CB HB3 109.500 500.000
   add_Angle_Intrares $segmentName ALA $resNum CB CA C 110.500 500.000
   add_Angle_Intrares $segmentName ALA $resNum HA CA C 109.500 500.000
   add_Angle_Intrares $segmentName ALA $resNum HA CA CB 109.500 500.000
   add_Angle_Intrares $segmentName ALA $resNum HB1 CB HB2 109.500 500.000
   add_Angle_Intrares $segmentName ALA $resNum HB1 CB HB3 109.500 500.000
   add_Angle_Intrares $segmentName ALA $resNum HB2 CB HB3 109.500 500.000
   add_Angle_Intrares $segmentName ALA $resNum HN N CA 120.00 500.0
   add_Angle_Intrares $segmentName ALA $resNum N CA C 111.200 500.000
   add_Angle_Intrares $segmentName ALA $resNum N CA CB 110.400 500.000
   add_Angle_Intrares $segmentName ALA $resNum N CA HA 109.500 500.000

   add_Improper_Intrares $segmentName ALA $resNum HA N C CB 65.977 500.000
   add_Improper_Intrares $segmentName ALA $resNum HB1 HB2 CA HB3 -66.514 500.000

   add_Delphic_Positioning_System_Intrares $segmentName ALA $resNum CB CA N ALA
   add_Delphic_Positioned_Atom $segmentName ALA $resNum CB ALA_CB
   add_Delphic_Positioned_Atom $segmentName ALA $resNum CA ALA_CA
   add_Delphic_Positioned_Atom $segmentName ALA $resNum N  ALA_N
   add_Delphic_Positioned_Atom $segmentName ALA $resNum O  PEP_O
   add_Delphic_Positioned_Atom $segmentName ALA $resNum C  PEP_C
   add_Delphic_Positioned_Atom $segmentName ALA $resNum N  PEP_N
}

This procedure is given the name of a segment and residue position into which this residue is to be added.

The purpose of each line is self-evident. Each atom in this residue needs to be created with a call to another TCL procedure called add_Atom. Each covalent bond length restraint is created by a call to add_Bond. Each bond angle restraint is created by a call to add_Angle, and each improper torsion angle is created by a call to add_Improper. Since DELPHIC terms are only available for the naturally occuring residue types, calls to add_Delphic... routines should be deleted when you create a new residue type.

Step 1--create a new TCL add... procedure for your new residue type

The process of creating a new residue type entails creating a new TCL procedure similar to add_Ala. Since most non-natural residue types are very similar to one or more existing residue types, it's usually best to begin by copying an appropriate add_(residue type) procedure. Rename it, but keep it in the same file: params_protein.tcl for protein residue types, params_dna_rna.tcl for nucleic acids, or params_other.tcl as appropriate.

Modify your new TCL procedure to add or delete atoms, bonds, bond angles, and impropers as appropriate. Be sure to remove all the calls to the add_Delphic... procedures, since DELPHIC terms are only available for naturally-occuring residue types.

Appropriate values for bond lengths, etc., can often be guessed by examining other add_(residue) TCL procedures, or by examining the interatomic distances and angles in a crystal structure that contains your new residue type.

Step 2--Let the GMC editor know about your new residue type

Take a look at (NMRBASE)/dynamo/tcl/parameter_index.tcl:
proc parameter_Index {kind} {

    set result {}

    set kind [string tolower $kind]


# protein residue types


    if {($kind == "ala") || ($kind == "protein") || ($kind == "all")} {
	lappend result {ala protein add_ala}
    }
    
    if {($kind == "arg") || ($kind == "protein") || ($kind == "all")} {
	lappend result {arg protein add_arg}
    }

    if {($kind == "asn") || ($kind == "protein") || ($kind == "all")} {
	lappend result {asn protein add_asn}
    }

and so forth.

This is the place where the GMC Editor looks for information about the residue types it knows about. Each residue type has a similar bit of code. If the procedure is asked for information about that specific residue type, or about all residues of its class (protein, DNA, RNA, or other), or if the procedure is asked for information about all residue types, a TCL list is added to the result. It contains three pieces of information: The name of the residue (which will be used to label the appropriate button in the GUI), the type of segment it can enter into, and the name of the TCL procedure that can create one of these residues.

Modify parameter_index.tcl appropriately to add the information about your new residue type.

Step 3--update the TCL code index

Remain in the (NMRBASE)/dynamo/tcl directory, and type

index.tcl

at the prompt.

Step 4--open up the GMC editor and admire your new residue type

Open the GMC Editor, and a button for your new residue type should appear in the appropriate section of the GUI. Go ahead and try building a structure with your new residue type.