How to use DYNAMO--A system for structure determination from NMR data
by John Kuszewski
constraints into coordinates
types of constraints
MD and simulated annealing
Fragment reconstruction
1A. The DYNAMO energy function
2A. The GMC Editor
2B. GMC files & their contents
2C. The format of the GMC restraint tables
2D. The simulated annealing GUI and an early peek at dynamo's simulated annealing process
Step 1: create the GMC file & its covalent geometry tables
Step 2: convert the experimental restraints tables
Step 3: edit the default radius of gyration table
4A. A typical example
4B. Simple modifications
4C. Calculating an average structure
4D. Comparing the refined average structure to a known structure
How dynamo stores information about residues
Step 1--create a new TCL add... procedure for your new residue type
Step 2--Let the GMC editor know about your new residue type
Step 3--update the TCL code index
Step 4--open up the GMC editor and admire your new residue type
DYNAMO is a computer program that determines structures of proteins and nucleic acids which are consistent with a set of NMR-derived constraints. Structure determination from NMR data is particularly challenging because the number of structural constraints available from NMR spectroscopy is small compared to the number of degrees of freedom in a biological macromolecule. Successful NMR structure determination therefore relies on two approaches: maximizing the number and variety of experimental constraints, and maximizing the number and accuracy of other constraints that can be determined a priori.
DYNAMO is able to use a variety of NMR-derived constraints: NOE distances, torsion angles, 3J couplings, and dipolar couplings. In addition, backbone chemical shifts can be translated into backbone torsion angle constraints through the use of the program TALOS.
In addition to experimental constraints, DYNAMO also incorporates structural restraints that are known a priori. There are two classes of a priori restraints: covalent geometry restraints, and DELPHIC restraints. The covalent geometry of a macromolecule is restrained to conform to the bond lengths, bond angles, chiralities, etc., that are expected from small-molecule xray crystal structures. Larger sections of proteins and nucleic acids, such as groups of related torsion angles, can be restrained to fall into one or more reasonable conformations using DELPHIC constraints.
DYNAMO uses molecular dynamics based simulated annealing to find sets of atomic coordinates that are consistent with the experimental and a priori constraints. Molecular dynamics is a computational technique that uses numerical integration to simulate Newton's equations of motion. Simulated annealing is a computational technique for function optimization. In simulated annealing, a system's temperature begins at a high value and is slowly reduced. Thus, early in a simulated annealing procedure, the system has sufficient kinetic energy to jump over large potential energy barriers. But as the system cools, its movement is gradually restricted to lower and lower energy structures. A sufficiently high starting temperature ensures that all the relevant conformational space is sampled, and a sufficiently slow cooling rate ensures that a very low energy structure is finally chosen.
DYNAMO also includes tools for generating starting structures from dipolar coupling data alone. A database of known protein structure fragments is consulted, and their expected dipolar coupling patterns are calculated. Fragments whose expected dipolar couplings match those observed for a given region of a protein can then be used to predict its structure. This fragment reconstruction approach is able to generate a structure that is reasonably close to the true structure. It can be used as a starting point for molecular dynamics simulated annealing refinement, ensuring that the most relevant parts of conformational space are explored thouroughly.
Molecular dynamics uses numerical integration to simulate a system's movement by Newton's equations of motion. The forces on each atom are the partial derivatives of the potential energy function. DYNAMO's potential energy function includes terms that describe the current coordinates' agreement with both experimental and a priori constraints:
Etot = Ebond + Eangle + Eimpr + Evdw + Enoe + Ejcoup + Etorsion + Edipo + Ergyr
where
Etot is the total energy.
Ebond describes the structure's agreement with expected covalent bond lengths:
Ebond = kbond Σ (expected_distance - actual_distance)2
Eangle describes the structure's agreement with expected covalent bond angles:
Eangle = kangle Σ (expected_angle - actual_angle)2
Eimpr describes the structure's agreement with expected "improper" torsion angles. Improper torsion angles are constraints on the geometry of groups of four atoms that are used to maintain correct chirality of chiral centers and the planarity of planar groups.
Eimpr = kimpr Σ (expected_impr - actual_impr)2
Evdw describes the structure's steric overlap, using a simple repulsive potential:
Evdw = kvdw Σ ((size_scale * expected_min_distance)2 - actual_distance2)2
Enoe describes the structure's agreement with observed NOE distances. It uses a "square well" potential that is zero if the actual distance is within a given range:
Enoe = knoe Σ ...
Ejcoup describes the structure's agreement with observed 3J couplings. Karplus parameters can be specified for each type of 3J coupling.
Ejcoup = kjcoup Σ (observed_J - calculated_J)2 calculated_J = A cos2 (phi + phase) + B cos (phi + phase) + C
Etorsion describes the structure's agreement with experimental torsion angle constraints. Like the NOE term, it uses a square well potential:
Etorsion = ktorsion Σ delta_torsion2 delta_torsion = actual_torsion - observed_torsion_upperbound if actual_torsion > observed_torsion_upperbound, = actual_torsion - observed_torsion_lowerbound if actual_torsion < observed_torsion_lowerbound, = 0 otherwise
Edipo describes the structure's agreement with experimentally observed dipolar couplings. Data from two different alignment tensors can be used simultaneously. The expected dipolar couplings are calculated from the structure as described in (???)
Edipo = kdipo Σ (observed_dipo - calculated_dipo)2
Ergyr describes the structure's agreement with its expected overall size. NMR structures are typically expanded relative to crystal structures, because there are so many forces from the vdw term pushing atoms apart, with relatively few forces from the NOE term pulling them back together. The radius of gyration is a measure of a structure's overall size--specifically, it is the RMS distance of a group of atoms to their centroid. It can be predicted reasonably accurately for globular proteins just from the number of residues:
expected_Rgyr = 2.2 Nresidues0.38 Ergyr = krgyr Σ (expected_Rgyr - actual_Rgyr)2
DYNAMO organizes a structure calculation project in a "GMC" (Generalized Molecular Coordinate) file. A GMC file is actually a directory that contains tables of covalent geometry restraints, experimental restraints, and PDB files calculated from them.
Dynamo includes a graphical tool to create and edit GMC files. It's called the GMC Editor. The executable is in (NMRBASE)/dynamo/tcl/gmcEdit. To create a new GMC file with it, just type
gmcEdit
To examine or modify an existing GMC file, type
gmcEdit an_existing_GMC_file_name.gmc
On the left, there is a section to create, edit, and delete structure segments. A segment is a single protein or nucleic acid chain, or a group of separate small molecules that you wish to think of as a unit (for example, a bunch of water molecules). To create a new segment, click the "new segment" button, select the line in the segment selection list, and click "edit segment."
Set the segment's type (protein, DNA, RNA, or other) with the radio buttons. The sequence can be typed in directly, or entered by pressing the buttons with residue type labels. Each of the valid residue types for each segment type has a corresponding button. Proteins have additional check boxes to end the segment with an amine group on the N terminus or an acid group on the C terminus (these boxes should ordinarily be checked). Nucleic acids have additional check boxes to add phosphates to their 5' or 3' ends.
When you're done entering the sequence of a given segment, click the button marked "save changes to this segment".
On the top right, there is a section to create, edit, and delete disulfide bonds. Since disulfides can connect residues in different segments, they are edited separately in the lower region of the window. As with segments, start by clicking the "new disulfide" button, select the disulfide in the list, and then click "edit disulfide." Each disulfide simply needs to know the segment names and residue numbers of the cysteines to be connected. It doesn't matter which one is designated the "from" residue and which is the "to" residue. As with the segment editor, when you're done entering a disulfide, click the "save changes to this disulfide" button.
On the bottom right, there is a section to create, edit, and delete molecular symmetry records. These are used to force two segments to take up identical and symmetric structures (as is appropriate for dimers). As with segments, start by clicking the "new symmetry" button, select the symmetry in the list, and then click "edit symmetry." Each symmetry restraint simply needs to know the names of the segments to be connected. It doesn't matter which one is designated the "from" segment and which is the "to" segment. When you're done entering a symmetry restraint, click the "save changes to this symmetry" button.
Once you've entered all your segments and disulfides, click the "generate GMC" button at the bottom. DYNAMO will first make sure that the information you've entered is valid (eg., that the residues selected for a disulfide are actually cys, that each segment has a unique name, etc). Then it will create the GMC file directory and the tables within it that store the covalent geometry restraints. This process can take a minute or two.
Finally, quit the program with the button marked "quit." (of course)
As an example, we will create a GMC file for an imaginary protein/DNA complex called BOB. This complex is made up of a dimer of peptides that are linked by an intermolecular disulfide bond, together with a symmetric double stranded DNA molecule.
A GMC directory contains several files that define the covalent geometry of the system, together with files that define the experimental constraints (NOE distances, dipolar couplings, 3J couplings, and torsion angles) and other a priori constraints. They are:
gmc_record.txt
The file read by the GMC editor to determine the contents of this GMC directory. Not intended to be edited by hand--it's created and modified by the GMC Editor.
atoms.tab
Defines atomic mass and van der Waals parameters for each atom. Generated automatically from the sequence, using the GMC editor (Section 2C).
bonds.tab
Defines covalent bond length restraints. Generated automatically from the sequence, using the GMC editor.
angles.tab
Defines covalent bond angle restraints. Generated automatically from the sequence, using the GMC editor.
impropers.tab
Defines improper torsion angle restraints, which are used to maintain the correct geometry of planar groups and chiral centers. Generated automatically from the sequence, using the GMC editor.
vdwex.tab
Defines atom pairs for which the van der Waals interaction is not to be calculated. These are typically atoms that are connected by one, two, or three covalent bonds. Generated automatically from the sequence, using the GMC editor.
radgyr.tab
Defines radius-of-gyration restraints. By default, the entire system is included in a single Rgyr restraint, with a target value calculated as described in Section 1A. Users should edit this file to remove disordered residues, or to add multiple restraints for complex structures. Generated automatically from the sequence, using the GMC editor.
ntor.tab
Defines names for the torsion angles that correspond to each rotatable bond in the system. Generated automatically from the sequence, using the GMC editor.
random.pdb
A PDB file that contains a randomized set of starting coordinates. These are reasonably close to an extended strand, but have poor covalent geometry. These coordinates are used as a starting structure by several simulated annealing protocols. Generated automatically from the sequence, using the GMC editor.
noes.tab
Defines the NOE distance restraints. See Section 2C for a detailed description of the format. Can be generated automatically from an xplor format NOE table, or by hand.
torsions.tab
Defines experimental restraints on torsion angles. See Section 2C for a detailed description of the format. Can be generated automatically from an xplor format dihedral restraints table, or by hand.
jcoup.tab
Defines experimental 3J coupling restraints. See Section 2C for a detailed description of the format. Can be generated automatically from an xplor format jcoupling table, or by hand.
dObsA.tab and dObsB.tab
Define experimental dipolar coupling restraints, for two different alignment tensors ("A" and "B"). See Section 2C for a detailed description of the format. Can be generated automatically from an xplor format dipolar coupling table, or by hand.
csObs.tab
Defines experimental chemical shift restraints. These are only used by the fragment reconstruction method right now. See section 2C for a detailed description of the format. Must be generated by hand.
The constraint files within a GMC directory are all variations on the nmrPipe table format. For example, the table of covalent bond length constraints ("bonds.tab") looks like:
VARS SEGNAME_I RESNAME_I RESID_I ATOMNAME_I SEGNAME_J RESNAME_J RESID_J ATOMNAME_J D FC
FORMAT %8s %3s %4d %4s %8s %3s %4d %4s %8.3f %8.3f
UBIQ MET 1 C UBIQ MET 1 O 1.231 1000.000
UBIQ MET 1 CA UBIQ MET 1 C 1.525 1000.000
UBIQ MET 1 CA UBIQ MET 1 CB 1.530 1000.000
UBIQ MET 1 CA UBIQ MET 1 HA 1.080 1000.000
UBIQ MET 1 CB UBIQ MET 1 CG 1.530 1000.000
The first two lines identify the contents of each data column. The two atoms involved are identified by their segment name, residue name, residue number, and atom name. Other information relevant to the constraint (the equilibrium bond length in Ångstroms and the force constant in kcal/mol/Å2) are defined in the last two columns.
Similar formats are used to define the other covalent geometry restraints in atoms.tab, angles.tab, and impropers.tab. Since these are generated automatically by the GMC Editor, they rarely need to be examined or edited.
The experimental restraints tables follow a similar format. For example, the torsion angle restraint table looks like
VARS INDEX SEGNAME_I RESID_I RESNAME_I ATOMNAME_I SEGNAME_J RESID_J RESNAME_J ATOMNAME_J SEGNAME_K RESID_K RESNAME_K ATOMNAME_K SEGNAME_L RESID_L RESNAME_L ATOMNAME_L ANGLE_LO ANGLE_HI FC FORMAT %4d %4d %4s %4s %4s %4d %4s %4s %4s %4d %4s %4s %4s %4d %4s %4s %4s %8.3f %8.3f %8.3f 1 UBIQ 1 MET C UBIQ 2 GLN N UBIQ 2 GLN CA UBIQ 2 GLN C -138.00 -66.00 1.0 2 UBIQ 2 GLN C UBIQ 3 ILE N UBIQ 3 ILE CA UBIQ 3 ILE C -173.00 -109.00 1.0 3 UBIQ 3 ILE C UBIQ 4 PHE N UBIQ 4 PHE CA UBIQ 4 PHE C -149.00 -93.00 1.0 4 UBIQ 4 PHE C UBIQ 5 VAL N UBIQ 5 VAL CA UBIQ 5 VAL C -163.00 -75.00 1.0 5 UBIQ 5 VAL C UBIQ 6 LYS N UBIQ 6 LYS CA UBIQ 6 LYS C -133.00 -85.00 1.0 6 UBIQ 6 LYS C UBIQ 7 THR N UBIQ 7 THR CA UBIQ 7 THR C -142.00 -58.00 1.0 7 UBIQ 7 THR C UBIQ 8 LEU N UBIQ 8 LEU CA UBIQ 8 LEU C -82.00 -42.00 1.0
In this file, an "index" column has been added to the beginning of each line to identify each constraint for later evaluation. Sixteen columns are needed to specify the segment name, residue number, residue name, and atom name of each of the four atoms that define the torsion angle. Finally, two columns give the lower and upper bounds on the angle (in degrees), followed by a force constant (in kcal/mol/rad2).
The 3J couplings restraints table has a format similar to the torsion angle restraints:
VARS INDEX SEGNAME_I RESID_I RESNAME_I ATOMNAME_I SEGNAME_J RESID_J RESNAME_J ATOMNAME_J SEGNAME_K RESID_K RESNAME_K ATOMNAME_K SEGNAME_L RESID_L RESNAME_L ATOMNAME_L A B C PHASE OBSJ FC FORMAT %3d %5d %4s %4s %4s %5d %4s %4s %4s %5d %4s %4s %4s %5d %4s %4s %4s %9.5f %9.5f %9.5f %9.5f %9.3f %.2f 1 UBIQ 1 MET C UBIQ 2 GLN N UBIQ 2 GLN CA UBIQ 2 GLN C 1.74000 -0.57000 0.25000 240.00000 2.30 1.0 2 UBIQ 3 ILE C UBIQ 4 PHE N UBIQ 4 PHE CA UBIQ 4 PHE C 1.74000 -0.57000 0.25000 240.00000 1.30 1.0 3 UBIQ 4 PHE C UBIQ 5 VAL N UBIQ 5 VAL CA UBIQ 5 VAL C 1.74000 -0.57000 0.25000 240.00000 0.80 1.0 4 UBIQ 5 VAL C UBIQ 6 LYS N UBIQ 6 LYS CA UBIQ 6 LYS C 1.74000 -0.57000 0.25000 240.00000 1.60 1.0
Once again, the first column defines an "index" number for each constraint. The next sixteen columns select the four atoms involved in the torsion angle. Three more columns define the Karplus parameters A, B, and C for that 3J coupling (where Jcalc = A cos2 theta + B cos theta + C). The next column allows a phase correction (in degrees) to be applied to the Karplus calculation. This phase correction is necessary in cases where the atoms selected to define the torsion angle (in the above example, C..N..Ca..C atoms are used to define a backbone phi angle) are different from the atoms involved in creating the observed coupling (in this case, HN..Ha). Finally, the observed coupling constant (in Hz) is defined, together with a force constant (in kcal/mol/Hz2).
The format of the dipolar couplings restraints file is:
VARS SEGNAME_I RESID_I RESNAME_I ATOMNAME_I SEGNAME_J RESID_J RESNAME_J ATOMNAME_J D DD W FORMAT %5d %6s %6s %5d %6s %6s %9.3f %9.3f %.2f UBIQ 1 MET C UBIQ 2 GLN HN 3.993 0.333 3.00 UBIQ 2 GLN C UBIQ 3 ILE HN -5.646 0.333 3.00 UBIQ 3 ILE C UBIQ 4 PHE HN 1.041 0.333 3.00 UBIQ 4 PHE C UBIQ 5 VAL HN 0.835 0.333 3.00
The first eight columns select the two atoms involved in the dipolar coupling. The next column gives the observed value of the coupling (in Hz). The next column gives an indication of the possible range of values for this coupling, relative to the HN..N coupling. The final column defines a force constant (in kcal/mol/Hz2).
The NOE table format is somewhat more complex than the other experimental restraints tables:
VARS INDEX GROUP RESID_I RESNAME_I ATOMNAME_I SEGNAME_I RESID_J RESNAME_J ATOMNAME_J SEGNAME_J D_LO D_HI FC W S FORMAT %4d %3d %5d %6s %6s %4s %5d %6s %6s %4s %9.3f %9.3f %.2f %.2f %.2f 1 1 UBIQ MET 1 HA UBIQ MET 1 HG1 1.800 4.459 1.0 1.0 1.0 1 2 UBIQ MET 1 HA UBIQ MET 1 HG2 1.800 4.459 1.0 1.0 1.0 2 1 UBIQ MET 1 HA UBIQ MET 1 HG1 1.800 4.060 1.0 1.0 1.0 2 2 UBIQ MET 1 HA UBIQ MET 1 HG2 1.800 4.060 1.0 1.0 1.0 3 1 UBIQ MET 1 HB2 UBIQ MET 1 HE1 1.800 3.565 1.0 1.0 1.0 3 1 UBIQ MET 1 HB2 UBIQ MET 1 HE2 1.800 3.565 1.0 1.0 1.0 3 1 UBIQ MET 1 HB2 UBIQ MET 1 HE3 1.800 3.565 1.0 1.0 1.0 4 1 UBIQ MET 1 HE1 UBIQ MET 1 HG1 1.800 3.155 1.0 1.0 1.0 4 1 UBIQ MET 1 HE2 UBIQ MET 1 HG1 1.800 3.155 1.0 1.0 1.0 4 1 UBIQ MET 1 HE3 UBIQ MET 1 HG1 1.800 3.155 1.0 1.0 1.0 4 2 UBIQ MET 1 HE1 UBIQ MET 1 HG2 1.800 3.155 1.0 1.0 1.0 4 2 UBIQ MET 1 HE2 UBIQ MET 1 HG2 1.800 3.155 1.0 1.0 1.0 4 2 UBIQ MET 1 HE3 UBIQ MET 1 HG2 1.800 3.155 1.0 1.0 1.0
As in the 3J coupling restraint file, the first column is an index number to identify a particular constraint. Columns 3-10 identify two atoms whose distance will be used in calculating the NOE energy. The next two columns define that restraint's lower and upper distance bounds (in Ångstroms), and the last column sets a force constant for that restraint (in kcal/mol/Å2).
But there are two differences in the NOE restraints file's format. The first is that more than one line of the file can be given the same index number. Having more than one line with the same index number tells dynamo that these lines are all part of a single NOE constraint. (The meaning of that statement will become clear momentarily.) The second difference in the NOE restraints file format is the addition of the "group" column. Like the "index" column, the group column contains an integer. It is used in combination with the index column to define different possible assignments for a particular NOE restraint. Lines in the NOE restraints file which have the same index number and the same group number are treated as part of a single possible assignment for an NOE. The interatomic distances from each line's atoms are averaged together using an R-6 sum average. Lines (or groups of lines) with the same index number but different group numbers are treated as separate possible assignments for a given NOE.
Once you've created a GMC file, you'll need to add your experimental constraints tables to it (see sections 2A and 2B for their names and formats). You can type them in with an ordinary text editor. Section 3? will describe how to automatically convert xplor constraints tables into their dynamo equivalents.
Once you have a GMC directory that contains all the experimental constraints you want, it's time to calculate a structure. Structure calculations in dynamo are controlled by short TCL scripts. These automatically create a graphical interface to let you follow the progress of the calculation. We'll discuss the details of the TCL files that calculate structures in section 4.
To run an existing structure calculation script, just type its filename. For example, (NMRBASE)/dynamo/demos/ubiq/ubiqSA.tcl starts the script and brings up the simulated annealing GUI:
The current time step number is reported in the title bar of the window. The panels in the left column report the current temperature (in K), total potential energy (in kcal/mol), total energy gradient (in kcal/mol/Å), the RMS difference between observed and calculated dipolar couplings (in Hz), the RMS van der Waals overlap (calculated only for those vdW pairs that are overlapped at all and expressed in Å), and the RMS difference between the actual and expected covalent bond lengths (in Å). The panels in the right column graph the RMS bond angle violations (in degrees), the RMS improper torsion angle violations (in degrees), the RMS experimental torsion angle violations (in degrees), the RMS 3J coupling violations (in Hz), the RMS NOE distance violation (in Å), and the RMS radius of gyration violation (in Å).
Looking at the resulting graphs of temperatures and violations, several features of dynamo's standard simulated annealing schedule can be discovered. Looking at the temperature graph, it is clear that the annealing schedule has three phases. The initial phase is about 500 dynamics steps long, with very tight control over the system's temperature (of 4000K). During this phase, the structure's agreement with covalent bond length, bond angle, and improper torsion angle restraints all undergo significant improvements. The van der Waals overlap, 3J coupling, and dipolar coupling violations appear to be zero, but this actually indicates that the vdW, 3J, and dipolar coupling energy terms are turned off completely during this phase.
The second phase begins with a brief spike in the temperature, which then returns to roughly the starting point. This phase continues for a longer period--roughly 2000 dynamics steps. During this phase, the NOE restraints continue to improve, but few other structural features change significantly. The van der Waals, 3J, and dipolar coupling terms continue to be turned off. The radius of gyration shows several oscillations.
Finally, the last and longest phase begins. The 3J and dipolar coupling violations and van der Waals overlap begin to be reported and show rapid improvements in their agreement. The improper torsion angles show a sudden improvement in their agreement. As the temperature cools from 4000K down to 0, the calculated values of the experimental torsion angle restraints and 3J couplings finally come into agreement with their target values.
So why does dynamo's simulated annealing schedule behave in this manner? When the trajectory begins from a randomized PDB file (as it does in this example), the "crunched up" covalent structure is like a tightly stretched spring, storing a large amount of potential energy. Starting free dynamics from this structure would cause its potential energy to be converted to kinetic energy, sending the system's temperature to millions of kelvins. By maintaining absolute control over the system's temperature during this initial regularization phase, the covalent geometry can be quickly improved without blowing up the dynamics trajectory. The van der Waals energy is turned off during this phase to improve conformational sampling by allowing atoms to move through each other. Because the 3J and dipolar couplings are so sensitive to the correct structure, they offer little guidance at this early stage, when the coordinates are far from correct.
In the second, high temperature phase, the structure undergoes large motions as seen in the NOE and radius of gyration violations graphs. The structural collapse that began during the regularization phase continues rapidly. Because there is no van der Waals energy, dynamo attempts to satisfy all the NOE distance constraints simultaneously simply by squashing all the atoms on top of each other. The radius of gyration of this overlapped structure is obviously too small, and the radius of gyration restraints attempt to wrench it open against the pull of the NOEs. This is an important process, since it ensures that the structure is globally correct (as shown by the low NOE violations at the end of the high temperature phase) without being too small (as shown by the reduced radius of gyration violations at the end of the phase). The structure remains too far from correct for the 3J or dipolar couplings to do much good, so they remain turned off.
In the final phase, the details of the structure are finally sorted out. The temperature gradually drops, preventing the system from jumping out of the globally correct region that was found at the end of the high temperature phase. The van der Waals overlap energy is turned on gradually, eliminating atomic overlap without causing undue spikes in temperature. The force constant on the improper torsion angle restraints is switched to its final value, resulting in a sudden improvement in the impropers' agreement. 3J and dipolar couplings are finally usable as structural constraints. By the end of the annealing schedule, the system's agreement with all of its restraints--experimental and a priori--is usually excellent.
#!/bin/csh set pdbName = ../ubiq.gmc/random.pdb talos2dyn.tcl -in talosPred.tab -pdb $pdbName -seg UBIQ > torsions.tab jConvert.tcl -in ubiq_expt_jcoup.tbl -pdb $pdbName -seg UBIQ > jcoup.tab noeConvert.tcl -in ubiq_complete_noe.tbl -pdb $pdbName -seg UBIQ > noes.tab dcConvert.tcl -in dipolars_expt_A.tbl -pdb $pdbName -seg UBIQ > dObsA.tab dcConvert.tcl -in dipolars_expt_B.tbl -pdb $pdbName -seg UBIQ > dObsB.tab torsionConvert.tcl -in expt_torsions.tbl -pdb $pdbName -seg UBIQ > torsions.tab
The purpose of each line is self-evident. Each one reads in a particular experimental constraint file in xplor format (except for talos2dyn, which reads talos output). Each also reads a corresponding PDB file so that the script knows the appropriate sequences. Since many xplor users never define a segment name in single-domain structure calculations, a default segment name can be defined. Finally, the converted restraints are written out to dynamo tables with the specific names listed in section 2B.
The radius of gyration restraints file generated by the GMC editor often requires editing by hand. The proper application of the radius of gyration restraint is described fully in J. Am. Chem. Soc. 121, 2337 (1999). Briefly, there are several rules to follow in defining radius of gyration restraints. First, disordered residues should not be included in a radius of gyration restraint. Second, non-globular structures should be broken up into several roughly globular sections, each with an independent Rgyr restraint. Third, nucleic acids should not have their own radius of gyration restraints. But, fourth, structures of complexes (protein/DNA or protein/protein) should have intermolecular radius of gyration restraints to ensure close packing of interfaces.
Create radius of gyration restraints as appropriate for your structure, following the guidelines above. The file format is described in section 2C.
Here is an example TCL script that calculates a single structure of ubiquitin (from (NMRBASE)/dynamo/demos/ubiq/ubiqSA.tcl):
#!/bin/sh
# The next line restarts using nmrWish \
exec nmrWish "$0" -- "$@"
set auto_path "[split $env(TCLPATH) :] $auto_path"
set auto_path [linsert $auto_path 0 $env(NMRBASE)/dynamo/tcl]
set gmcDir ubiq.gmc
set inName random.pdb
set outName dyn.pdb
set randSeed 54321
#
#-------------------------------------------------------------
dynInit
showDynLogo
readGMC $gmcDir $inName gmc
srand $randSeed
simulateAnnealing -gmcName gmc \
-initTemperature 4000 -numInitSteps 500 \
-highTemperature 4000 -numHighSteps 2000 \
-coolStartTemperature 4000 -coolEndTemperature 0 -numCoolSteps 12000 \
-printEnergyEvery 10 -graph
dynWrite -pdb -src $gmc(pdb) -out $gmcDir/$outName -rem $dynEnergyText
showDynDone
#exit
The first six lines don't need to be modified. The next set of lines define TCL variables that contain some basic information for dynamo: the name of the GMC directory, the name of the PDB file that will provide the starting coordinates, the name of the output PDB file, and a random number seed. You'll need to modify these lines appropriately. Note that in this example, we're beginning from the randomized coordinates that are automatically generated by the GMC editor (in the file random.pdb).
Since we've specified the complete set of constraints (via the GMC file name), the starting coordinates, and the random number seed, this TCL script will generate exactly the same output coordinates each time it's run.
The TCL code below the dividing line actually performs the calculation. You can modify it to change, for example, the number of structures to be calculated or the details of the annealing schedule. But you don't have to--the standard code works almost all the time without modification.
The first two commands are trivial--they turn on dynamo and show off its logo in a GUI window.
The third command calls a TCL procedure named readGMC (defined in (NMRBASE)/dynamo/tcl/read_GMC.tcl). Given the name of the GMC directory, it looks for each of the standard files within it (listed in section 2A) and reads them in turn. It records the results into a TCL array (called 'gmc' in this case).
The fourth command simply initializes the random number generator with the value given in the randSeed variable.
The fifth command is the heart of the structure calculation. It calls a TCL procedure named simulateAnnealing (defined in (NMRBASE)/dynamo/tcl/simulate_annealing.tcl). Given the parsed results of readGMC (including the starting coordinates) and the correctly initialized random number generator, it applies dynamo's standard three-phase annealing protocol to solve the structure. The protocol is outlined in section 2D. The temperature and length of each phase can be defined with flags to the simulateAnnealing command, as shown in this example. The final two flags to simulateAnnealing define how often the procedure should print out the current temperature, potential energy, and violations (in this example, these statistics will be printed every 10 dynamics steps) and indicate that the results should also be graphed in the simulated annealing GUI (section 2D).
The sixth command writes out the final coordinates to a PDB file. The dynWrite command is given flags that indicate the type of file to write (-pdb), the location of the final coordiantes (-src $gmc(pdb)), the name of the file (-out $gmcDir/$outName), and some text to be placed in REMARKS lines at the head of the file (-rem $dynEnergyText). This last flag uses a variable defined automatically during the simulatedAnnealing command, which contains the text of the final dynamics statistics printout.
The final command brings up a dialog box to remind you that the calculation is finished.
The example protocol shown above can be easily modified. For example, to calculate several structures, have a look at (NMRBASE)/dynamo/demos/ubiq/ubiqSAN.tcl:
#!/bin/sh
# The next line restarts using nmrWish \
exec nmrWish "$0" -- "$@"
set auto_path "[split $env(TCLPATH) :] $auto_path"
set auto_path [linsert $auto_path 0 $env(NMRBASE)/dynamo/tcl]
set gmcDir ubiq.gmc
set inName random.pdb
set outTemplate dynSAN%03d.pdb
set iseed 54321
set count 10
#
#-------------------------------------------------------------
dynInit
showDynLogo
readGMC $gmcDir $inName gmc
for {set i 1} {$i <= $count} {incr i} \
{
srand $iseed
simulateAnnealing -gmcName gmc \
-initTemperature 4000 -numInitSteps 500 \
-highTemperature 4000 -numHighSteps 2000 \
-coolStartTemperature 4000 -coolEndTemperature 0 -numCoolSteps 12000 \
-printEnergyEvery 10 -graph
set outName [format $gmcDir/$outTemplate $i]
dynWrite -pdb -src $gmc(pdb) -out $outName -rem $dynEnergyText
dynRead -pdb -src $gmc(pdb) -in $gmcDir/$inName
incr iseed 100
}
showDynDone
#exit
The script is nearly identical to the single-structure example shown before. The first
difference is that the output file name is replaced with an output file name template.
The number of structures to be calculated is defined by the count variable, which will
also be used to generate the output filenames. The single call to readGMC remains, but
the calls to simulateAnnealing and dynWrite are enclosed in a loop. After writing out
each iteration's PDB file, the original starting coordinates are re-read using the
dynRead command, so that each structure calculation begins at the same point. The
random number seed, though, is modified for each iteration, so that different coordinates
are produced each time.
#!/bin/csh set goodPDBList = (`noeLimit.com 0.1 ubiq.gmc/dynSAN*.pdb`) dynAvg.tcl -align None -r1 2 -rN 72 -in $goodPDBList -out ubiq.gmc/avg.pdb
The first line runs noeLimit.com, a script that evaluates the RMS NOE violations for each of several PDB files, and returns the names of those whose violations are less than a particular cutoff value (0.1 A in this case).
The second line runs dynAvg.tcl, which is given a list of converged PDB file names and a region of residue numbers to be included in the best fitting. It outputs the geometric average coordinates of these files into avg.pdb.
Because this average structure can have "crunched up" covalent geometry in disordered regions, the last step in creating an average structure is to refine it gently. This is accomplished by another simple modification of the ubiqSA.tcl script, called (NMRBASE)/dynamo/demos/ubiq/ubiqRefineAvg.tcl:
#!/bin/sh
# The next line restarts using nmrWish \
exec nmrWish "$0" -- "$@"
set auto_path "[split $env(TCLPATH) :] $auto_path"
set auto_path [linsert $auto_path 0 $env(NMRBASE)/dynamo/tcl]
set gmcDir ubiq.gmc
set inName avg.pdb
set outName refined_avg.pdb
set randSeed 54321
#
#-------------------------------------------------------------
dynInit
showDynLogo
readGMC $gmcDir $inName gmc
srand $randSeed
simulateAnnealing \
-gmcName gmc \
-initTemperature 300 -numInitSteps 50 \
-highTemperature 300 -numHighSteps 0 \
-coolStartTemperature 300 -coolEndTemperature 0 -numCoolSteps 1000 \
-printEnergyEvery 10 -graph -constantRgyr
dynWrite -pdb -src $gmc(pdb) -out $gmcDir/$outName -rem $dynEnergyText
showDynDone
#exit
This file is nearly identical to the original ubiqSA.tcl. The only changes are to the starting coordinate file name (avg.pdb, produced by ubiqAvgPDB.com), the output file name, and the temperature and duration of the annealing phases. Because this structure begins with an accurate set of coordinates, we don't want dynamo to move the coordinates too far. Therefore, we set the starting temperature for the various phases to only 300K, and reduce the length of each as well.
The final difference is the -constantRgyr flag to the simulateAnnealing procedure. Since this structure is very accurate to begin with, the radius of gyration restraints do not have to be used to counteract inappropriate overlap during the high temperature phase (see section 2D). This flag informs the simulateAnnealing procedure of this fact and adjusts it accordingly.
Check the minimized average structures' agreement with the x-ray crystal structure:
rms.tcl 1ubiq.pdb ubiq.gmc/refined_avg.pdb 2 72rms.tcl compares two PDB files of the same structure, using only those residues between (in this case) positions 2 and 72 in the best fitting and RMSD calculation.
Dynamo's GMC editor contains knowledge of all 20 standard amino acid residues, all four standard DNA bases, and all four standard RNA bases. Many users, though, work with systems that contain non-natural residues. Fortunately, dynamo is easily extended to work with new residue types you can define.
Take a look at (NMRBASE)/dynamo/tcl/params_protein.tcl It is a long list of TCL procedures, along the lines of
proc add_ala {segmentName resNum} {
add_Atom $segmentName ALA $resNum C 12.0 0.0903 3.2072 0.48
add_Atom $segmentName ALA $resNum CA 12.0 0.0903 3.2072 0.22
add_Atom $segmentName ALA $resNum CB 12.0 0.0903 3.2072 -0.30
add_Atom $segmentName ALA $resNum HA 1.0 0.0045 2.6157 0.10
add_Atom $segmentName ALA $resNum HB1 1.0 0.0045 2.6157 0.10
add_Atom $segmentName ALA $resNum HB2 1.0 0.0045 2.6157 0.10
add_Atom $segmentName ALA $resNum HB3 1.0 0.0045 2.6157 0.10
add_Atom $segmentName ALA $resNum HN 1.0 0.0498 1.4254 0.26
add_Atom $segmentName ALA $resNum N 14.0 0.1592 2.7618 -0.10
add_Atom $segmentName ALA $resNum O 16.0 0.2342 2.6406 -0.48
add_Bond_Intrares $segmentName ALA $resNum C O 1.231 1000.000
add_Bond_Intrares $segmentName ALA $resNum CA C 1.525 1000.000
add_Bond_Intrares $segmentName ALA $resNum CA CB 1.521 1000.000
add_Bond_Intrares $segmentName ALA $resNum CA HA 1.080 1000.000
add_Bond_Intrares $segmentName ALA $resNum CB HB1 1.080 1000.000
add_Bond_Intrares $segmentName ALA $resNum CB HB2 1.080 1000.000
add_Bond_Intrares $segmentName ALA $resNum CB HB3 1.080 1000.000
add_Bond_Intrares $segmentName ALA $resNum N CA 1.458 1000.000
add_Bond_Intrares $segmentName ALA $resNum N HN 0.98 1000.0
add_Angle_Intrares $segmentName ALA $resNum CA C O 120.80 500.000
add_Angle_Intrares $segmentName ALA $resNum CA CB HB1 109.500 500.000
add_Angle_Intrares $segmentName ALA $resNum CA CB HB2 109.500 500.000
add_Angle_Intrares $segmentName ALA $resNum CA CB HB3 109.500 500.000
add_Angle_Intrares $segmentName ALA $resNum CB CA C 110.500 500.000
add_Angle_Intrares $segmentName ALA $resNum HA CA C 109.500 500.000
add_Angle_Intrares $segmentName ALA $resNum HA CA CB 109.500 500.000
add_Angle_Intrares $segmentName ALA $resNum HB1 CB HB2 109.500 500.000
add_Angle_Intrares $segmentName ALA $resNum HB1 CB HB3 109.500 500.000
add_Angle_Intrares $segmentName ALA $resNum HB2 CB HB3 109.500 500.000
add_Angle_Intrares $segmentName ALA $resNum HN N CA 120.00 500.0
add_Angle_Intrares $segmentName ALA $resNum N CA C 111.200 500.000
add_Angle_Intrares $segmentName ALA $resNum N CA CB 110.400 500.000
add_Angle_Intrares $segmentName ALA $resNum N CA HA 109.500 500.000
add_Improper_Intrares $segmentName ALA $resNum HA N C CB 65.977 500.000
add_Improper_Intrares $segmentName ALA $resNum HB1 HB2 CA HB3 -66.514 500.000
add_Delphic_Positioning_System_Intrares $segmentName ALA $resNum CB CA N ALA
add_Delphic_Positioned_Atom $segmentName ALA $resNum CB ALA_CB
add_Delphic_Positioned_Atom $segmentName ALA $resNum CA ALA_CA
add_Delphic_Positioned_Atom $segmentName ALA $resNum N ALA_N
add_Delphic_Positioned_Atom $segmentName ALA $resNum O PEP_O
add_Delphic_Positioned_Atom $segmentName ALA $resNum C PEP_C
add_Delphic_Positioned_Atom $segmentName ALA $resNum N PEP_N
}
This procedure is given the name of a segment and residue position into which this residue is to be added.
The purpose of each line is self-evident. Each atom in this residue needs to be created with a call to another TCL procedure called add_Atom. Each covalent bond length restraint is created by a call to add_Bond. Each bond angle restraint is created by a call to add_Angle, and each improper torsion angle is created by a call to add_Improper. Since DELPHIC terms are only available for the naturally occuring residue types, calls to add_Delphic... routines should be deleted when you create a new residue type.
The process of creating a new residue type entails creating a new TCL procedure similar to add_Ala. Since most non-natural residue types are very similar to one or more existing residue types, it's usually best to begin by copying an appropriate add_(residue type) procedure. Rename it, but keep it in the same file: params_protein.tcl for protein residue types, params_dna_rna.tcl for nucleic acids, or params_other.tcl as appropriate.
Modify your new TCL procedure to add or delete atoms, bonds, bond angles, and impropers as appropriate. Be sure to remove all the calls to the add_Delphic... procedures, since DELPHIC terms are only available for naturally-occuring residue types.
Appropriate values for bond lengths, etc., can often be guessed by examining other add_(residue) TCL procedures, or by examining the interatomic distances and angles in a crystal structure that contains your new residue type.
proc parameter_Index {kind} {
set result {}
set kind [string tolower $kind]
# protein residue types
if {($kind == "ala") || ($kind == "protein") || ($kind == "all")} {
lappend result {ala protein add_ala}
}
if {($kind == "arg") || ($kind == "protein") || ($kind == "all")} {
lappend result {arg protein add_arg}
}
if {($kind == "asn") || ($kind == "protein") || ($kind == "all")} {
lappend result {asn protein add_asn}
}
and so forth.
This is the place where the GMC Editor looks for information about the residue types it knows about. Each residue type has a similar bit of code. If the procedure is asked for information about that specific residue type, or about all residues of its class (protein, DNA, RNA, or other), or if the procedure is asked for information about all residue types, a TCL list is added to the result. It contains three pieces of information: The name of the residue (which will be used to label the appropriate button in the GUI), the type of segment it can enter into, and the name of the TCL procedure that can create one of these residues.
Modify parameter_index.tcl appropriately to add the information about your new residue type.
Remain in the (NMRBASE)/dynamo/tcl directory, and type
index.tcl
at the prompt.
Open the GMC Editor, and a button for your new residue type should appear in the appropriate section of the GUI. Go ahead and try building a structure with your new residue type.