NMRPipe Table

NMRPipe Table Format

Many facilities in NMRPipe use input or produce output in the form of a multi-column text table. Examples include peak tables, chemical shift tables, and tables of dipolar couplings. NMRPipe has its own general-purpose table format, along with several tools for manipulating such tables. In NMRPipe documentation, this format is sometimes referred to as the Generic Database format, or GDB.

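An NMRPipe GDB table is a text file which contains header information and one or more columns of data. The VARS line names the columns, the FORMAT line gives a format specifier for each column, and the data rows hold space-separated values.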

An example NMRPipe GDB format table follows, in this case a table of protein backbone random coil chemical shifts:

   

   # Random Coil Chemical Shifts

   VARS   RESNAME HA_PPM C_PPM CA_PPM CB_PPM N_PPM   
   FORMAT %s      %7.1f  %7.1f %7.1f  %7.1f  %7.1f
   
    ALA      4.32   177.8   52.3   19.0   123.8  
    CYS      4.55   174.6   56.9   28.9   118.8
    cys      4.71   174.6   55.4   43.7   118.6
    ASP      4.64   176.3   54.0   40.8   120.4
    GLU      4.35   176.6   56.4   29.7   120.2
    PHE      4.62   175.8   58.0   39.0   120.3
    GLY      3.96   174.9   45.1 9999.0   108.8
    HIS      4.73   173.3   54.5   27.9   118.2
    HIH      4.73   173.3   53.3   28.5   118.2
    ILE      4.17   176.4   61.3   38.0   119.9
    LYS      4.32   176.6   56.5   32.5   120.4
    LEU      4.34   177.6   55.1   42.3   121.8
    MET      4.48   176.3   55.3   32.6   119.6
    ASN      4.74   175.2   52.8   37.9   118.7
    PRO      4.42   177.3   63.1   31.7   135.8 
    GLN      4.34   176.0   56.1   28.4   119.8
    ARG      4.34   176.3   56.1   30.3   120.5
    SER      4.47   174.6   58.2   63.2   115.7
    THR      4.35   174.7   62.1   69.2   113.6
    VAL      4.12   176.3   62.3   32.1   119.2
    TRP      4.66   176.1   57.7   30.3   121.3
    TYR      4.55   175.9   58.1   38.8   120.3
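
As a rough illustration of how the layout above can be read by a program, the following Python sketch parses the VARS, FORMAT, and data lines into a list of dictionaries. It is not part of NMRPipe: the names `convert_value` and `parse_gdb_table` are hypothetical, and the sketch assumes that lines starting with # are comments and that lines starting with DATA are header lines which can be skipped.

```python
SAMPLE = """\
# Random Coil Chemical Shifts

VARS   RESNAME HA_PPM C_PPM CA_PPM CB_PPM N_PPM
FORMAT %s      %7.1f  %7.1f %7.1f  %7.1f  %7.1f

ALA      4.32   177.8   52.3   19.0   123.8
GLY      3.96   174.9   45.1 9999.0   108.8
"""

def convert_value(token, fmt):
    """Convert a text token using only the general type of its format specifier."""
    kind = fmt[-1]                 # final letter of e.g. %7.1f, %d, %s
    if kind == "d":
        return int(token)
    if kind in ("f", "e"):
        return float(token)
    return token                   # %s and anything else: keep as text

def parse_gdb_table(text):
    """Return (variable names, format specifiers, list of row dictionaries)."""
    names, formats, rows = [], [], []
    for line in text.splitlines():
        fields = line.split()      # columns are space-separated
        if not fields or fields[0].startswith("#"):
            continue               # assumption: blank and # lines carry no data
        if fields[0] == "VARS":
            names = fields[1:]
        elif fields[0] == "FORMAT":
            formats = fields[1:]
        elif fields[0] == "DATA":
            continue               # assumption: DATA header lines are skipped here
        else:
            values = [convert_value(t, f) for t, f in zip(fields, formats)]
            rows.append(dict(zip(names, values)))
    return names, formats, rows

names, formats, rows = parse_gdb_table(SAMPLE)
print(names)
print(rows[0]["RESNAME"], rows[0]["CA_PPM"])
```

Keeping each row as a dictionary keyed by variable name mirrors the way the NMRPipe table tools refer to columns by name rather than by position.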

Missing Values

As noted above, since columns are space-separated, no missing or all-blank values are allowed. Instead, place-holder values are used to take the place of missing values, such as "9999.0" for a missing chemical shift value in the table above. How missing values are handled depends on the application which is using the table. The GDB table format allows for the keyword (Null) to be used in place of a missing value, but not every application which uses the table format will accept this.
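
Since each application decides for itself how to treat missing values, a script that reads a table usually has to make that choice explicit. The Python sketch below shows one possible approach, under the assumption that 9999.0 is the place-holder in use and that (Null) may also appear; the function name `read_shift` is hypothetical, and the tokens are CB values from the table above plus one (Null) added for illustration.

```python
def read_shift(token, placeholder=9999.0):
    """Return a float, or None when the token marks a missing value."""
    if token == "(Null)":
        return None
    value = float(token)
    return None if value == placeholder else value

# CB_PPM tokens for ALA, GLY, and CYS, plus an illustrative (Null) entry.
cb_tokens = ["19.0", "9999.0", "(Null)", "28.9"]
cb_values = [read_shift(t) for t in cb_tokens]

# Statistics should only use the values that are actually present.
present = [v for v in cb_values if v is not None]
mean_cb = sum(present) / len(present)
print(cb_values)
print(round(mean_cb, 2))
```

Without this step, a place-holder such as 9999.0 would be averaged as if it were a real chemical shift.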

FORMAT Specifiers

As mentioned above, the FORMAT line uses format specifiers to define the general type of data in a column. The format specifiers are adapted from those used by the UNIX/C formatted printing functions such as printf: %d for integers, %f or %e for floating-point values, and %s for text strings.

Importantly, since format specifiers can affect how table values are manipulated in a script, format specifiers for floating-point values should always include sufficient precision for the given data.

Note that values in an input table do not need to match the details of the format specifier; they only need to match its general type: text string, integer, or floating-point value.

Some example format specifiers follow.

Format  Type     Example       Meaning
%d      Integer  16            Any integer, with leading minus sign if needed.
%4d     Integer  16            Any integer, with leading minus sign if needed, output as four characters or more.
%-4d    Integer  16            Any integer, output as four characters or more, left-justified.
%04d    Integer  0016          Any integer, zero-padded on the left to four characters.
%f      Float    3.276400      Any floating-point number, with a leading minus sign if needed.
%+f     Float    +3.276400     Any floating-point number, always including a leading plus or minus sign.
%.2f    Float    3.28          Any floating-point number, with a leading minus sign if needed, with two places after the decimal.
%6.2f   Float    3.28          Any floating-point number, with two places after the decimal, output as six characters or more.
%e      Float    1.200352e+02  Any floating-point number, in scientific notation.
%.2e    Float    1.20e+02      Any floating-point number, with two places after the decimal, in scientific notation.
%s      String   GLY           Any text string.
%4s     String   GLY           Any text string, output as four characters or more, right-justified.
%-4s    String   GLY           Any text string, output as four characters or more, left-justified.
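
The specifiers in the table can be tried out directly in Python, whose % operator follows the same C printf conventions. The sketch below is only an illustration, not an NMRPipe tool; the helper name `render` is hypothetical. Each result is printed in brackets so that padding is visible, and the last line shows how too little precision discards information when a value is written.

```python
def render(fmt, value):
    """Format one value with a printf-style specifier."""
    return fmt % value

examples = [
    ("%d", 16), ("%4d", 16), ("%-4d", 16), ("%04d", 16),
    ("%f", 3.2764), ("%+f", 3.2764), ("%.2f", 3.2764), ("%6.2f", 3.2764),
    ("%e", 120.0352), ("%.2e", 120.0352),
    ("%s", "GLY"), ("%4s", "GLY"), ("%-4s", "GLY"),
]
for fmt, value in examples:
    # Brackets make leading and trailing padding visible.
    print("%-6s [%s]" % (fmt, render(fmt, value)))

# The same value written with one and with four decimal places.
print("[%s] [%s]" % (render("%7.1f", 4.3249), render("%7.4f", 4.3249)))
```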
 

Specification of Atom Names

Many NMRPipe applications use information associated with one or more atoms. For example, in a chemical shift table, each chemical shift entry corresponds to a particular atom. In a dipolar coupling table, each dipolar coupling corresponds to a pair of atoms I and J. And, in a J-coupling table, each J-coupling is associated with a torsion, specified by four atoms I, J, K, and L.

In the NMRPipe table format, a given atom is identified according to a residue ID, residue name, and atom name. These correspond to table variable names RESID, RESNAME, and ATOMNAME. Values for RESNAME and ATOMNAME will generally be treated as case-sensitive, but are usually all-upper-case. In case of systems with more than one chain or molecule, an atom specification can include an optional chain ID or segment ID. These correspond to table variable names CHAINNAME and SEGNAME.

In the case of entries which identify two or more atoms, the variable names will have a suffix such as _I, _J, _K, or _L. For example, each dipolar coupling entry specifies two atoms I and J, so each entry must include columns for RESID_I, ATOMNAME_I, RESNAME_I and RESID_J, ATOMNAME_J, RESNAME_J, and might also include columns for CHAINNAME_I and CHAINNAME_J or SEGNAME_I and SEGNAME_J.
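
The suffix convention makes it straightforward to pull one atom's identity out of a multi-atom entry. The following Python sketch assumes a table row has already been read into a dictionary keyed by variable name; the function name `atom_spec` and the sample row are hypothetical.

```python
def atom_spec(row, suffix=""):
    """Return (RESID, RESNAME, ATOMNAME) for one atom of a table row.

    suffix is "" for single-atom tables, or "I", "J", "K", "L" for
    entries that identify several atoms.
    """
    tag = "_" + suffix if suffix else ""
    return (row["RESID" + tag], row["RESNAME" + tag], row["ATOMNAME" + tag])

# Hypothetical dipolar coupling entry between the N and HN atoms of one residue.
row = {
    "RESID_I": 2, "RESNAME_I": "GLN", "ATOMNAME_I": "N",
    "RESID_J": 2, "RESNAME_J": "GLN", "ATOMNAME_J": "HN",
}
print(atom_spec(row, "I"))
print(atom_spec(row, "J"))
```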

Specification of Amino Acid Sequence

Many NMRPipe applications, such as TALOS, use input data specifically for proteins. Many of these applications require that complete amino acid sequence information is included in an input table. This is commonly done by including a DATA FIRST_RESID and one or more DATA SEQUENCE lines, as shown in this chemical shift table.

The FIRST_RESID line gives the starting residue number of the sequence, which is assumed to be 1 if no FIRST_RESID line is given. The DATA SEQUENCE lines give the amino acid sequence in single-letter codes, with the code X commonly used for non-standard amino acids. In many examples, the amino acid codes are specified in groups of 10 for clarity. In practice, space characters in the amino acid sequence are ignored, so the amino acid codes can be grouped in any way, and any number of codes can be given in one DATA SEQUENCE line. For convenience, the NMRPipe table utility getTabInfo.tcl can create DATA SEQUENCE text from the sequence of a PDB structure. For example, the command:

   getTabInfo.tcl -in 1UBQ.pdb -seqText
will produce output like this:
   DATA FIRST_RESID 1

   DATA SEQUENCE MQIFVKTLTG KTITLEVEPS DTIENVKAKI QDKEGIPPDQ QRLIFAGKQL
   DATA SEQUENCE EDGRTLSDYN IQKESTLHLV LRLRGG
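
To make the rules for FIRST_RESID and DATA SEQUENCE lines concrete, the Python sketch below reads the output shown above, ignores the spacing between groups, and assigns a residue number to each one-letter code. It is an illustration only; the function name `read_sequence` is hypothetical and is not an NMRPipe facility.

```python
TEXT = """\
DATA FIRST_RESID 1

DATA SEQUENCE MQIFVKTLTG KTITLEVEPS DTIENVKAKI QDKEGIPPDQ QRLIFAGKQL
DATA SEQUENCE EDGRTLSDYN IQKESTLHLV LRLRGG
"""

def read_sequence(text):
    """Return (first residue ID, one-letter sequence with spaces removed)."""
    first_resid = 1                # default when no FIRST_RESID line is given
    parts = []
    for line in text.splitlines():
        fields = line.split()
        if fields[:2] == ["DATA", "FIRST_RESID"]:
            first_resid = int(fields[2])
        elif fields[:2] == ["DATA", "SEQUENCE"]:
            parts.extend(fields[2:])   # grouping by spaces carries no meaning
    return first_resid, "".join(parts)

first_resid, sequence = read_sequence(TEXT)

# Map each residue ID to its one-letter code.
residues = {first_resid + i: code for i, code in enumerate(sequence)}
last_resid = first_resid + len(sequence) - 1
print(len(sequence), residues[first_resid], residues[last_resid])
```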

Applications for Manipulating Tables

NMRPipe includes several general-purpose applications for manipulating GDB-Format tables. There are applications to display tables, plot data from tables, sort and adjust table values, select a subset of entries according to a condition, and extract table values for use with other scripts.

Application      Use
delTab.tcl       Delete randomly or systematically selected rows.
diffTab.tcl      Form the difference between values in two related tables.
plotTab.tcl      Draw an XY Plot of two columns in a table.
svdTab.tcl       Perform linear least squares on columns of a table.
addTabNoise.tcl  Add random noise to data in a table.
fitTab.tcl       Apply a fitting function to data from a table.
addTabVar.tcl    Add new columns (variables) to a table.
getTabCol.tcl    Get the values from a given table column.
selectTab.tcl    Select and save table entries according to a condition.
adjTab.tcl       Adjust or set values in a table.
getTabInfo.tcl   Get information about a table.
appendTab.tcl    Join two related tables.
getTabRow.tcl    Get the values from a given table row.
showTab.tcl      Display a table in an interactive viewer.

For example, to extract and print all the chemical shift values in a chemical shift table:
   selectTab.tcl -in csObs.tab -var SHIFT
To extract shifts as above, for residues 10 to 20, alone:
   selectTab.tcl -in csObs.tab -var SHIFT -cond "RESID >= 10 && RESID <= 20" 
To extract only HN shifts:
   selectTab.tcl -in csObs.tab -var SHIFT -cond "strmatch( ATOMNAME, 'HN' )"
To list the variable names in a table:
   getTabInfo.tcl -in dCalcA.tab -parm varNames
To extract the DATA SAUPE values in a dipolar coupling calculation output:
   getTabInfo.tcl -in dCalcA.tab -key SAUPE -data 0
To add a new column of floating point data called W to an existing table, with an initial value of 100.0 in each row:
   addTabVar.tcl -in dc.tab -out dcw.tab -var W -float -fmt %7.3f -val 100.0 
To concatenate two PDB files and renumber their atom ID values:
   cat a.pdb b.pdb > ab.pdb
   adjTab.tcl -in ab.pdb -out new.pdb -pdb -renumber
Extract the first residue ID and the one-letter amino acid sequence from a protein PDB file:
   getTabInfo.tcl -in ref.pdb -seqInfo
Display protein sequence information from a PDB file and print it in the DATA SEQUENCE format used in NMRPipe tables:
   getTabInfo.tcl -in ref.pdb -seqText

PDB Files

As shown in the previous examples, many of the table manipulation applications above can accept input in the form of a PDB file. In this case, the variable names used to access the table are automatically set as if the PDB file had VARS and FORMAT lines. Since a PDB file is not space-delimited, data is extracted according to a range of character positions, where the first character in a line is position 1:

Variable Name  Format  Character Range
ATOMID         %d       7 - 11
ATOMNAME       %s      13 - 16
LOCID          %s      17 - 17
RESNAME        %s      18 - 21
CHAINID        %s      22 - 22
RESID          %d      23 - 26
ICODE          %s      27 - 27
X              %.3f    31 - 38
Y              %.3f    39 - 46
Z              %.3f    47 - 54
OCCUPANCY      %.2f    55 - 60
TEMPFACTOR     %.2f    61 - 66
SEGID          %s      73 - 76
ELEMENT        %s      77 - 78
CHARGE         %s      79 - 80

Given the above definitions, it is possible to manipulate a PDB file with NMRPipe's general-purpose table tools. For example, to find the average values of the X Y and Z coordinates of a PDB file, these commands:

  set xc = `getTabCol.tcl -in ref.pdb -pdb -var X`
  set yc = `getTabCol.tcl -in ref.pdb -pdb -var Y`
  set zc = `getTabCol.tcl -in ref.pdb -pdb -var Z`

  getStat.tcl -stat Avg -x $xc
  getStat.tcl -stat Avg -x $yc
  getStat.tcl -stat Avg -x $zc
would produce output like this:
   Avg 30.2337878049
   Avg 28.9899536585
   Avg 15.3499943089
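
The character ranges listed in the PDB table above can also be applied directly in a script. The Python sketch below is an illustration rather than NMRPipe code: `PDB_FIELDS`, `parse_pdb_atom`, and `make_atom_line` are hypothetical names, and the two atom records use made-up coordinates. The records are built with fixed widths so that each value lands in its documented range, then parsed back by position and averaged.

```python
# (variable name, converter, first position, last position), positions 1-based.
PDB_FIELDS = [
    ("ATOMID", int, 7, 11), ("ATOMNAME", str, 13, 16), ("LOCID", str, 17, 17),
    ("RESNAME", str, 18, 21), ("CHAINID", str, 22, 22), ("RESID", int, 23, 26),
    ("ICODE", str, 27, 27), ("X", float, 31, 38), ("Y", float, 39, 46),
    ("Z", float, 47, 54), ("OCCUPANCY", float, 55, 60),
    ("TEMPFACTOR", float, 61, 66), ("SEGID", str, 73, 76),
    ("ELEMENT", str, 77, 78), ("CHARGE", str, 79, 80),
]

def parse_pdb_atom(line):
    """Extract values by character range; empty fields become None."""
    record = {}
    for name, kind, first, last in PDB_FIELDS:
        text = line[first - 1:last].strip()   # 1-based range to Python slice
        record[name] = kind(text) if text else None
    return record

def make_atom_line(atom_id, atom, res, chain, res_id, x, y, z, element):
    """Build an illustrative ATOM record with every field in its range."""
    return "".join([
        "ATOM".ljust(6),                 # 1-6   record name
        "%5d" % atom_id,                 # 7-11  ATOMID
        " ",                             # 12    unused
        "%-4s" % atom,                   # 13-16 ATOMNAME
        " ",                             # 17    LOCID left blank
        "%-4s" % res,                    # 18-21 RESNAME
        "%1s" % chain,                   # 22    CHAINID
        "%4d" % res_id,                  # 23-26 RESID
        " " * 4,                         # 27-30 ICODE blank, then unused
        "%8.3f%8.3f%8.3f" % (x, y, z),   # 31-54 X Y Z
        "%6.2f%6.2f" % (1.0, 0.0),       # 55-66 OCCUPANCY TEMPFACTOR
        " " * 10,                        # 67-76 unused, SEGID left blank
        "%2s" % element,                 # 77-78 ELEMENT
    ])

lines = [
    make_atom_line(1, " N", "MET", "A", 1, 1.0, 2.0, 3.0, "N"),
    make_atom_line(2, " CA", "MET", "A", 1, 3.0, 4.0, 5.0, "C"),
]
atoms = [parse_pdb_atom(line) for line in lines]
mean_x = sum(a["X"] for a in atoms) / len(atoms)
print(atoms[0]["ATOMNAME"], atoms[0]["RESNAME"], atoms[0]["RESID"])
print(mean_x)
```

Slicing by position rather than splitting on spaces matters here because adjacent PDB fields can run together with no separating space.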


last updated:  Dec 6 2011 / big fd