
NMRPipe Files and Examples for EMBO NMR Course 2009
[See Also: The Big NMRPipe Reference Page]

Frank Delaglio


NMRPipe Files for EMBO NMR Course 2009

  • When downloading, be sure to retain the exact file name shown here, renaming the downloaded file if needed.
  • Check that the final file size matches the size shown here exactly.

Windows Internet Explorer: Right-Click, "Save Target As"
Linux Mozilla: Left-Click; Use "File/Save Page As" if a file is displayed as a web page.
Mac OS X Safari: Right-Click, "Download Linked File (As)"

[embo2009d.ppt] Lecture Slides (September 2, 2009. 24,829,440 bytes)
[pipedemo.tar] Demo Data Directory (pipe/demo) (September 2, 2009. 933,314,560 bytes)

EMBO Practicum

The NMRPipe installation can be found under the pipe directory of your EMBO computer account. You are encouraged to copy the software and data for your own use.

NMRPipe: Introduction

NMRPipe has its genesis as a spectral processing engine, emphasizing multidimensional NMR applications. The use of NMRPipe is noted in roughly 50% of the NMR structures accepted into the Protein Data Bank (PDB) since 2000. Over the years, NMRPipe has been augmented as part of a plan to provide a set of tools under a common framework for all aspects of biomolecular NMR. The key philosophy is a bottom-up approach to software and application development, where simpler components are combined using standard scripting techniques (here, UNIX C-Shell and TCL) to achieve complex goals. An early focus of the software was flexibility, since protein NMR methods were rapidly changing and expanding, typical protein structure calculation projects took months or even years, and no completely standard protocol was used. Now, computers are fast enough to process 3D spectra in seconds, experimental methods for high-throughput NMR structure determination are available, and NMR structural biology is practiced by those who might not be completely familiar with details of multidimensional signal processing. Also, 1D and 2D spectral series analysis are now common tools for protein-ligand screening and characterization. In response, our current software development includes an emphasis on automation and batch processing, and spectral series analysis.

A complete list of all programs, functions, and scripts in the NMRPipe system is given in The Big NMRPipe Reference Page. Also, nearly every program and script can be invoked with the -help argument to list the command-line arguments that can be used. If the help text is very long, it can be displayed a page at a time via a UNIX pipeline (here, |& sends both standard output and standard error to the UNIX page-by-page viewing command more):
   nmrPipe -help |& more
In the case of the nmrPipe program itself, help text for a specific function can be displayed by including the -fn option with the -help flag, for example:
   nmrPipe -help -fn FT |& more

Since it is script-based, NMRPipe is highly customizable. The following is a list of some things that the software can do; the sample data and tutorials for this NMR course will include examples of most of these facilities:

Special applications developed using NMRPipe include:

Some General Tips About Spectral Processing

  1. First Order Phase Correction, Folding, and First-point Scaling

    A delay in the acquisition introduces a first-order phase distortion. If there is no delay, no first-order phase correction is required. Each delay of 1 dwell (1 point) introduces an additional 360° of first-order phase correction.

    Many current NMR pulse sequences are designed so that there is no delay in the directly acquired dimension. So, when phasing a spectrum interactively, it's best to try phasing with P0 only first.

    Depending on the delay, the first point of the FID should be scaled before the Fourier transform. The first-point scaling factor is selected by the window function argument "-c".

    If the required first order phase P1 for the given dimension is 0.0, the first point scaling factor should be 0.5. This is because the discrete Fourier transform does the equivalent of counting the point at t=0 twice. If the first point is not scaled properly in this case, ridge-like baseline offsets in the spectrum will result.

    In all other cases (P1 is not zero), this scale factor should be 1.0. This is because the first point of the FID no longer corresponds to t=0, and so it shouldn't be scaled.

    If the scale factor is not set correctly, it will introduce a baseline distortion which is either zero-order or sinusoidal, depending on what first-order phase is applied.

    When possible, it is best to set up experiments with either exactly 0, 1/2, or 1-point delay. There are several reasons:

    Data with P1 = 360 have the first point t=0 missing (i.e. 1 point delay). Since the first point of the FID corresponds to the sum of points in the corresponding spectrum, this missing first point can be "restored" by adding a constant to the phased spectrum. This can be done conveniently by automated zero-order baseline correction, as shown below:


          Delay: 0   point    P1 =   0   FID: Scale -c 0.5
          Delay: 1/2 point    P1 = 180   FID: Scale -c 1.0 
                                         Spectrum: Folded peaks have opposite sign
          Delay: 1   point    P1 = 360   FID: Scale -c 1.0
                                         Spectrum: Use "POLY -auto -ord 0" after FT and PS.
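    As a numerical illustration of the first-point scaling rule (a Python sketch, not NMRPipe code; the decay rate and sizes are arbitrary), the discrete Fourier transform of a decaying signal with no acquisition delay shows a constant baseline offset of about half the first FID point, which disappears when the first point is scaled by 0.5:

```python
import cmath

def dft(x):
    """Plain discrete Fourier transform (no scaling conventions)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * i * k / n) for k in range(n))
            for i in range(n)]

# A decaying FID with no acquisition delay: its peak lies at zero frequency.
n, decay = 256, 0.05
fid = [cmath.exp(-decay * k) for k in range(n)]

spec_raw = dft(fid)                       # first point unscaled
fid_scaled = [0.5 * fid[0]] + fid[1:]     # first-point scaling, like -c 0.5
spec_fix = dft(fid_scaled)

# Far from the peak (point n//2), the unscaled spectrum shows a constant
# offset of roughly half the first FID point; the scaled one is near zero.
offset_raw = spec_raw[n // 2].real   # ~0.5: the t=0 point counted twice
offset_fix = spec_fix[n // 2].real   # ~0.0 after first-point scaling
print(abs(offset_raw - 0.5) < 0.05, abs(offset_fix) < 0.02)   # True True
```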
  2. Zero Filling

    As a general rule, each dimension in the time-domain should always be at least doubled in size by zero filling before the Fourier transform. If this is not done, the real part of the transformed result will not contain all of the information in the original complex time-domain data, which means that information will be lost when imaginary data is deleted during the usual transform schemes. So, in nearly every case, data can be suitably processed by zero fill in auto mode:

          | nmrPipe  -fn ZF -auto \

    With no other arguments, this will double the size of the data by zero filling, and continue to zero fill if needed so that the final result has a power-of-two number of points. The power-of-two size is not a requirement, but it will usually make Fourier processing faster.
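    The sizing rule just described can be sketched as a small function (a Python illustration of the rule as stated, not the actual ZF code):

```python
def zf_auto_size(n):
    """Size after 'ZF -auto': double, then round up to a power of two."""
    target = 2 * n
    size = 1
    while size < target:
        size *= 2
    return size

print(zf_auto_size(512))   # 1024 (already a power of two after doubling)
print(zf_auto_size(300))   # 1024 (600, rounded up to the next power of two)
```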

  3. Baseline Correction

    The default automated baseline correction (POLY -auto) should usually only be used when at least two dimensions of the data have been transformed, so that there is as much empty baseline as possible for automated detection. So, it is common practice to create a processing scheme like this, which processes the first two dimensions, then transposes the data to correct the first dimension:

          nmrPipe -in fid/test001.fid \
          | nmrPipe  -fn POLY -time                             \
          | nmrPipe  -fn SP -off 0.5 -end 0.98 -pow 2 -c 0.5    \
          | nmrPipe  -fn ZF -auto                               \
          | nmrPipe  -fn FT -verb                               \
          | nmrPipe  -fn PS -p0 43 -p1 0.0 -di                  \
          | nmrPipe  -fn EXT -left -sw                          \
          | nmrPipe  -fn TP                                     \
          | nmrPipe  -fn SP -off 0.5 -end 0.98 -pow 1 -c 1.0    \
          | nmrPipe  -fn ZF -auto                               \
          | nmrPipe  -fn FT -verb                               \
          | nmrPipe  -fn PS -p0 -135 -p1 180 -di                \
          | nmrPipe  -fn TP                                     \
          | nmrPipe  -fn POLY -auto                             \
              -verb -ov -out test.ft2
    Zero-order automated baseline correction (POLY -auto -ord 0) is commonly applied to dimensions that have a missing first point (P1 = 360).

    Automated baseline correction should only be applied as needed.

    In most cases, only the directly-detected dimension, or dimensions with one-dwell delay (P1=360) need a baseline correction.

    NOTE WELL that cases for P1=0, P1=180, and P1=360 can all be handled by the proper combination of first point scaling and zero-order baseline correction. All other cases have the potential to introduce more difficult baseline distortions, as well as phase distortions for folded peaks. For these reasons, great care should be taken to set the acquisition delays to 0, 1/2, or 1 point whenever possible. This also eliminates the need to choose phase correction values manually, which can be very difficult with some spectra.

    In the case of digitally oversampled data from Bruker instruments, the oversampling correction can result in baseline distortions which are especially problematic for homonuclear 2D and 3D cases or for many 1D 1H spectra. By default, oversampling correction is performed during conversion rather than during processing; when applied in this way, the correction is not ideal, although it has the convenience of creating a result which is ordinary time-domain data, with no artificial leading points. So, in many cases, baselines can be improved when needed by converting the data with the option Digital Oversampling Correction: During Processing selected in the bruker graphical conversion interface.

  4. Reversed Spectra, and Left/Right Swapped Spectra

    NOTE WELL that "FT -neg" is NOT exactly the same as simply reversing the spectrum with "REV". "FT -neg" is equivalent to REV followed by a one-point circular shift. So, if REV is applied instead, the PPM calibration will be off by one point.

    Of course, the "-neg" or "-alt" arguments should not be used if the data are not reversed or left/right swapped.
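    The relationship between "FT -neg" and REV can be checked numerically. In this sketch (plain Python with arbitrary random data, not NMRPipe code), conjugating the time-domain data before the transform yields the same magnitudes as reversing the spectrum and then applying a one-point circular shift:

```python
import cmath, random

def dft(x):
    """Plain discrete Fourier transform."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * i * k / n) for k in range(n))
            for i in range(n)]

random.seed(0)
n = 64
fid = [complex(random.random(), random.random()) for _ in range(n)]

spec_neg = dft([z.conjugate() for z in fid])   # conjugation, as in "FT -neg"
spec = dft(fid)

# Reverse the spectrum, then apply a one-point circular shift:
rev = spec[::-1]
rev_shifted = [rev[-1]] + rev[:-1]

# The magnitudes agree point by point (the phases differ by conjugation);
# simple reversal alone would be off by one point:
ok = all(abs(abs(a) - abs(b)) < 1e-9 for a, b in zip(spec_neg, rev_shifted))
print(ok)   # True
```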

Common nmrPipe Processing Functions

The following is an alphabetical list of the most common nmrPipe processing functions used in the examples.

EXT Extract Region
Extracts a region from the current dimension with limits specified by the arguments -x1 and -xn; the limits can be labeled in points, percent, Hz, or PPM. Alternatively, the left or right half of the data can be extracted with the arguments -left and -right.

FT Fourier Transform
Applies a complex forward or inverse Fourier transform, with sign alternation for first half/second half rotated data (-alt) or complex conjugation for reversed data (-neg).

HT Hilbert Transform
Performs a Hilbert transform to reconstruct imaginary data.

LP Linear Prediction Extrapolation
By default, the forward LP method is used to extend the data to twice its original size via 8 complex coefficients. The number of predicted points can be adjusted via the -pred option, and the number of LP coefficients is specified by argument -ord. Mixed forward-backward LP is performed if the -fb argument is used. Mirror-image LP for data with no acquisition delay is performed if the argument -ps0-0 is used; mirror-image LP for data with a half-dwell acquisition delay is performed if the argument -ps90-180 is used.

MEM Maximum Entropy Reconstruction with Deconvolution
Applies Maximum Entropy reconstruction according to the method of Gull and Daniell: argument -ndim specifies the number of dimensions to reconstruct, arguments -pos and -neg are used to choose between all-positive mode and two-channel mode for reconstruction of data with both positive and negative signals. Argument -sigma specifies the estimated standard deviation of the noise in the time-domain. Argument -alpha specifies the fraction of a given iterate which will be added to the current MEM spectrum. Arguments -xconv -xcQ1 ... and -yconv ... etc. specify deconvolution in the form of an nmrPipe window function such as EM (exponential multiply) or an NMRPipe-format file which contains a deconvolution kernel. Other arguments can be used to optimize convergence speed, or to increase stability for reconstruction of data with high dynamic range.

POLY Subtract a Polynomial for Baseline Correction (frequency-domain)
Applies polynomial baseline correction of the order specified by argument -ord, via an automated baseline detection method when used with argument -auto. The default is a fourth order polynomial. The automated baseline mode works as follows: a copy of a given vector is divided into a series of adjacent sections, typically 8 points wide. The average value of each section is subtracted from all points in that section, to generate a "centered" vector. The intensities of the entire centered vector are sorted, and the standard deviation of the noise is estimated under the assumption that a given fraction (typically about 30%) of the smallest intensities belong to the baseline, and that the noise is normally distributed. This noise estimate is multiplied by a constant, typically about 1.5, to yield a classification threshold. Then, each section in the centered vector is classified as baseline only if none of the points in that section exceeds the threshold. These classifications are used to correct the original vector.
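The classification scheme just described can be sketched as follows. This is an idealized Python illustration, not the actual POLY implementation: in particular, the noise estimate here omits the correction factor implied by the normal-distribution assumption, and the test vector is noise-free:

```python
import math

def baseline_sections(vec, width=8, frac=0.30, mult=1.5):
    """Classify adjacent sections of a vector as baseline (True) or not."""
    # 1. Split into adjacent sections and center each one on its mean.
    sections = [vec[i:i + width] for i in range(0, len(vec), width)]
    centered = [[p - sum(s) / len(s) for p in s] for s in sections]

    # 2. Estimate the noise from the smallest intensities, assuming they
    #    belong to the baseline.
    flat = sorted(abs(p) for s in centered for p in s)
    smallest = flat[:max(1, int(frac * len(flat)))]
    sigma = math.sqrt(sum(p * p for p in smallest) / len(smallest))

    # 3. A section is baseline only if none of its points exceed the threshold.
    thresh = mult * sigma
    return [all(abs(p) <= thresh for p in s) for s in centered]

# Idealized vector: flat baseline plus one narrow peak in the fourth section.
vec = [0.0] * 64
for i in range(26, 30):
    vec[i] = 50.0
flags = baseline_sections(vec)
print(flags.index(False))   # 3: only the peak section is flagged non-baseline
```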

POLY Subtract a Polynomial for Solvent Suppression (time-domain)
When used with the argument -time, fits all data points to a polynomial, which is then subtracted from the original data. It is intended to fit and subtract low-frequency solvent signal in the FID, a procedure which often causes less distortion than time-domain convolution methods. By default, a fourth order polynomial is used. For speed, successive averages of regions are usually fit, rather than fitting all of the data points.

PS Phase Correction
Applies the zero and first order phase corrections as specified in degrees by the arguments -p0 and -p1. PS is commonly applied with the generic nmrPipe option -di which deletes imaginary data in the current dimension after processing.

SOL Solvent Suppression by Convolution Subtraction
Suppresses solvent signal by subtracting the results of a moving average filter with a default window of +/- 16 points.
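The convolution-subtraction idea can be sketched in a few lines of Python (a hypothetical illustration, not the actual SOL implementation): each point has the average of its +/- 16-point neighborhood subtracted, so that slowly-varying solvent signal is removed:

```python
def sol_filter(fid, half_width=16):
    """Subtract a moving-average (low-pass) estimate from each point."""
    n = len(fid)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        avg = sum(fid[lo:hi]) / (hi - lo)   # local solvent estimate
        out.append(fid[i] - avg)
    return out

# A constant "solvent" offset is removed completely:
fid = [5.0] * 100
print(max(abs(v) for v in sol_filter(fid)))   # 0.0
```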

SP Sine Bell Window with Adjustable Phase
Applies a sine-bell window of the form sin^r, extending from sin^r(a*PI) to sin^r(b*PI), with offset a, endpoint b, and exponent r specified by arguments -off, -end, and -pow, and first-point scaling specified by argument -c. The default length is taken from the recorded time-domain size of the current dimension. By default, a = 0.0, b = 1.0, r = 1.0 (sine bell), and the first point scale factor is 1.0 (no scaling). In most examples, -off 0.5 is used to generate a cosine-like window which starts at height 1.0, while values for -end are usually around 0.95; settings of -end 1.0 are avoided, because this would result in a window function with the last point equal to zero. This effectively destroys information in one point of the given dimension, and also makes inverse processing problematic, since it will no longer be possible to divide data by the original window function.
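A sketch of the window shape in Python (an illustration of the formula above, not the actual SP code):

```python
import math

def sp_window(n, off=0.0, end=1.0, pw=1.0, c=1.0):
    """Sine bell: sin(pi*(off + (end - off)*k/(n - 1)))**pw, first point * c."""
    w = [math.sin(math.pi * (off + (end - off) * k / (n - 1))) ** pw
         for k in range(n)]
    w[0] *= c
    return w

# '-off 0.5 -end 0.98 -pow 2 -c 0.5': cosine-squared-like window which
# starts at height 1.0 (then scaled by 0.5) and avoids ending exactly at zero.
w = sp_window(64, off=0.5, end=0.98, pw=2.0, c=0.5)
print(round(w[0], 3))   # 0.5
print(w[-1] > 0.0)      # True: last point is small but nonzero
```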

TP 2D X/Y Transpose
Exchanges vectors from the X-axis and Y-axis of the data stream, so that the resultant data stream consists of vectors from the Y-axis of the original data.

ZF Zero Fill
Pads the data with zeros; the amount of padding can be specified by argument -zf, which defines the number of times to double the data size, or by the argument -size, which specifies the desired complex size after zero filling. By default, the data size is doubled by zero filling. Note that data should always be at least doubled by zero fill before FT, to prevent loss of information when imaginary data is deleted. The argument -auto will cause the zero-fill size to be rounded up to the nearest power of two for faster FT.

ZTP 3D X/Z Transpose
Exchanges vectors from the X-axis and Z-axis of the data stream, so that the resultant data stream consists of vectors from the Z-axis of the original data.

Conversion, Processing, Peak Detection: an example using 3D HNCO Data

Directory: pipe/demo/hnco

Pseudo-3D Analysis; Quantification of a 2D Relaxation Series

Directory: pipe/demo/relax


   nlin.tab         Results of pseudo-3D Gaussian fitting.
   sim/test%03d.ft2 Simulated Spectrum Series from pseudo-3D fitting.
   dif/test%03d.ft2 Residual Spectrum Series from pseudo-3D fitting.

   txt/fit*.txt     X/Y Tables for each evolution curve.
   mod/fit*.tab     Output tables of evolution curve fitting results.
   gnu/fit*.gnu     Gnuplot plot commands for each curve.
   plot/fit*.ps     PostScript output of the fit results.

   autoFit.com      Created by autoFit.tcl to run seriesTab and nlinLS. 
   modelExp.com     Created via modelExp.tcl to fit each evolution.
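Conceptually, the evolution-curve fitting performed via modelExp.tcl amounts to fitting a decay model to each peak's intensity series. The following Python sketch is a hypothetical stand-in, using a simple log-linear least-squares fit of I(t) = A*exp(-R*t) rather than the actual nonlinear fitting performed by the NMRPipe tools:

```python
import math

def fit_exp_decay(times, intens):
    """Fit I(t) = A*exp(-R*t) by linear least squares on ln(I)."""
    n = len(times)
    xs, ys = times, [math.log(v) for v in intens]
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    ln_a = (sy - slope * sx) / n
    return math.exp(ln_a), -slope   # amplitude A, decay rate R

# Noise-free synthetic relaxation curve: the fit recovers A and R exactly.
times = [0.01 * k for k in range(10)]
ints = [100.0 * math.exp(-12.0 * t) for t in times]
a, r = fit_exp_decay(times, ints)
print(round(a, 3), round(r, 3))   # 100.0 12.0
```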

How to adjust a peak parameter in NMRDraw

Peak Detection/Edit

This enables simple editing of the peak table in memory. There must already be a current peak table, either from "Peak Detection/Read" or "Peak Detection/Detect". When the "Edit" option is selected, the mouse can be used to insert or delete peaks, or to modify parameters. As with other mouse modes, the mouse button functions are given in the top border of nmrDraw.

NOTE! Once a peak table is edited, it should be saved using the "Write" option.

In the peak editing mode, the [Left] mouse button inserts a peak, the [Middle] mouse button can be used to adjust a peak value, and the [Right] mouse button will delete the nearest peak.

When using the middle mouse button to adjust a peak value, the value to be adjusted is the one selected for the label. By default, the peak "INDEX" number is displayed as a label. This can be changed through the "Variables" menu in the "Peak Detection" pop-up. For example, if the variable "ASS" is selected from the "Variables" menu, the existing assignments will be displayed, and these can then be modified. The "CLUSTID" value can also be displayed and modified this way.

After clicking the middle mouse button over the peak to change, use the [Delete] key to erase the existing value, then type in the new value followed by the [Return] key. Because of keyboard focus issues, you may have to move the mouse in and out of the graphics area before typing; using the -focus option of nmrDraw may help this.

The keyboard commands "[" and "]" can be used to toggle the display of peak labels on and off.

Reconstruction of Non-Uniform Sampling (NUS) Data

Directory: pipe/demo/nusdemo2d

In a conventional 2D experiment, the indirect dimension would be acquired with a set of N uniformly-spaced time increments in increasing order:

   time point = k*dT + delay
where dT is the time increment, delay is the initial acquisition delay, and k is the increment number which goes from 0 to N - 1. So, we can represent the measurement scheme for the indirect dimension as a list of N increment numbers. For a conventional data set with 512 indirect points, this would simply correspond to the list 0 1 2 3 4 ... 509 510 511.

In a non-uniform sampling scheme, one or more increments are skipped (not measured at all) during the acquisition. And, in common NUS schemes, the increments are measured in random order. So, in a typical NUS acquisition scheme, there is a sampling schedule in the form of a text file which lists the increments which were measured, in the order that they are recorded. One example of such a text file (here, hsqc.hdr_3) is shown below. In this example, a subset of 128 increments out of 512 possible increments are recorded in (pseudo) random order:

     0 511  30  20   1  26  79  12  32  41
     4  70  40  29   7  59  34  74  77  53 
     2  15  57  27  55   8  33  13  16  10 
    61  17 106  37  43 113  64  66  31  95 
     6  25  52   3  42  51  24  60  44  49 
    11  39  50 129  48  18 140  68  23  93 
   117  54   5  81  78 112 107  38   9  14 
    76  75  72  90 141 116  19  46 114 128 
    94  56 127 103  21  92  28 161 125  73 
    62  22 158  58  35 191  45 176 121  36 
   163 165  65  99 152 102  85  89  47  67 
   154 217 151 145 206  98 171  82 108 149 
   109 164 131 138  91  84  63 101 

In NMRPipe NUS schemes, sampling schedule files like the one above are used to re-shuffle and expand the original NUS time-domain data so that the measured increments are in increasing order as in conventional uniform data, with missing increments filled by zeros.
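The re-shuffling and expansion step can be sketched as follows (a hypothetical Python illustration, not the actual nusSort.tcl tool): measured increments are placed at their scheduled positions, skipped increments are filled with zeros, and a 1.0/0.0 kernel profile is built alongside:

```python
def nus_expand(schedule, measured, full_size):
    """Place measured increments at their positions; zero-fill the rest.
       Also return the sampling kernel (1.0 = measured, 0.0 = skipped)."""
    data = [0.0] * full_size
    kernel = [0.0] * full_size
    for inc, value in zip(schedule, measured):
        data[inc] = value
        kernel[inc] = 1.0
    return data, kernel

schedule = [0, 5, 2, 7]             # increments, in acquisition order
measured = [1.0, 2.0, 3.0, 4.0]     # one value per recorded increment
data, kernel = nus_expand(schedule, measured, 8)
print(data)     # [1.0, 0.0, 3.0, 0.0, 0.0, 2.0, 0.0, 4.0]
print(kernel)   # [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
```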

In addition, NMRPipe NUS schemes also create a sampling schedule kernel file (here, prof.y). This is an NMRPipe-format data file which contains 1.0 in every position where an increment was measured, and 0.0 in every position where an increment was skipped. This file can be used as input for Maximum Entropy deconvolution or Maximum Likelihood analysis:

NUS Sample Schedule Kernel
   nusSort.tcl -in test.fid -sample hsqc.hdr_3 -out sort.fid -expand -prof prof.y

   nmrPipe -in sort.fid \
   | nmrPipe -fn POLY -time \
   | nmrPipe -fn ZF -auto \
   | nmrPipe -fn FT \
   | nmrPipe -fn PS -p0 -52 -p1 0.0 \
   | nmrPipe -fn EXT -x1 10.5ppm -xn 5.5ppm -sw -verb \
   | nmrPipe -fn FT -inv \
   | nmrPipe -fn ZF -inv \
   | nmrPipe -fn MEM -sigma 200 -report 2 -ndim 2 \
             -xzf 1 -xconv EM   -xcQ1 12 \
             -yzf 1 -yconv FILE prof.y   \
             -out deco.ft2 -ov 

TALOS Demo: Chemical Shift Database Search for Phi/Psi Angles

See Also: the TALOS Web Site

Directory: pipe/demo/talos

cd pipe/demo/talos


more valpha.tab

talos.tcl -in valpha.tab

    (or, if needed, use the fast version):

vina.tcl  -in valpha.tab -ref valpha.pdb -AUTO
rama.tcl  -in valpha.tab -ref valpha.pdb -sd -ras

(exit rasmol if needed)

talos2xplor.tcl -cs valpha.tab -pdb valpha.pdb > xplorTorsion.tbl
more xplorTorsion.tbl

DC -inCS valpha.tab -outCS csCalc.tab -pdb valpha.pdb -verb
more csCalc.tab
showCS.tcl -in csCalc.tab

Refinement of an Existing Structure to HN/N Dipolar Couplings

See Also: the DYNAMO Web Site

This example was kindly provided by Prof. Jaison Jacob, then at Vanderbilt University.

In this example, an existing structure is refined against a set of HN-N dipolar couplings. The initial structure agrees to only ~7 Hz RMSD with the dipolar couplings, while the refined structure agrees to better than 1 Hz RMSD. Interestingly, the backbone of the refined structure is less than 0.3 Å RMSD from the initial structure; in this case, only a small change in the structure is needed to substantially improve the dipolar coupling agreement.

In order to use the DYNAMO structure calculation environment on a given molecular system, we first must use the tools of DYNAMO to create tables describing the covalent geometry of the molecules involved. Then we must create a PDB file with a complete set of atoms in the DYNAMO nomenclature. In this example, we start with a PDB file produced by some other molecular analysis software. So, in the first steps, we use the tools of DYNAMO to read the protein sequence information from the given PDB file, and to create a complete DYNAMO PDB file whose structure is refined to mimic the structure in the given PDB file. Then, this DYNAMO PDB file is refined using HN-N dipolar couplings.

The specific steps in the demonstration are:

cd pipe/demo/dchn


more all.com

(quit all graphs, quit rasmol)

Conventional Structure Calculation with NOEs and Dipolar Couplings

See Also: the DYNAMO Web Site

cd pipe/demo/ubiq


more init.com

(exit rasmol)
ls ubiq.gmc

(edit simpleSA.tcl ... change the initial random number "54321")

(exit rasmol)

ls ubiq.gmc

ov.tcl -r1 2 -rN 72 -ref 1ubq.pdb  -in ubiq.gmc/dyn*pdb
rasmol overlay.pdb
(exit rasmol)

set goodList = (`pdbSelect.tcl -n 5 -noe 0.1 -pdb ubiq.gmc/dyn*pdb`)
ov.tcl -r1 2 -rN 72 -ref 1ubq.pdb -in $goodList
rasmol overlay.pdb

scrollRama.tcl -pdb 1ubq.pdb $goodList

Molecular Fragment Replacement

Directory: pipe/demo/mfr

The MFR method determines elements of protein structure by finding small fragments (5-15 residues) in the PDB database whose simulated dipolar couplings and shifts match those measured for the target protein. These small homologous fragments can then be used in various ways to reconstitute larger elements of protein structure. This demo uses several types of dipolar couplings measured in two alignment media, which allows fragments to be assembled into larger structures of 10-50 residues or more.

The steps in this demo are:

cd pipe/demo/mfr


more dObsA.tab

more ext.com
(exit rasmol)

mfr.tcl -excl 1ubq -csThresh 2.5

more mfr.tab
plotTab.tcl -in mfr.tab -x A_DA -y A_DR
plotTab.tcl -in mfr.tab -x D_RES1 -y A_DA B_DA -yMax 0.0

rasmol init.pdb
scrollRama.tcl -pdb ref.pdb init.pdb -mfr mfr.tab