NMRPipe Files and Examples for EMBO NMR Course 2009
[See Also: The Big NMRPipe Reference Page]
NMRPipe Files for EMBO NMR Course 2009
pipedemo.tar: Demo Data Directory (pipe/demo) (September 2, 2009; 933,314,560 bytes). Windows Internet Explorer users: right-click the link and choose "Save Target As".
EMBO Practicum
The NMRPipe installation can be found under the pipe
directory of your EMBO computer account. You are encouraged to
copy the software and data for your own use.
pipe/tar
pipe/nmr
pipe/demo/hnco
pipe/demo/relax
pipe/demo/jmod
pipe/demo/nusdemo2d
pipe/demo/valpha/cbcaconh
pipe/demo/valpha/prime
pipe/demo/talos
pipe/demo/ubiq
pipe/demo/dchn
pipe/demo/mfr
pipe/demo/titr
pipe/demo/apf
NMRPipe: Introduction
NMRPipe began as a spectral processing engine, emphasizing multidimensional NMR applications. The use of NMRPipe is noted in roughly 50% of the NMR structures accepted into the Protein Data Bank (PDB) since 2000. Over the years, NMRPipe has been augmented as part of a plan to provide a set of tools under a common framework for all aspects of biomolecular NMR. The key philosophy is a bottom-up approach to software and application development, where simpler components are combined using standard scripting techniques (here, UNIX C-Shell and TCL) to achieve complex goals. An early focus of the software was flexibility, since protein NMR methods were rapidly changing and expanding, typical protein structure calculation projects took months or even years, and no completely standard protocol was used. Now, computers are fast enough to process 3D spectra in seconds, experimental methods for high-throughput NMR structure determination are available, and NMR structural biology is practiced by those who might not be completely familiar with the details of multidimensional signal processing. Also, 1D and 2D spectral series analysis is now a common tool for protein-ligand screening and characterization. In response, our current software development emphasizes automation, batch processing, and spectral series analysis.
A complete list of all programs, functions, and scripts in the NMRPipe system is given in The Big NMRPipe Reference Page. Also, nearly every program and script can be invoked with the -help argument to list the command-line arguments it accepts. If the help text is very long, it can be displayed a page at a time via a UNIX pipeline (in this case, redirecting standard error to the page-by-page viewing command more):

nmrPipe -help |& more

In the case of the nmrPipe program itself, help text for a specific function can be displayed by including the -fn option with the -help flag, for example:
nmrPipe -help -fn FT |& more
Since it is script-based, NMRPipe is highly customizable. The following is a list of some things that the software can do; the sample data and tutorials for this NMR course will include examples of most of these facilities:
Special applications developed using NMRPipe include:
Some General Tips About Spectral Processing
A delay in the acquisition introduces a first order phase distortion. If there is no delay, no first order phase correction is required. Each delay of 1 dwell (1 point) introduces an additional 360° of first order phase correction.
Many current NMR pulse sequences are designed so that there is no delay in the directly acquired dimension. So, when phasing a spectrum interactively, it's best to try phasing with P0 only first.
Depending on the delay, the first point of the FID should be adjusted before the Fourier transform. The first-point scaling factor is selected by the window function argument "-c".
If the required first order phase P1 for the given dimension is 0.0, the first point scaling factor should be 0.5. This is because the discrete Fourier transform does the equivalent of counting the point at t=0 twice. If the first point is not scaled properly in this case, ridge-like baseline offsets in the spectrum will result.
In all other cases (P1 is not zero), this scale factor should be 1.0. This is because the first point of the FID no longer corresponds to t=0, and so it shouldn't be scaled.
If the scale factor is not set correctly, it will introduce a baseline distortion which is either zero-order or sinusoidal, depending on what first-order phase is applied.
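This effect can be illustrated with a small stdlib-Python sketch (an illustration only, not NMRPipe code): halving the first time-domain point before a discrete Fourier transform changes every spectral point by the same constant, which is exactly the ridge-like offset described above.

```python
import cmath

def dft(x):
    """Plain discrete Fourier transform (no FFT, for clarity)."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * n * k / N) for k in range(N))
            for n in range(N)]

N = 64
# Toy FID: a single decaying complex signal with x[0] = 1
fid = [cmath.exp((-0.05 + 0.5j) * k) for k in range(N)]

spec_raw    = dft(fid)                        # first point at full weight
spec_scaled = dft([0.5 * fid[0]] + fid[1:])   # first point scaled by 0.5

# The two spectra differ by the same constant (fid[0]/2) at every frequency,
# i.e. an unscaled first point adds a flat offset across the whole spectrum.
offsets = [a - b for a, b in zip(spec_raw, spec_scaled)]
```

Because the offset is the same at every frequency, skipping the 0.5 scaling when P1 = 0 produces a uniform baseline shift rather than a localized artifact.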
When possible, it is best to set up experiments with either exactly 0, 1/2, or 1-point delay. There are several reasons:
Data with P1 = 360 have the first point t=0 missing (i.e. 1 point delay). Since the first point of the FID corresponds to the sum of points in the corresponding spectrum, this missing first point can be "restored" by adding a constant to the phased spectrum. This can be done conveniently by automated zero-order baseline correction, as shown below:
Summary
Delay: 0 point; P1 = 0; FID: scale with -c 0.5
Delay: 1/2 point; P1 = 180; FID: scale with -c 1.0; Spectrum: folded peaks have opposite sign
Delay: 1 point; P1 = 360; FID: scale with -c 1.0; Spectrum: use "POLY -auto -ord 0" after FT and PS
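The "restoration" of a missing first point can be checked with a short stdlib-Python sketch (an illustration, not NMRPipe code): zeroing the t=0 point of the FID shifts every point of the DFT spectrum by the same constant, which is why a zero-order baseline correction recovers the correct spectrum in the P1 = 360 case.

```python
import cmath

def dft(x):
    """Plain discrete Fourier transform (no FFT, for clarity)."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * n * k / N) for k in range(N))
            for n in range(N)]

N = 32
fid = [cmath.exp((-0.1 + 0.9j) * k) for k in range(N)]   # toy FID, fid[0] = 1
missing_first = [0.0] + fid[1:]                          # 1-point delay: t = 0 lost

# Every spectral point is off by the same constant, fid[0]; adding a constant
# back (zero-order baseline correction) therefore restores the spectrum.
diff = [a - b for a, b in zip(dft(fid), dft(missing_first))]
```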
As a general rule, each dimension in the time domain should always be at least doubled in size by zero filling before the Fourier transform. If this is not done, the real part of the transformed result will not contain all of the information in the original complex time-domain data, so information will be lost when the imaginary data is deleted during the usual transform schemes. So, in almost every case, data can be suitably processed by Zero Fill in auto mode:
| nmrPipe -fn ZF -auto \
With no other arguments, this will double the size of the data by zero filling,
and continue to zero fill if needed so that the final result
has a power-of-two number of points. The power-of-two size
is not a requirement, but it will usually make Fourier
processing faster.
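The information argument can be made concrete with a stdlib-Python sketch (illustration only): after zero filling N complex points to 2N, the real part of the transform still determines the original points x[1] ... x[N-1], so discarding the imaginaries loses nothing (apart from the imaginary part of x[0]).

```python
import cmath

def dft(x, sign=-1):
    """DFT (sign=-1) or unnormalized inverse DFT (sign=+1)."""
    N = len(x)
    return [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * n * k / N) for k in range(N))
            for n in range(N)]

N = 16
fid = [cmath.exp((-0.08 + 0.7j) * k) for k in range(N)]
padded = fid + [0.0] * N                   # zero fill: N complex points -> 2N

real_spec = [z.real for z in dft(padded)]  # keep only the real part

# Invert using only the real part; each recovered point comes out halved,
# so multiply by 2 (and normalize the inverse DFT by 1/(2N)).
back = dft(real_spec, sign=+1)
recovered = [2.0 * z / (2 * N) for z in back]
# recovered[k] matches fid[k] for k = 1 .. N-1
```

Without the zero fill, the discarded imaginary part would carry information that the real part alone cannot reproduce.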
The default automated baseline correction (POLY -auto) should usually only be used when at least two dimensions of the data have been transformed, so that there is as much empty baseline as possible for automated detection. So, it is common practice to create a processing scheme like this, which processes the first two dimensions, then transposes the data to correct the first dimension:
nmrPipe -in fid/test001.fid \
| nmrPipe -fn POLY -time \
| nmrPipe -fn SP -off 0.5 -end 0.98 -pow 2 -c 0.5 \
| nmrPipe -fn ZF -auto \
| nmrPipe -fn FT -verb \
| nmrPipe -fn PS -p0 43 -p1 0.0 -di \
| nmrPipe -fn EXT -left -sw \
| nmrPipe -fn TP \
| nmrPipe -fn SP -off 0.5 -end 0.98 -pow 1 -c 1.0 \
| nmrPipe -fn ZF -auto \
| nmrPipe -fn FT -verb \
| nmrPipe -fn PS -p0 -135 -p1 180 -di \
| nmrPipe -fn TP \
| nmrPipe -fn POLY -auto \
  -verb -ov -out test.ft2

Zero-order automated baseline correction (POLY -auto -ord 0) is commonly applied to dimensions that have a missing first point (P1 = 360).
Automated baseline correction should only be applied as needed.
In most cases, only the directly-detected dimension, or dimensions with one-dwell delay (P1=360) need a baseline correction.
NOTE WELL that cases for P1=0, P1=180, and P1=360 can all be handled by the proper combination of first point scaling and zero-order baseline correction. All other cases have the potential to introduce more difficult baseline distortions, as well as phase distortions for folded peaks. For these reasons, great care should be taken to set the acquisition delays to 0, 1/2, or 1 point whenever possible. This also eliminates the need to choose phase correction values manually, which can be very difficult with some spectra.
In the case of digital oversampled data from Bruker instruments, the oversampling correction can result in baseline distortions which are especially problematic for homonuclear 2D and 3D cases or for many 1D 1H spectra. By default, oversampling correction is performed during conversion rather than during processing; when applied in this way, the correction is not ideal, although it has the convenience of creating a result which is ordinary time-domain data, with no artificial leading points. So, in many cases, baselines can be improved when needed by converting the data with the option Digital Oversampling Correction: During Processing selected in the bruker graphical conversion interface.
NOTE WELL that "FT -neg" is NOT exactly the same as simply reversing the spectrum with "REV". "FT -neg" is equivalent to REV followed by a one-point circular shift. So, if REV is applied instead, the PPM calibration will be off by one point.
Of course, the "-neg" or "-alt" arguments should not be used if the data are not reversed or left/right swapped.
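The one-point difference between "FT -neg" and a plain reversal can be verified numerically with a stdlib-Python sketch (illustration only): the transform of the conjugated FID lines up, point for point, with the reversed spectrum only after a one-point circular shift.

```python
import cmath

def dft(x):
    """Plain discrete Fourier transform (no FFT, for clarity)."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * n * k / N) for k in range(N))
            for n in range(N)]

N = 32
fid = [cmath.exp((-0.06 + 1.1j) * k) for k in range(N)]

spec_neg = dft([z.conjugate() for z in fid])   # conjugate before FT ("-neg" idea)
spec     = dft(fid)

rev     = spec[::-1]                # plain reversal (REV)
shifted = [rev[-1]] + rev[:-1]      # REV plus a one-point circular shift

# Point for point, the magnitudes now agree; with REV alone they would be
# misaligned by one point, shifting the PPM calibration.
mismatch = max(abs(abs(a) - abs(b)) for a, b in zip(spec_neg, shifted))
```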
Common nmrPipe Processing Functions
The following is an alphabetical list of the most common nmrPipe processing functions used in the examples.
EXT Extract Region
Extracts a region from the current dimension with limits
specified by the arguments -x1 and -xn; the limits can be
labeled in points, percent, Hz, or PPM. Alternatively, the left or right
half of the data can be extracted with the arguments -left and -right.
FT Fourier Transform
Applies a complex forward or inverse Fourier transform,
with sign alternation for first half/second half rotated data (-alt)
or complex conjugation for reversed data (-neg).
HT Hilbert Transform
Performs a Hilbert transform to reconstruct imaginary data.
LP Linear Prediction Extrapolation
By default uses forward LP method to extend the data to twice its
original size via 8 complex coefficients.
The number of predicted points can be adjusted via the -pred option,
and the number of LP coefficients is specified by argument -ord.
Mixed forward-backward LP is performed if the -fb argument is used.
Mirror-image LP for data with no acquisition delay is performed if the argument
-ps0-0 is used; mirror-image LP for data with a half-dwell acquisition
delay is performed if the argument -ps90-180 is used.
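The idea behind LP extrapolation can be sketched in a few lines of stdlib Python (a deliberately minimal order-1 illustration; NMRPipe's LP uses 8 complex coefficients by default and far more robust coefficient estimation): for a single decaying signal, each point is a fixed complex multiple of the previous one, so estimating that multiplier lets the data be extended.

```python
import cmath

N = 32
fid = [cmath.exp((-0.04 + 0.6j) * k) for k in range(N)]

# Order-1 forward LP: model x[k+1] = c * x[k]; estimate c from adjacent points
c = sum(fid[k + 1] / fid[k] for k in range(N - 1)) / (N - 1)

# Extend to twice the original size using the predicted coefficient
extended = list(fid)
for _ in range(N):
    extended.append(c * extended[-1])

# For this noiseless toy signal, the extrapolation matches the true continuation
true_tail = [cmath.exp((-0.04 + 0.6j) * k) for k in range(N, 2 * N)]
err = max(abs(a - b) for a, b in zip(extended[N:], true_tail))
```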
MEM Maximum Entropy Reconstruction with Deconvolution
Applies Maximum Entropy reconstruction according to the method
of Gull and Daniell: argument -ndim specifies the number of
dimensions to reconstruct, arguments -pos and -neg are used to choose between all-positive mode and two-channel mode for reconstruction of
data with both positive and negative signals.
Argument -sigma specifies the estimated standard deviation of the noise in the time-domain.
Argument -alpha specifies the fraction of a given iterate
which will be added to the current MEM spectrum.
Arguments -xconv -xcQ1 ... and -yconv ... etc. specify
deconvolution in
the form of an nmrPipe window function such as EM (exponential
multiply) or an NMRPipe-format file which contains a deconvolution kernel.
Other arguments can be
used to optimize convergence speed, or to increase stability for reconstruction
of data with high dynamic range.
POLY Subtract a Polynomial for Baseline Correction
(frequency-domain)
Applies polynomial baseline correction of
the order specified by argument -ord, via an automated baseline
detection method when used with argument -auto. The default is a fourth
order polynomial. The automated baseline mode works as follows: a copy of a
given vector is divided into a series of adjacent sections, typically 8 points
wide. The average value of each section is subtracted from all points in that
section, to generate a "centered" vector. The intensities of the
entire centered vector are sorted, and the standard deviation of the noise is
estimated under the assumption that a given fraction (typically about 30%) of
the smallest intensities belong to the baseline, and that the noise is normally
distributed. This noise estimate is multiplied by a constant, typically about
1.5, to yield a classification threshold. Then, each section in the centered
vector is classified as baseline only if none of the points in that section
exceeds the threshold. These classifications are used to correct the original
vector.
POLY Subtract a Polynomial for Solvent Suppression
(time-domain)
When used with the argument -time, fits
all data points to a polynomial, which is then subtracted from the original
data. It is intended to fit and subtract low-frequency solvent signal in the
FID, a procedure which often causes less distortion than time-domain convolution
methods. By default, a fourth order polynomial is used. For speed, successive
averages of regions are usually fit, rather than fitting all of the data points.
PS Phase Correction
Applies the zero and first order phase corrections as specified
in degrees by the arguments -p0 and -p1.
PS is commonly applied with the generic nmrPipe option -di
which deletes imaginary data in the current dimension after processing.
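What the zero- and first-order corrections do can be sketched in stdlib Python (illustration only; conventions for exactly how p1 spans the vector vary, and here it simply runs linearly from 0 at the first point to p1 at the last):

```python
import cmath, math

def phase(spec, p0, p1):
    """Apply zero/first-order phase correction; p0 and p1 are in degrees."""
    N = len(spec)
    return [z * cmath.exp(1j * math.radians(p0 + p1 * n / (N - 1)))
            for n, z in enumerate(spec)]

# p0 rotates every point by the same angle: 90 degrees turns real into imaginary
rotated = phase([1.0 + 0j, 1.0 + 0j], p0=90.0, p1=0.0)
```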
SOL Solvent Suppression by Convolution Subtraction
Suppresses solvent signal by subtracting the results of a moving average
filter with a default window of +/- 16 points.
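A minimal stdlib-Python sketch of convolution subtraction (illustration only, with simple edge clamping; NMRPipe's SOL has additional filter shapes and options): subtract a +/- 16-point moving average, which removes slowly varying (solvent-like) components while leaving fast oscillations largely intact.

```python
def sol_filter(x, half=16):
    """Subtract a moving average with window +/- half points (clamped at edges)."""
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        avg = sum(x[lo:hi]) / (hi - lo)
        out.append(x[i] - avg)
    return out

# A slowly varying "solvent" component (here, a constant) is removed completely
residual = sol_filter([5.0] * 64)
```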
SP Sine Bell Window with Adjustable Phase
Applies a sine-bell window extending from sin^r(a*PI) to sin^r(b*PI),
with offset a, endpoint b, and exponent r specified by arguments -off,
-end, and -pow, and first-point scaling specified by argument
-c. The default length is taken from the recorded time-domain size of
the current dimension. By default, a = 0.0, b = 1.0, r = 1.0 (sine bell), and
the first point scale factor is 1.0 (no scaling). In most examples,
-off 0.5 is used to generate a cosine-like window which starts
at height 1.0, while values for -end are usually around 0.95; settings
of -end 1.0 are avoided, because this would result in a window function
with the last point equal to zero. This effectively destroys information
in one point of the given dimension, and also makes inverse processing
problematic, since it will no longer be possible to divide data by the
original window function.
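The window itself is easy to reproduce in stdlib Python (a sketch based on the description above, not the NMRPipe source):

```python
import math

def sine_bell(size, off=0.5, end=0.98, power=2, c=0.5):
    """Adjustable sine bell: sin(pi*off ... pi*end) raised to 'power',
    with the first point scaled by c."""
    w = [math.sin(math.pi * (off + (end - off) * k / (size - 1))) ** power
         for k in range(size)]
    w[0] *= c
    return w

# With -off 0.5 the window starts at 1.0 (before first-point scaling);
# with -end 1.0 the last point would be exactly zero, destroying information.
w     = sine_bell(64, off=0.5, end=0.98, power=2, c=0.5)
w_bad = sine_bell(64, off=0.5, end=1.0, power=1, c=1.0)
```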
TP 2D X/Y Transpose
Exchanges vectors from the X-axis and Y-axis of the data stream,
so that the resultant data stream consists of vectors from the Y-axis of the
original data.
ZF Zero Fill
Pads the data with zeros; the amount of padding can be specified
by argument -zf, which defines the number of times to double the data
size, or by the argument -size, which specifies the desired complex size
after zero filling. By default, the data size is doubled by zero filling.
Note that data should always be at least doubled by zero fill before FT,
to prevent loss of information when imaginary data is deleted.
The argument -auto will cause the zero-fill size to be rounded up to
the nearest power of two for faster FT.
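The size chosen by ZF -auto can be sketched as follows (stdlib Python, based on the behavior described above):

```python
def zf_auto_size(n):
    """At least double the size, then round up to the next power of two."""
    target = 2 * n
    size = 1
    while size < target:
        size *= 2
    return size

# e.g. 180 complex points -> doubled to 360 -> rounded up to 512
```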
ZTP 3D X/Z Transpose
Exchanges vectors from the X-axis and Z-axis of the data stream,
so that the resultant data stream consists of vectors from the Z-axis of the
original data.
Conversion, Processing, Peak Detection: An Example Using 3D HNCO Data
Directory: pipe/demo/hnco
Pseudo-3D Analysis; Quantification of a 2D Relaxation Series
Directory: pipe/demo/relax
sethdr ft/test001.dat -tau 8
Output
nlin.tab: Results of pseudo-3D Gaussian fitting.
sim/test%03d.ft2: Simulated spectrum series from pseudo-3D fitting.
dif/test%03d.ft2: Residual spectrum series from pseudo-3D fitting.
txt/fit*.txt: X/Y tables for each evolution curve.
mod/fit*.tab: Output tables of evolution-curve fitting results.
gnu/fit*.gnu: Gnuplot plot commands for each curve.
plot/fit*.ps: PostScript output of the fit results.
autoFit.com: Created by autoFit.tcl to run seriesTab and nlinLS.
modelExp.com: Created via modelExp.tcl to fit each evolution.
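The evolution-curve fitting performed for a relaxation series can be sketched in stdlib Python (an illustration with made-up delays and a noiseless exponential, not the actual NMRPipe/modelExp.tcl fitting code):

```python
import math

taus = [8.0, 16.0, 32.0, 64.0, 128.0]       # hypothetical relaxation delays
R_true, I0_true = 0.02, 1000.0
heights = [I0_true * math.exp(-R_true * t) for t in taus]   # toy peak heights

# Linearize ln(I) = ln(I0) - R * tau, then ordinary least squares for the line
ys = [math.log(v) for v in heights]
n, sx, sy = len(taus), sum(taus), sum(ys)
sxx = sum(t * t for t in taus)
sxy = sum(t * y for t, y in zip(taus, ys))
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
R_fit = -slope
I0_fit = math.exp((sy - slope * sx) / n)
```

Real data would be fit with a nonlinear least-squares model of the exponential; the log-linear form here is just the simplest self-contained version of the idea.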
How to adjust a peak parameter in NMRDraw
Peak Detection/Edit
This enables simple editing of the peak table in memory. There must already be a current peak table, either from "Peak Detection/Read" or "Peak Detection/Detect". When the "Edit" option is selected, the mouse can be used to insert or delete peaks, or to modify parameters. As with other mouse modes, the mouse button functions are given in the top border of nmrDraw.
NOTE! Once a peak table is edited, it should be saved using the "Write" option.
In the peak editing mode, the [Left] mouse button inserts a peak, the [Middle] mouse button can be used to adjust a peak value, and the [Right] mouse button will delete the nearest peak.
When using the middle mouse button to adjust a peak value, the value to be adjusted is the one selected for the label. By default, the peak "INDEX" number is displayed as a label. This can be changed through the "Variables" menu in the "Peak Detection" pop-up. For example, if the variable "ASS" is selected from the "Variables" menu, the existing assignments will be displayed, and these can then be modified. The "CLUSTID" value can also be displayed and modified this way.
After clicking the middle mouse button over the peak to change, type the new value for the selected variable.
The keyboard commands "[" and "]" can be used
to toggle the display of peak labels on and off.
Reconstruction of Non-Uniform Sampling (NUS) Data
Directory: pipe/demo/nusdemo2d
In a conventional 2D experiment, the indirect dimension
would be acquired with a set of N uniformly-spaced time increments
in increasing order:
In a non-uniform sampling scheme, one or more increments are skipped
(not measured at all) during the acquisition. And, in common NUS schemes,
the increments are measured in random order. So, in a typical NUS
acquisition scheme, there is a sampling schedule in
the form of a text file which lists the increments
which were measured, in the order that they are recorded. One
example of such a text file (here, hsqc.hdr_3) is shown below.
In this example, a subset
of 128 increments out of 512 possible increments are recorded
in (pseudo) random order:
In NMRPipe NUS schemes, sampling schedule files like the one above
are used to re-shuffle and expand the original NUS time-domain data so that
the measured increments are in increasing order as in conventional
uniform data, with missing increments filled by zeros.
In addition, NMRPipe NUS schemes also create a sampling schedule
kernel file (here, prof.y). This is an NMRPipe-format
data file which contains
1.0 in every position where an increment was measured, and 0.0 in
every position where an increment was skipped. This file can
be used as input for Maximum Entropy deconvolution or Maximum
Likelihood analysis:
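The re-shuffling and kernel construction described above can be sketched in stdlib Python (a toy schedule and toy data for illustration; the real schedule is a text file like hsqc.hdr_3):

```python
schedule = [0, 5, 2, 9, 7]                   # measured increments, acquisition order
measured = [10.0, 11.0, 12.0, 13.0, 14.0]    # toy data: one value per increment
N = 12                                       # full uniform grid size after expansion

expanded = [0.0] * N                         # data re-sorted into increasing order,
kernel   = [0.0] * N                         # missing increments filled with zeros;
for idx, val in zip(schedule, measured):     # kernel: 1.0 where measured, else 0.0
    expanded[idx] = val
    kernel[idx] = 1.0
```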
TALOS Demo: Chemical Shift Database Search for Phi/Psi Angles
See Also: the TALOS Web Site
Directory: pipe/demo/talos
Refinement of an Existing Structure to HN/N Dipolar Couplings
See Also: the DYNAMO Web Site
This example was kindly provided by Prof. Jaison Jacob, then
at Vanderbilt University.
In this example, an existing structure is refined against a set of HN-N
dipolar couplings. The initial structure agrees with the dipolar couplings
to only ~7 Hz RMSD, but the refined structure agrees to better than 1 Hz RMSD.
Interestingly, the backbone of the refined structure is less than 0.3 Å RMSD
from the initial structure; in this case, only a small change in the
structure is needed to substantially improve the dipolar coupling agreement.
In order to use the DYNAMO structure calculation environment on a given
molecular system, we first must use the tools of DYNAMO to create tables
describing the covalent geometry of the molecules involved.
Then we must create a PDB file with a complete set of atoms in the
DYNAMO nomenclature. In this example, we start with a PDB file
produced by some other molecular analysis software. So, in the first
steps, we use the tools of DYNAMO to read the protein sequence
information from the given PDB file, and to create a complete
DYNAMO PDB file whose structure is refined to mimic the structure
in the given PDB file. Then, this DYNAMO PDB file is refined
using HN-N dipolar couplings.
The specific steps in the demonstration are:
Conventional Structure Calculation with NOEs and Dipolar Couplings
See Also: the DYNAMO Web Site
Molecular Fragment Replacement
Directory: pipe/demo/mfr
The MFR method determines elements of protein structure by finding
small fragments (5-15 residues) in the PDB database whose simulated
dipolar couplings and shifts match those measured for the target
protein. These small homologous fragments can then be used in various ways
to reconstitute larger elements of protein structure.
This demo uses several types of dipolar couplings measured in two
alignment media, which allows fragments to be assembled into larger
structures of 10-50 residues or more.
The steps in this demo are:
time point = k*dT + delay
where dT is the time increment, delay is the
initial acquisition delay, and k is the increment number
which goes from 0 to N - 1. So, we can represent
the measurement scheme for the indirect dimension as a
list of N increment numbers.
For a conventional data set with 512 indirect points,
this would simply correspond
to the list 0 1 2 3 4 ... 509 510 511.
0 511 30 20 1 26 79 12 32 41
4 70 40 29 7 59 34 74 77 53
2 15 57 27 55 8 33 13 16 10
61 17 106 37 43 113 64 66 31 95
6 25 52 3 42 51 24 60 44 49
11 39 50 129 48 18 140 68 23 93
117 54 5 81 78 112 107 38 9 14
76 75 72 90 141 116 19 46 114 128
94 56 127 103 21 92 28 161 125 73
62 22 158 58 35 191 45 176 121 36
163 165 65 99 152 102 85 89 47 67
154 217 151 145 206 98 171 82 108 149
109 164 131 138 91 84 63 101
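In stdlib Python, the time point for each listed increment follows directly from the formula above (the sweep width and delay here are hypothetical values for illustration):

```python
dwell = 1.0 / 2000.0               # dT = 1/SW for a hypothetical 2000 Hz sweep width
delay = 0.0                        # no acquisition delay (the P1 = 0 case)
increments = [0, 511, 30, 20, 1]   # first entries of the example schedule

# time point = k*dT + delay for each increment number k
times = [k * dwell + delay for k in increments]
```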
nusSort.tcl -in test.fid -sample hsqc.hdr_3 -out sort.fid -expand -prof prof.y
nmrPipe -in sort.fid \
| nmrPipe -fn POLY -time \
| nmrPipe -fn ZF -auto \
| nmrPipe -fn FT \
| nmrPipe -fn PS -p0 -52 -p1 0.0 \
| nmrPipe -fn EXT -x1 10.5ppm -xn 5.5ppm -sw -verb \
| nmrPipe -fn FT -inv \
| nmrPipe -fn ZF -inv \
| nmrPipe -fn MEM -sigma 200 -report 2 -ndim 2 \
-xzf 1 -xconv EM -xcQ1 12 \
-yzf 1 -yconv FILE prof.y \
-out deco.ft2 -ov
cd
cd pipe/demo/talos
clean.com
more valpha.tab
talos.tcl -in valpha.tab
(or, if needed, use the fast version):
vina.tcl -in valpha.tab -ref valpha.pdb -AUTO
rama.tcl -in valpha.tab -ref valpha.pdb -sd -ras
(exit rasmol if needed)
talos2xplor.tcl -cs valpha.tab -pdb valpha.pdb > xplorTorsion.tbl
more xplorTorsion.tbl
DC -inCS valpha.tab -outCS csCalc.tab -pdb valpha.pdb -verb
more csCalc.tab
showCS.tcl -in csCalc.tab
cd
cd pipe/demo/dchn
clean.com
more all.com
all.com
(quit all graphs, quit rasmol)
cd
cd pipe/demo/ubiq
clean.com
more README
more init.com
init.com
(exit rasmol)
ls ubiq.gmc
(edit simpleSA.tcl ... change the initial random number "54321")
simpleSA.tcl
(exit rasmol)
ls ubiq.gmc
ov.tcl -r1 2 -rN 72 -ref 1ubq.pdb -in ubiq.gmc/dyn*pdb
rasmol overlay.pdb
(exit rasmol)
set goodList = (`pdbSelect.tcl -n 5 -noe 0.1 -pdb ubiq.gmc/dyn*pdb`)
ov.tcl -r1 2 -rN 72 -ref 1ubq.pdb -in $goodList
rasmol overlay.pdb
scrollRama.tcl -pdb 1ubq.pdb $goodList
cd
cd pipe/demo/mfr
clean.com
ls
more README
more dObsA.tab
more ext.com
ext.com
(exit rasmol)
mfr.tcl -excl 1ubq -csThresh 2.5
more mfr.tab
plotTab.tcl -in mfr.tab -x A_DA -y A_DR
plotTab.tcl -in mfr.tab -x D_RES1 -y A_DA B_DA -yMax 0.0
mfr2init.tcl
rasmol init.pdb
scrollRama.tcl -pdb ref.pdb init.pdb -mfr mfr.tab