NMRPipe Files and Examples for EMBO NMR Course 2009
[See Also: The Big NMRPipe Reference Page]
NMRPipe Files for EMBO NMR Course 2009
pipedemo.tar: Demo Data Directory (pipe/demo) (September 2, 2009; 933,314,560 bytes). Windows Internet Explorer users: right-click the link and choose "Save Target As".
EMBO Practicum
The NMRPipe installation can be found under the pipe
directory of your EMBO computer account. You are encouraged to
copy the software and data for your own use.
pipe/tar
pipe/nmr
pipe/demo/hnco
pipe/demo/relax
pipe/demo/jmod
pipe/demo/nusdemo2d
pipe/demo/valpha/cbcaconh
pipe/demo/valpha/prime
pipe/demo/talos
pipe/demo/ubiq
pipe/demo/dchn
pipe/demo/mfr
pipe/demo/titr
pipe/demo/apf
NMRPipe: Introduction
NMRPipe began as a spectral processing engine, emphasizing multidimensional NMR applications. The use of NMRPipe is noted in roughly 50% of the NMR structures accepted into the Protein Data Bank (PDB) since 2000. Over the years, NMRPipe has been augmented as part of a plan to provide a set of tools under a common framework for all aspects of biomolecular NMR. The key philosophy is a bottom-up approach to software and application development, where simpler components are combined using standard scripting techniques (here, UNIX C-Shell and TCL) to achieve complex goals. An early focus of the software was flexibility, since protein NMR methods were rapidly changing and expanding, typical protein structure calculation projects took months or even years, and no completely standard protocol was used. Now, computers are fast enough to process 3D spectra in seconds, experimental methods for high-throughput NMR structure determination are available, and NMR structural biology is practiced by those who might not be completely familiar with the details of multidimensional signal processing. Also, 1D and 2D spectral series analysis is now a common tool for protein-ligand screening and characterization. In response, our current software development emphasizes automation, batch processing, and spectral series analysis.
A complete list of all programs, functions, and scripts in the NMRPipe system is given in The Big NMRPipe Reference Page. Also, nearly every program and script can be invoked with the -help argument to list the command-line arguments it accepts. If the help text is very long, it can be displayed a page at a time via a UNIX pipeline (in this case, redirecting standard error to the page-by-page viewing command more):

nmrPipe -help |& more

In the case of the nmrPipe program itself, help text for a specific function can be displayed by including the -fn option with the -help flag, for example:
nmrPipe -help -fn FT |& more
Since it is script-based, NMRPipe is highly customizable. The following is a list of some things that the software can do; the sample data and tutorials for this NMR course will include examples of most of these facilities:
Special applications developed using NMRPipe include:
Some General Tips About Spectral Processing
A delay in the acquisition introduces a first order phase distortion. If there is no delay, no first order phase correction is required. Each delay of 1 dwell (1 point) introduces an additional 360° of first order phase correction.
Many current NMR pulse sequences are designed so that there is no delay in the directly acquired dimension. So, when phasing a spectrum interactively, it's best to try phasing with P0 only first.
Depending on the delay, the first point of the FID should be adjusted before the Fourier transform. The first-point scaling factor is selected by the window function argument "-c".
If the required first order phase P1 for the given dimension is 0.0, the first point scaling factor should be 0.5. This is because the discrete Fourier transform does the equivalent of counting the point at t=0 twice. If the first point is not scaled properly in this case, ridge-like baseline offsets in the spectrum will result.
In all other cases (P1 is not zero), this scale factor should be 1.0. This is because the first point of the FID no longer corresponds to t=0, and so it shouldn't be scaled.
If the scale factor is not set correctly, it will introduce a baseline distortion which is either zero-order or sinusoidal, depending on what first-order phase is applied.
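This effect can be illustrated with a small stdlib-Python sketch (an illustration only, not NMRPipe code): halving the first time-domain point before a discrete Fourier transform changes every spectral point by the same constant, which is exactly the ridge-like offset described above.

```python
import cmath

def dft(x):
    """Plain discrete Fourier transform (no FFT, for clarity)."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * n * k / N) for k in range(N))
            for n in range(N)]

N = 64
# Toy FID: a single decaying complex signal with x[0] = 1
fid = [cmath.exp((-0.05 + 0.5j) * k) for k in range(N)]

spec_raw    = dft(fid)                        # first point at full weight
spec_scaled = dft([0.5 * fid[0]] + fid[1:])   # first point scaled by 0.5

# The two spectra differ by the same constant (fid[0]/2) at every frequency,
# i.e. an unscaled first point adds a flat offset across the whole spectrum.
offsets = [a - b for a, b in zip(spec_raw, spec_scaled)]
```

Because the offset is the same at every frequency, skipping the 0.5 scaling when P1 = 0 produces a uniform baseline shift rather than a localized artifact.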
When possible, it is best to set up experiments with either exactly 0, 1/2, or 1-point delay. There are several reasons:
Data with P1 = 360 have the first point t=0 missing (i.e. 1 point delay). Since the first point of the FID corresponds to the sum of points in the corresponding spectrum, this missing first point can be "restored" by adding a constant to the phased spectrum. This can be done conveniently by automated zero-order baseline correction, as shown below:
Summary
Delay: 0 point; P1 = 0; FID: scale with -c 0.5
Delay: 1/2 point; P1 = 180; FID: scale with -c 1.0; Spectrum: folded peaks have opposite sign
Delay: 1 point; P1 = 360; FID: scale with -c 1.0; Spectrum: use "POLY -auto -ord 0" after FT and PS
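The "restoration" of a missing first point can be checked with a short stdlib-Python sketch (an illustration, not NMRPipe code): zeroing the t=0 point of the FID shifts every point of the DFT spectrum by the same constant, which is why a zero-order baseline correction recovers the correct spectrum in the P1 = 360 case.

```python
import cmath

def dft(x):
    """Plain discrete Fourier transform (no FFT, for clarity)."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * n * k / N) for k in range(N))
            for n in range(N)]

N = 32
fid = [cmath.exp((-0.1 + 0.9j) * k) for k in range(N)]   # toy FID, fid[0] = 1
missing_first = [0.0] + fid[1:]                          # 1-point delay: t = 0 lost

# Every spectral point is off by the same constant, fid[0]; adding a constant
# back (zero-order baseline correction) therefore restores the spectrum.
diff = [a - b for a, b in zip(dft(fid), dft(missing_first))]
```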
As a general rule, each dimension in the time domain should always be at least doubled in size by zero filling before the Fourier transform. If this is not done, the real part of the transformed result will not contain all of the information in the original complex time-domain data, so information will be lost when the imaginary data is deleted during the usual transform schemes. So, in almost every case, data can be suitably processed by Zero Fill in auto mode:
| nmrPipe -fn ZF -auto \
With no other arguments, this will double the size of the data by zero filling,
and continue to zero fill if needed so that the final result
has a power-of-two number of points. The power-of-two size
is not a requirement, but it will usually make Fourier
processing faster.
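The information argument can be made concrete with a stdlib-Python sketch (illustration only): after zero filling N complex points to 2N, the real part of the transform still determines the original points x[1] ... x[N-1], so discarding the imaginaries loses nothing (apart from the imaginary part of x[0]).

```python
import cmath

def dft(x, sign=-1):
    """DFT (sign=-1) or unnormalized inverse DFT (sign=+1)."""
    N = len(x)
    return [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * n * k / N) for k in range(N))
            for n in range(N)]

N = 16
fid = [cmath.exp((-0.08 + 0.7j) * k) for k in range(N)]
padded = fid + [0.0] * N                   # zero fill: N complex points -> 2N

real_spec = [z.real for z in dft(padded)]  # keep only the real part

# Invert using only the real part; each recovered point comes out halved,
# so multiply by 2 (and normalize the inverse DFT by 1/(2N)).
back = dft(real_spec, sign=+1)
recovered = [2.0 * z / (2 * N) for z in back]
# recovered[k] matches fid[k] for k = 1 .. N-1
```

Without the zero fill, the discarded imaginary part would carry information that the real part alone cannot reproduce.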
The default automated baseline correction (POLY -auto) should usually only be used when at least two dimensions of the data have been transformed, so that there is as much empty baseline as possible for automated detection. So, it is common practice to create a processing scheme like this, which processes the first two dimensions, then transposes the data to correct the first dimension:
nmrPipe -in fid/test001.fid \
| nmrPipe -fn POLY -time \
| nmrPipe -fn SP -off 0.5 -end 0.98 -pow 2 -c 0.5 \
| nmrPipe -fn ZF -auto \
| nmrPipe -fn FT -verb \
| nmrPipe -fn PS -p0 43 -p1 0.0 -di \
| nmrPipe -fn EXT -left -sw \
| nmrPipe -fn TP \
| nmrPipe -fn SP -off 0.5 -end 0.98 -pow 1 -c 1.0 \
| nmrPipe -fn ZF -auto \
| nmrPipe -fn FT -verb \
| nmrPipe -fn PS -p0 -135 -p1 180 -di \
| nmrPipe -fn TP \
| nmrPipe -fn POLY -auto \
  -verb -ov -out test.ft2

Zero-order automated baseline correction (POLY -auto -ord 0) is commonly applied to dimensions that have a missing first point (P1 = 360).
Automated baseline correction should only be applied as needed.
In most cases, only the directly-detected dimension, or dimensions with one-dwell delay (P1=360) need a baseline correction.
NOTE WELL that cases for P1=0, P1=180, and P1=360 can all be handled by the proper combination of first point scaling and zero-order baseline correction. All other cases have the potential to introduce more difficult baseline distortions, as well as phase distortions for folded peaks. For these reasons, great care should be taken to set the acquisition delays to 0, 1/2, or 1 point whenever possible. This also eliminates the need to choose phase correction values manually, which can be very difficult with some spectra.
In the case of digital oversampled data from Bruker instruments, the oversampling correction can result in baseline distortions which are especially problematic for homonuclear 2D and 3D cases or for many 1D 1H spectra. By default, oversampling correction is performed during conversion rather than during processing; when applied in this way, the correction is not ideal, although it has the convenience of creating a result which is ordinary time-domain data, with no artificial leading points. So, in many cases, baselines can be improved when needed by converting the data with the option Digital Oversampling Correction: During Processing selected in the bruker graphical conversion interface.
NOTE WELL that "FT -neg" is NOT exactly the same as simply reversing the spectrum with "REV". "FT -neg" is equivalent to REV followed by a one-point circular shift. So, if REV is applied instead, the PPM calibration will be off by one point.
Of course, the "-neg" or "-alt" arguments should not be used if the data are not reversed or left/right swapped.
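The one-point difference between "FT -neg" and a plain reversal can be verified numerically with a stdlib-Python sketch (illustration only): the transform of the conjugated FID lines up, point for point, with the reversed spectrum only after a one-point circular shift.

```python
import cmath

def dft(x):
    """Plain discrete Fourier transform (no FFT, for clarity)."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * n * k / N) for k in range(N))
            for n in range(N)]

N = 32
fid = [cmath.exp((-0.06 + 1.1j) * k) for k in range(N)]

spec_neg = dft([z.conjugate() for z in fid])   # conjugate before FT ("-neg" idea)
spec     = dft(fid)

rev     = spec[::-1]                # plain reversal (REV)
shifted = [rev[-1]] + rev[:-1]      # REV plus a one-point circular shift

# Point for point, the magnitudes now agree; with REV alone they would be
# misaligned by one point, shifting the PPM calibration.
mismatch = max(abs(abs(a) - abs(b)) for a, b in zip(spec_neg, shifted))
```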
Common nmrPipe Processing Functions
The following is an alphabetical list of the most common nmrPipe processing functions used in the examples.
EXT Extract Region
Extracts a region from the current dimension with limits
specified by the arguments -x1 and -xn; the limits can be
labeled in points, percent, Hz, or PPM. Alternatively, the left or right
half of the data can be extracted with the arguments -left and -right.
FT Fourier Transform
Applies a complex forward or inverse Fourier transform,
with sign alternation for first half/second half rotated data (-alt)
or complex conjugation for reversed data (-neg).
HT Hilbert Transform
Performs a Hilbert transform to reconstruct imaginary data.
LP Linear Prediction Extrapolation
By default uses forward LP method to extend the data to twice its
original size via 8 complex coefficients.
The number of predicted points can be adjusted via the -pred option,
and the number of LP coefficients is specified by argument -ord.
Mixed forward-backward LP is performed if the -fb argument is used.
Mirror-image LP for data with no acquisition delay is performed if the argument
-ps0-0 is used; mirror-image LP for data with a half-dwell acquisition
delay is performed if the argument -ps90-180 is used.
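The idea behind LP extrapolation can be sketched in a few lines of stdlib Python (a deliberately minimal order-1 illustration; NMRPipe's LP uses 8 complex coefficients by default and far more robust coefficient estimation): for a single decaying signal, each point is a fixed complex multiple of the previous one, so estimating that multiplier lets the data be extended.

```python
import cmath

N = 32
fid = [cmath.exp((-0.04 + 0.6j) * k) for k in range(N)]

# Order-1 forward LP: model x[k+1] = c * x[k]; estimate c from adjacent points
c = sum(fid[k + 1] / fid[k] for k in range(N - 1)) / (N - 1)

# Extend to twice the original size using the predicted coefficient
extended = list(fid)
for _ in range(N):
    extended.append(c * extended[-1])

# For this noiseless toy signal, the extrapolation matches the true continuation
true_tail = [cmath.exp((-0.04 + 0.6j) * k) for k in range(N, 2 * N)]
err = max(abs(a - b) for a, b in zip(extended[N:], true_tail))
```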
MEM Maximum Entropy Reconstruction with Deconvolution
Applies Maximum Entropy reconstruction according to the method
of Gull and Daniell: argument -ndim specifies the number of
dimensions to reconstruct, arguments -pos and -neg are used to choose between all-positive mode and two-channel mode for reconstruction of
data with both positive and negative signals.
Argument -sigma specifies the estimated standard deviation of the noise in the time-domain.
Argument -alpha specifies the fraction of a given iterate
which will be added to the current MEM spectrum.
Arguments -xconv -xcQ1 ... and -yconv ... etc. specify
deconvolution in
the form of an nmrPipe window function such as EM (exponential
multiply) or an NMRPipe-format file which contains a deconvolution kernel.
Other arguments can be
used to optimize convergence speed, or to increase stability for reconstruction
of data with high dynamic range.
POLY Subtract a Polynomial for Baseline Correction
(frequency-domain)
Applies polynomial baseline correction of
the order specified by argument -ord, via an automated baseline
detection method when used with argument -auto. The default is a fourth
order polynomial. The automated baseline mode works as follows: a copy of a
given vector is divided into a series of adjacent sections, typically 8 points
wide. The average value of each section is subtracted from all points in that
section, to generate a "centered" vector. The intensities of the
entire centered vector are sorted, and the standard deviation of the noise is
estimated under the assumption that a given fraction (typically about 30%) of
the smallest intensities belong to the baseline, and that the noise is normally
distributed. This noise estimate is multiplied by a constant, typically about
1.5, to yield a classification threshold. Then, each section in the centered
vector is classified as baseline only if none of the points in that section
exceeds the threshold. These classifications are used to correct the original
vector.
POLY Subtract a Polynomial for Solvent Suppression
(time-domain)
When used with the argument -time, fits
all data points to a polynomial, which is then subtracted from the original
data. It is intended to fit and subtract low-frequency solvent signal in the
FID, a procedure which often causes less distortion than time-domain convolution
methods. By default, a fourth order polynomial is used. For speed, successive
averages of regions are usually fit, rather than fitting all of the data points.
PS Phase Correction
Applies the zero and first order phase corrections as specified
in degrees by the arguments -p0 and -p1.
PS is commonly applied with the generic nmrPipe option -di
which deletes imaginary data in the current dimension after processing.
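What the zero- and first-order corrections do can be sketched in stdlib Python (illustration only; conventions for exactly how p1 spans the vector vary, and here it simply runs linearly from 0 at the first point to p1 at the last):

```python
import cmath, math

def phase(spec, p0, p1):
    """Apply zero/first-order phase correction; p0 and p1 are in degrees."""
    N = len(spec)
    return [z * cmath.exp(1j * math.radians(p0 + p1 * n / (N - 1)))
            for n, z in enumerate(spec)]

# p0 rotates every point by the same angle: 90 degrees turns real into imaginary
rotated = phase([1.0 + 0j, 1.0 + 0j], p0=90.0, p1=0.0)
```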
SOL Solvent Suppression by Convolution Subtraction
Suppresses solvent signal by subtracting the results of a moving average
filter with a default window of +/- 16 points.
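A minimal stdlib-Python sketch of convolution subtraction (illustration only, with simple edge clamping; NMRPipe's SOL has additional filter shapes and options): subtract a +/- 16-point moving average, which removes slowly varying (solvent-like) components while leaving fast oscillations largely intact.

```python
def sol_filter(x, half=16):
    """Subtract a moving average with window +/- half points (clamped at edges)."""
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        avg = sum(x[lo:hi]) / (hi - lo)
        out.append(x[i] - avg)
    return out

# A slowly varying "solvent" component (here, a constant) is removed completely
residual = sol_filter([5.0] * 64)
```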
SP Sine Bell Window with Adjustable Phase
Applies a sine-bell window extending from sin^r(a*PI) to sin^r(b*PI),
with offset a, endpoint b, and exponent r specified by arguments -off,
-end, and -pow, and first-point scaling specified by argument
-c. The default length is taken from the recorded time-domain size of
the current dimension. By default, a = 0.0, b = 1.0, r = 1.0 (sine bell), and
the first point scale factor is 1.0 (no scaling). In most examples,
-off 0.5 is used to generate a cosine-like window which starts
at height 1.0, while values for -end are usually around 0.95; settings
of -end 1.0 are avoided, because this would result in a window function
with the last point equal to zero. This effectively destroys information
in one point of the given dimension, and also makes inverse processing
problematic, since it will no longer be possible to divide data by the
original window function.
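The window itself is easy to reproduce in stdlib Python (a sketch based on the description above, not the NMRPipe source):

```python
import math

def sine_bell(size, off=0.5, end=0.98, power=2, c=0.5):
    """Adjustable sine bell: sin(pi*off ... pi*end) raised to 'power',
    with the first point scaled by c."""
    w = [math.sin(math.pi * (off + (end - off) * k / (size - 1))) ** power
         for k in range(size)]
    w[0] *= c
    return w

# With -off 0.5 the window starts at 1.0 (before first-point scaling);
# with -end 1.0 the last point would be exactly zero, destroying information.
w     = sine_bell(64, off=0.5, end=0.98, power=2, c=0.5)
w_bad = sine_bell(64, off=0.5, end=1.0, power=1, c=1.0)
```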
TP 2D X/Y Transpose
Exchanges vectors from the X-axis and Y-axis of the data stream,
so that the resultant data stream consists of vectors from the Y-axis of the
original data.
ZF Zero Fill
Pads the data with zeros; the amount of padding can be specified
by argument -zf, which defines the number of times to double the data
size, or by the argument -size, which specifies the desired complex size
after zero filling. By default, the data size is doubled by zero filling.
Note that data should always be at least doubled by zero fill before FT,
to prevent loss of information when imaginary data is deleted.
The argument -auto will cause the zero-fill size to be rounded up to
the nearest power of two for faster FT.
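The size chosen by ZF -auto can be sketched as follows (stdlib Python, based on the behavior described above):

```python
def zf_auto_size(n):
    """At least double the size, then round up to the next power of two."""
    target = 2 * n
    size = 1
    while size < target:
        size *= 2
    return size

# e.g. 180 complex points -> doubled to 360 -> rounded up to 512
```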
ZTP 3D X/Z Transpose
Exchanges vectors from the X-axis and Z-axis of the data stream,
so that the resultant data stream consists of vectors from the Z-axis of the
original data.
Conversion, Processing, Peak Detection: An Example Using 3D HNCO Data
Directory: pipe/demo/hnco
Pseudo-3D Analysis; Quantification of a 2D Relaxation Series
Directory: pipe/demo/relax
sethdr ft/test001.dat -tau 8
Output
nlin.tab: Results of pseudo-3D Gaussian fitting.
sim/test%03d.ft2: Simulated spectrum series from pseudo-3D fitting.
dif/test%03d.ft2: Residual spectrum series from pseudo-3D fitting.
txt/fit*.txt: X/Y tables for each evolution curve.
mod/fit*.tab: Output tables of evolution-curve fitting results.
gnu/fit*.gnu: Gnuplot plot commands for each curve.
plot/fit*.ps: PostScript output of the fit results.
autoFit.com: Created by autoFit.tcl to run seriesTab and nlinLS.
modelExp.com: Created via modelExp.tcl to fit each evolution.
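The evolution-curve fitting performed for a relaxation series can be sketched in stdlib Python (an illustration with made-up delays and a noiseless exponential, not the actual NMRPipe/modelExp.tcl fitting code):

```python
import math

taus = [8.0, 16.0, 32.0, 64.0, 128.0]       # hypothetical relaxation delays
R_true, I0_true = 0.02, 1000.0
heights = [I0_true * math.exp(-R_true * t) for t in taus]   # toy peak heights

# Linearize ln(I) = ln(I0) - R * tau, then ordinary least squares for the line
ys = [math.log(v) for v in heights]
n, sx, sy = len(taus), sum(taus), sum(ys)
sxx = sum(t * t for t in taus)
sxy = sum(t * y for t, y in zip(taus, ys))
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
R_fit = -slope
I0_fit = math.exp((sy - slope * sx) / n)
```

Real data would be fit with a nonlinear least-squares model of the exponential; the log-linear form here is just the simplest self-contained version of the idea.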
How to adjust a peak parameter in NMRDraw
Peak Detection/Edit
This enables simple editing of the peak table in memory. There must already be a current peak table, either from "Peak Detection/Read" or "Peak Detection/Detect". When the "Edit" option is selected, the mouse can be used to insert or delete peaks, or to modify parameters. As with other mouse modes, the mouse button functions are given in the top border of nmrDraw.
NOTE! Once a peak table is edited, it should be saved using the "Write" option.
In the peak editing mode, the [Left] mouse button inserts a peak, the [Middle] mouse button can be used to adjust a peak value, and the [Right] mouse button will delete the nearest peak.
When using the middle mouse button to adjust a peak value, the value to be adjusted is the one selected for the label. By default, the peak "INDEX" number is displayed as a label. This can be changed through the "Variables" menu in the "Peak Detection" pop-up. For example, if the variable "ASS" is selected from the "Variables" menu, the existing assignments will be displayed, and these can then be modified. The "CLUSTID" value can also be displayed and modified this way.
After clicking the middle mouse button over the peak to change, type the new value for the selected variable.
The keyboard commands "[" and "]" can be used
to toggle the display of peak labels on and off.
Reconstruction of Non-Uniform Sampling (NUS) Data
Directory: pipe/demo/nusdemo2d
In a conventional 2D experiment, the indirect dimension
would be acquired with a set of N uniformly-spaced time increments
in increasing order:
In a non-uniform sampling scheme, one or more increments are skipped
(not measured at all) during the acquisition. And, in common NUS schemes,
the increments are measured in random order. So, in a typical NUS
acquisition scheme, there is a sampling schedule in
the form of a text file which lists the increments
which were measured, in the order that they are recorded. One
example of such a text file (here, hsqc.hdr_3) is shown below.
In this example, a subset
of 128 increments out of 512 possible increments are recorded
in (pseudo) random order:
In NMRPipe NUS schemes, sampling schedule files like the one above
are used to re-shuffle and expand the original NUS time-domain data so that
the measured increments are in increasing order as in conventional
uniform data, with missing increments filled by zeros.
In addition, NMRPipe NUS schemes also create a sampling schedule
kernel file (here, prof.y). This is an NMRPipe-format
data file which contains
1.0 in every position where an increment was measured, and 0.0 in
every position where an increment was skipped. This file can
be used as input for Maximum Entropy deconvolution or Maximum
Likelihood analysis:
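The re-shuffling and kernel construction described above can be sketched in stdlib Python (a toy schedule and toy data for illustration; the real schedule is a text file like hsqc.hdr_3):

```python
schedule = [0, 5, 2, 9, 7]                   # measured increments, acquisition order
measured = [10.0, 11.0, 12.0, 13.0, 14.0]    # toy data: one value per increment
N = 12                                       # full uniform grid size after expansion

expanded = [0.0] * N                         # data re-sorted into increasing order,
kernel   = [0.0] * N                         # missing increments filled with zeros;
for idx, val in zip(schedule, measured):     # kernel: 1.0 where measured, else 0.0
    expanded[idx] = val
    kernel[idx] = 1.0
```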
TALOS Demo: Chemical Shift Database Search for Phi/Psi Angles
See Also: the TALOS Web Site
Directory: pipe/demo/talos
Refinement of an Existing Structure to HN/N Dipolar Couplings
See Also: the DYNAMO Web Site
This example was kindly provided by Prof. Jaison Jacob, then
at Vanderbilt University.
In this example, an existing structure is refined against a set of HN-N
dipolar couplings. The initial structure agrees with the dipolar couplings
to only ~7 Hz RMSD, but the refined structure agrees to better than 1 Hz RMSD.
Interestingly, the backbone of the refined structure is less than 0.3 Å RMSD
from the initial structure; in this case, only a small change in the
structure is needed to substantially improve the dipolar coupling agreement.
In order to use the DYNAMO structure calculation environment on a given
molecular system, we first must use the tools of DYNAMO to create tables
describing the covalent geometry of the molecules involved.
Then we must create a PDB file with a complete set of atoms in the
DYNAMO nomenclature. In this example, we start with a PDB file
produced by some other molecular analysis software. So, in the first
steps, we use the tools of DYNAMO to read the protein sequence
information from the given PDB file, and to create a complete
DYNAMO PDB file whose structure is refined to mimic the structure
in the given PDB file. Then, this DYNAMO PDB file is refined
using HN-N dipolar couplings.
The specific steps in the demonstration are:
Conventional Structure Calculation with NOEs and Dipolar Couplings
See Also: the DYNAMO Web Site
Molecular Fragment Replacement
Directory: pipe/demo/mfr
The MFR method determines elements of protein structure by finding
small fragments (5-15 residues) in the PDB database whose simulated
dipolar couplings and shifts match those measured for the target
protein. These small homologous fragments can then be used in various ways
to reconstitute larger elements of protein structure.
This demo uses several types of dipolar couplings measured in two
alignment media, which allows fragments to be assembled into larger
structures of 10-50 residues or more.
The steps in this demo are:
time point = k*dT + delay
where dT is the time increment, delay is the
initial acquisition delay, and k is the increment number
which goes from 0 to N - 1. So, we can represent
the measurement scheme for the indirect dimension as a
list of N increment numbers.
For a conventional data set with 512 indirect points,
this would simply correspond
to the list 0 1 2 3 4 ... 509 510 511.
0 511 30 20 1 26 79 12 32 41
4 70 40 29 7 59 34 74 77 53
2 15 57 27 55 8 33 13 16 10
61 17 106 37 43 113 64 66 31 95
6 25 52 3 42 51 24 60 44 49
11 39 50 129 48 18 140 68 23 93
117 54 5 81 78 112 107 38 9 14
76 75 72 90 141 116 19 46 114 128
94 56 127 103 21 92 28 161 125 73
62 22 158 58 35 191 45 176 121 36
163 165 65 99 152 102 85 89 47 67
154 217 151 145 206 98 171 82 108 149
109 164 131 138 91 84 63 101
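In stdlib Python, the time point for each listed increment follows directly from the formula above (the sweep width and delay here are hypothetical values for illustration):

```python
dwell = 1.0 / 2000.0               # dT = 1/SW for a hypothetical 2000 Hz sweep width
delay = 0.0                        # no acquisition delay (the P1 = 0 case)
increments = [0, 511, 30, 20, 1]   # first entries of the example schedule

# time point = k*dT + delay for each increment number k
times = [k * dwell + delay for k in increments]
```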
nusSort.tcl -in test.fid -sample hsqc.hdr_3 -out sort.fid -expand -prof prof.y
nmrPipe -in sort.fid \
| nmrPipe -fn POLY -time \
| nmrPipe -fn ZF -auto \
| nmrPipe -fn FT \
| nmrPipe -fn PS -p0 -52 -p1 0.0 \
| nmrPipe -fn EXT -x1 10.5ppm -xn 5.5ppm -sw -verb \
| nmrPipe -fn FT -inv \
| nmrPipe -fn ZF -inv \
| nmrPipe -fn MEM -sigma 200 -report 2 -ndim 2 \
-xzf 1 -xconv EM -xcQ1 12 \
-yzf 1 -yconv FILE prof.y \
-out deco.ft2 -ov
cd
cd pipe/demo/talos
clean.com
more valpha.tab
talos.tcl -in valpha.tab
(or, if needed, use the fast version):
vina.tcl -in valpha.tab -ref valpha.pdb -AUTO
rama.tcl -in valpha.tab -ref valpha.pdb -sd -ras
(exit rasmol if needed)
talos2xplor.tcl -cs valpha.tab -pdb valpha.pdb > xplorTorsion.tbl
more xplorTorsion.tbl
DC -inCS valpha.tab -outCS csCalc.tab -pdb valpha.pdb -verb
more csCalc.tab
showCS.tcl -in csCalc.tab
cd
cd pipe/demo/dchn
clean.com
more all.com
all.com
(quit all graphs, quit rasmol)
cd
cd pipe/demo/ubiq
clean.com
more README
more init.com
init.com
(exit rasmol)
ls ubiq.gmc
(edit simpleSA.tcl ... change the initial random number "54321")
simpleSA.tcl
(exit rasmol)
ls ubiq.gmc
ov.tcl -r1 2 -rN 72 -ref 1ubq.pdb -in ubiq.gmc/dyn*pdb
rasmol overlay.pdb
(exit rasmol)
set goodList = (`pdbSelect.tcl -n 5 -noe 0.1 -pdb ubiq.gmc/dyn*pdb`)
ov.tcl -r1 2 -rN 72 -ref 1ubq.pdb -in $goodList
rasmol overlay.pdb
scrollRama.tcl -pdb 1ubq.pdb $goodList
cd
cd pipe/demo/mfr
clean.com
ls
more README
more dObsA.tab
more ext.com
ext.com
(exit rasmol)
mfr.tcl -excl 1ubq -csThresh 2.5
more mfr.tab
plotTab.tcl -in mfr.tab -x A_DA -y A_DR
plotTab.tcl -in mfr.tab -x D_RES1 -y A_DA B_DA -yMax 0.0
mfr2init.tcl
rasmol init.pdb
scrollRama.tcl -pdb ref.pdb init.pdb -mfr mfr.tab