**Instructions for using the code for protein structure refinement against SAXS data using a FastSAXS module**

*Alexander Grishaev*

*(alexanderg@intra.niddk.nih.gov)*

*Sep 3, 2009*

*Program names are in bold, input file names in bold/italic*

This mode of refining the structure against SAXS data achieves a fast calculation of the scattering data predicted from the structural model by approximating the exact average over molecular orientations with a discrete sum over a set of equi-distant directions. General procedures for data processing and preparation prior to their input to the structural refinement module are listed below. In the FastSAXS mode the complex scattering amplitude and its approximate angular average are used instead of the Debye formula. Therefore, globbic correction factors are not required, as the summation includes all heavy atoms. The setup is illustrated in the approximate_angular_averaging archive. The approach requires calculation of the form factors for the heavy atoms in the macromolecule and of surface water corrections using the procedure described below. The required input parameters are: the number of fitted data points; the extrapolated experimental scattering intensity at zero angle (calculated from either a Guinier plot or a P(r) analysis); the force constant for the fit (usually in the 25-100 range); the grid type (Fibonacci or spiral); and the parameter for the number of vectors on the grid (the order for the Fibonacci grid, or the number of vectors for the spiral grid). For the Fibonacci grid, values of 10-12 should provide an accurate representation of the scattering data up to q_{max}=0.3-0.4 A^{-1}. For the spiral grid, values of ~100 produce a similar quality of the predicted scattering data. The last parameters are the chain name and residue range for the segments active in the scattering data calculation. Scripts and programs used for structure refinement against SAXS data are listed in the archive.
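The idea behind the discrete orientational average can be illustrated with a short Python sketch (this is not part of the distributed code; the function names and the toy point-scatterer model are my own). A golden-angle spiral generates near-uniform directions on the sphere, and the mean of |A(q)|^2 over those directions approximates the exact Debye-formula average:

```python
import numpy as np

def fibonacci_directions(n):
    """Near-uniform unit vectors on the sphere via the golden-angle spiral."""
    k = np.arange(n)
    phi = k * np.pi * (3.0 - np.sqrt(5.0))   # golden angle increments
    z = 1.0 - 2.0 * (k + 0.5) / n            # uniform in cos(theta)
    r = np.sqrt(1.0 - z * z)
    return np.column_stack((r * np.cos(phi), r * np.sin(phi), z))

def intensity_grid(q, coords, f, dirs):
    """I(q) as the mean of |A(q)|^2 over a discrete set of q-directions."""
    out = np.empty_like(q)
    for i, qi in enumerate(q):
        phase = np.exp(1j * qi * dirs @ coords.T)  # (ndir, natom) phase factors
        amp = phase @ f                            # complex amplitude per direction
        out[i] = np.mean(np.abs(amp) ** 2)
    return out

def intensity_debye(q, coords, f):
    """Exact orientational average via the Debye formula."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    out = np.empty_like(q)
    for i, qi in enumerate(q):
        out[i] = np.sum(np.outer(f, f) * np.sinc(qi * d / np.pi))
    return out
```

With a few hundred directions the two agree to within a few percent over the usual q range, which is the trade-off the FastSAXS mode exploits: the grid sum scales with the number of atoms, not with the number of atom pairs.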

**General comments on data collection, processing and analysis.**

A perfect match between the sample and buffer is essential for an accurate interpretation of the subtracted scattering data. Using the buffer the protein powder was dissolved in is not acceptable, as minute salt concentration differences will cause a non-negligible difference between the solvent and buffer baselines. Acceptable ways to match the solvent with the sample are (i) extensive dialysis or (ii) passing multiple fractions of solvent through the sample by ultracentrifugation through a semi-permeable membrane that keeps the solute inside the centrifugation unit. The latter method is recommended for unstable samples. A mismatch between the sample and buffer can only be detected if precise wide-angle data (q_{max} at least ~2.2 A^{-1}) are recorded. When such data are not available, the effects of mismatch can include inaccurate determination of Rg and I(0). In cases of over-subtracted data (which decay too rapidly with increasing q relative to properly recorded data) the sample might appear aggregated in the P(r) analysis.

Before interpreting SAXS data, you have to convince yourself that your data correspond to an isolated-particle form factor, free from any inter-particle correlation effects or aggregation/oligomerization. Several methods can be used to determine whether this is the case:

The absence of concentration dependence of the scattering data at low q (as the concentration is decreased by 50%, 75%, etc.) indicates the absence of a structure factor. If there is concentration dependence, the data can be linearly extrapolated to zero concentration using 3-6 concentration points going as low as the signal allows – 0.2-0.5 mg/mL on a synchrotron or 2-3 mg/mL on a lab source. A possible alternative is to find a concentration (not necessarily the lowest one in the series) which gives data indistinguishable from the lower concentrations, and use it for further analysis. Salt can also be added (100-200 mM) to suppress the electrostatic interaction between the molecules. This may lead to increased aggregation as well, so one has to be careful. In 150 mM salt buffer, absence of a structure factor is often reached at ~3-5 mg/mL for proteins and 0.5-1 mg/mL for DNA/RNA. Data at higher concentrations are likely (especially for RNA/DNA) to contain an effect of the structure factor.
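The zero-concentration extrapolation described above is a per-q linear fit of I(q, c) versus c, keeping the intercept. A minimal Python sketch (a hypothetical helper, not part of the distributed code):

```python
import numpy as np

def extrapolate_to_zero_conc(conc, curves):
    """Linear extrapolation of I(q, c) to c = 0, one least-squares fit per q bin.

    conc   : concentrations (mg/mL), one per curve
    curves : array of shape (nconc, nq) with the scattering intensities,
             each already normalized per unit concentration
    Returns the intercepts I(q, c=0) for every q bin.
    """
    conc = np.asarray(conc, float)
    curves = np.asarray(curves, float)
    A = np.column_stack((np.ones_like(conc), conc))   # design matrix for I = a + b*c
    coef, *_ = np.linalg.lstsq(A, curves, rcond=None)  # coef has shape (2, nq)
    return coef[0]                                     # the c = 0 intercepts
```

With 3-6 concentration points the fit is well determined; inspecting the per-q slopes also shows directly where (in q) the structure factor acts.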

The possibility of aggregation or an oligomerization equilibrium in the sample has to be ruled out. Linearity of the Guinier plot (up to q*Rg = 1.3) or a smooth decay of the P(r) to zero at d_{max} can be used to rule out aggregation. In the opposite case, when the tail of the P(r) keeps extending as d_{max} is increased, non-specific aggregation is likely. If the amount of dimer is small (below 10%), the effects are more subtle. The P(r) may still go to zero nicely, but the d_{max} and R_{g} are going to be higher than expected for the monomeric form. These effects are easy to detect only when the amount of dimer is large enough to yield parameters that are clearly incompatible with the monomeric ones. In some cases inter-particle interference and aggregation can mask each other, making Rg appear normal at a particular concentration (they produce opposing effects on the low-q data). One method is to record scattering data of a standard with accurately known concentration (this could be lysozyme; you’ll have to do a zero-concentration extrapolation for it and watch out for radiation damage at synchrotron intensities), accurately measure the concentration of your protein (e.g. by UV-VIS under denaturing conditions), and use the fact that I(q=0) should be proportional to MW*concentration (mg/mL). If your experimental I(0) value is substantially higher than predicted, your sample may be (at least partially) aggregated. This method is quite sensitive because I(0) is the most precise parameter from the scattering data, but it takes more time.
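The Guinier-based checks above amount to fitting ln I versus q^2 over the q*Rg <= 1.3 window, which yields both Rg and the extrapolated I(0) needed later as input to the refinement. A minimal Python sketch (illustrative only; the iterative window selection is my own choice, not a prescription from the distributed code):

```python
import numpy as np

def guinier_fit(q, I, qRg_max=1.3):
    """Guinier fit: ln I = ln I0 - (Rg^2 / 3) * q^2 over q*Rg <= qRg_max.

    Starts from the full supplied low-q range and re-selects the fitting
    window until the q*Rg <= qRg_max cutoff is self-consistent.
    Returns (Rg, I0).
    """
    q = np.asarray(q, float)
    I = np.asarray(I, float)
    mask = np.ones(q.size, bool)
    for _ in range(20):
        slope, intercept = np.polyfit(q[mask] ** 2, np.log(I[mask]), 1)
        Rg = np.sqrt(-3.0 * slope)
        new_mask = q * Rg <= qRg_max
        if new_mask.sum() < 3 or np.array_equal(new_mask, mask):
            break
        mask = new_mask
    return Rg, np.exp(intercept)
```

Deviations of the data points from this line within the fitted window (systematic upturn at the lowest q) are the aggregation signature discussed above; the returned I(0), together with MW and concentration, supports the consistency check against a standard.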

At this point I am assuming that the data are free from both structure factor and aggregation, and are consistent with the modeled oligomerization state. I will also assume that the input data are not smeared by wavelength or geometry. This means that the software is inapplicable to neutron scattering data (polychromatic radiation). Any geometry smearing, such as that due to a slit-shaped incident beam, needs to be corrected for before structure refinement (for example, using the GNOM software). In summary, I am assuming that the input data come from a monochromatic source in pinhole geometry with negligible detector distortions. The software is also inapplicable to very high q data (q > ~1 A^{-1}), for which accurate data measurement and prediction become difficult.

The **fsglob_prot.f** program is used to obtain the form factors of each heavy atom at every experimental q value in the fitted data file. For this, the following is needed: a family of structures (typically the best structures before the SAXS data fit, not the initial random coil/extended strand/etc. models); atomic form factors in the 4-Gaussian parameterization (**formfact.inp**, included); solvent volumes displaced by each atom (

After this is done, the surface water layer correction will need to be computed.
This is done by running **CRYSOL 2.5** (no explicit hydrogens) in a
particular way, described below. Basically, the current structure is fitted to
the experimental data by CRYSOL twice: first, when forcing drho=0.0 and second,
when letting drho float within its default range. All other adjustable
parameters (average atomic radius and total excluded volume) are kept fixed at
their default values. This is what I do after I start the program:

Read in the PDB file (option 0)

Order of harmonics: **50**

Order of Fibonacci grid:

Fit the experimental curve: press

Angular units: q units should be A

Solvent electron density – press

Plot the data: press

Minimize again with new limits: type

Minimum radius: type in whatever it shows as the average atomic radius (see in the program output above, should be close to 1.616 A for proteins).

Maximum atomic radius: repeat the same value.

Smax in the fitting range: press

Minimum contrast in the shell: press

Maximum contrast for the shell: here, you’ll have to run the program twice. Once you enter

Minimum excluded volume: enter the value reported by CRYSOL as

Maximum excluded volume: re-enter the same value.

Plot the fit: press

The two *.fit files that CRYSOL will produce (one forcing dr=0.0 and one setting it free) will be used to extract the fitted curves produced by these runs (I use Excel for this). The fitted curve is the last column of the *.fit file. I normalize, in Excel, each of these two curves *by the fitted values at q=0.0*. Be careful here since, in addition to predicting the q=0.0 value (which you will need), CRYSOL will also predict intensities between q=0.0 and the q_{min} of your data file. So I normalize the curves first and then discard the lines corresponding to the spurious q values. The end result will be two curves that each start at 1.00 for q=0.0 and go to 10^{-2}-10^{-4} at the high end of the typical q range. The dr=0.0 curve will need to be divided by the “floating dr” curve, point by point, to generate a two-column “bound water correction” file that has q as the first column and the ratio as the second column. If this is done properly, the correction will start at 1.00 at q=0.0, eventually reach values larger than 1.0 at the high end of the q range, and contain only the q values that are present in the experimental (sparsened) data file. This file will be needed as input to the Xplor/CNS structure refinement. This procedure will have to be repeated after every cycle of structure refinement. Keep an eye on the fitted drho value. As it converges, the surface water correction will converge as well. If the structure was over-expanded to begin with, you may see the fitted dr start at 0.0 and increase to higher values over subsequent rounds of structure refinement. In this case, you will also see a decrease in the radius of gyration of the macromolecule. The final bound-layer density is often close to 1.1ρ_{0}. When starting from an over-expanded model (and thus from dr=0) one can end up with artificially low final values of dr. In such cases dr can be fixed at 1.1ρ_{0} from the beginning.

A. Grishaev, J. Ying, M. Canny, A. Pardi & A. Bax, “Solution structure of tRNA^{Val} from joint refinement against dual alignment residual dipolar couplings and SAXS data”, *J. Biomol. NMR* 42, 99-109 (2008).

The script including the SAXS term is summarized below; the SAXS-related parts are the saxs ... end block, the saxs flag, and the print saxs statement.

define( md.seed=823641; )
set seed=&md.seed end
parameter @dna-rna-allatom.param @axis.par end
structure @tRNA_val.mtf @tRNA_mod.mtf @axis1.mtf @axis2.mtf end
coor initialize end
coor @trnaval_newpf1_msa_3_1.pdb @tVt7.pdb
do (x=x-400.0) (segi TPHE)

! *********************************** SAXS data ***********************************
saxs
  ndat = 33                    ! number of fitted data points
  kfor = 100.0                 ! force constant
  I0   = 2.32                  ! extrapolated zero-angle intercept for the scattering intensity
  updf = 50                    ! update frequency
  ngrd = 90                    ! number of vectors on the equi-distant grid
  grid = spir                  ! grid type - spiral
  segm = TVAL                  ! SEGID included in the saxs term
  nmin = 1                     ! min RESID included in the saxs term
  nmax = 76                    ! max RESID included in the saxs term
  @saxs_files_zeroC_075.dat    ! control file for the data and correction terms
end

...

flags exclude * include bond angl impr vdw ncs noe sani saxs end   ! saxs data term has to be active
igroup interaction (all) (all) weights * 1 end end

...

evaluate ($filename="trnaval_newpf1_msa_4_1.pdb")
set print=$filename end

...

print saxs end   ! this will print saxs fitting info into the header of the pdb file

...

write coordinates sele= (not segi TPHE) output=$filename end
stop

*last updated: Nov 2009*