From the Bax Group at the National Institutes of Health ...
MICS: Prediction of Protein Structural Motifs from NMR Chemical Shifts
As described in the paper:
Identification of helix capping and beta-turn motifs from NMR chemical shifts
Yang Shen and Ad Bax
J. Biomol. NMR, 52, 211-232 (2012)
RedHat Linux (Fedora Core)/Mac/Win32 version (v 1.00, last updated Nov 23, 2011)
The download archive can be unpacked on Unix with:
tar -zxvf mics.tar.Z
or with traditional Windows zip software.
What is MICS?
MICS (Motif Identification from Chemical Shifts) is a hybrid system for empirical prediction of distinct protein structural motifs: N-terminal and C-terminal helix capping motifs and five types of beta-turns (I, II, I', II' and VIII). It uses a combination of six kinds of chemical shift assignments (HN, HA, CA, CB, CO, N) for a given residue sequence. These structural motifs are well known to play a key role in stabilizing protein structure and are likely to be important in the protein folding process. The MICS program was developed from a systematic study of the NMR chemical shift (and amino acid sequence) patterns observed for each type of structural motif, using a database of proteins of known structure and known NMR chemical shifts. The chemical shifts, together with the PDB-extracted amino acid preferences of the helix capping and beta-turn motifs, are used as input data for training an artificial neural network, which outputs the statistical probability of finding each motif at any given position in the protein.
MICS prediction from NMR chemical shifts has been trained and validated using a recently constructed database. The trained neural networks contained in the MICS program yield Matthews correlation coefficient (MCC) values of ca. 0.85-0.92 for the N-terminal and C-terminal helix capping motif predictions, and ca. 0.67-0.83 for the five types of beta-turn predictions. These values far exceed those attainable by other bioinformatics sequence analysis methods (MCC of <0.4-0.5). The trained neural networks were further validated using a smaller database containing 11 new proteins not present in the training database. The MICS output assigns each residue a normalized probability of participating in any of the specific motifs, or of being part of a regular element of secondary structure.
The MICS program is implemented in C++ and includes a graphical interface, called jRAMA+, to display the prediction results. jRAMA+ is implemented in Java.
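For readers unfamiliar with the MCC values quoted above, the following minimal sketch shows how a Matthews correlation coefficient is computed from the counts of a two-class confusion matrix. It is an illustration of the standard formula only, not code from the MICS program; the function name and the example counts are hypothetical and are not taken from the MICS validation.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard MCC from true/false positive and negative counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the
        # confusion matrix is empty and the ratio is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for one motif class (NOT MICS validation data).
mcc = matthews_corrcoef(tp=90, tn=880, fp=10, fn=20)
print(f"MCC = {mcc:.3f}")
```

An MCC of 1 corresponds to perfect prediction, 0 to a prediction no better than random, and -1 to complete disagreement, which is why it is a convenient single-number summary when the motif residues are far outnumbered by non-motif residues.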
The MICS system comprises two main scripts:
The mics script can be invoked with the -help command-line argument to generate a complete list of options. jRAMA+ can also be run from a web browser with a suitable Java plugin installed (see details here).
Other files of the MICS system include:
Use of MICS is much the same as for the TALOS/TALOS+ program:
Similar to TALOS+, MICS includes a feature that pre-checks chemical shift referencing and possible chemical shift errors:
mics -in myshifts.tab -check
This option checks the referencing of the 13CA, 13CB, 1HA and 13C' chemical shifts, using the empirical correlation between certain sets of chemical shift data (Wang et al., 2005, J. Biomol. NMR, 32:13-22). The estimated chemical shift referencing offsets, as well as any chemical shifts that deviate substantially from their expected ranges, are printed in the following format:
Chemical shift outlier checking...
...
  64 E CB Secondary Shift: -3.800 Limit: -3.765
  76 G C Secondary Shift: 4.250 Limit: 1.925 !
Chemical shift referencing checking...
Estimated Referencing Offset for CA/CB: 0.795 +/- 0.104 ppm (Size: 66)
Note that:
(1) A chemical shift referencing correction is likely required whenever the estimated referencing error approaches the average uncertainty in the database chemical shifts (~1.0 ppm for 13CA/CB and 13C' shifts; ~0.3 ppm for 1HA shifts), and/or the estimated referencing error is larger than five times the reported referencing offset uncertainty.
(2) Chemical shift outliers that fall far outside (more than 2-3 times) the expected range of secondary chemical shifts are marked by "!". Such values are unlikely to be correct (or, as in the example above, correspond to a C-terminal carboxylate instead of a backbone carbonyl) and need to be checked carefully.
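The two rules above can be sketched in a few lines of Python. This is an illustrative helper, not part of MICS: the regular expressions assume that the -check output looks exactly like the example shown above, and the names parse_check_output, needs_correction, DB_UNCERTAINTY_PPM and approach_fraction are hypothetical. In particular, the text does not quantify what "approaches" means, so approach_fraction=1.0 is an assumption that can be lowered to be more conservative.

```python
import re

# Average database uncertainties quoted in the text (ppm). The keys are
# labels chosen for this sketch, not names defined by MICS.
DB_UNCERTAINTY_PPM = {"CA/CB": 1.0, "C": 1.0, "HA": 0.3}

NUM = r"[-+]?\d+(?:\.\d+)?"
OFFSET_RE = re.compile(
    rf"Estimated Referencing Offset for\s+(\S+):\s*({NUM})\s*\+/-\s*({NUM})\s*ppm"
)
OUTLIER_RE = re.compile(
    rf"^\s*(\d+)\s+([A-Za-z])\s+(\S+)\s+Secondary Shift:\s*({NUM})"
    rf"\s+Limit:\s*({NUM})\s*(!?)\s*$"
)


def needs_correction(offset, sigma, db_uncertainty, approach_fraction=1.0):
    """Rule (1): offset near the database uncertainty and/or > 5 * sigma."""
    near_db = abs(offset) >= approach_fraction * db_uncertainty  # assumption
    significant = abs(offset) > 5.0 * sigma
    return near_db or significant


def parse_check_output(text):
    """Collect referencing offsets and outlier lines from the printed report."""
    offsets, outliers = [], []
    for line in text.splitlines():
        m = OFFSET_RE.search(line)
        if m:
            offsets.append((m.group(1), float(m.group(2)), float(m.group(3))))
            continue
        m = OUTLIER_RE.match(line)
        if m:
            outliers.append({
                "resid": int(m.group(1)),
                "resname": m.group(2),
                "atom": m.group(3),
                "secondary_shift": float(m.group(4)),
                "limit": float(m.group(5)),
                "flagged": m.group(6) == "!",  # rule (2): marked by "!"
            })
    return offsets, outliers


SAMPLE = """\
Chemical shift outlier checking...
  64 E CB Secondary Shift: -3.800 Limit: -3.765
  76 G C Secondary Shift: 4.250 Limit: 1.925 !
Chemical shift referencing checking...
Estimated Referencing Offset for CA/CB: 0.795 +/- 0.104 ppm (Size: 66)
"""

offsets, outliers = parse_check_output(SAMPLE)
for nuclei, offset, sigma in offsets:
    limit = DB_UNCERTAINTY_PPM.get(nuclei, 1.0)
    verdict = "correction likely required" if needs_correction(offset, sigma, limit) else "ok"
    print(f"{nuclei}: {offset:+.3f} +/- {sigma:.3f} ppm -> {verdict}")
for o in outliers:
    if o["flagged"]:
        print(f"check residue {o['resid']} {o['resname']} {o['atom']} carefully")
```

For the example report, the CA/CB offset of 0.795 ppm is below the ~1.0 ppm database uncertainty but exceeds five times its reported uncertainty (5 x 0.104 = 0.52 ppm), so the second criterion of rule (1) is the one that triggers.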
MICS also provides an option "-offset" to automatically apply a chemical shift offset correction if needed:
mics -in myshifts.tab -offset
mics -in myshifts.tab -iso
Chemical Shift Input Format Used by MICS
MICS uses the same format for its chemical shift input as TALOS+. An example portion of the required shift table format is shown below (full example: ubiq.tab). Other examples can be found in the mics/demo directory of the MICS installation, or at the MICS Server site. More details regarding the required format can be found at the TALOS+ webpage.
Example shift table (excerpt):
REMARK Ubiquitin input for TALOS, HA2/HA3 assignments arbitrary.

DATA FIRST_RESID 1

DATA SEQUENCE MQIFVKTLTG KTITLEVEPS DTIENVKAKI QDKEGIPPDQ QRLIFAGKQL
DATA SEQUENCE EDGRTLSDYN IQKESTLHLV LRLRGG

VARS   RESID RESNAME ATOMNAME SHIFT
FORMAT %4d %1s %4s %8.3f

   1 M   HA    4.23
   1 M    C  170.54
   1 M   CA   54.45
   1 M   CB   33.27
   2 Q   HN    8.90
   2 Q    N  123.22
   2 Q   HA    5.25
   2 Q    C  175.92
   2 Q   CA   55.08
   2 Q   CB   30.76
...
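To make the layout of the shift table concrete, here is a minimal Python sketch of a reader for the records shown in the excerpt (REMARK, DATA FIRST_RESID, DATA SEQUENCE, VARS, FORMAT, and the data rows). It is not part of MICS or TALOS+; the function name parse_shift_table is hypothetical, and the sketch assumes whitespace-separated fields and the four columns used in the example.

```python
def parse_shift_table(text):
    """Read a TALOS+-style shift table into (first_resid, sequence, shifts).

    shifts maps (RESID, ATOMNAME) -> chemical shift in ppm.
    """
    first_resid = 1
    sequence_parts = []
    columns = []
    shifts = {}
    for raw in text.splitlines():
        fields = raw.split()
        if not fields or fields[0] in ("REMARK", "FORMAT"):
            continue  # comments and the printf-style format line carry no data
        if fields[0] == "DATA":
            if len(fields) >= 3 and fields[1] == "FIRST_RESID":
                first_resid = int(fields[2])
            elif len(fields) >= 2 and fields[1] == "SEQUENCE":
                # Sequence may span several DATA SEQUENCE lines, in blocks of 10.
                sequence_parts.extend(fields[2:])
        elif fields[0] == "VARS":
            columns = fields[1:]
        elif columns and len(fields) == len(columns):
            row = dict(zip(columns, fields))
            shifts[(int(row["RESID"]), row["ATOMNAME"])] = float(row["SHIFT"])
    return first_resid, "".join(sequence_parts), shifts


SAMPLE = """\
REMARK Ubiquitin input for TALOS, HA2/HA3 assignments arbitrary.
DATA FIRST_RESID 1
DATA SEQUENCE MQIFVKTLTG KTITLEVEPS DTIENVKAKI QDKEGIPPDQ QRLIFAGKQL
DATA SEQUENCE EDGRTLSDYN IQKESTLHLV LRLRGG
VARS   RESID RESNAME ATOMNAME SHIFT
FORMAT %4d %1s %4s %8.3f
   1 M   HA    4.23
   1 M    C  170.54
   1 M   CA   54.45
   1 M   CB   33.27
   2 Q   HN    8.90
   2 Q    N  123.22
   2 Q   HA    5.25
   2 Q    C  175.92
   2 Q   CA   55.08
   2 Q   CB   30.76
"""

first_resid, sequence, shifts = parse_shift_table(SAMPLE)
print(f"{len(sequence)} residues, {len(shifts)} shifts read")

# Sanity check: each row's residue number must fall inside the declared sequence.
for (resid, atom) in shifts:
    assert 0 <= resid - first_resid < len(sequence), (resid, atom)
```

A check of this kind before running mics can catch a mismatch between FIRST_RESID, the declared sequence and the residue numbers in the data rows.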
last update: May 23 2012 / sy