Shape Recovery
From SASTBX Wiki
When a model is available, refinement methods (Structural Refinement) are used to adjust the model so that it agrees with the experimental data.
When no existing model fits the data, the shape recovery method described in this section can be used to obtain plausible models.
Again, type:
sastbx.shapeup
to see the help information:
Usage:
sastbx.shapeup <target=target.iq> [rmax=rmax nmax=nmax scan=True*/False buildmap=True*/False pdb=pdbfile path=database_path]
The intensity profile is the only required input file (in theory)
Optional control parameters:
rmax : radius of the molecule (default: guessed from Rg)
nmax : maximum order of the Zernike polynomial expansion (<=20 for the precomputed database; 10 is the default)
qmax : maximum q value, beyond which the intensity profile will not be considered (default 0.20)
path : path to the database (this MUST be correct to execute the searching)
buildmap : build electron density maps in Xplor format; all maps will be aligned
pdb : any PDB model(s) to compare against; the maps will be aligned to the first PDB file
prefix : the output prefix
Lysozyme experimental data will be used to go through the 'easy' process:
sastbx.shapeup target=lyso.dat
On the output screen, you will see the following:
#phil __OFF__
-------------------Searching the protein DATABASE for similar shapes-------------------
#phil __ON__
query {
target = "lyso.dat"
nmax = 10
pdb_files = None
qmax = 0.2
q_level = 0.01
q_background = None
rmax = None
scan = True
prefix = "query"
dbpath = None
db_choice = *pisa piqsi allpdb user
db_user_prefix = "mydb"
buildmap = True
calc_cc = True
smear = True
weight = *i s
delta_q = None
ntop = 10
fraction = 0.9
}
#phil __END__
ATTENT: database path was set to default:
>>>> dbpath = /data/users/hgliu/CCTBX/Sources/sastbx/database/ <<<<
Performing Rg/Io analyses at various truncation levels ...
==== Estimates of Rg and Io ====
Analyses of Rg estimates at various data truncation levels suggests
Rg : 15.26
I0 : 8.25e-04
No evidence for aggregation / structure / bad data found
delta_q is 0.01
LEVEL=0.010000 and Q_MAX=0.200000
10 elements, 1 clusters, @cutoff=0.800000
( ( ( ( ( ( ( 7 6 ) ( 8 0 ) ) 4 ) 5 ) 3 ) ( 9 1 ) ) 2 ) 0.869020714536 0.92537346528 0.814541632365 0.110831832914
Rank pdb_code
1 3I0P
2 1QYN
3 1SRU
4 3BFN
5 2OZE
6 2NPI
7 1VKK
8 1L5X
9 1N0W
10 2ZDO
Volume is 19725.4568601 +/- 1014.0184793 (A^3)
total time used: 82.3 (seconds)
Ten models are generated, and the corresponding intensity profiles are saved to query_1.iq, query_2.iq, ..., query_10.iq. The correlations between the models are saved to query.cc. The clustering information has already been printed on the screen:
10 elements, 1 clusters, @cutoff=0.800000
( ( ( ( ( ( ( 7 6 ) ( 8 0 ) ) 4 ) 5 ) 3 ) ( 9 1 ) ) 2 ) 0.869020714536 0.92537346528 0.814541632365 0.110831832914
The second line lists the indices of the models (0 -> model number 1, etc.), followed by average_cc, maximum_cc, minimum_cc, and sigma_cc.
Here the average_cc, maximum_cc, minimum_cc, and sigma_cc are from the statistics of pairwise cc's between the models.
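If the clustering line needs to be processed by a script rather than read by eye, it can be split into the nested grouping and the four trailing numbers. The following is a minimal Python sketch, not part of sastbx; the function names parse_cluster_line and leaves are hypothetical, and the only input assumed is the line exactly as printed on the screen above.

```python
# Sketch only: parse the clustering line printed by sastbx.shapeup.
# parse_cluster_line and leaves are illustrative names, not sastbx functions.

EXAMPLE_LINE = (
    "( ( ( ( ( ( ( 7 6 ) ( 8 0 ) ) 4 ) 5 ) 3 ) ( 9 1 ) ) 2 ) "
    "0.869020714536 0.92537346528 0.814541632365 0.110831832914"
)


def parse_cluster_line(line):
    """Return (tree, stats): nested tuples of model indices and the four cc values."""
    tokens = line.split()
    stack = [[]]
    pos = 0
    while pos < len(tokens):
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            stack.append([])          # open a new group
        elif tok == ")":
            group = tuple(stack.pop())
            stack[-1].append(group)   # attach the closed group to its parent
            if len(stack) == 1:       # outermost group closed: tree is complete
                break
        else:
            stack[-1].append(int(tok))
    tree = stack[0][0]
    names = ("average_cc", "maximum_cc", "minimum_cc", "sigma_cc")
    stats = dict(zip(names, (float(t) for t in tokens[pos:pos + 4])))
    return tree, stats


def leaves(tree):
    """Flatten the nested grouping into a list of model indices."""
    if isinstance(tree, int):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


tree, stats = parse_cluster_line(EXAMPLE_LINE)
# Index 0 corresponds to model number 1, so add 1 to get model numbers.
print("model numbers in tree order:", [i + 1 for i in leaves(tree)])
print(stats)
```

For the line shown above, the fourth number happens to equal maximum_cc minus minimum_cc to within rounding, so it may be worth checking how sigma_cc behaves on your own output before relying on it as a standard deviation.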
All the models are aligned to the first model, m1_3I0P.xplor, and are written in CNS or Xplor format.
Next, let's try something more interesting:
sastbx.shapeup target=lyso.dat pdb=6lyz.pdb
Then you will see a few more lines in the standard output (screen) :
Compared to the PDB model (6lyz.pdb)
mean cc: 0.86115
first cc: 0.90917
best cc: 0.90917
worst cc: 0.79144
Rmax: estimated vs PDB 27.6590554738 24.7719276575
This time, there will be one more output file: query_cc2pdb.dat
It lists the correlation coefficient between each recovered model and the given PDB model:
0.909168166512
0.842904565586
0.791441995411
0.872753631895
0.877494048353
0.854825622068
0.875184854363
0.847651498317
0.871085813499
0.868958504947
mean: 0.86115
At nmax=10, models with cc>=0.82 can be considered similar (of course, the larger the cc, the more similar the models are).
The estimated Rmax is slightly larger than that of the PDB model: 27.66 vs 24.77. Considering the solvation layer, this is within a reasonable range.
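The cc values listed for query_cc2pdb.dat can be checked against the 0.82 threshold with a few lines of Python. This is a minimal sketch, not part of sastbx: the function name summarize_cc is hypothetical, the values are copied from the listing above, and it is assumed that they appear in model order (the first value matches the "first cc" reported on the screen).

```python
# Sketch only: summarize the cc values copied from query_cc2pdb.dat above.
# summarize_cc is an illustrative name, not a sastbx function.

CC_CUTOFF = 0.82  # similarity threshold quoted in the text for nmax=10

cc_to_pdb = [
    0.909168166512, 0.842904565586, 0.791441995411, 0.872753631895,
    0.877494048353, 0.854825622068, 0.875184854363, 0.847651498317,
    0.871085813499, 0.868958504947,
]


def summarize_cc(values, cutoff=CC_CUTOFF):
    """Return the mean cc and the 1-based numbers of models with cc >= cutoff."""
    mean_cc = sum(values) / len(values)
    # Assumption: the n-th value belongs to model number n.
    similar = [i + 1 for i, cc in enumerate(values) if cc >= cutoff]
    return mean_cc, similar


mean_cc, similar_models = summarize_cc(cc_to_pdb)
print("mean cc: %.5f" % mean_cc)
print("models with cc >= %.2f: %s" % (CC_CUTOFF, similar_models))
```

With these numbers the mean reproduces the 0.86115 reported by the program, and only the third value (0.79144, the "worst cc") falls below the threshold.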
Now, how do the models compare to the PDB model visually?
(Figure: the average model vs. the PDB model)
(Figure: model number 1 vs. the PDB model)
There are a couple of other things I have not mentioned:
(1) Intensity comparison:
The intensity profiles can be easily compared using any plotting program, e.g., xmgrace or gnuplot.
With xmgrace:
xmgrace -log y lyso.dat query*.iq &
You will see (after some manipulation of the plot, not the data):
(2) Rmax optimization:
The optimization process is saved to query.rmax__profile:
24.5627046727 0.0749562612054
32.6690563112 0.104924403898
19.5527038353 0.161048008423
27.6590554738 0.0736216196808
29.57270551 0.0783059265996
26.4763547089 0.0741611395068
28.3900047451 0.076795567585
27.2073039801 0.0748940314019
27.9382532514 0.0758594621123
27.4865017577 0.0748066534758
27.7656995353 0.0739345916335
The left column is Rmax, and the right column is the chi-score at that Rmax.
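To see which Rmax the optimization settled on, the profile can be scanned for the lowest chi-score. The sketch below is illustrative Python, not part of sastbx; parse_profile and best_rmax are hypothetical names, and the numbers are the ones listed above. It pairs up whitespace-separated values, so it does not depend on how the file is broken into lines.

```python
# Sketch only: find the Rmax with the lowest chi-score in the Rmax profile.
# parse_profile and best_rmax are illustrative names, not sastbx functions.

PROFILE_TEXT = """
24.5627046727 0.0749562612054
32.6690563112 0.104924403898
19.5527038353 0.161048008423
27.6590554738 0.0736216196808
29.57270551 0.0783059265996
26.4763547089 0.0741611395068
28.3900047451 0.076795567585
27.2073039801 0.0748940314019
27.9382532514 0.0758594621123
27.4865017577 0.0748066534758
27.7656995353 0.0739345916335
"""


def parse_profile(text):
    """Pair up whitespace-separated numbers as (rmax, chi) tuples."""
    numbers = [float(tok) for tok in text.split()]
    return list(zip(numbers[0::2], numbers[1::2]))


def best_rmax(pairs):
    """Return the (rmax, chi) pair with the smallest chi-score."""
    return min(pairs, key=lambda pair: pair[1])


pairs = parse_profile(PROFILE_TEXT)
rmax, chi = best_rmax(pairs)
print("sampled %d Rmax values" % len(pairs))
print("lowest chi-score %.6f at Rmax = %.4f" % (chi, rmax))
```

For this run the minimum lies at Rmax = 27.6590554738, which is the estimated Rmax reported in the comparison with the PDB model. To use the sketch on a real run, replace PROFILE_TEXT with the contents of the profile file.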
(3) Recovered model comparison and clustering:
In query.cc, you can see the pairwise cc's between all pairs of models.
Based on this information, the models are clustered (default cutoff: cc=0.80).
The clustering can be visualized in two formats: (a) a tree;
and (b) a distance map.
The images are not generated by the sastbx package; the external program neato is needed to produce them.
The input scripts are query.tree and neato.dot.
If you have neato installed, type:
neato -Tpng < query.tree > tree.png
neato -Tpng < neato.dot > neato.png
You will be able to see the tree and map shown above.
Check out more about neato at http://graphviz.org/
More options for sastbx.shapeup can be found in shapeup extra.

