Shape Recovery

From SASTBX Wiki


When a model is available, we use refinement methods (see Structural Refinement) to adjust the model so that it agrees with the experimental data.

When no existing model fits the data, we use the shape recovery method described in this section to obtain some plausible models.

Again, type:

 sastbx.shapeup

to see the help information:

 Usage: 
 
  sastbx.shapeup <target=target.iq> [rmax=rmax nmax=nmax scan=True*/False buildmap=True*/False pdb=pdbfile path=database_path]
 
  The intensity profile is the only required input file  (in theory)
 
  Optional control parameters:
    rmax     : radius of the molecule (default: guessed from Rg)
    nmax     : maximum order of the zernike polynomial expansion (<=20 for precomputed database; 10 is the default)
    qmax     : maximum q value, beyond which the intensity profile will not be considered (default 0.20)
    path     : path to the database (this MUST be correct to execute the searching)
    buildmap : build electron density map in xplor format, all the map will be aligned
    pdb      : any pdb model to be compared, and the maps will be aligned to the first pdb file
    prefix   : the output prefix
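If you want to script several runs (for example, a scan over nmax), a small Python wrapper can assemble the key=value arguments. This is a minimal sketch that assumes sastbx.shapeup is on your PATH. The option names come from the help text above. The helpers build_shapeup_command and run_shapeup are hypothetical and are not part of sastbx.

```python
import subprocess

# Option names from the sastbx.shapeup usage/help text above ("target" is
# passed separately). A value of None means "keep the program default".
KNOWN_OPTIONS = {"rmax", "nmax", "qmax", "scan", "path", "buildmap", "pdb", "prefix"}


def build_shapeup_command(target, **options):
    """Return the argument list for one sastbx.shapeup run (key=value style)."""
    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    argv = ["sastbx.shapeup", f"target={target}"]
    for key, value in options.items():
        if value is not None:
            argv.append(f"{key}={value}")
    return argv


def run_shapeup(target, **options):
    """Run sastbx.shapeup (assumed to be on PATH); raise if it exits non-zero."""
    argv = build_shapeup_command(target, **options)
    return subprocess.run(argv, check=True, capture_output=True, text=True)


# Print (rather than run) the commands for a hypothetical nmax scan.
for nmax in (10, 15, 20):
    print(" ".join(build_shapeup_command("lyso.dat", nmax=nmax, prefix=f"lyso_n{nmax}")))
```

Giving each run its own prefix keeps the output files (query_1.iq, query.cc, and so on) from overwriting each other.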

We use lysozyme experimental data to walk through the 'easy' process:

 sastbx.shapeup target=lyso.dat

On the output screen, you will see the following:

 #phil __OFF__
 -------------------Searching the protein DATABASE for similar shapes-------------------
 
 #phil __ON__
 query {
   target = "lyso.dat"
   nmax = 10
   pdb_files = None
   qmax = 0.2
   q_level = 0.01
   q_background = None
   rmax = None
   scan = True
   prefix = "query"
   dbpath = None
   db_choice = *pisa piqsi allpdb user
   db_user_prefix = "mydb"
   buildmap = True
   calc_cc = True
   smear = True
   weight = *i s
   delta_q = None
   ntop = 10
   fraction = 0.9
 }
 #phil __END__
   
 ATTENT: database path was set to default:
 >>>>  dbpath =  /data/users/hgliu/CCTBX/Sources/sastbx/database/   <<<<  
 
 Performing Rg/Io analyses at various truncation levels ...
 
  ==== Estimates of Rg and Io ====
  
  Analyses of Rg estimates at various data truncation levels suggests   
  Rg :    15.26
  I0 : 8.25e-04
  No evidence for aggregation / structure / bad data found 
 
 delta_q is  0.01
 LEVEL=0.010000 and Q_MAX=0.200000
   10 elements,    1 clusters, @cutoff=0.800000
 ( ( ( ( ( ( ( 7 6 ) ( 8 0 ) ) 4 ) 5 ) 3 ) ( 9 1 ) ) 2 ) 0.869020714536 0.92537346528 0.814541632365 0.110831832914
 
 Rank  pdb_code
 1 3I0P 
 2 1QYN 
 3 1SRU 
 4 3BFN 
 5 2OZE 
 6 2NPI 
 7 1VKK 
 8 1L5X 
 9 1N0W 
 10 2ZDO 
 Volume is  19725.4568601 +/- 1014.0184793 (A^3)
 total time used:  82.3 (seconds)
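The log does not say exactly how sastbx obtains Rg and I0; it only reports that it scans several truncation levels. For reference, the textbook Guinier approximation, ln I(q) ≈ ln I0 - (Rg^2/3) q^2 for small q*Rg, gives both values from a straight-line fit of ln I against q^2. Below is a minimal sketch of that fit. It is not sastbx's own routine, and the function name guinier_fit is hypothetical.

```python
import math


def guinier_fit(q, intensity):
    """Least-squares fit of ln I(q) = ln I0 - (Rg**2 / 3) * q**2.

    Assumes the points already lie in the Guinier region (q * Rg below ~1.3).
    Returns (Rg, I0).
    """
    xs = [qi * qi for qi in q]
    ys = [math.log(ii) for ii in intensity]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    if slope >= 0:
        raise ValueError("Guinier slope must be negative; check the q range")
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic check using the values reported above (Rg = 15.26, I0 = 8.25e-4).
q = [0.01 + 0.005 * k for k in range(14)]  # 0.010 ... 0.075, so q*Rg < 1.2
i = [8.25e-4 * math.exp(-(15.26 ** 2) * qi ** 2 / 3) for qi in q]
rg, i0 = guinier_fit(q, i)
print(f"Rg = {rg:.2f}, I0 = {i0:.2e}")  # prints Rg = 15.26, I0 = 8.25e-04
```

Repeating the fit with different upper q limits is one simple way to check whether the estimates are stable, in the spirit of the "various truncation levels" message.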

Ten models are generated (ntop = 10 in the parameters above), and their intensity profiles are saved to query_1.iq, query_2.iq, ..., query_10.iq. The pairwise correlations between the models are saved to query.cc, and the clustering information is printed on the screen:

     10 elements,    1 clusters, @cutoff=0.800000
  ( ( ( ( ( ( ( 7 6 ) ( 8 0 ) ) 4 ) 5 ) 3 ) ( 9 1 ) ) 2 ) 0.869020714536 0.92537346528 0.814541632365 0.110831832914
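The nested parentheses give the clustering tree of the model indices (index 0 is model number 1, and so on). The four numbers that follow are average_cc, maximum_cc, minimum_cc, and sigma_cc, which are statistics of the pairwise cc's between the models. (In this run, the reported sigma_cc, 0.1108, equals maximum_cc minus minimum_cc.)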
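To reproduce such summary statistics from a pairwise cc matrix (for example, one read from query.cc), here is a short Python sketch. It assumes a symmetric matrix and ignores the diagonal. The layout of query.cc is not shown on this page, so parsing is left out. The label sigma_cc does not indicate which spread measure is meant, so the sketch returns both the standard deviation and the range. The function name pairwise_cc_stats is hypothetical.

```python
import statistics
from itertools import combinations


def pairwise_cc_stats(cc):
    """Summarise the upper triangle (i < j) of a symmetric cc matrix."""
    values = [cc[i][j] for i, j in combinations(range(len(cc)), 2)]
    return {
        "average_cc": statistics.mean(values),
        "maximum_cc": max(values),
        "minimum_cc": min(values),
        "stdev_cc": statistics.pstdev(values),  # population standard deviation
        "range_cc": max(values) - min(values),  # max - min spread
    }


# Hypothetical 3-model matrix, not the lysozyme result.
cc = [[1.0, 0.9, 0.8],
      [0.9, 1.0, 0.7],
      [0.8, 0.7, 1.0]]
stats = pairwise_cc_stats(cc)
print(stats)
```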

All models are aligned to the first model, m1_3I0P.xplor, and the maps are written in CNS/Xplor format.

Next, let's try something more interesting:

 sastbx.shapeup target=lyso.dat pdb=6lyz.pdb

This time, a few more lines appear in the standard output (screen):

 Compared to the PDB model (6lyz.pdb)
 mean cc:  0.86115
 first cc:  0.90917
 best cc:  0.90917
 worst cc:  0.79144
 Rmax: estimated vs PDB 27.6590554738 24.7719276575


There is also one more output file, query_cc2pdb.dat, which lists the correlation coefficient between each recovered model and the given PDB model:

 0.909168166512
 0.842904565586
 0.791441995411
 0.872753631895
 0.877494048353
 0.854825622068
 0.875184854363
 0.847651498317
 0.871085813499
 0.868958504947
 mean:  0.86115

At nmax=10, two models with cc >= 0.82 can be considered similar (the larger the cc, the more similar the models).
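The screen summary (mean, first, best, and worst cc) can be recomputed from query_cc2pdb.dat, and the 0.82 rule of thumb can be applied to each model. This minimal sketch assumes the layout shown above: one cc per line, followed by a "mean:" line. The function name summarize_cc2pdb is hypothetical.

```python
def summarize_cc2pdb(text, threshold=0.82):
    """Parse query_cc2pdb.dat-style text: one cc per line, then a 'mean:' line."""
    ccs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("mean"):
            continue  # skip blank lines and the trailing summary line
        ccs.append(float(line))
    return {
        "mean": sum(ccs) / len(ccs),
        "first": ccs[0],
        "best": max(ccs),
        "worst": min(ccs),
        # 1-based model numbers meeting the nmax=10 similarity rule of thumb
        "similar": [k + 1 for k, cc in enumerate(ccs) if cc >= threshold],
    }


SAMPLE = """\
0.909168166512
0.842904565586
0.791441995411
0.872753631895
0.877494048353
0.854825622068
0.875184854363
0.847651498317
0.871085813499
0.868958504947
mean:  0.86115
"""

summary = summarize_cc2pdb(SAMPLE)
print(round(summary["mean"], 5), summary["similar"])  # prints 0.86115 [1, 2, 4, 5, 6, 7, 8, 9, 10]
```

With the lysozyme data, only model 3 (cc = 0.79144, the "worst cc" on screen) falls below 0.82.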

The estimated Rmax is slightly larger than that of the PDB model (27.66 vs 24.77). Considering the solvation layer, this difference is within a reasonable range.


How do the models compare with the PDB model visually?

The average model vs. the PDB model:

Model number 1 vs. the PDB model:

There are a couple of other things I have not mentioned:

(1) Intensity comparison:

The profiles can easily be compared with any plotting program, e.g., xmgrace or gnuplot.

With xmgrace:

 xmgrace -log y lyso.dat query*.iq &

After some manipulation of the plot (not the data), the experimental profile and the model profiles can be compared directly.
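If you would rather put a model profile on the experimental scale before plotting, a least-squares scale factor is one simple option. This minimal sketch assumes that both files are plain whitespace-separated columns starting with q and I(q) and that they share the same q grid. Check your query_*.iq files, because that layout is an assumption here. The helpers parse_profile and best_scale are hypothetical.

```python
def parse_profile(text):
    """Return (q, I) lists from whitespace-separated columns; '#' lines skipped."""
    q_values, intensities = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        q_values.append(float(fields[0]))
        intensities.append(float(fields[1]))
    return q_values, intensities


def best_scale(i_exp, i_model):
    """Least-squares factor c minimising sum((i_exp - c * i_model) ** 2)."""
    num = sum(e * m for e, m in zip(i_exp, i_model))
    den = sum(m * m for m in i_model)
    return num / den


# Hypothetical profiles on a shared q grid (replace with real file contents).
exp_q, exp_i = parse_profile("# q I sigma\n0.01 2.0 0.1\n0.02 1.0 0.1\n")
mod_q, mod_i = parse_profile("0.01 1.0\n0.02 0.5\n")
if exp_q != mod_q:
    raise ValueError("this sketch assumes identical q grids")
c = best_scale(exp_i, mod_i)
scaled = [c * m for m in mod_i]
print(c, scaled)  # prints 2.0 [2.0, 1.0]
```

You can write the scaled profile back out as "q I" lines and load it into xmgrace alongside lyso.dat.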

(2) Rmax optimization:

The optimization process is saved to query.rmax__profile:

 24.5627046727 0.0749562612054
 32.6690563112 0.104924403898
 19.5527038353 0.161048008423
 27.6590554738 0.0736216196808
 29.57270551 0.0783059265996
 26.4763547089 0.0741611395068
 28.3900047451 0.076795567585
 27.2073039801 0.0748940314019
 27.9382532514 0.0758594621123
 27.4865017577 0.0748066534758
 27.7656995353 0.0739345916335

The left column is Rmax, and the right column is the chi-score at that Rmax. The lowest score (0.0736) is found at Rmax = 27.659, which is the estimated Rmax reported above.
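Apart from the two bracket ends (19.55 and 32.67), the trial Rmax values in query.rmax__profile follow a golden-section pattern: each new point divides the remaining interval in the golden ratio. The sketch below is a generic golden-section minimiser based on that reading of the file, not sastbx's code. golden_section_min and toy_chi are hypothetical names, and toy_chi is a smooth stand-in for the real chi-score.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618


def golden_section_min(f, a, b, tol=0.5):
    """Minimise a unimodal f on [a, b] by golden-section search.

    Returns (best_x, best_f, history); history lists every (x, f(x))
    evaluated, in order. Stops once the bracket is narrower than tol.
    """
    history = []

    def evaluate(x):
        y = f(x)
        history.append((x, y))
        return y

    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = evaluate(c), evaluate(d)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]; old c becomes the new d
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = evaluate(c)
        else:  # minimum lies in [c, b]; old d becomes the new c
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = evaluate(d)
    best_x, best_f = min(history, key=lambda p: p[1])
    return best_x, best_f, history


def toy_chi(rmax):
    """Hypothetical smooth stand-in for the chi-score (not from sastbx)."""
    return (rmax - 27.66) ** 2


best_x, best_f, history = golden_section_min(toy_chi, 19.5527038353, 32.6690563112)
for x, _ in history:
    print(f"{x:.4f}")
```

With this bracket and tol=0.5, the nine points the sketch evaluates agree with the remaining Rmax values in query.rmax__profile. The toy objective happens to make the same left/right choices as the real chi-scores. Golden-section search needs only one new function evaluation per step, which helps when each evaluation is expensive.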

(3) Recovered model comparison and clustering:

query.cc lists the pairwise cc's between all models.

Based on these values, the models are clustered (default cutoff: cc = 0.80).
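This page does not describe the clustering routine itself. As one plausible reading of "cluster with a cc cutoff", the sketch below performs average-linkage agglomerative clustering on the cc matrix. It repeatedly merges the most similar pair of clusters until no pair has an average cc at or above the cutoff. Average linkage is an assumption, and sastbx may use a different linkage rule. The function name cluster_models is hypothetical. The labels follow the nested-parenthesis style of the screen output.

```python
from itertools import combinations


def cluster_models(cc, cutoff=0.80):
    """Average-linkage agglomerative clustering on a symmetric cc matrix."""
    clusters = [[i] for i in range(len(cc))]
    labels = [str(i) for i in range(len(cc))]

    def avg_cc(a, b):
        return sum(cc[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        pairs = combinations(range(len(clusters)), 2)
        (p, q), best = max(
            (((p, q), avg_cc(clusters[p], clusters[q])) for p, q in pairs),
            key=lambda item: item[1],
        )
        if best < cutoff:
            break  # no remaining pair is similar enough to merge
        merged = clusters[p] + clusters[q]
        label = f"( {labels[p]} {labels[q]} )"
        for idx in (q, p):  # q > p, so delete the larger index first
            del clusters[idx]
            del labels[idx]
        clusters.append(merged)
        labels.append(label)
    return clusters, labels


# Hypothetical 4-model cc matrix (not from the lysozyme run).
cc = [[1.00, 0.90, 0.85, 0.50],
      [0.90, 1.00, 0.88, 0.55],
      [0.85, 0.88, 1.00, 0.60],
      [0.50, 0.55, 0.60, 1.00]]
clusters, labels = cluster_models(cc)
print(labels)  # prints ['3', '( 2 ( 0 1 ) )']
```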

The clustering can be visualized in two formats: (a) a tree and (b) a distance map.

These images are not generated by the sastbx package; the external program neato (part of Graphviz) is needed to render them.

The input scripts are query.tree and neato.dot.

If you have neato installed, type:

 neato -Tpng < query.tree > tree.png
 neato -Tpng < neato.dot > neato.png

This renders the tree (tree.png) and the distance map (neato.png).

See http://graphviz.org/ for more about neato.
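The exact contents of the neato.dot file written by sastbx are not reproduced here. To illustrate the idea, the sketch below writes a comparable distance-map graph from a cc matrix. It uses a scaled 1 - cc as the preferred edge length, which neato reads from its len edge attribute. The function name cc_to_dot and the default scale factor are hypothetical choices.

```python
def cc_to_dot(cc, names=None, scale=10.0):
    """Write an undirected DOT graph with preferred edge length scale * (1 - cc)."""
    n = len(cc)
    names = names or [str(k + 1) for k in range(n)]  # model numbers 1..n
    lines = ["graph models {", "  node [shape=circle];"]
    for i in range(n):
        for j in range(i + 1, n):
            length = scale * (1.0 - cc[i][j])  # less similar -> farther apart
            lines.append(f'  "{names[i]}" -- "{names[j]}" [len={length:.3f}];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical 3-model matrix; save the text and render it with
# neato -Tpng < my_models.dot > my_models.png
dot_text = cc_to_dot([[1.0, 0.9, 0.8],
                      [0.9, 1.0, 0.7],
                      [0.8, 0.7, 1.0]])
print(dot_text)
```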


More options for sastbx.shapeup are described in shapeup extra.

