Quantum|Refinement: final-stage refinement with restraints from Quantum Chemistry.

Authors

Min Zheng, Pavel Afonine, Mark Waller, Nigel Moriarty

Purpose

qr.refine is a command-line tool for refining bio-macromolecules using restraints derived from quantum mechanical (QM) calculations.

Usage

Q|R is a new open-source module for the refinement of bio-macromolecules. To keep the code base small and agile, Q|R is built on top of cctbx and TeraChem. The cctbx library provides most of the routines needed for X-ray refinement. The key feature of Q|R is that it interfaces with TeraChem to obtain chemical restraints from ab initio methods; other QM engines can be selected with the quantum.engine_name keyword listed below.
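The sketch below shows one way the two parts could fit together. cctbx supplies the data (X-ray) term, and a restraint provider supplies the chemical term, which can be either cctbx's library restraints or a QM engine. It assumes the weighted form T = w * T_data + T_restraints that is common in cctbx-based refinement. Q|R's actual weighting (see data_weight, restraints_weight_scale and the weight search in the keyword list) may differ. All functions here are hypothetical toy stand-ins.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Term = Callable[[Vector], Tuple[float, Vector]]  # returns (value, gradient)


def combined_target(x: Vector, data_term: Term, restraint_term: Term,
                    data_weight: float) -> Tuple[float, Vector]:
    """T = data_weight * T_data + T_restraints, together with its gradient."""
    t_data, g_data = data_term(x)
    t_restr, g_restr = restraint_term(x)
    value = data_weight * t_data + t_restr
    grad = [data_weight * gd + gr for gd, gr in zip(g_data, g_restr)]
    return value, grad


def quadratic_term(center: Vector) -> Term:
    """Hypothetical stand-in for either term: sum_i (x_i - center_i)**2."""
    def term(x: Vector) -> Tuple[float, Vector]:
        value = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        grad = [2.0 * (xi - ci) for xi, ci in zip(x, center)]
        return value, grad
    return term


data_term = quadratic_term([1.0, 2.0])       # toy "fit to the X-ray data"
restraint_term = quadratic_term([1.5, 1.5])  # toy restraints (cctbx or a QM engine)
value, grad = combined_target([0.0, 0.0], data_term, restraint_term, data_weight=2.0)
print(value, grad)  # 14.5 [-7.0, -11.0]
```

Keeping the restraint provider behind a single (value, gradient) callable means the minimizer does not need to know whether the chemistry comes from a parameter library or from a QM program.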

In principle, qr.refine only needs a data file (e.g. mtz) and a model (e.g. pdb):

qr.refine input.pdb input.mtz

Sensible default options are selected automatically and can be viewed with qr.refine --defaults.
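To change a default, the keyword is typically given on the command line. The helper below assembles such a command. It assumes that qr.refine accepts PHIL-style scope.name=value overrides, as is conventional for cctbx-based tools, and that scopes are named as in the keyword list below. The helper name build_qr_refine_command is hypothetical.

```python
import shlex
from typing import Dict, List


def build_qr_refine_command(model: str, data: str,
                            overrides: Dict[str, str]) -> List[str]:
    """Return an argument list: qr.refine <model> <data> key=value ...

    Assumption: options are overridden with PHIL-style scope.name=value
    arguments, as is conventional for cctbx-based command-line tools.
    """
    args = ["qr.refine", model, data]
    args.extend(f"{key}={value}" for key, value in overrides.items())
    return args


cmd = build_qr_refine_command(
    "input.pdb", "input.mtz",
    {"quantum.engine_name": "xtb", "cluster.clustering": "True"},
)
print(shlex.join(cmd))
# qr.refine input.pdb input.mtz quantum.engine_name=xtb cluster.clustering=True
```

Returning a list rather than a single string lets the result be passed directly to subprocess.run once qr.refine is on the PATH, without shell quoting issues.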

Examples

The default restraints are the classical cctbx restraints, which are taken from the parameterized library.
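For contrast with QM restraints, the sketch below shows the harmonic form used for a classical bond restraint in cctbx: residual = w * (d_ideal - d)^2 with weight w = 1/sigma^2, where d_ideal and sigma come from the parameter library. The function is a standalone illustration, not cctbx's own API. The bond length and sigma are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]


def bond_restraint(r1: Point, r2: Point, d_ideal: float, sigma: float
                   ) -> Tuple[float, Point, Point]:
    """Harmonic bond restraint: residual = w * (d_ideal - d)**2, w = 1/sigma**2.

    Returns the residual and its gradients with respect to r1 and r2.
    """
    weight = 1.0 / sigma ** 2
    d = math.dist(r1, r2)
    delta = d_ideal - d
    residual = weight * delta ** 2
    # d(residual)/d(r1) = -2 * w * delta * (r1 - r2) / d; the gradient on r2 is its negative.
    factor = -2.0 * weight * delta / d
    g1 = tuple(factor * (a - b) for a, b in zip(r1, r2))
    g2 = tuple(-g for g in g1)
    return residual, g1, g2


# Hypothetical example: a 1.6 A bond restrained to 1.5 A with sigma = 0.5.
res, g1, g2 = bond_restraint((0.0, 0.0, 0.0), (1.6, 0.0, 0.0), d_ideal=1.5, sigma=0.5)
print(round(res, 4), g1[0], g2[0])
```

The gradient on the first atom points in -x, so moving that atom toward +x (shortening the stretched bond) lowers the residual, as expected.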

QM interface

List of QM interfaces

Examples and Tips

Checking the gradient from the clustering approach
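A gradient test (mode=gtest) compares the gradient assembled from clusters against a reference gradient for several cluster sizes (g_scan). The reference can be a previously saved file (g_ref). Below is a minimal sketch of such a comparison. The choice of metrics (RMS and maximum per-atom deviation) is our assumption, not necessarily what qr.refine reports. The gradients are hypothetical numbers for a two-atom system.

```python
import math
from typing import Dict, List, Tuple

Gradient = List[Tuple[float, float, float]]  # one (gx, gy, gz) per atom


def gradient_deviation(test: Gradient, reference: Gradient) -> Tuple[float, float]:
    """Return (RMS, max) of the per-atom norms of test - reference."""
    if len(test) != len(reference):
        raise ValueError("gradients must cover the same atoms")
    norms = [math.dist(t, r) for t, r in zip(test, reference)]
    rms = math.sqrt(sum(n * n for n in norms) / len(norms))
    return rms, max(norms)


def scan_cluster_sizes(gradients_by_size: Dict[int, Gradient],
                       reference: Gradient) -> Dict[int, Tuple[float, float]]:
    """Deviation from the reference for each maxnum_residues_in_cluster value."""
    return {size: gradient_deviation(grad, reference)
            for size, grad in sorted(gradients_by_size.items())}


# Hypothetical gradients; the cluster sizes mirror the default g_scan = 10 15 20.
reference = [(0.10, 0.00, -0.20), (0.05, 0.05, 0.00)]
by_size = {
    10: [(0.13, 0.00, -0.16), (0.05, 0.02, 0.00)],
    15: [(0.11, 0.00, -0.19), (0.05, 0.04, 0.00)],
    20: [(0.10, 0.00, -0.20), (0.05, 0.05, 0.00)],
}
result = scan_cluster_sizes(by_size, reference)
for size, (rms, worst) in result.items():
    print(f"maxnum_residues_in_cluster={size}: rms={rms:.4f} max={worst:.4f}")
```

A deviation that shrinks steadily as the cluster size grows suggests that the clustered gradient is converging toward the reference.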

Literature

https://journals.iucr.org/d/issues/2017/01/00/lp5021/lp5021.pdf

https://journals.iucr.org/d/issues/2017/12/00/lp5024/lp5024.pdf

List of all available keywords

max_atoms = 15000 maximum number of atoms
debug = False flag to control verbosity of output for debugging problematic code.
restraints = cctbx *qm
output_file_name_prefix = None
output_folder_name = "pdb"
shared_disk = True deprecated; no longer used, because only the parallel map is now used.
rst_file = None Restart file to use for determining location in run. Loads previous results of weight calculations.
dump_gradients = None used for debugging gradients when clustering.
input
- sequence = None
- scattering_table = wk1995 it1992 *n_gaussian neutron electron
- wavelength = None
- energy = None
- twin_law = Auto Enter twin law if known.
- xray_data Scope of X-ray data and free-R flags
  - file_name = None
  - labels = None
  - high_resolution = None
  - low_resolution = None
  - outliers_rejection = True Remove basic Wilson outliers, extreme Wilson outliers, and beamstop shadow outliers
  - french_wilson_scale = True
  - sigma_fobs_rejection_criterion = None
  - sigma_iobs_rejection_criterion = None
  - ignore_all_zeros = True
  - force_anomalous_flag_to_be_equal_to = None
  - convert_to_non_anomalous_if_ratio_pairs_lone_less_than_threshold = 0.5
  - french_wilson
    - max_bins = 60 Maximum number of resolution bins
    - min_bin_size = 40 Minimum number of reflections per bin
  - r_free_flags
    - file_name = None This is normally the same as the file containing Fobs and is usually selected automatically.
    - label = None
    - test_flag_value = None This value is usually selected automatically - do not change unless you really know what you're doing!
    - ignore_r_free_flags = False Use all reflections in refinement (work and test)
    - disable_suitability_test = False
    - ignore_pdb_hexdigest = False If True, disables safety check based on MD5 hexdigests stored in PDB files produced by previous runs.
    - generate = False Generate R-free flags (if not available in input files)
    - fraction = 0.1
    - max_free = 2000
    - lattice_symmetry_max_delta = 5
    - use_lattice_symmetry = True
    - use_dataman_shells = False Used to avoid biasing of the test set by certain types of non-crystallographic symmetry.
    - n_shells = 20
- pdb
  - file_name = None Model file(s) name (PDB)
- monomers
  - file_name = None Monomer file(s) name (CIF)
- maps
  - map_file_name = None A CCP4-formatted map
  - d_min = None Resolution of map
  - map_coefficients_file_name = None MTZ file containing map
  - map_coefficients_label = None Data label for complex map coefficients in MTZ file
cluster
- charge_cutoff = 8.0 distance for point charge cutoff
- clustering = False enable/disable clustering
- charge_embedding = False point charge embedding
- two_buffers = False two buffers are used when the gradients for the whole system do not match the gradients joined from the fragments when only a single buffer surrounds each cluster.
- maxnum_residues_in_cluster = 15 maximum number of residues in a cluster
- clustering_method = gnc *bcc type of clustering algorithm
- altloc_method = *average subtract two strategies for joining the energies and gradients when multiple energy and gradient calculations are performed for alternate locations.
- g_scan = 10 15 20 sequence of maxnum_residues_in_cluster values for the gradient convergence test (mode=gtest); treat as a string!
- g_ref = None name of the previously saved gradient file to be used as the reference
- g_mode = None manual control over gradient test loops (1 = standard, 2 = standard + point charges, 3 = two buffers, 4 = two buffers + point charges)
quantum
- engine_name = *mopac ani torchani terachem turbomole pyscf orca gaussian xtb choose the QM program
- charge = None The formal charge of the entire molecule
- basis = Auto pre-defined defaults
- method = Auto Defaults to HF for all but MOPAC (PM7), xTB (GFN2) and TorchANI (ani-1x_8x)
- memory = None memory for the QM program
- nproc = None number of parallel processes for the QM program
- qm_addon = gcp dftd3 gcp-d3 allows additional calculations of the gCP and/or DFT-D3 corrections using their stand-alone programs
- qm_addon_method = None specifies flags for the qm_addon. See manual for details.
refine
- dry_run = False do not perform calculations, only setup steps
- sf_algorithm = *direct fft algorithm used to compute structure factors, either the direct method or the fast Fourier transform.
- refinement_target_name = *ml ls_wunit_k1
- mode = opt *refine gtest choose between refinement, geometry optimization or gradient test
- number_of_macro_cycles = 1 number of macro cycles used in the refinement procedure.
- number_of_weight_search_cycles = 50
- number_of_refine_cycles = 5 maximum number of refinement cycles
- number_of_micro_cycles = 50 maximum number of micro cycles used in refinement
- data_weight = None
- choose_best_use_r_work = False
- skip_initial_weight_optimization = False
- adjust_restraints_weight_scale_value = 2
- max_iterations_weight = 50
- max_iterations_refine = 50
- use_ase_lbfgs = False used for debugging the lbfgs minimizer from cctbx.
- line_search = True flag to use a line search in minimizer.
- stpmax = 3 maximum step length; empirically, 3 works well for cctbx, but 0.2 is better for QM methods.
- gradient_only = False use the gradient-only line search according to J. A. Snyman (2005).
- update_all_scales = True
- refine_sites = True only refine the Cartesian coordinates of the molecular system.
- refine_adp = False ADP refinement is not currently supported.
- restraints_weight_scale = 1.0
- shake_sites = False
- use_convergence_test = True
- max_bond_rmsd = 0.03
- max_r_work_r_free_gap = 5.0
- r_tolerance = 0.001
- rmsd_tolerance = 0.01 maximum acceptable tolerance for the rmsd.
- opt_log = False additional output of the L-BFGS optimizer
- pre_opt = False pre-optimization using steepest descent (SD) and conjugate gradient (CG) techniques without line search
- pre_opt_stpmax = 0.1 step size
- pre_opt_iter = 10 max. iterations for pre-optimizer
- pre_opt_switch = 2 max. iterations before switching from SD to CG
- pre_opt_gconv = 3000 gradient norm convergence threshold for pre-optimizer
parallel
- method = *multiprocessing slurm pbs sge lsf threading type of parallel mode. multiprocessing is an efficient way of running processes on the current computer; the others are queueing protocols, with the exception of threading, which is not a safe choice.
- nproc = None Number of processes to use
- qsub_command = None Specific command to use on the queue system
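The r_free_flags.fraction and r_free_flags.max_free keywords bound the size of a newly generated test set. Below is a simplified sketch, assuming the test set holds fraction * n reflections capped at max_free, chosen at random. It ignores use_lattice_symmetry and use_dataman_shells and is not the cctbx implementation. The function name generate_r_free_flags is hypothetical.

```python
import random
from typing import List


def generate_r_free_flags(n_reflections: int, fraction: float = 0.1,
                          max_free: int = 2000, seed: int = 0) -> List[bool]:
    """Flag a random subset of reflections as the free (test) set.

    Simplified assumption: the test set holds fraction * n reflections,
    capped at max_free. Lattice symmetry and thin shells are ignored.
    """
    n_free = min(int(round(fraction * n_reflections)), max_free)
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    free_indices = set(rng.sample(range(n_reflections), n_free))
    return [i in free_indices for i in range(n_reflections)]


print(sum(generate_r_free_flags(5000)))   # 500 test reflections (10% of 5000)
print(sum(generate_r_free_flags(50000)))  # 2000: capped by max_free
```

With the defaults, the cap only takes effect above 20000 reflections. Below that, the fraction alone decides the test-set size.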
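With cluster.charge_embedding enabled, atoms near a QM cluster can be represented as point charges, and cluster.charge_cutoff sets the distance limit. The sketch below shows only that selection step. It assumes distances in Å and that the cutoff is measured to the nearest atom of the cluster. The partial charges are hypothetical, and how qr.refine actually assigns charges is not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def select_point_charges(cluster_xyz: Sequence[Point], env_xyz: Sequence[Point],
                         env_charges: Sequence[float], cutoff: float = 8.0
                         ) -> List[Tuple[Point, float]]:
    """Keep environment atoms within `cutoff` of any cluster atom, with their charges."""
    selected = []
    for xyz, q in zip(env_xyz, env_charges):
        if any(math.dist(xyz, c) <= cutoff for c in cluster_xyz):
            selected.append((xyz, q))
    return selected


cluster = [(0.0, 0.0, 0.0)]
environment = [(3.0, 0.0, 0.0), (9.0, 0.0, 0.0), (0.0, 7.9, 0.0)]
charges = [-0.5, 0.4, 0.2]  # hypothetical partial charges
for xyz, q in select_point_charges(cluster, environment, charges):
    print(xyz, q)  # the atom at 9.0 A lies outside the 8.0 A cutoff and is dropped
```

The brute-force distance loop is fine for illustration. For a full protein, a spatial grid or neighbor list would be the usual choice.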
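The pre_opt keywords describe a short pre-optimizer that starts with steepest descent (SD), switches to conjugate gradient (CG) after pre_opt_switch iterations, and uses no line search. It stops after pre_opt_iter iterations or once the gradient norm falls below pre_opt_gconv. Below is a minimal sketch under these assumptions:
- Fletcher-Reeves is our choice of CG variant.
- "Step size" (pre_opt_stpmax) is read as a cap on the step length.
- The energy function is a toy paraboloid.

This is not qr.refine's code.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
EnergyFn = Callable[[Vector], Tuple[float, Vector]]  # returns (energy, gradient)


def pre_optimize(x0: Vector, energy_and_grad: EnergyFn, stpmax: float = 0.1,
                 max_iter: int = 10, switch: int = 2, gconv: float = 3000.0) -> Vector:
    """Steepest descent for the first `switch` iterations, then Fletcher-Reeves
    conjugate gradient; no line search, each step capped at length `stpmax`."""
    x = list(x0)
    _, g = energy_and_grad(x)
    g_prev: Vector = []
    d: Vector = []
    for it in range(max_iter):
        gg = sum(gi * gi for gi in g)
        if math.sqrt(gg) < gconv:
            break  # gradient norm below the convergence threshold
        if it < switch or not d:
            d = [-gi for gi in g]  # steepest-descent direction
        else:
            beta = gg / sum(gi * gi for gi in g_prev)  # Fletcher-Reeves coefficient
            d = [-gi + beta * di for gi, di in zip(g, d)]
        dnorm = math.sqrt(sum(di * di for di in d))
        if dnorm == 0.0:
            break
        scale = min(1.0, stpmax / dnorm)  # fixed rule instead of a line search
        x = [xi + scale * di for xi, di in zip(x, d)]
        g_prev = g
        _, g = energy_and_grad(x)
    return x


def paraboloid(x: Vector) -> Tuple[float, Vector]:
    """Toy energy sum_i x_i**2 with gradient 2 * x."""
    return sum(xi * xi for xi in x), [2.0 * xi for xi in x]


start = [1.0, 1.0]
print(pre_optimize(start, paraboloid))              # default gconv=3000: returns start unchanged
print(pre_optimize(start, paraboloid, gconv=1e-6))  # 10 capped steps toward the minimum
```

Without a line search, a Fletcher-Reeves direction is not guaranteed to be a descent direction. This is one reason such a pre-optimizer is kept short (few iterations, small steps) before the L-BFGS minimizer takes over.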