DeePKS-kit Documentation

DeePKS-kit is a program to generate accurate energy functionals for quantum chemistry systems (in connection with PySCF) and periodic systems (in connection with ABACUS), for both the perturbative scheme (DeePHF) and the self-consistent scheme (DeePKS).

This documentation currently focuses on running DeePKS-kit for periodic systems, i.e., in connection with ABACUS. For molecular systems, we refer users to the DeePKS-kit documentation on GitHub.

Important

The project DeePKS-kit is licensed under GNU LGPLv3.0. If you use this code in any future publications, please cite Yixiao Chen, Linfeng Zhang, Han Wang, and Weinan E. “DeePKS-kit: a package for developing machine learning-based chemically accurate energy and density functional models.” arXiv:2012.14615v2.

Contents

Installation

DeePKS-kit

DeePKS-kit is a pure Python library, so it can be installed following the standard git clone then pip install procedure. Note that the two main requirements, PyTorch and ABACUS, will not be installed automatically, so you will need to install them manually in advance. Below are more detailed instructions that include installing the required libraries in the environment.

We use conda here as an example, so first you may need to install Anaconda or Miniconda.

To reduce the possibility of library conflicts, we suggest creating a new environment (named deepks) with basic dependencies installed (optional):

conda create -n deepks numpy scipy h5py ruamel.yaml paramiko
conda activate deepks

Now you are in the new environment called deepks. Next, install PyTorch:

# assuming a GPU with cudatoolkit 10.2 support
conda install pytorch cudatoolkit=10.2 -c pytorch
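
To quickly verify the PyTorch installation (optional), you can check the version and whether the GPU is visible:

import torch

# prints the installed PyTorch version and whether CUDA is usable
print(torch.__version__, torch.cuda.is_available())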

Once the environment has been set up properly, use pip to install DeePKS-kit:

$ pip install git+https://github.com/deepmodeling/deepks-kit@abacus

ABACUS with DeePKS enabled

To run DeePKS-kit in connection with ABACUS, users first need to install ABACUS with DeePKS enabled. A detailed installation guide can be found at installation with DeePKS.

DPDispatcher (optional)

While DeePKS-kit has its own built-in job dispatcher, users are welcome to use DPDispatcher for automatic job submission. The usage of these two types of dispatchers is described in the machine.yaml and machine_dpdispatcher.yaml sections below. DPDispatcher can simply be installed via

$ pip install dpdispatcher

More details about DPDispatcher can be found in DPDispatcher's documentation.

Getting Started

To try out a DeePKS-ABACUS sample run, users may start from the single water example provided here.

In this example, 1000 structures of a single water molecule with corresponding PBE property labels (including energy and force) have been prepared in advance. Four subfolders, group.00-group.03, can be found under the folder systems. group.00-group.02 contain 300 frames each and can be used as training sets, while group.03 contains 100 frames and can be used as the testing set. More details about the file structures and their preparation are introduced in Label preparation.
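
As an optional sanity check before running, the shapes of the prepared arrays can be inspected with numpy; this short sketch assumes the example's systems/ layout described above:

import numpy as np

# check the prepared single-water data; paths follow the example layout
for grp, n in [("group.00", 300), ("group.01", 300),
               ("group.02", 300), ("group.03", 100)]:
    atom = np.load(f"systems/{grp}/atom.npy")      # expected shape (n, 3, 4)
    energy = np.load(f"systems/{grp}/energy.npy")  # expected shape (n, 1)
    print(grp, atom.shape, energy.shape)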

This sample job can be run either on a local machine or on Bohrium. Users may modify the input files to make it run in various environments following the instructions in Input files preparation. To run this job on a local machine, simply issue:

cd deepks-kit/examples/water_single_lda2pbe_abacus/iter
bash run.sh

To run this job on Bohrium (which uses DPDispatcher for job submission and data gathering), simply issue:

cd deepks-kit/examples/water_single_lda2pbe_abacus/iter
bash run_dpdispatcher.sh

Outputs generated during the process are introduced in Important outputs explanation.

Label preparation

System structure file

To train a DeePKS model, users must provide structures of the system(s) of interest. Structures can be obtained either from a short AIMD run or by adding structural perturbations on top of an optimized geometry. The structures can be provided in three formats as follows:

  • (recommended) grouped into atom.npy

    The shape of the atom.npy file is [nframes, natoms, 4]:
    • nframes refers to the number of frames (structures) of the system;

    • natoms refers to the number of atoms in the system, e.g., natoms = 3 for a single water molecule;

    • the last dimension 4 corresponds to the nuclear charge of the given atom followed by its xyz coordinates (either in Cartesian or fractional form, which needs to be specified with the keyword coord_type in scf_abacus.yaml).

Note

If the coordinates saved in atom.npy are in units of Bohr, then lattice_constant should be set to 1 in scf_abacus.yaml. If they are in units of Angstrom, then lattice_constant should be set to 1.8897259886 in scf_abacus.yaml. If fractional coordinates are used, lattice_constant also needs to be set accordingly. See the ABACUS documentation for more details.

  • grouped into coord.npy and type.raw

    coord.npy is very similar to atom.npy, with the shape [nframes, natoms, 3]. The only difference is that the nuclear charge is not included in the last dimension; it is stored in type.raw instead. Note that this format has not been fully tested for periodic systems.

  • single xyz

    Save the xyz coordinates of each frame as a single xyz file, e.g., 0000.xyz, 0001.xyz, … Note that this format has not been fully tested for periodic systems.

It should be noted that if the lattice vectors differ from frame to frame, users should specify the lattice vector for each frame via box.npy, whose shape is [nframes, 9]. If the prepared structures share the same lattice vector, users may instead specify it as a keyword in the input files; see scf_abacus.yaml for details. A minimal data-preparation sketch is given below.
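
For illustration, the following sketch builds such files for a single water molecule; the reference geometry and perturbation magnitude are hypothetical and only serve to show the expected array shapes:

import numpy as np

nframes, natoms = 1000, 3
charges = np.array([8.0, 1.0, 1.0])  # O, H, H; order must match orb_files/pp_files

# hypothetical optimized geometry (Angstrom), randomly perturbed per frame
ref = np.array([[ 0.000, 0.000, 0.000],
                [ 0.757, 0.586, 0.000],
                [-0.757, 0.586, 0.000]])
coords = ref + np.random.normal(scale=0.05, size=(nframes, natoms, 3))

# last dimension holds the nuclear charge followed by the xyz coordinates
atom = np.concatenate(
    [np.tile(charges[None, :, None], (nframes, 1, 1)), coords], axis=-1)
np.save("atom.npy", atom)

# optional: per-frame lattice vectors flattened to shape [nframes, 9]
box = np.tile((28.0 * np.eye(3)).reshape(1, 9), (nframes, 1))
np.save("box.npy", box)

Since the coordinates in this sketch are in Angstrom, lattice_constant in scf_abacus.yaml would be set to 1.8897259886, as explained in the note above.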

Property labels

To train a DeePKS model, the target energy of the system of interest is required, and its format should follow that of the structure file. Additional properties can also be trained, including force, stress, and bandgap. Note that the bandgap label corresponds to the energy difference between the valence band and the conduction band at each k-point, and currently this label only works for semiconductors with an even number of electrons. The formats of the structure files (taking atom.npy as an example) and of the corresponding property labels are summarized as follows:

Filename      Description                     Shape                   Unit
------------  ------------------------------  ----------------------  -----------------------------
atom.npy      structural file, required       [nframes, natoms, 4]    Bohr, Angstrom, or fractional
box.npy       lattice vector file, optional   [nframes, 9]            Bohr or Angstrom
energy.npy    energy label, required          [nframes, 1]            Hartree
force.npy     force label, optional           [nframes, natoms, 3]    Hartree/Bohr
stress.npy    virial vector file, optional    [nframes, 9]            Hartree
orbital.npy   bandgap label, optional         [nframes, nkpt, 1]      Hartree
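
The label files are plain numpy arrays saved next to atom.npy. In the sketch below, the zero arrays are placeholders for values that would in practice come from the reference (e.g., PBE) calculations, in atomic units:

import numpy as np

nframes, natoms = 1000, 3
energy = np.zeros((nframes, 1))         # target energies, Hartree
force = np.zeros((nframes, natoms, 3))  # target forces, Hartree/Bohr
np.save("energy.npy", energy)
np.save("force.npy", force)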

Input files preparation

To run DeePKS-kit in connection with ABACUS, several input files are required so that the SCF jobs in ABACUS and the training jobs in DeePKS-kit can be performed iteratively. Here we will use the single water molecule as an example to show the input files required to train an LDA-based DeePKS model that reproduces PBE target energies and forces.

As can be seen in this example, 1000 structures of a single water molecule with corresponding PBE property labels (including energy and force) have been prepared in advance. Four subfolders, group.00-group.03, can be found under the folder systems. group.00-group.02 contain 300 frames each and can be used as training sets, while group.03 contains 100 frames and can be used as the testing set. The prepared file structure of a ready-to-run DeePKS iterative training process should basically look like the following:

(file-structure figure: deepks_tree1.jpg)

scf_abacus.yaml

This file controls the SCF jobs performed in ABACUS. The scf_abacus block controls the SCF jobs after the init iteration, i.e., with the DeePKS model loaded, while the init_scf_abacus block controls the initial SCF jobs, i.e., bare LDA or PBE SCF calculations. The reason for dividing this file into two blocks is that after the init iteration, the SCF calculations with the DeePKS model loaded are sometimes hard to converge to a tight threshold, e.g., scf_thr = 1e-7, so we may want to slightly loosen that threshold after the init iteration. Also, even if users need to train the model with a force label, there is no need to calculate forces during the init SCF cycle, since the init training includes the energy label only.

Below is a sample scf_abacus.yaml file for the single water molecule, with an explanation of each keyword. Please refer to the ABACUS input file documentation for a more detailed explanation of the ABACUS input parameters.

scf_abacus:
  # INPUT args; keywords related to the INPUT file in ABACUS
  ntype: 2                    # int; number of different atom species in this calculation, e.g., 2 for H2O
  nbands: 8                   # int; number of bands to be calculated; optional
  ecutwfc: 50                 # real; energy cutoff, unit: Ry
  scf_thr: 1e-7               # real; SCF convergence threshold for density error; 5e-7 and below is acceptable
  scf_nmax: 50                # int; maximum SCF iteration steps
  dft_functional: "lda"       # string; name of the baseline density functional
  gamma_only: 1               # bool; 1 for gamma-only calculation
  cal_force: 1                # bool; 1 for force calculation
  cal_stress: 0               # bool; 1 for stress calculation

  # STRU args; keywords related to the STRU file in ABACUS
  # below are default STRU args, users can also set them for each group in
  # ../systems/group.xx/stru_abacus.yaml
  orb_files: ["O_gga_6au_60Ry_2s2p1d.orb", "H_gga_6au_60Ry_2s1p.orb"] # atomic orbital file list for each element;
                                                                      # order should be consistent with that in atom.npy
  pp_files: ["O_ONCV_PBE-1.0.upf", "H_ONCV_PBE-1.0.upf"]              # pseudopotential file list for each element;
                                                                      # order should be consistent with that in atom.npy
  proj_file: ["jle.orb"]                                              # projector file; generated in ABACUS; see file desriptions for more details
  lattice_constant: 1                                                 # real; lattice constant
  lattice_vector: [[28, 0, 0], [0, 28, 0], [0, 0, 28]]                # [3, 3] matrix; lattice vectors
  coord_type: "Cartesian"                                             # "Cartesian" or "Direct"; the latter is for fractional coordinates

  # cmd args; keywords related to running ABACUS
  run_cmd : "mpirun"                                                  # run command
  abacus_path: "/usr/local/bin/abacus"                                # ABACUS executable path

# below is the init_scf_abacus block, which is basically the same as above;
# just note that the recommended value for scf_thr is 1e-7,
# and force calculation can be omitted since the init training includes the energy label only.
init_scf_abacus:
  orb_files: ["O_gga_6au_60Ry_2s2p1d.orb", "H_gga_6au_60Ry_2s1p.orb"]
  pp_files: ["O_ONCV_PBE-1.0.upf", "H_ONCV_PBE-1.0.upf"]
  proj_file: ["jle.orb"]
  ntype: 2
  nbands: 8
  ecutwfc: 50
  scf_thr: 1e-7
  scf_nmax: 50
  dft_functional: "lda"
  gamma_only: 1
  cal_force: 0
  lattice_constant: 1
  lattice_vector: [[28, 0, 0], [0, 28, 0], [0, 0, 28]]
  coord_type: "Cartesian"
  #cmd args
  run_cmd : "mpirun"
  abacus_path: "/usr/local/bin/abacus"

For multi-k-point systems, the k-point mesh can be set either explicitly as:

scf_abacus:
  <...other keywords>
  k_points: [4,4,4,0,0,0]
init_scf_abacus:
  <...other keywords>
  k_points: [4,4,4,0,0,0]

or via kspacing as:

scf_abacus:
  <...other keywords>
  kspacing: 0.1
init_scf_abacus:
  <...other keywords>
  kspacing: 0.1

machine.yaml

Note

This file is not required when running jobs on Bohrium via DPDispatcher. In that case, users need to prepare machine_dpdispatcher.yaml instead.

To run the ABACUS-DeePKS training process on a local machine or on a cluster via slurm or PBS, it is recommended to use the built-in DeePKS-kit dispatcher and prepare the machine.yaml file as follows.

# this is only part of input settings.
# should be used together with systems.yaml and params.yaml
scf_machine:
  group_size: 125        # number of SCF jobs that are grouped and submitted together; these jobs will be run sequentially
  resources:
    task_per_node: 1     # number of CPUs for one SCF job

  sub_size: 1            # keyword for PySCF; set to 1 for ABACUS SCF jobs
  dispatcher:
    context: local       # "local" to run on local machine, or "ssh" to run on a remote machine
    batch: shell         # set to shell to run on a local machine; slurm and pbs are also supported

train_machine:
  dispatcher:
    context: local       # "local" to run on local machine, or "ssh" to run on a remote machine
    batch: shell         # set to shell to run on a local machine; slurm and pbs are also supported
    remote_profile: null # use lazy local
  # resources are no longer needed, and the task will use gpu automatically if there is one.
  python: "python"       # use python in path


# other settings (these are default; can be omitted)
cleanup: false           # whether to delete slurm and err files
strict: true             # do not allow undefined machine parameters

# parameters for ABACUS
use_abacus: true         # use abacus in scf calculation

To run ABACUS-DeePKS via PBS or slurm, the following parameters can be specified under the resources block in both scf_machine and train_machine:

# this is only part of input settings.
# should be used together with systems.yaml and params.yaml
scf_machine:
  <...other keywords>
  resources:
    numb_node:          # int; number of nodes; default value is 1
    task_per_node:      # int; ppn required; default value is 1;
    numb_gpu:           # int; number of GPUs; default value is 1
    time_limit:         # time limit; default value is 1:0:0
    mem_limit:          # int; memory limit in GB
    partition:          # string; queue name
    account:            # string; account info
    qos:                # string;
    module_list:        # e.g., [abacus]
    source_list:        # e.g., [/opt/intel/oneapi/setvars.sh; conda activate deepks]
    <... other keywords>
train_machine:
  <...other keywords>
  resources:
    <... same as above>

machine_dpdispatcher.yaml

Note

This file is not required when running jobs on a local machine or on a cluster via slurm or PBS with the built-in dispatcher; in that case, users may prepare machine.yaml instead. That being said, users may also modify the keywords in this file to submit jobs to a cluster via slurm or PBS. Please refer to the DPDispatcher documentation for more details on slurm/PBS job submission.

To run ABACUS-DeePKS on Bohrium or via slurm, users need to use DPDispatcher and prepare the machine_dpdispatcher.yaml file as follows. Most of the keywords in this file share the same meaning as those in machine.yaml. The unique part here is the dpdispatcher_resources block. Below is an example for running jobs on Bohrium:

# this is only part of input settings.
# should be used together with systems.yaml and params.yaml
scf_machine:
  resources:
    task_per_node: 4
  dispatcher: dpdispatcher
  dpdispatcher_resources:
    number_node: 1
    cpu_per_node: 8
    group_size: 125
    source_list: [/opt/intel/oneapi/setvars.sh]
  sub_size: 1
  dpdispatcher_machine:
    context_type: lebesguecontext
    batch_type: lebesgue
    local_root: ./
    remote_profile:
      email: (your-account-email)         # email address registered on Bohrium
      password: (your-password)           # password on Bohrium
      program_id: (your-program-id)       # program ID on Bohrium
      input_data:
        log_file: log.scf
        err_file: err.scf
        job_type: indicate
        grouped: true
        job_name: deepks-scf
        disk_size: 100
        scass_type: c8_m8_cpu             # machine type
        platform: ali
        image_name: abacus-workshop       # image name
        on_demand: 0
train_machine:
  dispatcher: dpdispatcher
  dpdispatcher_machine:
    context_type: lebesguecontext
    batch_type: lebesgue
    local_root: ./
    remote_profile:
      email: (your-account-email)
      password: (your-password)
      program_id: (your-program-id)
      input_data:
        log_file: log.train
        err_file: err.train
        job_type: indicate
        grouped: true
        job_name: deepks-train
        disk_size: 100
        scass_type: c8_m8_cpu
        platform: ali
        image_name: abacus-workshop
        on_demand: 0
  dpdispatcher_resources:
    number_node: 1
    cpu_per_node: 8
    group_size: 1
    source_list: [~/.bashrc]
  python: "/usr/bin/python3" # use python in path
  # resources are no longer needed, and the task will use gpu automatically if there is one

# other settings (these are default; can be omitted)
cleanup: false # whether to delete slurm and err files
strict: true # do not allow undefined machine parameters

# parameters for ABACUS
use_abacus: true # use abacus in scf calculation

params.yaml

This file controls the init and iterative training processes performed in DeePKS-kit. The default hyperparameter values for the training process (as given below) are recommended for users who are not very experienced in machine learning, while machine-learning gurus are welcome to play with them.

# this is only part of input settings.
# should be used together with systems.yaml and machines.yaml

# number of iterations to do, can be set to zero for DeePHF training
n_iter: 1

# directory setting (these are default choices, can be omitted)
workdir: "."
share_folder: "share" # folder that stores all other settings

# scf settings, set to false when n_iter = 0 to skip checking
scf_input: false


# train settings for training after init iteration,
# set to false when n_iter = 0 to skip checking
train_input:
  # model_args is omitted, which will inherit from init_train
  data_args:
    batch_size: 16          # training batch size; 16 is recommended
    group_batch: 1          # number of batches to be grouped; set to 1 for ABACUS-related training
    extra_label: true       # set to true to train the model with force, stress, or bandgap labels.
                            # note that these extra labels will only be included after the init iteration
                            # only energy label will be included for the init training
    conv_filter: true       # if set to true (recommended), will read the convergence data from conv_name
                            # and only use converged datapoints to train; including any unconverged
                            # datapoints may screw up the training!
    conv_name: conv         # npy file that records the converged datapoints
  preprocess_args:
    preshift: false         # the restarting model is already shifted; the shift value will not be recomputed
    prescale: false         # same as above
    prefit_ridge: 1e1       # the ridge factor used in linear regression
    prefit_trainable: false # make the linear regression fixed during the training
  train_args:
    # the start learning rate (lr) will decay by a factor of `decay_rate` every `decay_steps` epochs
    decay_rate: 0.5
    decay_steps: 1000
    display_epoch: 100      # show training results every n epoch
    force_factor: 1         # the prefactor multiplied in front of the force part of the loss
    n_epoch: 5000           # total number of epochs used in training
    start_lr: 0.0001        # the start learning rate, will decay later

# init training settings, these are for DeePHF task
init_model: false           # do not use existing model to restart from

init_scf: True              # whether to perform init SCF;

init_train:                 # parameters for init nn training; basically the same as those listed in train_input
  model_args:
    hidden_sizes: [100, 100, 100] # neurons in hidden layers
    output_scale: 100             # the output will be divided by 100 before being compared with the label
    use_resnet: true              # skip connection
    actv_fn: mygelu               # same as gelu; supports force calculation
  data_args:
    batch_size: 16
    group_batch: 1
  preprocess_args:
    preshift: true                # shift the descriptor by its mean
    prescale: false               # scale the descriptor by its variance (can cause convergence problem)
    prefit_ridge: 1e1             # do a ridge regression as prefitting
    prefit_trainable: false
  train_args:
    decay_rate: 0.96
    decay_steps: 500
    display_epoch: 100
    n_epoch: 5000
    start_lr: 0.0003
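
For reference, the decay settings above correspond to a stepwise exponential decay of the learning rate; the following minimal sketch illustrates the resulting rate at a given epoch:

def lr_at(epoch, start_lr, decay_rate, decay_steps):
    # learning rate after `epoch` epochs under the stepwise decay described above
    return start_lr * decay_rate ** (epoch // decay_steps)

# e.g., with the init_train settings above:
print(lr_at(1000, start_lr=0.0003, decay_rate=0.96, decay_steps=500))  # 0.0003 * 0.96**2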

Even though the DeePKS training scheme is relatively robust, there is a chance that the SCF procedure fails to converge after loading the DeePKS model. Such convergence failure may be caused by insufficient variety in the training data, and/or by discontinuities arising from the sorting of eigenvalues in the eigenvalue-decomposition step when constructing the descriptors. One thing worth trying is to add more training data with sufficient structural variety; if the convergence failure remains even though the training data is sufficient, users may further symmetrize the descriptors by modifying the init_train block in params.yaml as follows:

init_train:                 # parameters for init nn training; basically the same as those listed in train_input
  proj_basis: [[0,[0,...,0]],
              [1, [0,...,0]],
              [2, [0,...,0]]] # projected basis for thermal embedding, 0, 1, and 2 in the first column correspond to s, p, and d orbitals,
                              # and the number of zeros afterwards should equal the number of Bessel functions in jle.orb.
  model_args:
    hidden_sizes: [100, 100, 100] # neurons in hidden layers
    output_scale: 100             # the output will be divided by 100 before being compared with the label
    use_resnet: true              # skip connection
    actv_fn: mygelu               # same as gelu; supports force calculation
    embedding: {embd_sizes: null, init_beta: 5, type: thermal} # apply thermal averaging to further symmetrize the descriptors
  <...other keywords>

projector file

The descriptors used in the DeePKS model are generated from the projected density matrix, so a set of projectors is required in advance. To obtain these projectors for periodic systems, users need to run a specific sample job in ABACUS. The projectors are products of spherical Bessel functions (radial part) and spherical harmonics (angular part), similar to numerical atomic orbitals. The number of Bessel functions is controlled by the radial cutoff and the wavefunction energy cutoff; a radial cutoff of 5 or 6 Bohr and the ecutwfc set in scf_abacus.yaml are recommended, respectively.

Note that it is not necessary to change the STRU file of this sample job, since all elements share the same projectors. Basically, users only need to set calculation to gen_bessel and then adjust the energy cutoff and the radial cutoff of the wavefunctions. The angular part is controlled via the keyword bessel_lmax, and the value 2 (including s, p, and d orbitals) is strongly recommended. See below for the related input parameters:

calculation gen_bessel # calculation type should be gen_bessel
bessel_lmax 2   # maximum angular momentum for projectors; 2 is recommended
bessel_rcut 5   # radial cutoff in unit Bohr; 5 or 6 is recommended
ecutwfc   100   # kinetic energy cutoff in unit Ry; should be consistent with that set for ABACUS SCF calculation

After running this sample job, users will find jle.orb in the folder OUT.abacus and need to copy this file to the iter folder.

Note

Note that the jle.orb file provided in the example was generated with extremely low cutoffs so that the job runs fast, and it is therefore not intended for any practical production-level project. Users need to generate a more realistic projector file based on the recommended cutoffs provided above.

orbital files and pseudopotential files

The DeePKS-related calculations are implemented with the lcao basis set in ABACUS, so orbital and pseudopotential files for each element are required. Since the numerical atomic orbitals in ABACUS are generated from the SG15 optimized Norm-Conserving Vanderbilt (ONCV) pseudopotentials, users are required to use this set of pseudopotentials. Atomic orbitals with a 100 Ry energy cutoff are recommended, and ecutwfc is recommended to be set to 100 Ry, i.e., consistent with the cutoff used in the atomic-orbital generation.

Both the pseudopotential and the atomic orbital files can be downloaded from the ABACUS official website. The required files are recommended to be placed in the iter folder, as shown in the file structure.

Important outputs explanation

During the training process, a number of output files will be generated. First, an ABACUS folder is generated under each training/testing group (group.xx under systems), which contains nframes subfolders named 0, 1, …. For example, for water_single_lda2pbe_abacus, the ABACUS folder under systems/group.00 contains 300 subfolders, while the one under systems/group.03 contains 100. Each subfolder contains the input and output files of the ABACUS SCF job for the corresponding frame at the current iteration and will be overwritten in the next iteration.

For each iteration, error statistics and training outputs are generated in the corresponding iter.xx folder; for example, iter.init contains the subfolders 00.scf and 01.train.

If n_iter is larger than 0, then iter.00, iter.01, … will be generated at the corresponding iterations. These folders share a file structure similar to that of iter.init. Important output files of the training process are explained below.

log.data

path: iter/iter.xx/00.scf/log.data

This file contains error statistics as well as SCF convergence ratio of each iteration. For example, for water_single_lda2pbe_abacus, log.data of the init iteration (located at iter/iter.init/00.scf) looks like

Training:
  Convergence:
    900 / 900 =          1.00000
  Energy:
    ME:          -0.09730528149450003
    MAE:         0.09730528149450003
    MARE:        0.00030881151639484673
Testing:
  Convergence:
    100 / 100 =          1.00000
  Energy:
    ME:          -0.09730505954754445
    MAE:         0.09730505954754445
    MARE:        0.0003349933606729039

where ME = mean error, MAE = mean absolute error, and MARE = mean absolute relative error. MARE is calculated after removing any constant energy shift between the target and base energies. Note that only the energy error is reported here, since only the energy label is trained in the init iteration.

In this example, the force label is switched on after the init iteration by setting extra_label to true and force_factor to 1 in params.yaml. log.data in iter.00/00.scf therefore also contains force error statistics:

Training:
  Convergence:
    899 / 900 =          0.99889
  Energy:
    ME:          1.707869318132222e-05
    MAE:         3.188871711078968e-05
    MARE:        3.054509587845316e-05
  Force:
    MAE:         0.00030976685248761896
Testing:
  Convergence:
    100 / 100 =          1.00000
  Energy:
    ME:          1.8457155353139854e-05
    MAE:         3.5420404788446546e-05
    MARE:        3.3798956665677724e-05
  Force:
    MAE:         0.0003271656570860149

To judge whether the DeePKS model has converged, users may compare the error statistics in log.data between the current and previous iterations; if the errors remain nearly unchanged, the model can be considered converged.
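
To make this comparison easier across iterations, the energy MAE can be extracted from each log.data programmatically. Below is a small sketch that assumes the log.data format shown above:

import re
from pathlib import Path

def training_energy_mae(log_path):
    # the first "MAE:" after "Training:" is the energy MAE of the training set
    text = Path(log_path).read_text()
    match = re.search(r"Training:.*?MAE:\s*([0-9.eE+-]+)", text, re.S)
    return float(match.group(1)) if match else None

for folder in sorted(Path("iter").glob("iter.*")):
    log = folder / "00.scf" / "log.data"
    if log.exists():
        print(folder.name, training_energy_mae(log))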

log.train

path: iter/iter.xx/01.train/log.train

This file records the learning curve of the training process at each iteration. It should be noted that for iterations after the initial one, the train error (trn err) recorded in this file corresponds to the total error on the training set, i.e., the energy error plus the error from the extra labels, while the test error (tst err) corresponds to the energy error only on the testing set. For the init training, both the train and the test error correspond to the energy error, since no extra label is included.

For a successful training process, users should expect a marked decrease in both the train and test errors, especially during the first one or two iterations. As the iterative training goes on, the decrease in errors gradually becomes more subtle.

RECORD

path: iter/RECORD

This file records every step taken in the iterative training process and is crucial when resubmitting a job. Each row of the RECORD file corresponds to a unique step, with details given as follows:

  • (X 0 0): at iteration number X (X=0 corresponds to iter.init; X=1 corresponds to iter.00; X=2 corresponds to iter.01; etc.), preprocess the SCF jobs, i.e., generate the ABACUS working directories and input files in each group of systems

  • (X 0 1): run SCF calculations in ABACUS

  • (X 0 2): concatenate and check the SCF results and print convergence and accuracy to log.data in iter.xx/00.scf

  • (X 0): current SCF job done; prepare for training

  • (X 1 0): train a new model using the old one (if any) as starting point

  • (X 1 1): current training done; learning curve is recorded in log.train in iter.xx/01.train

  • (X 1): test the model on all data to see the pure fitting error in log.test in iter.xx/01.train

  • (X): current iteration done

For example, if we want to restart the training process of iter.00, the corresponding RECORD file should look like

0 0 0
0 0 1
0 0 2
0 0
0 1 0
0 1 1
0 1
0
1 0 0
1 0 1
1 0 2
1 0
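
If the RECORD file of a finished run goes beyond the step you want to redo, it can be truncated programmatically. The snippet below is a minimal sketch of that idea for the restart case above, keeping everything up to and including the "1 0" line so that the next run redoes the training of iter.00; consider backing up RECORD before editing it:

from pathlib import Path

record = Path("iter/RECORD")
lines = record.read_text().splitlines()
# keep all steps up to and including the completed SCF stage "1 0"
keep = lines[: lines.index("1 0") + 1]
record.write_text("\n".join(keep) + "\n")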

Note

To re-run the whole procedure, make sure that all iter.xx folders, the share folder, and the RECORD file are deleted! In addition, if previous jobs were submitted via DPDispatcher and resubmission is desired for some reason, make sure the .json file located at ~/.dpdispatcher/dp_cloud_server/ is removed.

Model file

path: iter/iter.xx/01.train/model.pth; this is the model file generated directly by the neural network in DeePKS-kit

path: iter/iter.{xx+1}/00.scf/model.ptg; this is the converted form of model.pth, which will be loaded in ABACUS

To manually convert model.pth to model.ptg, run the following script:

from deepks.model import CorrNet

# load the trained model and save it in the compiled format read by ABACUS
model = CorrNet.load("model.pth")
model.compile_save("model.ptg")

Running ABACUS with DeePKS model

Once the DeePKS training process has converged, users may perform ABACUS SCF calculations with the DeePKS model loaded. Compared to a normal ABACUS SCF job with the lcao basis, one needs to add the following keywords to the INPUT file:

<...other keywords>
deepks_scf 1                  # run the SCF job with the DeePKS model
deepks_model model.ptg        # the model file; its path should be correctly provided

Note that the path to model.ptg should be provided together with the file name. The input above works only if model.ptg and INPUT are placed in the same directory.

Users also need to provide the projector file (with its path) in the STRU file:

<...other keywords>
NUMERICAL_DESCRIPTOR
jle.orb

An example of running an ABACUS SCF calculation with a trained DeePKS model is provided here.