Input file preparation
To run DeePKS-kit in connection with ABACUS, a set of input files is required so that the SCF jobs in ABACUS and the training jobs in DeePKS-kit can be performed iteratively. Here we use a single water molecule as an example to show the input files required to train an LDA-based DeePKS model that reproduces PBE target energies and forces.
As can be seen in this example, 1000 structures of the single water molecule with corresponding PBE property labels (including energy and force) have been prepared in advance. Four subfolders, i.e., group.00 to group.03, can be found under the folder systems. group.00 to group.02 contain 300 frames each and can be used as the training set, while group.03 contains 100 frames and can be used as the test set.
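DeePKS-kit reads this train/test split from systems.yaml, which accompanies the settings files described below. A minimal sketch, assuming the systems_train and systems_test keys used by DeePKS-kit:

# systems.yaml; a sketch assuming DeePKS-kit's systems_train/systems_test keys
systems_train: # groups used for training
  - ../systems/group.00
  - ../systems/group.01
  - ../systems/group.02
systems_test: # group used for testing
  - ../systems/group.03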
The prepared file structure of a ready-to-run DeePKS iterative training process should basically look like the sketch below (the top-level layout is inferred from the relative paths used in this section):

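deepks_example/          # top-level folder; the name is arbitrary
├── iter/                # working directory for the iterative training
│   ├── scf_abacus.yaml
│   ├── machine.yaml     # or machine_dpdispatcher.yaml when using DPDispatcher
│   ├── params.yaml
│   ├── systems.yaml
│   ├── jle.orb          # projector file (see "projector file" below)
│   ├── O_gga_6au_60Ry_2s2p1d.orb    # orbital files
│   ├── H_gga_6au_60Ry_2s1p.orb
│   ├── O_ONCV_PBE-1.0.upf           # pseudopotential files
│   └── H_ONCV_PBE-1.0.upf
└── systems/
    ├── group.00/        # 300 frames, training
    │   ├── atom.npy     # structures; element order consistent with orb_files/pp_files
    │   ├── energy.npy   # PBE energy labels
    │   └── force.npy    # PBE force labels
    ├── group.01/        # 300 frames, training
    ├── group.02/        # 300 frames, training
    └── group.03/        # 100 frames, testing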
scf_abacus.yaml
This file controls the SCF jobs performed in ABACUS. The scf_abacus block controls the SCF jobs after the init iteration, i.e., with the DeePKS model loaded, while the init_scf_abacus block controls the initial SCF jobs, i.e., bare LDA or PBE SCF calculations. The reason to divide this file into two blocks is that, after the init iteration, the SCF calculations with the DeePKS model loaded are sometimes hard to converge to a tight threshold, e.g., scf_thr = 1e-7. Therefore we may want to slightly loosen that threshold after the init iteration. Also, even if users need to train the model with force labels, there is no need to calculate forces during the init SCF cycle, since the init training includes the energy label only.
Below is a sample scf_abacus.yaml file for the single water molecule, with an explanation of each keyword. Please refer to the ABACUS input file documentation for a more detailed explanation of the ABACUS input parameters.
scf_abacus:
# INPUT args; keywords related to the INPUT file in ABACUS
ntype: 2 # int; number of different atom species in this calculation, e.g., 2 for H2O
nbands: 8 # int; number of bands to be calculated; optional
ecutwfc: 50 # real; energy cutoff, unit: Ry
scf_thr: 1e-7 # real; SCF convergence threshold for density error; 5e-7 and below is acceptable
scf_nmax: 50 # int; maximum SCF iteration steps
dft_functional: "lda" # string; name of the baseline density functional
gamma_only: 1 # bool; 1 for gamma-only calculation
cal_force: 1 # bool; 1 for force calculation
cal_stress: 0 # bool; 1 for stress calculation
# STRU args; keywords related to the STRU file in ABACUS
# below are the default STRU args; users can also override them for each group in
# ../systems/group.xx/stru_abacus.yaml (see the sketch after this example)
orb_files: ["O_gga_6au_60Ry_2s2p1d.orb", "H_gga_6au_60Ry_2s1p.orb"] # atomic orbital file list for each element;
# order should be consistent with that in atom.npy
pp_files: ["O_ONCV_PBE-1.0.upf", "H_ONCV_PBE-1.0.upf"] # pseudopotential file list for each element;
# order should be consistent with that in atom.npy
proj_file: ["jle.orb"] # projector file; generated in ABACUS; see the file descriptions below for more details
lattice_constant: 1 # real; lattice constant
lattice_vector: [[28, 0, 0], [0, 28, 0], [0, 0, 28]] # [3, 3] matrix; lattice vectors
coord_type: "Cartesian" # "Cartesian" or "Direct"; the latter is for fractional coordinates
# cmd args; keywords related to running ABACUS
run_cmd : "mpirun" # run command
abacus_path: "/usr/local/bin/abacus" # ABACUS executable path
# below is the init_scf_abacus block, which is basically the same as above;
# just note that the recommended value for scf_thr is 1e-7,
# and force calculation can be omitted since the init training includes the energy label only.
init_scf_abacus:
orb_files: ["O_gga_6au_60Ry_2s2p1d.orb", "H_gga_6au_60Ry_2s1p.orb"]
pp_files: ["O_ONCV_PBE-1.0.upf", "H_ONCV_PBE-1.0.upf"]
proj_file: ["jle.orb"]
ntype: 2
nbands: 8
ecutwfc: 50
scf_thr: 1e-7
scf_nmax: 50
dft_functional: "lda"
gamma_only: 1
cal_force: 0
lattice_constant: 1
lattice_vector: [[28, 0, 0], [0, 28, 0], [0, 0, 28]]
coord_type: "Cartesian"
#cmd args
run_cmd : "mpirun"
abacus_path: "/usr/local/bin/abacus"
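As noted in the STRU args comments above, the default STRU settings can be overridden for an individual group by placing a stru_abacus.yaml file in that group's folder. A minimal sketch with hypothetical values, reusing the keys listed above:

# ../systems/group.03/stru_abacus.yaml; hypothetical per-group override
lattice_constant: 1
lattice_vector: [[30, 0, 0], [0, 30, 0], [0, 0, 30]] # e.g., a larger box for this group
coord_type: "Cartesian"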
For systems with multiple k-points, the k-point mesh can either be set explicitly as:
scf_abacus:
<...other keywords>
k_points: [4,4,4,0,0,0]
init_scf_abacus:
<...other keywords>
k_points: [4,4,4,0,0,0]
or via kspacing as:
scf_abacus:
<...other keywords>
kspacing: 0.1
init_scf_abacus:
<...other keywords>
kspacing: 0.1
machine.yaml
Note
This file is not required when running jobs on Bohrium via DPDispatcher. In that case, users need to prepare machine_dpdispatcher.yaml instead.
To run the ABACUS-DeePKS training process on a local machine, or on a cluster via slurm or PBS, it is recommended to use the built-in dispatcher of DeePKS-kit and prepare a machine.yaml file as follows.
# this is only part of input settings.
# should be used together with systems.yaml and params.yaml
scf_machine:
group_size: 125 # number of SCF jobs that are grouped and submitted together; these jobs will be run sequentially
resources:
task_per_node: 1 # number of CPUs for one SCF job
sub_size: 1 # keyword for PySCF; set to 1 for ABACUS SCF jobs
dispatcher:
context: local # "local" to run on local machine, or "ssh" to run on a remote machine
batch: shell # set to shell to run on local machine, you can also use `slurm` or `pbs`
train_machine:
dispatcher:
context: local # "local" to run on local machine, or "ssh" to run on a remote machine
batch: shell # set to shell to run on local machine, you can also use `slurm` or `pbs`
remote_profile: null # use lazy local
# resources are no longer needed, and the task will use gpu automatically if there is one.
python: "python" # use python in path
# other settings (these are default; can be omitted)
cleanup: false # whether to delete slurm and err files
strict: true # do not allow undefined machine parameters
# parameters for ABACUS
use_abacus: true # use ABACUS in the SCF calculation
To run ABACUS-DeePKS via PBS or slurm, the following parameters can be specified under the resources block in both scf_machine and train_machine:
# this is only part of input settings.
# should be used together with systems.yaml and params.yaml
scf_machine:
<...other keywords>
resources:
numb_node: # int; number of nodes; default value is 1
task_per_node: # int; processors per node (ppn) required; default value is 1
numb_gpu: # int; number of GPUs; default value is 1
time_limit: # time limit; default value is 1:0:0
mem_limit: # int; memory limit in GB
partition: # string; queue name
account: # string; account info
qos: # string;
module_list: # e.g., [abacus]
source_list: # e.g., [/opt/intel/oneapi/setvars.sh, conda activate deepks]
<... other keywords>
train_machine:
<...other keywords>
resources:
<... same as above>
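For instance, a sketch of a filled-in resources block for a slurm cluster; the partition name, module name, and resource numbers below are placeholders:

scf_machine:
  <...other keywords>
  resources:
    numb_node: 1
    task_per_node: 8        # processors for each grouped SCF job
    time_limit: "12:0:0"
    partition: cpu          # placeholder queue name
    module_list: [abacus]   # placeholder module name
  dispatcher:
    context: local
    batch: slurm            # submit via slurm instead of shell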
machine_dpdispatcher.yaml
Note
This file is not required when running jobs on a local machine or on a cluster via slurm or PBS with the built-in dispatcher. In that case, users may prepare machine.yaml instead. That said, users may also modify keywords in this file to submit jobs to a cluster via slurm or PBS through DPDispatcher. Please refer to the DPDispatcher documentation for more details on slurm/PBS job submission.
To run ABACUS-DeePKS on Bohrium or via slurm, users need to use DPDispatcher and prepare a machine_dpdispatcher.yaml file as follows. Most of the keywords in this file share the same meaning as those in machine.yaml; the unique part is the dpdispatcher_machine and dpdispatcher_resources blocks. Below is an example for running jobs on Bohrium:
# this is only part of input settings.
# should be used together with systems.yaml and params.yaml
scf_machine:
resources:
task_per_node: 4
dispatcher: dpdispatcher
dpdispatcher_resources:
number_node: 1
cpu_per_node: 8
group_size: 125
source_list: [/opt/intel/oneapi/setvars.sh]
sub_size: 1
dpdispatcher_machine:
context_type: lebesguecontext
batch_type: lebesgue
local_root: ./
remote_profile:
email: (your-account-email) # email address registered on Bohrium
password: (your-password) # password on Bohrium
program_id: (your-program-id) # program ID on Bohrium
input_data:
log_file: log.scf
err_file: err.scf
job_type: indicate
grouped: true
job_name: deepks-scf
disk_size: 100
scass_type: c8_m8_cpu # machine type
platform: ali
image_name: abacus-workshop # image name
on_demand: 0
train_machine:
dispatcher: dpdispatcher
dpdispatcher_machine:
context_type: lebesguecontext
batch_type: lebesgue
local_root: ./
remote_profile:
email: (your-account-email)
password: (your-password)
program_id: (your-program-id)
input_data:
log_file: log.train
err_file: err.train
job_type: indicate
grouped: true
job_name: deepks-train
disk_size: 100
scass_type: c8_m8_cpu
platform: ali
image_name: abacus-workshop
on_demand: 0
dpdispatcher_resources:
number_node: 1
cpu_per_node: 8
group_size: 1
source_list: [~/.bashrc]
python: "/usr/bin/python3" # path to the python interpreter used for training
# resources are no longer needed, and the task will use gpu automatically if there is one
# other settings (these are default; can be omitted)
cleanup: false # whether to delete slurm and err files
strict: true # do not allow undefined machine parameters
# parameters for ABACUS
use_abacus: true # use ABACUS in the SCF calculation
params.yaml
This file controls the init and iterative training processes performed in DeePKS-kit. The default hyperparameter values given below are recommended for users with limited machine-learning experience; experienced users are welcome to tune them.
# this is only part of input settings.
# should be used together with systems.yaml and machines.yaml
# number of iterations to do, can be set to zero for DeePHF training
n_iter: 1
# directory setting (these are default choices, can be omitted)
workdir: "."
share_folder: "share" # folder that stores all other settings
# scf settings, set to false when n_iter = 0 to skip checking
scf_input: false
# train settings for training after init iteration,
# set to false when n_iter = 0 to skip checking
train_input:
# model_args is omitted, which will inherit from init_train
data_args:
batch_size: 16 # training batch size; 16 is recommended
group_batch: 1 # number of batches to be grouped; set to 1 for ABACUS-related training
extra_label: true # set to true to train the model with force, stress, or bandgap labels.
# note that these extra labels will only be included after the init iteration
# only energy label will be included for the init training
conv_filter: true # if set to true (recommended), will read the convergence data from conv_name
# and only use converged datapoints to train; including any unconverged
# datapoints may screw up the training!
conv_name: conv # npy file that records the converged datapoints
preprocess_args:
preshift: false # the restarted model is already shifted; the shift value will not be recomputed
prescale: false # same as above
prefit_ridge: 1e1 # the ridge factor used in linear regression
prefit_trainable: false # make the linear regression fixed during the training
train_args:
# the starting learning rate (lr) will decay by a factor of `decay_rate` every `decay_steps` epochs
decay_rate: 0.5
decay_steps: 1000
display_epoch: 100 # show training results every n epochs
force_factor: 1 # prefactor applied to the force term of the loss
n_epoch: 5000 # total number of epochs in training
start_lr: 0.0001 # the starting learning rate, which will decay later
# init training settings, these are for DeePHF task
init_model: false # do not use existing model to restart from
init_scf: true # whether to perform the init SCF
init_train: # parameters for init nn training; basically the same as those listed in train_input
model_args:
hidden_sizes: [100, 100, 100] # neurons in hidden layers
output_scale: 100 # the output will be divided by 100 before being compared with the label
use_resnet: true # skip connection
actv_fn: mygelu # same as gelu; supports force calculation
data_args:
batch_size: 16
group_batch: 1
preprocess_args:
preshift: true # shift the descriptor by its mean
prescale: false # scale the descriptor by its variance (can cause convergence problems)
prefit_ridge: 1e1 # do a ridge regression as prefitting
prefit_trainable: false
train_args:
decay_rate: 0.96
decay_steps: 500
display_epoch: 100
n_epoch: 5000
start_lr: 0.0003
Even though the DeePKS training scheme is relatively robust, there is a chance that the SCF procedure fails to converge after loading the DeePKS model. Such convergence failure may be caused by insufficient variety in the training data, and/or by the discontinuity issue arising from the sorting of eigenvalues in the eigenvalue-decomposition step when constructing the descriptors. One thing worth trying is to add more training data with sufficient structural variety; if the convergence failure remains, or if the training data is indeed sufficient, users may further symmetrize the descriptors by modifying the init_train block in params.yaml as follows:
init_train: # parameters for init nn training; basically the same as those listed in train_input
proj_basis: [[0,[0,...,0]],
[1, [0,...,0]],
[2, [0,...,0]]] # projected basis for thermal embedding, 0, 1, and 2 in the first column correspond to s, p, and d orbitals,
# and the number of zeros afterwards should equal the number of Bessel functions in jle.orb.
model_args:
hidden_sizes: [100, 100, 100] # neurons in hidden layers
output_scale: 100 # the output will be divided by 100 before being compared with the label
use_resnet: true # skip connection
actv_fn: mygelu # same as gelu; supports force calculation
embedding: {embd_sizes: null, init_beta: 5, type: thermal} # apply thermal averaging to further symmetrize the descriptors
<...other keywords>
projector file
The descriptors used in the DeePKS model are generated from the projected density matrix, so a set of projectors is required in advance. To obtain these projectors for periodic systems, users need to run a specific sample job in ABACUS. The projectors are products of spherical Bessel functions (radial part) and spherical harmonic functions (angular part), similar to numerical atomic orbitals. The number of Bessel functions is controlled by the radial cutoff and the wavefunction energy cutoff; 5 or 6 Bohr is recommended for the former, and the ecutwfc set in scf_abacus.yaml for the latter.
Note that it is not necessary to change the STRU file of this sample job, since all elements share the same descriptor. Basically, users only need to specify calculation as gen_bessel and then adjust the energy cutoff and the radial cutoff of the wavefunctions. The angular part is controlled via the keyword bessel_lmax, and the value 2 (including s, p, and d orbitals) is strongly recommended. See below for the related input parameters:
calculation gen_bessel # calculation type should be gen_bessel
bessel_lmax 2 # maximum angular momentum for projectors; 2 is recommended
bessel_rcut 5 # radial cutoff in unit Bohr; 5 or 6 is recommended
ecutwfc 100 # kinetic energy cutoff in unit Ry; should be consistent with that set for ABACUS SCF calculation
After running this sample job, users will find jle.orb in the folder OUT.abacus and will need to copy this file to the iter folder.
Note
Note that the jle.orb file provided in the example was generated with an extremely low cutoff to keep the job fast, and is therefore not intended for any practical production-level projects. Users need to generate a more realistic projector file based on the recommended cutoffs given above.
orbital files and pseudopotential files
The DeePKS-related calculations are implemented with the LCAO basis set in ABACUS, so the orbital and pseudopotential files for each element are required. Since the numerical atomic orbitals in ABACUS are generated based on SG15 optimized Norm-Conserving Vanderbilt (ONCV) pseudopotentials, users are required to use this set of pseudopotentials. Atomic orbitals generated with a 100 Ry energy cutoff are recommended, and ecutwfc is recommended to be set to 100 Ry, i.e., consistent with the cutoff applied in the atomic orbital generation.
Both the pseudopotential and the atomic orbital files can be downloaded from the ABACUS official website. These files are recommended to be placed in the iter folder, as shown in the file structure above.