Overview of example use

This is an example workflow for doing the whole calculation, except calculating potential itself that has been explained else where.

Molecular dynamics itself consist of 4 phases:

Equilibration of initial geometry in gas phase
Condensing gas to solid.
Equilibration to DFT
Spectrum calculation

Fitting Potential

First we need to fit a potential. In this example we fit a potential for trans-formic acid and argon. PotentialDB has pre calculated data than we can use for fitting the potential.

using PotentialDB
r = defaultregistry()
data = loadpotential(r, "4")

Dict{String, Any} with 6 entries:
  "Method"   => "CCSD(T)-F12/RI"
  "cluster2" => Cluster{AtomOnlySymbol} of 1 atoms
  "Energy"   => [0.0172099 0.0232891 … 0.0210108 0.0160872; 0.0104979 0.0147825…
  "Points"   => Cluster{AtomOnlySymbol}[Cluster{AtomOnlySymbol} of 6 atoms Clus…
  "cluster1" => Cluster{AtomOnlySymbol} of 5 atoms
  "Basis"    => "cc-pVDZ-F12 cc-pVDZ-F12-CABS cc-pVTZ/C TIGHTSCF"

The fitting procedure is the same as explained in Usage section.

First we create a potential for fitting.

using PotentialFitting, PotentialCalculation

m1, m2 = get_molecules(data)

mpp = MoleculePairPotential(m1,m2, GeneralPowers(-6,-8,-10,-12,-7,-9,-11))

Molecule Pair Potential
Molecule 1 has 5 atoms (5 unique) 
Molecule 2 has 1 atoms (1 unique) 

Potential topology:

r^-6  r^-8  r^-10  r^-12  r^-7  r^-9  r^-11   *** 0.0  0.0  0.0  0.0  0.0  0.0  0.0   :  1C :  6Ar
r^-6  r^-8  r^-10  r^-12  r^-7  r^-9  r^-11   *** 0.0  0.0  0.0  0.0  0.0  0.0  0.0   :  2O :  6Ar
r^-6  r^-8  r^-10  r^-12  r^-7  r^-9  r^-11   *** 0.0  0.0  0.0  0.0  0.0  0.0  0.0   :  3O :  6Ar
r^-6  r^-8  r^-10  r^-12  r^-7  r^-9  r^-11   *** 0.0  0.0  0.0  0.0  0.0  0.0  0.0   :  4H :  6Ar
r^-6  r^-8  r^-10  r^-12  r^-7  r^-9  r^-11   *** 0.0  0.0  0.0  0.0  0.0  0.0  0.0   :  5H :  6Ar

Then we prepare fitting data and apply some constrains. The reason why we applying constrains is that our potential is under fitting and as a result depends heavily on the constrains. You can verify this, by taking constrains off and comparing to constrained potential, or by changing the constrains considerably.

fdata = FitData(mpp, data["Points"], data["Energy"])

# setweight_e_more!(fdata, weight, minimum_energy)
setweight_e_more!(fdata, 0.1, 1000u"cm^-1")
setweight_e_more!(fdata, 0.01, 5000u"cm^-1")
setweight_e_less!(fdata, 2, -50u"cm^-1")
setweight_e_less!(fdata, 4, -75u"cm^-1")
setweight_e_less!(fdata, 16, -100u"cm^-1");

Optional defining a model

Then we prepare fitting tools for fitting. Standard linear regression is most likely good enough. But you can try others as well.

Here we load LinearRegressor that does not fit intercept

using MLJ
LinearRegressor = @load LinearRegressor pkg=GLM

model = LinearRegressor(fit_intercept=false)

LinearRegressor(
  fit_intercept = false, 
  dropcollinear = false, 
  offsetcol = nothing, 
  report_keys = [:deviance, :dof_residual, :stderror, :vcov, :coef_table])

Fitting potential

Fit using default model

fit_potential!(mpp, fdata)

Molecule Pair Potential
Molecule 1 has 5 atoms (5 unique) 
Molecule 2 has 1 atoms (1 unique) 

Potential topology:

r^-6  r^-8  r^-10  r^-12  r^-7  r^-9  r^-11   *** -22538.764664060964  -5.243025376695279e6  -3.2391005886259116e7  1.5156798055682138e8  551011.5920472831  2.2660963253455512e7  -5.316652440075644e7   :  1C :  6Ar
r^-6  r^-8  r^-10  r^-12  r^-7  r^-9  r^-11   *** 13740.236148585838  5.472515665297194e6  1.6959907070386767e8  3.752328806901236e8  -416091.7439865846  -3.998932225141039e7  -3.899070213341327e8   :  2O :  6Ar
r^-6  r^-8  r^-10  r^-12  r^-7  r^-9  r^-11   *** 9792.504974051504  3.139013079090014e6  7.626585584376974e7  1.3066879800125138e8  -268152.3469762912  -2.0302156130230505e7  -1.550009093303048e8   :  3O :  6Ar
r^-6  r^-8  r^-10  r^-12  r^-7  r^-9  r^-11   *** -2432.4139113706888  -633362.37890601  -1.0428608883981721e7  -1.0705245941539187e7  60927.96548650013  3.458886999782353e6  1.6484328349663341e7   :  4H :  6Ar
r^-6  r^-8  r^-10  r^-12  r^-7  r^-9  r^-11   *** 4524.673108913862  858781.5222092883  7.271737286364344e6  -3.2132918849226506e6  -99324.2346945478  -3.629952540388673e6  -4.340602429711378e6   :  5H :  6Ar

or if you defined your own model

fit_potential!(model, mpp, fdata)

Check root mean square deviation for potential.

rmsd(data["Points"], data["Energy"], mpp, emax=-10u"cm^-1")

8.36797599293565 cm^-1

Note

We are under fitting our potential, so we can safely compare the potential to the data we used for the fitting. But it would be a good idea to confirm this, by comparing to some data that was not used for fitting.

To inspect the potential we can use use scan_potential which will calculate potential for given points and visualizes the geometry.

scan_potential(data, mpp)

You can call external programs like VMD to to visualize the data also.

# Visualize line 32 using vmd
visualize_points(data["Points"][:,32], command="vmd")

Preparing CP2K input files

Geometry inputs with AtomsBase (new recommended)

To generate input data for simulations we supply a set of tools. To generate structures you can use

# structure in 86u"Å" cubic box where cluster2 has been added 1000 times
# to cluster 1
str = generate_initial_structure(data["cluster1"], data["cluster2"], 86u"Å", 1000)

FlexibleSystem(Ar₁₀₀₀CH₂O₂, periodic = TTT):
    bounding_box      : [      86        0        0;
                                0       86        0;
                                0        0       86]u"Å"

                      .-------------------------------------------------------------------------.  
                     /|                                               Ar                        |  
                    / |                                         Ar   Ar    Ar                   |  
                   /  Ar                                                              Ar   Ar   |  
                  /   |         Ar             Ar    Ar       Ar       Ar       Arr          Ar |  
                 /Ar  |AAr   Ar    Ar                  Ar     Ar     Ar      Ar    Ar         Ar|  
                /     |Ar            Ar    AAr ArAr     Ar                          Ar       Ar |  
               /      |                  Arr             Ar    ArAr            Ar       Ar      |  
              Ar  ArAr|        Ar  AAr ArAAr                 Ar                       Ar        |  
             /   Ar   |  Ar           Ar       Ar      Ar      Ar     Ar          ArAr Ar       |  
            Ar        |      ArAr      ArAArr                         Ar      Ar Ar ArAr        |  
           /      AAr |AAr      Arr  AAr  Ar Ar Ar  AArAr  Ar        Ar         ArAr  ArAr      |  
          /   Ar  AAr |     Ar Ar  Ar  ArAr         Ar      Ar Ar Ar Ar      ArrAAr             |  
         /          Ar|  Arr Arr          Ar Arr    AAr      Ar         ArAr          Ar        |  
        /    Ar  ArrArArAr  Ar        Ar       Ar     Arr     Ar     AAr         Ar  Ar         |  
       /           Ar |    Ar   Arr   Ar       AAr   AAr   ArAr Ar Ar Ar    Ar    ArAr     ArAr |  
      /     Ar        |AAr      Ar  Ar  Ar  Ar    Ar   Arr    Ar  ArArAr     ArAr Ar            |  
     /  Ar      Ar   ArrArr     AArAr   Ar Ar   Ar ArrAr ArAr   ArrArAr  Ar     Arrr  Ar        |  
    *              Ar |         Ar     Ar     Ar     Arr Ar  Ar      Arr   Ar       Ar    Ar    |  
    |      Ar      AAr|  ArAr ArAArr Ar Ar  ArAr Ar  ArArrrAAr      Ar   Ar  Ar      Arr Ar     |  
    |         Arr Ar  AAr   Ar Ar Ar ArrrAr ArArAr Ar   AAr AArAr      Ar ArrArAr   ArAr        |  
    |    Ar        ArAr  Ar  Ar   ArArArAr      Ar  Ar ArArrAr Arr ArArr          Ar Ar Ar      |  
    |             Ar  ArrAAr Ar ArAr  Ar       Ar   ArArr  AArAr   ArArAAAr           ArAr      |  
    |       Ar   Ar Ar|  Ar  ArArArr  ArAr Arr      ArAr        ArAAr  Ar AAr    ArAr Ar        |  
    |       ArAAr    Ar         Ar      ArAr           Arr ArrAArArArArrArAAr  ArAr             |  
    |         Ar     Ar       Ar   AArArArr   Ar  Ar ArAAr        Ar        Ar ArAr       Ar    |  
    |    Ar   Ar   Ar AAr ArrAr ArAr Ar AAr AArAr ArrAr  Arr           Ar  ArrArrAr Ar    ArAr  |  
    |          Ar Arr |AAr            Ar Arr   ArHArrr AArAr   Arr Ar      Ar  Ar   Ar          |  
    |                Ar     Ar Ar ArArrAr Arr AArArH ArAr    Ar  AAr  Ar AArr    Ar          Ar |  
    |     Ar Ar     ArrArrArArArArrrAr     Ar              Ar   ArAr       Ar  Ar    Ar         |  
    |          Ar     | Ar  ArArArrAr   ArAr  ArArAr ArrAr   Ar   Ar   Ar    Ar ArAr Ar ArAr    |  
    |   ArAr  AAr   Ar|     Ar         AAr   Ar   ArrAr   ArAArAr  Arr           ArAr AAr       |  
    |       Ar        |ArAr ArArAr  ArArr             AAArArr        Ar Ar      AAr   Ar        |  
    |       AAr Ar AArArAr   ArAr     Ar  Ar Ar   Ar  Ar     Arr    AAr         Ar      ArAr    |  
    |     ArAr   Ar  AAr AAr ArAr   ArAr Arr ArAr Ar      ArAAr Ar Ar AArArArArrAr   Ar    Ar   |  
    |     Ar  ArAr   Ar   ArAr   ArAr          ArAr      Ar Ar      Arr      Ar  AAr    Ar      |  
    |         Ar      |    ArAArr Ar Ar    Ar   Ar AAr   Ar Arr  Ar   Arrr   AAAr        Ar     |  
    |      Ar Ar  ArAr.-Ar--Ar------------Ar-ArAr-----Ar---AArArArAr-Ar--Ar-ArArArAr-Ar-Ar--Ar--.  
    |          ArAr  Ar  ArAr  AAr  Ar ArAr   Ar    AArAr     ArAAr Ar        AAr  Ar          /   
    | Ar          ArArr Ar            AAr    Ar   Ar    ArAAr    Ar     Ar  AAAr Ar  Ar       /    
    |      Ar Ar  Ar Ar  Ar   ArAr AAr  ArArAr  Ar    ArAr  Ar       ArAr     Ar   Ar AAr    /     
    |             /     Ar  AArr  Ar  Ar   Ar      ArAr Ar       Ar    Ar  ArrAr            /      
    |    Arr    Ar Arrr              Ar        Arr              Ar Ar Ar     ArAr          /       
    |           / ArAArAr    Ar     ArrArr ArAr    ArAr   Arr   Ar   Ar   Ar Ar Ar        /        
    |          /Ar Ar    Ar  ArArArr      Ar AAr      Ar AAr     Ar         Ar  Ar Ar    /         
    |         /Ar Ar              Ar    Ar    Arr   Ar    ArAAr      Ar  Ar      Ar     /          
    |        /AAr      Ar            Ar    Ar        Ar        Ar    Ar Ar Arr Ar      /           
    |       /      Ar         Ar Ar    ArArAAr      AAr      Ar     ArAr           Ar /            
    |   Ar /          ArAr           Ar                                Ar Ar         /             
    |     /      Ar        Ar     Ar                Ar      AAAr               Ar   /              
    |    /  Ar       Ar       Ar        Ar                           Ar  Ar     Ar /               
    |   /Ar         Ar      Ar Ar  Ar    Ar          Ar  Ar                    Ar /                
    |  / Ar         Ar   Ar        Ar  Ar  Arr     Ar      Ar   Ar  Ar     Ar    /                 
    | /    Ar        Ar                 Ar    Ar                    Ar          /                  
    |/        Ar                                   Ar                          /                   
    *-------------------------------------------------------------------------*

You can edit the atomic symbols to reflect the potential

set_atomic_symbol!(str, :Oh, 2)
set_atomic_symbol!(str, :Oo, 3)
set_atomic_symbol!(str, :Ho, 4)
set_atomic_symbol!(str, :Hc, 5)
show(str[1:6])

Atom{3, Quantity{Float64, 𝐋, Unitful.FreeUnits{(Å,), 𝐋, nothing}}, Quantity{Float64, 𝐋 𝐓^-1, Unitful.FreeUnits{(a₀, s^-1), 𝐋 𝐓^-1, nothing}}, Quantity{Float64, 𝐌, Unitful.FreeUnits{(u,), 𝐌, nothing}}}[Atom(C,  [ 42.6718,  43.0575,  43.2551]u"Å"), Atom(Oh, [ 43.8511,  43.3774,  42.7534]u"Å"), Atom(Oo, [ 42.1775,  42.0041,  43.0524]u"Å"), Atom(Ho, [ 44.1017,  42.7007,  42.0086]u"Å"), Atom(Hc, [ 42.1978,  43.8603,  43.9304]u"Å"), Atom(Ar, [ 60.9724,   63.212,  54.7653]u"Å")]

You can then save the structure using AtomsIO

using AtomsIO

save_system("init_struct.xyz", str) # xyz-file
save_system("init_struct.pdb", str) # pdb-file

CP2K needs residue information for QM and MM parts. This can be done by using

# Save pdb file that has residue name "tfa" for atoms 1 till 5
# residue name "LIG" for other atoms
save_with_residue_data("tfa-ar-init.pdb", str, "tfa", 1:5)

This file can be used as an initial file in CP2K.

Geometry inputs with Gromacs tools (old way)

First grab your favorite and make two file, one for the molecule/cluster you are studying and one for matrix and save them in PDB-format. Here we have a file for formic acid and an other for argon.

COMPND    UNNAMED
AUTHOR    GENERATED BY OPEN BABEL 2.3.2
HETATM    1  C   tfa     1      42.585  42.840  42.933  1.00  0.00           C
HETATM    2  Oh  tfa     1      43.930  42.838  42.829  1.00  0.00           O
HETATM    3  Oo  tfa     1      42.014  43.738  43.531  1.00  0.00           O
HETATM    4  Ho  tfa     1      44.463  43.543  43.222  1.00  0.00           H
HETATM    5  Hc  tfa     1      42.008  42.039  42.484  1.00  0.00           H
CONECT    1    2    3    5                                            
CONECT    2    1    4                                                 
CONECT    3    1                                                      
CONECT    4    2                                                      
CONECT    5    1                                                      
MASTER        0    0    0    0    0    0    0    0    5    0    5    0
END

COMPND    UNNAMED
AUTHOR    GENERATED BY OPEN BABEL 2.3.2
HETATM    6 AR   LIG     2       1.472  80.636  12.825  1.00  0.00           Ar
MASTER        0    0    0    0    0    0    0    0    1    0    1    0
END

Note

You need to have different residue names for QM and MM atoms, here "tfa" (trans-formic acid) and "LIG". This is demanded by CP2K.

You also need to put different identifiers to atoms at this point. Here we have:

Oh : hydroxyl group oxygen
Oo : carbonyl group oxygen
Ho : hydroxyl group hydrogen
Hc : hydrogen connected to carbon

Next we need to decide initial simulation box size, which depends on number of atoms in simulation. If we have 1000 argon atoms then good initial box size is 86 Å. To put the molecule in box we need extra tools. Gromacs in example has that kind of tools.

To to put the molecule into the box we use gmx editconf command

gmx editconf -f tfa.pdb -box 8.6 8.6 8.6 -o tfa-b86.pdb

Here we use 86x86x86 Å box, which is ok when using 1000-matrix atoms. But you can adjust this, depending on how many matrix atoms you have.

Next we need to add argon atoms into the box. For this we use gmx insert-molecules command

gmx insert-molecules -f tfa-b86.pdb -ci ar.pdb -nmol 1000 -o tfa-ar.pdb

And now we have the final geometry input.

CP2K input file

The best way to make CP2K input file is to split the file to several parts and using @include-commands to gather them to one file.

To do this we make

MM-potential file
QM-calculation file
QMMM-boundary file
CP2K file to gather

First we make the MM-potential file.

&MM
    &FORCEFIELD
      &CHARGE
         ATOM Ar
         CHARGE 0.0
      &END CHARGE
      &CHARGE
         ATOM C
         CHARGE 0.0
      &END CHARGE
      &CHARGE
         ATOM Oo
         CHARGE 0.0
      &END CHARGE
      &CHARGE
         ATOM Ho
         CHARGE 0.0
      &END CHARGE
      &CHARGE
         ATOM Hc
         CHARGE 0.0
      &END CHARGE
      &CHARGE
         ATOM Oh
         CHARGE 0.0
      &END CHARGE

      IGNORE_MISSING_CRITICAL_PARAMS .TRUE.
      &NONBONDED
        # Argon-argon potential from http://www.sklogwiki.org/SklogWiki/index.php/Argon
        &LENNARD-JONES
          ATOMS Ar Ar
          EPSILON [K_e] 125.7
          SIGMA [angstrom] 3.345
          RCUT [angstrom] 9.0
        &END LENNARD-JONES

        # trans Formic acid - argon
        # Just copy the potential terms from potential fit here
        &GENPOT
            atoms Ar C
            FUNCTION Cvi/(r^6) + Cviii/(r^8) + Cx/(r^10) +  Cxii/(r^12) + Cvii/(r^7) + Cix/(r^9) + Cxi/(r^11)
            VARIABLES r
            PARAMETERS Cvi Cviii Cx Cxii Cvii Cix Cxi
            VALUES -22540.021633788096  -5.243549409354186e6  -3.2406578590178587e7  1.515360096536201e8  551050.9077903178  2.266474955203762e7  -5.3132065936126545e7
            RCUT [angstrom] 9.0
        &END GENPOT
        &GENPOT
            atoms Ar Oh
            FUNCTION Cvi/(r^6) + Cviii/(r^8) + Cx/(r^10) +  Cxii/(r^12) + Cvii/(r^7) + Cix/(r^9) + Cxi/(r^11)
            VARIABLES r
            PARAMETERS Cvi Cviii Cx Cxii Cvii Cix Cxi
            VALUES 13739.083360114846  5.471838619459709e6  1.6957303994470203e8  3.7516840650039065e8  -416048.44688283757  -3.998370403083761e7  -3.898432560739321e8
            RCUT [angstrom] 9.0
        &END GENPOT
        &GENPOT
            atoms Ar Oo
            FUNCTION Cvi/(r^6) + Cviii/(r^8) + Cx/(r^10) +  Cxii/(r^12) + Cvii/(r^7) + Cix/(r^9) + Cxi/(r^11)
            VARIABLES r
            PARAMETERS Cvi Cviii Cx Cxii Cvii Cix Cxi
            VALUES 9793.326640804324  3.1394020151115586e6  7.627842607574251e7  1.3069576949556419e8  -268179.8831127547  -2.0305102118835907e7  -1.550294791017888e8
            RCUT [angstrom] 9.0
        &END GENPOT
        &GENPOT
            atoms Ar Ho
            FUNCTION Cvi/(r^6) + Cviii/(r^8) + Cx/(r^10) +  Cxii/(r^12) + Cvii/(r^7) + Cix/(r^9) + Cxi/(r^11)
            VARIABLES r
            PARAMETERS Cvi Cviii Cx Cxii Cvii Cix Cxi
            VALUES -2432.1617929808244  -633286.4451121971  -1.042711058845794e7  -1.070334816166878e7  60921.18782589104  3.4584354851854523e6  1.6481702154332923e7
            RCUT [angstrom] 9.0
        &END GENPOT
        &GENPOT
            atoms Ar Hc
            FUNCTION Cvi/(r^6) + Cviii/(r^8) + Cx/(r^10) +  Cxii/(r^12) + Cvii/(r^7) + Cix/(r^9) + Cxi/(r^11)
            VARIABLES r
            PARAMETERS Cvi Cviii Cx Cxii Cvii Cix Cxi
            VALUES 4525.254249630937  858980.4449115618  7.2762912659367565e6  -3.2064841129619866e6  -99340.84352925561  -3.6312235927495877e6  -4.349257632528463e6
            RCUT [angstrom] 9.0
        &END GENPOT

      &END NONBONDED

    &END FORCEFIELD
    &POISSON
      PERIODIC XYZ
      # CP2K needs ewald summation section in input. But for us it is useless,
      # and you can put whatever values here.
      &EWALD
        EWALD_TYPE spme
        ALPHA .44
        GMAX  40
      &END EWALD
    &END POISSON

  &END MM

Then we make a QM-file with semi-empirical calculation method to condense the system. Here we use AM1-method that is fine for formic acid.

&DFT
    BASIS_SET_FILE_NAME ${LIBDIR}/BASIS_MOLOPT
    POTENTIAL_FILE_NAME ${LIBDIR}/POTENTIAL
    &MGRID
      NGRIDS 4
      CUTOFF 350
      REL_CUTOFF 50
    &END MGRID

    &QS
       # We use semi-empirical to condence the system
       METHOD AM1
       &SE
         &COULOMB
           CUTOFF [angstrom] 10.0
         &END
         &EXCHANGE
           CUTOFF [angstrom] 10.0
         &END
      &END
    &END

    &SCF
      SCF_GUESS ATOMIC
      EPS_SCF 1.0E-6         # convergence threshold for total energy
      MAX_SCF 30
      &OUTER_SCF
        MAX_SCF 10
        EPS_SCF 1.0E-6
      &END
    &END SCF

    # Our QM-box is non periodic, so we need a Poison-solver
    &POISSON
      POISSON_SOLVER MT
      PERIODIC NONE
    &END POISSON

&END DFT

and a second file for DFT. We use BLYP-D3BJ here.

&DFT
    BASIS_SET_FILE_NAME ${LIBDIR}/BASIS_MOLOPT
    POTENTIAL_FILE_NAME ${LIBDIR}/POTENTIAL
    &MGRID
      NGRIDS 4
      CUTOFF 350
      REL_CUTOFF 50
    &END MGRID

    &QS
       METHOD GPW
       EPS_DEFAULT 1.0E-10
       EXTRAPOLATION ASPC
    &END


    &SCF
      SCF_GUESS ATOMIC
      EPS_SCF 1.0E-6         # convergence threshold for total energy
      MAX_SCF 30
      &OUTER_SCF
        MAX_SCF 10
        EPS_SCF 1.0E-6
      &END
    &END SCF

    &XC
      &XC_FUNCTIONAL BLYP
      &END XC_FUNCTIONAL
      &VDW_POTENTIAL
         POTENTIAL_TYPE PAIR_POTENTIAL
         &PAIR_POTENTIAL
            TYPE DFTD3(BJ)
            R_CUTOFF 16
            REFERENCE_FUNCTIONAL BLYP
            PARAMETER_FILE_NAME ${LIBDIR}/dftd3.dat
         &END
      &END
    &END XC

    &POISSON
      POISSON_SOLVER MT
      PERIODIC NONE
    &END POISSON

&END DFT

Then we need to make the QM-MM file

&QMMM
    CENTER_GRID TRUE

    # For formic acid 10Å box is enough
    &CELL                    
      ABC [angstrom] 10 10 10
      PERIODIC NONE
    &END CELL

    # Define QM atoms
    &QM_KIND C
      MM_INDEX 1
    &END QM_KIND
    &QM_KIND O
      MM_INDEX 2 3
    &END QM_KIND
    &QM_KIND H
      MM_INDEX 4 5
    &END QM_KIND

    # Apply periodic potential for MM system (default)
    &PERIODIC
      # We can turn multipole off, because argon has no partial charge
      &MULTIPOLE OFF   
      &END
    &END PERIODIC

&END QMMM

Note

CENTER_GRID TRUE is very important option. Without it QM system will heat up on its own and calculations will fail.

Finally we make a file that gathers all the other files. The idea here is that you only need to edit this file in the future.

@SET SYSTEM equilib
@SET LIBDIR #add path to cp2k-data directory

&GLOBAL
  PROJECT ${SYSTEM}
  PRINT_LEVEL LOW
  RUN_TYPE MD
&END GLOBAL

&FORCE_EVAL
  STRESS_TENSOR ANALYTICAL
  METHOD QMMM

  # These are the files we created previously
  @include qm-am1.inc
  @include mm.inc
  @include qmmm.inc

  &SUBSYS
    &CELL
      # box containing the system,
      # using 86x86x86 box we made earlier for the inputs
      ABC [angstrom] 86 86 86
    &END CELL
    &TOPOLOGY
      COORD_FILE_NAME tfa-ar.pdb
      COORD_FILE_FORMAT PDB
      CONN_FILE_FORMAT USER
      &GENERATE
         &ISOLATED_ATOMS
            # This need to span to all atoms in the simulation
            # We don't have any unbreakable mm-bonds in simulation
            LIST 1..1005
         &END
      &END
    &END TOPOLOGY
    &KIND Ho
      BASIS_SET TZVP-MOLOPT-GTH
      POTENTIAL GTH-BLYP-q1
      ELEMENT H
    &END KIND
    &KIND H                  
      BASIS_SET TZVP-MOLOPT-GTH
      POTENTIAL GTH-BLYP-q1
      ELEMENT H
    &END KIND
    &KIND Hc                 
      BASIS_SET TZVP-MOLOPT-GTH
      POTENTIAL GTH-BLYP-q1
      ELEMENT H
    &END KIND
    &KIND Oh
      BASIS_SET TZVP-MOLOPT-GTH
      POTENTIAL GTH-BLYP-q6
      ELEMENT O
    &END KIND
    &KIND Oo
      BASIS_SET TZVP-MOLOPT-GTH
      POTENTIAL GTH-BLYP-q6
      ELEMENT O
    &END KIND
    &KIND O
      BASIS_SET TZVP-MOLOPT-GTH
      POTENTIAL GTH-BLYP-q6
      ELEMENT O
    &END KIND
    &KIND C
      BASIS_SET TZVP-MOLOPT-GTH
      POTENTIAL GTH-BLYP-q4
    &END KIND
  &END SUBSYS

&END FORCE_EVAL

&MOTION

    &MD
        ENSEMBLE NVT
        TEMPERATURE [K] 80
        TEMP_TOL 50
        TIMESTEP [fs] 0.5
        STEPS 20000
        &THERMOSTAT
            REGION MASSIVE
            TYPE CSVR
            &CSVR
              TIMECON [fs] 100
            &END CSVR
        &END THERMOSTAT
    &END MD

    &PRINT
        &TRAJECTORY
            &EACH
                MD 20
            &END EACH
            FORMAT DCD
        &END TRAJECTORY
        &RESTART_HISTORY
            &EACH
                MD 5000
            &END EACH
        &END RESTART_HISTORY
    &END PRINT

&END MOTION

Molecular dynamics

Overview here is to start from gas phase and have the system to condense to solid matrix. To do this the simulation consist of 4 phases.

Equilibration from initial coordinates to some kind of gas.
Condensing gas solid.
Changing calculation method from semi-empirical to DFT and equilibrate it.
Calculate the spectrum.

The first equilibration

Aim here is to create a super cooled gas that will compress to solid/liquid in the next phase. To do this we need to set simulation temperature to lower than boiling point. For argon 80 K is a good temperature. This can be done with the input file that was constructed earlier.

Condensing

To condense the system we need to change the dynamics to NTP, so that the external pressure will compress it. To do it we'll use following input file

@SET SYSTEM condence
@SET LIBDIR # add path to cp2k-data directory

&GLOBAL
  PROJECT ${SYSTEM}
  PRINT_LEVEL LOW
  RUN_TYPE MD
&END GLOBAL

&FORCE_EVAL
  STRESS_TENSOR ANALYTICAL
  METHOD QMMM


  @include qm-am1.inc
  @include mm.inc
  @include qmmm.inc

  @include subsys.inc


&MOTION

    &MD
        ENSEMBLE NPT_I
        TEMPERATURE [K] 20
        TIMESTEP [fs] 0.5
        STEPS 200000
        &THERMOSTAT
            REGION MASSIVE
            TYPE CSVR
            &CSVR
              TIMECON [fs] 250
            &END CSVR
        &END THERMOSTAT
        &BAROSTAT
             PRESSURE [bar] 1.0
             TEMPERATURE [K] 20
             TEMP_TOL [K] 20
             TIMECON [fs] 250
        &END BAROSTAT
    &END MD

    &PRINT
        &TRAJECTORY
            &EACH
                MD 40
            &END EACH
            FORMAT DCD
        &END TRAJECTORY
        &RESTART_HISTORY
            &EACH
                MD 50000
            &END EACH
        &END RESTART_HISTORY
    &END PRINT

&END MOTION

Note

The rate at which the system condenses is heavily affected by thermostats coupling constant, here 250 fs. The system will condense slower, if the constant is smaller. 250 fs has proven to be a good value for many different systems.

This input file has additional file subsys.inc that includes &SUBSYSTEM section form the previous run. You can get the section by using grep on restart file.

grep -A 10000 "&SUBSYS"

In this phase we also set the final temperature, now 20 K.

Change to DFT

Next we'll change QM calculation method to DFT. To do this we'll use slightly modified input file from the previous phase

@SET SYSTEM dftequilib
@SET LIBDIR # add path to cp2k-data directory

&GLOBAL
  PROJECT ${SYSTEM}
  PRINT_LEVEL LOW
  RUN_TYPE MD
&END GLOBAL

&FORCE_EVAL
  STRESS_TENSOR ANALYTICAL
  METHOD QMMM


  @include qm-blyp-d3bj.inc
  @include mm.inc
  @include qmmm.inc

  @include condensed_subsys.inc


&MOTION

    &MD
        ENSEMBLE NPT_I
        TEMPERATURE [K] 20
        TIMESTEP [fs] 0.5

        # You can increase this if needed
        STEPS 10000
        &THERMOSTAT
            REGION MASSIVE
            TYPE CSVR
            &CSVR
              # This constant is not that important anymore
              TIMECON [fs] 25
            &END CSVR
        &END THERMOSTAT
        &BAROSTAT
             PRESSURE [bar] 1.0
             TEMPERATURE [K] 20
             TEMP_TOL [K] 20
             TIMECON [fs] 250
        &END BAROSTAT
    &END MD

    &PRINT
        &TRAJECTORY
            &EACH
                MD 40
            &END EACH
            FORMAT DCD
        &END TRAJECTORY
        &RESTART_HISTORY
            &EACH
                MD 5000
            &END EACH
        &END RESTART_HISTORY
    &END PRINT

&END MOTION

IR spectrum calculation

To calculate spectrum we need dipole moments, to do this we need to add command to print them to our QM-file

&DFT
    BASIS_SET_FILE_NAME ${LIBDIR}/BASIS_MOLOPT
    POTENTIAL_FILE_NAME ${LIBDIR}/POTENTIAL
    &MGRID
      NGRIDS 4
      CUTOFF 350
      REL_CUTOFF 50
    &END MGRID

    &QS
       METHOD GPW
       EPS_DEFAULT 1.0E-10
       EXTRAPOLATION ASPC
    &END


    &SCF
      SCF_GUESS ATOMIC
      EPS_SCF 1.0E-6         # convergence threshold for total energy
      MAX_SCF 30
      &OUTER_SCF
        MAX_SCF 10
        EPS_SCF 1.0E-6
      &END
    &END SCF

    &XC
      &XC_FUNCTIONAL BLYP
      &END XC_FUNCTIONAL
      &VDW_POTENTIAL
         POTENTIAL_TYPE PAIR_POTENTIAL
         &PAIR_POTENTIAL
            TYPE DFTD3(BJ)
            R_CUTOFF 16
            REFERENCE_FUNCTIONAL BLYP
            PARAMETER_FILE_NAME ${LIBDIR}/dftd3.dat
         &END
      &END
    &END XC

    &POISSON
      POISSON_SOLVER MT
      PERIODIC NONE
    &END POISSON

    &PRINT
      &MOMENTS
       PERIODIC FALSE
       &EACH
         MD 1
       &END
       MAX_MOMENT 1
       FILENAME =dipole.traj
      &END MOMENTS
     &END

&END DFT

We'll also need to edit input file to change the mode to NVE

@SET SYSTEM spectrum
@SET LIBDIR # add path to cp2k-data directory

&GLOBAL
  PROJECT ${SYSTEM}
  PRINT_LEVEL LOW
  RUN_TYPE MD
&END GLOBAL

&FORCE_EVAL
  STRESS_TENSOR ANALYTICAL
  METHOD QMMM


  @include qm-blyp-d3bj-dipoles.inc
  @include mm.inc
  @include qmmm.inc

  @include dft_subsys.inc


&MOTION

    &MD
        ENSEMBLE NVE
        TIMESTEP [fs] 0.5
        STEPS 80000
    &END MD

    &PRINT
        &TRAJECTORY
            &EACH
                MD 5
            &END EACH
            FORMAT DCD
        &END TRAJECTORY
        &RESTART_HISTORY
            &EACH
                MD 5000
            &END EACH
        &END RESTART_HISTORY
    &END PRINT

&END MOTION

This will create dipole.traj-file that contains dipole moments. You can use IRSpectrum.jl to calculate IR-spectrum form the dipole trajectory file.