Test case \(\ce{LiFePO4}\)#

In this section, the existing data of \(\ce{LiFePO4}\) search and check if the low energy structures can be find by relaxation using M3GNet.

Hide code cell source
import os
os.environ["CUDA_VISIBLE_DEVICES"]="-1"    

import warnings
from pathlib import Path
from ase.io import read
import ase
# from m3gnet.models import Relaxer
# from pymatgen.ext.matproj import MPRester

import pandas as pd
from tqdm import tqdm
tqdm = lambda x: x


from ase.io import read
from pymatgen.io.ase import AseAtomsAdaptor
from pymatgen.core import Structure
Hide code cell source
# Load pre-computed relaxation results
lfp_mp = Structure.from_file("LiFePO4-mp-19017.vasp")
lfp_cmcm = Structure.from_file("LiFePO4-mp-18951.vasp")

Load pre-computed results

lfp_relaxed = read("exp-m3gnet/LiFePO4-mp-19017.res")
lfp_relaxed_energy = lfp_relaxed.info['energy'] / len(lfp_relaxed)
lfp_relaxed_volume = lfp_relaxed.get_volume() / len(lfp_relaxed)

lfp_cmcm_relaxed = read("exp-m3gnet/LiFePO4-mp-18951.res")
lfp_cmcm_relaxed_energy = lfp_cmcm_relaxed.info['energy'] / len(lfp_cmcm_relaxed)
lfp_cmcm_relaxed_volume = lfp_cmcm_relaxed.get_volume()/ len(lfp_cmcm_relaxed)
Hide code cell source
def load_dataset(names):
    """Load dataset"""
    cells = [read(x) for x in  tqdm(names)]
    for atoms, name in zip(cells, input_names):
        atoms.info['fname'] = name.stem

    dataset = []
    for atoms in cells:
        dataset.append(
        {
            'atoms': atoms,
            'label': atoms.info['fname'],
            'energy': atoms.info['energy'],
            'energy_per_atom': atoms.info['energy'] / len(atoms),
            'volume': atoms.get_volume(),
            'volume_per_atom': atoms.get_volume() / len(atoms),

        }
        )

    return pd.DataFrame(dataset).sort_values('energy_per_atom')

def show_compact(df):
    return df[['label', 'energy_per_atom', 'volume_per_atom']]

Load calculated data:

Hide code cell source
# AIRSS Search with CASTEP QC5
input_names = list(Path("with-u-rerun").glob("*.res"))
df_dft = load_dataset(input_names)

# AIRSS Search with structures relaxed buy M3GNet
input_names = list(Path("with-u-rerun-m3gnet").glob("*.res"))
df_m3g = load_dataset(input_names)

# AIRSS Search with structures relaxed buy M3GNet
input_names = list(Path("with-u-rerun-relaxed-m3gnet").glob("*.res"))
df_relaxed_m3g = load_dataset(input_names)

# AIRSS Search results recalculated using VASP PBE + U
input_names = list(Path("with-u-rerun-refine").glob("*.res"))
df_dft_refine = load_dataset(input_names)

# Extra AIRSS search (new structure from buildcell) relaxed with M3GNet focusing on four formula units
input_names = list(Path("with-u-rerun-extras-m3gnet/").glob("*.res"))
df_m3g_extras = load_dataset(input_names)

Energy - Volume distribution#

Here, we plot the energy per atom against the volume per atom. Such plots are useful for checking the search results and how the relaxed structure distribute in the configuration space. Very often the unique structure can be identified this way, as different polymorphs tend to have different volumes

The input structures relaxed by M3GNet are those used in the previous search. Experimental structures as labelled with diamond and they are also relaxed using M3GNet. There are two experimental structures, \(Pnma\) (olivine) and \(Cmcm\) (high-temperature / pressure phase). The orange dots represent the structure that are first relaxed by DFT, then re-relaxed by M3GNet.

Hide code cell source
def add_labels(ax):
    """Add axis labels"""
    ax.set_ylabel('Energy per atom (eV)')
    ax.set_xlabel(r'Volume per atom ($\mathrm{\AA^3}$)')

ax = df_m3g.plot.scatter('volume_per_atom', 'energy_per_atom', label='m3gnet')
ax = df_relaxed_m3g.plot.scatter('volume_per_atom', 'energy_per_atom', label='DFT relaxed - m3gnet', color='C1', ax=ax,s=3)

ax.scatter( [lfp_relaxed_volume, lfp_cmcm_relaxed_volume], [lfp_relaxed_energy, lfp_cmcm_relaxed_energy],  marker='d', color='r', label='Experimental')
ax.set_xlim(10, 15)
ax.set_ylim(-6.85, -6.6)
ax.legend(loc=1)
ax.set_title('M3GNet LiFePO4 AIRSS rerun')
add_labels(ax)
_images/c41e74668c13a3366fb0f38f85c05884e43f34627d33f306c4a10d6d0b66ac23.png

It appears that the M3GNet search is capable of finding the \(Cmcm\) structure. But for the \(Pnma\) phase it is not so clear. There are two structure that can be the \(Pnma\) structure (close to the lowest energy diamond) upon inspection. However, they are not identical to the actual \(Pnma\) structure after relaxation. Interestingly, relaxing the \(Pnma\) structure from the DFT search gives the same result as starting from the experimental \(Pnma\). This confirms that the former is indeed the ground state structure.

One possible explanation of such strange behaviour is that relaxation is stuck due to a noisy potential energy surface.

The top three structures in the DFT search are the \(Pnma\) structure are tabulated below:

Hide code cell source
show_compact(df_dft.iloc[:3])
label energy_per_atom volume_per_atom
2367 4LFP_actual-201127-012020-107264 -433.948175 10.498721
1135 4LFP_actual-201126-142402-eb7dce -433.945196 10.428370
164 4LFP_actual-201126-144638-9aee88 -433.940921 10.493866

The table below shows the top three structures in the M3GNet search. The first structure is in fact very close to the \(Pnma\) structure. The second and the third structure has much larger volume, however, which are potentially from an false minimum.

Hide code cell source
show_compact(df_m3g.iloc[:3])
label energy_per_atom volume_per_atom
164 4LFP_actual-201126-144638-9aee88 -6.808464 10.955443
1651 4LFP_actual-201126-194829-b3779f -6.805621 12.959209
1252 4LFP-201124-120454-c024e6 -6.804693 12.802004

Top three entries in the M3GNet-relaxed-DFT-optimised structure, note that they are the same as the top three in the DFT dataset above.

Hide code cell source
show_compact(df_relaxed_m3g.iloc[:3])
label energy_per_atom volume_per_atom
164 4LFP_actual-201126-144638-9aee88 -6.812143 10.830344
2367 4LFP_actual-201127-012020-107264 -6.811954 10.808982
1135 4LFP_actual-201126-142402-eb7dce -6.811675 10.769253

Due to the low cost of M3GNet relaxation, we can afford to sample much more structures than DFT, even if the latter already uses low-cost setting focusing on the speed. But this time the \(Pnma\) structure is not found, maybe it is just bad luck? The search includes random structures containing 4 formula units only.

Search settings for random structure generation

%BLOCK LATTICE_CART
10.452988  0.000000  0.000000
 0.000000  6.086495  0.000000
 0.000000  0.000000  4.753437
%ENDBLOCK LATTICE_CART

#TARGVOL=80

%BLOCK POSITIONS_ABS
Li  0.000000  0.000000  0.000000
O  4.217576  4.564871  5.905294 #  PO4-1 %  
O  3.494553  3.329881  3.736885 #  PO4-1 %
O  3.494553  5.799862  3.736885 #  PO4-1 % 
O  5.672957  4.564871  3.771439 #  PO4-1 % 
P  4.236933  4.564871  4.367667  #  PO4-1 % 
Fe  8.168949  1.521624  2.490639 
%ENDBLOCK POSITIONS_ABS

#SYMMOPS=2-4
###SGRANK=20
#NFORM=4
#SLACK=0.25
#OVERLAP=0.1
#COMPACT
#CELLADAPT
#MINSEP=1.0 Li-Li=3.04 Li-O=2.11 Li-P=2.69 Li-Fe=3.31 O-O=2.47 O-P=1.54 O-Fe=2.10 P-P=3.71 P-Fe=2.87 Fe-Fe=3.92
#ADJGEN=0-1
#SPIN=4 4 Fe

KPOINTS_MP_SPACING 0.07
 
SYMMETRY_GENERATE
SNAP_TO_SYMMETRY
 
%BLOCK SPECIES_POT
QC5
%ENDBLOCK SPECIES_POT
 
%BLOCK EXTERNAL_PRESSURE
0 0 0
0 0
0
%ENDBLOCK EXTERNAL_PRESSURE
Hide code cell source
ax = df_m3g_extras.plot.scatter('volume_per_atom', 'energy_per_atom', label='m3gnet')

ax.scatter( [lfp_relaxed_volume, lfp_cmcm_relaxed_volume], [lfp_relaxed_energy, lfp_cmcm_relaxed_energy],  marker='d', color='r', label='Experimental')
ax.set_xlim(10, 15)
ax.set_ylim(-6.85, -6.6)
ax.legend(loc=1)
ax.set_title('M3GNet LiFePO4 extra cases')
add_labels(ax)
_images/e2ca98035c026bf31eef3ba922b38cdccc34a5f9b591c9106638818d4ed1a91f.png

The top structure has an increased volume this time.

show_compact(df_m3g_extras.iloc[:4])
label energy_per_atom volume_per_atom
3455 4LFP-extras-2879 -6.803350 12.987641
2307 4LFP-extras-3721 -6.801261 10.452332
3898 4LFP-extras-2798 -6.801075 10.475961
4133 4LFP-extras-2148 -6.800354 10.459634

Results of the DFT PBE + U search using CASTEP with the QC5 pseudopotential is shown below.

Hide code cell source
ax = df_dft.plot.scatter('volume_per_atom', 'energy_per_atom', xlim=(9, 14), ylim=(-434, -433.75), label='PBE + U Search')
ax.legend(loc=1)
add_labels(ax)
_images/1d3f033cf9a85c804e820357f3022b698af85dcacd4c2e04a24135479ea03463.png

Both the \(Pnma\) and the \(Cmcm\) phases have been found. The DFT energies have large spreads due to the coarse calculation settings used. As we show above, the top three lowest energy structures are indeed the \(Pnma\) phase.

Below is a plot for the structures from the AIRSS search, but re-relaxed using VASP PBE + U. The settings used the different from those in the Materials Project (used for M3GNet training) hence the energies are not comparable.

Nevertheless, the low energy \(Pnma\) and \(Cmcm\) phases can be identified.

Hide code cell source
ax = df_dft_refine.plot.scatter('volume_per_atom', 'energy_per_atom', label='PBE + U VASP')
ax.legend(loc=1)
ax.set_xlim(10, 15)
ax.set_ylim(-6.90, -6.65)
add_labels(ax)
_images/345326794948c2ed6adbac77961ce35a0c163269c43579385804e81cf03b1164.png

Energy distribution of the relaxed structures#

In this section, we compare the distributions of the energy in the relaxed structures. This gives some hint of distribution of the local minima in the potential energy surface. Very different distributions mean the potential energy surfaces are very different.

Hide code cell source
import matplotlib.pyplot as plt

engs = df_m3g['energy_per_atom'].values.copy()
engs_m3g = engs - lfp_relaxed_energy
engs = df_m3g_extras['energy_per_atom'].values.copy()
engs_m3g_extras = engs - lfp_relaxed_energy

engs = df_dft['energy_per_atom'].values.copy()
engs_dft = engs - engs.min()

plt.hist(engs_m3g, bins=100,alpha=0.5,density=True, label='m3g');
plt.hist(engs_m3g_extras, range=(0,1), bins=100,alpha=0.5,density=True, label='m3g-extras-4fu');

plt.hist(engs_dft, bins=100, range=(0,1), alpha=0.5, density=True, label='PBE + U')
plt.legend()
plt.xlabel('Energy per atom (eV)')
plt.ylabel('Probability density');
_images/3a95c123f0993367dceac4da68aca3b598884d334fda3da412ca10ec84e50e05.png

The two distributions, one from the DFT search and the other from M3GNet, do not differ much, indicating the M3GNet has done a good job here reproducing the energy landscape of the DFT.

Discussion#

For \(\ce{LiFePO4}\) M3GNet is capable of locating the low energy \(Cmcm\) phase as well as the \(Pnma\) phase, although there seems to be some complication with the latter, but it is very likely the true DFT ground state can be recovered after further DFT relaxation.

So can we use M3GNet for searching? Probably fine for \(\ce{LiFePO4}\), but are something one should note:

  • M3GNet appears to generate some false minima, which is unlikely to happen for DFT calculation despite the use of coarse settings.

  • On the other hand, M3GNet has much reduced calculation costs hence much more structures may be sampled. The true ground state can still be recovered even if there are false minima.

  • One should note that M3GNet is built from a vast database of DFT calculations which already contains the target structures, while a DFT search does not rely on any pre-existing data apart from some estimate specie-wise distances.

  • The Materials Project contains many data for oxide and phosphates, in addition it also has relative large set of data for \(\ce{LiFePO4}\) polymorphs.

Hence, further tests are need to answer if M3GNet work well where the ground state is not included in the training set.