Bonan Zhu, Computational Materials Scientist at UCL

Enabling data compression in AiiDA 2.0

AiiDA 2.0 introduces a new storage format. Rather than placing files individually in sharded folders, an object store is used: the storage backend stores the raw data and allows efficient retrieval using a hash computed from each object's content. By default, the disk-objectstore package is used for this. The filenames and folder structures are now stored inside the database instead.
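
To make the content-addressing idea concrete, here is a minimal sketch using the disk-objectstore Python API (the container path is just a placeholder for illustration):

    from disk_objectstore import Container

    # Create a fresh container in a scratch folder (placeholder path).
    container = Container('/tmp/test-container')
    container.init_container(clear=True)

    # The content is stored under a key computed from its hash.
    hashkey = container.add_object(b'total energy = -123.45 eV\n')

    # The same hash key is later used to retrieve the content.
    assert container.get_object_content(hashkey) == b'total energy = -123.45 eV\n'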

This backend stores objects in two forms. Newly added files are written as individual "loose" files on disk, with the filename being the hash of the content; this allows fully concurrent writing and reading. The storage can later be optimized by concatenating these loose files into a small number of packed files, with the offsets and lengths needed to read each object back stored in a SQLite database. Optionally, the data streams may be compressed as they are packed. Compression can lead to significant space savings for the typical text output of DFT codes.

While the verdi storage maintain command can be used to perform the packing, it is rather conservative and does not enable compression. To get around this, one can use dostore optimize instead (or call the Python API directly, as sketched at the end of this post). Currently, disk-objectstore does not support recompressing already-packed objects, so one may be stuck with many uncompressed streams. In addition, when migrating from AiiDA v1, the migrated data is not compressed either. To force compression during the migration, the following line should be changed in aiida/storage/psql_dos/migrations/utils/utils.py:

    hashkeys = container.add_streamed_objects_to_pack(streams, compress=False, open_streams=True)

to

    hashkeys = container.add_streamed_objects_to_pack(streams, compress=True, open_streams=True)

Do note that enabling compression would make the migration slower.
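
For an existing profile, the loose objects can also be packed and compressed through the Python API rather than the command line. Below is a minimal sketch, assuming the profile's container sits at its usual location in the file repository (the path here is a placeholder) and that pack_all_loose accepts a compress flag; make sure no daemon or verdi process is writing to the repository while this runs:

    from disk_objectstore import Container

    # Placeholder path: point this at the container folder of your
    # profile's file repository before running.
    container = Container('/path/to/profile/container')

    # Concatenate all loose objects into pack files, compressing the
    # streams as they are written (assumes the `compress` keyword exists).
    container.pack_all_loose(compress=True)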