enabling data compression in AiiDA 2.0

AiiDA 2.0 introduces the new storage format. Rather than placing files individually in sharded folders, an object storage is used. Its storage backend stores the data and allows efficient retrieval of the data using the hash computed from its content. By default, the disk-objectstore is used for this. The filenames and folder structures are now stored inside the database instead.

This backend stores data in two ways. First, newly added files are stored as plain files on the disk, with the filename being its hash. This allows fully concurrent writing/reading. The storage can be optimized by concatenating these loose files into a single packed file, and the offsets and lengths for reading the data out is stored in a SQLite database. Optionally, when concatenating, the data stream may be compressed. Compression can lead to significant space saving for typical text output of DFT codes. While the verdi storage maintain command can be used to perform this, it is rather conservative and does not enable compression. To get around this, one can use dostore optimize to perform this task. Currently, disk-objectstore does not support recompressing packed objects, so one may get stuck with many uncompressed streams. In addition, when migrating from AiiDA v1, the migrated data is not compressed either. To force the compression, the following line should be changed in aiida/storage/psql_dos/migrations/utils/utils.py:

    hashkeys = container.add_streamed_objects_to_pack(streams, compress=False, open_streams=True)

    hashkeys = container.add_streamed_objects_to_pack(streams, compress=True, open_streams=True)

Do note that enabling compression would make the migration slower.

Enjoy Reading This Article?