Easily archive entire directories in whichever manner you wish.

Archiver

Archiver is a bash script that helps you back up whatever you want, however you want.

Usage

For example, archives.json:

[
  {
    "name": "Wallpapers",
    "target": "~/Media/Pictures/Wallpapers",
    "archive": {
      "name": "walls",
      "destination": "~/OneDrive/Wallpapers"
    },
    "timestamps": {
      "last_archive": 1712438594,
      "last_upload": 1712438693
    },
    "sync_command": "onedrive --synchronize --single-directory 'Wallpapers'",
    "md5sum": "6de26f11ad638fd145f3d1412e0bf1c6"
  },
  {
    "name": "Books",
    "target": "~/Documents/Books",
    "archive": {
      "name": "books",
      "destination": "~/OneDrive/Books"
    },
    "timestamps": {
      "last_archive": 1712439022,
      "last_upload": 1712439638
    },
    "sync_command": "onedrive --synchronize --single-directory 'Books'",
    "md5sum": "b4ae6185bb5a20d19c0b30f9778a10cb"
  }
]

You specify a list of attribute sets, each with three key parts:

  • target: the target directory you wish to back up
  • archive: the name of the archive and where you want to store the archive
  • sync_command: the command you wish to use to back up this specific directory
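
As a rough illustration of how these fields can be read from a shell script, the sketch below pulls them out with jq. The helper names expand_tilde, list_entries and show_plan are hypothetical and not taken from the script itself. The tilde handling assumes that paths such as "~/Media/Pictures/Wallpapers" arrive from jq as plain strings, which the shell will not expand on its own.

```shell
#!/usr/bin/env bash

# Hypothetical helper: expand a leading "~" in a path read from JSON.
expand_tilde() {
  local path=$1
  case $path in
    "~")   printf '%s\n' "$HOME" ;;
    "~/"*) printf '%s\n' "$HOME/${path#"~/"}" ;;
    *)     printf '%s\n' "$path" ;;
  esac
}

# Hypothetical helper: one tab-separated line per entry:
# target, archive name, archive destination.
list_entries() {
  local config=$1
  jq -r '.[] | [.target, .archive.name, .archive.destination] | @tsv' "$config"
}

# Hypothetical helper: show what would be archived and where.
show_plan() {
  local config=$1 tab target name destination
  tab=$(printf '\t')
  list_entries "$config" | while IFS=$tab read -r target name destination; do
    printf '%s -> %s/%s\n' \
      "$(expand_tilde "$target")" "$(expand_tilde "$destination")" "$name"
  done
}
```

Calling show_plan archives.json would print one line per entry, which is a cheap way to check a configuration before archiving anything.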

Other fields, such as timestamps and name, are useful for other purposes if you wish to climb under the hood and use them.

The MD5 sum is also useful if you wish to verify the integrity of your files after retrieving them.
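
A minimal sketch of such a check is shown here. It assumes the stored md5sum value is the digest of the compressed tarball, and verify_md5 is a hypothetical helper rather than part of the script.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed only if FILE's MD5 digest equals EXPECTED.
verify_md5() {
  local file=$1 expected=$2 actual
  actual=$(md5sum "$file" | awk '{print $1}')
  [ "$actual" = "$expected" ]
}

# Example (file name is an assumption):
#   expected=$(jq -r '.[] | select(.name == "Wallpapers") | .md5sum' archives.json)
#   verify_md5 walls.tar.xz "$expected" && echo "archive intact"
```

MD5 is fine for catching accidental corruption in transit or storage, but it should not be relied on to detect deliberate tampering.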

PS: an example archives.json is provided.

Methodology

  1. Your target gets converted into a tarball.
  2. That tarball is compressed into an xz archive.
    • This format was chosen because of its excellent compression ratio.
    • Though in the future I would like to implement multiple formats for this.
  3. A parity archive is created from that compressed tarball.
    • Uses the par2cmdline utilities.
    • A single block file with 30% redundancy is created.
    • par2 also creates an index file, which you can keep, though par2 doesn't really need it.
  4. A Unix timestamp and an MD5 sum are taken from the archived tarball.
  5. Your sync_command hook is run last, and a second timestamp is taken once it finishes.
  6. Your archives.json file is updated with all of the fresh timestamps and MD5-Sum.
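
The steps above can be sketched roughly as follows. This is not the script's actual code: archive_entry, its argument order, the NAME.tar.xz file name and the use of bash -c to run the hook are all assumptions made for illustration. The par2create options used are -r30 (30% redundancy) and -n1 (one recovery file).

```shell
#!/usr/bin/env bash
set -eu

# archive_entry CONFIG ENTRY TARGET NAME DESTINATION SYNC_COMMAND
# Illustrative only; names and layout are not taken from the script.
archive_entry() {
  local config=$1 entry=$2 target=$3 name=$4 destination=$5 sync_command=$6
  local last_archive last_upload md5

  mkdir -p "$destination"

  # Steps 1-2: pack the target into a tarball, then compress it with xz.
  tar -cf "$destination/$name.tar" -C "$(dirname "$target")" "$(basename "$target")"
  xz -f "$destination/$name.tar"   # leaves $name.tar.xz in place of $name.tar

  # Step 3: parity data with 30% redundancy in a single recovery file.
  (cd "$destination" && par2create -q -r30 -n1 "$name.tar.xz")

  # Step 4: Unix timestamp and MD5 sum of the compressed tarball.
  last_archive=$(date +%s)
  md5=$(md5sum "$destination/$name.tar.xz" | awk '{print $1}')

  # Step 5: run the sync hook, then take the second timestamp.
  bash -c "$sync_command"
  last_upload=$(date +%s)

  # Step 6: write the fresh values back into the matching entry.
  jq --arg entry "$entry" --arg md5 "$md5" \
     --argjson archived "$last_archive" --argjson uploaded "$last_upload" \
     'map(if .name == $entry
          then .md5sum = $md5
             | .timestamps.last_archive = $archived
             | .timestamps.last_upload = $uploaded
          else . end)' "$config" > "$config.tmp"
  mv "$config.tmp" "$config"
}
```

Writing the updated JSON to a temporary file and then moving it into place avoids truncating archives.json while jq is still reading it. If a retrieved archive turns out to be damaged, the parity files can be used with par2verify and par2repair from the same package.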

Dependencies

  • tar
  • xz
  • stat
  • par2create (from par2cmdline package)
  • md5sum
  • du
  • awk
  • jq
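
Before a first run it can help to confirm that all of these are on your PATH. The snippet below is one way to do that; check_dependencies is a hypothetical helper, not something the script is known to provide.

```shell
#!/usr/bin/env bash

# Hypothetical helper: report every missing command on stderr and
# return non-zero if at least one is missing.
check_dependencies() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf 'missing dependency: %s\n' "$cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   check_dependencies tar xz stat par2create md5sum du awk jq || exit 1
```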

TODO

  • Multiple archiving formats.
  • Store archives.json in a user-specified directory.
  • Multiple block file parity archive.
  • Multi-threaded archiving.