Algebraicfile Format Specification

Overview

This specification describes the algebraicfile format, a file format for storing or transmitting encrypted data. Typically, an algebraicfile contains encrypted data corresponding to a single file on a file system. The design of the format allows an algebraicfile to be read and written as a stream with constant in-memory overhead.

The specification describes version 5 of the format, which is the current version. Programs and packages that read algebraicfile-formatted files should support the latest and all previous versions. Readers must return an error if a version is unknown or unsupported. Writers may support only the latest version.

The recommended filename extension is .algebraic. The Uniform Type Identifier is org.littleroot.algebraicfile.

Each algebraicfile has 6 sections:

#  Section     Length                   Encryption
1  Identifier  6 bytes                  none
2  Header      57 bytes                 none
3  Metadata    variable, non-zero       XChaCha20-Poly1305
4  Filler      variable, possibly zero  unspecified
5  Data        variable, non-zero       libsodium crypto_secretstream
6  Checksum    32 bytes                 none

Figure: Algebraic encrypted file icon on macOS

Example Algebraicfile

The example below uses an original source file named hello.txt with the contents "hello, world\n".

% cat hello.txt
hello, world
% xxd hello.txt
00000000: 6865 6c6c 6f2c 2077 6f72 6c64 0a         hello, world.
%

The file hello.txt was encrypted (with the option turned on to obfuscate the true length of the original data) into an algebraicfile named hello.txt.algebraic. Each section of the resulting algebraicfile is discussed below.

% xxd hello.txt.algebraic
00000000: 0c75 0d05 0e05 4d77 0805 b407 4a52 714c  .u....Mw....JRqL
00000010: 9d28 1a11 5bed 0000 0001 0040 0000 0826  .(..[......@...&
00000020: dd45 b83f 8a34 4f41 2c95 831e adc7 9c1d  .E.?.4OA,.......
00000030: 186d dce0 8dd4 7d00 0000 0000 0001 3528  .m....}.......5(
00000040: 94f3 4e1e 70c1 8c17 9a02 795f a817 aaa3  ..N.p.....y_....
00000050: c20a 04a3 cc47 55c6 03f2 4775 bf87 1caa  .....GU...Gu....
00000060: d60c 4ee5 0723 dd24 5794 c240 8108 a45a  ..N..#.$W..@...Z
00000070: 1fc5 c778 2b8e 7656 a394 6094 d0b9 1e89  ...x+.vV..`.....
00000080: a1c0 49a3 fc9c fd79 b1f6 6a11 0a4e 02bb  ..I....y..j..N..
00000090: 943a c4bc a12f 627b 479d 2890 c83f 1b95  .:.../b{G.(..?..
000000a0: f021 39f2 ad04 b6c6 2ab5 443d c236 3c23  .!9.....*.D=.6<#
000000b0: 0864 4412 c4e4 4e98 b7e5 ddd9 f7bb 698b  .dD...N.......i.
000000c0: 732e f0ae ecca f1a7 1b0f 2994 c855 efc3  s.........)..U..
000000d0: 28fe 7b1a 1481 3dfe 9c00 ec10 ff29 f27b  (.{...=......).{
000000e0: 7c3f caef 5b63 e2a2 2c8e f15f f651 0b9b  |?..[c..,.._.Q..
000000f0: 8250 7559 a066 4bc9 22f6 148d 7437 377b  .PuY.fK."...t77{
00000100: 784b ffd9 97ac d8b6 4d74 882c c13e 2270  xK......Mt.,.>"p
00000110: 71ee 34be 4c62 7c89 6c9f 2052 b117 6b50  q.4.Lb|.l. R..kP
00000120: cafd df39 fa63 9e5f 8d19 35f3 6305 ae8f  ...9.c._..5.c...
00000130: 0773 9834 d7d9 d5c4 b128 05dc f878 f570  .s.4.....(...x.p
00000140: 9c84 9e7e 5ae4 7ef3 da19 04e2 c6ae e915  ...~Z.~.........
00000150: 78a6 6999 506f 75f5 6d56 bde3 2d2f cf13  x.i.Pou.mV..-/..
00000160: 29d8 e045 6594 776a 4542 9d63 9dec b534  )..Ee.wjEB.c...4
00000170: 3921 25d0 6999 949c 2d49 67b8 02bd 39cc  9!%.i...-Ig...9.
00000180: d2fe 71a4 4333 3472 8a2b cf73 143a 7a52  ..q.C34r.+.s.:zR
00000190: 1f79 051c b244 3489 9659 18ac 7be6 7724  .y...D4..Y..{.w$
000001a0: 8bdf 7a62 1e77 2e2b 251e 05f5 2dfb df22  ..zb.w.+%...-.."
000001b0: 335c b52d b61c edfc 1d2e cb5c f70c 2600  3\.-.......\..&.
000001c0: a404 e458 f7db 0a1a cf21 dd              ...X.....!.
%

File Structure

1. Identifier section

The Identifier section consists of the following struct binary-encoded in big-endian order.

struct {
    magic   [5]byte
    version uint8
}

The magic value is 0x0c 0x75 0x0d 0x05 0x0e. The version field is the algebraicfile format version as an integer; for example, for version 5 of the format the value is 0x05. Programs that read an algebraicfile should read the Identifier section, and based on the version number adjust their parsing behavior for the remaining sections.

In the example algebraicfile xxd output from earlier, the Identifier section is the first 6 bytes (0c 75 0d 05 0e 05) of this line:

00000000: 0c75 0d05 0e05 4d77 0805 b407 4a52 714c  .u....Mw....JRqL
- SNIP -
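
The check described above can be sketched in Go as follows. This is a minimal illustration, not a reference implementation: the function name parseIdentifier and the error values are hypothetical, and the sketch assumes a reader that handles only version 5, whereas a complete reader should also handle earlier versions.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// magic is the 5-byte magic value defined by the Identifier section.
var magic = []byte{0x0c, 0x75, 0x0d, 0x05, 0x0e}

// supportedVersion is an assumption of this sketch: only version 5 is handled.
const supportedVersion = 5

// Hypothetical error values; the specification does not name them.
var (
	errShort   = errors.New("algebraicfile: identifier shorter than 6 bytes")
	errMagic   = errors.New("algebraicfile: magic value mismatch")
	errVersion = errors.New("algebraicfile: unknown or unsupported version")
)

// parseIdentifier validates the 6-byte Identifier section and returns the
// format version found in its last byte.
func parseIdentifier(b []byte) (uint8, error) {
	if len(b) < 6 {
		return 0, errShort
	}
	if !bytes.Equal(b[:5], magic) {
		return 0, errMagic
	}
	if b[5] != supportedVersion {
		return 0, errVersion
	}
	return b[5], nil
}

func main() {
	// The first 6 bytes of the example file.
	v, err := parseIdentifier([]byte{0x0c, 0x75, 0x0d, 0x05, 0x0e, 0x05})
	fmt.Println(v, err)
}
```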

2. Header section

The Header section consists of the following struct binary-encoded in big-endian order.

struct {
    salt           [16]byte
    time           uint32
    mem            uint32
    threads        uint8
    metadata_nonce [24]byte
    nextlen        int64
}

The salt, time, mem, and threads fields are parameters for Argon2id key derivation. The mem value is expressed in kibibytes (KiB). The metadata_nonce field is the nonce for encryption of the Metadata section. The nextlen field represents the length in bytes of the variable-length Metadata section that follows this section.

In the example algebraicfile xxd output from earlier, the Header section is the 57 bytes at offsets 0x06 through 0x3e of these lines:

00000000: 0c75 0d05 0e05 4d77 0805 b407 4a52 714c  .u....Mw....JRqL
00000010: 9d28 1a11 5bed 0000 0001 0040 0000 0826  .(..[......@...&
00000020: dd45 b83f 8a34 4f41 2c95 831e adc7 9c1d  .E.?.4OA,.......
00000030: 186d dce0 8dd4 7d00 0000 0000 0001 3528  .m....}.......5(
- SNIP -

which breaks down to the hex field values:

salt            4d770805b4074a52714c9d281a115bed
time            00000001
mem             00400000
threads         08
metadata_nonce  26dd45b83f8a344f412c95831eadc79c1d186ddce08dd47d
nextlen         0000000000000135
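
The breakdown above can be reproduced with Go's encoding/binary package. The sketch below is illustrative only: the type name Header, the exported field names, and the helpers decodeHeader and mustHex are hypothetical. The field order, sizes, and byte order follow the struct in this section, and the input is the example Header section.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// Header mirrors the struct in this section. Fields are exported because
// encoding/binary cannot set unexported struct fields.
type Header struct {
	Salt          [16]byte
	Time          uint32
	Mem           uint32 // KiB
	Threads       uint8
	MetadataNonce [24]byte
	NextLen       int64
}

const headerLen = 57

// exampleHeaderHex is the Header section of the example file.
const exampleHeaderHex = "4d770805b4074a52714c9d281a115bed" + // salt
	"00000001" + // time
	"00400000" + // mem
	"08" + // threads
	"26dd45b83f8a344f412c95831eadc79c1d186ddce08dd47d" + // metadata_nonce
	"0000000000000135" // nextlen

// mustHex decodes a hex string and panics on malformed input.
func mustHex(s string) []byte {
	b, err := hex.DecodeString(s)
	if err != nil {
		panic(err)
	}
	return b
}

// decodeHeader decodes the 57-byte, big-endian Header section.
func decodeHeader(b []byte) (Header, error) {
	var h Header
	if len(b) < headerLen {
		return h, fmt.Errorf("algebraicfile: header needs %d bytes, got %d", headerLen, len(b))
	}
	if err := binary.Read(bytes.NewReader(b[:headerLen]), binary.BigEndian, &h); err != nil {
		return h, err
	}
	// The Metadata section has non-zero length, so nextlen must be positive.
	if h.NextLen <= 0 {
		return h, fmt.Errorf("algebraicfile: invalid metadata length %d", h.NextLen)
	}
	return h, nil
}

func main() {
	h, err := decodeHeader(mustHex(exampleHeaderHex))
	if err != nil {
		panic(err)
	}
	fmt.Printf("time=%d mem=%d KiB threads=%d nextlen=%d\n", h.Time, h.Mem, h.Threads, h.NextLen)
}
```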

3. Metadata section

The Metadata section consists of a JSON-encoded object, encrypted and authenticated with XChaCha20-Poly1305. The section's length in bytes includes the 16-byte Poly1305 authentication tag (in other words, the AEAD overhead). The nonce for the encryption is the metadata_nonce value in the Header section. The encryption key is derived by hashing a user-supplied password with Argon2id; the parameters for Argon2id must match the values in the Header section.

The section largely consists of metadata of the original file. The structure of the JSON-encoded object is:

{
    cp: string // packed copyfile(3) data, base64-encoded.
    fl: number // length of the Filler section, int64.
    m:  number // file mode bits, uint32; see Go type fs.FileMode for format.
    n:  string // filename, final path element only, base64-encoded.
    l:  string // linkname, present iff original file is a symbolic link, base64-encoded.
    u:  number // file uid; int64.
    g:  number // file gid; int64.
    mt: number // file modification time; int64.
    at: number // file access time; int64.
    ct: number // file change time; int64.
    bt: number // file birth time; int64.
    cs: number // chunk size, in bytes, to use with libsodium crypto_secretstream functions; int64.
}

Details that apply to all fields:

All fields, except the cs field, are optional in the encoded JSON. For example, if an algebraicfile represents encrypted in-memory data, then fields such as the original file's name, its file mode bits, and its modification time are not applicable and hence can be omitted.

If a field's value is unavailable or invalid, writers must omit the property in its entirety in the encoded JSON. Readers must use "nil", "empty", or "zero" values for missing fields when decoding JSON. Readers must take into account integer precision and sign requirements when decoding numbers from JSON. Readers must skip without error unknown properties present in the encoded JSON.

Details for specific fields:

The fl field represents the length in bytes of the variable-length Filler section that follows this section. Note that if the property does not exist in the encoded JSON, readers must consider the value to be zero.

The l field represents the target name for a symbolic link. It must be present if and only if the original file corresponding to an algebraicfile is a symbolic link.

The cp field contains metadata about the original file. The value is the base64-encoded result of copyfile(3) called with flags COPYFILE_ACL | COPYFILE_XATTR | COPYFILE_PACK. Writers should omit the field if the value cannot be constructed (e.g. because copyfile(3) isn't available).
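
The decoding rules above map naturally onto Go's encoding/json package, which leaves missing fields at their zero values, ignores unknown properties, and parses integers directly into int64 without a floating-point intermediate. The sketch below operates on the already-decrypted JSON. The type name Metadata, its Go field names, and the helpers decodeMetadata and Filename are hypothetical, and the use of standard (padded) base64 is an assumption, since this specification does not name a base64 variant. The chunk size in the sample input is likewise made up.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// Metadata mirrors the JSON object described in this section.
type Metadata struct {
	CopyfileData string `json:"cp"`
	FillerLen    int64  `json:"fl"`
	Mode         uint32 `json:"m"`
	Name         string `json:"n"`
	Linkname     string `json:"l"`
	UID          int64  `json:"u"`
	GID          int64  `json:"g"`
	ModTime      int64  `json:"mt"`
	AccessTime   int64  `json:"at"`
	ChangeTime   int64  `json:"ct"`
	BirthTime    int64  `json:"bt"`
	ChunkSize    int64  `json:"cs"`
}

// decodeMetadata parses the decrypted Metadata JSON. Missing fields keep
// their zero values; unknown properties are skipped without error.
func decodeMetadata(plaintext []byte) (Metadata, error) {
	var m Metadata
	if err := json.Unmarshal(plaintext, &m); err != nil {
		return m, err
	}
	// cs is the only required field.
	if m.ChunkSize <= 0 {
		return m, errors.New("algebraicfile: missing or invalid cs field")
	}
	if m.FillerLen < 0 {
		return m, errors.New("algebraicfile: negative fl field")
	}
	return m, nil
}

// Filename decodes the n field, assuming standard base64 encoding.
func (m Metadata) Filename() (string, error) {
	b, err := base64.StdEncoding.DecodeString(m.Name)
	return string(b), err
}

func main() {
	// Sample input: "aGVsbG8udHh0" is base64 for "hello.txt".
	m, err := decodeMetadata([]byte(`{"n":"aGVsbG8udHh0","cs":65536,"future":true}`))
	if err != nil {
		panic(err)
	}
	name, err := m.Filename()
	fmt.Println(name, m.FillerLen, m.ChunkSize, err)
}
```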

In the example algebraicfile xxd output from earlier, the Metadata section is the following 309 encrypted bytes (offsets 0x3f through 0x173); the length is specified by the nextlen field in the Header section.

- SNIP -
00000030: 186d dce0 8dd4 7d00 0000 0000 0001 3528  .m....}.......5(
00000040: 94f3 4e1e 70c1 8c17 9a02 795f a817 aaa3  ..N.p.....y_....
00000050: c20a 04a3 cc47 55c6 03f2 4775 bf87 1caa  .....GU...Gu....
00000060: d60c 4ee5 0723 dd24 5794 c240 8108 a45a  ..N..#.$W..@...Z
00000070: 1fc5 c778 2b8e 7656 a394 6094 d0b9 1e89  ...x+.vV..`.....
00000080: a1c0 49a3 fc9c fd79 b1f6 6a11 0a4e 02bb  ..I....y..j..N..
00000090: 943a c4bc a12f 627b 479d 2890 c83f 1b95  .:.../b{G.(..?..
000000a0: f021 39f2 ad04 b6c6 2ab5 443d c236 3c23  .!9.....*.D=.6<#
000000b0: 0864 4412 c4e4 4e98 b7e5 ddd9 f7bb 698b  .dD...N.......i.
000000c0: 732e f0ae ecca f1a7 1b0f 2994 c855 efc3  s.........)..U..
000000d0: 28fe 7b1a 1481 3dfe 9c00 ec10 ff29 f27b  (.{...=......).{
000000e0: 7c3f caef 5b63 e2a2 2c8e f15f f651 0b9b  |?..[c..,.._.Q..
000000f0: 8250 7559 a066 4bc9 22f6 148d 7437 377b  .PuY.fK."...t77{
00000100: 784b ffd9 97ac d8b6 4d74 882c c13e 2270  xK......Mt.,.>"p
00000110: 71ee 34be 4c62 7c89 6c9f 2052 b117 6b50  q.4.Lb|.l. R..kP
00000120: cafd df39 fa63 9e5f 8d19 35f3 6305 ae8f  ...9.c._..5.c...
00000130: 0773 9834 d7d9 d5c4 b128 05dc f878 f570  .s.4.....(...x.p
00000140: 9c84 9e7e 5ae4 7ef3 da19 04e2 c6ae e915  ...~Z.~.........
00000150: 78a6 6999 506f 75f5 6d56 bde3 2d2f cf13  x.i.Pou.mV..-/..
00000160: 29d8 e045 6594 776a 4542 9d63 9dec b534  )..Ee.wjEB.c...4
00000170: 3921 25d0 6999 949c 2d49 67b8 02bd 39cc  9!%.i...-Ig...9.
- SNIP -

4. Filler section

The Filler section may be used to increase the size of an algebraicfile, in order to obfuscate the true length of the data. The number of bytes in the section must match the fl field in the Metadata section. The bytes must be indistinguishable from any actual encrypted data.

Readers may ignore the Filler section by discarding or seeking past fl bytes after the Metadata section.

The Filler section can have zero length.
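
As a rough sketch of both sides, the Go code below generates filler on the writer side and discards it on the reader side. The names newFiller and skipFiller are hypothetical. Using random bytes from crypto/rand is an assumption about one acceptable way to meet the indistinguishability requirement, not something this specification mandates. The reader works on an in-memory slice for brevity; a streaming reader could instead discard the bytes with io.CopyN(io.Discard, r, fl).

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
)

// newFiller returns n random bytes for a writer to emit as the Filler section.
func newFiller(n int) ([]byte, error) {
	if n < 0 {
		return nil, errors.New("algebraicfile: negative filler length")
	}
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		return nil, err
	}
	return b, nil
}

// skipFiller takes the bytes that follow the Metadata section and returns
// what remains after discarding fl bytes of filler.
func skipFiller(rest []byte, fl int64) ([]byte, error) {
	if fl < 0 || fl > int64(len(rest)) {
		return nil, errors.New("algebraicfile: fl out of range")
	}
	return rest[fl:], nil
}

func main() {
	filler, err := newFiller(3)
	if err != nil {
		panic(err)
	}
	// Placeholder bytes stand in for the sections that follow the filler.
	rest := append(filler, []byte("DATA")...)
	after, err := skipFiller(rest, 3)
	fmt.Println(string(after), err)
}
```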

In the example algebraicfile xxd output from earlier, the Filler section is exactly 1 byte (0x69 at offset 0x174); the length is indicated by the encrypted fl field in the Metadata section.

- SNIP -
00000170: 3921 25d0 6999 949c 2d49 67b8 02bd 39cc  9!%.i...-Ig...9.
- SNIP -

5. Data section

The Data section is all bytes after the Filler section but before the final, fixed-length Checksum section. The section consists of the original source file's data, encrypted and authenticated using libsodium's crypto_secretstream API. The initial key provided to crypto_secretstream_xchacha20poly1305_init_push must be the same key used to encrypt the Metadata section.

The chunk size for crypto_secretstream_xchacha20poly1305_push calls must match the cs value from the Metadata section. Only the final call may push fewer than cs bytes.

Writers may omit the Data section entirely when there is zero file data; doing so can save 41 bytes (24 bytes for the init_push header + 17 bytes for the push of the empty, final-tagged message).
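
The overhead arithmetic can be made concrete with a small Go helper. This is a sketch under stated assumptions: the name dataSectionSize is hypothetical, and it assumes the writer attaches the final tag to the last non-empty chunk rather than pushing an extra empty message, which this specification does not require. The constants are libsodium's crypto_secretstream_xchacha20poly1305_HEADERBYTES (24) and crypto_secretstream_xchacha20poly1305_ABYTES (17). The chunk size used in main is a made-up value.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	streamHeaderLen  = 24 // crypto_secretstream_xchacha20poly1305_HEADERBYTES
	perChunkOverhead = 17 // crypto_secretstream_xchacha20poly1305_ABYTES
)

// dataSectionSize estimates the Data section length for n plaintext bytes
// pushed in chunks of cs bytes. If omitEmpty is true, a writer with zero
// file data omits the section entirely.
func dataSectionSize(n, cs int64, omitEmpty bool) (int64, error) {
	if n < 0 || cs <= 0 {
		return 0, errors.New("algebraicfile: invalid length or chunk size")
	}
	if n == 0 {
		if omitEmpty {
			return 0, nil
		}
		// Stream header plus one empty, final-tagged message.
		return streamHeaderLen + perChunkOverhead, nil
	}
	chunks := (n + cs - 1) / cs // ceiling division
	return streamHeaderLen + n + chunks*perChunkOverhead, nil
}

func main() {
	// hello.txt holds 13 bytes; 65536 is a hypothetical chunk size.
	size, err := dataSectionSize(13, 65536, false)
	fmt.Println(size, err)
}
```

For the 13-byte hello.txt and any chunk size of at least 13 bytes, this gives 24 + 13 + 17 = 54 bytes.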

In the example algebraicfile xxd output from earlier, the Data section is these 54 encrypted bytes, at offsets 0x175 through 0x1aa:

- SNIP -
00000170: 3921 25d0 6999 949c 2d49 67b8 02bd 39cc  9!%.i...-Ig...9.
00000180: d2fe 71a4 4333 3472 8a2b cf73 143a 7a52  ..q.C34r.+.s.:zR
00000190: 1f79 051c b244 3489 9659 18ac 7be6 7724  .y...D4..Y..{.w$
000001a0: 8bdf 7a62 1e77 2e2b 251e 05f5 2dfb df22  ..zb.w.+%...-.."
- SNIP -
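
Because the Data section carries no length field of its own, a reader that knows the total file size can locate it by subtraction. The Go sketch below shows that arithmetic; the names Span, Layout, and computeLayout are hypothetical. The inputs in main are the example file's values: 459 total bytes (the size of the xxd dump), a nextlen of 309, and an fl of 1.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	identifierLen = 6
	headerLen     = 57
	checksumLen   = 32
)

// Span is a byte range within the file.
type Span struct{ Off, Len int64 }

// Layout records where each variable-position section sits in the file.
type Layout struct{ Metadata, Filler, Data, Checksum Span }

// computeLayout derives section offsets from the file size, the Header's
// nextlen field, and the Metadata's fl field.
func computeLayout(fileSize, nextlen, fl int64) (Layout, error) {
	if nextlen <= 0 || fl < 0 || nextlen > fileSize || fl > fileSize {
		return Layout{}, errors.New("algebraicfile: invalid nextlen or fl")
	}
	metaOff := int64(identifierLen + headerLen)
	fillerOff := metaOff + nextlen
	dataOff := fillerOff + fl
	checksumOff := fileSize - checksumLen
	// The Data section may be empty (writers may omit it), but not negative.
	if dataOff > checksumOff {
		return Layout{}, errors.New("algebraicfile: file too short for declared lengths")
	}
	return Layout{
		Metadata: Span{metaOff, nextlen},
		Filler:   Span{fillerOff, fl},
		Data:     Span{dataOff, checksumOff - dataOff},
		Checksum: Span{checksumOff, checksumLen},
	}, nil
}

func main() {
	l, err := computeLayout(459, 309, 1)
	fmt.Printf("%+v %v\n", l, err)
}
```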

6. Checksum section

The 32-byte checksum is the SHA-256 sum of all the bytes in the preceding sections. Readers may forgo checksum verification.

In the example algebraicfile xxd output from earlier, the Checksum section is these final 32 bytes, at offsets 0x1ab through 0x1ca:

- SNIP -
000001a0: 8bdf 7a62 1e77 2e2b 251e 05f5 2dfb df22  ..zb.w.+%...-.."
000001b0: 335c b52d b61c edfc 1d2e cb5c f70c 2600  3\.-.......\..&.
000001c0: a404 e458 f7db 0a1a cf21 dd              ...X.....!.
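
For readers that do verify the checksum, a minimal Go sketch using crypto/sha256 is shown below. The names appendChecksum and verifyChecksum are hypothetical, and the sketch holds the whole file in memory for simplicity; a streaming implementation would instead feed bytes to a running hash as it reads or writes them.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

const checksumLen = sha256.Size // 32 bytes

// appendChecksum returns the preceding sections followed by their SHA-256 sum.
func appendChecksum(sections []byte) []byte {
	sum := sha256.Sum256(sections)
	out := make([]byte, 0, len(sections)+checksumLen)
	out = append(out, sections...)
	return append(out, sum[:]...)
}

// verifyChecksum reports whether the final 32 bytes of file equal the
// SHA-256 sum of all bytes before them.
func verifyChecksum(file []byte) bool {
	if len(file) < checksumLen {
		return false
	}
	body, want := file[:len(file)-checksumLen], file[len(file)-checksumLen:]
	got := sha256.Sum256(body)
	return bytes.Equal(got[:], want)
}

func main() {
	// Placeholder bytes stand in for the first five sections.
	file := appendChecksum([]byte("sections 1 through 5"))
	fmt.Println(verifyChecksum(file))

	file[0] ^= 0x01 // corrupt one byte
	fmt.Println(verifyChecksum(file))
}
```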