Algebraicfile Format Specification

Overview

The algebraicfile file format is a container file format for encrypted data. The encrypted data in an algebraicfile may correspond to data from any arbitrary source—in-memory data, data from a Unix domain socket, data from a regular file, and so on. Typically, however, an algebraicfile contains encrypted file metadata and encrypted file data corresponding to a single file on a file system.

The file format design allows an algebraicfile to be read and written as a stream, with constant in-memory overhead. The recommended filename extension is .algebraic. The Uniform Type Identifier is org.littleroot.algebraicfile.

This document describes version 4 of the format, which is the current version. Programs and packages that read algebraicfile-formatted files should support the latest and all previous versions. Readers must return an error if a version is unknown or unsupported. Writers may support only the latest version.

Each algebraicfile has 6 sections:

#  Section name      Length                   Encryption
1  Identifier        6 bytes                  none
2  Primary header    81 bytes                 none
3  Secondary header  variable                 XChaCha20-Poly1305
4  Filler            variable, possibly zero  unspecified
5  Data              variable, possibly zero  XChaCha20
6  Checksum          32 bytes                 none

Example Algebraicfile

The example below corresponds to an original source file named hello.txt with the contents “hello, world\n”.

% cat hello.txt
hello, world
% xxd hello.txt
00000000: 6865 6c6c 6f2c 2077 6f72 6c64 0a         hello, world.
%

In this example, the file hello.txt was encrypted—with the option turned on to obfuscate the true length of the original data—into an algebraicfile named hello.txt.algebraic. Each section of the example algebraicfile is discussed below.

% xxd hello.txt.algebraic
00000000: 0c75 0d05 0e04 3053 b25a 9b2c e75a e47d  .u....0S.Z.,.Z.}
00000010: 2820 2cec cb7d 0000 0001 0040 0000 08c0  ( ,..}.....@....
00000020: a81c 5dca d766 80ba 6bee 70ae 40c1 a6b3  ..]..f..k.p.@...
00000030: d1ab 6ca2 8a95 ad8e 759e 7bc0 0177 8d0f  ..l.....u.{..w..
00000040: 933b 53c6 d813 bf42 f955 3351 3f62 e000  .;S....B.U3Q?b..
00000050: 0000 0000 0001 2a8f 6602 3c5c d34a 28d2  ......*.f.<\.J(.
00000060: 4c58 7b49 9575 b0b5 438b 928c 9f73 155a  LX{I.u..C....s.Z
00000070: 70e2 22a4 b71e 66fc 01a2 5f60 b9c9 ea3d  p."...f..._`...=
00000080: a87e d1f8 68e8 4586 7ec4 e136 21a1 213b  .~..h.E.~..6!.!;
00000090: 783d 40ae 1b11 9819 f25f c3c8 5bee c1ca  x=@......_..[...
000000a0: 9db6 ed04 f03e 6691 a117 d7b3 7853 b449  .....>f.....xS.I
000000b0: 0291 3f9c 8c30 bf9a 36d8 3829 9eca c570  ..?..0..6.8)...p
000000c0: aac0 5c5d aab1 e46e c282 cda1 21a6 e131  ..\]...n....!..1
000000d0: 0f48 4615 f363 7eaf 6855 89b4 a67e 2884  .HF..c~.hU...~(.
000000e0: 2e05 5078 fe38 a96e 35b4 1c10 6148 e4ba  ..Px.8.n5...aH..
000000f0: 7240 a79f 33b8 9c6b 7d30 3171 700a 7073  r@..3..k}01qp.ps
00000100: 4bad ec34 3110 8935 5c1d cc44 9867 ae7f  K..41..5\..D.g..
00000110: 53c9 c33c 1abb 24d0 a5b4 e076 a0dc b316  S..<..$....v....
00000120: 928e 7594 7ac3 7df8 59e8 26cf f649 9e68  ..u.z.}.Y.&..I.h
00000130: e127 11e5 009f 0ed3 3246 fa5a 5434 e49a  .'......2F.ZT4..
00000140: 2449 d70b e5b6 28f7 b1c6 bdb1 0845 5434  $I....(......ET4
00000150: 0b85 3bf2 e1ab 0228 f317 df54 1f4a 9e66  ..;....(...T.J.f
00000160: f08a 70c8 024d b13d 0ea6 89b8 2f5a 9ef2  ..p..M.=..../Z..
00000170: 6c06 673b 8478 84d4 4fea a23b 93ff ce6e  l.g;.x..O..;...n
00000180: 786d 836b a9f3 89f1 4834 f626 6e7a ee40  xm.k....H4.&nz.@
00000190: b95e 6faa bc15 a154 1ff0 6c78 4d35 3fbe  .^o....T..lxM5?.
000001a0: 3e18 124b b75a 6138 e6c1 382a f423 b3    >..K.Za8..8*.#.
%

File Structure

1. Identifier section

The identifier section is binary-encoded in big-endian order. It consists of:

magic    5 byte
version  1 byte

The magic value is 0x0c 0x75 0x0d 0x05 0x0e. The version field is the algebraicfile format version as an integer; for example, for version 4 of the format the value is 0x04. Programs that read an algebraicfile should read the identifier section, and based on the version number adjust their parsing behavior for the remaining sections.

In the example algebraicfile xxd output from earlier, the identifier section is the first 6 bytes of this line:

00000000: 0c75 0d05 0e04 3053 b25a 9b2c e75a e47d  .u....0S.Z.,.Z.}
- SNIP -
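
The identifier check described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name parse_identifier and the set SUPPORTED_VERSIONS are hypothetical, and the assumption that versions 1 through 4 are the valid ones is not stated by this document.

```python
import struct

MAGIC = bytes([0x0C, 0x75, 0x0D, 0x05, 0x0E])
# Assumption: this reader handles versions 1 through 4.
SUPPORTED_VERSIONS = {1, 2, 3, 4}


def parse_identifier(section: bytes) -> int:
    """Return the format version from the 6-byte identifier section."""
    if len(section) != 6:
        raise ValueError("identifier section must be exactly 6 bytes")
    # 5-byte magic followed by a 1-byte version.
    magic, version = struct.unpack(">5sB", section)
    if magic != MAGIC:
        raise ValueError("not an algebraicfile: bad magic")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unknown or unsupported version: {version}")
    return version


# The first 6 bytes of the example file.
print(parse_identifier(bytes.fromhex("0c750d050e04")))  # prints 4
```

A reader would call this before touching any other section, then choose its parsing logic for the remaining sections from the returned version.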

2. Primary header section

The primary header section is binary-encoded in big-endian order. It consists of:

salt          16 byte
time          4 byte
mem           4 byte
threads       1 byte
aead-nonce    24 byte
stream-nonce  24 byte
nextlen       8 byte

The salt, time, mem, and threads fields are parameters for Argon2id key derivation. The mem value is expressed in kibibytes (KiB). The aead-nonce field is the nonce to use with XChaCha20-Poly1305. The stream-nonce field is the nonce to use with XChaCha20. The nextlen field is the length in bytes of the variable-length secondary header section that follows this section.

In the example algebraicfile xxd output from earlier, the primary header section is the 81 bytes at offsets 0x06 through 0x56:

00000000: 0c75 0d05 0e04 3053 b25a 9b2c e75a e47d  .u....0S.Z.,.Z.}
00000010: 2820 2cec cb7d 0000 0001 0040 0000 08c0  ( ,..}.....@....
00000020: a81c 5dca d766 80ba 6bee 70ae 40c1 a6b3  ..]..f..k.p.@...
00000030: d1ab 6ca2 8a95 ad8e 759e 7bc0 0177 8d0f  ..l.....u.{..w..
00000040: 933b 53c6 d813 bf42 f955 3351 3f62 e000  .;S....B.U3Q?b..
00000050: 0000 0000 0001 2a8f 6602 3c5c d34a 28d2  ......*.f.<\.J(.
- SNIP -

which breaks down to the hex field values:

salt          3053b25a9b2ce75ae47d28202ceccb7d
time          00000001
mem           00400000
threads       08
aead-nonce    c0a81c5dcad76680ba6bee70ae40c1a6b3d1ab6ca28a95ad
stream-nonce  8e759e7bc001778d0f933b53c6d813bf42f95533513f62e0
nextlen       000000000000012a
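
The field layout above maps directly onto a fixed-size big-endian record. The sketch below decodes it with Python's struct module and reproduces the example values. The names PrimaryHeader and parse_primary_header are hypothetical, and treating time, mem, threads, and nextlen as unsigned integers is an assumption; this document does not state their signedness.

```python
import struct
from typing import NamedTuple

# Big-endian, no padding: salt, time, mem, threads, aead-nonce, stream-nonce, nextlen.
# Assumption: the integer fields are unsigned.
PRIMARY_HEADER = struct.Struct(">16sIIB24s24sQ")


class PrimaryHeader(NamedTuple):
    salt: bytes
    time: int
    mem_kib: int
    threads: int
    aead_nonce: bytes
    stream_nonce: bytes
    nextlen: int


def parse_primary_header(section: bytes) -> PrimaryHeader:
    if len(section) != PRIMARY_HEADER.size:
        raise ValueError(f"primary header must be {PRIMARY_HEADER.size} bytes")
    return PrimaryHeader(*PRIMARY_HEADER.unpack(section))


# The 81 bytes of the example file, written field by field.
example = bytes.fromhex(
    "3053b25a9b2ce75ae47d28202ceccb7d"  # salt
    "00000001"  # time
    "00400000"  # mem
    "08"  # threads
    "c0a81c5dcad76680ba6bee70ae40c1a6b3d1ab6ca28a95ad"  # aead-nonce
    "8e759e7bc001778d0f933b53c6d813bf42f95533513f62e0"  # stream-nonce
    "000000000000012a"  # nextlen
)
header = parse_primary_header(example)
print(PRIMARY_HEADER.size)  # prints 81
print(header.time, header.mem_kib, header.threads, header.nextlen)  # prints 1 4194304 8 298
```

The mem value 0x00400000 is 4194304 KiB, that is, 4 GiB of memory for Argon2id in this example.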

3. Secondary header section

The secondary header section consists of a JSON-encoded object, encrypted with XChaCha20-Poly1305. The section byte size includes the Poly1305 authentication tag (in other words, the AEAD overhead). The nonce for the encryption is the aead-nonce value in the primary header section. The encryption key is derived by hashing a user-supplied password with Argon2id; the parameters for Argon2id must match the values in the primary header section.

The section largely consists of metadata of the original file. The structure of the JSON-encoded object is:

{
    cp: string // packed copyfile(3) data, base64-encoded.
    fl: number // length of "Filler" section, int64.
    m:  number // file mode bits, uint32; see Go type fs.FileMode for format.
    n:  string // filename, final path element only, base64-encoded.
    l:  string // linkname, present iff original file is a symbolic link, base64-encoded.
    u:  number // file uid, int64.
    g:  number // file gid, int64.
    mt: number // file modification time, int64.
    at: number // file access time, int64.
    ct: number // file change time, int64.
    bt: number // file birth time, int64.
}

First, basic rules that apply to all fields: All fields are optional in the encoded JSON. For example, if an algebraicfile represents encrypted in-memory data, then fields such as the original file’s name, its file mode bits, and its modification time are not applicable and hence will be absent.

If a field’s value is unavailable or invalid, writers must omit the property in its entirety in the encoded JSON. Readers must use “nil”, “empty”, or “zero” values for missing fields when decoding JSON. Readers must take into account integer precision and sign requirements when decoding numbers from JSON. Readers must skip without error unknown properties present in the encoded JSON.

Details on specific fields: The fl field represents the length in bytes of the variable-length filler section that follows this section. Note that if the property does not exist in the encoded JSON, readers must consider the value to be zero.

The l field represents the target name for a symbolic link. It must be present if and only if the original file corresponding to an algebraicfile is a symbolic link.

The cp field consists of metadata about the original file. The value is the base64-encoded result of copyfile(3) called with flags COPYFILE_ACL | COPYFILE_XATTR | COPYFILE_PACK. Writers should omit the field if the value cannot be constructed (e.g. because copyfile(3) isn’t available).
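
The reader rules above (zero or empty values for missing fields, integer range checks, unknown properties skipped) can be sketched like this. It operates on the already decrypted JSON; decryption is out of scope here. The function name decode_secondary_header is hypothetical, the use of the standard base64 alphabet is an assumption because this document does not name a variant, and the sample plaintext is made up for illustration, since the real example header is encrypted.

```python
import base64
import json

# Accepted range per integer field, taken from the types in the structure above.
INT64 = (-(2**63), 2**63 - 1)
INT_RANGES = {
    "fl": (0, 2**63 - 1),  # a length, so negative values are rejected here
    "m": (0, 2**32 - 1),
    "u": INT64, "g": INT64,
    "mt": INT64, "at": INT64, "ct": INT64, "bt": INT64,
}
BASE64_FIELDS = ("cp", "n", "l")


def decode_secondary_header(plaintext: bytes) -> dict:
    obj = json.loads(plaintext)
    if not isinstance(obj, dict):
        raise ValueError("secondary header must be a JSON object")
    header = {}
    for name in BASE64_FIELDS:
        value = obj.get(name, "")  # missing field -> empty bytes
        if not isinstance(value, str):
            raise ValueError(f"{name} must be a string")
        # Assumption: standard base64 alphabet with padding.
        header[name] = base64.b64decode(value)
    for name, (low, high) in INT_RANGES.items():
        value = obj.get(name, 0)  # missing field -> zero
        # Reject floats and booleans so no precision is silently lost.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer")
        if not low <= value <= high:
            raise ValueError(f"{name} is out of range")
        header[name] = value
    # Unknown properties in obj are never copied, so they are skipped without error.
    return header


# Hypothetical plaintext for a regular file named hello.txt with 1 filler byte.
plaintext = json.dumps({
    "n": base64.b64encode(b"hello.txt").decode("ascii"),
    "m": 0o644,
    "fl": 1,
    "future-field": True,
}).encode("utf-8")
header = decode_secondary_header(plaintext)
print(header["n"], header["fl"], header["l"])  # prints b'hello.txt' 1 b''
```

Python integers have arbitrary precision, so int64 values survive json.loads intact; in languages that decode JSON numbers as 64-bit floats, a reader would need a decoder mode that preserves integers.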

In the example algebraicfile xxd output from earlier, the secondary header section is the following 298 encrypted bytes, at offsets 0x57 through 0x180. The length is specified by the nextlen field in the primary header section.

- SNIP -
00000050: 0000 0000 0001 2a8f 6602 3c5c d34a 28d2  ......*.f.<\.J(.
00000060: 4c58 7b49 9575 b0b5 438b 928c 9f73 155a  LX{I.u..C....s.Z
00000070: 70e2 22a4 b71e 66fc 01a2 5f60 b9c9 ea3d  p."...f..._`...=
00000080: a87e d1f8 68e8 4586 7ec4 e136 21a1 213b  .~..h.E.~..6!.!;
00000090: 783d 40ae 1b11 9819 f25f c3c8 5bee c1ca  x=@......_..[...
000000a0: 9db6 ed04 f03e 6691 a117 d7b3 7853 b449  .....>f.....xS.I
000000b0: 0291 3f9c 8c30 bf9a 36d8 3829 9eca c570  ..?..0..6.8)...p
000000c0: aac0 5c5d aab1 e46e c282 cda1 21a6 e131  ..\]...n....!..1
000000d0: 0f48 4615 f363 7eaf 6855 89b4 a67e 2884  .HF..c~.hU...~(.
000000e0: 2e05 5078 fe38 a96e 35b4 1c10 6148 e4ba  ..Px.8.n5...aH..
000000f0: 7240 a79f 33b8 9c6b 7d30 3171 700a 7073  r@..3..k}01qp.ps
00000100: 4bad ec34 3110 8935 5c1d cc44 9867 ae7f  K..41..5\..D.g..
00000110: 53c9 c33c 1abb 24d0 a5b4 e076 a0dc b316  S..<..$....v....
00000120: 928e 7594 7ac3 7df8 59e8 26cf f649 9e68  ..u.z.}.Y.&..I.h
00000130: e127 11e5 009f 0ed3 3246 fa5a 5434 e49a  .'......2F.ZT4..
00000140: 2449 d70b e5b6 28f7 b1c6 bdb1 0845 5434  $I....(......ET4
00000150: 0b85 3bf2 e1ab 0228 f317 df54 1f4a 9e66  ..;....(...T.J.f
00000160: f08a 70c8 024d b13d 0ea6 89b8 2f5a 9ef2  ..p..M.=..../Z..
00000170: 6c06 673b 8478 84d4 4fea a23b 93ff ce6e  l.g;.x..O..;...n
00000180: 786d 836b a9f3 89f1 4834 f626 6e7a ee40  xm.k....H4.&nz.@
- SNIP -

4. Filler section

The filler section may be used to increase the size of an algebraicfile, in order to obfuscate the true length of the data. The number of bytes in the section must match the fl field in the secondary header section. The bytes must be indistinguishable from any actual encrypted data.

Readers may ignore the filler section by discarding or seeking past fl bytes after the secondary header.

The filler section can have zero length.
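
A reader that ignores the filler might skip it as sketched below: seek when the input supports it, otherwise read and discard in bounded chunks so memory use stays constant. The function name skip_filler and the chunk_size parameter are hypothetical; the stream is assumed to be a binary file-like object positioned just after the secondary header.

```python
import io


def skip_filler(stream, fl: int, chunk_size: int = 64 * 1024) -> None:
    """Advance stream past fl filler bytes without keeping them."""
    if fl < 0:
        raise ValueError("filler length must not be negative")
    if stream.seekable():
        # Note: seeking does not detect a truncated file; later reads will.
        stream.seek(fl, io.SEEK_CUR)
        return
    remaining = fl
    while remaining > 0:
        chunk = stream.read(min(remaining, chunk_size))
        if not chunk:
            raise EOFError("file ended inside the filler section")
        remaining -= len(chunk)


# One filler byte followed by data, as in the example file.
stream = io.BytesIO(b"\x6d" + b"data")
skip_filler(stream, 1)
print(stream.read())  # prints b'data'
```

A zero-length filler needs no special case: both branches do nothing when fl is 0.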

In the example algebraicfile xxd output from earlier, the filler section is exactly 1 byte: the byte 0x6d at offset 0x181. The length is indicated by the encrypted fl field in the secondary header.

- SNIP -
00000180: 786d 836b a9f3 89f1 4834 f626 6e7a ee40  xm.k....H4.&nz.@
- SNIP -

5. Data section

The data section is all bytes after the filler section but before the final, fixed-length checksum section. The section consists of the original source file’s data, encrypted with XChaCha20. The nonce for the encryption is the value of the stream-nonce field in the primary header section. The encryption key is the same one used in the secondary header section.

The data section can have zero length. In practice this can happen, for example, if the original file was a zero length regular file or the original file was a symbolic link.
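
Because the data section has no length field of its own, its extent follows from the other sections. The sketch below derives every section's offset and length from the total file size, nextlen, and fl, and checks the result against the example file. The names Span and section_layout are hypothetical.

```python
from typing import NamedTuple

IDENTIFIER_LEN = 6
PRIMARY_HEADER_LEN = 81
CHECKSUM_LEN = 32


class Span(NamedTuple):
    offset: int
    length: int


def section_layout(file_size: int, nextlen: int, fl: int) -> dict:
    """Map each section name to its (offset, length) within the file."""
    fixed = IDENTIFIER_LEN + PRIMARY_HEADER_LEN + CHECKSUM_LEN
    # The data section is whatever remains once every other section is accounted for.
    data_len = file_size - fixed - nextlen - fl
    if nextlen < 0 or fl < 0 or data_len < 0:
        raise ValueError("section lengths do not fit in the file")
    lengths = (
        ("identifier", IDENTIFIER_LEN),
        ("primary header", PRIMARY_HEADER_LEN),
        ("secondary header", nextlen),
        ("filler", fl),
        ("data", data_len),
        ("checksum", CHECKSUM_LEN),
    )
    layout = {}
    offset = 0
    for name, length in lengths:
        layout[name] = Span(offset, length)
        offset += length
    return layout


# The example file is 431 bytes long, with nextlen = 298 and fl = 1.
layout = section_layout(431, 298, 1)
print(layout["data"])      # prints Span(offset=386, length=13)
print(layout["checksum"])  # prints Span(offset=399, length=32)
```

This form needs the total file size up front. A streaming reader that cannot know the size can instead hold back the most recent 32 bytes while decrypting, since those may turn out to be the checksum.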

In the example algebraicfile xxd output from earlier, the data section is the 13 encrypted bytes at offsets 0x182 through 0x18e of this line, the same length as the original plaintext input:

- SNIP -
00000180: 786d 836b a9f3 89f1 4834 f626 6e7a ee40  xm.k....H4.&nz.@
- SNIP -

6. Checksum section

The 32-byte checksum is the SHA-256 sum of all the bytes in the preceding sections. Readers may forgo checksum verification.
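
For readers that do verify, the check can be done in a single pass with constant memory, in keeping with the streaming design. The sketch below hashes everything except the trailing 32 bytes and compares the result with them. The function name verify_checksum and the chunk_size parameter are hypothetical; the stream is assumed to be a binary file-like object positioned at the start of the file.

```python
import hashlib
import io

CHECKSUM_LEN = 32


def verify_checksum(stream, chunk_size: int = 64 * 1024) -> bool:
    """Return True if the final 32 bytes equal the SHA-256 of all bytes before them."""
    digest = hashlib.sha256()
    tail = b""  # holds at most the last CHECKSUM_LEN bytes seen so far
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        # Everything except the last CHECKSUM_LEN bytes cannot be the checksum.
        digest.update(buf[:-CHECKSUM_LEN])
        tail = buf[-CHECKSUM_LEN:]
    if len(tail) < CHECKSUM_LEN:
        raise ValueError("file too short to contain a checksum")
    return digest.digest() == tail


# Synthetic input: arbitrary bytes standing in for sections 1 to 5, plus their digest.
body = b"sections one through five" * 100
blob = body + hashlib.sha256(body).digest()
print(verify_checksum(io.BytesIO(blob)))                # prints True
print(verify_checksum(io.BytesIO(b"X" + blob[1:])))     # prints False
```

The checksum is unkeyed, so it detects accidental corruption only; tamper detection for the secondary header comes from the Poly1305 tag.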

In the example algebraicfile xxd output from earlier, the checksum section is the final 32 bytes, starting at offset 0x18f:

- SNIP -
00000180: 786d 836b a9f3 89f1 4834 f626 6e7a ee40  xm.k....H4.&nz.@
00000190: b95e 6faa bc15 a154 1ff0 6c78 4d35 3fbe  .^o....T..lxM5?.
000001a0: 3e18 124b b75a 6138 e6c1 382a f423 b3    >..K.Za8..8*.#.