Refactor archive to be sequence of pairs #19

Merged (14 commits) on Jul 18, 2025

gordonbrander (Owner) commented Jul 15, 2025

This PR refactors the archive format to prioritize streaming encoding and decoding over random access.
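To make the streaming idea concrete, here is a minimal Rust sketch of an archive as a flat sequence of (header, body) pairs that can be written and read one pair at a time. The length-prefixed framing (a big-endian u32 header length and a u64 body length) and the names `write_pair` and `PairReader` are hypothetical; the PR does not describe the actual wire format or how memo headers are serialized.

```rust
use std::io::{self, Read, Write};

// Hypothetical framing (not specified in the PR): each pair is
//   u32 BE header length | header bytes | u64 BE body length | body bytes

/// Writes one (header, body) pair. Works with any `Write`, so an archive can
/// be streamed to a file or socket without holding it all in memory.
pub fn write_pair<W: Write>(w: &mut W, header: &[u8], body: &[u8]) -> io::Result<()> {
    let header_len = u32::try_from(header.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "header too large"))?;
    w.write_all(&header_len.to_be_bytes())?;
    w.write_all(header)?;
    w.write_all(&(body.len() as u64).to_be_bytes())?;
    w.write_all(body)
}

/// Yields pairs one at a time from any `Read`.
pub struct PairReader<R: Read> {
    inner: R,
}

impl<R: Read> PairReader<R> {
    pub fn new(inner: R) -> Self {
        PairReader { inner }
    }

    /// Reads the 4-byte header length. Returns Ok(None) only on a clean EOF
    /// at a pair boundary; EOF partway through the length is an error.
    fn read_header_len(&mut self) -> io::Result<Option<usize>> {
        let mut buf = [0u8; 4];
        let mut filled = 0;
        while filled < buf.len() {
            match self.inner.read(&mut buf[filled..]) {
                Ok(0) if filled == 0 => return Ok(None),
                Ok(0) => return Err(io::ErrorKind::UnexpectedEof.into()),
                Ok(n) => filled += n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => {}
                Err(e) => return Err(e),
            }
        }
        Ok(Some(u32::from_be_bytes(buf) as usize))
    }

    /// Reads the rest of a pair once its header length is known.
    /// A real decoder should cap these lengths before allocating.
    fn read_rest(&mut self, header_len: usize) -> io::Result<(Vec<u8>, Vec<u8>)> {
        let mut header = vec![0u8; header_len];
        self.inner.read_exact(&mut header)?;
        let mut len_buf = [0u8; 8];
        self.inner.read_exact(&mut len_buf)?;
        let mut body = vec![0u8; u64::from_be_bytes(len_buf) as usize];
        self.inner.read_exact(&mut body)?;
        Ok((header, body))
    }
}

impl<R: Read> Iterator for PairReader<R> {
    type Item = io::Result<(Vec<u8>, Vec<u8>)>;

    fn next(&mut self) -> Option<Self::Item> {
        match self.read_header_len() {
            Ok(Some(len)) => Some(self.read_rest(len)),
            Ok(None) => None,
            Err(e) => Some(Err(e)),
        }
    }
}
```

Because each pair is self-delimiting, a reader never needs a table of contents up front, which is what makes streaming decoding straightforward in this layout.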

TODO:

  • Streaming unarchiving
  • Add content-type to file memo headers (useful for serving from archives)
  • Add iss-nickname to headers
  • Remove the "default" key name. Have the user choose a name for the key when it is generated.
  • TOFU (Trust On First Use). Prompt the user to trust new keys (add them to the address book), registering them under their nickname by default.
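A TOFU check along the lines of the last TODO item might look like the following sketch. The `AddressBook` and `TrustCheck` types, the raw 32-byte key representation, and the rule that a nickname already bound to a different key is never auto-trusted are all assumptions for illustration, not the project's implementation; the nickname would presumably come from the proposed `iss-nickname` header.

```rust
use std::collections::HashMap;

// Hypothetical model: keys are raw 32-byte Ed25519 public keys, and the
// address book maps nicknames to keys. Not the project's actual format.
pub type PublicKey = [u8; 32];

#[derive(Debug, PartialEq)]
pub enum TrustCheck {
    /// Key is already stored under this nickname.
    Known(String),
    /// First sighting of this key: the caller should prompt the user.
    Unknown,
    /// Nickname is already bound to a different key: do not auto-trust.
    NicknameConflict,
}

#[derive(Default)]
pub struct AddressBook {
    by_nickname: HashMap<String, PublicKey>,
}

impl AddressBook {
    /// Classifies a key seen in a memo header, given its claimed nickname.
    pub fn check(&self, nickname: &str, key: &PublicKey) -> TrustCheck {
        if let Some((name, _)) = self.by_nickname.iter().find(|(_, k)| *k == key) {
            return TrustCheck::Known(name.clone());
        }
        match self.by_nickname.get(nickname) {
            Some(_) => TrustCheck::NicknameConflict,
            None => TrustCheck::Unknown,
        }
    }

    /// Called only after the user accepts the prompt. Registers the key under
    /// its nickname by default; refuses to overwrite an existing binding.
    pub fn trust(&mut self, nickname: &str, key: PublicKey) -> bool {
        if self.by_nickname.contains_key(nickname) {
            return false;
        }
        self.by_nickname.insert(nickname.to_string(), key);
        true
    }
}
```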

Advantages:

  • Makes archives useful for more than just files: they become a general-purpose bundling format.
  • Allows appending to archives by concatenation, since each memo is signed independently.
  • Supports archives assembled from multiple sources.
  • BLAKE3/Bao/random access are better served by a separate pass that derives a sidecar index, such as a Bao file.
    • For example, to support cheap random access via range requests, a Bao and range index could be generated in a single pass on the first request. Subsequent requests reuse the existing index.
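The concatenation and sidecar-index points can be sketched together: two archives written separately are joined byte for byte, and a single pass over the result records where each body starts, so later reads can seek straight to a byte range. This reuses a hypothetical length-prefixed framing as a stand-in for the real format, and the index holds plain offsets only; a Bao-based index would additionally store BLAKE3 tree data so ranges can be verified, which this sketch does not attempt.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

// Hypothetical framing, standing in for the real format:
//   u32 BE header length | header | u64 BE body length | body
// Headers are assumed to be smaller than 4 GiB.
pub fn write_pair<W: Write>(w: &mut W, header: &[u8], body: &[u8]) -> io::Result<()> {
    w.write_all(&(header.len() as u32).to_be_bytes())?;
    w.write_all(header)?;
    w.write_all(&(body.len() as u64).to_be_bytes())?;
    w.write_all(body)
}

/// Sidecar index entry: the memo header plus the location of its body.
#[derive(Debug)]
pub struct Entry {
    pub header: Vec<u8>,
    pub body_offset: u64,
    pub body_len: u64,
}

/// Builds the index in one pass: headers are read, bodies are skipped by seeking.
pub fn build_index<R: Read + Seek>(r: &mut R) -> io::Result<Vec<Entry>> {
    let end = r.seek(SeekFrom::End(0))?;
    let mut pos = r.seek(SeekFrom::Start(0))?;
    let mut index = Vec::new();
    while pos < end {
        let mut len4 = [0u8; 4];
        r.read_exact(&mut len4)?;
        let mut header = vec![0u8; u32::from_be_bytes(len4) as usize];
        r.read_exact(&mut header)?;
        let mut len8 = [0u8; 8];
        r.read_exact(&mut len8)?;
        let body_len = u64::from_be_bytes(len8);
        let body_offset = r.stream_position()?;
        // Reject a body that would run past the end of the archive.
        pos = body_offset
            .checked_add(body_len)
            .filter(|&p| p <= end)
            .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "truncated body"))?;
        r.seek(SeekFrom::Start(pos))?;
        index.push(Entry { header, body_offset, body_len });
    }
    Ok(index)
}

/// Serves a byte range of one memo body by seeking directly to it.
pub fn read_range<R: Read + Seek>(r: &mut R, e: &Entry, start: u64, len: u64) -> io::Result<Vec<u8>> {
    if start.checked_add(len).map_or(true, |stop| stop > e.body_len) {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "range outside body"));
    }
    r.seek(SeekFrom::Start(e.body_offset + start))?;
    let mut buf = vec![0u8; len as usize];
    r.read_exact(&mut buf)?;
    Ok(buf)
}
```

Since nothing in the stream refers to absolute positions, concatenating two valid archives yields another valid archive, and the index is derived afterwards rather than maintained inside the format.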

Tradeoffs:

  • No signature covers the full set of memos (such a signature is in tension with concatenation). If desired, one could be bolted on by generating and signing a HashSeq memo as the first item in the sequence.
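One way to picture the bolted-on signature: the first memo's body lists a digest of every following memo, so signing that single memo covers the whole set. Treating a HashSeq as a plain concatenation of fixed-size hashes is an assumption here, signing itself is left out, and `toy_digest` is a non-cryptographic, std-only placeholder for a real hash such as BLAKE3.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Assumed digest size; a BLAKE3 hash is 32 bytes.
pub const DIGEST_LEN: usize = 32;
pub type Digest = [u8; DIGEST_LEN];

/// NOT cryptographic: a placeholder so the sketch runs with std only.
/// Replace with a real hash such as BLAKE3.
pub fn toy_digest(data: &[u8]) -> Digest {
    let mut out = [0u8; DIGEST_LEN];
    for (i, chunk) in out.chunks_mut(8).enumerate() {
        let mut h = DefaultHasher::new();
        (i as u64).hash(&mut h);
        data.hash(&mut h);
        chunk.copy_from_slice(&h.finish().to_le_bytes());
    }
    out
}

/// Body of a HashSeq-style manifest memo: per-memo digests, concatenated in order.
pub fn manifest_body(memos: &[&[u8]], digest: impl Fn(&[u8]) -> Digest) -> Vec<u8> {
    let mut out = Vec::with_capacity(memos.len() * DIGEST_LEN);
    for m in memos {
        out.extend_from_slice(&digest(*m));
    }
    out
}

/// True if the memos after the manifest are exactly the listed ones, in order.
/// Verifying the signature on the manifest memo itself is out of scope here.
pub fn verify_against_manifest(
    manifest: &[u8],
    memos: &[&[u8]],
    digest: impl Fn(&[u8]) -> Digest,
) -> bool {
    manifest.len() == memos.len() * DIGEST_LEN
        && manifest
            .chunks_exact(DIGEST_LEN)
            .zip(memos.iter())
            .all(|(expected, m)| expected == digest(*m).as_slice())
}
```

Appending a memo after the manifest makes verification fail, which is the tension with concatenation noted in the tradeoff.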

This prioritizes streaming over random access and makes the archive a more general-purpose tool, similar in principle to WARC.

This will allow us to implement prompts asking the user to remember/store keys in the address book.

Doesn't work yet; it only issues a prompt.

Stores keys (ours and theirs) and nicknames.

We were missing part of the multicodec prefix.
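
The commit doesn't say which part of the multicodec prefix was missing. Assuming it is the ed25519-pub code `0xed` (an assumption, not stated in the PR), the easy-to-miss detail is that multicodec codes are written as unsigned varints, so `0xed` takes two bytes, `0xed 0x01`, rather than one. A minimal encoder:

```rust
/// Unsigned LEB128 varint, the encoding multicodec uses for its codes.
pub fn encode_uvarint(mut n: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        // Low 7 bits of the value; the high bit flags "more bytes follow".
        let byte = (n & 0x7f) as u8;
        n >>= 7;
        if n == 0 {
            out.push(byte);
            return out;
        }
        out.push(byte | 0x80);
    }
}

/// Multicodec code for an Ed25519 public key. Assumed to be the relevant
/// prefix here; the commit does not name it.
pub const ED25519_PUB: u64 = 0xed;

/// Prefixes raw key bytes with the varint-encoded multicodec code.
pub fn multicodec_prefixed(code: u64, key: &[u8]) -> Vec<u8> {
    let mut out = encode_uvarint(code);
    out.extend_from_slice(key);
    out
}
```

Writing only the single byte `0xed` would produce a prefix that other multiformats tooling decodes incorrectly, since `0xed` has its continuation bit set.
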

Had to do a bit of surgery to deal with dependency hell:

- rand had a breaking update to accommodate new reserved keywords in
  Rust.
- Ed25519 Dalek depends on an outdated version of rand_core for its
  rand_core feature.
- We removed that feature and now generate the random bytes ourselves,
  using an up-to-date version of rand.

gordonbrander merged commit 9d0eba8 into main on Jul 18, 2025
1 check passed

gordonbrander (Owner, Author) commented:

Fixes #24
