Commit bf9c80d

Docs party (#98)
* lots of docs for the docs party
* lots of docs for the docs party
* verify schema fix
* docs update
* docs update
* docs update
1 parent 438b5ea commit bf9c80d

File tree

7 files changed: +258 −77 lines


ARCHITECTURE.md

Lines changed: 9 additions & 0 deletions
@@ -100,3 +100,12 @@ The second level of coordination with a second exclusive lock that is needed is
 When tombstone cleanup occurs, the entire state of the log is read. Any tombstones found that are older than the `tmb_grace_sec` are deleted from S3.

 When the cleaning process finds a log file with tombstones, it first deletes those files from S3. If that is successful (not-found errors passing as idempotent successes), then the log file is replaced with the same contents, minus the tombstones and any file markers that referenced those tombstones.
+
+## Concurrent Merge and Tombstone Cleanup
+
+If you have multiple hosts running merge and tombstone cleanup, then you will need to coordinate them with a system
+like etcd, ZooKeeper, Postgres, CockroachDB, or anything that provides serializable transactions or native exclusive
+locking.
+
+Concurrent tombstone cleanup would simply result in redundant actions, which can reduce performance; concurrent merges
+on the same parts, however, may result in duplicate data, which must be avoided.
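To make the coordination requirement concrete, here is a minimal, hypothetical sketch of taking an exclusive lock through a transactional store before merging. It uses SQLite's database write lock purely as a runnable stand-in for the Postgres/CockroachDB row lock or etcd/ZooKeeper lease the section suggests; none of these names come from this repo.

```python
import os
import sqlite3
import tempfile

def try_acquire_lock(conn):
    """Attempt to take the store's exclusive write lock; return True on success."""
    try:
        # BEGIN IMMEDIATE grabs the database write lock up front; with
        # timeout=0 a second holder fails immediately instead of waiting.
        conn.execute("BEGIN IMMEDIATE")
        return True
    except sqlite3.OperationalError:
        return False

path = os.path.join(tempfile.mkdtemp(), "coord.db")
# isolation_level=None gives us manual transaction control (autocommit mode).
a = sqlite3.connect(path, timeout=0, isolation_level=None)
b = sqlite3.connect(path, timeout=0, isolation_level=None)

first = try_acquire_lock(a)   # first worker wins the lock and may merge
second = try_acquire_lock(b)  # second worker must back off and retry later
a.execute("COMMIT")           # releasing the lock lets another worker proceed
retry = try_acquire_lock(b)
```

The same pattern applies with `SELECT ... FOR UPDATE` on a coordination row in Postgres or CockroachDB, or a lease in etcd/ZooKeeper: only the worker that holds the lock runs merge or tombstone cleanup.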

README.md

Lines changed: 230 additions & 77 deletions
Large diffs are not rendered by default.

examples/README.md

Whitespace-only changes.

examples/api-falcon.py

Lines changed: 6 additions & 0 deletions
@@ -5,6 +5,12 @@
 For a single host setup, besides running Flask in debug mode, this is an otherwise
 production-ready setup for the provided events.

+Note that this runs its own merge and tombstone cleaning, which is NOT SAFE for multi-node setups without distributed
+locking.
+
+This example also provides async inserting via an in-memory buffer that flushes every 3 seconds. You must be able to
+tolerate data loss if the node dies; otherwise use something like RedPanda for buffering inserts.
+
 Run:
 `docker compose up -d`
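The buffered-insert behavior this docstring describes can be sketched roughly as follows. `InsertBuffer`, its flush callback, and the row shapes are illustrative assumptions, not the example's actual code; as the docstring warns, buffered rows are lost if the process dies before a flush.

```python
import threading

class InsertBuffer:
    """Hypothetical in-memory insert buffer that flushes on a timer."""

    def __init__(self, flush_fn, interval=3.0):
        self._flush_fn = flush_fn  # e.g. a batched write to the log store
        self._interval = interval
        self._rows = []
        self._lock = threading.Lock()
        self._timer = None

    def insert(self, row):
        with self._lock:
            self._rows.append(row)
            if self._timer is None:  # arm a flush `interval` seconds out
                self._timer = threading.Timer(self._interval, self.flush)
                self._timer.daemon = True
                self._timer.start()

    def flush(self):
        with self._lock:
            rows, self._rows = self._rows, []
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
        if rows:
            self._flush_fn(rows)  # one batched write per flush interval

batches = []
buf = InsertBuffer(batches.append, interval=3.0)
buf.insert({"event": "signup"})
buf.insert({"event": "click"})
buf.flush()  # in the example, the timer triggers this every 3 seconds
```

Batching trades durability for throughput: a crash between flushes drops everything still in `_rows`, which is why the docs point to a durable queue like RedPanda when that loss is unacceptable.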

examples/api-flask.py

Lines changed: 6 additions & 0 deletions
@@ -5,6 +5,12 @@
 For a single host setup, besides running Flask in debug mode, this is an otherwise
 production-ready setup for the provided events.

+Note that this runs its own merge and tombstone cleaning, which is NOT SAFE for multi-node setups without distributed
+locking.
+
+This example also provides async inserting via an in-memory buffer that flushes every 3 seconds. You must be able to
+tolerate data loss if the node dies; otherwise use something like RedPanda for buffering inserts.
+
 Run:
 `docker compose up -d`
File renamed without changes.

examples/verify-schema.py

Lines changed: 7 additions & 0 deletions
@@ -1,4 +1,11 @@
 """
+This example verifies the schema before inserting to ensure that the data does not get corrupted.
+
+In practice, you will want to cache the schema in the ingestion workers and, whenever there is a change, look it up from
+some central data store that supports serializable transactions (Postgres, CockroachDB, FoundationDB, etc.) where you
+can lock the schema row and update it if the new schema is not a breaking change; otherwise you should drop and/or quarantine
+the violating rows for manual review.
+
 Run:
 `docker compose up -d`
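A rough sketch of the verify-before-insert idea from that docstring: rows are checked against a cached schema, and violators are quarantined for manual review instead of being written. The schema shape, field names, and `partition_rows` helper are hypothetical, not the repo's real API.

```python
# Hypothetical cached schema: field name -> expected Python type.
CACHED_SCHEMA = {"user_id": int, "event": str}

def partition_rows(rows, schema):
    """Split rows into (valid, quarantined) against the cached schema."""
    valid, quarantined = [], []
    for row in rows:
        # Require exactly the schema's fields, each with the expected type.
        ok = set(row) == set(schema) and all(
            isinstance(row[key], expected) for key, expected in schema.items()
        )
        (valid if ok else quarantined).append(row)
    return valid, quarantined

valid, quarantined = partition_rows(
    [
        {"user_id": 1, "event": "signup"},
        {"user_id": "oops", "event": "click"},  # wrong type: quarantined
    ],
    CACHED_SCHEMA,
)
```

In the multi-worker setup the docstring describes, a schema mismatch would also trigger a refresh of the cached schema from the central store (under its row lock) before deciding whether the row is truly a violation or the cache was simply stale.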

0 commit comments
