Commit bf9c80d

Docs party (#98)
* lots of docs for the docs party
* lots of docs for the docs party
* verify schema fix
* docs update
* docs update
* docs update
1 parent 438b5ea commit bf9c80d

File tree

7 files changed: +258 −77 lines


ARCHITECTURE.md

Lines changed: 9 additions & 0 deletions
@@ -100,3 +100,12 @@ The second level of coordination with a second exclusive lock that is needed is
 When tombstone cleanup occurs, the entire state of the log is read. Any tombstones found that are older than the `tmb_grace_sec` are deleted from S3.

 When the cleaning process finds a log file with tombstones, it first deletes those files from S3. If that is successful (not-found errors passing as idempotent successes), then the log file is replaced with the same contents, minus the tombstones and any file markers that referenced those tombstones.
+
+## Concurrent Merge and Tombstone Cleanup
+
+If you have multiple hosts running merge and tombstone cleanup, then you will need to coordinate them with a system
+like etcd, ZooKeeper, Postgres, CockroachDB, or anything that provides serializable transactions or native exclusive
+locking.
+
+Concurrent tombstone cleanup would simply result in redundant actions, which can reduce performance; concurrent merges
+on the same parts, however, may result in duplicate data, which must be avoided.
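To make the coordination requirement concrete, here is a minimal, hypothetical sketch of taking an exclusive lock through a transactional store before merging. It uses SQLite's database write lock purely as a runnable stand-in for the Postgres/CockroachDB row lock or etcd/ZooKeeper lease the section suggests; none of these names come from this repo.

```python
import os
import sqlite3
import tempfile

def try_acquire_lock(conn):
    """Attempt to take the store's exclusive write lock; return True on success."""
    try:
        # BEGIN IMMEDIATE grabs the database write lock up front; with
        # timeout=0 a second holder fails immediately instead of waiting.
        conn.execute("BEGIN IMMEDIATE")
        return True
    except sqlite3.OperationalError:
        return False

path = os.path.join(tempfile.mkdtemp(), "coord.db")
# isolation_level=None gives us manual transaction control (autocommit mode).
a = sqlite3.connect(path, timeout=0, isolation_level=None)
b = sqlite3.connect(path, timeout=0, isolation_level=None)

first = try_acquire_lock(a)   # first worker wins the lock and may merge
second = try_acquire_lock(b)  # second worker must back off and retry later
a.execute("COMMIT")           # releasing the lock lets another worker proceed
retry = try_acquire_lock(b)
```

The same pattern applies with `SELECT ... FOR UPDATE` on a coordination row in Postgres or CockroachDB, or a lease in etcd/ZooKeeper: only the worker that holds the lock runs merge or tombstone cleanup.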

README.md

Lines changed: 230 additions & 77 deletions
Large diffs are not rendered by default.

examples/README.md

Whitespace-only changes.

examples/api-falcon.py

Lines changed: 6 additions & 0 deletions
@@ -5,6 +5,12 @@
 For a single host setup, besides running Flask in debug mode, this is an otherwise
 production-ready setup for the provided events.

+Note that this runs its own merge and tombstone cleaning, which is NOT SAFE for multi-node setups without distributed
+locking.
+
+This example also provides async inserting via an in-memory buffer that flushes every 3 seconds. You must be able to
+tolerate data loss if the node dies; otherwise use something like RedPanda for buffering inserts.
+
 Run:
 `docker compose up -d`
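The buffered-insert behavior this docstring describes can be sketched roughly as follows. `InsertBuffer`, its flush callback, and the row shapes are illustrative assumptions, not the example's actual code; as the docstring warns, buffered rows are lost if the process dies before a flush.

```python
import threading

class InsertBuffer:
    """Hypothetical in-memory insert buffer that flushes on a timer."""

    def __init__(self, flush_fn, interval=3.0):
        self._flush_fn = flush_fn  # e.g. a batched write to the log store
        self._interval = interval
        self._rows = []
        self._lock = threading.Lock()
        self._timer = None

    def insert(self, row):
        with self._lock:
            self._rows.append(row)
            if self._timer is None:  # arm a flush `interval` seconds out
                self._timer = threading.Timer(self._interval, self.flush)
                self._timer.daemon = True
                self._timer.start()

    def flush(self):
        with self._lock:
            rows, self._rows = self._rows, []
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
        if rows:
            self._flush_fn(rows)  # one batched write per flush interval

batches = []
buf = InsertBuffer(batches.append, interval=3.0)
buf.insert({"event": "signup"})
buf.insert({"event": "click"})
buf.flush()  # in the example, the timer triggers this every 3 seconds
```

Batching trades durability for throughput: a crash between flushes drops everything still in `_rows`, which is why the docs point to a durable queue like RedPanda when that loss is unacceptable.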

examples/api-flask.py

Lines changed: 6 additions & 0 deletions
@@ -5,6 +5,12 @@
 For a single host setup, besides running Flask in debug mode, this is an otherwise
 production-ready setup for the provided events.

+Note that this runs its own merge and tombstone cleaning, which is NOT SAFE for multi-node setups without distributed
+locking.
+
+This example also provides async inserting via an in-memory buffer that flushes every 3 seconds. You must be able to
+tolerate data loss if the node dies; otherwise use something like RedPanda for buffering inserts.
+
 Run:
 `docker compose up -d`
File renamed without changes.

examples/verify-schema.py

Lines changed: 7 additions & 0 deletions
@@ -1,4 +1,11 @@
 """
+This example verifies the schema before inserting to ensure that the data does not get corrupted.
+
+In practice, you will want to cache the schema in the ingestion workers and, whenever there is a change, look it up from
+some central data store that supports serializable transactions (Postgres, CockroachDB, FoundationDB, etc.) where you
+can lock the schema row and update it if the new schema is not a breaking change; otherwise you should drop and/or quarantine
+the violating rows for manual review.
+
 Run:
 `docker compose up -d`
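A rough sketch of the verify-before-insert idea from that docstring: rows are checked against a cached schema, and violators are quarantined for manual review instead of being written. The schema shape, field names, and `partition_rows` helper are hypothetical, not the repo's real API.

```python
# Hypothetical cached schema: field name -> expected Python type.
CACHED_SCHEMA = {"user_id": int, "event": str}

def partition_rows(rows, schema):
    """Split rows into (valid, quarantined) against the cached schema."""
    valid, quarantined = [], []
    for row in rows:
        # Require exactly the schema's fields, each with the expected type.
        ok = set(row) == set(schema) and all(
            isinstance(row[key], expected) for key, expected in schema.items()
        )
        (valid if ok else quarantined).append(row)
    return valid, quarantined

valid, quarantined = partition_rows(
    [
        {"user_id": 1, "event": "signup"},
        {"user_id": "oops", "event": "click"},  # wrong type: quarantined
    ],
    CACHED_SCHEMA,
)
```

In the multi-worker setup the docstring describes, a schema mismatch would also trigger a refresh of the cached schema from the central store (under its row lock) before deciding whether the row is truly a violation or the cache was simply stale.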

0 commit comments
