Commit 459f7af

Merge pull request #70 from tryriot/back-pressure-doc

fix Replication.Supervisor's strategy and add minimal durable-slot/back-pressure documentation to README

2 parents d29b50f + 31b3150

File tree

2 files changed: +79 −1 lines changed
README.md

Lines changed: 63 additions & 0 deletions
@@ -297,6 +297,69 @@ defmodule MyApp.Events.User do
end
```

### Advanced usages

##### Durable slot

By default, WalEx creates a temporary replication slot in Postgres.

This means that if the connection between WalEx and Postgres gets interrupted (crash, disconnection, etc.),
the replication slot will be dropped by Postgres.
This makes using WalEx safer, as there is no risk of filling up the disk of the Postgres writer instance
during downtime.

The downside is that event loss is then more than likely.
If that is a no-go, WalEx also supports durable replication slots.

```elixir
# config.exs

config :my_app, WalEx,
  # ...
  durable_slot: true,
  slot_name: "my_app_replication_slot"
```

Only a single process can be connected to a durable slot at a time;
if the slot is already in use, `WalEx.Supervisor` will fail to start with a `RuntimeError`.

Be warned that there are many additional potential gotchas (a detailed guide is planned).
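
To make that failure mode concrete, here is a minimal sketch of supervising WalEx in an application tree, assuming the config above and the standard `{WalEx.Supervisor, config}` child spec (`MyApp.Application` is a hypothetical module name). If the durable slot is already in use, the `RuntimeError` raised at startup simply crashes this child and defers to your supervisor's restart strategy:

```elixir
# A sketch, not the canonical setup.
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Raises a RuntimeError at startup if the durable slot is already in use.
      {WalEx.Supervisor, Application.get_env(:my_app, WalEx)}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```
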
##### Event middleware / Back-pressure

WalEx receives events from Postgres in `WalEx.Replication.Server`, which then `cast`s them to `WalEx.Replication.Publisher`.
`WalEx.Replication.Publisher` is responsible for joining these events together and processing them.

If you expect Postgres to overwhelm WalEx and potentially cause out-of-memory crashes,
WalEx provides a config option that should help you implement back-pressure.

As this is a fairly advanced use case with many strong requirements,
it is recommended to simply increase the amount of RAM of your instance instead.

Nevertheless, if that is not an option, or you would like to control the consumption rate of events,
WalEx provides the following configuration option:

```elixir
config :my_app, WalEx,
  # ...
  message_middleware: fn message, app_name -> ... end
```

`message_middleware` lets you define how `WalEx.Replication.Server` and `WalEx.Replication.Publisher` communicate.
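
For reference, the default hand-off described above (the `Server` casting to the `Publisher`) corresponds roughly to the middleware below. This is an illustrative assumption spelled out from the prose, not WalEx's actual internal code:

```elixir
# config.exs — illustrative only: approximately the default behavior,
# written out as an explicit middleware.
config :my_app, WalEx,
  # ...
  message_middleware: fn message, app_name ->
    WalEx.Replication.Publisher.process_message_async(message, app_name)
  end
```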

If, for instance, you'd like to store these events on disk before processing them, you would need to (see the sketch below):

- provide a `message_middleware` callback that serializes messages and stores them to disk
- add a supervised, strictly-ordered disk consumer that, for each stored event, calls one of:
  - `WalEx.Replication.Publisher.process_message_async(message, app_name)`
  - `WalEx.Replication.Publisher.process_message_sync(message, app_name)`

Any back-pressure implementation needs to guarantee:

- exact message ordering
- exactly-once delivery
- an isolated back-pressure system per running WalEx instance (for instance, one queue per instance)
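
Here is a minimal sketch of such a buffer, assuming only the `message_middleware` contract and the two `Publisher` functions above. For brevity it queues in memory rather than on disk (so it bounds the Publisher's inbox, not total memory), and `MyApp.EventBuffer` is a hypothetical name:

```elixir
# Sketch only: a strictly-ordered buffer between Server and Publisher.
# A real implementation would persist messages to disk as described above.
defmodule MyApp.EventBuffer do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  # Used as the `message_middleware` callback: enqueue instead of handing
  # the message straight to the Publisher.
  def enqueue(message, app_name), do: GenServer.cast(__MODULE__, {:enqueue, message, app_name})

  @impl true
  def init(_opts), do: {:ok, :queue.new()}

  @impl true
  def handle_cast({:enqueue, message, app_name}, queue) do
    send(self(), :drain)
    # FIFO queue preserves exact message ordering.
    {:noreply, :queue.in({message, app_name}, queue)}
  end

  @impl true
  def handle_info(:drain, queue) do
    case :queue.out(queue) do
      {{:value, {message, app_name}}, rest} ->
        # Synchronous processing is the back-pressure point: we consume
        # one message at a time, at the Publisher's pace.
        WalEx.Replication.Publisher.process_message_sync(message, app_name)
        if not :queue.is_empty(rest), do: send(self(), :drain)
        {:noreply, rest}

      {:empty, empty} ->
        {:noreply, empty}
    end
  end
end
```

You would then supervise `MyApp.EventBuffer` in your application tree and point the config at it with `message_middleware: &MyApp.EventBuffer.enqueue/2`.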

## Test

You'll need a local Postgres instance running

lib/walex/replication/supervisor.ex

Lines changed: 16 additions & 1 deletion
@@ -23,6 +23,21 @@ defmodule WalEx.Replication.Supervisor do
```diff
       {Server, app_name: app_name}
     ]

-    Supervisor.init(children, strategy: :one_for_one, max_restarts: 10)
+    # one_for_all (or rest_for_one) is required here, the reason being:
+    #
+    # If Publisher crashes:
+    #   We lose the current state.
+    #   This means that until Postgres decides to send us all the needed Relation and Type messages again,
+    #   we won't be able to decode any events coming from the Server.
+    #   In the meantime everything would look fine, but all events would get discarded.
+    #   The only way to guarantee getting them back is to restart the Server.
+
+    # If Server crashes:
+    #   The replication will restart at restart_lsn.
+    #   All events from there up to the LSN at which the Server crashed will get replayed.
+    #   This means that the message inbox of the Publisher can become inconsistent
+    #   and will likely contain duplicate messages.
+    #   If duplicates are undesirable, one_for_all is required; otherwise rest_for_one is fine.
+    Supervisor.init(children, strategy: :one_for_all, max_restarts: 10)
   end
 end
```
