Adds Interval Tree Clocks and Vector Clocks with an algebraic approach #333

johnynek · 2014-08-08T08:34:20Z

I've been geeking out on this for the past few nights.

Implementing Interval Tree Clocks was really fun.
I noticed a new concept: a partial semigroup. It seems useful for building operations that are robust to duplication.
I think with this we can make Storehaus work with partial semigroups, which means that if we store a value of type (C, T), where we have a clock for C and a semigroup for T, we can get duplication tolerance. This is still a little vague to me, but I'd appreciate feedback.
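
A minimal sketch of the idea, for readers following along. The trait shape below (tryPlus returning an Option, plusOrLeft falling back to the left value) is an assumption based on the names used in this thread, not the code in the PR. A plain Long sequence number stands in for a real clock, and StampedSum and Stamped are hypothetical names.

```scala
// Hypothetical sketch; names and signatures are assumptions, not the PR's code.
trait PartialSemigroup[T] {
  // None means the combination is undefined for these two values.
  def tryPlus(left: T, right: T): Option[T]
  // Keep the left value whenever the sum is undefined.
  def plusOrLeft(left: T, right: T): T = tryPlus(left, right).getOrElse(left)
}

object StampedSum {
  // A value of type (C, T): a Long stamp standing in for a clock, plus an Int payload.
  type Stamped = (Long, Int)

  // Defined only when the right stamp is strictly newer than the left one,
  // so replaying an already-applied update is a no-op under plusOrLeft.
  val instance: PartialSemigroup[Stamped] = new PartialSemigroup[Stamped] {
    def tryPlus(left: Stamped, right: Stamped): Option[Stamped] =
      if (right._1 > left._1) Some((right._1, left._2 + right._2)) else None
  }
}
```

With this instance, replaying the update (2L, 5) against a stored (2L, 15) leaves the stored value unchanged, which is one reading of the duplication tolerance described above.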

johnynek · 2014-08-08T08:44:03Z

In particular, look at Clock[T]:

https://github.com/twitter/algebird/blob/oscar-interval-tree/algebird-core/src/main/scala/com/twitter/algebird/clock/Clock.scala

Is this a good structure? Is it missing some essence of what clocks are?

jcoveney · 2014-08-08T17:16:32Z

algebird-core/src/main/scala/com/twitter/algebird/PartialSemigroup.scala

+   * where you update if possible).
+   * Note that this is not generally associative.
+   */
+  def plusOrLeft(left: T, right: T): T =


Does this have a more general use outside of KV stores? Given that it is defined in terms of tryPlus, do we want it to be in the base trait? I only ask because it seems quite specialized...

johnynek · 2014-08-08

Good point. I don't know yet. Maybe we should remove it from the base trait.

jcoveney · 2014-08-22

Given you have the method in the companion object I think tryPlus is good enough.

non · 2014-08-08T18:12:59Z

My first question is, is tryPlus associative in general, or only when both sides are defined?

If not, I wonder how you can safely rely on associativity in any case, since it is possible that (... + c) + d is defined, but ... + (c + d) is undefined. In that case, this starts looking more like PartialMagma or something, right?

If it is associative, I think it might be a semigroupoid: http://en.wikipedia.org/wiki/Groupoid. But maybe I'm mistaken?

EDIT: The criterion I would like to have is something like this: for (a + b) + c and a + (b + c), either they are both defined and equal, or neither is defined. Is this too strong?
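
The criterion in the EDIT above can be written as an executable check. This is only a sketch: tryPlus is modeled as a plain function returning an Option (an assumption about its shape), and AssocLaw, compose and newer are hypothetical names. compose joins adjacent intervals, in the spirit of the semigroupoid suggestion; newer is a "right stamp must be strictly greater" rule.

```scala
object AssocLaw {
  // tryPlus modeled as a function; None means "undefined".
  type TryPlus[T] = (T, T) => Option[T]

  // Both nestings must agree, including on being undefined.
  def stronglyAssociative[T](a: T, b: T, c: T)(f: TryPlus[T]): Boolean = {
    val leftFirst  = f(a, b).flatMap(ab => f(ab, c))
    val rightFirst = f(b, c).flatMap(bc => f(a, bc))
    leftFirst == rightFirst
  }

  // Intervals (start, end) compose only when they are adjacent.
  val compose: TryPlus[(Int, Int)] = (l, r) =>
    if (l._2 == r._1) Some((l._1, r._2)) else None

  // (stamp, value) pairs combine only when the right stamp is newer.
  val newer: TryPlus[(Long, Int)] = (l, r) =>
    if (r._1 > l._1) Some((r._1, l._2 + r._2)) else None
}
```

On the triples tried here, compose satisfies the check, while newer fails it for stamps 5, 3, 7: the left nesting is undefined but the right nesting is defined. That is exactly the situation the question about (... + c) + d versus ... + (c + d) is worried about.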

non · 2014-08-08T18:14:28Z

(Also, I haven't finished reading it, but this MathOverflow article seems promising: http://mathoverflow.net/questions/123614/on-the-notion-of-partial-semigroup)

avibryant · 2014-08-08T19:06:04Z

So, this may well be the wrong angle, but when I think about the essence of clocks, it's something like:

There is some type V which is the value we actually care about (for a counter, this might be a Long).

A clock is going to, somehow, store multiple V values (eg in a vector). We might combine these Vs in two ways:

- element-wise (for some notion of element), when joining to another clock, using some semilattice on V
- merging together two or more elements when reorganizing the internals of the clock, or when reading the current total out of the clock. This uses some monoid on V, which may or may not be the same as the semilattice above.

The internals of the clock (how they maintain separate Vs, how they decide which ones to merge and when, or which ones match when doing a join, etc) are implementation-specific.

This is a pretty different perspective from what you've taken here, I think. In particular, a few things that seem strange to me, given where I'm coming from:

- The T in Clock[T] is not the V I actually care about. Instead it feels more like an implementation artifact (eg VectorClock.Stamp).
- There doesn't seem to be any easy way to get the "final" answer out (what's the sum of the Vector[Long]), or anything in the Clock abstraction that even contemplates this being the use case.
- Most importantly, the V seems to be hardcoded in the clock implementations (eg for VectorClock it's always Long). Why can't I define a VectorClock on HLL? There's a well defined semilattice for it, and a well defined merge operation to get the "total" (which unlike for Long, happens to be the same as the semilattice).
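
One way to picture a clock that is generic in V is sketched below. Everything here is an assumption made for illustration: GenericClock and Clock are hypothetical names, the semilattice and monoid are passed as plain functions rather than Algebird type classes, and a Set[String] stands in for an HLL because both use the same operation (union) for the join and for the total.

```scala
object GenericClock {
  // One V per node id. How elements are kept apart is an implementation detail.
  final case class Clock[V](elems: Map[String, V]) {
    // Element-wise join with another clock, using a semilattice on V.
    def join(that: Clock[V])(joinV: (V, V) => V): Clock[V] = {
      val keys = elems.keySet ++ that.elems.keySet
      Clock(keys.map { k =>
        // Every key is present in at least one side, so reduce is safe.
        k -> (elems.get(k).toList ++ that.elems.get(k).toList).reduce(joinV)
      }.toMap)
    }

    // Read the "final" answer by folding all elements with a monoid on V.
    def total(zero: V)(merge: (V, V) => V): V =
      elems.values.foldLeft(zero)(merge)
  }

  // Set union plays both roles here, as it would for an HLL.
  val a = Clock(Map("n1" -> Set("x", "y")))
  val b = Clock(Map("n1" -> Set("x"), "n2" -> Set("z")))
  val joined = a.join(b)(_ union _)
  val distinct = joined.total(Set.empty[String])(_ union _)
}
```

For Long the two functions would differ (max for the join, addition for the total), which is the distinction the bullets above are drawing.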

johnynek · 2014-08-08T20:06:37Z

Good feedback, sirs.

@non It seems you want strongly associative, in the terminology of the MathOverflow question, while I think I only promised properly associative. I think the cases I have here are strongly associative, so I could strengthen it. I don't think I want so many varieties. I'd rather have the strongest notion that gives the duplicate-message handling. Also, what do you think of using descriptive vs. canonical names (Semigroupoid vs. PartialSemigroup)?

@avibryant Yes, so you are interested in the cases where the vector clocks are not over integers. That does look interesting, but I wonder how you can increment the clock generally? HLL has a semilattice, but it does not have "successible" (or incrementable, or countable or whatever you wish to call it) in a natural way that I see. So, is this notion of being able to create a next largest time needed? Our normal human notion of time does not have that: it is a real number that is continuously integrating forward. I was interested in the classical application of vector clocks to deal with duplicated and out-of-order messages. But you are right: if there is a semilattice on T, then there is a semilattice on Vector[T]. In your picture, each node has a support (id for instance) and it can apply values. Is your notion of a clock stronger than a semilattice? I could improve my implementation I think such that IntervalTree and VectorClock stamps have types V which themselves must be clocks, and I think that is enough to implement them (lift and shrink in IntervalTree.Event will be a bit tricky...)

Avi, can you talk more about your vision for the application? My main vision was for messaging in Summingbird to give us at-least-once semantics, and then to use the clock to remove duplicates in order to get at-most-once. I recall you talking about distributed clocks where the values are not longs, but anything with a semilattice. What more did you have in mind there (other than K-V stores of Bloom filters, HLLs, Sets, maximums, minimums or vectors of these)?

I guess I was trying to get a partial band (idempotent but not commutative) from a general semigroup using the clock.

non · 2014-08-08T20:29:27Z

@johnynek If something like the strongly associative property works here, I think that would be a good property, since it is relatively easy to reason about.

I am inclined to prefer Semigroupoid only because I can imagine wanting to integrate with Groupoid and other related types, and I think the naming scheme there is a bit nicer. But I don't really have strong feelings about it, especially since it seems like there is not a single canonical definition here. I don't think PartialSemigroup will confuse anyone, especially if you are explicit about the kind of associativity it has.

avibryant · 2014-08-08T21:53:13Z

Yes, I was wondering about exactly that recursive representation of having a V which is a Clock; that seems like the shortest path from where you are now to having clocks on non-integers, but I'm not convinced it's the globally optimal design.

In my vision, incrementing is not fundamental; rather, it falls out of the fact that you have a merge monoid for V. That lets you add(inc: V), and you can add(1L) if you want, but there's no equivalent increment op for HLL, and that's fine.

The motivation here comes from using vector clocks as CRDTs in Dynamo-like systems. The canonical example is having a distributed eventually-consistent (AP) counter. So (forgive me if this is obvious), you have a vector clock for the counter, each node has a corresponding element in the vector, asking a node to increment the counter will increment its element, and then whenever you have the opportunity to sync up the nodes, you join vector clocks as usual with element-wise max, and get the total counter value by summing across all the elements. But this generalizes nicely to HLLs etc.
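
The counter described above can be sketched in a few lines. This is an illustration under stated assumptions, not code from the PR: GCounter, Counts and the node ids are hypothetical names, and a Map from node id to Long stands in for the vector.

```scala
object GCounter {
  // Per-node counts; each node only ever adds to its own entry.
  type Counts = Map[String, Long]

  // Adding at a node uses the merge monoid on Long (addition).
  def add(c: Counts, node: String, inc: Long): Counts =
    c.updated(node, c.getOrElse(node, 0L) + inc)

  // Syncing replicas uses the semilattice on Long (max), element-wise.
  def join(a: Counts, b: Counts): Counts =
    (a.keySet ++ b.keySet).map { k =>
      k -> math.max(a.getOrElse(k, 0L), b.getOrElse(k, 0L))
    }.toMap

  // The counter value is the sum across all nodes.
  def total(c: Counts): Long = c.values.sum

  val n1 = add(add(Map.empty, "n1", 1L), "n1", 1L) // two increments at n1
  val n2 = add(Map.empty, "n2", 5L)                // add(5L) at n2
  val synced = join(n1, n2)
  val resynced = join(synced, n2)                  // a repeated sync
}
```

Because the join is an element-wise max, syncing with the same replica twice changes nothing, and the total only needs the monoid on Long.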

jnievelt · 2014-08-23T00:52:53Z

I don't think we want to design our clock around being used as a distributed counter. Couldn't one just use a latticed Monoid[Vector[T]] with a summing Monoid[T] instead? If we did want to build an intermediate structure, should it also meet the CMS use case of a latticed Monoid[Vector[T]] with a latticed Monoid[T]?

Anyway, the issue with nested clocks is in forking/joining. For example, if you have a VectorClock of VectorClocks, how do you do an inner fork? Even if you don't fork, does it make more sense than simply having a flat identifier space?

In terms of my view on what these things really are, Stamps are a representation of a set of events which are guaranteed to have happened. The increment is a stand-in for adding a new event to the set, with the assumption that it's being added in order (within its identifier context) and without duplication. The unions and comparisons are completely in line with those of sets as well.

Clocks, then, are ways of storing those sets given that we can provide a reduced interface, though we don't want any inherent loss of accuracy from the clock itself. Thus we can use stamp structures that can easily "add the next event for a context", combine with another stamp, and check for ordering with another stamp. But we can't enumerate events or even test for their membership (unless we also have a full listing of each id's event sequences) within the structure.
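
The set-of-events reading above can be made concrete with a small sketch. The names here (StampAsSet, Stamp, events, lteq) are hypothetical and the representation is a plain vector-clock-style Map, not the PR's Stamp types. The events function exists only to show what a stamp stands for; a real clock would not enumerate events.

```scala
object StampAsSet {
  // id -> number of events seen for that id, in order and without gaps.
  type Stamp = Map[String, Long]

  // Expand a stamp into the event set it represents: events 1..n per id.
  def events(s: Stamp): Set[(String, Long)] =
    (for { (id, n) <- s.toSeq; i <- 1L to n } yield (id, i)).toSet

  // "Add the next event" for one identifier context.
  def increment(s: Stamp, id: String): Stamp =
    s.updated(id, s.getOrElse(id, 0L) + 1L)

  // Union of the two event sets, computed as an element-wise max.
  def join(a: Stamp, b: Stamp): Stamp =
    (a.keySet ++ b.keySet).map { k =>
      k -> math.max(a.getOrElse(k, 0L), b.getOrElse(k, 0L))
    }.toMap

  // a precedes or equals b exactly when a's events are a subset of b's.
  def lteq(a: Stamp, b: Stamp): Boolean =
    a.forall { case (k, n) => n <= b.getOrElse(k, 0L) }

  val a = increment(increment(Map.empty, "x"), "x")
  val b = increment(Map.empty, "y")
  val joined = join(a, b)
}
```

In this example a and b are concurrent (neither is lteq the other), and the join represents the union of their event sets.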

Coming to duplication tolerance, it's difficult for me to imagine how clocks will be used here. Can you describe the logic that might be used and the scenario that it would solve?

jcoveney · 2014-09-04T15:33:49Z

algebird-core/src/main/scala/com/twitter/algebird/clock/IntervalTree.scala

+ * See the very readable paper:
+ * http://gsd.di.uminho.pt/members/cbm/ps/itc2008.pdf
+ */
+object IntervalTree {


no suuuuper strong opinions here but imho there's enough code here that this could be a package and you could split things out a bit. Might be easier to read if they're in separate files and stuff. This is a giant object. But they are all heavily related so I get the desire to keep them together.

ianoc · 2015-08-04T00:49:01Z

Sorry my bad, git foo on cmd line broke stuff and closed all of these

CLAassistant · 2019-11-16T23:48:56Z

All committers have signed the CLA.

Commit ec7ba7f: Adds Interval Tree Clocks and Vector Clocks with an algebraic approach

jcoveney reviewed Aug 8, 2014

jcoveney reviewed Sep 4, 2014

johnynek mentioned this pull request Dec 22, 2014: Definitions for partial algebras typelevel/algebra#17 (Open)

ianoc closed this Aug 4, 2015

ianoc reopened this Aug 4, 2015
