serializability and consistent pagination

I like the interleaved tables idea - seems like it could work well with DDD aggregates. I guess for 3 I was thinking of some sort of MVCC snapshot-based approach, although I can't imagine that comes too cheap if data is changing frequently - curious how you keep reads performant with strict serializability guarantees? I do like the idea of being able to easily subscribe to changing data, and I assume that's a big part of what you're pushing out to those edge servers re: TCP connections?
sujayakar · 2y ago
(forking a thread) yeah, holding MVCC snapshots open for a long time is pretty difficult to make performant in any database system. we keep our MVCC windows pretty tight by enforcing transaction duration limits (that's why queries + mutations have a much lower timeout than actions!). and yep, pushing function caches and subscription invalidation out to edge servers is exactly what we're planning on the global performance front.

making this work efficiently relies on us being serializable but not necessarily strictly serializable (or linearizable). in particular, serializability permits read-only transactions to execute in the past, since ordering them at that point is still a valid serial order. so, our edge query caches can serve function results at slightly old snapshots while remaining serializable.

however, this technicality isn't all that useful for stateful clients (like our react client) that expect to read their own writes. to accommodate that, we plan on letting these clients pass up a minimum timestamp (say, derived from a mutation) that acts as a lower bound. the edge server can then wait for replication from the primary region to pass that timestamp before serving a cached value.

and, cool, I hadn't seen the term DDD before, but it makes a bunch of sense! we've talked about similar ideas before from an availability perspective: clustering related data together keeps the impact of a shard going down to just a proportional fraction of customers. otherwise, randomly distributing data means that most customers will have some overlap with the unavailable shard!
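a minimal sketch of what that lower-bound handshake could look like on an edge server, in TypeScript. all the names here (`serveQuery`, `waitForReplication`, `advanceReplication`, etc.) are hypothetical stand-ins, not Convex's actual API - the point is just the timestamp comparison:

```ts
// hypothetical edge-server read path: serve a cached query result only if
// its snapshot is at least as new as the client's minimum timestamp.
type Timestamp = number;

interface CachedResult {
  snapshotTs: Timestamp; // snapshot the cached value was computed at
  value: unknown;
}

const cache = new Map<string, CachedResult>();
let replicatedTs: Timestamp = 0; // how far replication from the primary has reached

// stub driver: simulate the replication stream advancing past `ts`.
function advanceReplication(ts: Timestamp): void {
  replicatedTs = Math.max(replicatedTs, ts);
}

// stub: in a real edge server this would block on the replication stream.
async function waitForReplication(ts: Timestamp): Promise<void> {
  while (replicatedTs < ts) {
    await new Promise((resolve) => setTimeout(resolve, 10));
  }
}

// stub: re-run the query function at the current replicated snapshot.
async function runQueryAtSnapshot(key: string): Promise<CachedResult> {
  return { snapshotTs: replicatedTs, value: `result of ${key}` };
}

async function serveQuery(key: string, minTs: Timestamp): Promise<unknown> {
  const cached = cache.get(key);
  // serving a slightly old snapshot stays serializable: the read-only
  // transaction is just ordered "in the past". a stateless client can pass
  // minTs = 0; a client that just ran a mutation passes that mutation's
  // timestamp so it always reads its own writes.
  if (cached && cached.snapshotTs >= minTs) {
    return cached.value;
  }
  await waitForReplication(minTs); // wait for the primary to catch us up
  const fresh = await runQueryAtSnapshot(key);
  cache.set(key, fresh);
  return fresh.value;
}

// usage: a client that saw a mutation commit at ts 100 passes 100 as minTs.
advanceReplication(100);
serveQuery("listMessages", 100) // recomputes at ts 100 and caches it
  .then(() => serveQuery("listMessages", 50)); // cache hit: 100 >= 50
```

the nice property is that the timestamp comparison alone decides between the cheap cached path and the wait-for-replication path, so the common case needs no coordination with the primary at all.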
DefinitelyNotAWizard
Ah, I thought I heard something about strict serializability in the podcast. In practice I'm not sure I've ever needed linearizability. I usually don't even need serializability, but in my experience most developers have never even heard of transaction isolation levels, which is why I'm a fan of defaulting to it for the general case. I like the idea of returning a timestamp from an operation as long as it's a dependable timestamp, which I gather it is from your statement. I'm a big fan of DDD and have been practicing it for many years now. Everyone I manage to get on board has also become a fan and evangelist. It takes some getting used to, though. It's really quite powerful for API design with complex business logic and works well in both functional and OO paradigms, but it's probably more of a thing your customers would use - not something I've typically applied to performant infrastructure layers. Those are different sorts of problems. More fun ones, IMHO.
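for anyone who, like sujayakar, hadn't run into DDD before: a toy sketch of the aggregate idea in TypeScript (all names are made up for illustration). the aggregate root is the only entry point for mutating its cluster of related data, which is why it maps so naturally onto interleaved tables or co-located shards:

```ts
// toy DDD aggregate: an Order is the aggregate root; its line items can
// only be changed through the root, so all invariants live in one place.
interface LineItem {
  productId: string;
  quantity: number;
  unitPriceCents: number;
}

class Order {
  private items: LineItem[] = [];
  private placed = false;

  constructor(readonly orderId: string, readonly customerId: string) {}

  addItem(item: LineItem): void {
    if (this.placed) throw new Error("cannot modify a placed order");
    if (item.quantity <= 0) throw new Error("quantity must be positive");
    this.items.push(item);
  }

  place(): void {
    if (this.items.length === 0) throw new Error("cannot place an empty order");
    this.placed = true;
  }

  totalCents(): number {
    return this.items.reduce((sum, i) => sum + i.quantity * i.unitPriceCents, 0);
  }
}
```

since the root and its items always change together, storing them interleaved (or on the same shard) means a single aggregate never straddles a partition boundary - which is the availability point from the message above.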
