Notes on using Together’s pattern
This is a compilation of things to know and be careful about when designing for Together's pattern.
By difficulty of incorporation
Almost impossible, and probably not useful anyway
- DFS graph traversal (one node's connections interfere with others', i.e. which nodes connect to which)
- Search engine (results interfere with each other)
Hard
- PII-related (security reasons)
- Multi-tenancy (security reasons)
- Cross-partition transactions -> batching only helps with throughput, and may increase latency.
- Distributed tracing -> not about storing the data, as most distributed tracing systems like Jaeger and Zipkin already batch that, but about the semantics of the tracing itself (spans now affect multiple traces)
- Geospatial (needs some way to look ahead, or it becomes unfair to some data)
Easy
- YCSB-style workloads: hotspot/popular-item/contention-heavy behaviors, transaction logic on the main key/partition
- BFS graph traversal
- Abstracted high-level APIs over key-value behavior.
On incorporating / using
Application-side
- Start with simpler patterns, such as those with key-value access only. These typically constitute a large number of requests and are very simple to batch (akin to `SELECT * FROM a_table_name WHERE some_field IN (...)`, Redis pipelines, or memcached's multi_get).
- Prefer pessimistic rather than optimistic concurrency control, so you can control how complex your logic gets without rolling everything back.
- Reduce allocations (if possible), as these are another source of non-useful work. See here. CPU cycles spent on allocations are better spent on Together's logic.
- Use a language that can use many cores for many runtime threads. Among popular options:
  - Golang fits this best.
  - Synchronous Java without Project Loom suffers from lots of waiting threads.
  - JavaScript (via Node.js), Ruby, Python, etc. all reduce the chance of bigger batches, as they can only batch per process, although this is already much better than nothing.
  - Not sure about the .NET ecosystem or the async Java ecosystem, but supposedly they are as good as Golang (performance-wise).
  - PHP, with its process-per-request model, removes all possibility of batching.
- Be careful with replication lag, especially during batched writes.
Will be good for
- Hasura/PostgREST/equivalent and/or Temporal/Cloudstate/Dapr/equivalent, as these are abstracted APIs over key-value behavior
- High-contention / hotspot algorithms and behaviors (PSAC, TSO-based, leaderboards, etc.)
- Materializing more (à la FoundationDB aggregate indexes)
- Sudden traffic bursts (sales events), alongside singleflight
- Heavy ingestion (time series, logging, distributed tracing, etc.)
Will help a lot
- Business-level undo log, with late undo removal (close to late lock acquisition)
- BFS lookup patterns
- Natural scanCombineApply (flat-combining) behavior (skip list, LSM, queue, stack, etc.)
- Explicit locking APIs rather than complex concurrency control -> easier to control
- Scatter-gather, waiting for batches to complete in parallel to reduce end-to-end latency
- Service/DB specialization, e.g. user, relation naming, etc.
- Splitting static and dynamic data.