ott
CCConvex Community
•Created by ott on 4/17/2025 in #general
Oh is it literally another db table? The
I didn't consider that every query needs to also be on the timestamp because of the optimistic CC, so it makes sense why a b+tree would perform better
6 replies
CCConvex Community
•Created by ott on 4/17/2025 in #general
Oh is it literally another db table? The
I did some rough IO estimates for the
FIND document WHERE id = <id> AND timestamp <= t
(please correct me if I got anything wrong). it looks like they have similar best case scenerios, but the LSM struggles when L0 is tiered. If it was leveled, I think the IO numbers could be pretty similar.
key size: 64 bytes
value size: 448 bytes
entry size: 512 bytes
OS page size: 4096 bytes
1 billion entries = 512 GB total storage
query: read document/row at timestamp
lsm (e.g. rocskdb):
- size ratio (T) = 10
- L0 tiered, L1.. leveled
- L0 = 10 MB
- entries per page = 4096 bytes / 512 bytes = 16
- 1 MB skiplist as the memory buffer
- bloom filters with 1% false positive rate
- fence pointers on each level/SSTable
better case
1. search memory buffer(s): 0 IOs
2. L0 search: 1 IO
= 1 IO
worse case
1. search memory buffer(s): 0 IOs
2. L0: 10 IOs for each SSTable (they have overlapping key ranges)
3. L1-L5: 1 IO per level (non overlapping key ranges) = 5 IOs. fence pointer cached in memory
= 15 IOs
b+tree:
- leaf nodes: 64 bytes (key) + 8 bytes (pointer) = 72 bytes
- non-leaf nodes: 64 bytes + 8 bytes = 72 bytes
- ~56 nodes / page
- h = 5
better case:
1. all leaf nodes cached
2. 1 IO for value read
= 1 IO
worse case:
1. 5 IOs for index traversal
2. 1 IO for value read
= 6 IOs6 replies
CCConvex Community
•Created by ott on 4/17/2025 in #general
Oh is it literally another db table? The
thanks for answering my questions. I find this stuff super interesting and I'm currently doing database research at uni.
we want to be able to query a snapshot of the index at slightly stale timestamps. this then shifts the workload towards more range scans, which are (often) better on btrees.for the LSM case, you could have a key of
key;<MAX_TS - t>
so that keys are sorted most recent first
6 replies
CCConvex Community
•Created by ott on 4/17/2025 in #general
Oh is it literally another db table? The
whats the reason for using relational databases instead of kv stores for the storage backend? if everything is a byte string, and the source of truth is a LOG, wouldn't using a LSM backed storage engine be more efficient than a b+tree backed one?
6 replies