Convex Community
Created by ott on 4/17/2025 in #general
Oh is it literally another db table? The
I didn't consider that every query also needs to be keyed on the timestamp because of the optimistic concurrency control, so it makes sense that a b+tree would perform better
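roughly what I mean, as a tiny sketch (assuming a sorted multiversion index keyed on (id, ts); not Convex's actual code):

```python
import bisect
from collections import defaultdict

# toy multiversion index: doc id -> [(ts, value), ...] kept sorted by timestamp
versions = defaultdict(list)

def write(doc_id, ts, value):
    bisect.insort(versions[doc_id], (ts, value))

def read(doc_id, snapshot_ts):
    """Every point read is really 'latest version of doc_id with ts <= snapshot_ts'."""
    vs = versions[doc_id]
    # rightmost version at or before the snapshot timestamp (Python 3.10+ key=)
    i = bisect.bisect_right(vs, snapshot_ts, key=lambda v: v[0])
    return vs[i - 1][1] if i else None

write("user:42", 990, "v1")
write("user:42", 995, "v2")
write("user:42", 999, "v3")
print(read("user:42", 996))  # -> 'v2' (the version written at ts=995)
print(read("user:42", 989))  # -> None (nothing existed yet at that snapshot)
```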
I did some rough IO estimates for `FIND document WHERE id = <id> AND timestamp <= t` (please correct me if I got anything wrong). it looks like they have similar best-case scenarios, but the LSM struggles when L0 is tiered. if it were leveled, I think the IO numbers could be pretty similar.

assumptions:
- key size: 64 bytes
- value size: 448 bytes
- entry size: 512 bytes
- OS page size: 4096 bytes
- 1 billion entries = 512 GB total storage
- query: read document/row at timestamp

lsm (e.g. rocksdb):
- size ratio (T) = 10
- L0 tiered, L1+ leveled
- L0 = 10 MB
- entries per page = 4096 bytes / 512 bytes = 8
- 1 MB skiplist as the memory buffer
- bloom filters with 1% false positive rate
- fence pointers on each level/SSTable
| Level | size | num Entries | num SSTables |
| ----- | ------:| -----------:| ------------:|
| 0 | 10 MB | 20k | 1-10 (var) |
| 1 | 100 MB | 200k | 10 |
| 2 | 1 GB | 2m | 100 |
| 3 | 10 GB | 20m | 1k |
| 4 | 100 GB | 200m | 10k |
| 5 | 400 GB | 777m | 40k |
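quick sanity check on the table above, just redoing the arithmetic from those parameters (GB vs GiB rounding makes the numbers drift a little):

```python
# redo the level-size arithmetic from the parameters above
ENTRY = 512                       # bytes: 64 B key + 448 B value
TOTAL = 1_000_000_000             # entries
T = 10                            # size ratio between levels
L0_BYTES = 10 * 1024**2           # 10 MB
SSTABLE_BYTES = 10 * 1024**2      # assume ~10 MB per SSTable

remaining = TOTAL
for level in range(6):
    capacity = (L0_BYTES * T**level) // ENTRY    # entries the level can hold
    entries = min(capacity, remaining)           # the last level takes the leftovers
    remaining -= entries
    sstables = max(1, entries * ENTRY // SSTABLE_BYTES)
    print(f"L{level}: {entries * ENTRY / 1024**3:7.2f} GiB  "
          f"{entries:>12,} entries  ~{sstables:,} SSTables")
# L5 comes out around 770M entries / ~370 GiB, i.e. the ~777m / 400 GB row above
# (L0 is tiered, so in practice it holds 1-10 overlapping runs rather than one)
```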
lsm IO counts:

best case:
1. search memory buffer(s): 0 IOs
2. L0 search: 1 IO
= 1 IO

worst case:
1. search memory buffer(s): 0 IOs
2. L0: 1 IO for each of the 10 SSTables (they have overlapping key ranges) = 10 IOs
3. L1-L5: 1 IO per level (non-overlapping key ranges, fence pointers cached in memory) = 5 IOs
= 15 IOs

b+tree:
- leaf node entries: 64 bytes (key) + 8 bytes (pointer) = 72 bytes
- non-leaf node entries: 64 bytes (key) + 8 bytes (pointer) = 72 bytes
- ~56 entries / page
- h = 5

best case:
1. all leaf nodes cached
2. 1 IO for value read
= 1 IO

worst case:
1. 5 IOs for index traversal
2. 1 IO for value read
= 6 IOs
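same numbers as a quick script (the bloom-filter expected-cost line is an extra estimate of mine on top of the worst case above):

```python
import math

# rough IO tallies for "read document at timestamp", same parameters as above
PAGE, N = 4096, 1_000_000_000

# LSM: up to 10 overlapping L0 runs plus 5 leveled levels, fence pointers in memory
l0_runs, leveled_levels = 10, 5
lsm_best = 1                                   # found in the first L0 run probed
lsm_worst = l0_runs + leveled_levels           # probe every run and level = 15 IOs
# extra estimate: 1% bloom false positives skip almost every non-matching probe
lsm_expected = 1 + 0.01 * (l0_runs + leveled_levels - 1)   # ~1.14 IOs

# B+tree: 72 B index entries -> ~56 per 4 KiB page
fanout = PAGE // 72                            # ~= 56
nodes, levels = N, 0
while nodes > 1:                               # index levels from the leaves up to the root
    nodes = math.ceil(nodes / fanout)
    levels += 1                                # -> 6 levels (incl. leaves) for 1B entries
btree_best = 1                                 # index fully cached: just the value read
btree_worst = (levels - 1) + 1                 # root pinned in cache: 5 index IOs + value = 6

print(lsm_best, lsm_worst, round(lsm_expected, 2))
print(fanout, levels, btree_best, btree_worst)
```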
thanks for answering my questions. I find this stuff super interesting and I'm currently doing database research at uni.
we want to be able to query a snapshot of the index at slightly stale timestamps. this then shifts the workload towards more range scans, which are (often) better on btrees.
for the LSM case, you could use a key of `key;<MAX_TS - t>` so that versions of the same key sort most recent first, e.g.:
user:42;990 -> t=9
user:42;995 -> t=4
user:42;999 -> t=0
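a tiny sketch of that encoding, using MAX_TS = 999 just to match the example (a real key would use a fixed-width binary timestamp):

```python
MAX_TS = 999   # stand-in for the max representable timestamp

def version_key(doc_key: str, ts: int) -> str:
    # store MAX_TS - ts (zero-padded) so newer versions sort first within a key
    return f"{doc_key};{MAX_TS - ts:03d}"

def latest_at(store: dict, doc_key: str, snapshot_ts: int):
    """The first key >= doc_key;(MAX_TS - snapshot_ts) is the newest version at or before snapshot_ts."""
    seek = version_key(doc_key, snapshot_ts)
    for k in sorted(store):                   # an LSM iterator would seek() instead of scanning
        if k >= seek and k.startswith(doc_key + ";"):
            return store[k]
    return None

store = {
    version_key("user:42", 0): "v@0",         # -> user:42;999
    version_key("user:42", 4): "v@4",         # -> user:42;995
    version_key("user:42", 9): "v@9",         # -> user:42;990
}
print(latest_at(store, "user:42", 5))         # -> 'v@4' (newest version with t <= 5)
```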
what's the reason for using relational databases instead of kv stores for the storage backend? if everything is a byte string, and the source of truth is a LOG, wouldn't an LSM-backed storage engine be more efficient than a b+tree-backed one?