Query function caching fails when used with third-party libraries
This is not strictly a bug with Convex, but it is common enough that many applications will run into it when using a third-party library, resulting in major cache misses that are only found after release.
Problem: Convex disables caching when the Date constructor is invoked. However, the Date constructor is not always invoked by the dev; it can also be invoked by a third-party library (such as date libraries that are used to parse and transform dates).
This results in the query being uncached and an elevated number of row reads.
Proposal: The build should fail if a query uses the Date constructor, requiring the user to explicitly call a method
ctx.disableCache()
at the start of the query.
11 Replies
Our initial take on this was closer to this approach, but this prevented developers from using many libraries they wanted to. We're working on improving debugging and although there's no timeline for this yet, visualizing why a query couldn't use a cached value or why a subscription reran could provide similar insight. "getUsers({limit: 5}) was not a cache hit because [rows of the users table read in initial query were modified / Date was called and the last query ran more than 15 seconds ago / there was a code push]".
A strict option is possible too, someday maybe you could configure whether Date is allowed. You can hack around this today by replacing globalThis.Date with your own implementation that returns a consistent time or throws.
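A minimal sketch of that workaround in plain TypeScript. The helper names `wrapDate`, `freezeDate` and `forbidDate` are made up for illustration and are not part of Convex; how a given runtime reacts to a swapped global is an assumption you would need to verify yourself.

```typescript
// Hypothetical helpers, not a Convex API: swap globalThis.Date for a wrapper.
const RealDate = globalThis.Date;

// Build a stand-in for Date. `currentTime` supplies the "now" value, or throws.
function wrapDate(currentTime: () => number): DateConstructor {
  return new Proxy(RealDate, {
    // `new Date()` with no arguments asks for the current time.
    construct(target, args) {
      return args.length === 0
        ? new target(currentTime())
        : Reflect.construct(target, args);
    },
    // `Date()` called as a plain function returns the current time as a string.
    apply(target) {
      return new target(currentTime()).toString();
    },
    // `Date.now()` is the other common way to read the clock.
    get(target, prop, receiver) {
      if (prop === "now") return () => currentTime();
      return Reflect.get(target, prop, receiver);
    },
  });
}

// Make every clock read return the same fixed timestamp. Returns an undo function.
function freezeDate(fixedMs: number): () => void {
  globalThis.Date = wrapDate(() => fixedMs);
  return () => {
    globalThis.Date = RealDate;
  };
}

// Make every clock read throw, so an unexpected Date use fails loudly.
function forbidDate(): () => void {
  globalThis.Date = wrapDate(() => {
    throw new Error("Date was used to read the current time");
  });
  return () => {
    globalThis.Date = RealDate;
  };
}
```

In both variants `new Date(someValue)` still works, since converting a known timestamp does not depend on the clock. Only the argument-less forms are frozen or rejected.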
mm yeah, this makes sense I think. just thinking aloud - maybe a warning could be useful, to help developers know during build time that the function was successfully built, but caching was disabled, without requiring a code change.
but I guess the present check is at runtime, while this would mean providing feedback at build time instead, which could be challenging..?
thanks for the workaround though, it could come in handy.
Yes, I don't know how we could accomplish a build-time check. Unless it was possibly over-sensitive (e.g. linting for Date anywhere in your code or any bundled dependencies)
mm based on zero knowledge of the internal architecture, this is my best guess.
the function interface for query looks quite pure, so perhaps a build time check could be a sort of "dry run" of the function in a sandbox, to look at what it may potentially call (possibly add a proxy to observe the Date constructor).
If any of the "impossible to cache" conditions occur, return with warnings.
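A rough sketch of what such a dry run could look like, assuming the function can be called directly in a test harness. `dryRun` and `DryRunReport` are hypothetical names, not a Convex API, and the sketch only observes the code path taken for the inputs it is given.

```typescript
// Hypothetical dry-run helper; names are illustrative, not a Convex API.
type DryRunReport<T> = { result: T; warnings: string[] };

async function dryRun<T>(fn: () => T | Promise<T>): Promise<DryRunReport<T>> {
  const RealDate = globalThis.Date;
  const warnings: string[] = [];
  const note = (what: string) => {
    warnings.push(`${what} read the current time; result may not be cacheable`);
  };

  // Observe, but do not change, the ways a function can read the clock.
  globalThis.Date = new Proxy(RealDate, {
    construct(target, args) {
      if (args.length === 0) note("new Date()");
      return Reflect.construct(target, args);
    },
    apply(target, thisArg, args) {
      note("Date()");
      return Reflect.apply(target, thisArg, args);
    },
    get(target, prop, receiver) {
      if (prop === "now") {
        return () => {
          note("Date.now()");
          return target.now();
        };
      }
      return Reflect.get(target, prop, receiver);
    },
  });

  try {
    const result = await fn();
    return { result, warnings };
  } finally {
    globalThis.Date = RealDate; // always restore the real constructor
  }
}
```

Because the global is swapped for the duration of the call, anything else running concurrently would also be observed, so this only makes sense in an isolated sandbox.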
Fundamentally I feel like there are 2 kinds of cache misses
1. failed due to changed inputs
2. fundamentally not cacheable (impure functions)
in the case of using the current date, I think this is the second category. and maybe worth considering whether the behavior should be to assume function purity and return the cached value as the default behavior instead.
rationale being that query invalidation can always be done by the user by adding a current time param to the args, while defaulting to no cache has no workaround.
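To make the "add a current time param to the args" idea concrete, here is a small sketch with an in-memory map standing in for a query cache. The names `recentUsers`, `cachedRecentUsers` and the `Args` shape are invented for illustration; this is not how Convex implements its cache.

```typescript
// Hypothetical in-memory cache keyed on arguments; a stand-in for a query cache.
type Args = { limit: number; now: number };

const cache = new Map<string, string[]>();
let runs = 0; // counts how often the underlying function actually executes

// A pure function: its output depends only on `args`, never on the clock.
function recentUsers(args: Args): string[] {
  runs += 1;
  return Array.from({ length: args.limit }, (_, i) => `user-${i}@${args.now}`);
}

// Return the cached value when the same arguments were seen before.
function cachedRecentUsers(args: Args): string[] {
  const key = `${args.limit}:${args.now}`;
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = recentUsers(args);
  cache.set(key, value);
  return value;
}
```

Calling `cachedRecentUsers` twice with the same `now` runs the function once. Passing a different `now` changes the key, which forces a rerun, so the caller decides when the result goes stale.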
hard to run a query with every conceivable set of inputs, but we could set a fuzzer on it!
Runtime warnings look more realistic here. There are degrees of cacheability between 1. and 2.: maybe you can cache a Date, just only for e.g. 1 minute, if that's what granularity you tell us you want it at.
Maybe you have queries rerunning too frequently because of changed inputs (changed database state) but that's just as much a problem as a date in your production application.
oh, you're completely right on this. I was wrong. with conditional logic it would be impossible to run all code paths.
🥲
there's definitely interesting stuff here, if we could limit the language (say you have a language that compiled to WebAssembly that is more amenable to static analysis) you could do this stuff!
yeah, a dream future runtime will be based on like WASM and we'd have the ability to plumb into the concepts of determinism in a more thorough way
right now, we're operating a little on the outside without full control/knowledge of the semantics. we'd love more, but it will take time to build "runtime 2.0". probably at least a year or two away
right! what I'm seeing is some sort of compile / build step that can let us know the surface area of the runtime being executed.
wondering if something like this can be achieved with babel / swc as well. so broadly maybe doing some sort of transpilation and somehow wiring into that process a check for whether any dependencies use Date 🤔
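A real version of this would walk the AST produced by Babel or swc. As a much cruder stand-in, the sketch below scans bundled source text with a regular expression. `findClockReads` and `DateUse` are hypothetical names, and the scan is both over-sensitive (it matches inside comments and strings) and under-sensitive (it misses aliases such as `const D = Date`).

```typescript
// Hypothetical build-time scan; a regex stand-in for a proper AST pass.
type DateUse = { line: number; text: string };

// Matches `new Date()` with no arguments and `Date.now(`.
const CLOCK_READ = /\bnew\s+Date\s*\(\s*\)|\bDate\s*\.\s*now\s*\(/;

// Return every line of the bundle that appears to read the current time.
function findClockReads(bundledSource: string): DateUse[] {
  const hits: DateUse[] = [];
  bundledSource.split("\n").forEach((text, index) => {
    if (CLOCK_READ.test(text)) {
      hits.push({ line: index + 1, text: text.trim() });
    }
  });
  return hits;
}
```

A build step could print these hits as warnings rather than failing, since a match in a dependency does not prove the query ever reaches that code.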
I wonder how far you could get with configuring in a query something that communicates “it’s ok if Date.now() returns the same value for X seconds” where you can specify X. Then it could be cached for that long, and you’d have control over the granularity. Sometimes the Date is incidental and sometimes it’s to grab recent data.
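One way to picture "the same value for X seconds" is rounding the timestamp down to the start of its X-second window. This is only a sketch of the arithmetic; `bucketedNow` is a hypothetical helper, not an existing Convex option.

```typescript
// Hypothetical helper: round a timestamp down to the start of its X-second window.
function bucketedNow(nowMs: number, granularitySeconds: number): number {
  if (!(granularitySeconds > 0)) {
    throw new RangeError("granularitySeconds must be positive");
  }
  const windowMs = granularitySeconds * 1000;
  return Math.floor(nowMs / windowMs) * windowMs;
}
```

With a 60-second granularity, every call made within the same minute sees the same value, so anything keyed on that value could be reused until the window rolls over.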
mm right right. that sounds like quite a lot of complexity though, i’m not fully sure what to expect if a real time system has some sort of caching as well! might lead to some hard to catch bugs, and hard to debug if we compound this with third party libraries using it.