erquhart•2y ago

Extended validation in the schema

The schema is an ideal place to express validation constraints, but it's pretty limited right now. I'm aware we can make zod work with convex, but the challenge is we then have to define loose validation with convex, and then extended validation separately with something else. I'm going to come up with some sort of pattern for my own use, but curious if there's been any thought on this with the team.

12 Replies

Michal Srb•2y ago

It should be possible to wrap ctx.db to perform validation on writes (or reads), beyond what the database does. See Ian's latest on this topic: https://stack.convex.dev/typescript-zod-function-validation#can-i-use-zod-to-define-my-database-types-too It's possible that in the future we'll add a layer that'll allow for this and similar needs like defaults and row/field-level security/authorization.
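A minimal sketch of the wrapping idea described above, assuming a simplified, hypothetical `Writer` interface rather than the real Convex `ctx.db` (whose methods are async). The names `Writer`, `Rule`, `withValidation`, and `memoryWriter` are invented for illustration only.

```typescript
// Hypothetical, simplified stand-in for a database writer; NOT the Convex API.
// (Convex's real ctx.db methods are async; this sketch is synchronous for brevity.)
type Doc = Record<string, unknown>;

interface Writer {
  insert(table: string, doc: Doc): string;
}

// A rule returns an error message, or null when the document is acceptable.
type Rule = (doc: Doc) => string | null;

// Wrap a writer so every insert runs the table's extra rules first.
function withValidation(inner: Writer, rules: Record<string, Rule[]>): Writer {
  return {
    insert(table, doc) {
      for (const rule of rules[table] ?? []) {
        const error = rule(doc);
        if (error !== null) {
          throw new Error(`${table}: ${error}`);
        }
      }
      return inner.insert(table, doc);
    },
  };
}

// In-memory writer used only to exercise the wrapper.
function memoryWriter(): Writer & { rows: Doc[] } {
  const rows: Doc[] = [];
  return {
    rows,
    insert(_table, doc) {
      rows.push(doc);
      return String(rows.length);
    },
  };
}

const store = memoryWriter();
const db = withValidation(store, {
  users: [
    (doc) =>
      typeof doc.email === "string" && doc.email.includes("@")
        ? null
        : "email must contain '@'",
  ],
});

// Passes the rule, so it reaches the underlying writer.
const id = db.insert("users", { email: "a@example.com" });
```

Function code would then call the wrapped `db` instead of the raw writer, so the extra rules cannot be skipped by accident. The same shape could wrap reads as well.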

erquhartOP•2y ago

I'm more looking at where the types for validation are set. I want to find a way to set them in line with the schema definition. I currently have to define a field as numeric in the schema, and then somewhere else define that it is limited to ten characters. I want to do those in the same place. I'm thinking about making a higher level schema where field definitions can be expanded, and then running that through a parser to generate the schema that convex needs. But wanted to see if the team had thoughts first.
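One way to picture the "higher level schema run through a parser" idea is the sketch below. It is dependency-free and every name in it (`RichField`, `RichSchema`, `toBaseSchema`, `validate`) is hypothetical; the base schema here is a plain object of type names, standing in for whatever the database layer actually needs.

```typescript
// Hypothetical rich field definitions: a base storage type plus extra constraints.
type RichField =
  | { type: "string"; maxLength?: number; pattern?: RegExp }
  | { type: "number"; min?: number; max?: number };

type RichSchema = Record<string, RichField>;

// The "loose" schema the database layer would see: base types only.
type BaseSchema = Record<string, "string" | "number">;

// Strip the extended constraints, keeping only the base type of each field.
function toBaseSchema(schema: RichSchema): BaseSchema {
  const base: BaseSchema = {};
  for (const [name, field] of Object.entries(schema)) {
    base[name] = field.type;
  }
  return base;
}

// Check a document against the extended constraints; returns error messages.
function validate(schema: RichSchema, doc: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const [name, field] of Object.entries(schema)) {
    const value = doc[name];
    if (field.type === "string") {
      if (typeof value !== "string") {
        errors.push(`${name}: expected string`);
        continue;
      }
      if (field.maxLength !== undefined && value.length > field.maxLength) {
        errors.push(`${name}: longer than ${field.maxLength} characters`);
      }
      if (field.pattern !== undefined && !field.pattern.test(value)) {
        errors.push(`${name}: does not match pattern`);
      }
    } else {
      if (typeof value !== "number") {
        errors.push(`${name}: expected number`);
        continue;
      }
      if (field.min !== undefined && value < field.min) {
        errors.push(`${name}: below minimum ${field.min}`);
      }
      if (field.max !== undefined && value > field.max) {
        errors.push(`${name}: above maximum ${field.max}`);
      }
    }
  }
  return errors;
}

// Base type and extended constraint are declared in the same place.
const accountSchema: RichSchema = {
  accountNumber: { type: "string", maxLength: 10, pattern: /^[0-9]+$/ },
  balance: { type: "number", min: 0 },
};

const baseSchema = toBaseSchema(accountSchema);
```

The single `accountSchema` definition feeds both the loose schema and the extended checks, which is the "define it once" property being asked for.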

jamwt•2y ago

I'll expand on this a little bit. it's another instance of us having "compressed" the database layer and the user ergonomics this early, just because of how much time it takes to build each. we've talked about ideas for a higher-level ORM-type thing with more powerful validations that would go on top of the schema definition.
the reason we'd always have something like the current schema definition is that we just happen to be using typescript right now to create the canonical definition of the types actually at rest in the database. so I'd view this other eventual layer as a great way to make everything really well conforming in, say, the JS/TS environment, which probably includes all the UDFs right now (and for a long time to come).
the reason we also need a baseline definition of the database types is that we run into things like export/import, integration with SQL, and streaming jobs out to other services, where there is a need for a pretty fundamental definition of the column types. in all of these other places, more sophisticated validators may not run because of a lower-level interface with some other more basic system, like a data warehouse or DuckDB or something like that.
so there still is a necessary distinction we need to maintain between clarity on what actually is at rest in the database, and the code we run in, say, the UDF environment to make it more likely those values conform to certain constraints.
so, when it comes to what the team is actually doing with higher-level validators, most of the active work right now is around @ian's work with zod.

erquhartOP•2y ago

Thanks for the breakdown! Keeping a bottom layer of universal types definitely makes sense, I'll deal with the separation of definitions for now.

jamwt•2y ago

cool. we'd love to see what you come up with, because we definitely know people want and need more power here. also, @RJ's work with Effect + convex is maybe yet another way to have more powerful/opinionated value specifications that layer over our schema definition.

ian•2y ago

I'm thinking about making a higher level schema where field definitions can be expanded, and then running that through a parser to generate the schema that convex needs.

The Zod post didn't call it out very loudly, but the Zod helpers already come with two functions: zodToConvex(z.*) and zodToConvexFields({ fieldName: z.*, ... }). Both recursively produce Convex validators (v.*). So you can define your higher-level schema in Zod and generate the Convex schema for defineTable etc. I'd like to make a Zod helper that turns a Zod schema ({ field: z.*, ... }) into both tables and an associated reader/writer wrapper that runs the right parser on the right table, so you could go full-Zod without a lot of manual schlep. I don't want to put something out prematurely though, so I won't rush it out in the next couple of weeks.

erquhartOP•2y ago

@ian oh nice! So they're able to convert a more complex parser, like z.string().email(), into a simple v.string() for the convex schema?

ian•2y ago

Yup! Check this out: https://github.com/get-convex/convex-helpers/blob/main/convex/zodExample.ts#L30 Finer-grained string types, optional types, branded, readonly, nullable, strict, discriminated unions, effects & pipelines, etc. I even have an example turning a string -> Date -> string at the bottom of the file (where the Date type is in the function but only string on the wire)

erquhartOP•2y ago

Awesome, definitely using this. Thank you!

RJ•2y ago

Aye, this is exactly what I'm working on, but with the Effect ecosystem's Schema library rather than Zod. I also have a working Effect Schema to Convex Validator compiler (https://github.com/rjdellecese/effect-convex/blob/755f46db3012940f7d242ea6f55c5210352d23b4/src/schema-to-validator-compiler.ts), which exposes functions that operate exactly how Ian's zodToConvex and zodToConvexFields do. I also want to wrap other Convex APIs such that you can define your schema and read/write (decode/encode) data from your database in terms of a richer schema language (Schema in my case). I think it will be a little easier with Schema, because each Schema contains a decoder and an encoder (the type is actually Schema<From = To, To>), whereas in Zod you'll have to define a separate Zod decoder and encoder for each field.
My goals with this Effect/Convex library are broader than just this (see https://discord.com/channels/1019350475847499849/1019350478817079338/1157131989825093772 for more), but I think this is the highest-value first feature! It would be really neat to see this functioning for Zod, too @ian. I'm sure you're much more familiar with the Convex JS codebase than I am, but if you ever want to compare notes on your implementation of this for Zod, let me know.
This all makes me think of this cool diagram (figure 2) from some UW CS course notes (https://courses.cs.washington.edu/courses/cse341/04wi/lectures/13-dynamic-vs-static-types.html), and this awesome talk by Runar Bjarnason (https://www.youtube.com/watch?v=GqmsQeSzMdw), in which he coined (or at least popularized?) the phrase "Constraints Liberate, Liberties Constrain". Any given user of Convex has a lot of control over the context in which they need or want to interface with their Convex database. I might choose to only interface with (or at least write to) Convex via a TypeScript application. This gives me a lot of power, if I use some nice high-level abstractions like Ian's Zod helpers, or my Effect/Schema library, to ensure that my data obeys certain rules/is correct in great detail.
But the more you try to interface with other systems/languages, the more you need to relax the constraints on your data in order to have a constraint-definition language that actually applies its constraints to all of the systems/languages that you want to support (assuming, of course, that that's your goal). This is one reason, I think, why it's tricky for Convex to take their current Validator approach much further than it has been taken so far.
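The point about a single value carrying both a decoder and an encoder can be illustrated roughly as follows. This is a dependency-free sketch: the `Codec` interface and the `isoDate` value are hypothetical names, not the Effect Schema or Zod API. It mirrors the string -> Date -> string example mentioned earlier in the thread, where the function sees a `Date` but only a string is stored.

```typescript
// Hypothetical codec type: one value carries both directions.
// This is NOT the Effect Schema or Zod API, only an illustration of the idea.
interface Codec<Stored, Rich> {
  decode(stored: Stored): Rich; // database representation -> application value
  encode(rich: Rich): Stored; // application value -> database representation
}

// Stored as an ISO 8601 string, used in application code as a Date.
const isoDate: Codec<string, Date> = {
  decode(stored) {
    const date = new Date(stored);
    if (Number.isNaN(date.getTime())) {
      throw new Error(`invalid ISO date: ${stored}`);
    }
    return date;
  },
  encode(rich) {
    return rich.toISOString();
  },
};

// Application-side value with a rich type.
const event = { title: "launch", at: new Date("2024-01-02T03:04:05.000Z") };

// What would be written: only base types (strings) remain.
const stored = { title: event.title, at: isoDate.encode(event.at) };

// What a reader wrapper would hand back to application code.
const roundTripped = isoDate.decode(stored.at);
```

Because both directions live on one value, a read wrapper and a write wrapper can share the same field definition instead of keeping two parsers in sync.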

jamwt•2y ago

yep, and that's exactly why we haven't. even if more powerful things layer on top, we also want to make sure we keep available and unobscured a bare-bones representation of exactly what's in the database that will be universally recognized by all entities interacting with the database over the next 10+ years of your company when you have tons of systems and millions of lines of code 😛

jamwt•2y ago

it's possible it won't always be typescript that describes this (or, even, is maybe the only way to describe it?) but it's definitely the way for now. a good parallel is pulumi (https://www.pulumi.com/) supporting python, typescript, go, etc. it's more accurate to think that right now Convex schemas are using typescript as a DSL to build a model representation rather than for "running code"
