dumb question on how to allow users to do analytics
hi team! im not there yet but im curious how can i offer analytics for my users. let's assume my user wants to click a button to generate a chart of the revenue per month for the past 12 months.
after reading some threads, i see folks here talking bout airbyte, dagster, duckdb, etc..
what's the correct mental model to think about this?
is it the case that i stream data out of convex via airbyte (cloud) to another server that hosts the duckdb? which means i can then set up my convex service to make an API call to the duckdb service for the revenue data?
pls feel free to correct my wrong understanding. thanks!
4 Replies
i havent done any analytics related stuff before. pls forgive me for the dumb question 😂
https://discord.com/channels/1019350475847499849/1115667234308767855/1116776181849665546 why is an S3 data lake needed here? cant we just fivetran -> duckdb?
last question first.. fivetran doesn't directly support a duckdb integration. duckdb runs on top of a data lake like parquet on S3, ergo the replication to an S3 bucket step is the right next step
you can just point duckdb to that bucket and run it right away
so, running aggregates... if it's a lot of convex data you're pulling together
(1) now, if it's a ton of open-ended ad-hoc querying you want to support, you probably need to stream it to a SQL engine
(2) in the future, we plan on offering an on-platform OLAP type interface, where you can write SQL SELECT statements in actions to run expensive queries over a slightly-stale replica of your data without needing an external system. we don't have a timeline for shipping this yet, however
(3) now, if it's a pretty fixed set of things you're gonna do, and you don't want to involve an external systems, you can just break the aggregation up into several
runQuery
statement in an action, or even a paginated query in an action that you just walk and aggregate data a page at a time
this is more work than SQL for a query like this, but it would work fine and save you integrating with a data warehouse if you don't want to bother with itamazing! thanks for the detailed explanation 🙂
quick clarification, "stream it to a SQL engine" alludes to having a different service that hosts duckdb, right? then i'd just API call the service to get the piece of data
Yeah. Or snowflake, Postgres. Any fivetran destination