winsoroaks
winsoroaks•12mo ago

dumb question on how to allow users to do analytics

hi team! im not there yet but im curious how can i offer analytics for my users. let's assume my user wants to click a button to generate a chart of the revenue per month for the past 12 months. after reading some threads, i see folks here talking bout airbyte, dagster, duckdb, etc.. what's the correct mental model to think about this? is it the case that i stream data out of convex via airbyte (cloud) to another server that hosts the duckdb? which means i can then set up my convex service to make an API call to the duckdb service for the revenue data? pls feel free to correct my wrong understanding. thanks!
4 Replies
winsoroaks
winsoroaksOP•12mo ago
i havent done any analytics related stuff before. pls forgive me for the dumb question 😂 https://discord.com/channels/1019350475847499849/1115667234308767855/1116776181849665546 why is an S3 data lake needed here? cant we just fivetran -> duckdb?
jamwt
jamwt•12mo ago
last question first.. fivetran doesn't directly support a duckdb integration. duckdb runs on top of a data lake like parquet on S3, ergo the replication to an S3 bucket step is the right next step you can just point duckdb to that bucket and run it right away so, running aggregates... if it's a lot of convex data you're pulling together (1) now, if it's a ton of open-ended ad-hoc querying you want to support, you probably need to stream it to a SQL engine (2) in the future, we plan on offering an on-platform OLAP type interface, where you can write SQL SELECT statements in actions to run expensive queries over a slightly-stale replica of your data without needing an external system. we don't have a timeline for shipping this yet, however (3) now, if it's a pretty fixed set of things you're gonna do, and you don't want to involve an external systems, you can just break the aggregation up into several runQuery statement in an action, or even a paginated query in an action that you just walk and aggregate data a page at a time this is more work than SQL for a query like this, but it would work fine and save you integrating with a data warehouse if you don't want to bother with it
winsoroaks
winsoroaksOP•12mo ago
amazing! thanks for the detailed explanation 🙂 quick clarification, "stream it to a SQL engine" alludes to having a different service that hosts duckdb, right? then i'd just API call the service to get the piece of data
jamwt
jamwt•12mo ago
Yeah. Or snowflake, Postgres. Any fivetran destination

Did you find this page helpful?