allen · 2y ago

Looking into the Airbyte integration

Looking into the Airbyte integration this morning, and it doesn't appear as a supported connector from the dashboard. Wondering if this is something the Convex team manages via Airbyte or if I should reach out to them directly.
allen (OP) · 2y ago
After searching their Slack, it appears that perhaps it's only currently supported on the OSS installs and not the cloud offering. I also see some chatter in that Slack about having Convex be a destination as well as a source. What is the ETA on that?
Indy · 2y ago
Hi Allen, we're not in their dashboard. Convex as a source is available on the OSS side. https://docs.convex.dev/using/integrations/airbyte
Indy · 2y ago
We wouldn't mind you nudging them as a user to get on their dashboard. 😉 Convex as a destination is still an open Pull Request on the airbyte project. We're still waiting on them to review it. For future product planning for us: what integrations matter for you?
allen (OP) · 2y ago
Honestly, I'm not strong on the data eng side and my understanding of that stack is limited. I only pursued Airbyte because it looked turnkey from your docs. One of the appeals of Supabase was the psql backend, as it lent itself to a lot of existing integration points, as well as being able to do some more complex querying within Supabase itself. I'm about 90% decided to use convex over supabase, but the querying limitations definitely have given me pause. Seems to me if I could get data streaming into BigQuery, RedShift or a similar warehouse, that would unblock me for the most part, as it would open up a massive world of tooling. Pushing aggregated views of data back to supabase so it could be consumed by my app would be ideal as well. Having the ability to do basic partial text search and get counts back on queries would simplify things as well.
jamwt · 2y ago
hey @allen! the good news is all this is underway right now with the team. I'll address a few different points here, but feel free to ask follow ups, and/or I'm happy to jump on a call.
1. I'm meeting with the airbyte PM in charge of integrations soon. we're working on getting into the cloud product for egress + ingress. I hope that will be solved soon so you don't have to run your own instance.
2. we've discussed an "out of the box" simple OLAP-type thing, and this would probably amount to us running airbyte + clickhouse or something for pro accounts. but the complexity would be managed by convex, so you'd have a slightly-delayed, read-only, but highly performant SQL engine to do whatever analysis you want on your convex data without having to set up your own system. we don't have a timeline on this though; we're still weighing it against other priorities. this would be labeled something like "Convex OLAP" and would be useful for sure so you don't have to string together your own airbyte + SQL solution. (we'd probably solve the "full SQL" situation this way so we can keep the OLTP core very fast and available, as opposed to directly exposing SQL on the convex data.)
3. we have an in-house search system that's close to beta, but we're still discussing timelines on releasing it. it may need a bit more work. this would let you do full text searches on in-convex values.
jamwt · 2y ago
3a. in the long run, our intention is that the "industrial grade" solution for search is, once again, predicated on smooth airbyte integration, which is why we've invested in that early. namely, you can do airbyte -> elasticsearch (via this connector: https://docs.airbyte.com/integrations/destinations/elasticsearch/ )
jamwt · 2y ago
> Seems to me if I could get data streaming into BigQuery, RedShift or a similar warehouse, that would unblock me for the most part, as it would open up a massive world of tooling.
Definitely 💯. This is the promise of airbyte for something like Convex: you can get your data into those systems, or PostgreSQL, or any other place, and use your Convex data basically anywhere.
allen (OP) · 2y ago
Thanks for the insights, @jamwt. Seems like you are aware of the gaps and filling them accordingly. Let me know if I can beta anything and offer feedback. I'll be looking to go to market in the next ~60 days. Anything coming online in that timeframe, even as a preview release?
jamwt · 2y ago
on the airbyte front, I just got this intro from the airbyte CEO yesterday, so I'll follow up with you when I have more info on the timeline to get into the cloud product and to land the destination connector.
on search, I'll defer to @james, who was chatting with the team about the state of built-in search yesterday.
and thanks for the offer about feedback! definitely, keep it coming 😄
allen (OP) · 2y ago
in-convex sql replica sounds very appealing.
The Elasticsearch approach makes sense at a high level, I'm just unsure how it would practically come together... Execute an action that returns document IDs, which then executes a query to fetch the documents in a reactive state?
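The flow Allen asks about can be sketched in two steps: an external engine (for example Elasticsearch, fed through Airbyte) returns a ranked list of document IDs, and a lookup against the primary data loads those documents. The second step has two details worth handling: keep the engine's ranking order, and skip IDs the search index still holds but that have since been deleted, because the index syncs behind the primary store. This is a framework-free sketch, not Convex API: the `Doc` type, `hydrateRanked`, and the `Map` standing in for a Convex table are hypothetical. In a Convex app, the lookup would presumably be a query function that takes the IDs as an argument, which is where the reactivity would come from.

```typescript
// Hypothetical document shape; a Convex table row would also carry an _id.
interface Doc {
  _id: string;
  title: string;
}

// Resolve IDs ranked by an external search engine into documents,
// preserving the engine's order. IDs missing from the store (for example,
// deleted after the search index last synced) are skipped, not errors.
function hydrateRanked(rankedIds: string[], store: Map<string, Doc>): Doc[] {
  const results: Doc[] = [];
  for (const id of rankedIds) {
    const doc = store.get(id);
    if (doc !== undefined) {
      results.push(doc);
    }
  }
  return results;
}

// Usage: "c" was returned by the search engine but no longer exists.
const docsById = new Map<string, Doc>([
  ["a", { _id: "a", title: "Intro to Convex" }],
  ["b", { _id: "b", title: "Airbyte setup" }],
]);
const ranked = hydrateRanked(["b", "c", "a"], docsById);
console.log(ranked.map((d) => d.title)); // "Airbyte setup", then "Intro to Convex"
```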
jamwt · 2y ago
yeah, so there's a longer consideration here because search means a lot of different things. part of why we've invested a bit in 1st party search is having simple "application search" just work out of the box and have a consistent + subscription-capable relationship to everything else in convex. for many apps, this is all they need from search; it's close to what postgres calls search.
and then for other things, people want sophisticated stemming, ranking, multiple languages / locales, large documents, etc etc. something closer to Algolia or elastic. and it's unlikely convex would build that in house, as opposed to recommending solving those kinds of use cases with an integration.
allen (OP) · 2y ago
That makes sense. Basic fuzzy field search would solve for a lot, leaving the heavier search functionality to ES. What about a use case of say Posts that have a relationship to Tags and I want to get the top 10 tags by popularity (count of posts that have those tags). Is this something that your in-convex search would support? Seems like a basic query, but right now I would be looking at a complex data pipeline to try and get that result set back to my app. (or trying to maintain some TopTags table that increments/decrements in sync with mutations)
jamwt · 2y ago
that particular case may best be served with just maintaining the counts as part of the mutation, yep.
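jamwt's suggestion, maintaining the counts inside the same mutations that create and delete posts, might look like the sketch below. Each write updates the post and its tags' counters together, so "top 10 tags" becomes a sort over a small counter table rather than an aggregation pipeline. This is a framework-free illustration, not Convex code: `PostStore`, `addPost`, `deletePost`, and `topTags` are hypothetical names, and the two `Map`s stand in for a posts table and the `TopTags`-style table Allen mentions. Because Convex mutations run as transactions, putting both writes in one mutation is what should keep the counters from drifting.

```typescript
// Hypothetical post shape; tags are plain strings for simplicity.
interface Post {
  id: string;
  tags: string[];
}

// In-memory stand-in for a `posts` table plus a per-tag counter table.
// In Convex, each method's writes would live in a single mutation, which
// runs as one transaction, so counts and posts are updated together.
class PostStore {
  private posts = new Map<string, Post>();
  private tagCounts = new Map<string, number>();

  // Insert a post and increment each distinct tag's counter.
  // Re-adding an existing id replaces it, decrementing its old tags first.
  addPost(post: Post): void {
    if (this.posts.has(post.id)) this.deletePost(post.id);
    this.posts.set(post.id, post);
    for (const tag of Array.from(new Set(post.tags))) {
      this.tagCounts.set(tag, (this.tagCounts.get(tag) ?? 0) + 1);
    }
  }

  // Delete a post (if present) and decrement its tags,
  // dropping counters that reach zero.
  deletePost(id: string): void {
    const post = this.posts.get(id);
    if (post === undefined) return;
    this.posts.delete(id);
    for (const tag of Array.from(new Set(post.tags))) {
      const next = (this.tagCounts.get(tag) ?? 0) - 1;
      if (next > 0) {
        this.tagCounts.set(tag, next);
      } else {
        this.tagCounts.delete(tag);
      }
    }
  }

  // Read side: top n tags by post count, ties broken alphabetically.
  topTags(n: number): Array<[string, number]> {
    return Array.from(this.tagCounts.entries())
      .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
      .slice(0, n);
  }
}

// Usage
const demo = new PostStore();
demo.addPost({ id: "p1", tags: ["convex", "airbyte"] });
demo.addPost({ id: "p2", tags: ["convex"] });
demo.addPost({ id: "p3", tags: ["convex", "search"] });
demo.deletePost("p3");
console.log(demo.topTags(2)); // convex (2 posts), then airbyte (1 post)
```

Sorting every counter on read is fine for a modest number of tags; with many tags, storing the count on an indexed field and reading the first rows in descending order would likely scale better.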
allen (OP) · 2y ago
I got the Airbyte connection set up in a local container between Convex and BigQuery; however, the sync is failing with "Failure Origin: normalization, Message: Something went wrong during normalization". The tables show up in the BigQuery destination, but with no data.
james · 2y ago
re. search, we have an internal implementation that allows in-convex search in a transactionally consistent way, and it will likely launch in the near future. we have designs for fuzzy search and prefix matching and could add those soon after. if something like this meets your needs, that'd be great. there will still be a gap in functionality between built-in search and elastic, so there will be use cases that fall outside our feature set and will just want to stream to elastic search.
we haven't directly tested Convex streaming into BigQuery. one would think it'd work, since the airbyte destination connector on the BigQuery side should take care of it, but we can test this.
allen (OP) · 2y ago
Thanks @james. Let me know if the log dump from Airbyte would help. Definitely seems to be something on the Convex side:
Sync worker failed.
No properties node in stream schema
Source did not output any state messages
State capture: No state retained.
Indy · 2y ago
Thanks Allen! We're looking at this now. We'll update you as soon as we know what's going on.
jamwt · 2y ago
oh, one idea @allen ... is this a pro account? I think airbyte egress is pro accounts only
allen (OP) · 2y ago
Ah, it is not. I saw that in the docs, but everything seemed to wire up fine.
jamwt · 2y ago
yeah, granted, this is probably not the best error message to clarify if that's indeed what's going on. @Indy @Emma -- maybe that's the issue here?
Indy · 2y ago
Yep definitely not the right message if that's the issue (it probably is, but we'll verify for sure).
Emma · 2y ago
hey @allen! I'm looking into this - would you mind sending me the logs from the failed sync?
allen (OP) · 2y ago
DMed you
allen (OP) · 2y ago
Also, this seems to be related, since it started after my sync attempts: my whole Airbyte install is nuked, with every route serving the error shown in the attached screenshot.
Emma · 2y ago
@allen and I resolved this in DMs, but the summary is that:
1. Sync won't work if you don't have a pro account. This is stated in docs but it's confusing that it partially succeeds and doesn't have a good error message in the airbyte logs. (we can improve this!)
2. The source schema was stale, so Allen continued to get normalization errors after upgrading to a pro plan. Remember to refresh your source schema when it changes!
Helpful debugging tips:
- deselecting tables helped allen locate the table that wasn't normalizing successfully
allen (OP) · 2y ago
Thanks so much for your help @Emma !
Emma · 2y ago
@allen feel free to add anything I missed here! you're welcome 🙂
Indy · 2y ago
Yey! Glad to hear it's up and running!
Emma · 2y ago
@allen btw in case you run into more convex-airbyte errors, I opened this PR to make them visible. Hopefully they can merge this soon and you can update! https://github.com/airbytehq/airbyte/pull/23797
