Convex <> Airbyte <> Google sheets
I recently added this integration to sync data to google sheets.
It was working fine with one issue: row updates were not pushed through.
I then changed the airbyte connection to "full refresh & overwrite", which required a reset. After reset, only a portion of my convex table is being sync'ed. Out of 602 rows, only 129 are being sync'ed.
I wrote Airbyte a ticket about this, but wanted to ask here as well if there is a known issue causing my data replication to be incomplete.
14 Replies
hi! this is not a known issue, although i'm not really surprised. a change to Airbyte framework 6 months ago had a similar effect (https://airbytehq.slack.com/archives/C027KKE4BCZ/p1686329017267509). i thought we had fixed it, but it's possible it wasn't fully fixed, or a new bug has been introduced.
Have you considered using Fivetran? Convex users have had better results from our Fivetran integration than Airbyte.
I'll try to repro the issue with Airbyte and let you know how it goes. thanks for reporting!
Thank you. Does Convex or Airbyte maintain this integration? Their support pointed to this line as the potential cause: https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-convex/source_convex/source.py#L127
Convex wrote the code for the connector and somewhat maintains it (we have to send changes to Airbyte for code review, which can take several months), but Airbyte maintains the framework. the state_checkpoint_interval does sound relevant since only 129 rows are being sync'd.
I have been able to repro -- it seems airbyte only syncs 128 documents from convex before stopping. Do you happen to have a link to your conversation with Airbyte? If not, I can start a new one
i found the issue in the Convex-maintained code 😢 . it may take a while to get the fix reviewed, but the issue does not affect "incremental" syncs, so i recommend using that if you can, or using Fivetran.
Hi @lee . Glad you found the issue. It is causing our OPS team a lot of pain because we can't show them updated user data.
Unfortunately fivetran does not seem to have a google sheets destination.
Using incremental syncs will not work because they do not seem to carry out "updates": they only append new records, without updating records that were synced previously.
some airbyte connectors support "updates" with the Append+Deduped mode https://docs.airbyte.com/using-airbyte/core-concepts/sync-modes/incremental-append-deduped
but i'm not sure if google sheets is one of them.
if you're running airbyte open source, you can try patching the fix https://github.com/airbytehq/airbyte/pull/33431 . unfortunately I don't know of a way to get this change into airbyte cloud quickly.
GitHub PR: [Convex source] fix bug where full_refresh stops after one page by ...
The Convex source connector has a bug where SyncMode == full_refresh causes it to stop the sync after a single page of 128 results. This PR fixes the bug an...
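To make the symptom concrete, here is a minimal sketch of a paginated read, assuming a page size of 128 (from the PR description) and a 602-row table (from the original report). This is not the connector's actual code: `fetch_page`, `read_all`, and the in-memory `TABLE` are hypothetical stand-ins, used only to show how a loop that exits after the first page returns 128 rows instead of the full table.

```python
from typing import List, Tuple

PAGE_SIZE = 128   # page size mentioned in the PR description
TOTAL_ROWS = 602  # table size from the original report

# Hypothetical in-memory stand-in for the source table.
TABLE = [{"_id": i} for i in range(TOTAL_ROWS)]


def fetch_page(cursor: int) -> Tuple[List[dict], int, bool]:
    """Hypothetical paginated read: returns (rows, next_cursor, has_more)."""
    rows = TABLE[cursor:cursor + PAGE_SIZE]
    next_cursor = cursor + len(rows)
    return rows, next_cursor, next_cursor < len(TABLE)


def read_all(stop_after_first_page: bool = False) -> List[dict]:
    """Read pages until the source reports no more data.

    stop_after_first_page=True imitates the reported symptom,
    not the real cause inside the connector.
    """
    out: List[dict] = []
    cursor = 0
    has_more = True
    while has_more:
        rows, cursor, has_more = fetch_page(cursor)
        out.extend(rows)
        if stop_after_first_page:
            break
    return out


full = read_all()
truncated = read_all(stop_after_first_page=True)
print(len(full), len(truncated))
```

A quick sanity check after any full refresh is to compare the destination row count against the source table's row count; a count equal to exactly one page is a strong hint that pagination stopped early.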
Is it pending review from them? I can highlight to them the urgency of this review if that is the case.
Just heard from airbyte they'll review it tomorrow
Yes it's pending review. Thanks for helping!
Absolutely, thank you.
By the way, it would be great if there were another option similar to append that also updates records that were already synced if they have changed. Is that on the roadmap?
I don't know how lookups work in spreadsheets, so I understand that may ultimately be more computationally expensive, but if spreadsheets do have an optimized lookup, it would be much better overall (I think)
Some connectors support "Append + Deduped" which sounds like what you're describing, but both the source and destination need to support it. Convex as a source (and as a destination) supports it, but it seems like Google Sheets as a destination does not. I would reach out to the maintainer of the google sheets connector
Got it. Thanks!
I see the fix has now been merged, thank you! And just to be clear - append + dedupe does not update old records if they change, correct?
... so if I want that I should always use the overwrite method... ?
append+dedupe does update old records if they change https://docs.airbyte.com/using-airbyte/core-concepts/sync-modes/incremental-append-deduped
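As a rough illustration of the difference between the two modes, here is a sketch assuming each record has a primary key and a cursor value that increases when the record changes. The field names `id` and `updated_at` and the functions `append` and `append_deduped` are hypothetical, chosen for the example; this is not Airbyte's implementation, only the general idea of keeping the latest version per key.

```python
from typing import Dict, List

Record = Dict[str, object]


def append(dest: List[Record], batch: List[Record]) -> List[Record]:
    """Append mode: every synced version is kept as its own row."""
    return dest + batch


def append_deduped(
    dest: List[Record],
    batch: List[Record],
    key: str = "id",             # hypothetical primary key field
    cursor: str = "updated_at",  # hypothetical cursor field
) -> List[Record]:
    """Append + dedup sketch: keep only the newest version per primary key."""
    latest: Dict[object, Record] = {}
    for rec in dest + batch:
        current = latest.get(rec[key])
        if current is None or rec[cursor] >= current[cursor]:
            latest[rec[key]] = rec
    return list(latest.values())


first_sync = [
    {"id": "a", "updated_at": 1, "name": "Ann"},
    {"id": "b", "updated_at": 1, "name": "Bob"},
]
second_sync = [{"id": "a", "updated_at": 2, "name": "Anna"}]

appended = append(first_sync, second_sync)
deduped = append_deduped(first_sync, second_sync)
print(len(appended), len(deduped))
```

In this sketch the appended result holds both versions of record "a", while the deduped result holds one row per key with the newer values. Whether a given destination actually behaves this way depends on that destination supporting the mode.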
That's great that the fix has been merged! I'm not sure how long it takes for merged changes to make it into the released product. Last i checked, it took 2-6 weeks
Well that's interesting, because some of the records were not updated in append + dedup. There were optional columns that didn't sync once they were added.