pearcy · 2mo ago

External Data Import to Convex with Dagster

I'm running into issues when using the Convex Streaming Import API (/api/streaming_import/import_airbyte_records) to import data with relationships. I'm using Dagster to orchestrate the process, not Airbyte. Specifically, I'm trying to upsert records into four tables (tools, articleTools, materials, and articleMaterials) from one JSON file.

1. Initial Setup:
- I'm using Dagster to manage the data import.
- I'm calling the Convex Streaming Import API directly from a Python client.
- I have a Convex schema with tables for articles, tools, articleTools, materials, and articleMaterials.
- The relationships are handled through the articleId field on the articleTools and articleMaterials tables.
- I have a function that creates the article first, and it works correctly.

2. Problem:
I successfully upload the base article record, but I get errors when importing tools and materials with their relationships to articles. The errors have evolved during debugging; the latest is IndexNotFoundError: Index tools._by_airbyte_primary_key not found.

3. Debugging Attempts:
Initially I received "code":"MissingStream" errors, which I resolved by providing table schemas in the payload. I then received BadJsonBody errors, which I resolved by:
- Ensuring primaryKey is an array of strings.
- Adding a jsonSchema with properties to correctly define the types of the fields.
- Ensuring primaryKey in the schema is also defined as a list.
- Finally, ensuring primaryKey is an array of arrays.

After all these fixes, I'm back to getting IndexNotFoundError: Index tools._by_airbyte_primary_key not found.
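For anyone reproducing the BadJsonBody fixes listed in this post, here is a minimal sketch of a local pre-flight check. It only encodes the shape the poster says ended up being accepted (primaryKey as an array of arrays of strings, and a jsonSchema with properties). The function name check_table_spec and the per-table spec dictionary are assumptions made for illustration, not part of any documented Convex API.

```python
from typing import Any, List, Mapping


def check_table_spec(name: str, spec: Mapping[str, Any]) -> List[str]:
    """Return a list of problems found in one table's spec (empty if none).

    Hypothetical helper: the rules below come from the fixes described in
    the post, not from official API documentation.
    """
    problems: List[str] = []

    # primaryKey should be an array of arrays of strings, e.g. [["id"]].
    primary_key = spec.get("primaryKey")
    is_list_of_paths = isinstance(primary_key, list) and all(
        isinstance(path, list)
        and bool(path)
        and all(isinstance(part, str) for part in path)
        for path in primary_key
    )
    if not is_list_of_paths:
        problems.append(f"{name}: primaryKey must be an array of arrays of strings")

    # jsonSchema should be an object that carries a properties object.
    schema = spec.get("jsonSchema")
    if not isinstance(schema, dict) or not isinstance(schema.get("properties"), dict):
        problems.append(f"{name}: jsonSchema must be an object with a properties object")

    return problems
```

A spec that passes this check can still hit the IndexNotFoundError described above; the check only catches the payload-shape mistakes before a request is sent.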
5 Replies
pearcy (OP) · 2mo ago
4. Current status:
The base article record is being created correctly using the /api/streaming_import/import_airbyte_records endpoint. We have confirmed that the BadJsonBody errors are no longer an issue because we are sending the correct payload. We are now receiving IndexNotFoundError: Index tools._by_airbyte_primary_key not found, which suggests the streaming import endpoint expects a specific index named _by_airbyte_primary_key on the table.

Questions:
a. Is Airbyte a hard dependency for using the /api/streaming_import/import_airbyte_records endpoint, or can clients like Dagster connect directly and use this API?
b. It's peculiar that the article records are successfully upserted into the articles table using the same API endpoint, but the associated tools, articleTools, materials, and articleMaterials records fail with an index error. Is a different behavior or configuration expected for related tables?
c. Is it necessary to send all records in a single request? Could the issue be related to the fact that I'm performing a staged import?
   1. First import articles.
   2. Then import tools with article references.
   3. Finally import materials with article references.
   Is there a correct way to handle related data/foreign keys with streaming import?

I initially attempted a single-stage import but encountered errors, which led me to break the process into the current staged approach. I can successfully upsert this same JSON data using my React Native/Convex application, but I am aiming for a more scalable, robust solution using the streaming import API within my data pipeline, orchestrated by Dagster. Any suggestions would be greatly appreciated.
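The staged approach in question c can be sketched as a small planning step that runs before any request is sent: group records by table in parent-first order, then flag any articleId that does not match an article in the same batch. This is only a sketch under assumptions. The record shape ({"table": ..., "data": ...}), the "id" field on articles, and both function names are hypothetical, and the thread does not establish whether the endpoint needs staging at all.

```python
from typing import Any, Iterable, List, Mapping, Tuple

# Parent tables come before the tables that reference them.
STAGES = ["articles", "tools", "articleTools", "materials", "articleMaterials"]

Stage = Tuple[str, List[Mapping[str, Any]]]


def plan_stages(records: Iterable[Mapping[str, Any]]) -> List[Stage]:
    """Group records by table and return the non-empty groups in STAGES order.

    Raises KeyError for a table name that is not listed in STAGES.
    """
    by_table = {name: [] for name in STAGES}
    for record in records:
        by_table[record["table"]].append(record["data"])
    return [(name, by_table[name]) for name in STAGES if by_table[name]]


def dangling_article_refs(stages: List[Stage]) -> List[Tuple[str, Any]]:
    """List (table, articleId) pairs whose article is not in this batch."""
    known_articles = set()
    dangling = []
    for table, rows in stages:
        for row in rows:
            if table == "articles":
                known_articles.add(row["id"])  # assumed key field on articles
            elif "articleId" in row and row["articleId"] not in known_articles:
                dangling.append((table, row["articleId"]))
    return dangling
```

Each stage returned by plan_stages could then be sent as its own request, with the import stopped early if dangling_article_refs returns anything.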
ballingt · 2mo ago
What does "directly calling the streaming import API from a Python client" mean? A Python Airbyte client? A Python Dagster client? We've only tested these APIs with Airbyte, not by calling them directly without Airbyte. Re the BadJsonBody errors, it sounds like the usability isn't great without Airbyte. If so, are you maybe looking for a different streaming import API? I.e., do you have a feature request for a different API, or might it make sense to write your own HTTP endpoints for your deployment for what you need here?
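If the "write your own HTTP endpoints" route is taken, the pipeline side could look roughly like the sketch below: split rows into batches and build one JSON POST per batch. Everything specific here is an assumption for illustration: the URL, the {"table": ..., "rows": [...]} body shape, and the function name are hypothetical, and the matching endpoint would still have to be written on the Convex deployment.

```python
import json
from typing import Any, List, Mapping, Sequence
from urllib.request import Request


def build_batch_requests(
    url: str,
    table: str,
    rows: Sequence[Mapping[str, Any]],
    batch_size: int = 100,
) -> List[Request]:
    """Build one POST request per batch of rows without sending anything."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    requests = []
    for start in range(0, len(rows), batch_size):
        # Hypothetical body shape; a custom endpoint defines its own contract.
        body = {"table": table, "rows": list(rows[start:start + batch_size])}
        requests.append(
            Request(
                url,
                data=json.dumps(body).encode("utf-8"),
                headers={"Content-Type": "application/json"},
                method="POST",
            )
        )
    return requests
```

Each returned request can be sent with urllib.request.urlopen, for example from inside a Dagster op, so that a failed batch can be retried on its own.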
pearcy (OP) · 2mo ago
Thanks Tom, I started with Airbyte but got errors, so I tried connecting with Dagster directly. This helps. I will go back and try to get it working with Airbyte. Feel free to close this out.
ballingt · 2mo ago
@pearcy I just discovered some issues with configuring a new Convex destination with Airbyte, specifically 404 errors during setup. If you're doing that, you should hold off. I'll update here when that's been resolved.
pearcy (OP) · 2mo ago
@ballingt OK, I just posted my 404 error. I will keep an eye on that other post since it is specific to this error. https://discord.com/channels/1019350475847499849/1322301443507621941/1322301443507621941
