punn
Convex Community · 3y ago
4 replies

Taking latest timestamp for duplicates

For a specific conversation, our app fetches messages from an external service and stores them in our Convex `messages` table. Before storing, we query `messages` to see if there are any existing instances with the same `userId` and `messageId`. If a match exists, we `patch()`; otherwise we `insert()`.
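A minimal sketch of that upsert mutation, assuming a `by_user_message` index on (`userId`, `messageId`) and a hypothetical `body` field (the table, index, and field names are illustrative):

```typescript
// Sketch only: assumes a "messages" table with an index
// "by_user_message" on ["userId", "messageId"] defined in schema.ts.
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const storeMessage = mutation({
  args: { userId: v.string(), messageId: v.string(), body: v.string() },
  handler: async (ctx, { userId, messageId, body }) => {
    // Selective indexed read: only rows matching this (userId, messageId)
    // pair end up in the mutation's read set, which keeps OCC conflicts
    // scoped to mutations that actually touch the same message.
    const existing = await ctx.db
      .query("messages")
      .withIndex("by_user_message", (q) =>
        q.eq("userId", userId).eq("messageId", messageId)
      )
      .unique();
    if (existing) {
      await ctx.db.patch(existing._id, { body });
    } else {
      await ctx.db.insert("messages", { userId, messageId, body });
    }
  },
});
```

Using `withIndex` with a fully-specified range is also what the OCC error message below recommends, since a full table scan would put every row in the read set.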

As we run through this workflow for thousands of messages, we split them up and call our storage mutation once per batch of messages. We also use the scheduler to run this message fetching for multiple conversations concurrently.
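The batching step above can be sketched with a generic helper (the helper name and batch size are illustrative):

```typescript
// Hypothetical helper: split a large list of fetched messages into
// fixed-size batches, then make one storage-mutation call per batch.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// e.g. 1000 fetched messages in batches of 100 → 10 mutation calls
const batches = chunk(Array.from({ length: 1000 }, (_, i) => i), 100);
// batches.length === 10
```

Smaller batches mean each mutation reads and writes fewer rows, which shrinks the window for the OCC conflicts described below.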

Since we're reading and mutating the same fields across these concurrent mutations, this raised the following error:
OptimisticConcurrencyControlFailure: Data read or written in this mutation changed while it was being run. Consider reducing the amount of data read by using indexed queries with selective index range expressions (https://docs.convex.dev/database/indexes/).


As @presley mentioned, we should either delay each scheduled fetching/storage flow, or ensure we aren't mutating/reading the same fields across scheduled actions.
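One way to apply the delay suggestion is to stagger the scheduled run for each conversation so their storage mutations are less likely to overlap. A sketch (the function reference, argument shape, and 1s spacing are illustrative):

```typescript
// Inside an action or mutation: schedule one fetch-and-store run per
// conversation, spaced apart to reduce concurrent writes to the same rows.
for (const [i, convoId] of conversationIds.entries()) {
  await ctx.scheduler.runAfter(
    i * 1000, // 1s apart; tune to the workload
    internal.messages.fetchAndStore,
    { convoId }
  );
}
```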

Not querying the table before storage and just inserting the `userId` and `messageId` fields is one possible solution. This would slightly increase complexity (and latency?) when fetching messages for a specific conversation (i.e. some `userId`, `messageId`, `convoId`), since we'd have to deduplicate on read. And since we're fetching new messages frequently, this might bloat our tables even with a daily cleanup cron job.
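With that insert-only approach, the read side would take the latest timestamp among duplicates (as in the thread title), along the lines of the following sketch (the row type and field names are assumptions):

```typescript
// Hypothetical row shape for the insert-only variant.
type StoredMessage = {
  userId: string;
  messageId: string;
  timestamp: number;
  body: string;
};

// Read-side dedup: among duplicate (userId, messageId) rows,
// keep only the row with the latest timestamp.
function latestPerMessage(rows: StoredMessage[]): StoredMessage[] {
  const latest = new Map<string, StoredMessage>();
  for (const row of rows) {
    const key = `${row.userId}:${row.messageId}`;
    const seen = latest.get(key);
    if (!seen || row.timestamp > seen.timestamp) {
      latest.set(key, row);
    }
  }
  return [...latest.values()];
}
```

This trades write-time coordination (and its OCC conflicts) for extra work and storage on the read path.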

Which solution approaches are recommended for this kind of workflow?