WeamonZ
WeamonZ•2mo ago

Using images with agents

I'm trying to let my agent use an image URL I send it and call a tool to describe the image(s). The problem is that by providing it in my content as `{type: "image", [...]}`, my agent can't retrieve the URL anymore, so in subsequent calls it can't perform actions on my image. I found a workaround: adding another message to my content containing the image URL (but it's ugly AND, how am I supposed to hide it from the user?). That way my agent has the URL for further tool calls.
export const unauthed = mutation({
  args: {
    threadId: v.string(),
    prompt: v.string(),
    imageUrls: v.array(v.string()),
  },
  handler: async (ctx, { threadId, prompt, imageUrls }) => {
    // Build the message content with text and images.
    // Note: these are content *parts* (TextPart | ImagePart from "ai"),
    // not full ModelMessages.
    const content: Array<TextPart | ImagePart> = [
      ...imageUrls.flatMap((url) => [
        {
          type: "image" as const,
          image: new URL(url),
          mediaType: "image/png",
        },
        // Workaround: echo the URL as text so the model can quote it in tool calls.
        {
          type: "text" as const,
          text: `Image url: "${url}".`,
        },
      ]),
      { type: "text" as const, text: prompt },
    ];

    const { messageId } = await saveMessage(ctx, components.agent, {
      threadId,
      message: { role: "user", content },
    });

    await ctx.scheduler.runAfter(0, internal.llm.messages.text._internal, {
      promptMessageId: messageId,
      threadId,
    });
  },
});


export const _internal = internalAction({
  args: { threadId: v.string(), promptMessageId: v.string() },
  handler: async (ctx, { threadId, promptMessageId }) => {
    console.log("Basic Agent called");
    const { thread } = await basicAgent.continueThread(ctx, { threadId });
    await thread.streamText(
      { promptMessageId },
      { saveStreamDeltas: { chunking: "line", throttleMs: 1000 } },
    );
  },
});
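The content construction in the mutation above can be isolated into a small pure helper; `flatMap` expresses the per-URL interleaving directly. This is a sketch, not Agent API: the helper name is illustrative, and `image` is kept as a plain string here for brevity (the mutation uses `new URL(url)`).

```typescript
// Interleave an image part and a text part per URL, then append the prompt.
type Part =
  | { type: "image"; image: string; mediaType: string }
  | { type: "text"; text: string };

function buildContent(imageUrls: Array<string>, prompt: string): Array<Part> {
  return [
    ...imageUrls.flatMap((url): Array<Part> => [
      { type: "image", image: url, mediaType: "image/png" },
      // Workaround: echo the URL as text so the model can pass it to tools later.
      { type: "text", text: `Image url: "${url}".` },
    ]),
    { type: "text", text: prompt },
  ];
}
```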
WeamonZ
WeamonZOP•2mo ago
// tool
export const imageDescription = createTool({
  description: `Generate an extremely detailed, comprehensive description of an image.`,
  args: z.object({
    imageUrl: z.url().describe("The URL of the image to analyze and describe in detail"),
  }),
  [...]
export const _internal = internalAction({
  args: {
    threadId: v.string(),
    promptMessageId: v.string(),
    imageUrls: v.array(v.string()),
  },
  handler: async (ctx, { threadId, promptMessageId, imageUrls }) => {
    console.log("Basic Agent called");
    const { thread } = await basicAgent.continueThread(ctx, { threadId });
    await thread.streamText(
      { promptMessageId },
      {
        contextHandler: async (ctx, context) => {
          return [
            ...context.allMessages,
            ...imageUrls.map((url) => ({
              role: "user" as const,
              content: [
                {
                  type: "text" as const,
                  text: `Use the following image url: "${url}".`,
                },
              ],
            })),
          ];
        },
        saveStreamDeltas: { chunking: "line", throttleMs: 1000 },
      },
    );
  },
});
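The injection done inside contextHandler can be factored into a pure helper, so the same hidden messages can be rebuilt on every generation call — assuming the URLs themselves are persisted somewhere durable (e.g. a Convex table keyed by threadId). The helper name and message shapes here are illustrative, not part of the Agent API:

```typescript
// Hypothetical helper: appends one hidden "user" message per image URL
// to the messages already gathered for the model call.
type ContextMessage = {
  role: "user" | "assistant" | "system";
  content: Array<{ type: "text"; text: string }>;
};

function withImageUrlContext(
  messages: Array<ContextMessage>,
  imageUrls: Array<string>,
): Array<ContextMessage> {
  return [
    ...messages,
    ...imageUrls.map((url) => ({
      role: "user" as const,
      content: [
        { type: "text" as const, text: `Use the following image url: "${url}".` },
      ],
    })),
  ];
}
```

Because these messages are built at generation time and never saved, they stay invisible to the user — but only if the URLs are re-loaded and re-injected on every call.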
Ok, I added it to the context when generating the text instead of in the saved message. Is this the right way? And how will I be able to re-use it later? It's only available for the current call 🤔
Edit: Mhh, it's not available later. Pretty obvious ahah
Should I save the URL as a system prompt instead? What's the good practice for saving hidden data for the LLM to use in the chat?
await saveMessage(ctx, components.agent, {
  threadId,
  message: {
    role: "system",
    content: `The following image urls have been provided: ${imageUrls.join(", ")}`,
  },
});
Maybe I should add it as a system prompt instead? It's now persistent and my agent correctly retrieves the image URL, but... now it's visible in the thread (of course I can filter it out), and I don't think it's the correct way to add persistent context anyway 🤔
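The client-side filtering mentioned above can be a one-liner when rendering the thread: drop any `role: "system"` messages before display. The message shape here is simplified; real Agent messages carry more fields:

```typescript
// Hide system messages (e.g. injected image-url context) from the rendered chat.
type ChatMessage = { role: "user" | "assistant" | "system"; content: string };

function visibleMessages(messages: Array<ChatMessage>): Array<ChatMessage> {
  return messages.filter((m) => m.role !== "system");
}
```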
