WeamonZ
WeamonZ•2mo ago

Using images with agents

I'm trying to let my agent use an image URL I send it and call a tool to describe the image(s). The problem is that by providing it in my content as `{type: "image", [...]}`, my agent can't retrieve the URL anymore, so in subsequent calls it can't perform actions on my image. I found a workaround: adding another message to my content containing the image URL (but it's ugly AND, how am I supposed to hide it from the user?). That way my agent has the URL for further tool calls.
export const unauthed = mutation({
  args: {
    threadId: v.string(),
    prompt: v.string(),
    imageUrls: v.array(v.string()),
  },
  handler: async (ctx, { threadId, prompt, imageUrls }) => {
    // Build the message content with text and images.
    // Note: these are content *parts* (TextPart | ImagePart from "ai"),
    // not full ModelMessages.
    const content: Array<TextPart | ImagePart> = [
      ...imageUrls.flatMap((url) => [
        {
          type: "image" as const,
          image: new URL(url),
          mediaType: "image/png",
        },
        // Workaround: echo the URL as text so the model can quote it in tool calls.
        {
          type: "text" as const,
          text: `Image url: "${url}".`,
        },
      ]),
      { type: "text" as const, text: prompt },
    ];

    const { messageId } = await saveMessage(ctx, components.agent, {
      threadId,
      message: { role: "user", content },
    });

    await ctx.scheduler.runAfter(0, internal.llm.messages.text._internal, {
      promptMessageId: messageId,
      threadId,
    });
  },
});


export const _internal = internalAction({
  args: { threadId: v.string(), promptMessageId: v.string() },
  handler: async (ctx, { threadId, promptMessageId }) => {
    console.log("Basic Agent called");
    const { thread } = await basicAgent.continueThread(ctx, { threadId });
    await thread.streamText(
      { promptMessageId },
      { saveStreamDeltas: { chunking: "line", throttleMs: 1000 } },
    );
  },
});
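The content construction in the mutation above can be isolated into a small pure helper; `flatMap` expresses the per-URL interleaving directly. This is a sketch, not Agent API: the helper name is illustrative, and `image` is kept as a plain string here for brevity (the mutation uses `new URL(url)`).

```typescript
// Interleave an image part and a text part per URL, then append the prompt.
type Part =
  | { type: "image"; image: string; mediaType: string }
  | { type: "text"; text: string };

function buildContent(imageUrls: Array<string>, prompt: string): Array<Part> {
  return [
    ...imageUrls.flatMap((url): Array<Part> => [
      { type: "image", image: url, mediaType: "image/png" },
      // Workaround: echo the URL as text so the model can pass it to tools later.
      { type: "text", text: `Image url: "${url}".` },
    ]),
    { type: "text", text: prompt },
  ];
}
```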
WeamonZ
WeamonZOP•2mo ago
// tool
export const imageDescription = createTool({
  description: `Generate an extremely detailed, comprehensive description of an image.`,
  args: z.object({
    imageUrl: z.url().describe("The URL of the image to analyze and describe in detail"),
  }),
  [...]
export const _internal = internalAction({
  args: {
    threadId: v.string(),
    promptMessageId: v.string(),
    imageUrls: v.array(v.string()),
  },
  handler: async (ctx, { threadId, promptMessageId, imageUrls }) => {
    console.log("Basic Agent called");
    const { thread } = await basicAgent.continueThread(ctx, { threadId });
    await thread.streamText(
      { promptMessageId },
      {
        contextHandler: async (ctx, context) => {
          return [
            ...context.allMessages,
            ...imageUrls.map((url) => ({
              role: "user" as const,
              content: [
                {
                  type: "text" as const,
                  text: `Use the following image url: "${url}".`,
                },
              ],
            })),
          ];
        },
        saveStreamDeltas: { chunking: "line", throttleMs: 1000 },
      },
    );
  },
});
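The injection done inside contextHandler can be factored into a pure helper, so the same hidden messages can be rebuilt on every generation call — assuming the URLs themselves are persisted somewhere durable (e.g. a Convex table keyed by threadId). The helper name and message shapes here are illustrative, not part of the Agent API:

```typescript
// Hypothetical helper: appends one hidden "user" message per image URL
// to the messages already gathered for the model call.
type ContextMessage = {
  role: "user" | "assistant" | "system";
  content: Array<{ type: "text"; text: string }>;
};

function withImageUrlContext(
  messages: Array<ContextMessage>,
  imageUrls: Array<string>,
): Array<ContextMessage> {
  return [
    ...messages,
    ...imageUrls.map((url) => ({
      role: "user" as const,
      content: [
        { type: "text" as const, text: `Use the following image url: "${url}".` },
      ],
    })),
  ];
}
```

Because these messages are built at generation time and never saved, they stay invisible to the user — but only if the URLs are re-loaded and re-injected on every call.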
Ok, I added it to the context when generating the text instead of in the saved message. Is this the right way? And how will I be able to re-use it later? It's only available for the current call 🤔
Edit: Mhh, it's not available later. Pretty obvious ahah
Should I save the URL as a system prompt instead? What's the good practice for saving hidden data for the LLM to use in the chat?
await saveMessage(ctx, components.agent, {
  threadId,
  message: {
    role: "system",
    content: `The following image urls have been provided: ${imageUrls.join(", ")}`,
  },
});
Maybe I should add it as a system prompt instead? It's now persistent and my agent correctly retrieves the image URL, but... now it's visible in the thread (of course I can filter it out), and I don't think it's the correct way to add persistent context anyway 🤔
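The client-side filtering mentioned above can be a one-liner when rendering the thread: drop any `role: "system"` messages before display. The message shape here is simplified; real Agent messages carry more fields:

```typescript
// Hide system messages (e.g. injected image-url context) from the rendered chat.
type ChatMessage = { role: "user" | "assistant" | "system"; content: string };

function visibleMessages(messages: Array<ChatMessage>): Array<ChatMessage> {
  return messages.filter((m) => m.role !== "system");
}
```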
