Convex Community
Created by machadokd. on 6/6/2024 in #support-community
Group by without hitting database limits
I tried to do this:
import { query } from "./_generated/server";
import { paginationOptsValidator } from "convex/server";

export const getMostCreative = query({
  args: { paginationOpts: paginationOptsValidator },
  handler: async (ctx, args) => {
    const data = await ctx.db
      .query("scraped_ads")
      .withIndex("byCreativeId")
      .order("desc")
      .paginate(args.paginationOpts);

    // Group the current page by creativeId
    const groupedByCreativeId = Object.groupBy(data.page, (ad) => ad.creativeId);

    // Convert the grouped data into an array of [creativeId, ads] pairs
    const formattedData = Object.entries(groupedByCreativeId);

    // Sort by number of occurrences (length of the inner arrays)
    formattedData.sort((a, b) => b[1].length - a[1].length);

    return {
      ...data,
      page: formattedData,
    };
  },
});
And it didn't do what I wanted: since the grouping only sees one page at a time, most of the creativeId occurrences are left out of the query results.
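Grouping inside the handler only ever sees the current page, so any creativeId whose occurrences are split across pages gets undercounted. A minimal plain-TypeScript sketch of the effect (hypothetical data, no Convex involved):

```typescript
type Ad = { creativeId: number };

// Count ads per creativeId (a portable stand-in for Object.groupBy).
function countByCreative(ads: Ad[]): Map<number, number> {
  const counts = new Map<number, number>();
  for (const ad of ads) {
    counts.set(ad.creativeId, (counts.get(ad.creativeId) ?? 0) + 1);
  }
  return counts;
}

// Two hypothetical pages, as in the example below.
const page1: Ad[] = [{ creativeId: 1 }, { creativeId: 2 }];
const page2: Ad[] = [{ creativeId: 10 }, { creativeId: 1 }];

// Grouping only the first page sees one occurrence of creativeId 1...
const perPage = countByCreative(page1).get(1);                // 1
// ...but across both pages there are two.
const overall = countByCreative([...page1, ...page2]).get(1); // 2
```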
11 replies
I have a document like this:
{
  adArchiveID: 1568306660630991,
  ad_creation_time: "2024-04-13",
  ad_delivery_start_time: "2024-04-13",
  ad_delivery_stop_time: "2024-05-07",
  ad_snapshot_url: "https://www.facebook.com/ads/archive/render_ad/?id=1568306660630991",
  creativeId: 120207068126630300,
  spend: "R$6 mil a R$7 mil",
  startDate: 1712991600,
  stateMediaRunLabel: null
}
And I want to group by creativeId and order by the most occurrences. I will then expose this on my client side. I took a look at indexes, but those seem like a performant way to query data, not a way to do what I want. My fear with pagination is missing records in the aggregation. Say the pages come back like:
page1: [{creativeId: 1, ...}, {creativeId: 2,...},...]
page2: [{creativeId: 10, ...}, {creativeId: 1,...},...]
Then I would miss the creativeId = 1 from page 2 if I load only the first page. I thought about creating another table to store the reference: when inserting into table 1, I would also insert into table 2 with the data already processed, like:
Table1
{
  _id: 1,
  adArchiveID: 1568306660630991,
  ad_creation_time: "2024-04-13",
  ad_delivery_start_time: "2024-04-13",
  ad_delivery_stop_time: "2024-05-07",
  ad_snapshot_url: "https://www.facebook.com/ads/archive/render_ad/?id=1568306660630991",
  creativeId: 120207068126630300,
  spend: "R$6 mil a R$7 mil",
  startDate: 1712991600,
  stateMediaRunLabel: null
}
Table2
{
  creativeId: 120207068126630300,
  table1Ids: [1],
  count: 1
}
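If you go the two-table route, Convex can keep the "sort by count" query cheap by indexing the counts table on count. A sketch of what the schema might look like (table and field names are assumptions taken from the examples above):

```typescript
// convex/schema.ts (sketch; field names taken from the sample documents)
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  scraped_ads: defineTable({
    adArchiveID: v.number(),
    // Note: creativeIds this large exceed Number.MAX_SAFE_INTEGER,
    // so v.string() or v.int64() may be safer than v.number().
    creativeId: v.number(),
    ad_snapshot_url: v.string(),
    spend: v.string(),
    startDate: v.number(),
    stateMediaRunLabel: v.union(v.string(), v.null()),
    // ...remaining ad fields...
  }).index("byCreativeId", ["creativeId"]),

  creative_counts: defineTable({
    creativeId: v.number(),
    table1Ids: v.array(v.id("scraped_ads")),
    count: v.number(),
  }).index("byCount", ["count"]),
});
```

A query can then paginate `creative_counts` with `.withIndex("byCount").order("desc")` and hydrate whatever ad documents it needs from `scraped_ads` by id.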
Then I could create a query on table 2, sorting by the count and grabbing the info I want from table 1 according to the results. I wonder if there's another way I can do that with Convex.
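The insert-time bookkeeping for table 2 can be sketched in plain TypeScript; here a Map stands in for the counts table, and the comments mark where a Convex mutation would use an indexed lookup plus insert/patch instead (names are hypothetical):

```typescript
type CountRow = { creativeId: number; table1Ids: number[]; count: number };

// Upsert the per-creativeId counter whenever a table-1 row is inserted.
// In a Convex mutation, `table2.get` would be an indexed query and the
// two branches would be ctx.db.patch / ctx.db.insert.
function recordAd(table2: Map<number, CountRow>, creativeId: number, table1Id: number): void {
  const row = table2.get(creativeId);
  if (row) {
    row.table1Ids.push(table1Id);
    row.count += 1;
  } else {
    table2.set(creativeId, { creativeId, table1Ids: [table1Id], count: 1 });
  }
}

// "Most occurrences first" is then just a sort (or an indexed query) on count.
function topCreatives(table2: Map<number, CountRow>): CountRow[] {
  return [...table2.values()].sort((a, b) => b.count - a.count);
}

const t2 = new Map<number, CountRow>();
recordAd(t2, 1, 101);
recordAd(t2, 1, 102);
recordAd(t2, 2, 103);
const top = topCreatives(t2)[0]; // { creativeId: 1, table1Ids: [101, 102], count: 2 }
```

Because the count is maintained at write time, the read side never has to scan the whole ads table, which is what avoids the database limits in the first place.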
The table is big, holding thousands of records
My use case is that I scrape a website every 15 minutes, and I want to group the results by a specific field. I thought I could make the db group that for me, but instead I need to do that aggregation in memory. I'm afraid of using pagination and missing some records.
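The 15-minute scrape itself could live in a Convex cron, with the scraper inserting into the ads table and bumping the matching count row in the same mutation so the aggregate never drifts. A sketch, where `internal.ads.scrape` is a hypothetical internal action:

```typescript
// convex/crons.ts (sketch)
import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";

const crons = cronJobs();

// Re-scrape every 15 minutes; the (hypothetical) action inserts the new
// ads and updates the per-creativeId counts in one mutation.
crons.interval("scrape ads", { minutes: 15 }, internal.ads.scrape);

export default crons;
```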