Large-Scale Record Updates in OpenSearch

has anyone here had to do large-scale updates of records in OpenSearch? use case / questions in thread.

we have ~20M records in OpenSearch at any point in time. they each have a primary key and are transaction records with a unique ID.

we do some out-of-band processing that sometimes takes a while. ideally, we want to store the processing results together with the record, but there’s also an SLA for getting records into OpenSearch within 2-3 min of initial record receipt.

we’re faced with the question of either:

A) delaying the initial write until the processing is done.

B) updating the records after the fact, which is expensive. as we understand it, that’s essentially (see the sketch after this list):
• scan the whole table
• find the record by ID
• update the record
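
for context, this is roughly what that scan-based path would look like with the opensearch-py client. the index name (`transactions`), the field name (`txn_id`), and the enrichment payload are all made up for illustration:

```python
from opensearchpy import OpenSearch, helpers

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
INDEX = "transactions"  # hypothetical index name


def update_via_scan(target_txn_id: str, enrichment: dict) -> None:
    """The expensive path as described: walk every document until the
    matching transaction ID turns up, then issue a partial update.
    Cost is O(index size) per update."""
    for hit in helpers.scan(client, index=INDEX, query={"query": {"match_all": {}}}):
        if hit["_source"].get("txn_id") == target_txn_id:  # "txn_id" is an assumed field
            client.update(index=INDEX, id=hit["_id"], body={"doc": enrichment})
            return
```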
we’ve also been wondering about a different workflow (sketched below):
• write the initial record
• run the processing async
• fetch and then delete the initial record
• write the updated record
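
a minimal sketch of that workflow, assuming opensearch-py and the same hypothetical index and field names (the async processing itself is elided):

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
INDEX = "transactions"  # hypothetical index name


def write_initial(txn_id: str, record: dict) -> None:
    # step 1: get the raw record in within the SLA, keyed by its transaction ID
    client.index(index=INDEX, id=txn_id, body=record)


def write_processed(txn_id: str, enrichment: dict) -> None:
    # steps 3-4: fetch the initial record, merge in the processing results,
    # and write the result back under the same ID
    current = client.get(index=INDEX, id=txn_id)["_source"]
    current.update(enrichment)
    client.index(index=INDEX, id=txn_id, body=current)
```

one thing worth noting: indexing under an existing `_id` replaces the old document, so the explicit delete step effectively collapses into the final write.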

This really belongs in . However, it’s in this thread now. Is this a regular job you’re discussing or a one-off migration? I’m assuming the former. Is the transaction ID not the PK? If not, why not? If so, why would there need to be a whole table scan on an update?

it’s a regular job and the ID is the PK. maybe we’re misunderstanding the table scan aspect…
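
if the transaction ID is used as the document `_id`, that’s likely the misunderstanding: an update is a single keyed call, with no scan involved, and batches can go through the Bulk API. a sketch with the same hypothetical names:

```python
from opensearchpy import OpenSearch, helpers

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
INDEX = "transactions"  # hypothetical index name


def update_one(txn_id: str, enrichment: dict) -> None:
    # partial update addressed directly by document ID -- no scan
    client.update(index=INDEX, id=txn_id, body={"doc": enrichment})


def update_many(updates: dict) -> None:
    # updates maps txn_id -> enrichment dict; sent as a single bulk request
    actions = (
        {"_op_type": "update", "_index": INDEX, "_id": txn_id, "doc": enrichment}
        for txn_id, enrichment in updates.items()
    )
    helpers.bulk(client, actions)
```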