I am trying to determine what is more cost effective for a Lambda with an SQS trigger: processing a single message at a time or multiple messages per invocation.
I have an SQS queue that receives around 40k messages every hour, and a Lambda function that processes the messages on this queue. Currently the function has batchSize=1 and maxConcurrency=50. Is there some way I can determine whether it's more cost effective for each execution to process multiple messages, without actually making this change?
Also interested in whether you've done this kind of testing before and, if so, what you found.
there are way too many variables to determine this analytically. How much setup work you do OUTSIDE the main handler is a big component of how much you'll gain from a larger batch size. The in-batch error rate matters too - if every other batch gets thrown away because of a single failure, larger batches won't help. Your best bet is to make sure your lambda is coded properly and then just test different batch sizes and concurrency counts. The right answer could be 1/50 or 100/100.
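That said, a back-of-envelope model can at least bound what testing is likely to show. Here's a rough sketch: all of the timings (50ms per message, 100ms of fixed per-invocation overhead) and the 512MB memory size are made-up assumptions you'd replace with measurements from your own CloudWatch logs; the prices are the published us-east-1 x86 Lambda rates.

```python
# Back-of-envelope Lambda cost model. All timings below are placeholder
# assumptions - substitute your own measured durations before trusting it.
MSGS_PER_MONTH = 40_000 * 24 * 30     # ~40k/hour, per the question
REQUEST_PRICE = 0.20 / 1_000_000      # USD per invocation (us-east-1)
GB_SECOND_PRICE = 0.0000166667        # USD per GB-second (x86)
MEM_GB = 0.5                          # 512 MB

def monthly_cost(batch_size, per_msg_ms, overhead_ms):
    """overhead_ms = fixed per-invocation work; per_msg_ms = work per message."""
    invocations = MSGS_PER_MONTH / batch_size
    duration_s = (overhead_ms + batch_size * per_msg_ms) / 1000
    compute = invocations * duration_s * MEM_GB * GB_SECOND_PRICE
    requests = invocations * REQUEST_PRICE
    return compute + requests

for bs in (1, 10):
    print(f"batchSize={bs}: ${monthly_cost(bs, per_msg_ms=50, overhead_ms=100):.2f}/month")
```

Under these (invented) numbers, batching amortizes the fixed overhead and the per-request charge, so batchSize=10 comes out noticeably cheaper - but the ratio is entirely driven by how big your real per-invocation overhead is, which is exactly the point made above.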
the other variable you aren't noting is how much memory (and, by proxy, how much CPU) you have allocated to the lambda, and whether the underlying runtime can make use of it. There are cases where a 128MB lambda works, but a 512MB lambda runs more than 4x faster, so it's actually cheaper.
I had to increase the memory from 128MB to 512MB because some of the messages (around 40%, for example) require fetching a large amount of data, which needs more memory to buffer before storing it into S3.
but i don't know if it's going to need 512MB of RAM until i've fetched the data (at which point the lambda is already executing). So i've had to set the default RAM to something much higher than ~60% of the messages actually need.
if you can interrogate the size of the data you need BEFORE fetching it (e.g. doing a HEAD on the object before the GET in S3), then maybe you want to migrate this to a step functions workflow - the first step just figures out whether you need the "BIG" lambda or the "TINY" lambda and farms it off to the appropriate SQS queue. Then you run a 128MB lambda to categorize, another 128MB lambda to process "small" batches, and a 512MB lambda to process big batches (or maybe the batch size trigger on the large one is only 2 or 3 instead of 10, and you can keep it at 128MB or 256MB).
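The categorizer step could look something like this sketch. The queue names, bucket/key message shape, and the 50MB threshold are all made-up placeholders; the routing decision is pulled out into a pure function so it can be tuned and tested without AWS.

```python
# Hypothetical values - tune the threshold from your real size distribution,
# and use your actual queue URLs.
BIG_QUEUE = "https://sqs.us-east-1.amazonaws.com/123456789012/big-objects"
SMALL_QUEUE = "https://sqs.us-east-1.amazonaws.com/123456789012/small-objects"
SIZE_THRESHOLD = 50 * 1024 * 1024  # 50 MB

def choose_queue(size_bytes):
    """Pure routing decision: which queue should handle an object this big?"""
    return BIG_QUEUE if size_bytes >= SIZE_THRESHOLD else SMALL_QUEUE

def route_object(bucket, key):
    """Categorizer lambda body: HEAD the object, then forward the message."""
    import boto3  # imported here so choose_queue stays testable offline
    s3 = boto3.client("s3")
    # HEAD is cheap and returns metadata only - no data buffered in memory
    size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
    boto3.client("sqs").send_message(QueueUrl=choose_queue(size), MessageBody=key)
```

The categorizer itself never buffers object data, which is why it can stay at 128MB.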
How much setup work you do OUTSIDE the main handler - since lambda execution contexts are reused across invocations within a limited time span, if you do all the heavy setup work outside the main handler, you gain the benefit of reusing it over multiple invocations. Setting up sdk clients, fetching secrets, connecting to DB connection pools (or making one), computing a complex hash, downloading binaries to tmp - whatever you're doing, if you can put it outside the handler, you save that time on the next invocation. You only pay the tax on what's outside the main handler on cold start; once it's warm, you keep reusing it. Even s3 = boto3.client("s3") takes a few ms - no need to do it inside the handler.
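The pattern looks like this. `expensive_setup()` is a stand-in for whatever your real init work is (creating boto3 clients, opening a DB pool, fetching secrets); the counter just demonstrates that module scope runs once per cold start while the handler runs on every invocation.

```python
# Module scope: executed once per cold start, reused by every warm invocation.
INIT_COUNT = 0

def expensive_setup():
    """Stand-in for real init work, e.g. boto3.client("s3"), a DB pool, secrets."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"client": "ready"}

RESOURCES = expensive_setup()  # the cold-start tax is paid here, once

def handler(event, context):
    # The handler only uses the pre-built resources - no per-invocation setup.
    return {"setup_runs": INIT_COUNT, "client": RESOURCES["client"]}
```

Call the handler as many times as you like within one warm container and the setup still only ran once - which is exactly the saving described above.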
By default, if your function encounters an error while processing a batch, all messages in that batch become visible in the queue again. For this reason, your function code must be able to process the same message multiple times without unintended side effects.