Hi! I need a server that can run some pretty expensive tasks. For example, I want to receive a file from the client, compress it with FFmpeg, and then upload the smaller, compressed file to cloud storage. I have tried Firebase Functions and Cloud Run, but their timeout limits don't give me enough time to run everything I need. Any recommendations on what would be best to use here? This is really the only expensive server-side task I need to run, but I will be running it fairly often. Is something like App Engine or VMs in Compute Engine probably my best bet? Those services would be a decent amount more expensive than Firebase Functions and Cloud Run, right?
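For reference, the compression step itself could look roughly like the sketch below. The codec, CRF value, and function names are assumptions for illustration, not what the original setup necessarily uses; it also assumes an `ffmpeg` binary is on the PATH.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: str, dst: str, crf: int = 28) -> list[str]:
    """Build an ffmpeg command that re-encodes src to H.264 at the given CRF."""
    return [
        "ffmpeg",
        "-y",                # overwrite dst if it already exists
        "-i", src,           # input file
        "-c:v", "libx264",   # H.264 video codec (assumed choice)
        "-crf", str(crf),    # higher CRF = smaller file, lower quality
        "-preset", "medium",
        "-c:a", "aac",       # re-encode audio to AAC
        dst,
    ]


def compress(src: str, dst: str, crf: int = 28) -> Path:
    """Run ffmpeg and return the output path; raises CalledProcessError on failure."""
    subprocess.run(build_ffmpeg_command(src, dst, crf), check=True)
    return Path(dst)
```

Keeping the command builder separate from the `subprocess` call makes it easy to check the arguments without actually running a long encode.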
Is the runtime too long for a v2 Cloud Function too? Do Cloud Run jobs have a time limit?
One option, though maybe overengineered, is to write a Pub/Sub subscriber and run it in Kubernetes. You can then scale out on queue depth if you have multiple jobs, and use a combination of pod and node autoscaling to keep costs down somewhat. You could also consider committed use discounts and/or preemptible instances to save money.
I think the best approach would be to store the file in GCS, then configure the bucket to send a Pub/Sub message that a Cloud Function would receive; the function would then process the file and store the result in another GCS bucket.
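A minimal sketch of the routing logic inside such a function, assuming the bucket notification carries the usual `eventType`, `bucketId`, and `objectId` attributes. The function name, the `compressed/` prefix, and the output bucket are made up for illustration.

```python
from typing import Optional


def plan_job(attributes: dict[str, str], output_bucket: str) -> Optional[dict[str, str]]:
    """Turn Cloud Storage notification attributes into a source/destination pair."""
    # Only react to newly written objects; ignore deletes, metadata updates, etc.
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None
    bucket = attributes["bucketId"]
    name = attributes["objectId"]
    return {
        "source": f"gs://{bucket}/{name}",
        # Hypothetical layout: results go under compressed/ in a different bucket.
        "destination": f"gs://{output_bucket}/compressed/{name}",
    }
```

Writing the result to a separate bucket matters here: if the output landed in the same bucket, it would fire another notification and the function could end up processing its own output.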
Another option would be splitting the job into different Cloud Functions and using something like Cloud Workflows.
This way the solution would be flexible and scalable, at least in my opinion.
Use preemptible VMs to get the most bang for the buck. You can autoscale workers based on the length of a Pub/Sub queue, so it scales down when not in use.
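The scaling rule boils down to simple arithmetic on the backlog. Here is a rough sketch; the function name and default numbers are assumptions, and in practice an autoscaler driven by the subscription's undelivered-message metric would likely do this for you rather than hand-written code.

```python
import math


def desired_workers(
    backlog: int,
    per_worker: int = 5,
    min_workers: int = 0,
    max_workers: int = 10,
) -> int:
    """Return how many workers to run for a given number of queued messages."""
    if per_worker <= 0:
        raise ValueError("per_worker must be positive")
    # One worker per `per_worker` queued messages, rounded up.
    wanted = math.ceil(backlog / per_worker)
    # Clamp so an empty queue scales to the floor and a spike is capped.
    return max(min_workers, min(max_workers, wanted))
```

With `min_workers=0`, an empty queue means no VMs are running at all, which is where most of the savings come from.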
Don't submit the file as the task; store it in GCS and have the task carry a reference to it.
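Something along these lines, as a sketch: the task message is a few bytes of JSON holding a `gs://` URI, and the worker resolves it back to a bucket and object name. The `input` field name and both helper names are hypothetical.

```python
import json


def make_task(bucket: str, object_name: str) -> bytes:
    """Encode a task that points at a GCS object instead of embedding the file."""
    return json.dumps({"input": f"gs://{bucket}/{object_name}"}).encode("utf-8")


def parse_task(data: bytes) -> tuple[str, str]:
    """Decode a task and split its gs:// URI into (bucket, object_name)."""
    uri = json.loads(data.decode("utf-8"))["input"]
    if not uri.startswith("gs://"):
        raise ValueError(f"not a GCS URI: {uri}")
    # The bucket is everything up to the first slash; the rest is the object name.
    bucket, _, object_name = uri[len("gs://"):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"incomplete GCS URI: {uri}")
    return bucket, object_name
```

The message stays tiny no matter how large the upload is, so queue size limits stop being a concern.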
For high-compute workloads, especially ones that take time, use standard compute instances, but configure them to self-terminate once the job completes. If your workload is a container, use the Compute Engine option to run a container, as it makes the workload portable. I have used this for BigQuery analytics workloads where some jobs took 11 hours. I used Cloud Run for the short ones (same container, just a shorter runtime).
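One way to get the self-terminating behavior is to wrap the job so the machine powers off whether the job succeeds or fails. This is only a sketch: it assumes the process is allowed to run `sudo shutdown`, and the function names are made up. Note that powering off stops the VM rather than deleting it, so any attached disk would still need cleaning up separately.

```python
import subprocess
from typing import Callable


def power_off() -> None:
    """Halt the machine; assumes the process is allowed to run sudo shutdown."""
    subprocess.run(["sudo", "shutdown", "-h", "now"], check=False)


def run_then_terminate(
    job: Callable[[], None],
    terminate: Callable[[], None] = power_off,
) -> None:
    """Run the job, then always call terminate, even if the job raised."""
    try:
        job()
    finally:
        # Runs on success and on failure, so a crashed job does not
        # leave an idle VM billing in the background.
        terminate()
```

Passing `terminate` as a parameter lets you try the wrapper locally with a harmless stand-in before pointing it at a real shutdown.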
Here is a recently launched service that is much better suited to your type of workload: https://cloud.google.com/blog/products/compute/new-batch-service-processes-batch-jobs-on-google-cloud