We are trying to evaluate the best execution environment in Google Cloud for running R scripts. Our data processing pipeline uses a GCE VM with 4 CPU cores and 16 GB of memory. Execution is getting slower with each run, and `top` shows the process utilizing only one core at 100% CPU. The workload appears to be compute-intensive, and I believe it may need more CPU power (and memory) than the current VM in the future.
Check here: https://cloud.google.com/compute/docs/cpu-platforms Maybe you could change the machine type to one with a higher single-core turbo. If the run time is growing with each run, though, this will only buy you a little time, if it helps at all. You should focus on improving the performance of this step if possible, or on parallelizing it.
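For reference, both the CPU platform and the machine type can be changed with the gcloud CLI while the VM is stopped. A rough sketch; the instance name, zone, platform, and machine type below are placeholders you would adjust to your project:

```shell
# Sketch only: "my-r-vm", the zone, platform, and machine type are
# placeholders. The VM must be stopped before these settings can change.
INSTANCE=my-r-vm
ZONE=us-central1-a

gcloud compute instances stop "$INSTANCE" --zone="$ZONE"

# Pin a newer minimum CPU platform for higher single-core turbo:
gcloud compute instances update "$INSTANCE" --zone="$ZONE" \
  --min-cpu-platform "Intel Skylake"

# Or switch to a machine type with more cores, e.g. n1-highcpu-8:
gcloud compute instances set-machine-type "$INSTANCE" --zone="$ZONE" \
  --machine-type n1-highcpu-8

gcloud compute instances start "$INSTANCE" --zone="$ZONE"
```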
Thanks. I switched to a Haswell processor, which ranks higher on the single-core list, but it didn't help.
I'm not an R expert, but I believe it can be challenging to get R to fully utilise SMP. Have you investigated the running process? Do you see multiple threads? If it looks single-threaded, that is where you should focus to get more performance. Or, as pointed out, parallelise (scale out) the workload.
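One quick way to check is the NLWP (number of light-weight processes, i.e. threads) column from `ps`. A minimal sketch, using the current shell's PID as a runnable stand-in; substitute the PID of your R process (found with something like `pgrep -f Rscript`):

```shell
# $$ (the current shell) is just a stand-in here; substitute the PID
# of your running R process.
pid=$$

# NLWP = number of threads in the process; 1 means single-threaded.
nthreads=$(ps -o nlwp= -p "$pid")
echo "process $pid has $nthreads thread(s)"
```

You can also watch per-thread CPU usage live with `top -H -p <pid>`. If it really is single-threaded, R's built-in `parallel` package (e.g. `mclapply`) is a common way to spread work across cores.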
We never thought the R script would get in our way. It is a simple R script (no calls to BQ, GCS, etc.). The heavy lifting is done by a Python script that copies the input files from GCS to a directory on the VM; these files total around 75 MB. The output of the R script is also written to a directory on the VM. We are using a decent configuration for the VM (n1-standard-4).
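To confirm which stage is actually slow, the copy and the compute steps can be timed separately. A rough sketch of what the pipeline does; the bucket, directories, and script name are placeholders:

```shell
# Placeholders: adjust the bucket, directories, and script name.
mkdir -p /tmp/input /tmp/output

if command -v gsutil >/dev/null && command -v Rscript >/dev/null; then
  # Stage 1: copy inputs from GCS (what the Python script does today)
  time gsutil -m cp "gs://my-bucket/input/*" /tmp/input/

  # Stage 2: the R processing step
  time Rscript process.R /tmp/input /tmp/output
else
  echo "gsutil and/or Rscript not available on this machine"
fi
```

If stage 2 dominates and only one core is busy, the R script itself is the place to optimise or parallelise.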