does anyone know, in the GCP Dataflow template cloud_pubsub_to_splunk, what the max exponential backoff defaults to, and whether there's a way to override it to, say, 2h? (… is there any way to view the current backoff? I don't see one anywhere). I see it might be FAILSAFE_RETRY_MAX_DELAY (https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/9bdd96503ed969178280600c61b82753f1cd7da2/it/google-cloud-platform/src/main/java/org/apache/beam/it/gcp/pubsub/PubsubResourceManager.java#L392), which is 60s in the code for Pub/Sub, but … is there also an exponential backoff on the sending of data to Splunk? I have about 90 jobs, and they all stopped responding on 2/11 or 2/12 when the HEC tokens were disabled; the tokens are now re-enabled and processing should have resumed, but the logs show no activity at all
there is also FAILSAFE_MAX_RETRIES=5 … but again, that's just for receiving from Pub/Sub, which isn't where the errors were
DEFAULT_MAX_ELAPSED_TIME_MILLIS defaults to 15 minutes, so :shrug:. I'm just gonna open a case and see what they say
so I think the issue is that the Java ExponentialBackOff library's limit is a max elapsed time, not a max interval: once the time spent retrying reaches that period (15m by default), it doesn't produce a bigger backoff interval, it just stops retrying entirely (nextBackOffMillis() returns BackOff.STOP). After that, any further restarting of the retry has to be handled in your code as if it were a brand-new call
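to illustrate, here's a minimal standalone sketch of how google-http-client's ExponentialBackOff behaves (this is just the library's own semantics, with its default builder values spelled out; it's not the template's actual retry wiring):

```java
import com.google.api.client.util.BackOff;
import com.google.api.client.util.ExponentialBackOff;

public class BackoffDemo {
  public static void main(String[] args) throws Exception {
    // Builder values shown here are the library defaults, spelled out for clarity.
    ExponentialBackOff backoff = new ExponentialBackOff.Builder()
        .setInitialIntervalMillis(500)     // first wait ~0.5s
        .setMultiplier(1.5)                // each wait grows 1.5x
        .setRandomizationFactor(0.5)       // +/- 50% jitter
        .setMaxIntervalMillis(60_000)      // individual waits cap at 60s
        .setMaxElapsedTimeMillis(900_000)  // DEFAULT_MAX_ELAPSED_TIME_MILLIS = 15m
        .build();

    long waitMillis;
    while ((waitMillis = backoff.nextBackOffMillis()) != BackOff.STOP) {
      System.out.printf("retrying in %d ms%n", waitMillis);
      Thread.sleep(waitMillis);
      // ... attempt the call here, break out of the loop on success ...
    }

    // Once ~15 minutes have elapsed since the backoff was created (or last
    // reset), nextBackOffMillis() returns BackOff.STOP (-1) and the loop
    // exits: no larger interval, no more retries. The caller has to call
    // backoff.reset() (or build a new instance) to start retrying again,
    // i.e. treat it as a fresh call.
    System.out.println("gave up: max elapsed time reached");
  }
}
```

if the Splunk write path behaves like that, it would line up with what I'm seeing: the jobs burned through their retries within ~15 minutes while the HEC tokens were disabled, went quiet, and nothing resumed once the tokens came back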