Hi All,
Can anyone please help me to do the spark streaming graceful shutdown using pyspark or scala. I am reading data from Kafka and writing into GCS bucket and this suppoose to be stop after 1hr but it’s not happening so trying to do some workaround for the same.
Thanks!
Which version of Spark are you currently utilizing?
Does it support Structured Streaming?
I am using 3x version
Where Do you run Spark?
Do you use any connectors (DataBricks).
Or you consuming data from Kafka topic manually PySpark/Scala processor application?
I am running in Dataproc cluster and just running pyspark by executing “spark submit “
If you have the capability to monitor Kafka messages, you can keep an eye on the progression of the offset.
When you observe that the offset is no longer advancing, it’s a reasonable indication that there are no new messages arriving in the Kafka topic.
This cessation of message flow suggests that the current “batch” of data has concluded, allowing you to smoothly and efficiently terminate the job.