I am a bit new to EMR and trying to familiarize myself with its concepts. I'd appreciate your input.
Question: How do we ensure high availability for the EMR master node when automatic failover is not inherently supported?
I could be off-base here, just because I’ve never used EMR for a workload that requires HA for Hadoop, but: it’s always felt to me that EMR was best suited for workloads where HA isn’t a requirement. I’ve always just assumed that if you truly wanted HA, you were better off swallowing the cost of operating your own cluster.
Happy to hear it if other folks have different opinions, though!
yeah, agreed — EMR isn’t really for permanent Hadoop clusters (I’m guessing you want a queryable Hive store or something similar)
it is very much built around “job flows”
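to make the "job flow" idea concrete, here's a rough sketch of a transient cluster using boto3's `run_job_flow`: it starts, runs one Hive step, and shuts itself down. The helper names (`build_job_flow_request`, `launch`), the release label, instance types, region, and S3 paths are all placeholders I made up for illustration, and it assumes the default EMR roles already exist in your account.

```python
def build_job_flow_request(name, script_s3_uri, log_uri):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    Describes a transient cluster: it runs a single Hive step and then
    terminates, so there is no long-lived master node to keep alive.
    """
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": "emr-5.36.0",  # assumption: pick the release you actually use
        "Applications": [{"Name": "Hive"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # Shut the cluster down once there are no steps left to run.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": [
            {
                "Name": "run-hive-script",
                # If the step fails, tear the cluster down instead of idling.
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["hive-script", "--run-hive-script",
                             "--args", "-f", script_s3_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumption: default roles exist
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request, region="us-east-1"):
    """Submit the request to EMR and return the new cluster (job flow) id."""
    import boto3  # imported here so building the request needs no AWS setup

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    req = build_job_flow_request(
        "nightly-report",
        "s3://example-bucket/queries/report.hql",  # hypothetical path
        "s3://example-bucket/emr-logs/",           # hypothetical path
    )
    print(launch(req))
```

with this shape the input and output live in S3 rather than on the cluster, so if the master dies mid-run you resubmit the job flow instead of failing over. That's a different thing from real HA, but it's the usage pattern EMR seems designed for.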