Problem: Executors of Scala Spark applications fail periodically, and the number of driver pod restarts correlates directly with the number of failed executors.
Reason: Not found yet. The reported error is out-of-memory (OOM), but what causes it is still unclear.
Status: In progress
Solution: Not found yet. The root cause of the executor failures is under investigation; the actions to take depend on what it turns out to be.
Preventive steps: -