Totango Engineering

Standalone Spark Deployment for Stability and Performance

Main tips for using Spark Standalone

See full details in the presentation below

  • To make an application's cores fill up each worker before spilling onto other free workers, set spark.deploy.spreadOut=false (see the config sketch after this list)

  • To make workers clean up the directories of completed applications, set spark.worker.cleanup.enabled=true

  • To use the External Shuffle Service, set spark.shuffle.service.enabled=true and run sbin/start-shuffle-service.sh on each worker node

  • You can use Logstash to gather all application logs into one central place, where they remain available even after the application folder has been cleaned up (a minimal sketch follows the list)

  • Spark exposes Codahale metrics, which can be consumed by Graphite-compatible systems (metrics config sketch below)

  • When using AWS Auto Scaling Groups, use termination protection to avoid shutting down active Spark workers; you can detect running executors with ps -ef | grep executor | grep spark | wc -l (shell sketch below)

  • Set the Unix niceness of Spark processes so they can use all CPU cores without starving other system and network services (example below)
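
One way to wire up the first three settings, per the Spark standalone docs (paths assume a standard Spark installation):

    # conf/spark-env.sh on the master: fill up workers before spreading out
    SPARK_MASTER_OPTS="-Dspark.deploy.spreadOut=false"

    # conf/spark-env.sh on each worker: clean up finished application directories
    SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"

    # conf/spark-defaults.conf for applications: use the external shuffle service
    spark.shuffle.service.enabled true

Then start the shuffle service daemon on each worker node:

    sbin/start-shuffle-service.sh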
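
For the Logstash tip, a minimal sketch, assuming the default standalone work directory /opt/spark/work and a hypothetical Elasticsearch endpoint (swap in whatever central log store you use):

    input {
      file {
        # <work-dir>/<app-id>/<executor-id>/{stdout,stderr} -- the path is an assumption
        path => "/opt/spark/work/*/*/std*"
      }
    }
    output {
      # hypothetical central endpoint
      elasticsearch { hosts => ["logs.example.com:9200"] }
    }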
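
For the metrics tip, Spark ships a GraphiteSink that can be enabled in conf/metrics.properties (the host below is a placeholder):

    *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
    *.sink.graphite.host=graphite.example.com
    *.sink.graphite.port=2003
    *.sink.graphite.period=10
    *.sink.graphite.unit=seconds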
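
For the Auto Scaling tip, a sketch of a script (run from cron on each worker, say) that toggles scale-in protection based on the executor count; the ASG name spark-workers is hypothetical, and the AWS CLI and instance metadata are assumed available:

    #!/bin/bash
    # Count Spark executors currently running on this worker
    RUNNING=$(ps -ef | grep executor | grep spark | grep -v grep | wc -l)
    INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    if [ "$RUNNING" -gt 0 ]; then
      PROTECT=--protected-from-scale-in      # executors active: keep this instance
    else
      PROTECT=--no-protected-from-scale-in   # idle: allow scale-in
    fi
    aws autoscaling set-instance-protection \
      --auto-scaling-group-name spark-workers \
      --instance-ids "$INSTANCE_ID" "$PROTECT"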
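
For the niceness tip, one option is to renice the standalone worker and executor processes after they start (the niceness value 5 is an arbitrary example):

    # Lower the CPU scheduling priority of the worker daemon and its executors
    renice -n 5 -p $(pgrep -f org.apache.spark.deploy.worker.Worker)
    renice -n 5 -p $(pgrep -f org.apache.spark.executor.CoarseGrainedExecutorBackend)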

Huge thanks to Noam Gazit for originally building many of these ops integrations!


Romi Kuntsman