Showing posts from July, 2021

Apache Spark Features

- In-memory computation
- Distributed processing using parallelize
- Can be used with many cluster managers (Spark Standalone, YARN, Mesos, etc.)
- Fault-tolerant
- Immutable
- Lazy evaluation
- Cache and persistence
- Built-in optimization when using DataFrames
- Supports ANSI SQL
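
Three of these features (immutability, lazy evaluation, and caching) are easier to see in code than in a list. The sketch below is a toy model in plain Python, not Spark's API: the class `LazyDataset` and the helper `expensive_square` are hypothetical names made up for illustration. It only mimics the general idea that transformations build a recipe, and that nothing runs until an action such as `collect()` is called.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class LazyDataset:
    """Toy stand-in for a Spark-style dataset (hypothetical, not Spark's API)."""

    def __init__(self, source: Callable[[], Iterable[int]]) -> None:
        self._source = source                    # recipe that produces the data
        self._cache: Optional[List[int]] = None  # filled on first action if enabled
        self._cache_enabled = False

    def map(self, fn: Callable[[int], int]) -> "LazyDataset":
        # Transformation: returns a NEW dataset and leaves self unchanged.
        return LazyDataset(lambda: (fn(x) for x in self._iterate()))

    def filter(self, pred: Callable[[int], bool]) -> "LazyDataset":
        # Transformation: also lazy, also returns a new dataset.
        return LazyDataset(lambda: (x for x in self._iterate() if pred(x)))

    def cache(self) -> "LazyDataset":
        # Only marks the dataset; data is stored when an action first runs.
        self._cache_enabled = True
        return self

    def _iterate(self) -> Iterator[int]:
        if self._cache is not None:
            return iter(self._cache)
        if self._cache_enabled:
            self._cache = list(self._source())
            return iter(self._cache)
        return iter(self._source())

    def collect(self) -> List[int]:
        # Action: this is the point where the recipe is actually executed.
        return list(self._iterate())


calls: List[int] = []


def expensive_square(x: int) -> int:
    calls.append(x)  # record every real computation
    return x * x


numbers = LazyDataset(lambda: range(1, 6))
squares = numbers.map(expensive_square).cache()
evens = squares.filter(lambda x: x % 2 == 0)

calls_before_action = len(calls)  # no action yet, so nothing has been computed
first = evens.collect()           # triggers the squares and fills the cache
second = squares.collect()        # served from the cache, no recomputation
calls_after_actions = len(calls)
```

Here `calls_before_action` is 0 because `map` and `filter` only record what to do, and `calls_after_actions` is 5 because the second `collect()` reuses the cached squares instead of recomputing them. Each dataset also keeps a reference to its parent's recipe, which is a rough analogue of the lineage idea behind recomputing lost data for fault tolerance.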