About Apache Spark & Scala

The Intellipaat Apache Spark and Scala Certification Training Course offers you hands-on knowledge of creating Spark applications using Scala programming. It gives you a clear comparison between Spark and Hadoop. The course teaches techniques to increase application performance and enable high-speed processing using Spark RDDs, and shows how to customize Spark using Scala.

Course Objective

 Understand what Apache Spark and Scala programming are
 Understand the difference between Apache Spark and Hadoop
 Learn Scala and how to program with it
 Implement Spark on a cluster
 Write Spark applications using Python, Java and Scala
 Understand RDDs and their operations, and implement Spark algorithms
 Define and explain Spark Streaming
 Learn the concept of classes in Scala and use pattern matching
 Learn Scala-Java interoperability and other Scala operations
 Work on projects that use Scala to build Spark applications

Who should take this course?

 Software Engineers looking to upgrade Big Data skills
 Data Engineers and ETL Developers
 Data Scientists and Analytics Professionals
 Graduates looking to make a career in Big Data


There are no prerequisites for taking this Apache Spark Certification training course. Basic knowledge of databases, SQL and query languages can help when learning Spark and Scala.

Course Content

Scala Course Content

Introduction to Scala

  • Introducing Scala and deploying Scala for Big Data applications and Apache Spark analytics.

Pattern Matching

  • The importance of Scala, the concept of the REPL (Read Evaluate Print Loop), a deep dive into Scala pattern matching, type inference, higher-order functions, currying, traits, application space and Scala for data analysis.
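As a small illustration of currying and higher-order functions, the Scala sketch below defines a curried method and a function that takes another function as an argument. The names `scale` and `applyTwice` are hypothetical, chosen only for this example.

```scala
object CurryingDemo {
  // Curried method: the two parameter lists can be supplied separately.
  def scale(factor: Int)(x: Int): Int = factor * x

  // Higher-order function: takes another function as its first argument.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  def main(args: Array[String]): Unit = {
    // Partial application: fix factor = 3, leaving an Int => Int function.
    val triple: Int => Int = scale(3)
    println(applyTwice(triple, 2))        // 18, i.e. 3 * (3 * 2)
    println(List(1, 2, 3).map(scale(10))) // List(10, 20, 30)
  }
}
```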

Executing the Scala code

  • Learning about the Scala interpreter, the static object timer in Scala, testing String equality in Scala, implicit classes in Scala, the concept of currying in Scala, and various classes in Scala.
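Implicit classes and String equality can be shown in a few lines. In this minimal sketch the implicit class `RichString` and its `isPalindrome` method are hypothetical names; `==` (content comparison through `equals`) and `eq` (reference comparison) are standard Scala.

```scala
object StringOpsDemo {
  // Implicit class: adds a hypothetical isPalindrome method to String
  // without modifying the String class itself.
  implicit class RichString(val s: String) extends AnyVal {
    def isPalindrome: Boolean = {
      val cleaned = s.toLowerCase.filter(_.isLetterOrDigit)
      cleaned == cleaned.reverse
    }
  }

  def main(args: Array[String]): Unit = {
    val a = "spark"
    val b = new String("spark")     // forces a distinct String object
    println(a == b)                 // true: == calls equals, which compares contents
    println(a eq b)                 // false: eq compares object references
    println("Racecar".isPalindrome) // true, via the implicit conversion
  }
}
```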

Classes concept in Scala

  • Learning about the concept of classes, constructor overloading, abstract classes, the type hierarchy in Scala, object equality, and the val and var keywords in Scala.
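A minimal sketch of these class topics, using illustrative class names (`Counter`, `Shape`, `Square`): an auxiliary constructor provides constructor overloading, `val` and `var` parameters become immutable and mutable fields, and an abstract class leaves a member for subclasses to define.

```scala
object ClassesDemo {
  // Primary constructor: `name` becomes an immutable field (val),
  // `count` a mutable one (var).
  class Counter(val name: String, var count: Int) {
    // Auxiliary constructor: Scala's form of constructor overloading.
    // Its first statement must call another constructor of this class.
    def this(name: String) = this(name, 0)

    def increment(): Unit = count += 1
  }

  // Abstract class: cannot be instantiated; concrete subclasses define `area`.
  abstract class Shape {
    def area: Double
  }

  class Square(side: Double) extends Shape {
    def area: Double = side * side
  }

  def main(args: Array[String]): Unit = {
    val c = new Counter("clicks") // uses the auxiliary constructor
    c.increment()
    println(s"${c.name}: ${c.count}") // clicks: 1
    val shape: Shape = new Square(3)
    println(shape.area)               // 9.0
  }
}
```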

Case classes and pattern matching

  • Understanding sealed traits and the wildcard, constructor, tuple, variable, and constant patterns.
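The sketch below combines a sealed trait and case classes with each pattern kind listed above. The `Shape`, `Circle` and `Rect` types are hypothetical examples; case classes also get structural equality, so two `Circle(1.0)` values compare equal.

```scala
object PatternDemo {
  // Sealed trait: every subtype must be declared in this source file, so the
  // compiler can warn when a match does not cover all of them.
  sealed trait Shape
  case class Circle(radius: Double) extends Shape
  case class Rect(width: Double, height: Double) extends Shape

  def describe(s: Shape): String = s match {
    case Circle(0.0)          => "point"           // constructor pattern with a constant inside
    case Circle(r)            => s"circle r=$r"    // constructor pattern binding variable r
    case Rect(w, h) if w == h => s"square side=$w" // constructor pattern plus a guard
    case Rect(_, _)           => "rectangle"       // wildcards ignore the fields
  }

  def classify(point: (Int, Int)): String = point match {
    case (0, 0) => "origin"           // tuple pattern of constants
    case (x, 0) => s"on x-axis at $x" // tuple pattern with a variable
    case _      => "elsewhere"        // wildcard pattern matches anything
  }
}
```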

Concepts of traits with example

  • Understanding traits in Scala, the advantages of traits, linearization of traits, the Java equivalent, and avoiding boilerplate code.
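Trait linearization determines which implementation `super` refers to when several traits are mixed in. In this illustrative sketch (all trait and class names are hypothetical), `new Service().log("hi")` returns "[ts] HI", while `new ReversedService().log("hi")` returns "[TS] HI", because the right-most trait in the mix-in list runs first.

```scala
object TraitsDemo {
  trait Logger {
    def log(msg: String): String = msg
  }

  // Stackable modifications: each trait transforms the message and then
  // passes it on with `super.log`, whose target is fixed by linearization.
  trait Timestamped extends Logger {
    override def log(msg: String): String = super.log(s"[ts] $msg")
  }

  trait Uppercased extends Logger {
    override def log(msg: String): String = super.log(msg.toUpperCase)
  }

  // Linearization of Service: Service, Uppercased, Timestamped, Logger.
  // Uppercased (right-most) runs first, then Timestamped.
  class Service extends Logger with Timestamped with Uppercased

  // Reversing the mix-in order reverses the call order.
  class ReversedService extends Logger with Uppercased with Timestamped
}
```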

Scala-Java Interoperability

  • Implementation of traits in Scala and Java, and handling a class that extends multiple traits.

Scala collections

  • Introduction to Scala collections, classification of collections, the difference between Iterator and Iterable in Scala, and an example of a List sequence in Scala.
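The practical difference between Iterator and Iterable is that an Iterable can be traversed repeatedly, while an Iterator is consumed as it is read. A minimal sketch, with hypothetical helper names:

```scala
object IteratorDemo {
  // An Iterable (here a List) can be traversed any number of times.
  def sumTwice(xs: Iterable[Int]): (Int, Int) = (xs.sum, xs.sum)

  // An Iterator is a single-pass cursor: once consumed it is empty.
  def sumIteratorTwice(it: Iterator[Int]): (Int, Int) = {
    val first = it.sum  // consumes every element
    val second = it.sum // nothing left, so the sum is 0
    (first, second)
  }

  def main(args: Array[String]): Unit = {
    val xs = List(1, 2, 3)
    println(sumTwice(xs))                  // (6,6)
    println(sumIteratorTwice(xs.iterator)) // (6,0)
  }
}
```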

Mutable collections vs. Immutable collections

  • The two types of collections in Scala, mutable and immutable; understanding lists and arrays in Scala, ListBuffer and ArrayBuffer, Queue in Scala, the double-ended queue (Deque), Stacks, Sets, Maps and Tuples in Scala.
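A compact sketch of mutable versus immutable collections using the standard `scala.collection.mutable` package; the value names are illustrative. Immutable collections return new instances on "modification", while mutable ones change in place.

```scala
import scala.collection.mutable

object CollectionsDemo {
  // Immutable List: prepending builds a new list; `base` is unchanged.
  val base: List[Int] = List(1, 2, 3)
  val extended: List[Int] = 0 :: base

  // Mutable ListBuffer and ArrayBuffer: updated in place.
  val listBuf: mutable.ListBuffer[Int] = mutable.ListBuffer(1, 2)
  listBuf += 3
  val arrayBuf: mutable.ArrayBuffer[String] = mutable.ArrayBuffer("a", "b")
  arrayBuf(0) = "z"

  // Mutable Queue: first in, first out.
  val queue: mutable.Queue[Int] = mutable.Queue(1, 2, 3)
  queue.enqueue(4)
  val firstOut: Int = queue.dequeue() // removes and returns 1

  // Immutable Set, Map and a Tuple.
  val set: Set[Int] = Set(1, 2) + 2 // adding a duplicate has no effect
  val scores: Map[String, Int] = Map("a" -> 1) + ("b" -> 2)
  val pair: (String, Int) = ("spark", 3)
}
```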

Use Case: bobsrockets Package

  • Introduction to Scala packages and imports, selective imports, Scala test classes, an introduction to the JUnit test class, the JUnit interface via the JUnit 3 suite for ScalaTest, packaging Scala applications in a directory structure, and an example of Spark split and Spark Scala.

Spark Course Content

Introduction to Spark

  • Introduction to Spark, how Spark overcomes the drawbacks of MapReduce, understanding in-memory MapReduce, interactive operations on MapReduce, the Spark stack, fine- vs. coarse-grained updates, Spark on Hadoop YARN, HDFS revision, YARN revision, an overview of Spark and how it is better than Hadoop, deploying Spark without Hadoop, the Spark history server, and the Cloudera distribution.

Spark Basics

  • Spark installation guide, Spark configuration, memory management, executor memory vs. driver memory, working with the Spark shell, the concept of Resilient Distributed Datasets (RDDs), functional programming in Spark, and the architecture of Spark.

Working with RDDs in Spark

  • Spark RDDs, creating RDDs, RDD partitioning, operations and transformations on RDDs, a deep dive into Spark RDDs, general RDD operations, RDDs as read-only, partitioned collections of records, using RDDs for fast and efficient data processing, RDD actions such as collect, count, collectAsMap and saveAsTextFile, and pair RDD functions.

Aggregating Data with Pair RDDs

  • Understanding key-value pairs in RDDs, how Spark makes MapReduce operations faster, various RDD operations, interactive MapReduce operations, fine- vs. coarse-grained updates, and the Spark stack.
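Spark's pair RDD aggregations such as `reduceByKey` group values by key and combine them with an associative function (on a cluster, values are also combined within each partition before the shuffle). The sketch below reproduces only those semantics on local Scala collections so it runs without Spark; `reduceByKeyLocal` and `wordCount` are hypothetical helpers, not Spark APIs. On an actual pair RDD the corresponding call would be `pairs.reduceByKey(_ + _)`.

```scala
object PairAggregationDemo {
  // Local, in-memory analog of reduceByKey: group (key, value) pairs by key,
  // then combine each group's values with the supplied function.
  def reduceByKeyLocal[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }

  // Classic word count: emit (word, 1) for each word, then sum per key.
  def wordCount(lines: Seq[String]): Map[String, Int] = {
    val pairs = lines.flatMap(_.split("\\s+")).filter(_.nonEmpty).map(w => (w, 1))
    reduceByKeyLocal(pairs)(_ + _)
  }

  def main(args: Array[String]): Unit =
    println(wordCount(Seq("to be or", "not to be")))
}
```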

Writing and Deploying Spark Applications

  • Comparing Spark applications with the Spark shell, creating a Spark application using Scala or Java, deploying a Spark application, building a Scala application, creating mutable lists, sets and set operations, lists, tuples, concatenating lists, creating an application using SBT, deploying an application using Maven, the web user interface of a Spark application, a real-world example of Spark, and configuring Spark.

Parallel Processing

  • Learning about Spark parallel processing, deploying on a cluster, an introduction to Spark partitions, file-based partitioning of RDDs, understanding HDFS and data locality, mastering parallel operations, comparing repartition and coalesce, and RDD actions.

Spark RDD Persistence

  • The execution flow in Spark and Spark terminology, an overview of RDD persistence, distributed shared memory vs. RDDs, RDD limitations, Spark shell arguments, distributed persistence, RDD lineage, and key/value pair operations for sorting and aggregation provided through implicit conversion, such as countByKey, reduceByKey, sortByKey and aggregateByKey.

Spark Streaming & MLlib

  • Spark Streaming architecture, writing streaming program code, processing Spark streams, processing Spark Discretized Streams (DStreams), the Spark Streaming context, streaming transformations, Spark Streaming with Flume, request count and DStream, multi-batch operations, sliding window operations and advanced data sources. Various algorithms, the concept of iterative algorithms in Spark, analyzing with Spark graph processing, an introduction to K-Means and machine learning, and Spark's shared variables, including broadcast variables and accumulators.

Improving Spark Performance

  • Introduction to shared variables in Spark, including broadcast variables and accumulators, common performance issues, and troubleshooting performance problems.

Spark SQL and DataFrames

  • Learning about Spark SQL, the SQL context in Spark for structured data processing, JSON support in Spark SQL, working with XML data and Parquet files, creating a HiveContext, writing DataFrames to Hive, understanding DataFrames in Spark, creating DataFrames, manually inferring a schema, working with CSV files, reading JDBC tables, writing DataFrames to JDBC, user-defined functions in Spark SQL, shared variables and accumulators, querying and transforming data in DataFrames, how DataFrames provide the benefits of both Spark RDDs and Spark SQL, and deploying Hive on Spark as the execution engine.

Scheduling/Partitioning

  • Learning about scheduling and partitioning in Spark, hash partitioning, range partitioning, scheduling within and across applications, static partitioning, dynamic sharing, fair scheduling, mapPartitionsWithIndex, zip, groupByKey, Spark master high availability, standby masters with ZooKeeper, single-node recovery with the local file system, and higher-order functions.
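Hash partitioning assigns each key to a partition based on its hash code. The sketch below reproduces that rule on plain Scala values, assuming the behavior of Spark's `HashPartitioner` (a non-negative `hashCode` modulo the partition count, with `null` keys sent to partition 0); it does not use Spark itself, and `partitionFor` and `assign` are hypothetical names.

```scala
object HashPartitionDemo {
  // Assumed HashPartitioner rule: non-negative (key.hashCode mod numPartitions);
  // null keys go to partition 0.
  def partitionFor(key: Any, numPartitions: Int): Int =
    if (key == null) 0
    else {
      val raw = key.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw // shift negative remainders into range
    }

  // Groups keys by the partition they would be assigned to.
  def assign(keys: Seq[Any], numPartitions: Int): Map[Int, Seq[Any]] =
    keys.groupBy(k => partitionFor(k, numPartitions))

  def main(args: Array[String]): Unit =
    println(assign(Seq(1, 5, 2, -1), 4))
}
```

Because the partition depends only on the key, every record with the same key lands in the same partition, which is what key-based operations such as groupByKey rely on.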

Tai Infotech Pvt Ltd, 2017 All Rights Reserved