Learn

  • Frame your Big Data problems as Apache Spark jobs
  • Set up the development environment for Scala and Apache Spark
  • Develop efficient Spark applications using Scala
  • Build and deploy Spark jobs on Hadoop clusters
  • Process real-time streams of data using Spark Streaming
  • Query your structured data using SparkSQL and work with the DataSets API
  • Analyze and process graph structures using Spark’s GraphX module

About

“Big data” analysis is a hot and highly valuable skill – and this course will teach you the hottest technology in big data: Apache Spark. Employers including Amazon, eBay, NASA JPL, and Yahoo all use Spark to quickly extract meaning from massive data sets across a fault-tolerant Hadoop cluster. You'll learn those same techniques, using your own Windows system right at home. It's easier than you might think, and you'll be learning from a former engineer and senior manager at Amazon and IMDb.

In this course, you'll master the art of framing data analysis problems as Spark problems through more than 20 hands-on examples, then scale them up to run on cloud computing services.

This course is very hands-on; you'll spend most of your time following along with the instructor as we write, analyze, and run real code together – both on your own system, and in the cloud using Amazon's Elastic MapReduce service. About 7.5 hours of video content is included, with over 20 real examples of increasing complexity you can build, run, and study yourself. Move through them at your own pace, on your own schedule. The course wraps up with an overview of other Spark-based technologies, including Spark SQL, Spark Streaming, and GraphX.

Features

  • Understand the fundamentals of Scala and the Apache Spark ecosystem
  • Handle large streams of data with Spark Streaming and perform Machine Learning in real time with Spark MLlib
  • Comprehensive tutorial packed with practical examples to help you develop real-world Big Data applications with Spark and Scala

Course Length: 7 hours 23 minutes

ISBN: 9781787129849

Author

Frank Kane

Frank spent 9 years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers around the clock. He holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, Frank left to start his own successful company, Sundog Software, which focuses on virtual reality environment technology and on teaching others about big data analysis.
