Course Outline

Day 1: Data Processing and Python Essentials 

Session 1: Spark DataFrames and Basic Operations 

  • Working with Spark DataFrames: Implementing Basic Operations
  • Groupby and Aggregate Operations
  • Handling Timestamps and Dates
  • Hands-on Exercise: Data analysis using Spark DataFrames 

Session 2: Python Programming for Big Data 

  • Core Python for Data Handling: Using Variables, Lists, and Functions
  • Working with Classes and Files
  • Integrating APIs and External Data
  • Hands-on Exercise: Building a Python project that processes and analyzes data with PySpark 

Day 2: Advanced PySpark and Machine Learning 

Session 3: Machine Learning with PySpark 

  • Implementing Machine Learning with Spark MLlib: Linear and Logistic Regression
  • Random Forest Classification Models
  • Hands-on Exercise: Building and evaluating machine learning models using PySpark 

Session 4: Clustering and Recommender Systems 

  • K-means Clustering: Theory and Practical Implementation
  • Hands-on Exercise: Building a K-means clustering model
  • Recommender Systems: Building a recommendation engine with Spark MLlib
  • Hands-on Exercise: Recommender system project 

Session 5: Spark Streaming and NLP 

  • Real-Time Data Streaming with Spark: Implementing real-time data processing
  • Hands-on Exercise: Streaming data with Spark
  • Natural Language Processing (NLP) with PySpark: Implementing basic NLP tasks
  • Hands-on Exercise: NLP pipeline using PySpark 
Duration: 14 hours
