Shrestha Rajat

Last updated Jul 9, 2023

# EMR

#aws #cloud #bigdata #data-analytics

Amazon EMR (Elastic MapReduce) is a managed cluster platform from AWS for running Apache Hadoop and Apache Spark. It is mostly used to process data for analytics and BI, or to transform data with ETL jobs.

# EMR integrations

- EC2 instances run the nodes in the cluster.
- VPC provides the networking.
- S3 stores input and output data.
- CloudWatch monitors the cluster.
- IAM manages permissions.
- CloudTrail provides auditing.
- AWS Data Pipeline schedules and starts clusters.
- Lake Formation sets up a data lake on S3.
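
As a rough illustration of where these integrations appear when a cluster is created, the sketch below builds the request parameters for boto3's `run_job_flow` call. The function names, release label, instance types, and counts are assumptions chosen for the example. `EMR_DefaultRole` and `EMR_EC2_DefaultRole` are the default role names AWS creates, and an account may use different ones.

```python
def build_cluster_request(name, log_bucket, subnet_id):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All concrete values (release label, instance types, counts) are
    illustrative assumptions, not recommendations.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.10.0",  # assumed release; pick one available to you
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        "LogUri": f"s3://{log_bucket}/emr-logs/",  # S3: cluster logs
        "Instances": {
            # EC2: the instances that run the cluster nodes
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "Ec2SubnetId": subnet_id,  # VPC: subnet the nodes launch into
            # Terminate the cluster once there are no steps left to run
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "ServiceRole": "EMR_DefaultRole",      # IAM: role assumed by the EMR service
        "JobFlowRole": "EMR_EC2_DefaultRole",  # IAM: instance profile for the nodes
    }


def launch_cluster(request):
    """Submit the request to EMR; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    emr = boto3.client("emr")
    return emr.run_job_flow(**request)["JobFlowId"]
```

Keeping the request builder separate from the API call makes it possible to inspect or log the configuration before anything is launched.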

# Use cases:

  1. Loading data into Redshift for data warehousing and BI analytics.
  2. Loading data into S3 for analytics after performing an ETL transformation.
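
Both use cases can be sketched in a few lines, under some assumptions. The first helper builds an EMR step that runs a Spark ETL script through `command-runner.jar`; the script location and the convention that it takes an input and an output URI as arguments are hypothetical. The second builds a Redshift `COPY` statement that loads Parquet files from S3; the table name, S3 path, and role ARN are placeholders supplied by the caller.

```python
def spark_etl_step(script_uri, input_uri, output_uri):
    """Build an EMR step dict that runs a PySpark script via spark-submit.

    The script is assumed (hypothetically) to read from input_uri and
    write its transformed output to output_uri.
    """
    return {
        "Name": "spark-etl",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit", "--deploy-mode", "cluster",
                script_uri, input_uri, output_uri,
            ],
        },
    }


def redshift_copy_sql(table, s3_uri, iam_role_arn):
    """Build a Redshift COPY statement for Parquet files stored in S3.

    Values are interpolated directly, so only pass trusted input.
    """
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' FORMAT AS PARQUET;"
    )
```

A step dict like this could be passed in the `Steps` list of a `run_job_flow` request, and the `COPY` statement would be run against Redshift once the ETL output has landed in S3.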