# EMR
#aws #cloud #bigdata #data-analytics
A managed cluster platform from AWS for running Apache Hadoop and Apache Spark. It is mostly used to process data for analytics and BI, or to transform data with ETL jobs.
# EMR integrations
- EC2 instances run the nodes in the cluster.
- VPC provides networking for the cluster.
- S3 stores input/output data.
- CloudWatch monitors the cluster.
- IAM manages permissions.
- CloudTrail provides auditing.
- AWS Data Pipeline schedules and starts clusters.
- Lake Formation sets up a data lake in S3.
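
To make these integrations concrete, here is a minimal sketch of the request that boto3's `run_job_flow` call accepts, showing where EC2, VPC, S3 and IAM appear. The helper name `build_cluster_request`, the release label, the instance types and counts, and the log prefix are all assumptions for illustration. The two role names are the AWS default EMR roles and must already exist in the account.

```python
def build_cluster_request(name, log_bucket, subnet_id):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    Hypothetical helper: it only assembles a dict and does not call AWS.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release, pick one you have tested
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        # S3: where EMR writes cluster logs (prefix is an assumption)
        "LogUri": f"s3://{log_bucket}/emr-logs/",
        "Instances": {
            # EC2: the instances that run the cluster nodes
            "InstanceGroups": [
                {
                    "Name": "Primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": "m5.xlarge",  # assumed size
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "InstanceType": "m5.xlarge",  # assumed size
                    "InstanceCount": 2,
                },
            ],
            # VPC: the subnet the cluster is launched into
            "Ec2SubnetId": subnet_id,
            # Terminate the cluster once all steps have finished
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        # IAM: service role for EMR and instance profile for the EC2 nodes
        "ServiceRole": "EMR_DefaultRole",
        "JobFlowRole": "EMR_EC2_DefaultRole",
    }


# Usage (needs AWS credentials, so it is left commented out):
# import boto3
# emr = boto3.client("emr")
# response = emr.run_job_flow(**build_cluster_request("demo", "my-bucket", "subnet-123"))
# print(response["JobFlowId"])
```

Keeping the request as a plain dict makes it easy to inspect or version before anything is launched.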
# Use cases:
- Loading data into Redshift for data warehousing and BI analytics.
- Loading data into S3 after an ETL transformation, for analytics.
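
As a rough sketch of the S3 use case, an ETL job can be submitted to a cluster as a step that runs `spark-submit` through `command-runner.jar`. The helper name `build_spark_etl_step`, the step name, and the convention that the script takes an input and an output S3 URI as arguments are assumptions for illustration; the Spark script itself is not shown.

```python
def build_spark_etl_step(script_uri, input_uri, output_uri):
    """Build one EMR step definition that runs a Spark ETL script.

    Hypothetical helper: it only assembles a dict and does not call AWS.
    """
    return {
        "Name": "spark-etl",  # assumed step name
        "ActionOnFailure": "CONTINUE",  # keep the cluster running if the step fails
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                script_uri,   # PySpark script stored in S3
                input_uri,    # raw data to read (assumed script argument)
                output_uri,   # transformed data to write (assumed script argument)
            ],
        },
    }


# Usage (needs AWS credentials and a running cluster, so it is left commented out):
# import boto3
# emr = boto3.client("emr")
# step = build_spark_etl_step(
#     "s3://my-bucket/jobs/etl.py",
#     "s3://my-bucket/raw/",
#     "s3://my-bucket/curated/",
# )
# emr.add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=[step])
```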