ANT308-R1: Deep dive into running Apache Spark on Amazon EMR
AWS re:Invent 2019 - A podcast by AWS
Amazon EMR enables customers to run ETL, machine learning, real-time processing, data science, and low-latency SQL workloads at petabyte scale. This session focuses on running Apache Spark on Amazon EMR. We introduce design patterns such as using Amazon S3 instead of HDFS, running long- and short-lived clusters, using notebooks, and applying performance-related enhancements. We also discuss lowering cost with auto scaling and Spot Instances, and improving security through encryption and fine-grained access control with AWS Lake Formation.