AWS Glue is a serverless data integration service that provides fully managed extract, transform, and load (ETL) functionality.
Overview
You point AWS Glue at data stored on AWS, and a crawler discovers that data and stores the associated metadata (e.g., table definitions and schemas) in the AWS Glue Data Catalog. Once cataloged, the data is immediately searchable, queryable, and available for ETL.
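As a rough illustration of the "point Glue at your data" step, the sketch below builds the arguments for a crawler over an S3 path, starts it, and reads the resulting schema back from the Data Catalog. It assumes a boto3 Glue client is passed in as `glue`. The helper names (`build_crawler_request`, `run_crawler`, `table_schema`) and every name, ARN, and path in the usage note are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List, Tuple


def build_crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> Dict[str, Any]:
    """Keyword arguments for the Glue `create_crawler` call (S3 target only)."""
    return {
        "Name": name,
        "Role": role_arn,          # IAM role the crawler assumes
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def run_crawler(glue: Any, request: Dict[str, Any]) -> None:
    """Register the crawler, then start it. The crawl itself runs asynchronously."""
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


def table_schema(glue: Any, database: str, table: str) -> List[Tuple[str, str]]:
    """Read back the (column, type) pairs stored in the Data Catalog for a table."""
    response = glue.get_table(DatabaseName=database, Name=table)
    columns = response["Table"]["StorageDescriptor"]["Columns"]
    return [(col["Name"], col["Type"]) for col in columns]
```

With boto3 installed and credentials configured, `boto3.client("glue")` can be passed as `glue`, for example `run_crawler(glue, build_crawler_request("orders-crawler", role_arn, "sales", "s3://example-bucket/orders/"))`. Because `start_crawler` returns before the crawl finishes, `table_schema` only returns results once the crawler has completed and created the table.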
Components
- Source Data Store
  - S3, RDS, DynamoDB
- Crawler
- Data Catalog
  - Persistent metadata store
- Job
  - The business logic to perform the ETL task
  - Python or Scala
- Output Data Store

Data flow: Source Data Store -> (Crawler) -> Data Catalog -> (Job) -> Output Data Store
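To make the Job component more concrete, here is a minimal sketch of how a Spark ETL job might be registered and started through a boto3 Glue client, with the script language (Python or Scala) chosen through the `--job-language` job parameter. The helper names and all job names, ARNs, script paths, and class names are hypothetical, and a real job would usually need more settings (Glue version, worker type, connections) than are shown here.

```python
from typing import Any, Dict, Optional


def build_job_request(
    name: str,
    role_arn: str,
    script_s3_path: str,
    language: str = "python",
    scala_class: Optional[str] = None,
) -> Dict[str, Any]:
    """Keyword arguments for the Glue `create_job` call for a Spark ETL job."""
    if language not in ("python", "scala"):
        raise ValueError("language must be 'python' or 'scala'")

    # "glueetl" is the command name for a Spark ETL job.
    command: Dict[str, str] = {"Name": "glueetl", "ScriptLocation": script_s3_path}
    default_args: Dict[str, str] = {"--job-language": language}

    if language == "python":
        command["PythonVersion"] = "3"
    else:
        # Scala scripts need the entry-point class passed via --class.
        if not scala_class:
            raise ValueError("scala_class is required for Scala jobs")
        default_args["--class"] = scala_class

    return {
        "Name": name,
        "Role": role_arn,
        "Command": command,
        "DefaultArguments": default_args,
    }


def create_and_start_job(glue: Any, request: Dict[str, Any]) -> str:
    """Register the job, start one run, and return the run ID."""
    glue.create_job(**request)
    run = glue.start_job_run(JobName=request["Name"])
    return run["JobRunId"]
```

The script at `script_s3_path` holds the actual ETL logic: it reads the tables the crawler registered in the Data Catalog and writes the transformed result to the output data store.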