[AWS] AWS Glue

AWS Glue is a serverless data integration service that provides fully managed extract, transform, and load (ETL) functionality.

Overview

You point AWS Glue at your data stored on AWS, and a crawler discovers the data and stores the associated metadata (e.g., table definitions and schemas) in the AWS Glue Data Catalog. Once cataloged, the data is immediately searchable, queryable, and available for ETL.
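
As a rough illustration of "pointing" Glue at data, the sketch below builds the arguments for the boto3 Glue client's `create_crawler` call. The crawler name, IAM role ARN, database name, and S3 path are all made-up placeholders, and the helper functions are hypothetical wrappers, not part of any AWS library. Only the second function talks to AWS, so it needs boto3 and valid credentials and is not called here.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for the boto3 Glue client's create_crawler call."""
    return {
        "Name": name,                # crawler name (placeholder)
        "Role": role_arn,            # IAM role the crawler assumes (placeholder)
        "DatabaseName": database,    # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},  # where the data lives
    }


def create_and_start_crawler(request):
    """Register the crawler and start it. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# All values below are illustrative assumptions.
request = build_crawler_request(
    name="example-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="example_db",
    s3_path="s3://example-bucket/raw/",
)
```

Calling `create_and_start_crawler(request)` in an account where the role and bucket exist would register the crawler and kick off a run, and the discovered tables should then appear in the named Data Catalog database.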


Components

  • Source Data Store
    • S3, RDS, DynamoDB
  • Crawler
  • Data Catalog
    • Persistent metadata store
  • Job
    • The business logic to perform the ETL task
      • Python or Scala
  • Output Data Store

Data flow: Source Data Store -> (Crawler) -> Data Catalog -> (Job) -> Output Data Store
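
To make the roles of the components concrete, here is a toy model of the flow in plain Python. It does not use the Glue API at all: `crawl` and `run_job` are hypothetical stand-ins for a crawler and a job, the "stores" are in-memory lists and dicts, and the sample records are invented for illustration.

```python
def crawl(records):
    """Toy 'crawler': infer a column -> type-name schema from sample records."""
    schema = {}
    for record in records:
        for column, value in record.items():
            # Keep the first type seen for each column.
            schema.setdefault(column, type(value).__name__)
    return schema


def run_job(records, schema):
    """Toy 'job': keep only cataloged columns and upper-case string values."""
    output = []
    for record in records:
        row = {}
        for column, type_name in schema.items():
            value = record.get(column)
            if type_name == "str" and value is not None:
                value = value.upper()
            row[column] = value
        output.append(row)
    return output


# Source Data Store (invented sample data)
source_data_store = [
    {"id": 1, "city": "tokyo"},
    {"id": 2, "city": "osaka"},
]

# Source Data Store -> (Crawler) -> Data Catalog
data_catalog = {"cities": crawl(source_data_store)}

# Data Catalog -> (Job) -> Output Data Store
output_data_store = run_job(source_data_store, data_catalog["cities"])
```

The point of the sketch is the separation of concerns: the catalog holds only metadata (here, `{"id": "int", "city": "str"}`), while the job carries the business logic and reads that metadata to decide how to transform each record.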
