[AWS] Athena

Athena is an interactive query service that runs ad-hoc SQL queries (schema-on-read) directly against data stored in S3.


Features

Athena can query many forms of data such as AWS logs, JSON, or CSV in S3.

  • Athena also supports open-source columnar formats such as Apache ORC and Apache Parquet.
  • Athena can query a variety of service logs (e.g. CloudFront, CloudTrail, VPC Flow Logs, or ELB).

The main benefit of Athena is that it is easy to use. Simply put your data in S3, define the schema, and start querying with standard SQL, without complex extract, transform, and load (ETL) tools.

  • Tables are defined in a data catalog.
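  • The schema is applied at query time (schema-on-read), so the same data in S3 can be re-described without reloading it.
  • Athena does not modify the source data. Query results are written to S3 and can be passed on to visualization tools.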
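The "define the schema" step can be sketched as follows. This is a minimal illustration, not an official recipe: the helper name `build_create_table_ddl`, the database `weblogs`, the table `requests`, the column list, and the bucket `example-bucket` are all made up for the example. It only builds the `CREATE EXTERNAL TABLE` statement as a string for comma-delimited files; you would still run the statement in Athena yourself.

```python
def build_create_table_ddl(database, table, columns, s3_location, skip_header=True):
    """Return a CREATE EXTERNAL TABLE statement for comma-delimited files in S3.

    All arguments are illustrative; nothing is validated or escaped here.
    """
    # One "`name` type" line per column, joined with commas.
    column_lines = ",\n".join(f"  `{name}` {sql_type}" for name, sql_type in columns)
    ddl = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )
    if skip_header:
        # Tell Athena to ignore the first line of each file (the CSV header).
        ddl += "\nTBLPROPERTIES ('skip.header.line.count'='1')"
    return ddl


# Hypothetical database, table, columns, and bucket.
ddl = build_create_table_ddl(
    database="weblogs",
    table="requests",
    columns=[("request_time", "string"), ("status", "int"), ("bytes_sent", "bigint")],
    s3_location="s3://example-bucket/requests/",
)
print(ddl)
```

Because the table is external, the statement only records metadata. The files under the S3 prefix stay where they are and are read when a query runs.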

Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

  • You pay per query, based on the amount of data scanned, plus the usual S3 storage costs.
    • This pricing suits ad-hoc queries better than frequently repeated ones.
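To see why per-query pricing favors ad-hoc use, here is a rough cost sketch. The numbers are assumptions, not quotes: it assumes a price of 5 USD per TB scanned, rounding up to the nearest MB with a 10 MB minimum per query, and binary units (1 TB = 1024^4 bytes). Check the current pricing page for your region before relying on it. The function name `estimate_query_cost` is made up for this example.

```python
import math

BYTES_PER_MB = 1024 ** 2
BYTES_PER_TB = 1024 ** 4


def estimate_query_cost(bytes_scanned, price_per_tb_usd=5.0, minimum_mb=10):
    """Estimate the cost of one query in USD from the bytes it scans.

    The default price and the 10 MB minimum are assumptions for illustration.
    """
    # Round up to whole megabytes, then apply the per-query minimum.
    megabytes = max(math.ceil(bytes_scanned / BYTES_PER_MB), minimum_mb)
    return megabytes * BYTES_PER_MB / BYTES_PER_TB * price_per_tb_usd


# A one-off query scanning 50 GB versus the same query run 1,000 times a month.
one_run = estimate_query_cost(50 * 1024 ** 3)
print(f"single run: ${one_run:.2f}")
print(f"1,000 runs: ${one_run * 1000:.2f}")
```

The cost grows linearly with the number of runs and with the bytes scanned, which is also why columnar formats such as Parquet or ORC help: a query that reads fewer bytes costs less.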

Athena is integrated with the AWS Glue Data Catalog, which lets you create a unified metadata repository across various services.
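Tables registered in the catalog are referenced by catalog and database name when a query is submitted. The sketch below shows one way to do that with boto3 (the AWS SDK for Python, which must be installed separately). The helper names `build_query_request` and `submit_query`, the database `weblogs`, the table `requests`, and the results bucket are assumptions for illustration; `AwsDataCatalog` is assumed here as the catalog name that points at the Glue Data Catalog.

```python
def build_query_request(sql, database, output_location, catalog="AwsDataCatalog"):
    """Build keyword arguments for the Athena StartQueryExecution call."""
    return {
        "QueryString": sql,
        # Which catalog and database the table names in the SQL resolve against.
        "QueryExecutionContext": {"Catalog": catalog, "Database": database},
        # S3 prefix where Athena writes the result files.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def submit_query(request):
    """Submit the query and return its execution ID (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above works without the SDK

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical database, table, and results bucket.
request = build_query_request(
    sql="SELECT status, COUNT(*) AS hits FROM requests GROUP BY status",
    database="weblogs",
    output_location="s3://example-bucket/athena-results/",
)
print(request)
```

`submit_query(request)` only starts the query. Athena runs it asynchronously, so a real script would poll the execution status before reading the results from the output location.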


Athena vs. Macie

  • Athena is a query service used to query log files and generate business reports on data stored in S3.
  • Macie is a security service used to identify and protect sensitive data – PII (Personally Identifiable Information) – stored in S3 using machine learning and NLP (Natural Language Processing).
