[AWS] Kinesis Data Firehose

Kinesis Data Firehose is a fully managed service that loads streaming data into data stores such as Amazon S3 and analytics tools such as Amazon Redshift, Elasticsearch, and Splunk, enabling near real-time analytics with existing business intelligence tools.

Features

  • Kinesis Data Streams stores data in shards for a retention period (data persistence), whereas Firehose only buffers data briefly before delivery and does not retain it.
  • Firehose can batch, compress, transform, and encrypt the data before loading it.
    • For example, Firehose can automatically convert incoming data to columnar formats such as Apache Parquet and Apache ORC before delivering it to a destination like S3.
  • Firehose can optionally invoke an AWS Lambda function to transform incoming data before delivering it to destinations. However, a Lambda function cannot itself be a destination; the destinations are storage and analytics services.
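
To make the Lambda transformation step more concrete, here is a minimal sketch of a transformation function in Python. It assumes the incoming records are JSON objects; the `processed` field it adds is purely hypothetical and stands in for whatever transformation you need. Firehose passes records base64-encoded and expects each one back with the same `recordId`, a `result` status, and base64-encoded `data`.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform each Firehose record and return it in the shape Firehose expects."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
            payload["processed"] = True  # hypothetical transformation
            line = json.dumps(payload) + "\n"  # newline-delimited JSON
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(line.encode("utf-8")).decode("utf-8"),
            })
        except ValueError:
            # Unparseable input: report failure and hand the data back unchanged
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

A record can also be returned with the result `Dropped` if it should be discarded intentionally rather than treated as a failure.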

How it Works

  • Ingest
    • Send data to Kinesis Data Firehose
    • Sources
      • Amazon Kinesis Data Streams
      • Direct PUT
        • Kinesis Agent
        • AWS Services
        • Custom Applications (SDK)
  • Transform
    • Optionally transform source records using AWS Lambda functions
  • Load
    • Deliver data to a specific destination such as S3 or Redshift
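
As a rough illustration of the Direct PUT path from a custom application, the sketch below batches events and sends them with the `put_record_batch` call of the AWS SDK for Python (boto3). The helper names `to_firehose_records` and `send_events` are made up for this example, and the client is passed in as a parameter so the code itself does not depend on AWS credentials. It only respects the 500-records-per-call limit; the per-call size limit and retrying of failed records are left out.

```python
import json

MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def to_firehose_records(events):
    """Encode each event as newline-delimited JSON bytes."""
    return [{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in events]


def send_events(client, stream_name, events):
    """Send events in chunks; return how many records Firehose reported as failed."""
    records = to_firehose_records(events)
    failed = 0
    for start in range(0, len(records), MAX_BATCH):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=records[start:start + MAX_BATCH],
        )
        failed += response.get("FailedPutCount", 0)
    return failed
```

With boto3 installed and credentials configured, this could be called as `send_events(boto3.client("firehose"), "my-delivery-stream", events)`, where the stream name is a placeholder for an existing delivery stream.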

Use Cases

https://aws.amazon.com/kinesis

Kinesis Firehose is used when:

  • streaming data needs to be collected and delivered to a destination quickly
  • processing is optional, and data retention is not important
  • e.g., capturing data from IoT devices and streaming it into a data lake
