[AWS] Kinesis Data Firehose

Kinesis Data Firehose is a fully managed service that loads streaming data into data stores (S3) and analytics tools (Redshift, OpenSearch, or Splunk), enabling near real-time analytics with existing business intelligence tools.


Features

  • Kinesis Data Streams uses shards that retain data for a configurable period (data persistence), but Firehose does not store the data itself.
  • Firehose can batch, compress, transform, and encrypt data before passing it to the destination.
    • For example, you can automatically convert the incoming data to columnar formats like Apache Parquet and Apache ORC, before the data is delivered to other data sources like S3.
  • Firehose can optionally invoke an AWS Lambda function to transform incoming data before delivering it to destinations.
    • But a Lambda function cannot be a destination.
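The Lambda transformation mentioned above follows a fixed contract: Firehose invokes the function with base64-encoded records, and the function must return each record with its original recordId, a status, and the (re-encoded) data. Below is a minimal sketch of such a handler; the upper-cased "message" field is purely illustrative.

```python
import base64
import json

def handler(event, context):
    """Sketch of a Firehose transformation Lambda.
    Upper-cases a 'message' field in each JSON record
    (the field name is an illustrative assumption)."""
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        payload["message"] = payload.get("message", "").upper()
        output.append({
            "recordId": record["recordId"],  # must echo the incoming id
            "result": "Ok",                  # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(
                (json.dumps(payload) + "\n").encode()
            ).decode(),
        })
    return {"records": output}
```

The trailing newline keeps records separable once they land in the destination (e.g. an S3 object containing many records).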

How it Works

  • Ingest
    • Send data to Kinesis Data Firehose
    • Sources
      • Amazon Kinesis Data Streams
      • Amazon Managed Streaming for Apache Kafka (MSK)
      • Direct PUT
        • via Kinesis Agent
        • AWS Services
          • Lambda
          • CloudWatch Logs, CloudWatch Metric Streams
          • SNS
        • Custom Applications (SDK)
  • Transform
    • Optionally transform source records using a Lambda function
  • Loading
    • Deliver data to a specific destination such as S3 or Redshift
  • Sink Types (Destinations): storage/analytic services
    • S3
    • Amazon OpenSearch
    • Redshift
      • Data is delivered to S3 first
      • Then loaded into Redshift via the COPY command
    • 3rd party
      • Snowflake, Splunk, NewRelic, MongoDB …
    • Any HTTP endpoint
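For the Direct PUT ingestion path above, a custom application sends records with the SDK (PutRecord / PutRecordBatch). A rough sketch of preparing a batch, assuming JSON events; the delivery stream name in the comment is hypothetical:

```python
import json

def build_record_batch(events):
    """Package events as Firehose records for PutRecordBatch.
    Each record's Data must be bytes; the trailing newline keeps
    records separable once delivered (e.g. to S3)."""
    return [{"Data": (json.dumps(e) + "\n").encode()} for e in events]

records = build_record_batch([
    {"user": "a", "action": "click"},
    {"user": "b", "action": "view"},
])

# With boto3 (stream name "my-stream" is a placeholder):
# import boto3
# firehose = boto3.client("firehose")
# firehose.put_record_batch(DeliveryStreamName="my-stream", Records=records)
```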

Use Cases

https://aws.amazon.com/kinesis

Kinesis Firehose is used when:

  • collecting streaming data and delivering it to the destination quickly
  • processing is optional, and data retention is not important
  • use cases
    • capturing data from IoT devices and streaming it into a data lake
    • streaming log data, normalizing it via a Lambda transformation, and saving it in S3
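The data lake use case often relies on the built-in Parquet conversion mentioned under Features. A sketch of the record-format-conversion portion of a delivery stream's S3 destination configuration (as passed to CreateDeliveryStream); the bucket, role, and Glue database/table names are hypothetical placeholders:

```python
# Sketch of ExtendedS3DestinationConfiguration with format conversion enabled.
# ARNs and Glue names below are placeholders, not real resources.
extended_s3_config = {
    "BucketARN": "arn:aws:s3:::my-data-lake-bucket",
    "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
    "DataFormatConversionConfiguration": {
        "Enabled": True,
        "InputFormatConfiguration": {
            "Deserializer": {"OpenXJsonSerDe": {}}   # incoming JSON records
        },
        "OutputFormatConfiguration": {
            "Serializer": {"ParquetSerDe": {}}       # delivered as Parquet
        },
        "SchemaConfiguration": {                     # schema is read from Glue
            "DatabaseName": "my_glue_db",
            "TableName": "my_table",
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        },
    },
}
```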
