Kinesis Firehose is a fully managed service that loads streaming data into data stores (e.g., S3) and analytics tools (Redshift, OpenSearch, or Splunk), enabling near-real-time analytics with existing business intelligence tools.
Features
- Kinesis Data Streams uses shards to store data for a retention period (data persistence), whereas Firehose does not retain the data.
- Firehose can batch, compress, transform, and encrypt data before passing it to the destination.
- For example, you can automatically convert incoming data to columnar formats like Apache Parquet and Apache ORC before the data is delivered to destinations like S3.
- Firehose can optionally invoke an AWS Lambda function to transform incoming data before delivering it to destinations.
- Lambda functions cannot themselves be a destination, however.
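A transformation Lambda receives a batch of base64-encoded records and must return each record with a `recordId`, a `result` (`Ok`, `Dropped`, or `ProcessingFailed`), and the transformed `data`. A minimal sketch (the `"source"` enrichment field is just an illustrative assumption):

```python
import base64
import json

def handler(event, context):
    """Firehose data-transformation Lambda: decode each record, enrich the
    JSON payload, re-encode it, and return it with result 'Ok'."""
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")
        try:
            doc = json.loads(payload)
            doc["source"] = "firehose"      # hypothetical enrichment field
            data = json.dumps(doc) + "\n"   # newline-delimit records for S3
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(data.encode("utf-8")).decode("utf-8"),
            })
        except json.JSONDecodeError:
            # Records marked ProcessingFailed go to the stream's error output
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Note the explicit trailing newline: Firehose concatenates records as-is, so without a delimiter the objects written to S3 become one unreadable run of JSON.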
How it Works
- Ingest
- Send data to Kinesis Data Firehose
- Sources
- Amazon Kinesis Data Streams
- Amazon Managed Streaming for Apache Kafka (MSK)
- Direct PUT
- via Kinesis Agent
- AWS Services
- Lambda
- CloudWatch Logs, CloudWatch Metric Streams
- SNS
- Custom Applications (SDK)
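For the Direct PUT path from a custom application, records are sent with `PutRecord` or `PutRecordBatch` (at most 500 records per batch call). A minimal sketch, assuming a hypothetical stream name `my-delivery-stream`; the batching helper is local and the actual AWS call is shown commented out:

```python
import json

MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call

def to_batches(events, max_batch=MAX_BATCH):
    """Serialize events to newline-delimited JSON records (Firehose
    concatenates records as-is, so the delimiter must be explicit)
    and group them into PutRecordBatch-sized chunks."""
    records = [{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in events]
    return [records[i:i + max_batch] for i in range(0, len(records), max_batch)]

# Actual delivery (requires AWS credentials), sketched:
# import boto3
# firehose = boto3.client("firehose")
# for batch in to_batches(events):
#     firehose.put_record_batch(DeliveryStreamName="my-delivery-stream",
#                               Records=batch)
```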
- Transform
- Optionally transform source records using a Lambda function
- Loading
- Deliver data to a specific destination such as S3 or Redshift
- Sink Types (Destinations): storage/analytics services
- S3
- Amazon OpenSearch
- Redshift
- Data is delivered to an S3 bucket first
- Firehose then issues a Redshift COPY command to load it
- 3rd party
- Snowflake, Splunk, NewRelic, MongoDB …
- Any HTTP endpoint
Use Cases
https://aws.amazon.com/kinesis
Kinesis Firehose is used when:
- collecting streaming data and delivering it to a destination quickly
- processing is optional and data retention is not required
- use cases
- capturing data from IoT devices and streaming it into a data lake
- streaming log data, normalizing it via a Lambda transformation, and saving it to S3