Kinesis Firehose is a fully managed service that loads streaming data into data stores such as S3 and analytics tools such as Redshift, Elasticsearch, or Splunk, enabling near real-time analytics with existing business intelligence tools.
Features
- Unlike Kinesis Data Streams, whose shards retain data for a period of time (data persistence), Firehose does not store the data. It only buffers records briefly before delivering them.
- Firehose can batch, compress, transform, and encrypt the data before loading it.
- For example, you can automatically convert incoming data to columnar formats such as Apache Parquet and Apache ORC before it is delivered to a destination like S3.
- Firehose can optionally invoke an AWS Lambda function to transform incoming data before delivering it to destinations. However, a Lambda function cannot itself be a destination; destinations are storage or analytics services.
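The Lambda transformation mentioned above can be sketched as follows. Firehose passes the function a batch of base64-encoded records and expects each one back with the same `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and the (possibly modified) base64 `data`. The filtering rule on a `level` field and the added `processed` flag are hypothetical, chosen only for illustration.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform a batch of Firehose records and return them in the expected shape."""
    output = []
    for record in event["records"]:
        try:
            # Firehose delivers each payload base64-encoded.
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
        except (ValueError, TypeError):
            # Undecodable payload: return it unchanged and flag the failure.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue

        # Hypothetical rule: drop records marked as debug noise.
        if payload.get("level") == "debug":
            output.append({
                "recordId": record["recordId"],
                "result": "Dropped",
                "data": record["data"],
            })
            continue

        # Hypothetical enrichment: tag the record, newline-delimit it for S3.
        payload["processed"] = True
        transformed = (json.dumps(payload) + "\n").encode("utf-8")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(transformed).decode("utf-8"),
        })
    return {"records": output}
```

Every incoming `recordId` must appear exactly once in the response; records marked `Dropped` are skipped without being treated as errors.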
How it Works
- Ingest
  - Send data to Kinesis Data Firehose
  - Sources
    - Amazon Kinesis Data Streams
    - Direct PUT
    - Kinesis Agent
    - AWS Services
    - Custom Applications (SDK)
- Transform
  - Optionally transform source records using AWS Lambda functions
- Loading
  - Deliver data to a specific destination such as S3 or Redshift
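As a rough illustration of the ingest step for a custom application using Direct PUT, the sketch below sends events through the `PutRecordBatch` API, which accepts at most 500 records per call. The helper names `chunk` and `send_events` are hypothetical, and `client` is assumed to be a boto3 Firehose client (`boto3.client("firehose")`) or any object with a compatible `put_record_batch` method.

```python
import json

MAX_BATCH_SIZE = 500  # PutRecordBatch accepts at most 500 records per call


def chunk(items, size=MAX_BATCH_SIZE):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_events(client, stream_name, events):
    """Send events with PutRecordBatch; return how many records Firehose rejected."""
    failed = 0
    for batch in chunk(events):
        # Newline-delimit each JSON record so the delivered objects stay line-oriented.
        records = [{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in batch]
        response = client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=records,
        )
        # A batch can partially fail; FailedPutCount reports the rejected records.
        failed += response.get("FailedPutCount", 0)
    return failed
```

This sketch does not retry rejected records or enforce the per-call payload size limit; a production sender would inspect `RequestResponses` in each response and resend the failed entries.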
Use Cases
https://aws.amazon.com/kinesis
Kinesis Firehose is a good fit when:
- streaming data must be collected and delivered to a destination quickly
- processing is optional, and data retention is not important
- e.g., capturing data from IoT devices and streaming it into a data lake