Kinesis is a fully managed, scalable, and resilient streaming service. It is designed to ingest a large amount of data in real-time from many producers.
- Examples of streaming data: online purchases, stock prices, social network data, geospatial data, or game data
- Use Kinesis (rather than SQS) when you need many producers/consumers.
- Kinesis consists of four services:
  - Video Streams
  - Data Streams
  - Data Firehose
  - Data Analytics
Kinesis Data Streams
Kinesis Data Streams collects and processes large amounts of incoming data from a large number of producers.
- Producers supply data to Kinesis, e.g., IoT (Internet of Things) devices.
- Consumers are applications that read and process the data in the stream.
- Kinesis Data Streams is used for:
  - Real-time analytics, or feeding data into other services in real time, with data retention.
  - e.g., analyzing logs continuously or running real-time analytics on clickstream data
- Lambda as a consumer: by default, Lambda invokes your function as soon as records are available in the stream. Lambda can process up to 10 batches per shard concurrently. If you increase the number of concurrent batches per shard, Lambda still ensures in-order processing at the partition-key level.
- Transient Data Store:
  - Records expire based on the stream's rolling retention window (24-hour default; can be increased up to 365 days).
  - Kinesis Data Streams preserves the order of records within a shard.
- Kinesis Shards
  - A shard is the unit of capacity of a Kinesis stream.
  - Records with the same partition key are routed to the same shard; within a shard, records are ordered by sequence number.
  - Shards allow streams to scale. A stream has at least 1 shard, and each shard provides 1 MB per second of ingestion and 2 MB per second of consumption capacity. Shards can be added to or removed from a stream.
- Kinesis Data Record
  - A data record is the basic unit of data. Each shard holds a sequence of data records.
  - A data record consists of a sequence number, a partition key, and a data blob. The data blob can be up to 1 MB.
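A minimal sketch of how records are assigned to shards: Kinesis hashes each partition key with MD5 into a 128-bit integer and sends the record to the shard whose hash key range contains that value, so the same key always lands in the same shard. The even split of the hash space and the helper names (`even_shard_ranges`, `shard_for`, `shards_needed`) are assumptions for illustration only; a real stream reports its own range for each shard.

```python
import hashlib
import math
from typing import List, Tuple

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer using MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_shard_ranges(shard_count: int) -> List[Tuple[int, int]]:
    """Split the hash space into contiguous, inclusive ranges (assumed even layout)."""
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last range absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key: str, ranges: List[Tuple[int, int]]) -> int:
    """Return the index of the shard whose range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key outside all ranges")


def shards_needed(write_mb_per_s: float, read_mb_per_s: float) -> int:
    """Rough estimate from the per-shard limits: 1 MB/s in, 2 MB/s out."""
    return max(1, math.ceil(write_mb_per_s / 1), math.ceil(read_mb_per_s / 2))
```

The estimate only considers throughput in MB per second; other per-shard limits (such as records per second) are ignored here.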
Interacting with Kinesis Data Streams
- Kinesis Producer Library (KPL) passes data to a Kinesis data stream.
  - KPL provides an efficient abstraction layer for ingesting data, with automatic retries and better performance.
- Kinesis Client Library (KCL) delivers all records for a given partition key to the same record processor, making it easier to build multiple applications reading from the same Kinesis data stream.
- Kinesis API (AWS SDK) is used to interact with a Kinesis data stream through low-level API operations such as PutRecord or GetRecords.
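As a rough illustration of the low-level API path, the sketch below uses the Python SDK (boto3). It builds PutRecords entries on the producer side and unpacks the records that Lambda hands to a Kinesis-triggered function on the consumer side. The helper names, the `device_id` key field, and the JSON payload format are assumptions for this example. Unlike KPL, this sketch does not retry failed records; it only counts them.

```python
import base64
import json

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def to_entry(event, key_field):
    """Build one PutRecords entry: a bytes payload plus its partition key."""
    return {
        "Data": json.dumps(event).encode("utf-8"),
        "PartitionKey": str(event[key_field]),
    }


def batches(entries, size=MAX_RECORDS_PER_CALL):
    """Yield slices of entries that fit within one PutRecords call."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


def send(events, stream_name, key_field="device_id"):
    """Send events to the stream and return how many records were rejected."""
    import boto3  # imported lazily so the helpers above work without the AWS SDK

    client = boto3.client("kinesis")
    entries = [to_entry(e, key_field) for e in events]
    failed = 0
    for batch in batches(entries):
        response = client.put_records(StreamName=stream_name, Records=batch)
        failed += response["FailedRecordCount"]
    return failed


def decode_lambda_event(event):
    """Consumer side: unpack records passed to a Kinesis-triggered Lambda function."""
    decoded = []
    for record in event["Records"]:
        kinesis = record["kinesis"]
        payload = json.loads(base64.b64decode(kinesis["data"]))  # data arrives base64-encoded
        decoded.append((kinesis["partitionKey"], kinesis["sequenceNumber"], payload))
    return decoded
```

Calling `send(...)` requires AWS credentials and an existing stream; the other helpers run locally.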
Kinesis Data Firehose
Kinesis Data Firehose is a fully managed service that loads streaming data into data stores (e.g., S3) and analytics tools (Redshift, Elasticsearch, or Splunk).
- Kinesis Data Streams stores data in shards for some time (data persistence), but Firehose does not retain the data.
- Firehose can optionally invoke an AWS Lambda function to transform incoming data before delivering it to destinations. However, a Lambda function cannot be a destination; the destinations are storage and analytics services.
- Kinesis Data Firehose is used when:
  - you need to collect streaming data and deliver it to the destination quickly
  - processing is optional, and data retention is not important
  - e.g., capturing data from IoT devices and streaming it into a data lake
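A minimal sketch of a Firehose transformation function in Python. Firehose passes the function a batch of base64-encoded records, and the function returns each record with the same `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and the (possibly modified) data. The `temperature_c` field and the Celsius-to-Fahrenheit conversion are hypothetical, chosen only to show the shape of a transformation.

```python
import base64
import json


def handler(event, context):
    """Transform each Firehose record and report a per-record result."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:
            # Unparseable input: hand it back unchanged and flag it as failed.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue

        # "temperature_c" is a hypothetical field used for illustration.
        if not isinstance(payload, dict) or "temperature_c" not in payload:
            output.append({"recordId": record["recordId"],
                           "result": "Dropped",
                           "data": record["data"]})
            continue

        payload["temperature_f"] = payload["temperature_c"] * 9 / 5 + 32
        line = json.dumps(payload) + "\n"  # newline-delimited JSON for the destination
        output.append({"recordId": record["recordId"],
                       "result": "Ok",
                       "data": base64.b64encode(line.encode("utf-8")).decode("ascii")})
    return {"records": output}
```

Records marked `Dropped` are not delivered, while records marked `Ok` continue on to the configured destination.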
Kinesis Video Streams
Kinesis Video Streams securely streams video (as well as audio or images) from connected devices to AWS for analytics, machine learning, and other processing.
Use Case: Amber Alert System
Kinesis Data Analytics
Kinesis Data Analytics analyzes streaming data to produce actionable insights in real time.
- You run SQL queries against the input stream, and the results are passed to the destination in real time.
- When to use:
  - Run SQL queries on streaming data to gain insight into your data.
  - Create metrics, dashboards, notifications, and alarms
  - e.g., send real-time alarms when certain metrics reach a predefined threshold
Use Case: Clickstream Analysis
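To make the idea of a windowed query with a threshold alarm concrete, here is a small plain-Python sketch. It is not the service's SQL or API; it only mimics what a query grouping click events by page over fixed (tumbling) time windows would compute. The event shape `(timestamp_seconds, page)` and the function names are assumptions for illustration.

```python
from collections import Counter


def tumbling_counts(events, window_seconds):
    """Count events per (window_start, page); events are (timestamp, page) pairs."""
    counts = Counter()
    for timestamp, page in events:
        # Align each timestamp to the start of its fixed-size window.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, page)] += 1
    return counts


def alarms(counts, threshold):
    """Return the (window_start, page) keys whose count reached the threshold."""
    return sorted(key for key, n in counts.items() if n >= threshold)


clicks = [(0, "/home"), (10, "/home"), (59, "/home"), (61, "/home"), (30, "/cart")]
per_window = tumbling_counts(clicks, window_seconds=60)
print(alarms(per_window, threshold=3))
```

In a streaming setting the same aggregation would be evaluated continuously as each window closes, rather than over a finished list.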