Amazon Kinesis Data Streams (KDS) is a highly scalable and durable real-time data streaming service. KDS can capture gigabytes of data per second from thousands of sources, including website clickstreams, database event streams, financial transactions, social media feeds, IT logs, and location-tracking events. The collected data is available in milliseconds, which enables real-time dashboards, real-time anomaly detection, dynamic pricing, and other real-time analytics use cases.
A typical Kinesis Data Streams application reads data records from a data stream. These applications can run on Amazon EC2 instances. The processed records can be used to feed dashboards, generate alerts, adjust pricing and advertising strategies dynamically, or send data to other AWS services.
Kinesis Data Streams is part of the Kinesis streaming data platform, along with Kinesis Data Firehose, Kinesis Video Streams, and Kinesis Data Analytics.
How Does Amazon Kinesis Data Streams Work?
Source: Amazon
Kinesis Data Streams High-Level Architecture
The diagram below shows the high-level architecture of Kinesis Data Streams. Producers continually push data to Kinesis Data Streams, and consumers process the data in real time. Consumers (such as a custom application running on Amazon EC2 or an Amazon Kinesis Data Firehose delivery stream) can store their results using an AWS service such as Amazon DynamoDB or Amazon Redshift.
Source: Amazon
Who Uses Amazon Kinesis Data Streams?
Kinesis Data Streams can be used to gather and aggregate data in real time. Examples of such data include IT infrastructure logs, application logs, social media, market data feeds, and web clickstream data. Because data intake and processing happen in real time, the processing is typically lightweight.
Where can I use Kinesis Data Streams?
These are some examples of how to use Kinesis Data Streams.
Accelerated log and data feed intake and processing: Producers can push data directly into a stream. For example, system and application logs can be pushed to a stream and become available for processing within seconds. This prevents the log data from being lost if the front end or application server fails. Kinesis Data Streams speeds up data feed intake because you don’t batch the data on the servers before submitting it for intake.
Real-time metrics and reporting: Data collected into Kinesis Data Streams can be used for simple data analysis and reporting in real time. Instead of waiting for batches of data to arrive, your data-processing application can work on metrics and reporting for system and application logs as the data streams in.
Real-time data analytics: This combines the power of parallel processing with the value of real-time data. For example, you can process website clickstreams in real time and then analyze site engagement using multiple Kinesis Data Streams applications running in parallel.
Complex stream processing: You can build Directed Acyclic Graphs (DAGs) of Kinesis Data Streams applications and data streams. This typically involves putting data from multiple Kinesis Data Streams applications into another stream, where it is processed downstream by a different Kinesis Data Streams application.
What are the benefits of using Amazon Kinesis Data Streams?
Kinesis Data Streams can be used to address a wide range of streaming data challenges. One common use is the real-time aggregation of data, followed by loading the aggregated data into a data warehouse or map-reduce cluster.
Data is put into Kinesis data streams, which ensures durability and elasticity. The delay between the time a record is put into the stream and the time it can be retrieved (put-to-get delay) is typically less than 1 second. In other words, a Kinesis Data Streams application can start consuming data from the stream almost immediately after the data is added.
Multiple Kinesis Data Streams applications can consume data from the same stream, so that different tasks, such as archiving and processing, can take place concurrently and independently.
Important Kinesis Data Streams Terminology
1. Kinesis Data Stream
A Kinesis data stream is a set of shards. Each shard holds a sequence of data records, and each data record has a sequence number that is assigned by Kinesis Data Streams.
2. Data Record
A data record is the unit of data stored in a Kinesis data stream. A data record is made up of a sequence number, a partition key, and a data blob, which is an immutable sequence of bytes. Kinesis Data Streams does not inspect, interpret, or change the data in the blob in any way. A data blob can be up to 1 megabyte in size.
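To illustrate how the partition key ties a record to a shard: Kinesis Data Streams hashes the partition key with MD5 into a 128-bit integer and places the record in the shard whose hash key range contains that value. The sketch below mimics this in plain Python. The function names and the even split of the hash key space are assumptions made for illustration; a real stream's ranges come from the service and can be uneven after resharding.

```python
import hashlib
from typing import List, Tuple

MAX_HASH_KEY = 2**128 - 1  # MD5 yields a 128-bit value


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer using MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big")


def even_shard_ranges(shard_count: int) -> List[Tuple[int, int]]:
    """Split the hash key space into contiguous ranges (illustrative assumption)."""
    size = (MAX_HASH_KEY + 1) // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last range absorbs any remainder so the whole space is covered.
        end = MAX_HASH_KEY if i == shard_count - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key: str, ranges: List[Tuple[int, int]]) -> int:
    """Return the index of the range that contains the key's hash value."""
    value = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index
    raise ValueError("hash value falls outside every range")
```

Because the mapping depends only on the partition key, all records that share a key land in the same shard, which is what preserves their relative order.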
3. Retention Period
The retention period is the length of time that data records remain accessible after they are added to the stream. A stream’s retention period is set to a default of 24 hours after creation. You can increase the retention period up to 8760 hours (365 days) using the IncreaseStreamRetentionPeriod operation, and decrease it down to a minimum of 24 hours using the DecreaseStreamRetentionPeriod operation. Additional charges apply for streams with a retention period set to more than 24 hours. For more information, see Amazon Kinesis Data Streams Pricing.
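As a rough sketch of how the two operations fit together, the helper below picks the right call for a target retention period. It assumes `client` is a boto3 Kinesis client (for example, `boto3.client("kinesis")`); the helper name `set_retention_hours` and the idea of passing the current value in from the caller are illustrative choices, not part of the service.

```python
MIN_RETENTION_HOURS = 24    # minimum stated above
MAX_RETENTION_HOURS = 8760  # 365 days, maximum stated above


def set_retention_hours(client, stream_name: str,
                        current_hours: int, target_hours: int) -> str:
    """Move a stream's retention period to target_hours.

    `client` is assumed to expose the boto3 Kinesis methods
    increase_stream_retention_period and decrease_stream_retention_period.
    """
    if not MIN_RETENTION_HOURS <= target_hours <= MAX_RETENTION_HOURS:
        raise ValueError(
            f"retention must be between {MIN_RETENTION_HOURS} "
            f"and {MAX_RETENTION_HOURS} hours"
        )
    if target_hours > current_hours:
        client.increase_stream_retention_period(
            StreamName=stream_name, RetentionPeriodHours=target_hours
        )
        return "increased"
    if target_hours < current_hours:
        client.decrease_stream_retention_period(
            StreamName=stream_name, RetentionPeriodHours=target_hours
        )
        return "decreased"
    return "unchanged"  # nothing to do
```

Validating the range locally avoids a round trip to the service for values that are out of bounds.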
4. Producer
Producers put records into Amazon Kinesis Data Streams. For example, a web server that sends log data to a stream is a producer. Consumers, in turn, get records from Amazon Kinesis Data Streams and process them. These consumers are known as Amazon Kinesis Data Streams applications.
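A minimal sketch of the web server example might look like the following. It assumes `client` is a boto3 Kinesis client; the function name `put_log_line`, the JSON payload layout, and the choice of the host name as partition key are all hypothetical and only serve the illustration.

```python
import json

MAX_BLOB_BYTES = 1024 * 1024  # 1 MB data blob limit described above


def put_log_line(client, stream_name: str, host: str, line: str):
    """Send one log line to a stream and return (shard ID, sequence number).

    `client` is assumed to expose the boto3 Kinesis method put_record.
    """
    data = json.dumps({"host": host, "line": line}).encode("utf-8")
    if len(data) > MAX_BLOB_BYTES:
        raise ValueError("record exceeds the data blob size limit")
    response = client.put_record(
        StreamName=stream_name,
        Data=data,
        PartitionKey=host,  # records from one host go to the same shard
    )
    # The service assigns the sequence number; the producer only receives it.
    return response["ShardId"], response["SequenceNumber"]
```

Using the host name as the partition key keeps each server's log lines in order within a shard, at the cost of uneven load if one host is much busier than the others.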
5. Amazon Kinesis Data Streams Application
An Amazon Kinesis Data Streams application is a consumer of a stream, and it commonly runs on a fleet of EC2 instances. There are two types of consumers that you can develop: shared fan-out consumers and enhanced fan-out consumers. To learn more about the differences and how to build each type of consumer, see Reading Data from Amazon Kinesis Data Streams.
The output of a Kinesis Data Streams application can be the input for another stream.
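The chaining idea can be sketched with a simple polling consumer that reads one batch from a shard, transforms each record, and writes the result to a second stream. This is a simplified shared fan-out style reader, assuming `client` is a boto3 Kinesis client; the function name `relay_shard_batch` and the `transform` callback are hypothetical. A production consumer would also track checkpoints and handle resharding, which is typically left to the Kinesis Client Library.

```python
def relay_shard_batch(client, source_stream: str, shard_id: str,
                      target_stream: str, transform, limit: int = 100):
    """Read one batch from a shard and forward transformed records.

    `client` is assumed to expose the boto3 Kinesis methods
    get_shard_iterator, get_records, and put_record.
    Returns (number of records forwarded, next shard iterator or None).
    """
    # TRIM_HORIZON starts at the oldest record still within the retention period.
    iterator = client.get_shard_iterator(
        StreamName=source_stream,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]

    response = client.get_records(ShardIterator=iterator, Limit=limit)

    forwarded = 0
    for record in response["Records"]:
        client.put_record(
            StreamName=target_stream,
            Data=transform(record["Data"]),        # bytes in, bytes out
            PartitionKey=record["PartitionKey"],   # keep the original grouping
        )
        forwarded += 1

    # Pass the returned iterator to get_records to continue polling.
    return forwarded, response.get("NextShardIterator")
```

Calling it with something like `relay_shard_batch(client, "raw-logs", shard_id, "clean-logs", bytes.upper)` would copy a batch into a second stream (both stream names here are placeholders), where another application can pick it up.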