A data warehouse is a data storage solution that aggregates a massive amount of historical data. It supports querying, reporting, analytics, and business intelligence (BI).
Redshift is a petabyte-scale data warehousing solution for business intelligence (BI) workloads.
Amazon Redshift is a cloud data warehouse. It makes it fast, simple, and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. It allows you to run complex analytic queries against terabytes to petabytes of structured and semi-structured data.
It uses sophisticated query optimization, columnar storage on high-performance storage, and massively parallel query execution. Most results come back in seconds.
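To make the columnar-storage point concrete, here is a toy sketch in plain Python. It is not how Redshift is implemented; the table, column names, and helper functions are invented for illustration. It only shows why an aggregate over one column can skip the other columns when data is laid out column by column.

```python
# Toy illustration only: the same table in a row layout and a column layout.
# All names and values here are made up for the example.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]

# Column-oriented layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_amount_row_store(table):
    # Walks every row record to pull out a single field.
    return sum(row["amount"] for row in table)


def total_amount_column_store(cols):
    # Reads only the one column the query needs.
    return sum(cols["amount"])


print(total_amount_row_store(rows), total_amount_column_store(columns))
```

Both functions return the same total; the difference is how much data each has to scan, which is what matters for analytic queries that touch a few columns across many rows.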
- Column-based database; data can be loaded from S3.
- Not for transactional (OLTP) processing; RDS covers that.
- Not for unstructured data; a NoSQL store covers that.
- MPP (Massively Parallel Processing)
- Use cases:
  - OLAP (Online Analytical Processing) workloads (retrieval and analytics).
  - Data consolidation: consolidate multiple data sources for reporting.
- Configuration:
  - Single node
  - Multi-node: a leader node (manages client connections and receives queries) plus compute nodes (store data and run the queries)
- Kinesis can load data into Redshift. Redshift is used for analyzing summarized data from other services such as Kinesis or RDS.
- Charges: compute node hours (no charge for leader node hours), backups, and data transfer within a VPC
- Security: SSL in transit, AES-256 encryption at rest
- Backups: snapshots can be restored in a different AZ and even copied to another Region.
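The list above notes that data can be loaded from S3. A minimal sketch of what that load might look like is a `COPY` statement; the helper below only builds the SQL text. The function name, table name, bucket, and IAM role ARN are all hypothetical, and the CSV options are one possible choice, not a requirement.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads CSV files from S3."""
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with 's3://'")
    # No escaping is done here; inputs are assumed to be trusted.
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"
    )


# Hypothetical table, bucket, and role, for illustration only.
statement = build_copy_statement(
    "sales",
    "s3://example-bucket/sales/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

The resulting statement would be run through any SQL client connected to the cluster. `COPY` loads files in parallel across the compute nodes, which is why it is generally preferred over row-by-row `INSERT` statements for bulk loads.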
Redshift Spectrum is a feature of Amazon Redshift.
- Spectrum queries and analyzes data in S3 using the open data formats you already use, with no data loading or transformations (ETL).
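As a rough sketch of how Spectrum is typically set up, the helper below builds the SQL that registers an external schema and an external table over Parquet files in S3, followed by a query against it. The schema, database, table, bucket, and role names are hypothetical, and the column list is invented for the example.

```python
def build_spectrum_statements(schema: str, catalog_db: str, iam_role_arn: str,
                              s3_location: str) -> list[str]:
    """Return SQL statements that expose S3 data to Redshift via Spectrum."""
    create_schema = (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        "FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )
    # The table is only metadata; the files stay in S3 and are not loaded.
    create_table = (
        f"CREATE EXTERNAL TABLE {schema}.sales (\n"
        "    order_id INT,\n"
        "    amount DOUBLE PRECISION\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )
    query = f"SELECT SUM(amount) FROM {schema}.sales;"
    return [create_schema, create_table, query]


# Hypothetical names, for illustration only.
statements = build_spectrum_statements(
    "spectrum_demo",
    "demo_db",
    "arn:aws:iam::123456789012:role/ExampleSpectrumRole",
    "s3://example-bucket/sales/",
)
for sql in statements:
    print(sql, end="\n\n")
```

Once the external table exists, it can be queried, and joined with regular Redshift tables, using ordinary SQL.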
Redshift Enhanced VPC Routing
With Enhanced VPC Routing enabled, Redshift forces all COPY and UNLOAD traffic between the cluster and data repositories through the Amazon VPC. If Enhanced VPC Routing is not enabled, Redshift routes traffic through the internet.
By using Enhanced VPC Routing, you can use standard VPC features, such as security groups, network access control lists (ACLs), VPC endpoints, internet gateways, and Domain Name System (DNS) servers.
You can also monitor COPY and UNLOAD traffic using VPC Flow Logs.
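Enhanced VPC Routing is a cluster-level setting. The sketch below builds the request parameters that could be passed to the `modify_cluster` call of the boto3 Redshift client; the helper name and cluster identifier are hypothetical, and the actual AWS call is left as a comment so the snippet runs without credentials.

```python
def enhanced_vpc_routing_request(cluster_id: str, enabled: bool = True) -> dict:
    """Build keyword arguments for toggling Enhanced VPC Routing on a cluster."""
    if not cluster_id:
        raise ValueError("cluster_id must not be empty")
    return {
        "ClusterIdentifier": cluster_id,
        "EnhancedVpcRouting": enabled,
    }


# Hypothetical cluster name, for illustration only.
params = enhanced_vpc_routing_request("example-cluster")
print(params)

# With boto3 installed and AWS credentials configured, the request could be
# sent like this (not executed here):
#   import boto3
#   boto3.client("redshift").modify_cluster(**params)
```

Note that with routing forced through the VPC, the cluster's subnets need a path to S3 (for example, a VPC endpoint or a NAT gateway), otherwise COPY and UNLOAD will fail.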