Snowflake vs Redshift.
We compare them as of July 2020 on 7 key aspects.
And here is our verdict:
“Both platforms are great cloud data warehouses and you cannot go too wrong with either of them. Snowflake is easier to maintain and handles JSON better. Amazon Redshift is easier to integrate in an AWS environment.”
Before we jump into our snowflake vs redshift analysis that led to the verdict, let us examine why cloud data warehouses have become very important. Data has become essential for businesses. More and more key business decisions are being data-driven. To understand, analyze, and utilize the data, data warehouses have become critical business solution tools. Enterprises looking to modernize their data warehousing infrastructure are increasingly looking at cloud-based, horizontally scalable data warehouses.
Once a decision has been made to go for a data warehouse, then the next step is to decide what data warehouse to choose. We choose to compare Redshift vs Snowflake because both these platforms have transformed the volume, speed, and quality of business understanding and key-decision making. Choosing one over the other doesn’t necessarily mean that one service is better than the other, it is identifying which solution makes the most sense for your business.
What is Amazon Redshift
RedShift is defined as a fully-managed, cloud-ready petabyte-scale data warehouse service that can be seamlessly integrated with business intelligence tools. Here you can just start with a few hundred gigabytes of data and scale up or down according to your requirements.
Redshift is a great cloud data warehouse, in a way it kicked off the trend to migrate to cloud data warehouses.
What is Snowflake
Snowflake is a powerful RDBMS that offers an analytic data warehouse for both structured and semi-structured data as a Software-as-a-Service (SaaS) model. This means that Snowflake uses an SQL design and architecture that was specifically designed for the cloud making it fast, user-friendly and more flexible than a traditional data warehouse.
Redshift vs Snowflake
If you have used both Snowflake and Redshift you would have noticed far more similarities than differences. However, there are a few additional features and functions that come with each platform. Hence, we compare them on seven key aspects.
Snowflake vs Redshift: The Comparison
1. Integration options
Winner: It depends
If your organization is already working with AWS then it makes Redshift a better option. This is primarily because Redshift can integrate seamlessly with other AWS services like Glue, Athena, CloudWatch etc.
While you can find Snowflake on the AWS marketplace with on-demand functions, it’s important to note that it is challenging to integrate Snowflake with AWS data warehouse tools like Athena and Glue.
However, if you are worried about vendor lock-in, Snowflake has a variety of integration options for tools outside the AWS marketplace such as Apache Spark, IBM Cognos, Tableau etc. One of the critical reviews on Gartner writes about the integration aspects of Snowflake.
When comparing Redshift vs Snowflake we see that Snowflake is completely automated and does not require any maintenance activity from the user.
For Redshift, regular Vacuum/Analyse tables are required. This can become a huge headache with very large tables. Snowflake manages this out of the box.
In Redshift, all the users need to compete over the same resources. WLM queues are used to manage this. Snowflake allows you to simply start warehouses of different sizes that look at the same data, without copying any data, and allocate them to different users. One of the reviews on Gartner highlights the aspect of the need for high maintenance of Redshift.
3. Snowflake vs Redshift Performance
Winner: It depends
Both Snowflake and Redshift use columnar storage and massively parallel processing (MPP) for simultaneous computation. However, the key difference is in Snowflake the storage layer is separated from the compute layer in order to optimize their individual performance. This change in architecture makes it difficult to declare a clear winner when it comes to performance.
Snowflake has a lower query run time on unoptimized queries. Redshift query runtime for unoptimized queries improves if the same query is run much more frequently, this is because Redshift compiles code for the query and cache it. The compiled code is shared across sessions in a cluster, so the same query with different parameters in the different sessions will run faster because of no overhead. Also, in Redshift, we can tune our queries and data structures by using the SORTKEY and DISTKEY, which would improve run times for queries involving WHERE and JOIN.
Snowflake services layer does all the query planning and query optimization based on data profiles that are collected automatically as the data is loaded. It automatically collects and maintains the required statistics to determine how to most effectively distribute the data and queries across the available compute nodes.
So, no indexes, no need to figure out partitions and partition keys, no need to pre-shard any data for even distribution, and no need to remember to update statistics. Snowflake also offers a clause called PARTITION BY which helps with WHERE clause, but there are no ways to optimize JOIN in Snowflake.
Actual performance figures across various kinds of loads are hard to come by, so it is hard to say which of these products is a winner in performance.
4. Scaling Capabilities
One key advantage of going for cloud-based data warehouses is the ability to scale. Hence the ease of scalability is a major factor in choosing the platform.
Due to its unique architecture of separating the compute layer and storage layer, Snowflake has a clear advantage in terms of scale. Hence if the query load increases then we can scale the compute layer independently. Thus Snowflake offers instant scaling without the need for redistribution of data. Its auto concurrency allows users to set a minimum and maximum cluster size and the clusters will scale automatically based on the demands.
Redshift can also scale but not as instantly as Snowflake. It takes anywhere between a few minutes to a few hours to add new nodes to its clusters. In December 2019, AWS introduced RA3 Nodes which allows Redshift to scale and pay for computing and storage independently, so some of these scaling limitations may not be as cumbersome as in the past.
5. Data Structure
When we compare Redshift vs Snowflake when it comes to the data structure. Snowflake wins. Snowflake works well with multiple structured and semi-structured data formats such as JSON.
Redshift follows pure columnar storage format and is optimized for structure data storage making it hard to query JSON data. AWS recommends minimal usage of JSON “We recommend using JSON sparingly. JSON is not a good choice for storing larger datasets because, by storing disparate data in a single column, JSON does not leverage Amazon Redshift’s column store architecture.”
6. Security and Compliance
Redshift has end-to-end security which can be tailored to fit your requirements. Integration with AWS CloudTrail provides auditing to help meet compliance.
Snowflake offers different editions based on varying levels of security. Hence you can opt for a lower edition if full security is not required.
Winner: It depends
Snowflake prices Compute and Storage separately. Snowflake offers a dynamic pricing model – clusters will stop when no queries are running and automatically resume when they are, and they can flexibly resize themselves based on a changing workload. This can potentially save you money when query load decreases.
Originally, Redshift did not price Compute and Storage separately. In December 2019, Amazon launched RA3 nodes that separate compute and storage pricing. Amazon also offers other pricing models.
Comparison Summary – Snowflake vs Redshift
I put together a summary of the comparison of Redshift vs Snowflake to help you decide.
[table id =1/]
Some aspects on which there seems to be a more perceptible difference include:
- Maintenance: Redshift requires a more hands-on approach in terms of maintenance as the likes of data vacuuming and compression are not automated. Snowflake automates these tasks making it easier to maintain.
- Integration: If you plan to use AWS components in different modules in your architecture then Redshift makes it easier to integrate. But by splitting computation and storage and offering tiered editions, Snowflake provides businesses the flexibility to purchase only the features they need while preserving the potential to scale.
- Security: If the need is a full suite of security and compliance then Redshift is the better choice. Snowflake allows varying levels of security during different phases and an option to compromise on compliance in the initial stages for cost savings.
- Data Handling: Snowflake’s support to handle and query JSON data is more robust than Redshift. In Redshift, the JSON is split into strings making it harder to handle.
If you have more questions, please reach out to us via chat. We are very familiar with both Redshift and Snowflake. If you are looking for a system integrator to help you with your data warehouse, you could also reach out to us.