Snowflake: All you need to know to get started
Snowflake has made its grand entry into the world of EDWs (Enterprise Data Warehouses) by calling itself the Data Warehouse for the Cloud. Even when pitted against the competition, Snowflake firmly holds its ground. If you are looking for information on how to get started with the Snowflake Data Cloud, this article is just right for you.
What is Snowflake?
A data warehouse is essential for any company that wants to be data-driven. Having business data at the fingertips of analysts and decision-makers has never been more important. That said, it is no easy feat to architect an on-premise data warehouse.
It requires heavy-duty IT support to set up and maintain the necessary hardware and software. On-premise data warehouses have traditionally only been viable at large companies, and they have often proved a sore point for both IT and analytics teams.
Snowflake elegantly solves the problem of data warehousing. Being a cloud-based solution, Snowflake requires none of the heavy infrastructure set-up or up-front costs associated with traditional on-premise data warehouses. It also allows users to scale up and down easily, whilst paying only for the storage and compute they use. This makes it a great option for startups, where large upfront costs may not be feasible, and for companies moving up the data hierarchy of needs that want to start small and scale up as they progress towards a more data-driven culture.
Its unique selling point lies in its layered architecture.
At the base is the “Database Storage” layer, where Snowflake stores and efficiently organises the data loaded into the platform. Above this sits the “Compute” layer, where our virtual warehouses live. Each virtual warehouse is an independent compute cluster with access to the storage layer. Because they are independent, virtual warehouses do not compete for the same resources, so we can scale the compute used to query the storage layer almost without limit by resizing warehouses or adding more of them. On top of all of this sits Snowflake’s “Cloud Services” layer. This is the layer we interact with directly, and it coordinates our interactions with Snowflake’s underlying architecture, all through the universal language of SQL.
The separation of these layers, and the ability to scale each one independently, is what makes Snowflake stand out from the crowd. It gives organisations the flexibility to pay only for the storage and compute they use, while scaling each as far as their needs require.
Snowflake is for everyone who has anything to do with data!
Here are a few reasons why Snowflake is the best Cloud Warehousing solution.
- It has out-of-the-box features such as separation of storage and compute, on-the-fly scalable compute, data sharing, data cloning, and third-party tool support.
- It can be used for a wide range of technology areas, including data integration, business intelligence, advanced analytics, security, and governance.
- For general users, Snowflake provides extensive ANSI SQL support for managing day-to-day operations.
- It is cloud-agnostic, with seamless scalability across Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform (GCP).
- Snowflake fits many use cases: data lakes with raw data, an ODS (operational data store) with staged data, and data warehouses or data marts with presentable, modeled data.
- It lets us analyze a variety of data formats, including CSV, JSON, XML, Parquet and Avro, and blend them in the same SQL query.
Snowflake Editions
Standard Edition
The Standard Edition is the introductory-level offering, providing full, unlimited access to all of Snowflake’s standard features. It provides a strong balance between features, level of support, and cost.
Enterprise Edition
The Enterprise Edition provides all the features and services of the Standard Edition, with additional features designed specifically for the needs of large-scale enterprises and organizations.
Business Critical Edition
The Business Critical Edition, formerly known as Enterprise for Sensitive Data (ESD), offers even higher levels of data protection for organizations with extremely sensitive data, particularly PHI (protected health information) that must comply with regulations and standards such as HIPAA, HITRUST CSF and SOC 2. It includes all the features and services of the Enterprise Edition, with the addition of enhanced security and data protection. In addition, database failover/failback adds support for business continuity and disaster recovery.
Virtual Private Snowflake (VPS)
Virtual Private Snowflake offers the highest level of security for organizations that have the strictest requirements, such as financial institutions and other large enterprises that collect, analyze, and operate with highly sensitive data. It includes all the features and services of the Business Critical Edition, but in a separate Snowflake environment, isolated from other Snowflake accounts.
The Pricing Model
Snowflake operates on a flexible pay-as-you-go model based on the volume of data you store and the compute time you use, so you can create an account and start using it without delay.
Storage is charged per terabyte of compressed data per month. Compute is charged in credits, Snowflake’s unit of processing, which are consumed when you run queries or perform a service.
Compute is billed on actual usage, per second, with a 60-second minimum each time a warehouse starts or resumes. A suspended warehouse consumes no credits, and warehouses can be set to auto-suspend when idle, so you are not charged for idle time. There are no added usage quotas or hidden price premiums.
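The billing rules above can be turned into a rough estimate. Below is a minimal Python calculator, assuming Snowflake's standard warehouse rates (an X-Small warehouse uses 1 credit per hour, doubling with each size step) and the 60-second minimum per start or resume. The price per credit and per terabyte are inputs you must supply, because actual rates vary by edition, cloud provider and region; the prices in the example are hypothetical.

```python
# Credits consumed per hour while a warehouse is running, by size.
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "XLARGE": 16}

# Each start/resume of a warehouse is billed for at least this many seconds.
MIN_BILLED_SECONDS = 60


def compute_credits(size: str, run_seconds: list[int]) -> float:
    """Credits for separate runs of one warehouse (each run = resume to suspend)."""
    billed_seconds = sum(max(s, MIN_BILLED_SECONDS) for s in run_seconds)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


def monthly_cost(credits: float, storage_tb: float,
                 price_per_credit: float, price_per_tb_month: float) -> float:
    """Compute cost plus storage cost; both prices are caller-supplied assumptions."""
    return credits * price_per_credit + storage_tb * price_per_tb_month


# Three runs of a Small warehouse: the 30-second run is billed as 60 seconds.
credits = compute_credits("SMALL", [30, 1740, 1800])
print(credits)  # 2.0

# Hypothetical prices: $3.00 per credit and $40 per TB per month.
print(monthly_cost(credits, storage_tb=1.5,
                   price_per_credit=3.0, price_per_tb_month=40.0))  # 66.0
```

Because of the per-run minimum, many very short resumes can cost more than their raw runtime suggests, which is worth keeping in mind when choosing auto-suspend settings.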
However, without proper planning to ensure governance and visibility into utilization, this model makes it easy to run up a significant bill as multiple business units ask for access. To combat this, you will need to decide in advance how to pay for Snowflake credits.
At Ideas2IT, we have developed a Cost Calculator tool to help you with planning and budgeting your Snowflake needs. Click here to estimate the cost based on the Snowflake edition, location, storage and expected usage requirements.
Snowflake provides some essential account-level usage information and a dashboard, but that dashboard is only useful when someone is looking at it, because it has no alerting capabilities.
Additionally, by default only the ACCOUNTADMIN role can see all of the useful usage data, and granting that role widely is not advisable. Moreover, Snowflake does not organize usage information by your company’s budget groupings. Let’s say Project X and Project Y have separate budgets. Although you may have bought credits in bulk for both projects to save money, you will presumably want to deduct the credits each project consumes from its own budget.
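One way to deduct credits from per-project budgets, and to get the alerting the built-in dashboard lacks, is to map each warehouse to a budget group and compare totals against a limit. The Python sketch below works on in-memory rows of (warehouse name, credits used), such as could be exported from the SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY view. It assumes each project runs on its own warehouses; the warehouse names, mapping, budgets and 80% threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping: each project runs on its own warehouse(s).
WAREHOUSE_TO_PROJECT = {
    "PROJECT_X_WH": "Project X",
    "PROJECT_X_ETL_WH": "Project X",
    "PROJECT_Y_WH": "Project Y",
}

# Hypothetical monthly credit budgets per project.
BUDGETS = {"Project X": 100.0, "Project Y": 50.0}


def credits_by_project(rows):
    """Sum credits per project from (warehouse_name, credits_used) rows."""
    totals = defaultdict(float)
    for warehouse, credits in rows:
        # Warehouses missing from the mapping are reported, not silently dropped.
        project = WAREHOUSE_TO_PROJECT.get(warehouse, "Unassigned")
        totals[project] += credits
    return dict(totals)


def over_budget(totals, threshold=0.8):
    """Return an alert line for each project that has used `threshold` of its budget."""
    alerts = []
    for project, used in sorted(totals.items()):
        budget = BUDGETS.get(project)
        if budget is not None and used >= threshold * budget:
            alerts.append(f"{project}: {used:.1f} of {budget:.1f} credits used")
    return alerts


rows = [
    ("PROJECT_X_WH", 40.0),
    ("PROJECT_X_ETL_WH", 45.0),
    ("PROJECT_Y_WH", 20.0),
    ("ADHOC_WH", 5.0),
]
totals = credits_by_project(rows)
print(totals)               # {'Project X': 85.0, 'Project Y': 20.0, 'Unassigned': 5.0}
print(over_budget(totals))  # ['Project X: 85.0 of 100.0 credits used']
```

Run on a schedule, the alert list could be sent by email or chat, which covers the "nobody is looking at the dashboard" gap described above.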
To overcome these difficulties, we at Ideas2IT have built a highly useful Snowflake Dashboard tool to help you monitor your Snowflake accounts for cost usage and performance. Our clients love it and they say it’s indispensable. Click here to know more.
Security and Compliance
Snowflake provides multiple levels of security for accounts and users, as well as all the data stored. Snowflake is also continuing to expand its portfolio of Security & Compliance Reports, such as HIPAA.
- Site access is controlled through network policies, which contain a list of allowed IP addresses and a list of blocked IP addresses.
- Azure Private Link provides private connectivity to Snowflake through a private IP address. Traffic flows only from the customer’s Virtual Network (VNet) to the Snowflake VNet over the Microsoft backbone network, avoiding the public internet.
- AWS PrivateLink provides private connectivity between Snowflake and your own AWS Virtual Private Clouds (VPCs) in the same AWS region, keeping traffic off the public internet and protecting it from unauthorized external access.
- Key pair authentication and key pair rotation for stronger client authentication.
- MFA (multi-factor authentication) for increased security for account access.
- OAuth for authorized account access without sharing or storing user login credentials.
- Support for user SSO (single sign-on) through federated authentication.
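The allowed/blocked list logic of a network policy can be sketched with Python's standard ipaddress module. This models the documented behaviour rather than reproducing Snowflake code: the blocked list is checked first, and a client must then match an entry in the allowed list. The address ranges below are made-up examples.

```python
import ipaddress

# Example entries only; a real policy would list your organisation's addresses.
ALLOWED_IP_LIST = ["192.168.1.0/24", "203.0.113.10"]
BLOCKED_IP_LIST = ["192.168.1.99"]


def _matches(ip, entries):
    # ip_network() treats a bare address as a single-host (/32 or /128) network.
    return any(ip in ipaddress.ip_network(entry) for entry in entries)


def is_allowed(client_ip: str) -> bool:
    """Return True if a client address would pass this sketch of a network policy."""
    ip = ipaddress.ip_address(client_ip)
    if _matches(ip, BLOCKED_IP_LIST):  # the blocked list is evaluated first
        return False
    return _matches(ip, ALLOWED_IP_LIST)


print(is_allowed("192.168.1.20"))  # True: inside the allowed /24
print(is_allowed("192.168.1.99"))  # False: explicitly blocked
print(is_allowed("198.51.100.7"))  # False: not in the allowed list
```

Blocking a single address inside an otherwise allowed range, as here, is the typical reason to use both lists together.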
Access to all objects in the account (e.g., users, warehouses, databases, tables) is controlled through a hybrid model of DAC (Discretionary Access Control) and RBAC (Role-Based Access Control).
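How the two models combine can be illustrated with a small Python toy. Privileges are granted to roles (RBAC), roles are granted to users and to other roles, which then inherit their privileges, and each object has an owning role that decides who else gets access (DAC). This is a simplified illustration, not Snowflake's implementation; the class, role, user and object names are hypothetical.

```python
from collections import defaultdict


class AccessModel:
    """Toy model of hybrid DAC + RBAC access control (not Snowflake code)."""

    def __init__(self):
        self.owner = {}                      # object -> owning role (DAC)
        self.privileges = defaultdict(set)   # role -> {(privilege, object)}
        self.inherited = defaultdict(set)    # role -> roles granted to it
        self.user_roles = defaultdict(set)   # user -> roles granted to the user

    def create_object(self, obj, owner_role):
        self.owner[obj] = owner_role

    def grant_privilege(self, grantor_role, privilege, obj, grantee_role):
        # Discretionary part: only the role that owns the object may grant on it.
        if self.owner.get(obj) != grantor_role:
            raise PermissionError(f"{grantor_role} does not own {obj}")
        self.privileges[grantee_role].add((privilege, obj))

    def grant_role_to_role(self, role, parent_role):
        # Role hierarchy: parent_role inherits every privilege held by role.
        self.inherited[parent_role].add(role)

    def grant_role_to_user(self, role, user):
        self.user_roles[user].add(role)

    def _effective_roles(self, role, seen=None):
        # Walk the hierarchy to collect the role plus every role it inherits.
        seen = set() if seen is None else seen
        if role not in seen:
            seen.add(role)
            for child in self.inherited[role]:
                self._effective_roles(child, seen)
        return seen

    def can(self, user, privilege, obj):
        # Simplification: checks all of the user's roles, not one active session role.
        for role in self.user_roles[user]:
            for r in self._effective_roles(role):
                if r == self.owner.get(obj) or (privilege, obj) in self.privileges[r]:
                    return True
        return False


acl = AccessModel()
acl.create_object("SALES_DB.PUBLIC.ORDERS", "SYSADMIN")
acl.grant_privilege("SYSADMIN", "SELECT", "SALES_DB.PUBLIC.ORDERS", "ANALYST")
acl.grant_role_to_role("ANALYST", "REPORTING_LEAD")
acl.grant_role_to_user("ANALYST", "alice")
acl.grant_role_to_user("REPORTING_LEAD", "bob")
print(acl.can("alice", "SELECT", "SALES_DB.PUBLIC.ORDERS"))  # True
print(acl.can("bob", "SELECT", "SALES_DB.PUBLIC.ORDERS"))    # True (inherited)
print(acl.can("alice", "DELETE", "SALES_DB.PUBLIC.ORDERS"))  # False
```

In the real product, a session acts through its current role, and grants are made with SQL GRANT statements rather than method calls.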
- All ingested data stored in Snowflake tables, and all files in the internal stages used for loading and unloading, is encrypted with AES-256.
- Periodic rekeying of encrypted data (Enterprise Edition or higher).
- Support for encrypting data using customer-managed keys (Business Critical Edition or higher).
Government and Industry Data Security Compliance
Snowflake’s government deployments have achieved Federal Risk & Authorization Management Program (FedRAMP) Authorization to Operate (ATO), at the Moderate level.
In addition, Snowflake complies with standards and regulations such as SOC 2 Type 2, PCI DSS and HIPAA, as required by various industries and by state and federal governments.
Data Loading options in Snowflake
Snowflake provides three different ways of loading data, depending on the volume and frequency of loading. You can also use the many third-party ETL and data loading solutions available in the market.
- Bulk loading of large amounts of data can be done with SQL commands such as COPY INTO, run from SnowSQL, the Snowflake command-line client.
- Automated bulk loading can be accomplished with Snowpipe. It uses the same COPY command, with additional features that let you automate the process.
- You can also use the Snowflake web interface to load limited amounts of data.
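For the SnowSQL route, a typical bulk load runs two statements: PUT, which uploads a local file to a stage, and COPY INTO, which loads the staged file into a table. The Python helper below only builds that SQL text for you to run in SnowSQL; it assumes the table's own stage (@%table), and the file path, table name and CSV options are hypothetical examples that should be adjusted to your data.

```python
from pathlib import Path


def bulk_load_statements(local_file: str, table: str, skip_header: int = 1) -> list[str]:
    """Build PUT + COPY INTO statements to load one local CSV file via SnowSQL.

    Uses the table stage (@%<table>), which Snowflake allocates to every table.
    """
    # PUT expects a file URI; quoting it allows spaces in the path.
    uri = "file://" + Path(local_file).absolute().as_posix()
    stage = f"@%{table}"
    put = f"PUT '{uri}' {stage};"
    copy = (
        f"COPY INTO {table} FROM {stage} "
        f"FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = {skip_header});"
    )
    return [put, copy]


for statement in bulk_load_statements("/tmp/orders.csv", "ORDERS"):
    print(statement)
# On Linux/macOS this prints:
# PUT 'file:///tmp/orders.csv' @%ORDERS;
# COPY INTO ORDERS FROM @%ORDERS FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```

PUT gzip-compresses the file by default, and COPY INTO detects the compression automatically. Snowpipe wraps a COPY INTO statement like the second one in a pipe definition, so the same file format settings carry over to automated loading.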
Snowflake supports a handful of file formats, ranging from structured (CSV and other delimited files) to semi-structured (JSON, Avro, ORC, Parquet and XML). Layered on top of the file formats are the protocols we can use to bring that data into Snowflake.
Getting Started: Sign up for a free trial
Yes, there’s a free trial that you can try out, without paying anything upfront.
To get started, you simply need to register here. Upon registration, you will receive an email with a link to access your newly created Snowflake instance, for which you can set a username and password. From this point onwards you have $400 of Snowflake credits at your disposal. The free trial account will give you a fair idea of the platform and the features it has to offer.
Ideas2IT was founded in 2009 by an ex-Googler for captive product development. We help our customers execute forward-looking initiatives by leveraging cutting-edge technologies like AI/ML, Cloud-Native, Blockchain, and IIoT. We are an energetic team of software engineers, and our impact far exceeds our size. We work with passionate innovators in start-ups and enterprises. Refer here for more details on our Snowflake cloud platform services and expertise.