Snowflake has numerous advantages and strategic values that have all led to the success we see today. Over the past year, Snowflake has established itself as a cloud data platform, helped by a strong marketing campaign and, above all, by one of the most successful IPOs in the history of IT.
Before going into more detail about the unique features of Snowflake, it is crucial to know what the most common obstacles and issues are in an organization for which Snowflake offers a solution.
Most organizations possess an enormous amount of data from all sorts of applications, inside and outside of their organization. This data is often not logically structured, and the applications that produce it rarely communicate with one another.
The bigger the organization gets over time, the more complex this web of information gets.
When making (operational) decisions, it is of the utmost importance that all necessary information is taken into account. This means a clear overview, with all correlations included, should be made available.
Faster integration, retrieval, analysis and distribution of data means a faster time to market. Time to market is crucial when it comes to profits and costs: fewer employees need to be deployed for a shorter period of time, saving on FTEs, while revenues can be recognized sooner.
When only a limited number of people have access to the data, or know how to access it, a dependency is created that slows down all internal processes, from analysis to decision making.
By making data easily accessible across an organization, departments are able to test, create reports and thus acquire insights without waiting for person x of department y to find the time to prepare a dataset for the needed report, with the added risk that something is lost in translation and the report does not meet expectations.
The business has no time for complex systems with thousands of buttons and queries; it needs a user-friendly, simple platform that delivers information with a single click.
An ever-changing environment results in ever-changing needs and wishes. This means that the business case that exists today was not there last year and will not be there next week or even next quarter. Your data platform needs to be prepared to seamlessly integrate these continuous changes. Otherwise, organizations are forced to buy more and more hardware or software to keep up, which goes hand in hand with significant costs.
All of these very recognizable problems are solved by Snowflake: a cloud data platform where speed, scalability and zero management are key features, made possible by the following unique characteristics.
Designed for the cloud
Snowflake is the only data platform that was designed for the cloud from the start. Unlike other data platform vendors, Snowflake is not based on an on-premises solution. The underlying data storage is designed for optimal cloud and data warehousing workloads.
Snowflake has a unique data warehouse architecture in which storage, compute and cloud services are separated and independently scalable. Because storage and compute are decoupled, Snowflake offers linear scalability, whereas legacy vendors do not make this separation.
Snowflake makes sharing data between Snowflake accounts easy, safe and simple through Secure Data Sharing. Databases, schemas and tables are made available to other accounts via shares, down to record level. The receiving party only needs to import the data, which stays synchronized from then on: a single share is enough to provide the receiving party with continuously updated data. All of this is possible without adding volume to your storage. There are two possibilities for compute costs:
- The receiving party has its own Snowflake account: the external users are responsible for the costs associated with querying the data.
- The receiving party has no Snowflake account: you have the possibility to set up reader accounts on the platform. These reader accounts have read-only access, which allows them to access only the data that is shared with them. The receiving party can then migrate the data to its own environment. The costs of the reader accounts are borne by the provider account they are associated with.
The advantages are:
- No tool is necessary to migrate data between two parties
- The availability of up-to-date data
- No time is lost by transferring large volumes of data between two or more parties
- Business users are able to use self-service capabilities to test and produce reports
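As a sketch, the provider-side and consumer-side steps could look like the following (the share, database, table and account names are illustrative):

```sql
-- Provider side: create a share and grant access to a specific table
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;
ALTER SHARE sales_share ADD ACCOUNTS = partner_account;

-- Consumer side: import the share as a read-only database
CREATE DATABASE shared_sales FROM SHARE provider_account.sales_share;
```

From that point on, every query the consumer runs against shared_sales sees the provider's live data; nothing is copied or transferred.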
No infrastructure & zero management
Snowflake is focused on minimizing the management of your data platform by delivering it as a service. This means you always have the most recent version of the platform at your disposal and constant access to the newest functionality; there is no maintenance of servers, no optimization of storage and no query performance tuning (indexes, partitioning, vacuuming, etc.). Everything is executed automatically, which results in an FTE reduction because there is no need for specialized DBA profiles.
Snowflake is a data platform designed for analytical queries on large data volumes. The accelerated time to market is a direct benefit of the automatically scalable virtual warehouses, which adapt compute power to the workload. Dedicated virtual warehouses can provide different compute power for data loading and for reporting. This way a virtual warehouse is customized for every specific need, and queries can be executed while data is still being loaded from your source systems in real time.
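A minimal sketch of two such dedicated warehouses (names and sizes are illustrative) could look like this:

```sql
-- A small warehouse dedicated to data loading
CREATE WAREHOUSE load_wh
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND   = 60     -- suspend after 60 seconds of inactivity
  AUTO_RESUME    = TRUE;

-- A larger warehouse dedicated to reporting queries
CREATE WAREHOUSE report_wh
  WAREHOUSE_SIZE = 'LARGE'
  AUTO_SUSPEND   = 300
  AUTO_RESUME    = TRUE;
```

Because each warehouse has its own compute, heavy reporting queries on report_wh do not slow down loads running on load_wh.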
Unlimited and automatic scalability
The unique MPP (massively parallel processing) architecture of Snowflake, where storage is separated from compute, makes unlimited scalability possible. The amount of data in a Snowflake account can increase without affecting compute, and the other way around. The architecture is built to process, store and use enormous amounts of data.
Compute resources can scale without interruption or downtime, while queries (for ingestion and output) stay active, and the data does not have to be partitioned again. All of this happens through automatic detection within Snowflake: it detects when scaling is necessary, without intervention from a database administrator or platform users. Rescaling a traditional MPP data platform requires a read-only mode or a restart, meaning manual intervention.
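For illustration, resizing and scaling out a warehouse are single SQL statements (the warehouse name and limits are illustrative; multi-cluster warehouses require the Enterprise edition):

```sql
-- Resize without downtime; queries already running finish on the old size
ALTER WAREHOUSE report_wh SET WAREHOUSE_SIZE = 'XLARGE';

-- Let the warehouse scale out automatically under high concurrency
ALTER WAREHOUSE report_wh SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY    = 'STANDARD';
```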
Automatic compression and Micro Partitioning
When data is loaded into the Snowflake data platform, it is automatically compressed, typically at a ratio between 4:1 and 5:1. This means 10 TB of raw data results in 2 to 2.5 TB of data in the data warehouse, so storage costs are reduced significantly.
All data loaded into the Snowflake platform undergoes micro-partitioning: it is logically divided into small, contiguous units of storage (50 to 500 MB of uncompressed data each), so queries can skip irrelevant partitions and be executed at a higher speed.
Structured and semi-structured data
Snowflake's VARIANT data type makes it possible to load semi-structured data such as JSON, Avro, Parquet and XML into tables without any transformation, and Snowflake SQL allows structured and semi-structured data to be combined in a single SQL statement.
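As a small sketch (table and field names are illustrative), JSON can be stored in a VARIANT column and queried with path notation next to ordinary columns:

```sql
CREATE TABLE raw_events (event_id NUMBER, payload VARIANT);

-- Path notation (:) and casts (::) reach into the JSON;
-- LATERAL FLATTEN turns a JSON array into rows
SELECT event_id,
       payload:customer.name::STRING AS customer_name,
       item.value:sku::STRING        AS sku,
       item.value:qty::NUMBER        AS qty
FROM raw_events,
     LATERAL FLATTEN(input => payload:items) item;
```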
Pay for what you use
You pay for the compute power you use. When there are no queries being executed, no costs will be charged to your account.
Snowflake applies per-second billing and automatically suspends inactive warehouses, which results in a significant cost reduction and makes it possible to keep a clear overview of costs and usage.
Connectors for different ETL and dashboarding tools
Snowflake works efficiently with a range of ETL tools (such as Talend, Matillion, etc.) and dashboarding tools (Tableau, Power BI, etc.).
Being cloud agnostic means that a Snowflake data warehouse can be built on any major cloud platform on the market, so migration costs have a minimal impact and no vendor lock-in is created.
Continuity and security are key elements of Snowflake's architecture. Snowflake accounts are automatically spread across three availability zones (data centers) within a region. If one of the data centers is lost due to a malfunction (very unlikely), two fully functioning replicas remain available. Because everything is arranged automatically behind the scenes, the end user will not even notice the malfunction.
Data retention is guaranteed through Time Travel and Fail-safe. Time Travel makes it possible to query data that was deleted or updated (by the user), with a limit of up to 90 days on the Enterprise edition. Fail-safe is an extra protection measure under which Snowflake can recover data for a further seven days (available for all accounts). Together these measures add up to a maximum recovery period of 97 days, which makes periodic backups largely unnecessary.
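As an illustration (the table name is hypothetical, and the query id is a placeholder), Time Travel is exposed directly in SQL:

```sql
-- Query the table as it was one hour ago
SELECT * FROM orders AT (OFFSET => -3600);

-- Query the table as it was just before a given statement ran
SELECT * FROM orders BEFORE (STATEMENT => '<query_id>');

-- Restore a table that was dropped within the retention window
UNDROP TABLE orders;
```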