Cloud Data Warehouses: Snowflake, RedShift, BigQuery, and Synapse Analytics

Collecting, Storing, and Organizing data is what we traditionally called Data Warehousing. When the storage is offered in the cloud with tools that makes the collection and organization easier, along with automated backups and scalability, we call it a Cloud Data Warehouse platform.

Apoorva team has been working with Traditional Data Warehouses since 2001. And now we work with Cloud Data Warehouses as a natural progression. There are many commercially available cloud data warehouses in the market. With the obvious listings from the three big companies, Google, Microsoft, and Amazon, the hot off the press  is Snowflake; especially, with their recent IPO. In this article, we would like to introduce the four large Cloud Data Warehouses that we recommend for your Analytics:

Snowflake

snowflake

Snowflake is the #1 Cloud Data Warehouse in the market. Primarily because they started the concept and others, like Google and Amazon, have adopted it with the cloud-storage resources that they have. Many of Apoorva’s clients have both structured data (tables and columns or spreadsheets) and unstructured data (word documents, PDFs, etc.). Snowflake handles both of them and in formats such as JSON and XML.

One of the biggest differentiation is their ability to deploy with multi-cloud option. And, we can write SQL based queries which is easier for our clients and their staff to understand and maintain in-house after complete the deployment. It doesn’t mean you can’t use other languages. We even used R and Python programming languages on top of the Snowflake platform.

As any data warehouse gets ingested with fresh data at scheduled intervals, Snowflake accommodates with data ingestion in the cloud. Snowflake can intake both structured and semi-structured data. Examples of structured data include CSV files and database tables, whereas examples of semi-structured data include JSON and XML. Some of the NoSQL databases such as MongoDB and MarkLogic are also considered as semi-structured data.

Amazon Redshift

aws

Amazon being the inventor of the Cloud-Storage, they are obviously into the the Cloud Data Warehouse game. Their Redshift Cloud Data Warehouse is picking up with tens of thousands of customers using it. You can query petabytes of structured and semi-structured data across operational databases and use standard SQL.

One advantage with the Amazon Redshift is their S3 Data Lake which allows open formats like Apache Parquet which allow you to further analyze from other analytics services like Amazon EMR, Amazon Athena, and Amazon SageMaker.

If you not an S3 user, you can still integrate with other data sources by using their “collection tool” called AWS Glue which is an Extract, Transform, and Load (ETL) tool. And the security is strong around the storage and operations as with any of the AWS products.

Google BigQuery

Google

The BigQuery, like the name suggests, allows you to run Queries on large Data Warehouses. With serverless architecture and highly scalable, Google’s BigQuery is a cost-effective multi-cloud data warehouse. Some of the nonprofit organizations that we work with use Google BigQuery due to its cost-effective nature. 

Google has been into the “sharing” strategy with their Google Docs and other products. Same with the data. You can access data and share insights and analytics with ease; of course in a secure manner. Stunning reports can be easily created out-of-the-box using popular Business Intelligence (BI) tools.

If you have streaming data coming, you can collect this in real-time and provide up-to-date information to all your business departments. With the Machine Learning (ML) capabilities available in the Google’s toolbox, you can also perform what-if analyses and predict the business outcomes easily without moving the data to another tool.

Microsoft Azure Synapse Analytics

microsoft

.

If you have Microsoft SQL Server in-house, you will probably lean towards the Microsoft Azure SQL Data Warehouse Synapse Analytics because of the easier adoption. Just like the SQL Server, the Azure Data Warehouse scales well for organizations of any size. The Azure version has take off, especially with the cloud offering of SQL Server, that provided many of the security options and features that are native to SQL Server.

One of the features that we love here are the Dynamic Data Masking (DDM) which allows hid sensitive data on the fly as the queries are executed which provides security control at a granular level. For Apoorva’s clients, we provide a software called ekLogic that helps not only mask the data but shuffle it so that you can take the “sensitivity” out of the sensitive data. This is useful for the programmers who wants to work with realistic data but not with sensitive data. For example, if we shuffle one client’s first name with another client’s last name, and associate transactions from a third client, we kept the essence of the data but not the sensitivity.

For the scalability’s point of view, Microsoft’s Massive Parallel Processing (MPP) architecture come in handy for the end-users to run hundreds of concurrent (at the same time) queries.

About Apoorva

Apoorva is a technology services company that assists software products with ideation, developing prototypes, programming, creating a digital marketing presence and accelerating sales through direct contact. Over 150 for-profit and non-profit organizations, such as Xcel Energy, PeopleCare Health Services, Frontier Airlines and Centers for Spiritual Living have trusted Apoorva to build software. Apoorva was founded in 2001, has more than 35 employees, and uses proprietary and proven methodologies to bring technology products to the market. Visit apoorva.com for more information.