Forecasting Snowflake Credit Usage
Author: Jacob Frackson, Practice Lead
Businesses often evaluate several cloud data warehouse solutions before deciding which one to use. While they may find one that best suits their data needs, getting a clear idea of annual costs remains a key consideration for decision-makers. In this blog, we walk you through a step-by-step guide to help you estimate your potential Snowflake costs.
While there is no perfect solution to forecast Snowflake costs, there are a few things businesses can proactively do to get a better estimate, and that starts with understanding the cost model. Snowflake costs can be broken down into the following groups:
- Compute costs: a small fraction of this is spent on Serverless and Cloud Services (~15%), while the majority is spent on Virtual Warehouse credits. This section is the hardest to estimate, and for most customers, it’s the main cost (e.g., 80% of the total Snowflake bill).
- Storage costs: the second largest cost for most customers is their data storage, but this is much simpler to calculate.
- Data egress costs: this third cost is not relevant for every customer, but it’s incurred when data is transferred to a different region or Cloud.
While all three of these are important, the biggest driver and the most complex are compute costs, and more specifically, Virtual Warehouse credits. In this post, we’ll explain how you can estimate this element of your total Snowflake cost.
Forecasting Snowflake credit usage will ultimately depend on three core factors:
- Number of virtual warehouses needed
- Size of virtual warehouses (i.e., computational power: CPU, memory, network)
- Frequency of running these virtual warehouses
To be explicit, the method presented here is intended only as an estimate. It is not an exact science, and your costs may vary based on your real-world usage.
To better estimate your credit usage, we recommend the following steps.
1. Identify all necessary workloads or tools that will function within Snowflake – these will be your individual warehouses:
Common ones include:
- Data Loading (ETL tool)
- Analytics (BI tool)
- Ad Hoc Analytics
- Data Science
We have found that running multiple small workloads on the same warehouse can be a great cost-saving technique. If some of the workloads you have listed seem too small to warrant a dedicated warehouse, feel free to group them together initially. Note, however, that this is not appropriate for every workload, as sharing a warehouse can lead to queue times.
2. Determine the warehouse size required for each workload:
We advise starting with an XS for most workloads at this stage. You can always optimise for performance later by vertically scaling up the virtual warehouse size. To support this, Fivetran also recommends using an XS warehouse for loading data into Snowflake; it’s a great way to save on data load costs.
We have used Snowflake in tandem with many dbt projects and an XS warehouse was more than sufficient. A tell-tale sign that it is time to increase the size of the virtual warehouse is when the memory begins to spill over to local storage. You can read more about this here.
[Table: estimated warehouse resources (CPU, memory, estimated disk space) by warehouse size]
Source: these are estimates of warehouse resources, referenced in this post from SELECT based on findings from 2019. These details may have since changed and they may vary across different cloud hosts (e.g., EC2 on AWS), but these estimates give a good sense of the approximate resources available at different scales.
An effective approach to selecting the best warehouse size for your use case is to start with an XS and then increase the size progressively until the query duration stops halving. This will give you the best returns in terms of cost-to-performance.
In the following hypothetical scenario, you can see that the sweet spot for this query workload is a S or M warehouse, with the job using 0.8 credits to run. As the warehouse is scaled up or down, the efficiency drops because it is either over-provisioned or under-provisioned to handle the workload. Most query workloads will have a curve similar to this (i.e. a sweet spot somewhere between XS and L, and then diminishing returns with the largest warehouse sizes).
Based on the chart, operating an XS warehouse for an hour consumes 1 credit and a Small warehouse consumes 2 credits – this is the credits/hour figure (A). Note that for every one-size increase in the virtual warehouse, both the computing power and the number of credits consumed per hour double.
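This doubling pattern can be sketched in a few lines of Python. The sizes below cover XS through XL using the standard published credit rates:

```python
# Each one-size increase doubles both compute power and credits consumed per hour.
SIZES = ["XS", "S", "M", "L", "XL"]

def credits_per_hour(size: str) -> int:
    """Credits consumed per hour of runtime for a given warehouse size."""
    return 2 ** SIZES.index(size)

# credits_per_hour("XS") -> 1, credits_per_hour("S") -> 2, credits_per_hour("M") -> 4
```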
As a general best practice, we recommend benchmarking where possible so that you can make a meaningful comparison of whether your optimisations have been successful. For instance, if you’re migrating a dbt project from Redshift to Snowflake, you can record the dbt runtimes of your most complex models on Redshift as the baseline and then compare those runtimes in your Snowflake trial.
Hypothetically, if a high complexity query takes 20 minutes on Redshift but 5 minutes on Snowflake with a M warehouse, that would be a 75% speed improvement but at a cost of 4 credits per hour. Using these pieces of data, you could then apply the time savings and credit costs to the various models executed with dbt run.
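As a rough sketch of that arithmetic (the figures are the hypothetical ones above, not measurements):

```python
# Hypothetical benchmark: 20 minutes on Redshift vs 5 minutes on a Snowflake
# M warehouse, which consumes 4 credits per hour.
redshift_minutes = 20
snowflake_minutes = 5
m_credits_per_hour = 4

speed_improvement = 1 - snowflake_minutes / redshift_minutes   # 0.75, i.e. 75% faster
credits_per_run = m_credits_per_hour * snowflake_minutes / 60  # 4 credits/hour for 5 minutes
```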
3. Start estimating the time usage of each warehouse
Estimate how many hours/day each warehouse will operate (B). If the warehouse is running a fixed workload, such as ingestion or transformation, you can estimate the total hours by benchmarking a portion of the workload (e.g., if 25% of the data volume took 10 minutes per day, then 100% should take 40 minutes per day).
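The extrapolation above assumes runtime scales roughly linearly with data volume; as a minimal sketch:

```python
def extrapolate_runtime(sample_fraction: float, sample_minutes: float) -> float:
    """Linearly extrapolate a benchmark run on a fraction of the data volume."""
    return sample_minutes / sample_fraction

# e.g. if 25% of the data volume took 10 minutes/day, the full volume
# should take roughly extrapolate_runtime(0.25, 10) -> 40.0 minutes/day
```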
If the warehouse is running a variable workload, such as BI or ad hoc analysis, determine the range of active hours (e.g., 8am to 8pm) and then estimate its utilisation as a percentage. For a smaller team that also has meetings and other responsibilities, 50% may be a good estimate; for a larger, distributed team where someone is more likely to always be working during that range, something closer to 90% or 100% may be more realistic. Note: Snowflake always charges for the first minute of activity, but after that it bills per second until the warehouse is suspended, so 50% utilisation between 8am and 8pm is not unrealistic.
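For variable workloads, that estimate reduces to the active window multiplied by utilisation; a minimal sketch:

```python
def billable_hours_per_day(start_hour: int, end_hour: int, utilisation: float) -> float:
    """Estimate billable warehouse hours per day for a variable workload."""
    return (end_hour - start_hour) * utilisation

# e.g. a BI warehouse active 8am-8pm (hours 8 to 20) at 50% utilisation:
# billable_hours_per_day(8, 20, 0.5) -> 6.0
```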
Next, estimate how many days/week each warehouse will operate. Remember to be mindful of excluding weekends if no work is done on Saturday/Sunday. (C)
Then, find your relevant price per credit by filling in the pricing tool here. The price will also depend on which features you need, which can be selected in the pricing tool. (D)
Annual Credits = A * B * (52 * C)
Annual Cost ($) = Annual Credits * D
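The formula translates directly into code; a minimal sketch using the A–D variables above:

```python
def annual_credits(credits_per_hour: float, hours_per_day: float,
                   days_per_week: float) -> float:
    """Annual Credits = A * B * (52 * C)."""
    return credits_per_hour * hours_per_day * (52 * days_per_week)

def annual_cost_usd(credits: float, price_per_credit: float) -> float:
    """Annual Cost ($) = Annual Credits * D."""
    return credits * price_per_credit
```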
To help better illustrate how we might employ this calculation, let’s explore a few examples together.
Organisation A has a relatively small data volume and wants to be as conservative as possible to save costs where they can, at least to start. This plan is far too small for most clients, but it remains useful as an example.
- They’ll start by running Fivetran, dbt Cloud, dbt developers and Looker all with the same warehouse.
- They’ll start with an S and then size up later if needed (i.e. 2 credits/hour)
- They won’t run on weekends. (5 days per week)
- They think the warehouse will run for about 8 hours per day: 2 hours during the night for Fivetran and dbt Cloud, and then sporadic use from 8am to 8pm every day (they estimate it will be used about 50% of the time in that window; this is realistic because Snowflake bills per second after the first minute, so occasional queries keep billable time low).
- They’re planning on hosting on AWS, US East on Enterprise. They chose Enterprise because they needed certain features (such as Dynamic Data Masking, but did not need Business Critical features like PrivateLink support).
- 2 * 8 * (52 * 5) = 4,160 Credits per year
- 4,160 * $3 = $12,480 USD per year
Organisation B has a more defined workload and is more concerned with performance and being able to refresh data on time. This plan would be too large for some of our clients but realistic for others.
- They’ll start by running everything on separate warehouses
- Fivetran: will run for 15 minutes every hour on the hour between 6am and 8pm using an M warehouse (3.5 hours per day, 4 credits per hour).
- dbt Cloud: will run for 10 minutes every hour on the half-hour between 6am and 8pm using an L warehouse (2.3 hours per day, 8 credits per hour).
- dbt developers: will run for about 5 hours every day and use an M warehouse (4 credits per hour).
- Looker: will run for about 5 hours every day and use an S warehouse (2 credits per hour).
- They won’t run on weekends.
- They’re planning on hosting on AWS, US East on Business Critical. They chose Business Critical because they are a healthcare company using AWS PrivateLink as a private network and they need HIPAA support; neither of which are available on Enterprise.
- 4 * 3.5 * (52 * 5) + 8 * 2.3 * (52 * 5) + 4 * 5 * (52 * 5) + 2 * 5 * (52 * 5) = 16,224 Credits per year
- 16,224 * $4 = $64,896 USD per year
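Organisation B’s multi-warehouse estimate can be reproduced by applying the same formula per warehouse (figures taken from the list above):

```python
# Organisation B: (credits/hour, hours/day) per warehouse, running 5 days/week
# at $4 per credit (Business Critical on AWS US East).
workloads = {
    "Fivetran (M)": (4, 3.5),
    "dbt Cloud (L)": (8, 2.3),
    "dbt developers (M)": (4, 5),
    "Looker (S)": (2, 5),
}
days_per_week = 5
price_per_credit = 4

total_credits = sum(cph * hpd * (52 * days_per_week)
                    for cph, hpd in workloads.values())
total_cost = total_credits * price_per_credit
# total_credits -> 16,224 credits; total_cost -> $64,896 per year
```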
Throughout this blog, we took a narrow approach, focusing on compute costs rather than storage costs, as compute will be the biggest contributor to your overall cost. Three core factors (the number of virtual warehouses, their size, and their run frequency) were tied together in a relatively simple formula to estimate total costs per year.
Though this is a great start for new Snowflake accounts, we highly recommend implementing resource monitors at both the account and warehouse level. You can set up alert triggers to pause compute in the event of unanticipated periods of higher usage, reducing costs. This provides a greater degree of control over your billing.
We hope this provided you with a useful starting point for forecasting your Snowflake costs, with a focus on estimating credit usage.
For more information on what you can do to save costs and cut unnecessary spending, take a look at our FinOps whitepaper, 4 Steps to Cloud FinOps Maturity.
Datatonic has a wealth of experience in data warehouse implementation and optimisation. To discuss your pain points or work on a project together, contact us here.