In my last blog post, I explored what it takes to execute performance tests during any proper cloud data warehouse evaluation. In this blog entry, I’ll drill down into the important, yet often under-appreciated effort of compiling and socializing your testing results.
PART 3 (OF 5): How to Compile Testing Results
Keep in mind as you read this that, fundamentally, you’re evaluating a cloud data warehouse for a specific reason: maybe to cut costs, or to improve performance in order to deliver a better customer experience. Therefore an understandable outcome from your evaluation is expected by your leadership, and that outcome must be able to be socialized. So take some time when writing up your results; present them in a fashion that bridges the divide between the business and IT so the result is seen as a business-lead technology initiative. The effort will pay you dividends long term.
To start, begin by focusing on single-stream timing. The first thing you want to do is compute a timing score for each platform that you’re testing, and for each workload that you’re testing on each platform. Take your query times, lay them out in a spreadsheet, and make note of all the important settings of the test: what the size of the cluster was, what the workload was, links to which queries were performed, etc. Enter the timings for each iteration of each query, include the total timing for each run. Then, for each query, take the average time across all your runs. Now, instead of just calculating the average across all your queries as your performance benchmark, take the geometric mean (Excel function = GEOMEAN). This will weigh short queries as just as important as long queries. The resulting number (circled in red in the screenshot below), known as the Geometric Mean Timing, is a single number that you can use to compare the performance results of the cloud data warehouse platforms that you’re testing.
This gives you a performance perspective for one platform on one workload (Platform 1, Workload Z in the screenshot above).
You aren’t going to test the platforms with just one workload though, you’re going to test them for several such as Interactive, Reporting, Analytic, and Mixed. Therefore, you have to extrapolate your results across all workloads and platforms using a good visual indicator like a columnar graph in Excel:
If necessary (and it likely will be), you need to do the same kind of analysis for workloads under concurrency so you can evaluate each platform’s overall throughput. Use Concurrency = 5 or 10 for your throughput testing since, for most platforms that charge by time for a specific cluster size, higher concurrency will more than likely not improve throughput. You’ll compute the geometric mean of the concurrent timings just like you did for your single stream performance measurement, but extrapolate the time out to a queries / hour calculation so you can measure overall throughput. Do this by using the following equation: (1/geomean timing) X concurrency rate X 3600. This gives you the geometric mean of queries per hour, which is a really useful throughput measure of the power of each platform.
Now that you have your throughput (Queries/hour) calculated, you can go after the all-important financial equation of cost-effectiveness (Queries/$, a.k.a. “QPD”). Some platforms charge by time, all the way down to the second, and typically tie the cost to the size of the cluster you’re requesting. For these platforms, your cost-effectiveness equation is pretty simple: QPD = throughput/price-per-hour. However, some platforms charge by the query. For these platforms, you have to compute the actual cost for each query in the workload, usually by analyzing the query logs for the service or sometimes via API calls. Much like you did for single-stream and concurrent-stream timing, for platforms that charge per query you’ll want to calculate the geometric mean of all the prices and divide it by your geometric mean of throughput: QPD = geomean of throughput/geomean of price. It’s important to note that the QPD reflects only how much you can get done for a specific price on a fully loaded system; your actual results will depend on whether you can scale down the system costs when the system is not busy. This is a very important point, one of the key value propositions of cloud data warehousing: Elasticity. I’ll cover how to evaluate elasticity in my next blog entry.
There is a lot of math and analysis behind the work I just outlined above. In my opinion, it’s really important to obfuscate much of that detail when you present your evaluation results to your stakeholders. Most will not care how you derived your results, they’ll just care about two things: is it less expensive than traditional EDW technology, and does it provide a better/faster/improved result. So keep it simple, use graphs and charts to provide visualizations to your math, and you’ll go a long way to promote your recommendation.
In case you missed it, check out Part 2 – Execution.