# Benchpress

Benchpress is a framework for e2e performance tests.

# Why?

There are so-called "micro benchmarks" that essentially use a stopwatch in the browser to measure time
(e.g. via `performance.now()`). This approach is limited to time, and in some cases memory
(Chrome with special flags), as a metric. It does not allow measuring:

- rendering time: e.g. the time the browser spends laying out or painting elements. This can,
  for example, be used to test the performance impact of stylesheet changes.
- garbage collection: e.g. how long the browser paused script execution, and how much memory was collected.
  This can be used to stabilize script execution time, as garbage collection times are usually very
  unpredictable. This data can also be used to measure and improve the memory usage of applications,
  as the amount of garbage directly affects garbage collection time.
- script execution time vs. waiting: e.g. to measure only the client-side time spent
  in a complex user interaction, ignoring backend calls.

This kind of data is already available in the DevTools of modern browsers. However, there is no standard way
to use those tools in an automated way to measure web app performance, especially not across platforms.

Benchpress tries to fill this gap, i.e. to provide access to all kinds of performance metrics in an automated way.


# How it works

Benchpress uses webdriver to read out the so-called "performance log" of browsers. It contains all kinds of interesting
data, e.g. when a script started/ended executing, when garbage collection started/ended, when the browser painted something to the screen, ...

As browsers differ, benchpress has plugins that normalize these events.
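
As an illustration, a Chrome plugin might map raw performance-log entries (as returned by webdriver) onto browser-independent events. The normalized shape `{name, phase, time}` below is a hypothetical sketch, not benchpress's actual event format:

```javascript
// Sketch: normalize raw Chrome performance-log entries into neutral events.
// Each raw entry's `message` field is a JSON string wrapping a DevTools
// `Tracing.dataCollected` message; the output shape here is illustrative.
function normalizeChromeEntries(rawEntries) {
  return rawEntries
    .map((entry) => JSON.parse(entry.message).message)
    .filter((msg) => msg.method === 'Tracing.dataCollected')
    .map((msg) => ({
      // map Chrome-specific trace names onto neutral ones
      name: msg.params.name === 'FunctionCall' ? 'script' : msg.params.name.toLowerCase(),
      phase: msg.params.ph,       // 'B' = begin, 'E' = end
      time: msg.params.ts / 1000, // microseconds -> milliseconds
    }));
}
```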


# Features

* Provides a loop (the so-called "Sampler") that executes the benchmark multiple times
* Automatically detects/waits until the browser is "warm"
* Reporters provide a normalized way to store results:
  - console reporter
  - file reporter
  - Google BigQuery reporter (coming soon)
* Supports micro benchmarks as well via `console.time()` / `console.timeEnd()`
  - `console.time()` / `console.timeEnd()` mark the timeline in the DevTools, so it makes sense
    to use them in micro benchmarks to visualize and understand them, with or without benchpress.
  - running micro benchmarks in benchpress leverages the existing reporters,
    the sampler and the auto-warmup feature of benchpress.


# Supported browsers

* Chrome on all platforms
* Mobile Safari (iOS)
* Firefox (work in progress)


# How to write a benchmark

A benchmark in benchpress consists of an application under test
and a benchmark driver. The application under test is the
actual application, consisting of the HTML/CSS/JS that should be tested.
A benchmark driver is a webdriver test that interacts with the
application under test.


## A simple benchmark

Let's assume we want to measure the script execution time, as well as the render time,
that it takes to fill a container element with a complex HTML string.

The application under test could look like this:

```
index.html:

<button id="reset" onclick="reset()">Reset</button>
<button id="fill" onclick="fill()">fill innerHTML</button>
<div id="container"></div>
<script>
  var container = document.getElementById('container');
  var complexHtmlString = '...'; // TODO

  function reset() { container.innerHTML = ''; }

  function fill() {
    container.innerHTML = complexHtmlString;
  }
</script>
```

A benchmark driver could look like this:

```
// A runner contains the shared configuration
// and can be shared across multiple tests.
var runner = new Runner(...);

driver.get('http://myserver/index.html');

var resetBtn = driver.findElement(By.id('reset'));
var fillBtn = driver.findElement(By.id('fill'));

runner.sample({
  id: 'fillElement',
  // Prepare is optional...
  prepare: function() {
    resetBtn.click();
  },
  execute: function() {
    fillBtn.click();
    // Note: if fillBtn triggered some asynchronous code,
    // we would need to wait here until it finishes.
  }
});
```
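
The note about asynchronous work matters because only what happens during `execute` is measured. A generic polling helper (hypothetical, not part of benchpress) that a driver could use to wait until the application signals completion might look like:

```javascript
// Hypothetical helper: resolves once `condition()` returns true,
// rejects after `timeoutMs`. A driver's `execute` callback could return
// this promise after clicking, so sampling waits for the async work.
function waitFor(condition, timeoutMs = 5000, intervalMs = 50) {
  return new Promise((resolve, reject) => {
    const start = Date.now();
    (function poll() {
      if (condition()) return resolve();
      if (Date.now() - start > timeoutMs) {
        return reject(new Error('waitFor: timed out'));
      }
      setTimeout(poll, intervalMs);
    })();
  });
}
```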

## Measuring in the browser

If the application under test would like to, it can measure on its own.
E.g.

```
index.html:

<button id="measure" onclick="measure()">Measure document.createElement</button>
<script>
  function measure() {
    console.time('createElement*10000');
    for (var i = 0; i < 10000; i++) {
      document.createElement('div');
    }
    console.timeEnd('createElement*10000');
  }
</script>
```

When the `measure` button is clicked, it marks the timeline and creates 10000 elements.
It uses the special name `createElement*10000` to tell benchpress that the
measured time covers 10000 calls to `createElement` and that benchpress should
take the average per call.

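The averaging implied by the `*10000` suffix amounts to dividing the measured total by the count encoded in the metric name. A sketch, with the parsing convention inferred from the naming scheme (not benchpress's actual parser):

```javascript
// Divide a measured total by the call count encoded in the metric name,
// e.g. 'createElement*10000' means the total covers 10000 calls.
function perCallAverage(metricName, totalMs) {
  const match = /\*(\d+)$/.exec(metricName);
  const count = match ? Number(match[1]) : 1; // no suffix: report as-is
  return totalMs / count;
}
```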
A test driver for this could look like this:

```
driver.get('.../index.html');

var measureBtn = driver.findElement(By.id('measure'));
runner.sample({
  id: 'createElement test',
  microMetrics: {
    'createElement': 'time to create an element (ms)'
  },
  execute: function() {
    measureBtn.click();
  }
});
```

When looking into the DevTools timeline, we see the marker as well.

# Best practices

* Use normalized environments
  - metrics that depend on the performance of the execution environment must be collected on a normalized machine
  - e.g. a real mobile device whose CPU frequency is set to a fixed value
  - e.g. a calibrated machine that does not run background jobs and has a fixed CPU frequency, ...

* Use relative comparisons
  - relative comparisons are less likely to change over time and help to interpret the results of benchmarks
  - e.g. compare an example written using a UI framework against a hand-coded example and track the ratio

* Assert post-commit for commit ranges
  - running benchmarks can take some time; running them before every commit is usually too slow
  - when a regression is detected for a commit range, use bisection to find the problematic commit

* Repeat benchmarks multiple times in a fresh window
  - run the same benchmark multiple times in a fresh window and then take the minimal average value of each benchmark run

* Use force gc with care
  - forcing gc can skew the script execution time and gcTime numbers,
    but might be needed to get stable gc time / gc amount numbers

* Open a new window for every test
  - browsers (e.g. Chrome) might keep JIT statistics over page reloads and optimize pages differently depending on what has been loaded before

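The "minimal average" practice above can be sketched as: average the samples of each fresh-window run, then keep the smallest of those averages (on the assumption that noise only ever slows a run down):

```javascript
// Average each benchmark run's samples, then take the minimum across runs.
// `runs` is an array of runs, each run an array of sample times in ms.
function minimalAverage(runs) {
  const averages = runs.map(
    (samples) => samples.reduce((sum, v) => sum + v, 0) / samples.length
  );
  return Math.min(...averages);
}
```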
# Detailed overview

Definitions:

* valid sample: a sample that is a good representation of what should be measured
* complete sample: the sample of all measured values collected so far

Components:

* Runner
  - contains a default configuration
  - creates a new injector for every sample call, via which all other components are created

* Sampler
  - gets data from the metrics
  - reports measured values immediately to the reporters
  - loops until the validator is able to extract a valid sample out of the complete sample (see below)
  - reports the valid sample and the complete sample to the reporters

* Metric
  - gets measured values from the browser
  - e.g. reads out performance logs, DOM values, JavaScript values

* Validator
  - extracts a valid sample out of the complete sample of all measured values
  - e.g. wait until there are 10 samples and take them as the valid sample (would include warmup time)
  - e.g. wait until the regression slope for the metric `scriptTime` through the last 10 measured values is >= 0, i.e. the values for the `scriptTime` metric are no longer decreasing

* Reporter
  - reports measured values, the valid sample and the complete sample to backends
  - e.g. a reporter that prints to the console, a reporter that reports values into Google BigQuery, ...

* WebDriverAdapter
  - abstraction over the used webdriver client
  - one implementation for every webdriver client,
    e.g. one for the selenium-webdriver Node.js module, dart async webdriver, dart sync webdriver, ...

* WebDriverExtension
  - implements, via the WebDriverAdapter, additional methods that are not standardized in the webdriver protocol
  - provides functionality like forcing gc and reading out performance logs in a normalized format
  - one implementation per browser, e.g. one for Chrome, one for mobile Safari, one for Firefox

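The regression-slope validator described above can be sketched as an ordinary least-squares slope over the last `n` measured values; warmup is considered done once the slope is no longer negative:

```javascript
// Least-squares slope of the last `n` values against their index.
// A validator like the one described above could treat slope >= 0
// as "the browser is warm" for a metric such as scriptTime.
function regressionSlope(values, n = 10) {
  const ys = values.slice(-n);
  const m = ys.length;
  const xMean = (m - 1) / 2; // x = 0, 1, ..., m-1
  const yMean = ys.reduce((s, v) => s + v, 0) / m;
  let num = 0, den = 0;
  for (let x = 0; x < m; x++) {
    num += (x - xMean) * (ys[x] - yMean);
    den += (x - xMean) * (x - xMean);
  }
  return num / den;
}
```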