# Benchpress

Benchpress is a framework for e2e performance tests.

# Why?

There are so-called "micro benchmarks" that essentially use a stopwatch in the browser to measure time
(e.g. via `performance.now()`). This approach is limited to time, and in some cases memory
(Chrome with special flags), as a metric. It does not allow measuring:

- rendering time: e.g. the time the browser spends laying out or painting elements. This can be used,
  for example, to test the performance impact of stylesheet changes.
- garbage collection: e.g. how long the browser paused script execution, and how much memory was collected.
  This can be used to stabilize script execution time, as garbage collection times are usually very
  unpredictable. This data can also be used to measure and improve memory usage of applications,
  as the garbage collection amount directly affects garbage collection time.
- script execution time as distinct from waiting: e.g. to measure only the client-side time spent
  in a complex user interaction, ignoring backend calls.

This kind of data is already available in the DevTools of modern browsers. However, there is no standard way to
use those tools in an automated way to measure web app performance, especially not across platforms.

Benchpress tries to fill this gap, i.e. it makes all kinds of performance metrics accessible in an automated way.

# How it works

Benchpress uses webdriver to read out the so-called "performance log" of browsers. This contains all kinds of interesting
data, e.g. when a script started/ended executing, when gc started/ended, or when the browser painted something to the screen.

As browsers are different, benchpress has plugins that normalize these events.
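
For illustration, here is roughly what benchpress builds on, using Chrome and the selenium-webdriver Node.js module. This is a minimal sketch, not benchpress's own code; the shape of the parsed `message` follows the Chrome DevTools protocol:

```
var webdriver = require('selenium-webdriver');

// Ask Chrome to record the performance log.
var prefs = new webdriver.logging.Preferences();
prefs.setLevel(webdriver.logging.Type.PERFORMANCE, webdriver.logging.Level.ALL);

var driver = new webdriver.Builder()
    .forBrowser('chrome')
    .setLoggingPrefs(prefs)
    .build();

driver.get('http://myserver/index.html');

// Each log entry wraps a JSON-encoded DevTools event, e.g. script
// execution, gc and paint events, which benchpress-style tooling
// can then normalize across browsers.
driver.manage().logs().get(webdriver.logging.Type.PERFORMANCE)
    .then(function(entries) {
      entries.forEach(function(entry) {
        var event = JSON.parse(entry.message).message;
        console.log(event.method, event.params);
      });
    });
```
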
# Features

* Provides a loop (the so-called "Sampler") that executes the benchmark multiple times
* Automatically detects/waits until the browser is "warm"
* Reporters provide a normalized way to store results:
  - console reporter
  - file reporter
  - Google Big Query reporter (coming soon)
* Supports micro benchmarks as well, via `console.time()` / `console.timeEnd()`
  - `console.time()` / `console.timeEnd()` mark the timeline in the DevTools, so it makes sense
    to use them in micro benchmarks to visualize and understand them, with or without benchpress.
  - running micro benchmarks in benchpress leverages the already existing reporters,
    the sampler and the auto warmup feature of benchpress.

# Supported browsers

* Chrome on all platforms
* Mobile Safari (iOS)
* Firefox (work in progress)

# How to write a benchmark

A benchmark in benchpress consists of an application under test
and a benchmark driver. The application under test is the
actual application, consisting of the html/css/js that should be tested.
A benchmark driver is a webdriver test that interacts with the
application under test.

## A simple benchmark

Let's assume we want to measure the script execution time, as well as the render time,
that it takes to fill a container element with a complex html string.

The application under test could look like this:

```
index.html:

<button id="reset" onclick="reset()">Reset</button>
<button id="fill" onclick="fill()">fill innerHTML</button>
<div id="container"></div>
<script>
  var container = document.getElementById('container');
  var complexHtmlString = '...'; // TODO

  function reset() { container.innerHTML = ''; }

  function fill() {
    container.innerHTML = complexHtmlString;
  }
</script>
```

A benchmark driver could look like this:

```
// A runner contains the shared configuration
// and can be shared across multiple tests.
var runner = new Runner(...);

driver.get('http://myserver/index.html');

var resetBtn = driver.findElement(By.id('reset'));
var fillBtn = driver.findElement(By.id('fill'));

runner.sample({
  id: 'fillElement',
  // Prepare is optional...
  prepare: function() {
    resetBtn.click();
  },
  execute: function() {
    fillBtn.click();
    // Note: if fillBtn ran some asynchronous code,
    // we would need to wait here for it to finish.
  }
});
```
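
If `fill()` did trigger asynchronous work, the driver could poll for an application-defined completion signal before `execute` returns. A minimal sketch, assuming the app sets a `data-done` attribute on the container when it is finished (that attribute and the 5 second timeout are assumptions of this example, not a benchpress convention):

```
// Wait until the app signals that its asynchronous work is done.
driver.wait(function() {
  return driver.findElement(By.id('container'))
      .getAttribute('data-done')
      .then(function(value) { return value === 'true'; });
}, 5000);
```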

## Measuring in the browser

The application under test can also take measurements on its own, e.g.:

```
index.html:

<button id="measure" onclick="measure()">Measure document.createElement</button>
<script>
  function measure() {
    console.time('createElement*10000');
    for (var i = 0; i < 10000; i++) {
      document.createElement('div');
    }
    console.timeEnd('createElement*10000');
  }
</script>
```

When the `measure` button is clicked, it marks the timeline and creates 10000 elements.
It uses the special name `createElement*10000` to tell benchpress that the
time that was measured is for 10000 calls to `createElement` and that benchpress should
report the average per call, e.g. a measured total of 50ms would be reported as 0.005ms per call.

A test driver for this would look like this:

```
driver.get('.../index.html');

var measureBtn = driver.findElement(By.id('measure'));
runner.sample({
  id: 'createElement test',
  microMetrics: {
    'createElement': 'time to create an element (ms)'
  },
  execute: function() {
    measureBtn.click();
  }
});
```
157+
158+
When looking into the DevTools Timeline, we see a marker as well:
159+
![Marked Timeline](marked_timeline.png)
160+
161+
# Best practices

* Use normalized environments
  - metrics that depend on the performance of the execution environment must be collected on a normalized machine
  - e.g. a real mobile device whose CPU frequency is set to a fixed value
  - e.g. a calibrated machine that does not run background jobs, has a fixed CPU frequency, ...

* Use relative comparisons
  - relative comparisons are less likely to change over time and help to interpret the results of benchmarks
  - e.g. compare an example written using a UI framework against a hand-coded example and track the ratio

* Assert post-commit for commit ranges
  - running benchmarks takes some time; running them before every commit is usually too slow
  - when a regression is detected for a commit range, use bisection to find the problematic commit

* Repeat benchmarks multiple times in a fresh window
  - run the same benchmark multiple times in a fresh window and then take the minimal average value of each benchmark run

* Use force gc with care
  - forcing gc can skew the script execution time and gcTime numbers,
    but might be needed to get stable gc time / gc amount numbers

* Open a new window for every test
  - browsers (e.g. Chrome) might keep JIT statistics over page reloads and optimize pages differently depending on what was loaded before

# Detailed overview

![Overview](overview.svg)

Definitions:

* valid sample: a sample that is a good representation of what should be measured
* complete sample: the sample of all measure values collected so far

Components:

* Runner
  - contains a default configuration
  - creates a new injector for every sample call, via which all other components are created

* Sampler
  - gets data from the metrics
  - reports measure values immediately to the reporters
  - loops until the validator is able to extract a valid sample out of the complete sample (see below, and the sketch after this list)
  - reports the valid sample and the complete sample to the reporters

* Metric
  - gets measure values from the browser
  - e.g. reads out performance logs, DOM values, JavaScript values

* Validator
  - extracts a valid sample out of the complete sample of all measure values
  - e.g. wait until there are 10 samples and take them as the valid sample (would include warmup time)
  - e.g. wait until the regression slope through the last 10 measure values of the `scriptTime` metric is >= 0, i.e. the values for the `scriptTime` metric are no longer decreasing

* Reporter
  - reports measure values, the valid sample and the complete sample to backends
  - e.g. a reporter that prints to the console, a reporter that reports values into Google BigQuery, ...

* WebDriverAdapter
  - abstraction over the used webdriver client
  - one implementation for every webdriver client,
    e.g. one for the selenium-webdriver Node.js module, dart async webdriver, dart sync webdriver, ...

* WebDriverExtension
  - implements additional methods that are not (yet) standardized in the webdriver protocol, using the WebDriverAdapter
  - provides functionality like forcing gc and reading out performance logs in a normalized format
  - one implementation per browser, e.g. one for Chrome, one for mobile Safari, one for Firefox
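
To make the Sampler/Validator interplay concrete, here is a hedged sketch of the sampling loop combined with a regression-slope validator. All names in it (`createRegressionSlopeValidator`, `runIteration`, `takeMeasureValues`, the `report*` helpers) are illustrative placeholders, not benchpress's actual API:

```
// Sketch only -- not benchpress's actual API.
// Accepts the last `sampleSize` measure values once the least-squares
// regression slope of the chosen metric is >= 0, i.e. once the values
// stopped decreasing and the browser can be considered "warm".
function createRegressionSlopeValidator(metric, sampleSize) {
  return {
    validate: function(completeSample) {
      if (completeSample.length < sampleSize) return null;
      var candidate = completeSample.slice(-sampleSize);
      var values = candidate.map(function(m) { return m[metric]; });
      return slope(values) >= 0 ? candidate : null;
    }
  };
}

// Least-squares slope of `ys` over the indices 0..n-1.
function slope(ys) {
  var n = ys.length;
  var xMean = (n - 1) / 2;
  var yMean = ys.reduce(function(a, b) { return a + b; }) / n;
  var num = 0, den = 0;
  for (var i = 0; i < n; i++) {
    num += (i - xMean) * (ys[i] - yMean);
    den += (i - xMean) * (i - xMean);
  }
  return num / den;
}

// The sampler loop: measure, report immediately, stop once valid.
function sample(validator) {
  var completeSample = [];
  var validSample = null;
  while (!validSample) {
    runIteration();                          // prepare() + execute()
    var measureValues = takeMeasureValues(); // from the metrics
    reportMeasureValues(measureValues);      // immediately to the reporters
    completeSample.push(measureValues);
    validSample = validator.validate(completeSample);
  }
  reportSample(completeSample, validSample); // valid + complete sample
}
```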