# The Rule Of 2

When you write code to parse, evaluate, or otherwise handle untrustworthy inputs
from the Internet (which is almost everything we do in a web browser!), we like
to follow a simple rule to make sure it's safe enough to do so. The Rule Of 2
is: Pick no more than 2 of

 * untrustworthy inputs;
 * unsafe implementation language; and
 * high privilege.
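
As a rough illustration, the rule can be written down as a predicate over three
flags. The Rust sketch below is not Chromium code; `RiskProfile`, its fields,
and `satisfies_rule_of_2` are hypothetical names chosen for this example.

```rust
/// The three risk factors named by the rule (hypothetical type).
#[derive(Clone, Copy, Debug)]
struct RiskProfile {
    untrustworthy_input: bool,
    unsafe_language: bool,
    high_privilege: bool,
}

impl RiskProfile {
    /// Counts how many of the three risk factors are present.
    fn risk_count(self) -> usize {
        [self.untrustworthy_input, self.unsafe_language, self.high_privilege]
            .iter()
            .filter(|&&flag| flag)
            .count()
    }

    /// True when no more than 2 of the 3 factors are present.
    fn satisfies_rule_of_2(self) -> bool {
        self.risk_count() <= 2
    }
}
```

For instance, a C++ parser of network data running in the browser process sets
all three flags and fails the check; moving that parser into a sandbox clears
`high_privilege`, and the check passes.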

![alt text](rule-of-2-drawing.png "Venn diagram showing you should always use
a safe language, a sandbox, or not be processing untrustworthy inputs in the first
place.")

(drawing source
[here](https://docs.google.com/drawings/d/12WoPI7-E5NAINHUZqEPGn38aZBYBxq20BgVBjZIvgCQ/edit?usp=sharing))

## Why?

When code that handles untrustworthy inputs at high privilege has bugs, the
resulting vulnerabilities are typically of Critical or High severity. (See our
[Severity Guidelines](severity-guidelines.md).) We'd love to reduce the severity
of such bugs by reducing the amount of damage they can do (lowering their
privilege), avoiding the various types of memory corruption bugs (using a safe
language), or reducing the likelihood that the input is malicious (asserting the
trustworthiness of the source).

For the purposes of this document, our main concern is reducing (and hopefully,
ultimately eliminating) bugs that arise due to _memory unsafety_. [A 2019
study by Matt Miller from Microsoft
Security](https://github.com/Microsoft/MSRC-Security-Research/blob/master/presentations/2019_02_BlueHatIL/2019_01%20-%20BlueHatIL%20-%20Trends%2C%20challenge%2C%20and%20shifts%20in%20software%20vulnerability%20mitigation.pdf)
states that "~70% of the vulnerabilities addressed through a security update
each year continue to be memory safety issues". A trip through Chromium's bug
tracker will show many, many vulnerabilities whose root cause is memory
unsafety. (As of March 2019, only about 5 of 130 [public Critical-severity
bugs](https://bugs.chromium.org/p/chromium/issues/list?can=1&q=Type%3DBug-Security+Security_Severity%3DCritical+-status%3AWontFix+-status%3ADuplicate&sort=&groupby=&colspec=ID+Pri+M+Stars+ReleaseBlock+Component+Status+Owner+Summary+OS+Modified&x=m&y=releaseblock&mode=&cells=ids&num=)
are not obviously due to memory corruption.)

Security engineers in general, very much including Chrome Security Team, would
like to advance the state of engineering to where memory safety issues are much
rarer. Then, we could focus more attention on the application-semantic
vulnerabilities. 😊 That would be a big improvement.

## What?

Some definitions are in order.

### Untrustworthy Inputs

_Untrustworthy inputs_ are inputs that

 * have non-trivial grammars; and/or
 * come from untrustworthy sources.

If there were an input type so simple that it were straightforward to write a
memory-safe handler for it, we wouldn't need to worry much about where it came
from **for the purposes of memory safety**, because we'd be sure we could handle
it. We would still need to treat the input as untrustworthy after
parsing, of course.

Unfortunately, it is very rare to find a grammar trivial enough that we can
trust ourselves to parse it successfully or fail safely. (But see
[Normalization](#normalization) for a potential example.) Therefore, we do need
to concern ourselves with the provenance of such inputs.

Any arbitrary peer on the Internet is an untrustworthy source, unless we get
some evidence of its trustworthiness (which includes at least [a strong
assertion of the source's
identity](#verifying-the-trustworthiness-of-a-source)). When we can know with
certainty that an input is coming from the same source as the application itself
(e.g. Google in the case of Chrome, or Mozilla in the case of Firefox), and that
the transport is integrity-protected (such as with HTTPS), then it can be
acceptable to parse even complex inputs from that source. It's still ideal,
where feasible, to reduce our degree of trust in the source, such as by parsing
the input in a sandbox.

### Unsafe Implementation Languages

_Unsafe implementation languages_ are languages that lack [memory
safety](https://en.wikipedia.org/wiki/Memory_safety), including at least C, C++,
and assembly language. Memory-safe languages include Go, Rust, Python, Java,
JavaScript, Kotlin, and Swift. (Note that the safe subsets of these languages
are safe by design, but of course implementation quality is a different story.)

#### Unsafe Code in Safe Languages

Some memory-safe languages provide a backdoor to unsafety, such as the `unsafe`
keyword in Rust. This functions as a separate unsafe language subset inside the
memory-safe one.

The presence of unsafe code does not negate the memory-safety properties of the
memory-safe language around it as a whole, but _how_ unsafe code is used is
critical. Poor use of an unsafe language subset is not meaningfully different
from any other unsafe implementation language.

In order for a library with unsafe code to be safe for the purposes of the Rule
of 2, all unsafe usage must be reviewable and verifiable by humans using
simple local reasoning. To achieve this, we expect all unsafe usage to be:
* Small: The minimal possible amount of code to perform the required task
* Encapsulated: All access to the unsafe code is through a safe API
* Documented: All preconditions of an unsafe block (e.g. a call to an unsafe
  function) are spelled out in comments, along with explanations of how they are
  satisfied.
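
A minimal Rust sketch of unsafe code that is small, encapsulated, and documented
might look like the following. The function `last_byte` is a hypothetical
example (safe code would simply call `data.last()`); it is only meant to show
the shape a reviewer can check with local reasoning.

```rust
/// Returns the last byte of `data`, or `None` if `data` is empty.
///
/// Encapsulated: this function is safe to call with any slice, so callers
/// cannot violate the precondition of the unsafe block inside it.
fn last_byte(data: &[u8]) -> Option<u8> {
    if data.is_empty() {
        return None;
    }
    let index = data.len() - 1;
    // SAFETY: `get_unchecked` requires `index < data.len()`. `data` is
    // non-empty (checked above), so `data.len() - 1` cannot underflow and is
    // strictly less than `data.len()`.
    Some(unsafe { *data.get_unchecked(index) })
}
```

The unsafe block is a single expression, its one precondition is stated next to
it, and the check that satisfies the precondition is a few lines above.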

Because unsafe code reaches outside the normal expectations of a memory-safe
language, it must follow strict rules to avoid undefined behaviour and
memory-safety violations, and these are not always easy to verify. A careful
review by one or more experts in the unsafe language subset is required.

It should be safe to use any code in a memory-safe language in a high-privilege
context. As such, the requirements on a memory-safe language implementation are
higher: All code in a memory-safe language must be capable of satisfying the
Rule of 2 in a high-privilege context (including any unsafe code) in order to be
used or admitted anywhere in the project.

### High Privilege

_High privilege_ is a relative term. The very highest-privilege programs are the
computer's firmware, the bootloader, the kernel, any hypervisor or virtual
machine monitor, and so on. Below that are processes that run as an OS-level
account representing a person; this includes the Chrome Browser process and GPU
process. We consider such processes to have high privilege. (After all, they
can do anything the person can do, with any and all of the person's valuable
data and accounts.)

Processes with slightly reduced privilege will (hopefully soon) include the
network process. These are still pretty high-privilege processes. We are always
looking for ways to reduce their privilege without breaking them.

Low-privilege processes include sandboxed utility processes and renderer
processes with [Site Isolation](
https://www.chromium.org/Home/chromium-security/site-isolation) (very good) or
[origin isolation](
https://cloud.google.com/docs/chrome-enterprise/policies/?policy=IsolateOrigins)
(even better).

### Processing, Parsing, And Deserializing

Turning a stream of bytes into a structured object is hard to do correctly and
safely. For example, turning a stream of bytes into a sequence of Unicode code
points, and from there into an HTML DOM tree with all its elements, attributes,
and metadata, is very error-prone. The same is true of QUIC packets, video
frames, and so on.

Whenever the code branches on the byte values it's processing, the risk
increases that an attacker can influence control flow and exploit bugs in the
implementation.

Although we are all human and mistakes are always possible, a function that does
not branch on input values has a better chance of being free of vulnerabilities.
(Consider an arithmetic function, such as SHA-256, for example.)
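
To make the contrast concrete, here is a hedged Rust sketch with two
hypothetical functions: `xor_fold`, whose control flow depends only on the
input's length, and `parse_record`, a toy length-prefixed parser that has to
branch on a byte value and therefore needs an explicit check to fail safely.
Neither function is Chromium code, and the record format is invented for this
example.

```rust
/// Folds all bytes together with XOR. The loop runs once per byte and never
/// branches on a byte's value, so the input cannot steer control flow.
fn xor_fold(input: &[u8]) -> u8 {
    input.iter().fold(0u8, |acc, byte| acc ^ byte)
}

/// Parses a toy record: one length byte followed by that many payload bytes.
/// Returns `(payload, remaining_bytes)`, or `None` if the input is too short.
fn parse_record(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let (&len, rest) = input.split_first()?;
    let len = usize::from(len);
    // This branch depends on an attacker-controlled byte. Omitting it would
    // make `split_at` panic here; in an unsafe language the equivalent
    // mistake is an out-of-bounds read.
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}
```

Even in this tiny parser, correctness hinges on a check that is driven by the
input itself. Real formats multiply such checks many times over.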

## Solutions To This Puzzle

Chrome Security Team will generally not approve landing a CL or new feature
that involves all 3 of untrustworthy inputs, unsafe language, and high
privilege. To solve this problem, you need to get rid of at least 1 of those 3
things. Here are some ways to do that.

### Safe Languages

Where possible, it's great to use a memory-safe language. The following
memory-safe languages are approved for use in Chromium:
* Java (on Android only)
* Swift (on iOS only)
* [Rust](../rust.md) (for [third-party use](
  ../adding_to_third_party.md#Rust))
* JavaScript or WebAssembly (although we don't currently use them in
  high-privilege processes like the browser or GPU process)

One can imagine Kotlin on Android, too, although it is not currently
used in Chromium.

For an example of image processing, we have the pure-Java class
[BaseGifImage](https://cs.chromium.org/chromium/src/third_party/gif_player/src/jp/tomorrowkey/android/gifplayer/BaseGifImage.java?rcl=27febd503d1bab047d73df26db83184fff8d6620&l=27).
On Android, where we can use Java and also face a particularly high cost for
creating new processes (necessary for sandboxing), using Java to decode tricky
formats can be a great approach. Before switching to a Rust-based parser, we
used a Java [JsonSanitizer](https://cs.chromium.org/chromium/src/services/data_decoder/public/cpp/android/java/src/org/chromium/services/data_decoder/JsonSanitizer.java)
to 'vet' incoming JSON in a memory-safe way before passing the input to the C++
JSON implementation.

On Android, many system APIs that are exposed via Java are not actually
implemented in a safe language, and are instead just facades around an unsafe
implementation. A canonical example of this is the
[BitmapFactory](https://developer.android.com/reference/android/graphics/BitmapFactory)
class, which is a Java wrapper [around C++
Skia](https://cs.android.com/android/platform/superproject/+/master:frameworks/base/libs/hwui/jni/BitmapFactory.cpp;l=586;drc=864d304156d1ef8985ee39c3c1858349b133b365).
These APIs are therefore not considered memory-safe under the rule.

The [QR code generator](
https://source.chromium.org/chromium/chromium/src/+/main:components/qr_code_generator/;l=1;drc=b185db5d502d4995627e09d62c6934590031a5f2)
is an example of a cross-platform memory-safe Rust library in use in Chromium.

### Privilege Reduction

Also known as [_sandboxing_](https://cs.chromium.org/chromium/src/sandbox/),
privilege reduction means running the code in a process that has had some or
many of its privileges revoked.

When appropriate, try to handle the inputs in a renderer process that is Site
Isolated to the same site as the inputs come from. Take care to validate the
parsed (processed) inputs in the browser, since only the browser can trust
itself to validate and act on the meaning of an object.

Equivalently, you can launch a sandboxed utility process to handle the data, and
return a well-formed response back to the caller in an IPC message. See [Safe
Browsing's ZIP
analyzer](https://cs.chromium.org/chromium/src/chrome/common/safe_browsing/zip_analyzer.h)
for an example. The [Data Decoder Service](https://source.chromium.org/chromium/chromium/src/+/main:services/data_decoder/public/cpp/data_decoder.h)
facilitates this safe decoding process for several common data formats.

### Verifying The Trustworthiness Of A Source

If you can be sure that the input comes from a trustworthy source, it can be OK
to parse/evaluate it at high privilege in an unsafe language. A "trustworthy
source" means that Chromium can cryptographically prove that the data comes
from a business entity that you can or do trust (e.g.
for Chrome, an [Alphabet](https://abc.xyz) company).

Such cryptographic proof can potentially be obtained through:

 * Component Updater;
 * the variations framework; or
 * pinned TLS (see below).

Pinned TLS needs to meet all these criteria to be effective:

 * communication happens via validly-authenticated TLS, HTTPS, or QUIC;
 * the peer's keys are [pinned in Chrome](https://cs.chromium.org/chromium/src/net/http/transport_security_state_static.json?sq=package:chromium&g=0); and
 * pinning is active on all platforms where the feature will launch.
   (Currently, pinning is not enabled on iOS or in Android WebView.)

It is generally preferred to use Component Updater if possible because pinning
may be disabled by locally installed root certificates.

One common pattern is to deliver a cryptographic hash of some content via such
a trustworthy channel, but deliver the content itself via an untrustworthy
channel. So long as the hash is properly verified, that's fine.
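
The hash-then-content pattern can be sketched in Rust as below. This is an
illustration under stated assumptions, not Chromium's implementation:
`verify_content` and `DIGEST_LEN` are hypothetical names, and because the Rust
standard library has no cryptographic hash, the hash function is passed in as a
parameter. In practice that parameter would be a vetted SHA-256 (or stronger)
implementation.

```rust
/// Length in bytes of the digest produced by the chosen hash (32 for SHA-256).
const DIGEST_LEN: usize = 32;

/// Returns the content only if its digest equals the digest that arrived over
/// the trustworthy channel. `hash` is an assumption of this sketch: callers
/// must supply a vetted cryptographic hash function.
fn verify_content<'a, H>(
    untrusted_content: &'a [u8],
    trusted_digest: &[u8; DIGEST_LEN],
    hash: H,
) -> Option<&'a [u8]>
where
    H: Fn(&[u8]) -> [u8; DIGEST_LEN],
{
    // Hash the exact bytes that will be parsed later, so that a mismatch
    // rejects the content before any complex parser sees it.
    if hash(untrusted_content) == *trusted_digest {
        Some(untrusted_content)
    } else {
        None
    }
}
```

The key design point is ordering: verification happens over the raw bytes
first, and only the verified bytes are handed to the parser.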

### Normalization {#normalization}

You can 'defang' a potentially-malicious input by transforming it into a
_normal_ or _minimal_ form, usually by first transforming it into a format with
a simpler grammar. We say that all data, file, and wire formats are defined by a
_grammar_, even if that grammar is implicit or only partially-specified (as is
so often the case). A data format with a particularly simple grammar is
[`SkPixmap`](https://source.chromium.org/chromium/chromium/src/+/3df9ac8e76132c586e888d1ddc7d2217574f17b0:third_party/skia/include/core/SkPixmap.h;l=712).
(The 'grammar' is represented by the private data fields: a region of raw pixel
data, the size of that region, and simple metadata (`SkImageInfo`) about how to
interpret the pixels.)

It's rare to find such a simple grammar for input formats, however.

For example, consider the PNG image format, which is complex and whose [C
implementation has suffered from memory corruption bugs in the
past](https://www.cvedetails.com/vulnerability-list/vendor_id-7294/Libpng.html).
An attacker could craft a malicious PNG to trigger such a bug. But if you
transform the image into a format that doesn't have PNG's complexity (in a
low-privilege process, of course), the malicious nature of the PNG 'should' be
eliminated, and the result is then safe for parsing at a higher privilege
level. Even if the attacker manages to compromise the low-privilege process
with a malicious PNG, the high-privilege process will only parse the
compromised process' output with a simple, plausibly-safe parser. If that parse
is successful, the higher-privilege process can then optionally further
transform it into a normalized, minimal form (such as to save space).
Otherwise, the parse can fail safely, without memory corruption.
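
As a hedged sketch of what a "simple, plausibly-safe parser" on the
high-privilege side could look like, the Rust code below validates a decoded
image in a deliberately trivial layout. `RawPixmap`, `validate_pixmap`,
`BYTES_PER_PIXEL`, and `MAX_DIMENSION` are hypothetical names and values for
this example; the real `SkPixmap` carries more metadata than this.

```rust
/// A decoded image in a trivial layout: tightly packed 32-bit pixels, row
/// after row (hypothetical type for this sketch).
struct RawPixmap {
    width: u32,
    height: u32,
    pixels: Vec<u8>,
}

const BYTES_PER_PIXEL: usize = 4;
/// Hypothetical upper bound on each dimension.
const MAX_DIMENSION: u32 = 16_384;

/// Accepts the sandboxed process's output only if the dimensions are sane and
/// the buffer length matches them exactly. Fails safely by returning `None`.
fn validate_pixmap(width: u32, height: u32, pixels: Vec<u8>) -> Option<RawPixmap> {
    if width == 0 || height == 0 || width > MAX_DIMENSION || height > MAX_DIMENSION {
        return None;
    }
    // Checked arithmetic: an overflowing size is rejected, not wrapped.
    let expected_len = (width as usize)
        .checked_mul(height as usize)?
        .checked_mul(BYTES_PER_PIXEL)?;
    if pixels.len() != expected_len {
        return None;
    }
    Some(RawPixmap { width, height, pixels })
}
```

The whole 'grammar' here is two bounded integers and one length equation, which
is the kind of format a reviewer can reason about completely.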

The trick of this technique lies in finding a sufficiently-trivial grammar, and
committing to its limitations.

Another good approach is to

 1. define a new Mojo message type for the information you want;
 2. extract that information from a complex input object in a sandboxed
    process; and then
 3. send the result to a higher-privileged process in a Mojo message using the
    new message type.

That way, the higher-privileged process need only process objects adhering to a
well-defined, generally low-complexity grammar. This is a big part of why [we
like for Mojo messages to use structured types](mojo.md#Use-structured-types).

For example, it should be safe enough to convert a PNG to an `SkBitmap` in a
sandboxed process, and then send the `SkBitmap` to a higher-privileged process
via IPC. Although there may be bugs in the IPC message deserialization code
and/or in Skia's `SkBitmap` handling code, we consider this safe enough for a
few reasons:

 * we must accept the risk of bugs in Mojo deserialization; but thankfully
 * Mojo deserialization is very amenable to fuzzing; and
 * it's a big improvement to scope bugs to smaller areas, like IPC
   deserialization functions and very simple classes like `SkBitmap` and
   `SkPixmap`.

Ultimately this process results in parsing significantly simpler grammars. (PNG
→ Mojo + `SkBitmap` in this case.)

> (We have to accept the risk of memory safety bugs in Mojo deserialization
> because C++'s high performance is crucial in such a throughput- and
> latency-sensitive area. If we could change this code to be both in a safer
> language and still have such high performance, that'd be ideal. But that's
> unlikely to happen soon.)

### Exception: Protobuf

While less preferable to Mojo, we also similarly trust Protobuf for
deserializing messages at high privilege from potentially untrustworthy senders.
For example, Protobufs are sometimes embedded in Mojo IPC messages. It is
always preferable to use a Mojo message where possible, though sometimes
external constraints require the use of Protobuf.

Protobuf's threat model does not include parsing a protobuf from shared
memory, where another process could change the bytes while they are being
parsed. Always copy the protobuf bytes from untrustworthy shared
memory regions before deserializing to a Message.
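
The copy-before-deserialize rule can be sketched in Rust as follows. Everything
here is an assumption made for illustration: shared memory is modelled as a
slice of `AtomicU8` cells, and `parse_message` is a stand-in for a real
deserializer (one length byte, then the payload), not the Protobuf API.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Copies every byte out of the shared region into memory that only this
/// process can write. (Modelling shared memory as `AtomicU8` cells is an
/// assumption of this sketch.)
fn snapshot(shared: &[AtomicU8]) -> Vec<u8> {
    shared.iter().map(|cell| cell.load(Ordering::Relaxed)).collect()
}

/// Stand-in for a real deserializer: one length byte, then the payload.
fn parse_message(bytes: &[u8]) -> Option<Vec<u8>> {
    let (&len, rest) = bytes.split_first()?;
    rest.get(..usize::from(len)).map(|payload| payload.to_vec())
}

/// Snapshots first, then parses only the private copy, so the length check
/// and the payload read always observe the same bytes.
fn read_from_shared(shared: &[AtomicU8]) -> Option<Vec<u8>> {
    let private_copy = snapshot(shared);
    parse_message(&private_copy)
}
```

If the parser read the shared region directly, the other process could rewrite
the length byte between the check and the read. Parsing the snapshot removes
that window.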

If you must pass protobuf bytes over Mojo, use
[mojo_base::ProtoWrapper](https://chromium.googlesource.com/chromium/src/+/main/mojo/public/cpp/base/proto_wrapper.h),
as this provides limited type safety for the top-level protobuf message and
ensures copies are taken before deserializing.

Note that this exception only applies to Protobuf as a container format;
complex data contained within a Protobuf must be handled according to this
rule as well.

### Exception: RE2

As another special case, we trust the
[RE2](https://cs.chromium.org/chromium/src/third_party/re2/README.chromium)
regular expression library to evaluate untrustworthy patterns over untrustworthy
input strings, because its grammar is sufficiently limited and hostile input is
part of the threat model against which it's been tested for years. It is **not**
the case, however, that text matched by an RE2 regular expression is necessarily
"sanitized" or "safe". That requires additional security judgment.

## Safe Types and Abstractions

As discussed above in [Normalization](#normalization), there are some types that
are considered "safe," even though they are deserialized from an untrustworthy
source, at high privilege, and in an unsafe language. These types are
fundamental for passing data between processes using IPC, tend to have simpler
grammar or structure, and/or have been audited or fuzzed heavily.

* `GURL` and `url::Origin`
* `SkBitmap` (in [N32 format](https://source.chromium.org/chromium/chromium/src/+/main:third_party/skia/include/core/SkColorType.h;l=54-58;drc=8d399817282e3c12ed54eb23ec42a5e418298ec6) only)
* `SkPixmap` (in [N32 format](https://source.chromium.org/chromium/chromium/src/+/main:third_party/skia/include/core/SkColorType.h;l=54-58;drc=8d399817282e3c12ed54eb23ec42a5e418298ec6) only)
* Protocol buffers (see above; this is not a preferred option and should be
  avoided where possible)

There are also classes in `//base` that internally hold simple values that
represent potentially complex data, such as:

* `base::FilePath`
* `base::Token` and `base::UnguessableToken`
* `base::Time` and `base::TimeDelta`

The deserialization of these is safe, though it is important to remember that
the value itself is still untrustworthy (e.g. a malicious path trying to escape
its parent using `../`).
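
To illustrate the difference between safe deserialization and a trustworthy
value, here is a small Rust sketch of a lexical check on an already-deserialized
path. `is_contained_relative_path` is a hypothetical helper, not a Chromium
API, and it deliberately ignores symlinks, which need separate handling.

```rust
use std::path::{Component, Path};

/// Returns true only for non-empty relative paths made purely of normal
/// components, so the path cannot lexically climb out of its parent.
fn is_contained_relative_path(path: &Path) -> bool {
    let mut saw_normal_component = false;
    for component in path.components() {
        match component {
            Component::Normal(_) => saw_normal_component = true,
            // A leading "./" is harmless.
            Component::CurDir => {}
            // Reject "..", a root directory, and platform-specific prefixes.
            _ => return false,
        }
    }
    saw_normal_component
}
```

A check like this belongs in the privileged code that acts on the path, since
that is where the value's meaning matters.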

The JSON parser in `//base/json` is implemented in Rust and considered safe for
use at high privilege with untrusted data.

## Existing Code That Violates The Rule

We know there is code in Chromium that violates the Rule of 2. For example, the
networking process on Windows is written in C++ and handles plenty of
untrustworthy data, yet it is not (at present) sandboxed by default. There is
[ongoing work](https://bugs.chromium.org/p/chromium/issues/detail?id=841001) to
change that.

Our top priority is avoiding any *new* violations of the Rule of 2. We also try
to keep track of existing violations and mitigate them over time: for example,
some less-safe uses of JSON parsing in the privileged browser process were
defanged when we swapped out our C++ JSON parser for one written in Rust.
376defanged when we swapped out our C++ JSON parser for one written in Rust.