Data-Oriented Design
Online release of Data-Oriented Design:
This is the free, online, reduced version. Some inessential chapters are
excluded from this version, but in the spirit of this being an educational
resource, the essentials are present for anyone wanting to learn about data-
oriented design.
Expect some odd formatting and some broken images and listings, as this is auto-
generated and the LaTeX to HTML converters available are not perfect. If a
source code listing is broken, you should be able to find the referenced source
on GitHub. If you like what you read here, consider purchasing the real paper
book from here, as not only will it look a lot better, but it will help keep
this version online for those who cannot afford to buy it. Please send any
feedback to [email protected]
Data-oriented design has been around for decades in one form or another but was only
officially given a name by Noel Llopis in his September 2009 article[NoelDOD] of
the same name. Whether or not it is a programming paradigm is contentious.
Many believe it can be used side by side with other programming paradigms such as
object-oriented, procedural, or functional programming. In one respect they are right:
data-oriented design can function alongside the other paradigms, but that does not
preclude it from being a way to approach programming in the large. Other programming
paradigms are known to function alongside each other to some extent as well. A Lisp
programmer knows that functional programming can coexist with object-oriented
programming, and a C programmer is well aware that object-oriented programming can
coexist with procedural programming. We shall ignore these comments and claim data-
oriented design as another important tool; a tool just as capable of coexistence as the
rest.
The time was right in 2009. The hardware was ripe for a change in how to develop.
Potentially very fast computers were hindered by a hardware-ignorant programming
paradigm. The way game programmers coded at the time made many engine
programmers weep. The times have changed. Many mobile and desktop solutions now
seem to need the data-oriented design approach less, not because the machines are better
at mitigating an ineffective approach, but because the games being designed are less
demanding and less complex. The trend for mobile seems to be moving towards AAA
development, which should bring back the need for managing complexity and getting the
most out of the hardware.
As we now live in a world where multi-core machines include the ones in our pockets,
learning how to develop software in a less serial manner is important. Moving away from
objects messaging each other and expecting immediate responses is one of the benefits
available to the data-oriented programmer. Programming with a firm awareness of the
data flow sets you up to take the next step to GPGPU and other compute approaches,
which leads to handling the workloads that bring game titles to life. The need for data-
oriented design will only grow. It will grow because abstractions and serial thinking will
be the bottleneck of your competitors, and those that embrace the data-oriented
approach will thrive.
Data is all we have. Data is what we need to transform in order to create a user
experience. Data is what we load when we open a document. Data is the graphics on the
screen, the pulses from the buttons on your gamepad, the cause of your speakers
producing waves in the air, the method by which you level up and how the bad guy knew
where you were so as to shoot at you. Data is how long the dynamite took to explode and
how many rings you dropped when you fell on the spikes. It is the current position and
velocity of every particle in the beautiful scene that ended the game which was loaded off
the disc and into your life via transformations by machinery driven by decoded
instructions themselves ordered by assemblers instructed by compilers fed with source-
code.
No application is anything without its data. Adobe Photoshop without the images is
nothing. It's nothing without the brushes, the layers, the pen pressure. Microsoft Word is
nothing without the characters, the fonts, the page breaks. FL Studio is worthless without
the events. Visual Studio is nothing without source. All the applications that have ever
been written, have been written to output data based on some input data. The form of
that data can be extremely complex, or so simple it requires no documentation at all, but
all applications produce and need data. If they don't need recognisable data, then they
are toys or tech demos at best.
Instructions are data too. Instructions take up memory, use up bandwidth, and can be
transformed, loaded, saved and constructed. It's natural for a developer to not think of
instructions as being data1.2, but there is very little differentiating them on older, less
protective hardware. Even though memory set aside for executables is protected from
harm and modification on most contemporary hardware, this relatively new invention is
still merely an invention, and the modified Harvard architecture relies on the same
memory for data as it does for instructions. Instructions are therefore still data, and they
are what we transform too. We take instructions and turn them into actions. The number,
size, and frequency of them is something that matters. The idea that we have control over
which instructions we use to solve problems leads us to optimisations. Applying our
knowledge of what the data is allows us to make decisions about how the data can be
treated. Knowing the outcome of instructions gives us the data to decide what
instructions are necessary, which are busywork, and which can be replaced with
equivalent but less costly alternatives.
This forms the basis of the argument for a data-oriented approach to development, but
leaves out one major element. All this data and the transforming of data, from strings, to
images, to instructions, they all have to run on something. Sometimes that thing is quite
abstract, such as a virtual machine running on unknown hardware. Sometimes that thing
is concrete, such as knowing which specific CPU and GPU you have, and the memory
capacity and bandwidth you have available. But in all cases, the data is not just data, but
data that exists on some hardware somewhere, and it has to be transformed by that same
hardware. In essence, data-oriented design is the practice of designing software by
developing transformations for well-formed data where the criteria for well-formed is
guided by the target hardware and the patterns and types of transforms that need to
operate on it. Sometimes the data isn't well defined, and sometimes the hardware is
equally evasive, but in most cases a good background of hardware appreciation can help
out almost every software project.
If the ultimate result of an application is data, and all input can be represented by data,
and it is recognised that all data transforms are not performed in a vacuum, then a
software development methodology can be founded on these principles; the principles of
understanding the data, and how to transform it given some knowledge of how a machine
will do what it needs to do with data of this quantity, frequency, and its statistical
qualities. Given this basis, we can build up a set of founding statements about what
makes a methodology data-oriented.
For some, it would seem that data-oriented design is the antithesis of most other
programming paradigms because data-oriented design is a technique that does not
readily allow the problem domain to enter into the software as written in source. It does
not promote the concept of an object as a mapping to the context of the user in any way,
as data is intentionally and consistently without meaning. Abstraction heavy paradigms
try to pretend the computer and its data do not exist at every turn, abstracting away the
idea that there are bytes, or CPU pipelines, or other hardware features, and instead
bringing the model of the problem into the program. They regularly bring either the
model of the view into the code, or the model of the world as a context for the problem.
That is, they either structure the code around attributes of the expected solution, or they
structure the code around the description of the problem domain.
Meaning can be applied to data to create information. Meaning is not inherent in data.
When you say 4, it means very little, but say 4 miles, or 4 eggs, it means something.
When you have 3 numbers, they mean very little as a tuple, but when you name them
x,y,z, you can put meaning on them as a position. When you have a list of positions in a
game, they mean very little without context. Object-oriented design would likely have the
positions as part of an object, and by the class name and neighbouring data (also named)
you can get an idea of what that data means. Without the connected named
contextualising data, the positions could be interpreted in a number of different ways,
and though putting the numbers in context is good in some sense, it also blocks thinking
about the positions as just sets of three numbers, which can be important for thinking of
solutions to the real problems the programmers are trying to solve.
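To make the contrast concrete, here is a minimal C++ sketch (not from the book; the type and function names are purely illustrative) of the same three numbers either locked into a contextualised object or kept as plain tuples that any transform may interpret:

```cpp
#include <array>
#include <vector>

// Object-oriented style: the data is given meaning by its containing class.
// These three floats can now only easily be thought of as "an enemy's position".
struct Enemy {
    float x, y, z;   // position, named and contextualised
    int   health;
};

// Data-oriented style: the same three numbers kept as plain data.
// They could be a position, a velocity, a colour - the transform decides.
using Vec3 = std::array<float, 3>;

// A transform that works on any set of three-number tuples,
// regardless of what they "mean" to the game.
void translate(std::vector<Vec3>& values, const Vec3& offset) {
    for (Vec3& v : values) {
        v[0] += offset[0];
        v[1] += offset[1];
        v[2] += offset[2];
    }
}
```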
For an example of what can happen when you put data so deep inside an object that you
forget its impact, consider the numerous games released, and in production, where a 2D
or 3D grid system could have been used for the data layout, but for unknown reasons the
developers kept with the object paradigm for each entity on the map. This isn't a singular
event, and real shipping games have seen this object-centric approach commit crimes
against the hardware by having hundreds of objects placed in WorldSpace at grid
coordinates, rather than actually being driven by a grid. It's possible that programmers
look at a grid, see the number of elements required to fulfil the request, and are
hesitant about the idea of allocating it in a single lump of memory. Consider a simple 256 by
256 tilemap requiring 65,536 tiles. An object-oriented programmer may think of
those sixty-five thousand objects as being quite expensive. It might make more sense for
them to allocate the objects for the tiles only when necessary, even to the point where
there literally are sixty-five thousand tiles created by hand in the editor; but because they
were placed by hand, their necessity has been established, and they are now something to
be handled, rather than something potentially worrying.
Not only is this pervasive lack of an underlying form a poor way to handle rendering and
simple element placement, but it leads to much higher complexity when interpreting
locality of elements. Gaining access to elements on a grid-free representation often
requires jumping through hoops such as having neighbour links (which need to be kept
up to date), running through the entire list of elements (inherently costly), or references
to an auxiliary augmented grid object or spatial mapping system connecting to the
objects which are otherwise free to move, but won't, due to the design of the game. This
fake form of freedom introduced by the grid-free design presents issues with
understanding the data, and has been the cause of significant performance penalties in
some titles, as well as a significant waste of programmer mental resources.
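As a rough, hypothetical illustration of the difference (the names are invented, and no shipped title is being quoted), compare an object-per-tile layout carrying explicit neighbour links with a flat grid where locality is just an index calculation:

```cpp
#include <cstdint>
#include <vector>

// Object-per-tile approach: each tile is a free-floating object that has to
// carry explicit neighbour links, which must be allocated and kept up to date.
struct TileObject {
    int         terrain;
    TileObject* north;
    TileObject* south;
    TileObject* east;
    TileObject* west;
};

// Grid-driven approach: one flat allocation for the whole 256 by 256 map.
// Locality is implicit - a neighbour is just an index calculation.
struct TileMap {
    static constexpr int Width  = 256;
    static constexpr int Height = 256;
    std::vector<uint8_t> terrain = std::vector<uint8_t>(Width * Height);

    uint8_t& at(int x, int y) { return terrain[y * Width + x]; }

    // Caller is responsible for staying in bounds (y > 0 here).
    uint8_t north(int x, int y) const { return terrain[(y - 1) * Width + x]; }
};
```

The flat map here is a single 65,536-byte allocation, while the object version costs roughly 40 bytes per tile on a 64-bit machine before any gameplay data beyond the terrain value is stored.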
Other than not having grids where they make sense, many modern games also seem to
carry instances for each and every item in the game: an instance for each item, rather than a
variable storing the number of items. For some games this is an optimisation, as creation
and destruction of objects is a costly activity, but the trend is worrying, as these ways of
storing information about the world make the world impenetrable to simple
interrogation.
Many games seem to try to keep everything about the player in the player class. If the
player dies in-game, they have to hang around as a dead object, otherwise, they lose
access to their achievement data. This linking of what the data is, to where it resides and
what it shares lifetime with, causes monolithic classes and hard to untangle relationships
which frequently turn out to be the cause of bugs. I will not name any of the games, but
it's not just one title, nor just one studio, but an epidemic of poor technical design that
seems to infect those who use off-the-shelf object-oriented engines more than those who
develop their own, regardless of paradigm.
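A small, hypothetical sketch of the alternative, with invented names: achievement progress is keyed by a player id and owned by its own system, so the in-world entity can be created and destroyed freely without taking the achievement data with it.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using PlayerId = uint32_t;

// Live, in-world state: can be created and destroyed as the player
// spawns and dies.
struct PlayerEntity {
    PlayerId id;
    float    position[3];
    int      health;
};

// Achievement progress keyed by player id, owned by a separate system.
// It survives the death (and destruction) of the in-world entity.
struct AchievementTable {
    std::unordered_map<PlayerId, std::vector<int>> unlocked;

    void unlock(PlayerId player, int achievement) {
        unlocked[player].push_back(achievement);
    }
};
```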
The data-oriented design approach doesn't build the real-world problem into the code.
This could be seen as a failing of the data-oriented approach by veteran object-oriented
developers, as examples of the success of object-oriented design come from being able to
bring the human concepts to the machine; in that middle ground, a solution can be
written that is understandable by both human and computer. The data-oriented
approach gives up some of the human readability by leaving the problem domain in the
design document, bringing elements of constraints and expectations into the transforms,
but by that same action it stops the machine from having to handle human concepts at
any data level.
Let us consider how the problem domain becomes part of the software in programming
paradigms that promote needless abstraction. In the case of objects, we tie meanings to
data by associating them with their containing classes and their associated functions. In
high-level abstraction, we separate actions and data by high-level concepts, which might
not apply at the low level, thus reducing the likelihood the functions can be implemented
efficiently.
When a class owns some data, it gives that data a context which can sometimes limit the
ability to reuse the data or understand the impact of operations upon it. Adding functions
to a context can bring in further data, which quickly leads to classes containing many
different pieces of data that are unrelated in themselves, but need to be in the same class
because an operation required a context and the context required more data for other
reasons, such as other related operations. This sounds awfully familiar, and Joe
Armstrong is quoted as having said "I think the lack of reusability comes in object-
oriented languages, not functional languages. Because the problem with object-oriented
languages is they've got all this implicit environment that they carry around with them.
You wanted a banana but what you got was a gorilla holding the banana and the entire
jungle."1.3 This certainly seems to resonate with the issue of contextual referencing that
plagues object-oriented languages.
You could be forgiven for believing that it's possible to remove the connections between
contexts by using interfaces or dependency injection, but the connections lie deeper than
that. The contexts in the objects are often connecting different classes of data about
different categories in which the object fits. Consider how this banana has many different
purposes, from being a fruit, to being a colour, to being a word beginning with the letter
B. We have to consider the problem presented by the idea of the banana as an instance,
as well as the banana being a class of entity too. If we need to gain information about
bananas from the point of view of the law on imported goods, or about its nutritional
value, it's going to be different from information about how many we are currently
stocking. We were lucky to start with the banana. If we talk about the gorilla, then we
have information about the individual gorilla, the gorillas in the zoo or jungle, and the
class of gorilla too. This is three different layers of abstraction about something which we
might give one name. At least with a banana, each individual doesn't have much in the
way of important data. We see this kind of contextual linkage all the time in the real
world, and we manage the complexity very well in conversation, but as soon as we start
putting these contexts down in hard terms we connect them together and make them
brittle.
All these mixed layers of abstraction become hard to untangle, as functions which operate
over each context drag in random pieces of data from all over the classes, meaning many
data items cannot be removed as they would then be inaccessible. This can be enough to
stop most programmers from attempting large-scale evolving software projects, but there
is another issue caused by hiding the actions applied to the data that leads to unnecessary
complexity. When you see lists and trees, arrays and maps, tables and rows, you can
reason about them and their interactions and transformations. If you attempt to do the
same with homes and offices, roads and commuters, coffee shops and parks, you can
often get stuck in thinking about the problem domain concepts and not see the details
that would provide clues to a better data representation or a different algorithmic
approach.
There are very few computer science algorithms that cannot be reused on primitive data
types, but when you introduce new classes with their own internal layouts of data, which
don't clearly follow the patterns of existing data structures, then you won't be able to
fully utilise those algorithms, and might not even be able to see how they would apply.
Putting data structures inside your object designs might make sense from what they are,
but they often make little sense from the perspective of data manipulation.
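For instance, if spawn times are kept as a plain array, the standard algorithms apply directly; this illustrative sketch assumes nothing beyond the C++ standard library, and the data is invented.

```cpp
#include <algorithm>
#include <vector>

// Spawn times kept as a plain vector of floats: sorting and searching are
// one standard-library call each. If the same values were hidden inside
// bespoke Spawner objects, neither call would apply without extraction.
std::vector<float> spawnTimes = {3.5f, 0.2f, 7.1f, 1.0f};

void findUpcomingSpawns() {
    std::sort(spawnTimes.begin(), spawnTimes.end());

    // Binary search for the first spawn after t = 2 seconds.
    auto it = std::lower_bound(spawnTimes.begin(), spawnTimes.end(), 2.0f);

    // Everything from 'it' to end() is due to spawn later than 2 seconds.
    (void)it;
}
```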
When we consider the data from the data-oriented design point of view, data is mere
facts that can be interpreted in whatever way necessary to get the output data in the
format it needs to be. We only care about what transforms we do, and where the data
ends up. In practice, when you discard meanings from data, you also reduce the chance of
tangling the facts with their contexts, and thus you also reduce the likelihood of mixing
unrelated data just for the sake of an operation or two.
The second principle: Data is the type, frequency, quantity, shape, and probability.
The second statement is that data is not just the structure. A common misconception
about data-oriented design is that it's all about cache misses. Even if it was all about
making sure you never missed the cache, and it was all about structuring your classes so
the hot and cold data was split apart, it would be a generally useful addition to your
programming toolkit, but data-oriented design is about all aspects of the data. To write a
book on how to avoid cache misses, you need more than just some tips on how to
organise your structures, you need a grounding in what is really happening inside your
computer when it is running your program. Teaching that in a book is also impossible as
it would only apply to one generation of hardware, and one generation of programming
languages. However, data-oriented design is not rooted in just one language and
some unusual hardware, even though the language best placed to benefit from it is C++, and the
hardware that benefits most from the approach is anything with unbalanced bottlenecks. The
schema of the data is important, but the values and how the data is transformed are as
important, if not more so. It is not enough to have some photographs of a cheetah to
determine how fast it can run. You need to see it in the wild and understand the true
costs of being slow.
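As one small, hypothetical example of looking beyond the schema, a hot/cold split keeps the per-frame fields apart from the rarely touched ones, so the frequent transform only streams the data it actually needs. The particle fields below are invented for illustration.

```cpp
#include <string>
#include <vector>

// Hot data: touched every frame by the movement update.
struct ParticleHot {
    float position[3];
    float velocity[3];
};

// Cold data: touched rarely (on spawn, or by tools), kept in a parallel
// array so it never pollutes the cache during the per-frame update.
struct ParticleCold {
    std::string debugName;
    int         spawnEffectId;
};

struct ParticleSystem {
    std::vector<ParticleHot>  hot;   // index i describes the same particle
    std::vector<ParticleCold> cold;  // in both arrays
};

void update(ParticleSystem& particles, float dt) {
    for (ParticleHot& p : particles.hot) {
        p.position[0] += p.velocity[0] * dt;
        p.position[1] += p.velocity[1] * dt;
        p.position[2] += p.velocity[2] * dt;
    }
}
```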
The data-oriented design model is centred around data. It pivots on live data, real data,
data that is also information. Object-oriented design is centred around the problem
definition. Objects are not real things but abstract representations of the context in which
the problem will be solved. The objects manipulate the data needed to represent them
without any consideration for the hardware or the real-world data patterns or quantities.
This is why object-oriented design allows you to quickly build up first versions of
applications, allowing you to put the first version of the design document or problem
definition directly into the code, and make a quick attempt at a solution.
Data-oriented design takes its cues from the data which is seen or expected. Instead of
planning for all eventualities, or planning to make things adaptable, there is a preference
for using the most probable input to direct the choice of algorithm. Instead of planning to
be extendable, it plans to be simple and replaceable, and to get the job done. Extendability can
be added later, with the safety net of unit tests to ensure it remains working as it did
while it was simple. Luckily, there is a way to make your data layout extendable without
requiring much thought, by utilising techniques developed many years ago for working
with databases.
Database technology took a great turn for the positive when the relational model was
introduced. The paper Out of the Tar Pit[TarPit] takes it a step further with Functional
Relational Programming, which combines relational-model data structures with
functional transforms. These are well defined, and much literature is available on how to
adapt their form to match your requirements.
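A minimal sketch of what a relational-style layout might look like in C++, with table and field names invented for illustration: each attribute lives in its own row collection keyed by an entity id, and a transform reads rows in and produces rows out.

```cpp
#include <cstdint>
#include <vector>

using EntityId = uint32_t;

// Relational-style layout: each "table" is a flat collection of rows keyed
// by entity id. Adding a new attribute later means adding a new table,
// not modifying existing classes.
struct PositionRow { EntityId entity; float x, y, z; };
struct HealthRow   { EntityId entity; int hp; };

struct World {
    std::vector<PositionRow> positions;
    std::vector<HealthRow>   healths;
};

// A functional-style transform: rows in, rows out, no hidden object state.
std::vector<EntityId> findDead(const std::vector<HealthRow>& healths) {
    std::vector<EntityId> dead;
    for (const HealthRow& row : healths)
        if (row.hp <= 0)
            dead.push_back(row.entity);
    return dead;
}
```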
Designs change for multiple reasons, occasionally including times when they actually
haven't. A misunderstanding of a design, or a misinterpretation of a design, will cause as
much change in the implementation as a literal request for change of design. A data-
oriented approach to code design considers the change in design through the lens of
understanding the change in the meaning of the data. The data-oriented approach to
design also allows for change to the code when the source of data changes, unlike the
encapsulated internal state manipulations of the object-oriented approach. In general,
data-oriented design handles change better as pieces of data and transforms can be more
simply coupled and decoupled than objects can be mutated and reused.
The reason this is so, comes from linking the intention, or the aspect, to the data. When
lumping data and functions in with concepts of objects, you find the objects are the
schema of the data. The aspect of the data is linked to that object, which means it's hard
to think of the data from another point of view. The use case of the data, and the real-
world or design, are now linked to the data layout through a singular vision implied by
the object definition. If you link your data layout to the union of the required data for
your expected manipulations, and your data manipulations are linked by aspects of your
data, then you make it hard to unlink data related by aspect. The difficulty comes when
different aspects need different subsets of the data, and they overlap. When they overlap,
they create a larger and larger set of values that need to travel around the system as one
unit. It's common to refactor a class out into two or more classes, or give ownership of
data to a different class. This is what is meant by tying data to an aspect. It is tied to the
lens through which the data has purpose, but with statically typed objects that purpose is
predefined, a union of multiple purposes, and sometimes carries around defunct
relationships. Some purposes may no longer be required by the design. Unfortunately, it's
easier to see when a relationship needs to exist than when it doesn't, and that leads to
more connections, not fewer, over time.
If you link your operations by related data, such as when you put methods on a class, you
make it hard to unlink your operations when the data changes or splits, and you make it
hard to split data when an operation requires the data to be together for its own
purposes. If you keep your data in one place, operations in another place, and keep the
aspects and roles of data separate from how the operations and transforms are applied to
the data, then you will find that many times when refactoring would have been large and
difficult in object-oriented code, the task now becomes trivial or non-existent. With this
benefit comes a cost of keeping tabs on what data is required for each operation, and the
potential danger of de-synchronisation. This consideration can lead to keeping some cold
code in an object-oriented style where objects are responsible for maintaining internal
consistency over efficiency and mutability. One example of a place where object-oriented
design is far superior to data-oriented is that of driver layers for systems or
hardware. Even though Vulkan and OpenGL are object-oriented, the granularity of the
objects is large and linked to stable concepts in their space, just like the object-oriented
approach of the FILE type or handle, in open, close, read, and write operations in
filesystems.
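The following sketch, with invented names, contrasts the two approaches: a method bound to one particular grouping of data, versus a free function over separately owned tables, where only the argument list changes if the data later moves.

```cpp
#include <cstddef>
#include <vector>

// Methods tie the operation to this exact grouping of data. Splitting the
// position out of GameObject later also means moving Update() with it.
class GameObject {
    float position[3];
    float velocity[3];
public:
    void Update(float dt) {
        position[0] += velocity[0] * dt;
        position[1] += velocity[1] * dt;
        position[2] += velocity[2] * dt;
    }
};

// Operations kept apart from the data. If velocities later move to a
// different owner, only the argument list changes; nothing else follows.
struct PositionTable { std::vector<float> x, y, z; };
struct VelocityTable { std::vector<float> x, y, z; };

void advance(PositionTable& pos, const VelocityTable& vel, float dt) {
    for (std::size_t i = 0; i < pos.x.size(); ++i) {
        pos.x[i] += vel.x[i] * dt;
        pos.y[i] += vel.y[i] * dt;
        pos.z[i] += vel.z[i] * dt;
    }
}
```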
A big misunderstanding for many new to the data-oriented design paradigm, a concept
brought over from abstraction based development, is that we can design a static library or
set of templates to provide generic solutions to everything presented in this book as a
data-oriented solution. Much like with domain-driven design, data-oriented design is
product and work-flow specific. You learn how to do data-oriented design, not how to
add it to your project. The fundamental truth is that data, though it can be generic by
type, is not generic in how it is used. The values are different and often contain patterns
we can turn to our advantage. The idea that data can be generic is a false claim that data-
oriented design attempts to rectify. The transforms applied to data can be generic to
some extent, but the order and selection of operations are literally the solution to the
problem. Source code is the recipe for conversion of data from one form into another.
There cannot be a library of templates for understanding and leveraging patterns in the
data, and that's what drives a successful data-oriented design. It's true we can build
algorithms to find patterns in data (otherwise, how would compression be possible?),
but the patterns we think about when it comes to data-oriented design are
higher level, domain-specific, and not simple frequency mappings.
Our run-time benefits from specialisation through performance tricks that sometimes
make the code harder to read, but such specialisation is frequently discouraged as being
not object-oriented, or too hard-coded. It can be better to hard-code a transform than to
pretend it's not hard-coded by wrapping it in a generic container and using less direct
algorithms on it. Using existing templates like this provides the benefit of an increase in
readability for those who already know the library, and potentially fewer bugs if the
functionality was in some way generic. But if the functionality was not well mapped to
the existing generic solution, writing it with a function template and then extending it will
make the code harder to understand, and hiding the fact that the technique has been
subtly changed will introduce false assumptions. Hard-coding a new algorithm is a
better choice as long as it has sufficient tests, and is objectively new. Tests will also be
easier to write if you constrain yourself to the facts about concrete data and only test with
real, but simple, data for your problem, and not generic types on generic data.
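A hypothetical example of that trade-off: a deliberately hard-coded damage falloff, tested against concrete values for this one problem rather than against generic types.

```cpp
#include <cassert>

// A deliberately hard-coded transform: damage falls off over distance,
// tuned for this game's ranges rather than wrapped in a generic policy.
int damageAtDistance(int baseDamage, float distance) {
    if (distance < 5.0f)  return baseDamage;        // point blank
    if (distance < 20.0f) return baseDamage / 2;    // mid range
    return 0;                                       // out of range
}

// Tests written against concrete, simple data for this problem,
// not against generic types on generic data.
void testDamageFalloff() {
    assert(damageAtDistance(100, 1.0f)  == 100);
    assert(damageAtDistance(100, 10.0f) == 50);
    assert(damageAtDistance(100, 50.0f) == 0);
}
```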
The games we write have a lot of data, in a lot of different formats. We have textures in
multiple formats for multiple platforms. There are animations, usually optimised for
different skeletons or types of playback. There are sounds, lights, and scripts. Don't forget
meshes; they consist of multiple buffers of attributes. Only a very small proportion of
meshes are of the old fixed-function type, with vertices containing positions, UVs, and normals.
The data in game development is hard to box, and getting harder to pin down as more
ideas which were previously considered impossible have now become commonplace. This
is why we spend a lot of time working on editors and tool-chains, so we can take the free-
form output from designers and artists and find a way to put it into our engines. Without
our tool-chains, editors, viewers, and tweaking tools, there would be no way we could
produce a game with the time we have. The object-oriented approach provides a good
way to wrap our heads around all these different formats of data. It gives a centralised
view of where each type of data belongs and classifies it by what can be done to it. This
makes it very easy to add and use data quickly, but implementing all these different
wrapper objects takes time. Adding new functionality to these objects can sometimes
require large amounts of refactoring as occasionally objects are classified in such a way
that they don't allow for new features to exist. For example, in many old engines, textures
were always 1,2, or 4 bytes per pixel. With the advent of floating point textures, all that
code required a minor refactoring. In the past, it was not possible to read a texture from
the vertex shader, so when texture based skinning came along, many engine
programmers had to refactor their render update. They had to allow for a vertex shader
texture upload because it might be necessary when uploading transforms for rendering a
skinned mesh. When the PlayStation 2 came along, or an engine first used shaders, the
very idea of what made a material had to change. The move from small 3D
environments to large open worlds with level of detail caused many engineers to start
thinking about what it meant for something to need rendering. When newer hardware
became more picky about alignment, other hard-to-inject changes had to be made. In
many engines, mesh data is optimised for rendering, but when you have to do mesh ray
casting to see where bullets have hit, or for doing IK, or physics, then you need multiple
representations of an entity. At this point, the object-oriented approach starts to look
cobbled together as there are fewer objects that represent real things, and more objects
used as containers so programmers can think in larger building blocks. These blocks
hinder though, as they become the only blocks used in thought, and stop potential mental
connections from happening. We went from 2D sprites to 3D meshes, following the
format of the hardware provider, to custom data streams and compute units turning the
streams into rendered triangles. Wave data, to banks, to envelope controlled grain tables
and slews of layered sounds. Tilemaps, to portals and rooms, to streamed, multiple levels
of detail chunks of world, to hybrid mesh palette, props, and unique stitching assets.
From flip-book to Euler angle sequences, to quaternions and spherical interpolated
animations, to animation trees and behaviour mapping/trees. Change is the only
constant.
All these types of data are pretty common if you've worked in games at all, and many
engines do provide an abstraction to these more fundamental types. When a new type of
data becomes heavily used it is promoted into engines as a core type. We normally
consider the trade-off of new types being handled as special cases until they become
ubiquitous to be one of usability vs performance. We don't want to provide free access to
the lesser understood elements of game development. People who do not, or cannot,
invest time in finding out how best to use new features are discouraged from using them.
The object-oriented game development way to do that is to not provide objects which
represent them, and instead only offer the features to people who know how to utilise the
more advanced tools.
Apart from the objects representing digital assets, there are also objects for internal game
logic. For every game, there are objects which only exist to further the game-play.
Collectable card games have a lot of textures, but they also have a great deal of rules, card
stats, player decks, match records, with many objects to represent the current state of
play. All of these objects are completely custom designed for one game. There may be
sequels, but unless it's primarily a re-skin, it will use quite different game logic in many
places, and therefore require different data, which would imply different methods on the
now guaranteed to be internally different objects.
Game data is complex. Any first layout of the data is inspired by the game's initial design.
Once development is underway, the layout needs to keep up with whichever way the
game evolves. Object-oriented techniques offer a quick way to implement any given
design, and are very quick at implementing each singular design in turn, but they don't offer a
clean or graceful way to migrate from one data schema to the next. There are hacks, such
as those used in version based asset handlers, or in frameworks backed by update
systems and conversion scripts, but normally, game developers change the tool-chain and
the engine at the same time, do a full re-export of all the assets, then commit to the next
version all in one go. This can be quite a painful experience if it has to happen over
multiple sites at the same time, or if you have a lot of assets, or if you are trying to
provide engine support for more than one title, and only one wants to change to the new
revision. An example of an object-oriented approach that handles migration of design
with some grace is the Django framework, but the reason it handles the migration well is
that the objects would appear to be views into data models, not the data itself.
There have not yet been any successful efforts to build a generic game asset solution. This
may be because all games differ in so many subtle ways that if you did provide a generic
solution, it wouldn't be a game solution, just a new language. There is no solution to be
found in trying to provide all the possible types of object a game can use. But, there is a
solution if we go back to thinking about a game as merely running a set of computations
on some data. The closest we can get in 2018 is the FBX format, with some dependence
on the current standard shader languages. The current solutions appear to have excess
baggage which does not seem easy to remove. Due to the need to be generic, many details
are lost through abstractions and strategies to present data in a non-confrontational way.
Game developers are notorious for thinking about game development from either a low-
level, all-out performance perspective or from a very high-level gameplay and interaction
perspective. This may have come about because of the widening gap between the amount
of code that has to be high performance, and the amount of code to make the game
complete. Object-oriented techniques provide good coverage of the high-level aspect, so
the high-level programmers are content with their tools. The performance specialists
have been finding ways of doing more with the hardware, so much so that a lot of the
time content creators think they don't have a part in the optimisation process. There has
never been much of a middle ground in game development, which is probably the
primary reason why the structure and performance techniques employed by big-iron
companies didn't seem useful. The secondary reason could be that game developers don't
normally develop systems and applications which have decade-long maintenance
expectations1.4 and therefore are less likely to be concerned about why their code should
be encapsulated and protected, or at least well documented. When game development
first flourished into larger studios in the late 1990s, academic or corporate software
engineering practices were seen as suspicious because wherever they were employed,
there was a dramatic drop in game performance, and whenever any prospective
employees came from those industries, they failed to impress. As games machines
became more like standard micro-computers, and standard micro-computers drew
closer in design to the mainframes of old, it became more apparent that some of those
standard professional software engineering practices could be useful. Now the scale of
games has grown to match the hardware, but the games industry has stopped looking at
where those non-game development practices led. As an industry, we should be looking
to where others have gone before us, and the closest set of academic and professional
development techniques seem to be grounded in simulation and high volume data
analysis. We still have industry-specific challenges such as the problems of high
frequency highly heterogeneous transformational requirements that we experience in
sufficiently voluminous AI environments, and we have the issue of user proximity in
networked environments, such as the problems faced by MMOs when they have location-
based events, and bandwidth starts to hit issues as everyone is trying to message
everyone else.
With each successive generation, the number of developer hours to create a game has
grown, which is why project management and software engineering practices have
become standardised at the larger games companies. There was a time when game
developers were seen as cutting-edge programmers, inventing new technology as the
need arose, but with the advent of less adventurous hardware (most notably the recent
x86-based 8th generation), there has been a shift away from ingenious coding
practices, and towards a standardised process. This means game development can be
tuned to ensure the release date will coincide with marketing dates. There will always be
an element of randomness in high profile game development. There will always be an
element of innovation that virtually guarantees you will not be able to predict how long
the project, or at least one part of the project, will take. Even if data-oriented design isn't
needed to make your game go faster, it can be used to make your game development
schedule more regular.
Part of the difficulty in adding new and innovative features to a game is the data layout. If
you need to change the data layout for a game, it will need objects to be redesigned or
extended in order to work within the existing framework. If there is no new data, then a
feature might require that previously separate systems suddenly be able to talk to each
other quite intimately. This coupling can often cause system-wide confusion with
additional temporal coupling and corner cases so obscure they can only be reproduced
one time in a million. These odds might sound fine to some developers, but if you're
expecting to sell five to fifty million copies of your game, at one in a million, that's five to
fifty people who will experience the problem, take a video of your game behaving
oddly, post it on YouTube, and call your company rubbish, or your developers lazy,
for not having fixed an obvious bug. Worse, what if the one-in-a-million issue was a
way to circumvent in-app purchases, reproducible by anyone who knew the right steps,
and those steps started spreading on Twitter, or maybe it created an economy-destroying
influx of resources in a live MMO universe1.5. In the past, if you had sold five to fifty million copies
of your game, you wouldn't care, but with the advent of free-to-play games, five million
players might be considered a good start, and poor reviews coming in will curb the
growth. IAP circumventions will kill your income, and economy destruction will end you.
Big iron developers had these same concerns back in the 1970's. Their software had to be
built to high standards because their programs would frequently be working on data
concerned with real money transactions. They needed to write business logic that
operated on the data, but most important of all, they had to make sure the data was
updated through a provably careful set of operations in order to maintain its integrity.
Database technology grew from the need to process stored data, to do complex analysis
on it, to store and update it, and be able to guarantee it was valid at all times. To do this,
the ACID test was used to ensure atomicity, consistency, isolation, and durability.
Atomicity was the test to ensure all transactions would either complete or do nothing. It
could be very bad for a database to update only one account in a financial transaction.
There could be money lost or created if a transaction was not atomic. Consistency was
added to ensure all the resultant state changes which should happen during a transaction
do happen, that is, all triggers which should fire, do fire, even if the triggers cause triggers
recursively, with no limit. This would be highly important if an account should be
blocked after it has triggered a form of fraud detection. If a trigger has not fired, then the
company using the database could risk being liable for even more than if they had
stopped the account when they first detected fraud. Isolation is concerned with ensuring
all transactions which occur cannot cause any other transactions to differ in behaviour.
Normally this means that if two transactions appear to work on the same data, they have
to queue up and not try to operate at the same time. Although this is generally good, it
does cause concurrency problems. Finally, durability. This was the second most
important element of the four, as it has always been important to ensure that once a
transaction has completed, it remains so. In database terminology, durability meant the
transaction would be guaranteed to have been stored in such a way that it would survive
server crashes or power outages. This was important for networked computers where it
would be important to know what transactions had definitely happened when a server
crashed or a connection dropped.
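As a toy, single-threaded illustration of atomicity alone (real systems also need isolation and durability; this is not drawn from any particular database), either both account updates happen or neither does:

```cpp
#include <map>
#include <string>

// A minimal, in-memory illustration of atomicity: a transfer either
// completes fully or leaves both balances untouched.
struct Bank {
    std::map<std::string, long> balances; // pennies, to avoid float issues

    bool transfer(const std::string& from, const std::string& to, long amount) {
        auto src = balances.find(from);
        auto dst = balances.find(to);
        if (src == balances.end() || dst == balances.end()) return false;
        if (src->second < amount) return false; // would overdraw: do nothing

        // Both writes happen together; no code path applies only one of them.
        src->second -= amount;
        dst->second += amount;
        return true;
    }
};
```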
Modern networked games also have to worry about highly important data like this. With
non-free downloadable content, consumers care about consistency. With consumable
downloadable content, users care a great deal about every transaction. To provide much
of the functionality required of the database ACID test, game developers have gone back
to looking at how databases were designed to cope with these strict requirements and
found reference to staged commits, idempotent functions, techniques for concurrent
development, and a vast literature base on how to design tables for a database.
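One of those borrowed ideas, the idempotent function, can be sketched as follows (names invented for illustration): replaying the same purchase transaction after a retry or a dropped connection grants the item only once.

```cpp
#include <set>
#include <string>
#include <vector>

// Idempotent application of a purchase: applying the same transaction id
// more than once has no further effect.
struct Inventory {
    std::set<std::string>    appliedTransactions;
    std::vector<std::string> items;

    void applyPurchase(const std::string& transactionId, const std::string& item) {
        if (appliedTransactions.count(transactionId)) return; // already applied
        appliedTransactions.insert(transactionId);
        items.push_back(item);
    }
};
```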
We've talked about data-oriented design being a way to think about and lay out your data
and to make decisions about your architecture. We have two principles that can drive
many of the decisions we need to make when doing data-oriented design. To finish the
chapter, there are some takeaways you can use immediately to begin your journey.
Consider how your data is being influenced by what it's called. Consider the possibility
that the proximity of other data can influence the meaning of your data, and in doing so,
trap it in a model that inhibits flexibility. For the consideration of the first principle, that data
is not the problem domain, it's worth thinking about the following items.
You are not targeting an unknown device with unknowable characteristics. Know your
data, and know your target hardware. To some extent, understand how much each
stream of data matters, and who is consuming it. Understand the cost and potential value
of improvements. Access patterns matter, as you cannot hit the cache if you're accessing
things in a burst, then not touching them again for a whole cycle of the application. For
the consideration of the second principle, that data is the type, frequency, quantity, shape,
and probability, it's worth thinking about the following items.