Modern Data Architecture
By W H Inmon
OCTOBER 2015
TABLE OF CONTENTS
DATA WAREHOUSE
ENTER BIG DATA
DO YOU NEED A DATA WAREHOUSE WHEN YOU HAVE BIG DATA?
AN ARCHITECTURE
A TECHNOLOGY
HARMONIOUS COEXISTENCE
REPETITIVE/NON-REPETITIVE DATA
THE “GREAT DIVIDE”
DATA MODELING
TEXTUAL DISAMBIGUATION
CONTEXT ENRICHED BIG DATA
TWO KINDS OF DATA IN THE DATA WAREHOUSE
A NEW TYPE OF ANALYTICAL PROCESSING
REPETITIVE DATA/DATA WAREHOUSE INTERFACE
ARCHIVAL DATA TO BIG DATA
DOING ANALYTICS
DATA MARTS AND THE DIMENSIONAL MODEL
WHAT ABOUT MODELING?
THE SYSTEM OF RECORD
THE REMAINING ISSUES
DATA WAREHOUSE
The data warehouse is an established concept and discipline that is discussed in books, conferences
and seminars. Indeed, data warehouses are a standard feature of modern corporations.
Corporations use data warehouses to make business decisions every day. In short, the data
warehouse represents “conventional wisdom” and is a standard part of the corporate
infrastructure.
First off, what is a data warehouse? From the beginning, the accepted definition has been that a
data warehouse is a:
Subject oriented
Integrated
Time variant
Non-volatile
collection of data in support of management’s decisions. This definition is widely quoted as the
answer to what a data warehouse is. (See BUILDING THE DATA WAREHOUSE, John Wiley,
originally published 1991.)
ENTER BIG DATA
The definition of Big Data is not quite as clear. Indeed, there are different interpretations of
what is meant by “Big Data”. For the purposes of this paper, the following definition of Big
Data will be used.
Big Data:
Encompasses very large volumes of data
Is stored on affordable storage
Is stored in an unstructured manner
Is managed using the “Roman census” technique (sketched below).
(For an in-depth discussion of this definition, refer to the book BIG DATA – A PRIMER FOR THE
DATA SCIENTIST, Elsevier, 2014.)
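The “Roman census” technique deserves a brief illustration: rather than bringing all of the data to a central point, the processing is sent out to where the data resides, and only small results travel back to be totaled centrally. Below is a minimal Python sketch of that idea, using local worker processes as stand-ins for distributed storage nodes; the partition contents and field names are invented for the example.

```python
# Minimal sketch of the "Roman census" idea: the work is sent to the data
# (each worker handles its own partition locally) and only small summary
# results travel back to be totaled centrally.
from multiprocessing import Pool

def count_large_sales(partition):
    """Process one partition where it lives; return only a summary count."""
    return sum(1 for record in partition if record["amount"] > 10.00)

if __name__ == "__main__":
    # Stand-ins for blocks of data that would live on separate nodes.
    partitions = [
        [{"amount": 4.95}, {"amount": 19.99}],
        [{"amount": 12.50}, {"amount": 3.25}, {"amount": 41.00}],
    ]
    with Pool(processes=2) as pool:
        per_partition = pool.map(count_large_sales, partitions)
    print("total records over $10.00:", sum(per_partition))   # 3
```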
AN ARCHITECTURE
A data warehouse is an architecture. Building and maintaining a data warehouse requires
discipline. A data warehouse can be stored on a variety of media. The essence of a data
warehouse is the integrity of its data; another way of thinking of a data warehouse is as a
single version of the truth. The data that enters a data warehouse is carefully crafted and
vetted. The data found in a data warehouse is used for the most basic decisions the
corporation makes.
Traditionally the data entering a data warehouse is integrated by means of technology called
“ETL” (extract/transform/load). Data typically starts off in an application and is recast into a
singular, integrated corporate format when it is placed inside a data warehouse.
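As a small illustration of the recasting that ETL performs, here is a hypothetical Python sketch. The application record layout and the target “corporate format” are invented for the example; in practice this work is done by dedicated ETL tooling.

```python
# ETL sketch: extract records as the application produced them, transform
# them into a single integrated corporate format, and load the result.
from datetime import datetime

def extract(application_records):
    """Extract: pull the records in whatever shape the application uses."""
    return list(application_records)

def transform(record):
    """Transform: recast the record into the integrated corporate format
    (consistent keys, numeric amounts, and ISO dates)."""
    return {
        "customer_id": record["cust"].strip().upper(),
        "amount_usd": float(record["amt"]),
        "sale_date": datetime.strptime(record["dt"], "%m/%d/%Y").date().isoformat(),
    }

def load(warehouse_table, rows):
    """Load: place the vetted, integrated rows into the warehouse."""
    warehouse_table.extend(rows)

warehouse_table = []
source = [{"cust": " ab123 ", "amt": "19.99", "dt": "09/14/2015"}]
load(warehouse_table, [transform(r) for r in extract(source)])
print(warehouse_table)
# [{'customer_id': 'AB123', 'amount_usd': 19.99, 'sale_date': '2015-09-14'}]
```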
A TECHNOLOGY
Big Data, on the other hand, is a technology. Big Data is capable of storing very large amounts
of data. Big Data is a physical medium: in Big Data there are storage mechanisms that cause
data to be written and then retrieved when desired.
Consider the difference between the time of day and a Rolex. The time of day is the time of
day regardless of what a Rolex says, and one Rolex may show one time while another Rolex
shows another.
The same difference exists between an architecture and a technology. You can put a data
warehouse on Big Data or on standard storage technology; it is still a data warehouse wherever
it is located. Equally, you can put data that is not a data warehouse on Big Data or on standard
storage technology.
There is no competition between Big Data and a data warehouse. They are entirely different
things.
HARMONIOUS COEXISTENCE
Despite the confusion sown by Big Data vendors, there is a need to understand how Big Data
and the data warehouse can coexist. From an architectural standpoint, there needs to be a
“big picture” that outlines how Big Data and the data warehouse can coexist and work together
in a harmonious and constructive manner.
REPETITIVE/NON-REPETITIVE DATA
The general architecture figure shows several major architectural features. The first major
architectural feature is that Big Data is divided into two major subdivisions – repetitive
occurrences of data and non-repetitive occurrences of data.
Repetitive occurrences of data consist of data where the same structure of data is repeated
many times. There are many examples of repetitive data. Typical repetitive data
consists of log tape records, telephone call detail records, click stream data, metering
data, meteorological data, and so forth. In repetitive data, the same structure of data occurs
over and over again. In many cases repetitive data is machine-generated or produced by
analog processing.
Non-repetitive data also has many examples. Some examples of non-repetitive data include
email, call center conversations, survey comments, help desk conversations, warranty claim
data, and so forth. In non-repetitive data it is only an accident if the same data or the same
structure of data ever occurs twice. In almost every case, non-repetitive data is text-based
data generated by the written or the spoken word.
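The distinction is easy to see in miniature. In the hypothetical sketch below, every call detail record shares the same fixed structure and only the values change, while the call center notes share no structure at all; the field names and values are invented for illustration.

```python
# Repetitive data: the same record structure occurs over and over again;
# only the values differ from one occurrence to the next.
from dataclasses import dataclass

@dataclass
class CallDetailRecord:
    caller: str
    callee: str
    duration_seconds: int

repetitive = [
    CallDetailRecord("303-555-0101", "212-555-0187", 642),
    CallDetailRecord("303-555-0101", "415-555-0114", 58),
    CallDetailRecord("720-555-0190", "212-555-0187", 1311),
]

# Non-repetitive data: free-form text where the same content or structure
# occurring twice is only an accident.
non_repetitive = [
    "Customer called about a cracked screen and wants a warranty swap.",
    "Caller unhappy with billing; promised a callback from a supervisor.",
]
```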
DATA MODELING
One of the interesting differences between repetitive data and non-repetitive data is in terms
of how the data is modeled. Repetitive data is typically modeled by an ERD (entity relationship
diagram) data model. Non-repetitive data is modeled in an entirely different manner by the
usage of taxonomies and ontologies.
With an ERD the designer is free to change the data to fit the model. But with taxonomies and
ontologies, the base data NEVER changes. As a consequence, if there is a need to make
changes, it is the taxonomy or ontology that changes, not the base data.
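That point can be shown with a tiny, hypothetical taxonomy sketch in Python: the base text is read but never altered, and a reclassification is accomplished by editing the taxonomy alone. The terms and categories below are invented for illustration.

```python
# A toy taxonomy: specific terms map upward to broader categories.
# To change the classification, this mapping is edited; the base data
# itself is never modified.
taxonomy = {
    "sedan": "automobile",
    "pickup": "automobile",
    "motorcycle": "vehicle",
}

def classify(base_text, taxonomy):
    """Return (term, category) pairs; the base text is treated as read-only."""
    matches = []
    for word in base_text.lower().split():
        word = word.strip(".,;:")
        if word in taxonomy:
            matches.append((word, taxonomy[word]))
    return matches

note = "Customer traded a sedan for a pickup."
print(classify(note, taxonomy))   # [('sedan', 'automobile'), ('pickup', 'automobile')]
```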
Both types of data models can be (and usually should be) built generically. There is very little
difference between the models built by different corporations in the same industry. As a
consequence, generic models – at least as a starting point – are strongly advised.
TEXTUAL DISAMBIGUATION
Non-repetitive data is typically handled and managed by passing it through technology known
as “textual disambiguation”. The non-repetitive data is read, reformatted and – more
importantly – contextualized. In order to make any sense out of non-repetitive data, its
context must be established. The job of textual disambiguation is to derive and identify the
context of non-repetitive data. In many cases the context of the non-repetitive data is MORE
important than the data itself. In any case, non-repetitive data cannot be used for decision
making until its context has been established.
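Textual disambiguation is a substantial technology in its own right, but the shape of its output can be sketched: raw text goes in, and structured rows that carry both the data and its derived context come out. The sketch below is a drastically simplified, hypothetical stand-in for that process, with an invented rule set; it is not a description of the actual mechanism.

```python
# Simplified stand-in for contextualization: read raw non-repetitive text
# and emit structured rows of (document, term, derived context) that could
# be loaded into a relational data warehouse.
CONTEXT_RULES = {
    "refund": "customer dissatisfaction",
    "overdraft": "account problem",
    "fracture": "injury report",
}

def contextualize(doc_id, raw_text):
    rows = []
    for word in raw_text.lower().split():
        word = word.strip(".,;:!?")
        if word in CONTEXT_RULES:
            # The derived context travels with the data itself.
            rows.append({"doc": doc_id, "term": word,
                         "context": CONTEXT_RULES[word]})
    return rows

print(contextualize("email-0417", "Please issue a refund before my overdraft hits."))
# [{'doc': 'email-0417', 'term': 'refund', 'context': 'customer dissatisfaction'},
#  {'doc': 'email-0417', 'term': 'overdraft', 'context': 'account problem'}]
```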
It is possible that there is so much contextualized data that it cannot be sent to the data
warehouse environment because of the sheer volume of data. If, however, the contextualized
data is sent to the classic data warehouse, the processing that takes place on it can be done
with standard analytical tools such as Tableau, Qlik, Business Objects, SAS, Excel, and so forth.
One of the really nice things about the two types of data in the data warehouse is that because
all the data arrives in a structured relational format, the data can be freely mixed and matched,
and joins and analysis across the different data types can be done.
The interface between repetitive data in Big Data and the data warehouse involves two kinds of
processing. The first is filtering: the reading and selection of records that are then sent to the
data warehouse.
The second kind of processing is distillation. Distillation is similar to filtering except distillation
requires that further processing be done before the records are sent to the data warehouse. A
simple example of distillation might be the counting of records that have been selected. For
example, the distillation process may simply count the number of sales of items greater than
$10.00 for each Wal-Mart store for the month of September 2015.
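A minimal Python sketch of the two kinds of processing, using the store example above, may make the difference concrete; the record layout and store identifiers are invented for illustration.

```python
# Filtering: select the repetitive records of interest and pass them on as-is.
# Distillation: process the selected records further (here, count per store)
# and pass on only the distilled result.
from collections import Counter

sales = [
    {"store": "store-114", "amount": 12.49, "month": "2015-09"},
    {"store": "store-114", "amount": 3.99,  "month": "2015-09"},
    {"store": "store-207", "amount": 27.00, "month": "2015-09"},
    {"store": "store-207", "amount": 14.25, "month": "2015-08"},
]

# Filtering: sales over $10.00 in September 2015, sent along unchanged.
filtered = [s for s in sales
            if s["amount"] > 10.00 and s["month"] == "2015-09"]

# Distillation: only the per-store counts go to the data warehouse.
distilled = Counter(s["store"] for s in filtered)
print(dict(distilled))   # {'store-114': 1, 'store-207': 1}
```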
The results of both the distillation and the filtering of Big Data are placed in the data warehouse.
Usually the results are placed in a separate part of the data warehouse, since their basis is not
the structured, transaction-based data normally found there.
Note that the process of filtering and distilling repetitive data can become quite
involved. Usually the complications come from handling the volume of data that is
needed for the analysis. In some cases, there is an enormous amount of data that must be
processed. In other cases, the characteristics of the data being sought are ambiguous or not
clearly defined.
DOING ANALYTICS
Analytics can be done all over the landscape. Classic analytical processing of transaction-based
data is done in the data warehouse as it has always been done. Nothing has changed there.
But now analytics on contextualized data can be done, and that form of analytics is new and
novel. Most organizations have not been able to base decision making on unstructured textual
data before. And there is a new form of analytics that is possible in the data warehouse, which
is the possibility of blended analytics. Blended analytics is analytics done using a blend of
structured transactional data and unstructured contextualized data.
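Because both kinds of data land in the warehouse in relational form, a blended query is nothing exotic. The sketch below uses an in-memory SQLite database as a hypothetical stand-in for the warehouse; the table and column names are assumptions made for the example.

```python
# Blended analytics sketch: join structured transactional data with
# contextualized rows derived from unstructured text, inside one warehouse.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales   (customer_id TEXT, amount REAL);
    CREATE TABLE context (customer_id TEXT, derived_context TEXT);
    INSERT INTO sales   VALUES ('C001', 250.0), ('C002', 75.5);
    INSERT INTO context VALUES ('C001', 'customer dissatisfaction');
""")

# Revenue at risk: spend by customers whose text shows dissatisfaction.
rows = con.execute("""
    SELECT s.customer_id, SUM(s.amount) AS spend
    FROM sales s
    JOIN context c ON c.customer_id = s.customer_id
    WHERE c.derived_context = 'customer dissatisfaction'
    GROUP BY s.customer_id
""").fetchall()
print(rows)   # [('C001', 250.0)]
```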
But there are many other forms of analytics that are possible as well. There is the possibility of
doing analytics inside the repetitive data Big Data environment. This is where NoSQL analytical
processing is a possibility. And another form of analytics is analytics of context-enriched Big
Data: a certain portion of the Big Data environment consists of context-enriched data, which
can produce its own analytical results as well.
Each of these different forms of analytical processing produces its own unique results.
Interestingly, the data model, the dimensional model, the taxonomy and the ontology are all
closely related yet still different. They are like blood-related siblings in a family. Look at a
group of siblings and you see similar skin color and similar noses, mouths and eyes, and at the
same time you see the individual differences that each sibling has. They are all clearly from the
same family, and at the same time they are all still unique individuals.
But then the more important question arises – can we achieve integrity of data across the
architectural landscape? The answer is a resounding yes. By using a consistent modeling
strategy across all types of data, you can establish the foundation for data integrity.
References:
BUILDING THE DATA WAREHOUSE, John Wiley & Sons – the original book on data warehousing.
DATA ARCHITECTURE – A PRIMER FOR THE DATA SCIENTIST, Elsevier, 2014 – a complete
description of data architecture.
THE DATA WAREHOUSE TOOLKIT, John Wiley & Sons – a guide to dimensional modeling and the building
of data marts.
Sponsored by:
Embarcadero Technologies, Inc. is a leading provider of award-winning tools for application developers
and database professionals so they can design systems right, build them faster and run them better,
regardless of their platform or programming language. ER/Studio is the company’s flagship data
architecture solution that combines business-driven data modeling and collaboration in a multi-platform
environment. ER/Studio is a registered trademark of Embarcadero Technologies. To learn more, please
visit http://www.embarcadero.com.