Stream Processing With: Tamás István Ujj
Machine Learning
Manufacturing
Telecommunications
A real-time data architecture
I want to do complex calculations on large amounts of data.
Recomputation will cost you extra.
[Diagram labels: New Data, Staging Area, ETL Logic, Master Dataset, Transformation Logic, Results, Transformation Logic (New), Results (New)]
Interesting. That’s half the costs.
A well-designed streaming system provides exactly-once semantics, even in case of failure.
[Diagram: a stream with offsets 0 to 9]
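A minimal sketch of one common way to get this behaviour: commit the result and the next offset together, so that a restart resumes exactly where the last commit left off. The `Store` class, the `run` function, and the simulated crash are hypothetical names for illustration, not taken from any particular streaming system.

```python
class Store:
    """Hypothetical durable store holding the result and the next offset together."""

    def __init__(self):
        self.committed = (0, 0)  # (next_offset, running_total)


def run(log, store, crash_before=None):
    """Sum the records in `log`, resuming from the committed offset."""
    next_offset, total = store.committed
    for offset in range(next_offset, len(log)):
        if offset == crash_before:
            raise RuntimeError("simulated failure")
        total += log[offset]
        # Result and offset are committed in one step, so a restart can
        # neither skip this record nor apply it a second time.
        store.committed = (offset + 1, total)
    return store.committed


log = list(range(10))  # records at offsets 0..9
store = Store()
try:
    run(log, store, crash_before=5)  # fails after offsets 0..4 are committed
except RuntimeError:
    pass
result = run(log, store)  # resumes at offset 5
```

Each record contributes to the total exactly once even though the job was started twice. The sketch assumes the commit itself is atomic; in a real system that is the hard part.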
[Diagram labels: Offset, Staging Area, ETL Logic, Master Dataset, Transformation Logic, Transformation Logic (New), Batch Results, Real-Time Results, Results (New)]
[Diagram: event stream of Update, Create, Update, Delete, Create, Delete]
view over this stream of events.
Database
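One way to read this slide is that the database state can be rebuilt by replaying the events in order. The sketch below folds create, update, and delete events into a dictionary. The `(operation, key, value)` tuple layout and the `materialize` function are assumptions made for illustration.

```python
def materialize(events):
    """Replay (operation, key, value) events in order and return the resulting view."""
    view = {}
    for op, key, value in events:
        if op in ("create", "update"):
            view[key] = value      # upsert the latest value for this key
        elif op == "delete":
            view.pop(key, None)    # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown operation: {op}")
    return view


# Hypothetical events; keys and values are made up.
events = [
    ("create", "user:1", {"name": "Ann"}),
    ("update", "user:1", {"name": "Anna"}),
    ("create", "user:2", {"name": "Bob"}),
    ("delete", "user:2", None),
]
current_state = materialize(events)
```

Replaying a prefix of the same stream gives the state as of that point in time, which is what makes the event log, rather than the table, the source of truth in this reading.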
Responding to single events in real time, or a general analysis over the stream.
           Event Processing    Micro-Batch Processing
Latency    Sub-second          Seconds to minutes
Power      Simple triggers     Complex transformations
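The contrast in the table can be sketched as two functions over the same events: one reacts to each event on its own, the other groups events into fixed time buckets and computes an aggregate per bucket. The event fields `ts` and `value`, the threshold, and both function names are hypothetical.

```python
def per_event_triggers(events, threshold):
    """Event processing: decide on each event in isolation (a simple trigger)."""
    return [e["ts"] for e in events if e["value"] > threshold]


def micro_batch_averages(events, batch_seconds):
    """Micro-batch processing: bucket events by time, then aggregate each bucket."""
    buckets = {}
    for e in events:
        bucket = int(e["ts"] // batch_seconds)
        buckets.setdefault(bucket, []).append(e["value"])
    return {b: sum(vals) / len(vals) for b, vals in sorted(buckets.items())}


# Hypothetical sensor readings with timestamps in seconds.
readings = [
    {"ts": 0.5, "value": 10},
    {"ts": 1.5, "value": 30},
    {"ts": 2.2, "value": 50},
    {"ts": 4.0, "value": 20},
]
alerts = per_event_triggers(readings, threshold=25)
averages = micro_batch_averages(readings, batch_seconds=2)
```

The trigger can fire as soon as an event arrives, while the average for a bucket is only known once the bucket closes, which is where the latency difference in the table comes from.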
Akka Streams
Kafka Streams
Reactive Streams with back pressure.
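Back pressure means the consumer controls how fast the producer may emit. A simplified, synchronous sketch of that idea follows: the source hands out at most as many items as were requested. The `Source` class and `consume` function are hypothetical, and this is not the Reactive Streams API itself, only the demand-signalling idea behind it.

```python
from itertools import islice


class Source:
    """Hypothetical source that emits only as many items as the consumer requests."""

    def __init__(self, items):
        self._items = iter(items)
        self.emitted = 0

    def request(self, n):
        """Return at most n further items; an empty list means the source is done."""
        batch = list(islice(self._items, n))
        self.emitted += len(batch)
        return batch


def consume(source, demand):
    """Pull items in chunks of `demand`, so the source can never run ahead."""
    received = []
    while True:
        batch = source.request(demand)
        if not batch:
            return received
        received.extend(batch)


source = Source(range(7))
first = source.request(3)      # the source emits 3 items and then waits
rest = consume(source, 3)      # the remaining items arrive in chunks of at most 3
```

Because nothing is emitted without demand, a slow consumer never needs an unbounded buffer between itself and the source.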
YARN
• Hadoop and related components.
• Job request comes in, YARN places the job.
MESOS
• Any application.
• Job request comes in, Mesos offers resources, job accepts or rejects.
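The two bullets describe different scheduling directions, which a toy model can make concrete. In the first function the scheduler picks the node; in the second the scheduler only offers free capacity and the job decides. Node names, CPU counts, and both function names are hypothetical, and neither function reflects the real YARN or Mesos APIs.

```python
def place_job(nodes, cpus_needed):
    """YARN-like: the scheduler chooses the first node with enough free CPUs."""
    for name, free in nodes.items():
        if free >= cpus_needed:
            nodes[name] = free - cpus_needed
            return name
    return None  # no node can host the job


def offer_resources(nodes, accept):
    """Mesos-like: offer each node's free CPUs; `accept` returns how many to take."""
    for name, free in nodes.items():
        if free <= 0:
            continue
        taken = accept(name, free)  # 0 means the offer is rejected
        if 0 < taken <= free:
            nodes[name] = free - taken
            return name
    return None  # every offer was rejected


cluster_a = {"node-a": 2, "node-b": 8}
placed_on = place_job(cluster_a, cpus_needed=4)

cluster_b = {"node-a": 2, "node-b": 8}
accepted_on = offer_resources(cluster_b, lambda name, free: 4 if free >= 4 else 0)
```

Both calls end up on the same node here, but the decision sits in different places: inside the scheduler in the first case, inside the job's own acceptance logic in the second.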
Upstream Sources
Downstream Applications
An architecture for converting large amounts of raw data into valuable information in real time.
Business Intelligence
Inspiration: Nathan Marz, Jay Kreps, Tyler Akidau, Martin Kleppmann, Dean Wampler