Django Tutorial
Greg Porpora
[Figure: diagram with the labels Query, Data, and Results]
Transportation
• Intelligent traffic management
Fraud prevention
• Detecting multi-party fraud
• Real time fraud prevention
Manufacturing
• Process control for microchip fabrication
Radio Astronomy
• Detection of transient events
© 2010 IBM Corporation
What kinds of analysis is Streams used for?
[Figure: analytics used with Streams. Labels include Mining in Microseconds, Acoustic, Statistics, and Predictive; each item is marked as IBM Research, included with Streams, or Open Source]
[Figure: traditional data warehousing flow, with the labels DATA SOURCES, DATA INTEGRATION, OPERATIONAL DATA STORES, WAREHOUSE, DATAMARTS, REPORTS, and Ad-hoc Queries]
Data in motion must be processed on the fly.
Situational awareness and rapid response demand rapid analysis, much earlier than
traditional data mining technologies can deliver,
examining much more data, from many more sources.
IBM InfoSphere Streams v2.0 Platform
Development Environment
• Streams Processing Language (SPL)
• Eclipse IDE
• Streams Instance Graph
• Streams Debugger
Runtime Environment
• RHEL v5.3 and above
• x86 multicore hardware
• InfiniBand support
• Clustered runtime for near-limitless capacity
• Web Admin Console
Toolkits, Adapters & Samples
• Standard Toolkit
• Internet Toolkit
• Database Toolkit
• Financial Toolkit
• Mining Toolkit
• Big Data Toolkit (New)
• Text Toolkit (New)
• User defined toolkits
• Over 50 samples
What is Stream Computing?
Continuous Ingestion
Continuous Analysis in Microseconds
torrents of data, complex analyses, timely insights
Ahead of the news cycle, observe and predict real-world events indicating risk or opportunity
[Figure: a trading application built up over several slides as a graph of operators. Labels recoverable from the diagram:
Sources: NYSE, SEC Edgar, Video News, Weather Data
Operators: VWAP Calculation; Dynamic P/E Ratio Calculation; 10 Q Earnings Extraction; Earnings Moving Average Calculation; Join P/E with Aggregate Earnings Impact; Caption Extraction; Speech Recognition; Topic Filtration; Earnings Related News Analysis; Hurricane Weather Data Extraction; Hurricane Forecast Model 1, Model 2, … Model N; Hurricane Risk Encoder; Hurricane Industry Impact; Correlate; Classify; Join
Output: Trade Decision
Annotations: very skinny stream (better as DB enrichment); medium-sized streams independently processed in parallel; streams that can be substituted with high volume, higher precision streams]
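The VWAP Calculation operator named in the diagram can be illustrated with a small sketch. This is plain Python rather than SPL, and it is not how Streams implements the operator; the function name `sliding_vwap`, the `(price, size)` tuple layout, and the count-based window are all assumptions made for illustration. It shows the general idea of stream computing: each arriving tuple updates a running result without storing the full history.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def sliding_vwap(trades: Iterable[Tuple[float, int]], window: int) -> Iterator[float]:
    """Yield the volume-weighted average price over the last `window` trades.

    `trades` is any iterable of (price, size) pairs; one value is emitted
    per incoming trade, so results are available as data arrives.
    """
    recent = deque()   # trades currently inside the window
    notional = 0.0     # running sum of price * size for the window
    volume = 0         # running sum of size for the window
    for price, size in trades:
        recent.append((price, size))
        notional += price * size
        volume += size
        if len(recent) > window:
            # Evict the oldest trade and subtract its contribution.
            old_price, old_size = recent.popleft()
            notional -= old_price * old_size
            volume -= old_size
        if volume > 0:
            yield notional / volume


# Hypothetical sample data: three trades, window of two trades.
trades = [(10.0, 100), (11.0, 300), (12.0, 100)]
vwaps = list(sliding_vwap(trades, window=2))
```

Because the function is a generator, it can consume an unbounded source (a socket reader, for example) just as well as the short list used here.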
[Figure: a Streams instance. Labels recoverable from the diagram: Streams compiler, Application source, Manager, Connections, Source, PE, Sink, x86 Node, Blade]
A quick peek inside …
InfoSphere Streams Instance
Management Services
• Streams Web Service (SWS)
• Streams Application Manager (SAM)
• Streams Resource Manager (SRM)
• Authorization and Authentication Service (AAS)
• Scheduler
• Recover DB
• Name Server
Shared File System
Application Host (shown three times in the diagram)
• Host Controller
• Processing Element Container
InfoSphere Streams - Summary
The focus of this lab is the TECHNOLOGY
Operators versus PEs
• Operators cannot be deployed directly to a processing node
  – To be deployed, an operator must be associated with a single deployable unit
    called a processing element (PE)
• A PE can contain a single operator
• Typically, a PE contains many operators
  – For higher performance on a single processing node, two or more
    operators, and the streams connecting them, can be fused into a single PE
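The relationship between operators and a PE can be sketched in a few lines. This is an analogy in plain Python, not SPL and not the Streams runtime: the names `fuse`, `filter_large`, and `add_notional` are hypothetical, and the dictionary-shaped tuples are an assumption. Each operator transforms a stream of tuples; fusing them yields one unit in which tuples pass between operators in-process, which is the intuition behind the performance benefit described above.

```python
from functools import reduce
from typing import Callable, Dict, Iterable, Iterator

Tuple_ = Dict[str, float]
Operator = Callable[[Iterator[Tuple_]], Iterator[Tuple_]]


def filter_large(tuples: Iterator[Tuple_]) -> Iterator[Tuple_]:
    """Operator 1: keep only trades of at least 100 units."""
    return (t for t in tuples if t["size"] >= 100)


def add_notional(tuples: Iterator[Tuple_]) -> Iterator[Tuple_]:
    """Operator 2: enrich each tuple with price * size."""
    for t in tuples:
        yield {**t, "notional": t["price"] * t["size"]}


def fuse(*operators: Operator) -> Callable[[Iterable[Tuple_]], Iterator[Tuple_]]:
    """Chain operators into one callable that stands in for a single PE."""
    def pe(tuples: Iterable[Tuple_]) -> Iterator[Tuple_]:
        # Feed the output stream of each operator into the next one.
        return reduce(lambda stream, op: op(stream), operators, iter(tuples))
    return pe


# One "PE" containing two operators and the stream that connects them.
pe = fuse(filter_large, add_notional)
out = list(pe([{"price": 2.0, "size": 50}, {"price": 3.0, "size": 200}]))
```

A PE holding a single operator corresponds to `fuse(filter_large)`; the unfused case, where operators run in separate PEs, would instead move tuples between processes or hosts.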
[Figure: operators and PEs on h/w node X within a Streams instance]
Streams Mining Toolkit
Use when there’s value in immediate awareness of anomalies
• Regression
  – Predicts the quantity or probability of an outcome
    • What is the likelihood of heart attack, given age, weight, …?
    • What is the expected profit a customer will generate?
    • What is the forecasted price of a stock?
  – Algorithms: Logistic, Linear, Polynomial, Transform
• Incremental Learning
  – Learns model incrementally, as data arrives
    • Is the data being received drifting from the model?
    • Should I use a model based on more recent events?
  – Algorithms: Incremental decision tree learner
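The question "is the data being received drifting from the model?" can be made concrete with a small sketch. It does not use the Streams Mining Toolkit and does not implement the incremental decision tree learner named above; it swaps in a much simpler technique, Welford's running mean and variance, purely to show a model that is updated one value at a time. The class name `RunningModel` and the 3-standard-deviation threshold are assumptions.

```python
import math


class RunningModel:
    """Running mean and variance (Welford's method), updated per value."""

    def __init__(self) -> None:
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one new value into the model without storing history."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        """Sample standard deviation, or 0.0 with fewer than two values."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_drifting(self, x: float, k: float = 3.0) -> bool:
        """Flag a value more than k standard deviations from the mean."""
        s = self.std()
        return self.n > 1 and s > 0 and abs(x - self.mean) > k * s


# Hypothetical readings: check each value against the model, then learn it.
model = RunningModel()
flags = []
for x in [10.0, 10.2, 9.8, 10.1, 9.9, 25.0]:
    flags.append(model.is_drifting(x))
    model.update(x)
```

Only the last reading is flagged. Checking before updating keeps an outlier from masking itself, and a model "based on more recent events" could be approximated by resetting or windowing the running statistics.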
TerraEchos - Smarter Surveillance & Covert Intrusion Detection
• State-of-the-art covert surveillance based on InfoSphere Streams