Django Tutorial

The document provides an overview of InfoSphere Streams, which is a stream computing platform from IBM. It describes how stream computing processes data in motion in real-time, unlike traditional batch processing. The document also outlines different industries and use cases where stream computing can be applied, such as transportation, manufacturing, health, and more. It then discusses the types of analysis that can be performed using stream computing and how it allows for faster real-time analysis of big data compared to traditional approaches.

InfoSphere Streams Overview

Greg Porpora

January 20, 2012 © 2010 IBM Corporation


Traditional computing versus stream computing

Traditional Computing: historical fact finding with data-at-rest
• Batch paradigm
– Pull model
• Query-driven:
– Submits a query to static data
• Relies on databases/data warehouses
• Difficulty processing large volumes of streaming data

Stream Computing: real-time analysis of data-in-motion
Streaming data
• Streams of structured and/or unstructured data-in-motion
Stream Computing
– Analytic operations on streaming data in real time. Most appropriate where large volumes of data need to be processed in very short time intervals.

[Diagram: Query, Data and Results shown for each of the two models]
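
As a rough illustration of the contrast above, the following Python sketch sets a query pulled against stored data beside a standing query applied to records as they arrive. It is not InfoSphere Streams code; `batch_query`, `standing_query`, `sensor_feed` and the temperature threshold are hypothetical names and values chosen for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, float]


def batch_query(stored: List[Record], predicate: Callable[[Record], bool]) -> List[Record]:
    """Pull model: the data is at rest and a query is submitted against it."""
    return [r for r in stored if predicate(r)]


def standing_query(stream: Iterable[Record], predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """Stream model: the query stays in place and the data flows through it."""
    for record in stream:
        if predicate(record):
            yield record  # emitted as soon as the matching record arrives


def sensor_feed() -> Iterator[Record]:
    """Hypothetical source standing in for an unbounded feed."""
    for seq, temp in enumerate([20.5, 21.0, 38.2, 22.1, 40.0]):
        yield {"seq": seq, "temp": temp}


def is_hot(record: Record) -> bool:
    return record["temp"] > 37.0


# Pull: store everything first, then ask the question.
warehouse = list(sensor_feed())
batch_results = batch_query(warehouse, is_hot)

# Push: answers appear while the data is still arriving.
stream_results = list(standing_query(sensor_feed(), is_hot))
```

Both paths select the same records here; the difference is when the answer becomes available and whether the full data set has to be stored first.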


Something meaningful is happening…

Stock market
• Impact of weather on securities prices
• Analyze market data at ultra-low latencies

Natural Systems
• Seismic monitoring
• Wildfire management
• Water management

Law Enforcement
• Real-time multimodal surveillance (e.g., monitoring cameras to detect faces)

Transportation
• Intelligent traffic management

Fraud prevention
• Detecting multi-party fraud
• Real time fraud prevention

Manufacturing
• Process control for microchip fabrication

Radio Astronomy
• Detection of transient events

Health and Life Sciences
• Neonatal ICU monitoring (e.g., detection of systemic infection)
• Epidemic early warning system
• Remote healthcare monitoring

Telecom
• Processing of Call Detail records
• Real-time services, billing, advertising
• Business intelligence
• Churn Analysis, Fraud Detection


Where Does Streams Fit?

Any environment which requires real-time delivery

In-Motion Analytics on BIG Data
• Volume: terabytes per second, petabytes per day
• Variety: all kinds of data, all kinds of analytics; traditional / non-traditional data sources
• Velocity: insights in microseconds, where results are required in less than seconds, not hours

Powerful Analytics
• Millions of events per second
• Microsecond latency

[Diagram labels: ICU Monitoring, Environment Monitoring, Algo Trading, Telco churn predict, Cyber Security, Smart Grid, Government / Law enforcement]

What kinds of analysis is Streams used for?

• Mining in Microseconds
• Acoustic
• Text: Simple & Advanced Text, e.g. (listen, verb), (radio, noun)
• Advanced Mathematical Models
• Statistics
• Predictive
• Image & Video
• GeoSpatial

[Slide graphic: each category carries one or more origin tags: included with Streams, IBM Research, Open Source, Open Source UIMA. One panel shows a formula of the form $\sum_{\text{population}} R(s_t, a_t)$.]


Traditional Data Mining is an involved process

• Data must be ingested into offline stores, selecting subset to reduce footprint…
• It takes time to store, mine, analyze, inform...
• Many informed decisions simply cannot wait...
• So much data is skipped, dropped, ignored...

Stream computing lets you
• …observe a broader swath of data
• …analyze the data on the fly
• …fuse, analyze much more data, much sooner
• …use new classes of analytics

[Diagram: Elapsed Time to Action across DATA SOURCES, DATA INTEGRATION into OPERATIONAL DATA STORES, WAREHOUSE, DATAMARTS, Ad-hoc Queries, REPORTS, and Analytical Modeling & Information (Operational Dashboards, Planning, Reports, Bus. Process & Event Mgmt, Scorecarding)]

Data in motion must be processed on the fly
Situational awareness and rapid response demand rapid analysis, much earlier than traditional data mining technologies can deliver, examining much more data, from many more sources

IBM InfoSphere Streams v2.0 Platform

Development Environment
• Streams Processing Language (SPL)
• Eclipse IDE
• Streams Instance Graph
• Streams Debugger

Runtime Environment
• RHEL v5.3 and above
• x86 multicore hardware
• InfiniBand support
• Clustered runtime for near-limitless capacity
• Web Admin Console

Toolkits, Adapters & Samples
• Standard Toolkit
• Internet Toolkit
• Database Toolkit
• Financial Toolkit
• Mining Toolkit
• Big Data Toolkit (New)
• Text Toolkit (New)
• User defined toolkits
• Over 50 samples

[Image label: Front Office 3.0]

What is Stream Computing?
Continuous Ingestion
Continuous Analysis in Microseconds


How Streams Works

• Continuous ingestion
• Continuous analysis

Infrastructure provides services for
• Scheduling analytics across hardware nodes
• Establishing streaming connectivity

[Diagram: operators labeled Filter / Sample, Transform, Annotate, Correlate, Classify]

Achieve scale:
• By enabling partitioning of applications into software components
• By distributing across stream-connected hardware nodes

Where appropriate:
• Elements can be fused together for lower communication latencies
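
The operator kinds named on this slide can be pictured as small stages chained over a stream of tuples. The Python sketch below uses generators for that; it is only an analogy, not SPL and not the Streams runtime. The stage names, the tuple fields (`value`, `value_f`, `site`, `label`) and the thresholds are all assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator

Tuple_ = Dict[str, object]


def source(readings: Iterable[float]) -> Iterator[Tuple_]:
    """Continuous ingestion: wrap each raw reading in a tuple."""
    for seq, value in enumerate(readings):
        yield {"seq": seq, "value": value}


def filter_op(tuples: Iterable[Tuple_], low: float, high: float) -> Iterator[Tuple_]:
    """Filter: drop readings outside a plausible range."""
    for t in tuples:
        if low <= t["value"] <= high:
            yield t


def transform_op(tuples: Iterable[Tuple_]) -> Iterator[Tuple_]:
    """Transform: derive Fahrenheit from a Celsius reading."""
    for t in tuples:
        yield {**t, "value_f": t["value"] * 9 / 5 + 32}


def annotate_op(tuples: Iterable[Tuple_], site: str) -> Iterator[Tuple_]:
    """Annotate: attach context that the raw data does not carry."""
    for t in tuples:
        yield {**t, "site": site}


def classify_op(tuples: Iterable[Tuple_], threshold_f: float) -> Iterator[Tuple_]:
    """Classify: label each tuple against a fixed threshold."""
    for t in tuples:
        label = "alert" if t["value_f"] >= threshold_f else "normal"
        yield {**t, "label": label}


# Chain the stages; each tuple flows through all of them before the next is read.
stage = source([36.6, -999.0, 39.5])
stage = filter_op(stage, 30.0, 45.0)
stage = transform_op(stage)
stage = annotate_op(stage, "ward-1")
stage = classify_op(stage, 100.0)
results = list(stage)
```

Chaining generators in one process loosely resembles fusing elements together; splitting stages across processes or machines would correspond to distributing across hardware nodes, which this sketch does not attempt.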


Notional example – trading enriched by stream processing

[Diagram]
• Sources: NYSE (fat stream); SEC Edgar (skinny stream, better as DB enrichment)
• Analyses: VWAP Calculation; 10 Q Earnings Extraction; (very) Dynamic P/E Ratio Calculation
• Result: Trade Decision
• Callout: Calculate P/E Ratio as prices change
• Banner: torrents of data, complex analyses, timely insights
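
The two calculations named in the diagram are simple enough to sketch. The Python below keeps a running volume-weighted average price (VWAP, the sum of price times volume divided by total volume) and recomputes a price-to-earnings ratio on every trade. Feeding the VWAP rather than the last trade price into the P/E ratio is an assumption of this sketch, as are the class and function names and the sample figures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunningVWAP:
    """Volume-weighted average price, updated one trade at a time."""
    notional: float = 0.0  # running sum of price * volume
    volume: float = 0.0    # running sum of volume

    def update(self, price: float, volume: float) -> float:
        if volume <= 0:
            raise ValueError("trade volume must be positive")
        self.notional += price * volume
        self.volume += volume
        return self.notional / self.volume


def pe_ratio(price: float, eps: float) -> float:
    """Price-to-earnings ratio; eps is earnings per share from the latest filing."""
    if eps <= 0:
        raise ValueError("P/E is not meaningful for non-positive earnings")
    return price / eps


# Hypothetical trades as (price, volume) and a hypothetical earnings-per-share value.
trades = [(100.0, 200), (102.0, 100), (101.0, 700)]
eps = 5.0

vwap = RunningVWAP()
pe_series: List[float] = []
for price, volume in trades:
    current_vwap = vwap.update(price, volume)
    pe_series.append(pe_ratio(current_vwap, eps))  # recomputed as prices change
```

The earnings figure changes rarely (the "skinny" stream) while prices change constantly (the "fat" stream), which is why the ratio is recomputed on the price side.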


Notional example – trading enriched by stream processing

[Diagram]
• Sources: NYSE (fat stream); SEC Edgar (skinny stream, better as DB enrichment); Video News (medium-sized streams independently processed in parallel)
• Analyses: VWAP Calculation; 10 Q Earnings Extraction; (very) Dynamic P/E Ratio Calculation; Earnings Moving Average Calculation; Speech Recognition; Caption Extraction; Topic Filtration; Earnings Related News Analysis; Join
• Result: Trade Decision
• Callouts: Enrich basic analyses by examining relevant news stories. Ahead of the news cycle, observe and predict real-world events indicating risk or opportunity.
• Banner: torrents of data, complex analyses, timely insights
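
The "Earnings Moving Average Calculation" box suggests a windowed aggregate over a stream. A minimal sketch of a fixed-size sliding-window average in Python follows; the window length and the sample values are assumptions, and the slide does not say which kind of moving average the original application used.

```python
from collections import deque


class MovingAverage:
    """Mean of the most recent `window` values, updated per arrival."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self._values = deque(maxlen=window)
        self._total = 0.0

    def update(self, value: float) -> float:
        # When the window is full, the oldest value is about to be evicted.
        if len(self._values) == self._values.maxlen:
            self._total -= self._values[0]
        self._values.append(value)
        self._total += value
        return self._total / len(self._values)


# Hypothetical per-quarter earnings values arriving one at a time.
ma = MovingAverage(window=3)
averages = [ma.update(v) for v in [1.0, 2.0, 3.0, 4.0, 5.0]]
```

Keeping a running total means each update costs constant time regardless of window size, which matters when the same calculation runs over many symbols at once.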


Notional example – trading enriched by stream processing

[Diagram]
• Sources: NYSE (fat stream); SEC Edgar (skinny stream, better as DB enrichment); Video News (medium-sized streams independently processed in parallel); Weather Data
• Analyses: VWAP Calculation; 10 Q Earnings Extraction; (very) Dynamic P/E Ratio Calculation; Earnings Moving Average Calculation; Speech Recognition; Caption Extraction; Topic Filtration; Earnings Related News Analysis; Weather Data Extraction; Hurricane Forecast Model 1, Model 2, …, Model N; Hurricane Risk Encoder; Hurricane Industry Impact; Hurricane Impact; Join P/E with Aggregate Impact; Join
• Result: Trade Decision
• Callouts: Ahead of the news cycle, observe and predict real-world events indicating risk or opportunity. Streams that can be substituted with high volume, higher precision streams.
• Banner: torrents of data, complex analyses, timely insights


Notional example – trading enriched by stream processing

[Diagram]
• Sources: NYSE; SEC Edgar; Video News; Weather Data
• Analyses: VWAP Calculation; 10 Q Earnings Extraction; Dynamic P/E Ratio Calculation; Earnings Moving Average Calculation; Speech Recognition; Caption Extraction; Topic Filtration; Earnings Related News Analysis; Weather Data Extraction; Hurricane Forecast Model 1, Model 2, …, Model N; Hurricane Risk Encoder; Hurricane Industry Impact; Hurricane Impact; Join P/E with Aggregate Impact; Join
• Result: Trade Decision
• Callouts: Scale-out for dozens, hundreds of sources. Resource management, scheduling infrastructure are strong enablers. Multiple means of analysis. Parallel competing analyses.
• Banner: torrents of data, complex analyses, timely insights
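
"Parallel competing analyses" can be pictured as several models receiving the same observations at once, with their outputs combined downstream. The Python sketch below runs three toy forecasters concurrently and averages them. The models, their names, the wind-speed figures and the averaging step are all hypothetical; the slide does not describe how the hurricane forecast models work or how their results are joined.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, Dict, List


def model_persistence(obs: List[float]) -> float:
    """Toy model: tomorrow looks like the latest observation."""
    return obs[-1]


def model_mean(obs: List[float]) -> float:
    """Toy model: tomorrow looks like the average so far."""
    return mean(obs)


def model_trend(obs: List[float]) -> float:
    """Toy model: extend the most recent change by one step."""
    return obs[-1] + (obs[-1] - obs[-2])


MODELS: Dict[str, Callable[[List[float]], float]] = {
    "persistence": model_persistence,
    "mean": model_mean,
    "trend": model_trend,
}


def run_competing(obs: List[float]) -> Dict[str, float]:
    """Submit the same observations to every model and collect each forecast."""
    with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
        futures = {name: pool.submit(fn, obs) for name, fn in MODELS.items()}
        return {name: future.result() for name, future in futures.items()}


forecasts = run_competing([80.0, 90.0, 100.0])
consensus = mean(list(forecasts.values()))
```

Threads are used only to keep the example short; in a stream platform the competing analyses would more likely be separate components that the scheduler places on different nodes.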


From Essential Elements to Deployed, Running Jobs

• Streams application graph:
– A directed, possibly cyclic, dataflow graph
– Contains a collection of sources, operators, & sinks
– Connected by streams
• Each complete application is a potentially deployable job
• Jobs are deployed to a Streams runtime environment, known as a Streams Instance (or simply, an instance)
• An instance can include a single processing node (hardware)
• Or multiple processing nodes

[Diagram: a Streams instance containing eight h/w nodes]
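
The application graph described above (sources, operators and sinks connected by streams, with cycles allowed) can be modeled with a small adjacency structure. This Python sketch is a generic graph, not the representation Streams uses internally; the class name `AppGraph`, its methods and the node names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


class AppGraph:
    """Directed dataflow graph; cycles (feedback streams) are allowed."""

    KINDS = ("source", "operator", "sink")

    def __init__(self) -> None:
        self.kind: Dict[str, str] = {}                      # node name -> kind
        self.streams: Dict[str, List[str]] = defaultdict(list)  # producer -> consumers

    def add(self, name: str, kind: str) -> None:
        if kind not in self.KINDS:
            raise ValueError(f"unknown kind: {kind}")
        self.kind[name] = kind

    def connect(self, producer: str, consumer: str) -> None:
        """Add a stream from producer to consumer."""
        if self.kind[producer] == "sink":
            raise ValueError("a sink does not produce a stream")
        if self.kind[consumer] == "source":
            raise ValueError("a source does not consume a stream")
        self.streams[producer].append(consumer)

    def reachable_sinks(self, start: str) -> Set[str]:
        """Sinks reachable from start; the visited set keeps cycles from looping."""
        seen: Set[str] = set()
        sinks: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if self.kind[node] == "sink":
                sinks.add(node)
            stack.extend(self.streams[node])
        return sinks


graph = AppGraph()
graph.add("ticks", "source")
graph.add("filter", "operator")
graph.add("score", "operator")
graph.add("alerts", "sink")
graph.connect("ticks", "filter")
graph.connect("filter", "score")
graph.connect("score", "filter")   # feedback stream, making the graph cyclic
graph.connect("score", "alerts")
sinks = graph.reachable_sinks("ticks")
```

Because the graph may be cyclic, any traversal has to track visited nodes, as `reachable_sinks` does.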


InfoSphere Streams Runtime

[Diagram: application source passes through the Streams compiler and is deployed as PEs linked by Streams Connections between Sources and Sinks. The PEs run in Processing Element Containers on five x86 Node Blades, joined by the Streams Data Fabric over a TCP-IP / Ethernet physical network. Other labels: Manager.]

A quick peek inside …

InfoSphere Streams Instance

Management Services
• Streams Web Service (SWS)
• Streams Application Manager (SAM)
• Streams Resource Manager (SRM)
• Authorization and Authentication Service (AAS)
• Scheduler
• Recover DB
• Name Server

Shared File System

Application Host (three shown), each with
• Host Controller
• Processing Element Container

InfoSphere Streams - Summary

• InfoSphere Streams capabilities and performance allow…
– Very complex analytics… on
– Incredible volumes and variety of streaming data… with
– Sub-millisecond latency and response time… while
– Data is still in motion… to
– Provide customers with a very flexible yet extremely powerful solution to remain highly competitive and productive

• InfoSphere Streams technology provides…
– Scalable architecture. Architected for 100+ nodes, yet runs on a single node.
– Dynamic Job Handling. Jobs can be added and removed from the runtime engine without requiring a restart.
– Dynamic Connectivity. Jobs can be run and connect to existing streaming applications without requiring applications to be restarted.
– Data Flexibility. Handles structured and unstructured as well as binary data formats.

The focus of this lab is the TECHNOLOGY

Operators versus PEs

• Operators cannot be deployed directly to a processing node
– For an operator to be deployed it must be associated with a single deployable unit called a processing element (aka, a PE)
• A PE can contain a single operator
• Typically, a PE contains many operators
– For higher performance on a single processing node, two or more operators – and the streams connecting them – can be fused into a single PE
• One or more PEs can be deployed to a single processing node
• But a PE cannot be deployed across multiple processing nodes
• Performance and flexibility are considerations in determining where to fuse
– Operators can be fused manually or automatically (based on resource profiling)

[Diagram: h/w node X within a Streams instance]

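
The placement rules on this slide can be written down as a few invariants: a PE holds one or more fused operators, a node may host several PEs, and a PE lives on exactly one node. The Python sketch below encodes only those rules. It does not reflect how Streams actually schedules PEs, and the names `PE`, `Instance` and `deploy` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PE:
    """A processing element: the deployable unit holding fused operators."""
    name: str
    operators: List[str]

    def __post_init__(self) -> None:
        if not self.operators:
            raise ValueError("a PE must contain at least one operator")


class Instance:
    """A set of processing nodes onto which PEs are placed."""

    def __init__(self, nodes: List[str]) -> None:
        self.nodes: Dict[str, List[PE]] = {n: [] for n in nodes}
        self._placement: Dict[str, str] = {}  # PE name -> node name

    def deploy(self, pe: PE, node: str) -> None:
        if node not in self.nodes:
            raise KeyError(f"unknown node: {node}")
        if pe.name in self._placement:
            # A PE cannot be deployed across multiple processing nodes.
            raise ValueError(f"{pe.name} is already on {self._placement[pe.name]}")
        self.nodes[node].append(pe)  # a node may host several PEs
        self._placement[pe.name] = node


instance = Instance(["node-1", "node-2"])
ingest = PE("pe-ingest", ["Source", "Filter"])       # two operators fused
scoring = PE("pe-score", ["Classify"])               # a single operator
instance.deploy(ingest, "node-1")
instance.deploy(scoring, "node-1")

try:
    instance.deploy(ingest, "node-2")
    second_placement_rejected = False
except ValueError:
    second_placement_rejected = True
```

Operators are never placed directly; they reach a node only by way of the PE that contains them, which is why `deploy` accepts a `PE` and nothing smaller.
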
Streams Mining Toolkit

Use when there’s value in immediate awareness of anomalies

Supports Predictive Model Markup Language (PMML)
• PMML: Supported by many vendors, e.g. SAS Enterprise Miner, SPSS, R/Rattle, Weka, InfoSphere Warehouse
• Integrates mining algorithms from InfoSphere Warehouse

Operator Name       Algorithm                 Supported PMML
(Algorithm Type)                              Versions
Classification      Decision Tree             2.0 - 3.0
                    Logistic Regression       2.0 - 3.2
                    Naïve Bayes               2.0 - 3.2
Regression          Linear Regression         2.0 - 3.0
                    Polynomial Regression     2.0 - 3.0
                    Transform Regression      2.0 - 3.0
Clustering          Demographic Clustering    2.0 - 3.0
                    Kohonen Clustering        2.0 - 3.0
Associations        Association Rules         2.0 - 3.2
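
The version ranges in the table lend themselves to a quick lookup before a model is loaded. The Python sketch below copies the ranges from the table and checks a PMML version against them. It is a convenience illustration only, not part of the Mining Toolkit; `SUPPORTED_PMML` and `is_supported` are hypothetical names.

```python
from typing import Dict, Tuple

# Ranges copied from the table above: algorithm -> (lowest, highest) PMML version.
SUPPORTED_PMML: Dict[str, Tuple[str, str]] = {
    "Decision Tree": ("2.0", "3.0"),
    "Logistic Regression": ("2.0", "3.2"),
    "Naïve Bayes": ("2.0", "3.2"),
    "Linear Regression": ("2.0", "3.0"),
    "Polynomial Regression": ("2.0", "3.0"),
    "Transform Regression": ("2.0", "3.0"),
    "Demographic Clustering": ("2.0", "3.0"),
    "Kohonen Clustering": ("2.0", "3.0"),
    "Association Rules": ("2.0", "3.2"),
}


def _parse(version: str) -> Tuple[int, ...]:
    """Turn '3.1' into (3, 1) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def is_supported(algorithm: str, pmml_version: str) -> bool:
    """True if the version falls inside the range listed for the algorithm."""
    low, high = SUPPORTED_PMML[algorithm]
    return _parse(low) <= _parse(pmml_version) <= _parse(high)
```

For example, `is_supported("Naïve Bayes", "3.2")` is true while `is_supported("Decision Tree", "3.1")` is false, matching the ranges in the table.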


Streams Mining Toolkit

• Classification
– Predicts whether a record belongs to a certain class
• Which type of vehicle part is most likely to fail?
• Is this employee likely to leave?
– Algorithms: Decision Trees, Naïve Bayes

• Regression
– Predicts the quantity or probability of an outcome
• What is the likelihood of heart attack, given age, weight, …?
• What is the expected profit a customer will generate?
• What is the forecasted price of a stock?
– Algorithms: Logistic, Linear, Polynomial, Transform


Streams Mining Toolkit

• Clustering
– Identifies groups with common characteristics, or properties of similar groups
• What are behavior-based properties of various types of servers (e.g., database, application, …)?
• Which healthcare providers may be submitting fraudulent claims?
– Algorithms: Demographic, Kohonen

• Incremental Learning
– Learns model incrementally, as data arrives
• Is the data being received drifting from the model?
• Should I use a model based on more recent events?
– Algorithms: Incremental decision tree learner
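
To make "learns model incrementally, as data arrives" concrete, the sketch below keeps a running mean and standard deviation with Welford's online algorithm and flags a new value as possible drift when it lies far from what has been seen. This is a deliberately simple stand-in and not the toolkit's incremental decision tree learner; the class name, the three-sigma default and the sample data are assumptions.

```python
import math


class RunningStats:
    """Mean and standard deviation updated one value at a time (Welford)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def stddev(self) -> float:
        """Sample standard deviation; zero until two values have been seen."""
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_drifting(self, x: float, k: float = 3.0) -> bool:
        """True if x is more than k standard deviations from the running mean."""
        if self.n < 2 or self.stddev == 0.0:
            return False
        return abs(x - self.mean) > k * self.stddev


stats = RunningStats()
for value in [10.0, 11.0, 9.0, 10.0, 10.0]:
    stats.update(value)  # the "model" is refined without storing the history

typical = stats.is_drifting(10.5)
unusual = stats.is_drifting(15.0)
```

The model here is just two numbers, but the pattern is the same one the slide describes: update on each arrival, and compare incoming data with the current model to decide whether it still fits.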


‘Smart’ applications are in use today

• Neonatal Care
• Trading Advantage
• Environment
• Law Enforcement
• Radio Astronomy
• Telecom
• Manufacturing
• Traffic Control
• Fraud Prevention

TerraEchos - Smarter Surveillance & Covert Intrusion Detection

• State-of-the-art covert surveillance based on InfoSphere Streams
• Acoustic signals from buried fiber-optic cables are monitored, analyzed and reported in real time to locate intruders
• Transforming surveillance & intelligence systems that save both money and lives


Key Resources – InfoSphere Streams

• Greg Porpora, Federal SW InfoSphere Streams Sales Leader
• Mike Moody, Federal SW InfoSphere Streams Technical Lead

IBM InfoSphere Streams
http://www-01.ibm.com/software/data/infosphere/streams/

InfoSphere Streams Information Center home site
http://publib.boulder.ibm.com/infocenter/streams/v2r0/index.jsp

InfoSphere Streams Forum
http://www.ibm.com/developerworks/forums/forum.jspa?forumID=1664&start=0

Streams Business Community
https://www.ibm.com/developerworks/mydeveloperworks/groups/service/html/communityview?communityUuid=0bacd3f7-068f-441e-af3f-5c30bd0fdbe6

DeveloperWorks Reference Materials site
http://www.ibm.com/developerworks/wikis/display/streams/Reference%20Materials
