Module 04 Ba

Uploaded by

ashokmlply

MODULE-04

BIG DATA ANALYTICS

Prof. PUNYASHREE K H

MBA, SAMS
Introduction to Big Data Analytics
• Analytics is the systematic computational analysis
of data or statistics.
• Data analytics is the process of collecting,
transforming, and organizing data in order to
draw conclusions, make predictions, and drive
informed decision making.
• Data analytics is a multidisciplinary field that
employs a wide range of analysis techniques,
including math, statistics, and computer science, to
draw insights from data sets.
How is data analytics used? Data analytics examples

• Data is everywhere, and people use data every day, whether they
realize it or not. Daily tasks such as measuring coffee beans to make
your morning cup, checking the weather report before deciding what
to wear, or tracking your steps throughout the day with a fitness
tracker can all be forms of analyzing and using data.
• Data is also crucial in a professional sense. Organizations that use
data to drive business strategies often find that they are more
confident, proactive, and financially savvy. As a result, data analytics
is important across many industries.
• A sneaker manufacturer might look at sales data to determine which
designs to continue and which to retire, or a health care administrator
may look at inventory data to determine the medical supplies they
should order.
Data analytics: Key concepts
• There are four key types of data analytics: descriptive,
diagnostic, predictive, and prescriptive. Together, these four
types of data analytics can help an organization make data-
driven decisions. At a glance, each of them tells us the
following:
• Descriptive analytics tell us what happened.
• Diagnostic analytics tell us why something happened.
• Predictive analytics tell us what will likely happen in the
future.
• Prescriptive analytics tell us how to act.
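The four types can be illustrated on a tiny data set. The sketch below is only an illustration: the monthly figures are made up, and the helper names (`mean`, `correlation`, `fit_line`) are hypothetical, not part of any particular analytics product.

```python
# Toy monthly data (made-up numbers, for illustration only).
ad_spend = [10, 12, 15, 18, 20, 24]          # advertising spend, in thousands
sales = [100, 110, 130, 150, 160, 190]       # units sold

def mean(values):
    return sum(values) / len(values)

def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Descriptive: what happened? Summarise past sales.
average_sales = mean(sales)

# Diagnostic: why did it happen? Check how closely sales track ad spend.
# (A strong correlation suggests a link but does not prove a cause.)
r = correlation(ad_spend, sales)

# Predictive: what will likely happen? Forecast sales at a spend of 30.
slope, intercept = fit_line(ad_spend, sales)
forecast = slope * 30 + intercept

# Prescriptive: how to act? Spend needed to reach a target of 200 units.
target = 200
recommended_spend = (target - intercept) / slope

print(f"average={average_sales:.1f} r={r:.3f} "
      f"forecast={forecast:.1f} spend_for_target={recommended_spend:.1f}")
```

Real projects would use far more data and dedicated tools, but the order of the questions (what, why, what next, what to do) stays the same.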
Big Data Analytics
• Big data comes from many sources, including transaction
processing systems, customer databases, documents, emails,
medical records, internet clickstream logs, mobile apps and
social networks.
• Big data analytics is the often complex process of
examining big data to uncover information -- such as
hidden patterns, correlations, market trends and
customer preferences -- that can help organizations
make informed business decisions.
• On a broad scale, data analytics technologies and techniques
give organizations a way to analyze data sets and gather
new information. Business intelligence (BI) queries answer
basic questions about business operations and performance.
• Big data analytics is a form of advanced analytics, which involves
complex applications with elements such as predictive models,
statistical algorithms and what-if analysis powered by analytics
systems.
• An example of big data analytics can be found in the healthcare
industry, where millions of patient records, medical claims, clinical
results, care management records and other data must be collected,
aggregated, processed and analyzed. Big data analytics is used for
accounting, decision-making, predictive analytics and many other
purposes. This data varies greatly in type, quality and accessibility,
presenting significant challenges but also offering tremendous
benefits.
Why is big data analytics important?

Organizations can use big data analytics systems and software to make
data-driven decisions that can improve their business-related outcomes.
The benefits can include more effective marketing, new revenue
opportunities, customer personalization and improved operational
efficiency. With an effective strategy, these benefits can provide
an advantage over competitors.
History and growth of big data analytics
• The term big data was first used to refer to increasing data volumes
in the mid-1990s. In 2001, Doug Laney, then an analyst at
consultancy Meta Group Inc., expanded the definition of big data.
This expansion described the increase of the following:
1. Volume of data being stored and used by organizations.
2. Variety of data being generated by organizations.
3. Velocity, or speed, at which that data was being created and
updated.
• Those three factors became known as the 3V's of big data.
• Gartner popularized this concept in 2005 after acquiring Meta Group
and hiring Laney.
• Over time, the 3V's became the 5V's by adding value and veracity and
sometimes a sixth V for variability.
• Another significant development in the history of big data was the
launch of the Hadoop distributed processing framework. The Hadoop
framework of software tools is widely used for managing big data.
• By 2011, big data analytics began to take a firm hold in organizations
and the public eye, along with Hadoop and various related big data
technologies.
• Big data applications were primarily used by large internet and e-
commerce companies such as Yahoo, Google and Facebook, as well
as analytics and marketing services providers.
• More recently, users have embraced big data analytics as a key
technology driving digital transformation.
• Users include retailers, financial services firms, insurers, healthcare
organizations, manufacturers, energy companies and other enterprises.
• High-quality decision-making using data analysis can help contribute
to a high-performance organization.
Characteristics of Big Data
• Big Data is often described by nine “V” characteristics:
1. Veracity
2. Variety
3. Velocity
4. Volume
5. Validity
6. Variability
7. Volatility
8. Visualization
9. Value
Big Data was originally defined by the “3Vs”, but it is now commonly
described by the “6Vs” below, which are also termed the
characteristics of Big Data:
1. Volume:
• The name ‘Big Data’ itself refers to data of enormous size.
• Volume is the amount of data being generated and stored.
• Size plays a crucial role in determining the value of data. Whether
a particular data set can be considered Big Data depends largely on
its volume.
• Hence, ‘Volume’ is a characteristic that must be considered when
dealing with Big Data.
• Example: Indian mobile data traffic reached 28.9 gigabytes per
month per device in 2023, and it was estimated to increase to over
67.8 gigabytes by 2029 (due to 4G and 5G networks).
2. Velocity:
• Velocity refers to the high speed at which data accumulates.
• In Big Data, data flows in from sources such as machines, networks,
social media and mobile phones.
• There is a massive and continuous flow of data. Velocity determines
the potential of data: how fast it is generated and processed to
meet demand.
• Sampling the data can help in dealing with issues such as velocity.
• Example: More than 3.5 billion searches are made on Google per day.
Also, the number of Facebook users is increasing by approximately 22%
year on year.
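Sampling a fast stream can be sketched with reservoir sampling, which keeps a fixed-size uniform random sample without storing the whole stream. This is one common technique chosen here for illustration; the slides do not name a specific sampling method, and the function name and event labels are hypothetical.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Return k items chosen uniformly at random from a stream of unknown length.

    Only k items are held in memory at any time (Algorithm R).
    """
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample

# A generator stands in for a high-velocity event stream.
events = (f"event-{n}" for n in range(100_000))
sample = reservoir_sample(events, k=5, seed=42)
print(sample)
```

Because the stream is consumed one record at a time, the same approach works whether the source produces a thousand or a billion events.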
3. Variety:
• It refers to the nature of data: structured, semi-structured and
unstructured.
• It also refers to heterogeneous sources.
• Variety is basically the arrival of data from new sources, both
inside and outside an enterprise. The data can be structured,
semi-structured or unstructured.
– Structured data: This is organized data. It generally refers to
data with a defined length and format.
– Semi-structured data: This is semi-organised data. It generally
refers to data that does not conform to a formal data structure.
Log files are an example of this type of data.
– Unstructured data: This is unorganized data. It generally refers to
data that doesn’t fit neatly into the traditional row and column
structure of a relational database. Texts, pictures and videos are
examples of unstructured data, which can’t be stored in the form of
rows and columns.
4. Veracity:
• It refers to inconsistencies and uncertainty in data: the available
data can sometimes be messy, and its quality and accuracy are
difficult to control.
• Big Data is also variable because of the multitude of data dimensions
resulting from multiple disparate data types and sources.
• Example: Data in bulk can create confusion, whereas too little data
can convey incomplete information.
5. Value:
• After taking the four V’s above into account, there is one more V,
which stands for Value. Bulk data with no value is of no use to the
company unless it is turned into something useful.
• Data in itself is of little use; it must be converted into something
valuable by extracting information from it. Hence, Value can be
regarded as the most important of the 6V’s.

6. Variability:
• How fast, and to what extent, is the structure of your data
changing?
• How often does the meaning or shape of your data change?
• Example: It is like eating the same ice cream every day and finding
that the taste keeps changing.
Structure of Big Data
Structured Data
• Structured data follows a predefined structure and generally comes in
tabular form (rows and columns). As it adheres to the conventional
model, it can be used directly in formulas and algorithms.
• Structured data can be loosely defined as data that resides in a
fixed field within a record.
• It is the type of data most familiar in our everyday lives, for
example a birthday or an address.
• Generally, SQL (Structured Query Language) is used to manage and
update a structured database. There is a misconception that big data
contains only structured data, which is not right. It is the most
traditional data structure but has some limits.
• Data that does not fit into tabular form cannot be processed with
structured data formulas. So, structured data is an integral part of
big data, but there are other valuable forms too.
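As a small sketch of managing structured data with SQL, the example below uses Python's built-in `sqlite3` module with an in-memory database. The `customers` table, its columns and the sample rows are made up for illustration.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# Structured data: every record has the same fixed fields.
conn.execute(
    """
    CREATE TABLE customers (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        birthday TEXT,
        address  TEXT
    )
    """
)

conn.executemany(
    "INSERT INTO customers (name, birthday, address) VALUES (?, ?, ?)",
    [
        ("Asha", "1995-04-12", "3 Park Lane"),
        ("Ravi", "1990-11-30", "7 Hill Street"),
    ],
)

# SQL is used both to update the data...
conn.execute(
    "UPDATE customers SET address = ? WHERE name = ?",
    ("12 Lake Road", "Asha"),
)

# ...and to query it.
rows = conn.execute(
    "SELECT name, address FROM customers ORDER BY name"
).fetchall()
print(rows)

conn.close()
```

A photo or a free-text review has no natural place in such a table, which is why unstructured data needs different handling.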
Unstructured Data
• Unstructured data is the kind of data that doesn’t adhere to
any definite schema or set of rules. Its arrangement is
unplanned and haphazard.
• Photos, videos, text documents, and log files can be
generally considered unstructured data. Even though the
metadata accompanying an image or a video may be semi-
structured, the actual data being dealt with is unstructured.
• Additionally, unstructured data is sometimes referred to as “dark
data” because it cannot be analyzed without the proper
software tools.
Semi-Structured Data
Semi-structured data is a type of data that is not formatted in a
conventional way, but it does contain some structure. It is more flexible
than structured data and can accommodate a wider variety of data
types and formats.
Here are some characteristics of semi-structured data:
Structure:- Semi-structured data contains some structure, such as tags,
keys, or other markers that separate elements and enforce hierarchies
within the data.
Flexibility:- Semi-structured data doesn't conform to a fixed schema,
allowing it to accommodate a wider variety of data types and formats.
Ease of analysis:- Semi-structured data is easier to analyze and extract
insights from compared to unstructured data.

Examples:- Common semi-structured data formats include JSON, Avro,
and XML.
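The sketch below shows what "some structure, but no fixed schema" can look like in practice, using JSON and Python's built-in `json` module. The records and the `flatten` helper are hypothetical examples: keys mark the elements, but not every record carries the same fields.

```python
import json

# Two records with keys and nesting, but without an identical set of fields.
raw = """
[
  {"id": 1, "name": "Asha", "contact": {"email": "asha@example.com"}},
  {"id": 2, "name": "Ravi", "tags": ["premium"], "contact": {"phone": "555-0100"}}
]
"""

records = json.loads(raw)

def flatten(record):
    """Map one semi-structured record onto a fixed set of columns.

    Missing fields become None (or an empty string for tags).
    """
    contact = record.get("contact", {})
    return {
        "id": record["id"],
        "name": record["name"],
        "email": contact.get("email"),
        "phone": contact.get("phone"),
        "tags": ",".join(record.get("tags", [])),
    }

rows = [flatten(r) for r in records]
for row in rows:
    print(row)
```

The keys make the data far easier to analyze than free text, while the optional fields give it the flexibility that a rigid table lacks.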
Advantages of Unstructured Data:
• It supports the data that lacks a proper format or sequence
• The data is not constrained by a fixed schema
• Very Flexible due to the absence of schema.
• Data is portable
• It is very scalable
• It can deal easily with the heterogeneity of sources.
• These types of data have a variety of business intelligence and
analytics applications.
Disadvantages Of Unstructured Data:
• It is difficult to store and manage unstructured data due to
lack of schema and structure.
• Indexing the data is difficult and error-prone due to unclear
structure and not having pre-defined attributes. Due to this
search results are not very accurate.
• Ensuring the security of data is a difficult task.
REAL TIME DATA
Real-time data refers to information that is made
available for use as soon as it is generated. Ideally, the
data is passed instantly between the source and the
consuming application but bottlenecks in data
infrastructure or bandwidth can create a lag. Real-time
data is used in time sensitive applications such as
stock trading or navigation and it powers real-time
analytics, which brings in-the-moment insights and
helps you quickly react to changing conditions.
Benefits
Real-time data is applied in nearly every industry today. This
is because of the rapid pace of modern business, high
customer expectations for immediate personalization and
response, and the growth of real-time applications, big data,
and the Internet of Things (IoT).
• Make Faster, Better Decisions. Using a real-time analytics tool, you
can have in-the-moment understanding of what’s happening in your
business. This tool can automatically trigger alarms, build
dashboards and reports, and take other actions in response to
real-time data. These timely insights help you optimize your business
faster than competitors. For example, your revenue operations team
will be able to spot revenue risks before they progress.
• Meet Customer Expectations. Customers today rely on applications
that deliver time-sensitive data–such as weather, navigation, and ride-
sharing apps–and they expect this level of instant and personalized
service in all aspects of their life. Leveraging data in real time allows
you to provide your customers the information they need instantly.
• Reduce Fraud, Cybercrime, and Outages. Issues such as fraud,
security breaches, production problems, and inventory outages can
escalate quickly and result in significant losses for your organization.
Real-time data lets you monitor every aspect of your business so that
you can respond to and prevent these issues before they become critical.
• Reduce IT Infrastructure Expense. Working with
data in real time allows you to better monitor and
report on your IT systems and take a more proactive
approach to troubleshooting servers, systems, and
devices. Plus, real-time data is usually stored in lower
volumes, which results in lower storage and hardware
costs.
Real-Time Data Architecture
1. Aggregate your data sources. Typical real-time data sources include
IoT/sensors, server logs, app activity, online advertising, and
clickstream data. Connect all these data sources from your
transactional systems or your relational databases to a stream
processor using a CDC streaming tool.
2. Implement a stream processor. Using a tool such as Amazon Kinesis
or Apache Kafka you then process your streaming data on a record-
by-record basis, sequentially and incrementally or over sliding time
windows. To keep up with fast moving big data, your stream
processor will need to be fast, scalable, and fault tolerant. You will
also integrate it with downstream applications for presentation or
triggered actions.
3. Perform real-time queries (or store your data). Now your
infrastructure needs to filter, aggregate, correlate, and sample your data
using a tool such as Google BigQuery, Snowflake, Dataflow, or Amazon
Kinesis Data Analytics. You can query the real-time data stream itself as
it’s streaming using ksqlDB, a streaming SQL engine for Apache Kafka.
And, if you choose, you can also store this data in the cloud for
future use. For storage, you can use cloud object storage or a cloud
data warehouse such as Amazon S3, Amazon Redshift, or Google Cloud
Storage.

4. Support Use Cases. Now your real-time data architecture is ready to
support whatever use case you have in mind. A real-time data analytics
tool lets you conduct analysis, data science, and machine learning or
AutoML without having to wait for data to reside in a database. These
tools can also trigger alerts and events in other applications.
Here are some specific use cases:
• Trigger events in other applications such as in ad buying software that
buys online advertising based on predefined rules or in a content
publishing system which makes personalized recommendations to
users.
• Update data and calculations in time-sensitive apps such as stock
trading, medical monitoring, navigation, and weather reporting.
• Produce interactive data dashboards and visualizations that deliver
alerts and insights in real time.
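The record-by-record processing over sliding time windows described in step 2, together with the triggered alerts mentioned above, can be sketched in plain Python. This is a simplified stand-in for a real stream processor such as Apache Kafka or Amazon Kinesis: the class name, the sensor readings and the 30.0 threshold are all made up, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque

class SlidingWindowAverage:
    """Running average over the last `window_seconds` of a stream."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Process one record and return the current window average."""
        self.events.append((timestamp, value))
        self.total += value
        # Evict records that have fallen out of the time window.
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)

# Made-up (timestamp in seconds, temperature) sensor readings.
readings = [(0, 20.0), (10, 21.0), (20, 22.0), (40, 35.0), (50, 37.0), (70, 38.0)]

window = SlidingWindowAverage(window_seconds=30)
alerts = []
for ts, temperature in readings:
    average = window.add(ts, temperature)
    # Triggered action: raise an alert when the windowed average is too high.
    if average > 30.0:
        alerts.append((ts, round(average, 2)))

print(alerts)
```

Each record is handled once and old records are dropped as the window moves, so memory use depends on the window length rather than on the total size of the stream.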
Benefits of Big Data Analytics
Big Data Analytics offers a host of real-world advantages, and let’s
understand with examples:
1. Informed Decisions: Imagine a store like Walmart. Big Data
Analytics helps them make smart choices about what products to
stock. This not only reduces waste but also keeps customers happy
and profits high.
2. Enhanced Customer Experiences: Think about Amazon. Big Data
Analytics is what makes those product suggestions so accurate. It’s
like having a personal shopper who knows your taste and helps you
find what you want.
3. Fraud Detection: Credit card companies, like MasterCard, use Big
Data Analytics to catch and stop fraudulent transactions. It’s like
having a guardian that watches over your money and keeps it safe.
4. Optimized Logistics: FedEx, for example, uses Big Data Analytics
to deliver your packages faster and with less impact on the
environment. It’s like taking the fastest route to your destination
while also being kind to the environment.
Challenges of Big data analytics

While Big Data Analytics offers incredible benefits, it also comes with
its set of challenges:
• Data Overload: Consider Twitter, where approximately 6,000 tweets
are posted every second. The challenge is sifting through this
avalanche of data to find valuable insights.
• Data Quality: If the input data is inaccurate or incomplete, the
insights generated by Big Data Analytics can be flawed. For example,
incorrect sensor readings could lead to wrong conclusions in weather
forecasting.
• Privacy Concerns: With the vast amount of personal data used, like
in Facebook’s ad targeting, there’s a fine line between providing
personalized experiences and infringing on privacy.
• Security Risks: With cyber threats increasing, safeguarding sensitive
data becomes crucial. For instance, banks use Big Data Analytics to
detect fraudulent activities, but they must also protect this information
from breaches.
• Costs: Implementing and maintaining Big Data Analytics systems can
be expensive. Airlines like Delta use analytics to optimize flight
schedules, but they need to ensure that the benefits outweigh the
costs.
Usage of Big Data Analytics

Big Data Analytics has a significant impact in various sectors:
• Healthcare: It aids in precise diagnoses and disease prediction,
elevating patient care.
• Retail: Amazon’s use of Big Data Analytics offers personalized
product recommendations based on your shopping history, creating a
more tailored and enjoyable shopping experience.
• Finance: Credit card companies such as Visa rely on Big Data
Analytics to swiftly identify and prevent fraudulent transactions,
ensuring the safety of your financial assets.
• Transportation: Companies like Uber use Big Data Analytics to
optimize drivers’ routes and predict demand, reducing wait times and
improving overall transportation experiences.
• Agriculture: Farmers make informed decisions, boosting crop yields
while conserving resources.
• Manufacturing: Companies like General Electric (GE) use
Big Data Analytics to predict machinery maintenance needs,
reducing downtime and enhancing operational efficiency.
MOBILE DATA ANALYTICS
Mobile data analytics (MDA) is the process of collecting and
analysing data about how users interact with mobile devices and
applications.

This data can be used to improve the user experience, increase
engagement, and drive business outcomes.
Here are some examples of what can be learned
from mobile data analytics:
• User behavior: How users interact with ads, what
they do in the app, and when they open and
close the app
• User friction: Where users are dropping off,
taking unexpected steps, or hesitating before
taking the next step
• User acquisition: Which channels users are
coming from
• User churn: When users are leaving the app, and
why
Some common examples of mobile data analytics
include:
• Funnel analysis
• Click-through rates
• Retention rates
• Heatmaps
• Web analytics
• Traffic sources
• Demographic information of signed-up users
• Mobile conversion rates
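Several of these metrics are simple ratios. The sketch below computes step-by-step funnel conversion, click-through rate, overall conversion rate and day-7 retention from made-up counts; the event names and numbers are hypothetical and do not come from any real app.

```python
# Made-up funnel: number of users reaching each step of a shopping app.
funnel = [
    ("app_open", 1000),
    ("product_view", 600),
    ("add_to_cart", 240),
    ("purchase", 60),
]

def rate(numerator, denominator):
    """Safe ratio that avoids division by zero."""
    return numerator / denominator if denominator else 0.0

def step_conversion(steps):
    """Funnel analysis: share of users who continue from each step to the next."""
    results = []
    for (prev_name, prev_users), (name, users) in zip(steps, steps[1:]):
        results.append((f"{prev_name} -> {name}", rate(users, prev_users)))
    return results

step_rates = step_conversion(funnel)

# Click-through rate: ad clicks divided by ad impressions.
ctr = rate(45, 1500)

# Mobile conversion rate: purchasers divided by users who opened the app.
conversion_rate = rate(funnel[-1][1], funnel[0][1])

# Day-7 retention: users still active 7 days after install, out of all installs.
retention_d7 = rate(180, 1000)

for label, value in step_rates:
    print(f"{label}: {value:.0%}")
print(f"CTR={ctr:.1%} conversion={conversion_rate:.1%} retention_d7={retention_d7:.1%}")
```

The largest drop between two funnel steps is usually the first place to look for the user friction described earlier.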
Some best practices for mobile data analytics include:
• Identifying the right KPIs
• Setting specific, measurable, and achievable goals
• Understanding data visualization
• Finding patterns and trends in user behavior
• Integrating with third-party tools
• Continuously monitoring user flows
Social Media Analytics
Social media analytics is the process of collecting and
analysing audience data shared on social networks to
improve an organisation’s strategic business decisions.
OR
Social media analytics is the ability to gather and find
meaning in data gathered from social channels to
support business decisions, and to measure the
performance of actions based on those decisions
through social media.
Software for Handling Big Data
• Hadoop: Best for large-scale data processing
• Apache Spark: Best for real-time analytics
• Google BigQuery: Best for data handling in
Google Cloud
• Snowflake: Best for cloud-based data
warehousing
• Tableau: Best for data visualization
• Power BI: Best for in-depth analysis
• Databricks: Best for team collaboration
