Module 04: Big Data Analytics
Prof. PUNYASHREE K H
MBA, SAMS
Introduction to Big Data Analytics
• Analytics is the systematic computational analysis
of data or statistics.
• Data analytics is the process of collecting,
transforming, and organizing data in order to
draw conclusions, make predictions, and drive
informed decision making.
• Data analytics is a multidisciplinary field that
employs a wide range of analysis techniques,
including math, statistics, and computer science, to
draw insights from data sets.
How is data analytics used? Data analytics examples
• Data is everywhere, and people use data every day, whether they
realize it or not. Daily tasks such as measuring coffee beans to make
your morning cup, checking the weather report before deciding what
to wear, or tracking your steps throughout the day with a fitness
tracker can all be forms of analyzing and using data.
• Data is also crucial in a professional sense. Organizations that use
data to drive business strategies often find that they are more
confident, proactive, and financially savvy. As a result, data analytics
is important across many industries.
• A sneaker manufacturer might look at sales data to determine which
designs to continue and which to retire, or a health care administrator
may look at inventory data to determine the medical supplies they
should order.
Data analytics: Key concepts
• There are four key types of data analytics: descriptive,
diagnostic, predictive, and prescriptive. Together, these four
types of data analytics can help an organization make data-
driven decisions. At a glance, each of them tells us the
following:
• Descriptive analytics tell us what happened.
• Diagnostic analytics tell us why something happened.
• Predictive analytics tell us what will likely happen in the
future.
• Prescriptive analytics tell us how to act.
Big Data Analytics
• Big data comes from many sources, including transaction
processing systems, customer databases, documents, emails,
medical records, internet clickstream logs, mobile apps and
social networks.
• Big data analytics is the often complex process of
examining big data to uncover information, such as
hidden patterns, correlations, market trends and
customer preferences, that can help organizations
make informed business decisions.
• On a broad scale, data analytics technologies and techniques
give organizations a way to analyze data sets and gather
new information. Business intelligence (BI) queries answer
basic questions about business operations and performance.
• Big data analytics is a form of advanced analytics, which involves
complex applications with elements such as predictive models,
statistical algorithms and what-if analysis powered by analytics
systems.
• An example of big data analytics can be found in the healthcare
industry, where millions of patient records, medical claims, clinical
results, care management records and other data must be collected,
aggregated, processed and analyzed. Big data analytics is used for
accounting, decision-making, predictive analytics and many other
purposes. This data varies greatly in type, quality and accessibility,
presenting significant challenges but also offering tremendous
benefits.
Why is big data analytics important?
Organizations can use big data analytics systems and software to make
data-driven decisions that can improve their business-related outcomes.
The benefits can include more effective marketing, new revenue
opportunities, customer personalization and improved operational
efficiency. With an effective strategy, these benefits can give an
organization an edge over its competitors.
History and growth of big data analytics
• The term big data was first used to refer to increasing data volumes
in the mid-1990s. In 2001, Doug Laney, then an analyst at
consultancy Meta Group Inc., expanded the definition of big data.
This expansion described the increase of the following:
1. Volume of data being stored and used by organizations.
2. Variety of data being generated by organizations.
3. Velocity, or speed, in which that data was being created and
updated.
• Those three factors became known as the 3Vs of big data.
• Gartner popularized this concept in 2005 after acquiring Meta Group
and hiring Laney.
• Over time, the 3Vs became the 5Vs with the addition of value and
veracity, and sometimes a sixth V for variability.
• Another significant development in the history of big data was the
launch of the Hadoop distributed processing framework. The Hadoop
framework of software tools is widely used for managing big data.
• By 2011, big data analytics began to take a firm hold in organizations
and the public eye, along with Hadoop and various related big data
technologies.
• Big data applications were primarily used by large internet and e-
commerce companies such as Yahoo, Google and Facebook, as well
as analytics and marketing services providers.
• More recently, users have embraced big data analytics as a key
technology driving digital transformation.
• Users include retailers, financial services firms, insurers, healthcare
organizations, manufacturers, energy companies and other enterprises.
• High-quality decision-making based on data analysis can contribute
to a high-performance organization.
Characteristics of Big Data
• Big Data has nine characteristics, often called the 9Vs:
1. Veracity
2. Variety
3. Velocity
4. Volume
5. Validity
6. Variability
7. Volatility
8. Visualization
9. Value
Big Data was originally defined by the “3Vs”, but the definition has
since expanded to the “6Vs”, described below as the core characteristics
of Big Data:
1. Volume:
• The name ‘Big Data’ itself refers to enormous size: volume is the
sheer amount of data.
• Size plays a crucial role in determining the value of data. Whether
a particular data set actually counts as ‘Big Data’ depends largely
on its volume.
• Hence, volume is an essential characteristic to consider when
dealing with Big Data.
• Example: Indian mobile data traffic reached 28.9 gigabytes per
month per device in 2023 and was estimated to exceed 67.8 gigabytes
by 2029, driven by 4G and 5G networks.
2. Velocity:
• Velocity refers to the high speed of accumulation of data.
• In Big Data, data flows in from sources such as machines, networks,
social media and mobile phones.
• Sampling the data stream is one way to deal with the velocity
problem.
• Example: More than 3.5 billion searches are made on Google every
day, and Facebook's user base grows by roughly 22% per year.
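The sampling idea mentioned above can be sketched with reservoir sampling, a standard way to keep a fixed-size uniform random sample of a stream that arrives too fast to store in full (the stream and sample size here are invented for illustration):

```python
# Reservoir sampling: keep k items chosen uniformly at random from a
# stream of unknown length, using O(k) memory.
import random

def reservoir_sample(stream, k):
    """Return k items chosen uniformly at random from an iterable stream."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)          # fill the reservoir first
        else:
            j = random.randint(0, i)        # replace with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

# 10 representative "events" from a million, without storing the million.
sample = reservoir_sample(range(1_000_000), 10)
```

The analyst then works with the small sample instead of the full high-velocity stream, trading a little accuracy for tractability.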
3. Variety:
• Variety refers to the nature of data: structured, semi-structured
and unstructured.
• It also refers to heterogeneous sources: variety is basically the
arrival of data from new sources both inside and outside an
enterprise.
– Structured data: organized data with a defined length and format.
– Semi-structured data: partially organized data that does not
conform to the formal structure of a data model; log files are a
common example.
– Unstructured data: unorganized data that doesn't fit neatly into
the traditional rows and columns of a relational database; texts,
pictures and videos are examples that can't be stored as rows and
columns.
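The three shapes above can be contrasted in a few lines of Python; the field names, log entry and review text are invented for illustration:

```python
import csv
import io
import json

# Structured: fixed rows and columns, like a database table (CSV here).
table = list(csv.DictReader(io.StringIO("name,birthday\nAsha,2001-04-12\n")))

# Semi-structured: a JSON log entry has keys and nesting but no rigid schema.
log_line = '{"level": "ERROR", "msg": "disk full", "retries": 3}'
event = json.loads(log_line)

# Unstructured: free text with no fields at all; needs text mining to analyze.
review = "Loved the sneakers, but delivery took two weeks."
```

Structured data can be queried by column name directly (`table[0]["birthday"]`), the log entry needs its keys parsed first, and the review offers no fields to query at all.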
4. Veracity:
• Veracity refers to inconsistencies and uncertainty in data, i.e.,
the degree to which the available data can be trusted.
5. Value:
• Value refers to the usefulness of the data: large volumes are worth
little unless insights of value can be extracted from them.
6. Variability:
• Variability refers to how often the structure or meaning of your
data changes.
• Example: it is as if you ate the same ice-cream every day but the
taste kept changing.
Structure of Big Data
Structured Data
• Structured data is a predefined data structure that generally comes in
tabular (in the form of rows and columns) form. As it adheres to the
conventional model, it can directly be used in formulas and
algorithms.
• Structured data can be crudely defined as the data that resides in a
fixed field within a record.
• It is the type of data most familiar in everyday life, for example
birthdays and addresses.
• Generally, SQL (Structured Query Language) is used to manage and
update structured databases. A common misconception is that big data
contains only structured data; structured data is the most
traditional form, but it has limits.
• Data that does not fit into tabular form cannot be processed with
structured-data formulas. So structured data is an integral part of
big data, but other valuable forms exist too.
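A minimal sketch of managing structured data with SQL, using Python's built-in `sqlite3` module; the table and values are invented, echoing the sneaker-sales example earlier in the module:

```python
import sqlite3

# Throwaway in-memory database with one structured table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (design TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("runner", 500), ("retro", 120)])

# Because every row follows the same fixed schema, SQL can answer
# business questions directly, e.g. the best-selling design.
top = conn.execute(
    "SELECT design FROM sales ORDER BY units DESC LIMIT 1").fetchone()[0]
```

This is exactly what tabular form buys you: declarative queries work because the length and format of every field are known in advance.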
Unstructured Data
• Unstructured data is the kind of data that doesn’t adhere to
any definite schema or set of rules. Its arrangement is
unplanned and haphazard.
• Photos, videos, text documents, and log files can be
generally considered unstructured data. Even though the
metadata accompanying an image or a video may be semi-
structured, the actual data being dealt with is unstructured.
• Unstructured data is also known as “dark data” because it cannot
be analyzed without the proper software tools.
Semi-Structured Data
Semi-structured data is a type of data that is not formatted in a
conventional way, but it does contain some structure. It is more flexible
than structured data and can accommodate a wider variety of data
types and formats.
Here are some characteristics of semi-structured data:
Structure:- Semi-structured data contains some structure, such as tags,
keys, or other markers that separate elements and enforce hierarchies
within the data.
Flexibility:- Semi-structured data doesn't conform to a fixed schema,
allowing it to accommodate a wider variety of data types and formats.
Ease of analysis:- Semi-structured data is easier to analyze and extract
insights from compared to unstructured data.
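The flexibility point can be made concrete with JSON, a common semi-structured format; the records and field names below are invented:

```python
import json

# Two semi-structured records: keys/markers give each record structure and
# hierarchy, yet the two need not share the same fields (no fixed schema).
records = [
    '{"id": 1, "name": "Asha", "contact": {"email": "a@example.com"}}',
    '{"id": 2, "name": "Ravi", "tags": ["vip"]}',   # no contact, extra field
]
parsed = [json.loads(r) for r in records]

# Analysis stays easy despite the loose schema: missing keys are handled
# gracefully instead of breaking a rigid table layout.
emails = [r.get("contact", {}).get("email") for r in parsed]
```

A relational table would force both records into identical columns; here each record carries only the fields it needs, while the key markers still make it far easier to query than free text.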
Challenges of Big Data Analytics
While Big Data Analytics offers incredible benefits, it also comes with
its own set of challenges:
• Data Overload: Consider Twitter, where approximately 6,000 tweets
are posted every second. The challenge is sifting through this
avalanche of data to find valuable insights.
• Data Quality: If the input data is inaccurate or incomplete, the
insights generated by Big Data Analytics can be flawed. For example,
incorrect sensor readings could lead to wrong conclusions in weather
forecasting.
• Privacy Concerns: With the vast amount of personal data used, like
in Facebook’s ad targeting, there’s a fine line between providing
personalized experiences and infringing on privacy.
• Security Risks: With cyber threats increasing, safeguarding sensitive
data becomes crucial. For instance, banks use Big Data Analytics to
detect fraudulent activities, but they must also protect this information
from breaches.
• Costs: Implementing and maintaining Big Data Analytics systems can
be expensive. Airlines like Delta use analytics to optimize flight
schedules, but they need to ensure that the benefits outweigh the
costs.
Usage of Big Data Analytics