Data Analysis Using Python (Python For Beginners) - CloudxLab

- NumPy is a Python library used for working with arrays and matrices for numerical computing. - NumPy provides multidimensional arrays and matrices, along with tools to work with these numeric data structures. - Common NumPy functions include np.array() for creating arrays, np.zeros() and np.ones() for creating arrays of zeros or ones, and np.random.rand() for generating random numbers.

Uploaded by

Gizliusta


Data Analysis with Python (For Beginners)

[email protected]
About CloudxLab

Making learning fun and for life

Videos Quizzes Hands-On Projects Case Studies

Real Life Use Cases

CloudxLab - Playground with
Feedback
Playground for hands-on. System evaluates the code automatically
and nudges the user by giving appropriate feedback

Content Playground

Feedback
CloudxLab - Online Cloud Based Lab

Cloud-based Lab with pre-installed tools and software for

practicing AI, Machine Learning, Deep Learning, Data Science, Big
Data and related technologies
CloudxLab - Online Cloud Based Lab

Real-world Experience
Lab setup is exactly the same as the setup in enterprises. Become job ready from Day 1

Seamless Experience
No endless downloading/installations. No hardware, permissions or configuration issues

Central Dataset
Upload your own dataset or use open source datasets available on the lab

Any Device Anywhere
Connect from ANY browser, SSH, device or operating system
CloudxLab - Social
We learn better with peers. Social proof and leaderboard
increases engagement and motivation
CloudxLab - Hiring Partners
Dedicated Job Portal → Upgrade career, enhance salary & move
jobs by applying to jobs posted by our hiring partners
CloudxLab - University Partners
Instructors / Authors

Sandeep Giri | Praveen Pavithran | Abhinav Singh
Founder at CloudxLab.com | AI CTO/Co-Founder at Yatis | IOT, Co-Founder, CloudxLab.com | AI,
Advisor at Algoworks | Speaker - ML, Computer Vision, Edge ML & Big Data | Visiting Faculty at
AI, Machine Learning, Deep SCMHRD
Learning,Big Data Cypress Semiconductors, Philips,
Multiple patents Byjus, HashCube
Amazon, InMobi, D.E.Shaw conference papers, 9+ Years of Exp. in EdTech, Game
18+ Years of Exp. in Enterprise IIT Bombay Dual Degree Development & Building Product
Softwares, Machine Learning &
Churning Humongous Data
What is Python

- Python is an interpreted,
high-level language
- First released in 1991 by Guido van
Rossum
- It is easy to use and improves
engineer productivity
- Libraries for multiple
applications
- Django framework for web
applications
- We will focus on libraries for
Data Analysis
What is NumPy

Stands for "Numeric Python" or "Numerical Python".

● Open Source
● Module of Python
● Provides fast mathematical functions

What is NumPy

scikitlearn tensorflow

numpy
Python
matplotlib
pandas

The complete Machine Learning eco-system.

Why use NumPy ?

● Array-oriented computing
● Efficiently implemented multi-dimensional arrays
● Designed for scientific computation
● Library of high-level mathematical functions

Numpy - Introduction

● NumPy’s main object is the homogeneous multidimensional array
● It is a table of elements
○ usually numbers
○ all of the same type
○ indexed by a tuple of non-negative integers
● In NumPy dimensions are called axes
● The number of axes is called the rank

Numpy - Introduction

First Dimension / Axis, Len = 3

Second Dimension / Axis, Len = 4

array([[ 0., 0., 0., 0.],
       [ 0., 0., 0., 0.],
       [ 0., 0., 0., 0.]])

The above array has a rank of 2 since it is 2-dimensional.

Creating Numpy arrays
np.array - Creating NumPy array from Python Lists/Tuple

NumPy arrays can be created from Python lists or tuples in the following way.

>>> import numpy as np

>>> a = np.array([1, 2, 3])
>>> type(a)
<class 'numpy.ndarray'>
>>> b = np.array((3, 4, 5))
>>> type(b)
<class 'numpy.ndarray'>

Creating Numpy arrays
np.zeros - An array with all Zeros

To create an array with all zeros, the function np.zeros is used.

>>> x = np.zeros( (3,4) )

>>> x
array([[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])

Creating Numpy arrays
np.ones - An array with all Ones

To create an array with all ones the function np.ones is used.

>>> np.ones( (3,4), dtype=np.int16 )

array([[ 1, 1, 1, 1],
[ 1, 1, 1, 1],
[ 1, 1, 1, 1]])

Creating Numpy arrays
np.full - An array with a given value

To create an array with a given shape and a given value np.full

is used.

>>> np.full( (3,4), 0.11 )

array([[ 0.11, 0.11, 0.11, 0.11],
[ 0.11, 0.11, 0.11, 0.11],
[ 0.11, 0.11, 0.11, 0.11]])

Creating Numpy arrays
np.arange - Creating sequence of Numbers

>>> np.arange( 10, 30, 5 )

array([10, 15, 20, 25])
>>> np.arange( 0, 2, 0.3 )
# it accepts float arguments
array([ 0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])

Creating Numpy arrays
np.linspace - Creating an array with evenly distributed numbers

● Returns an array having a specific number of points
● Evenly distributed between two values
● The maximum value is included, contrary to arange
● Arguments, in order: starting number, ending number, total number of points

>>> np.linspace(0, 5/3, 6)

array([0.        , 0.33333333, 0.66666667, 1.        , 1.33333333,
       1.66666667])
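The difference between arange and linspace at the end of the range can be checked directly. The following is a minimal sketch; the variable names are illustrative and not from the slides.

```python
import numpy as np

# arange excludes the stop value; linspace includes it by default.
stepped = np.arange(0, 1, 0.25)
spaced = np.linspace(0, 1, 5)

# endpoint=False makes linspace leave out the stop value, like arange.
half_open = np.linspace(0, 1, 4, endpoint=False)

print(stepped)
print(spaced)
print(np.allclose(stepped, half_open))
```

With linspace you choose the number of points, with arange you choose the step; for float steps linspace is usually the safer choice because the number of elements is explicit.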

Creating Numpy arrays
np.random.rand - Creating an array with random numbers

Make a 2x3 matrix having random floats between 0 and 1:

>>> np.random.rand(2,3)
array([[ 0.55365951, 0.60150511, 0.36113117],
[ 0.5388662 , 0.06929014, 0.07908068]])

Creating Numpy arrays
np.empty - Creating an empty array

np.empty creates an uninitialised array with a given shape. Its content is not predictable.

>>> np.empty((2,3))
array([[ 0.21288689, 0.20662218, 0.78018623],
[ 0.35294004, 0.07347101, 0.54552084]])

Important attributes of a NumPy object

The NumPy’s array class is called ndarray. The important

attributes of a ndarray object are -

ndarray.ndim
the number of axes (dimensions) of the array.
[[ 1., 0., 0.],
[ 0., 1., 2.]]

For the above array the value of ndarray.ndim is 2.

Important attributes of a NumPy object

ndarray.shape
the dimensions of the array. This is a tuple of integers
indicating the size of the array in each dimension.
[[ 1., 0., 0.],
[ 0., 1., 2.]]
For the above array the value of ndarray.shape is (2,3)

Important attributes of a NumPy object

ndarray.size
the total number of elements of the array. This is equal to
the product of the elements of shape.
[[ 1., 0., 0.],
[ 0., 1., 2.]]

For the above array the value of ndarray.size is 6.

Important attributes of a NumPy object

ndarray.dtype
Tells the datatype of the elements in the numpy array. All
the elements in a numpy array have the same type.
>>> c = np.arange(1, 5)
>>> c.dtype
dtype('int64')

Important attributes of a NumPy object

ndarray.itemsize
The itemsize attribute returns the size (in bytes) of each
item:
>>> c = np.arange(1, 5)
>>> c.itemsize
8
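The attributes above are related to each other: size is the product of shape, and the total memory is size times itemsize. A small sketch, with the dtype fixed explicitly so the numbers do not depend on the platform (the name grid is illustrative):

```python
import numpy as np

# float64 is chosen explicitly; each element then takes 8 bytes.
grid = np.zeros((2, 3), dtype=np.float64)

print(grid.ndim)      # number of axes
print(grid.shape)     # length of each axis
print(grid.size)      # total number of elements
print(grid.itemsize)  # bytes per element
print(grid.nbytes)    # total bytes, equal to size * itemsize
```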

Reshaping Arrays

The function reshape is used to reshape the numpy array.

The following example illustrates this.

>>> a = np.arange(6)
>>> print(a)
[0 1 2 3 4 5]
>>> b = a.reshape(2, 3)
>>> print(b)
[[0 1 2]
 [3 4 5]]
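Two details of reshape are worth seeing in code: one dimension can be given as -1 so that NumPy infers it, and for a contiguous array like this one the result is a view on the same data. A minimal sketch:

```python
import numpy as np

a = np.arange(6)
b = a.reshape(2, -1)   # -1 asks NumPy to infer the missing dimension (3 here)

# For a contiguous array, reshape returns a view,
# so this write is visible through `a` as well.
b[0, 0] = 99

print(b.shape)
print(a)
print(np.shares_memory(a, b))
```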

Indexing and Accessing NumPy arrays

Indexing one dimensional NumPy Arrays

0 1 2 3 4 5 6 Index

>>> a = np.array([1, 5, 3, 19, 13, 7, 3])

>>> a[3]
19
>>> a[2:5] #range
array([ 3, 19, 13])
>>> a[2::2] # How many to jump
array([ 3, 13, 3])
>>> a[::-1] #Go reverse
array([ 3, 7, 13, 19, 3, 5, 1])

Difference with regular Python lists

1. If you assign a single value to an ndarray slice, it is copied

across the whole slice :
>>> a = np.array([1, 2, 5, 7, 8])
>>> a[1:3] = -1
>>> a
array([ 1, -1, -1, 7, 8])
----
>>> b = [1, 2, 5, 7, 8]
>>> b[1:3] = -1
TypeError: can only assign an iterable

Difference with regular Python lists

2. ndarray slices are actually views on the same data buffer. If

you modify it, it is going to modify the original ndarray as well.

>>> a = np.array([1, 2, 5, 7, 8])

>>> a_slice = a[1:5]
>>> a_slice[1] = 1000
>>> a
array([ 1, 2, 1000, 7, 8])
# Original array was modified

Difference with regular Python lists

3. If you want a copy of the data, you need to use the copy
method, as in another_slice = a[2:6].copy().
If we modify another_slice, a remains the same.
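The view-versus-copy behaviour can be verified with np.shares_memory. This is a small sketch; the names view and copy are illustrative.

```python
import numpy as np

a = np.array([1, 2, 5, 7, 8])
view = a[1:4]            # basic slicing: shares memory with `a`
copy = a[1:4].copy()     # independent buffer

copy[0] = -1             # does not touch `a`
view[0] = 1000           # writes through to a[1]

print(a)
print(np.shares_memory(a, view), np.shares_memory(a, copy))
```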

Indexing multi dimensional NumPy arrays
Multi-dimensional arrays can be accessed as
>>> b[1, 2] # row 1, col 2
>>> b[1, :] # row 1, all columns
>>> b[:, 1] # all rows, column 1

The following format is used while indexing multi-dimensional

arrays
Array[row_start_index:row_end_index, column_start_index:
column_end_index]
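A short sketch of the row/column slicing format on a concrete 3x4 array (the array b is built here for illustration):

```python
import numpy as np

b = np.arange(12).reshape(3, 4)

element = b[1, 2]        # single element: row 1, column 2
block = b[0:2, 1:3]      # rows 0-1, columns 1-2 (end indices are excluded)
last_col = b[:, -1]      # last column of every row

print(element)
print(block)
print(last_col)
```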

Boolean Indexing

We can also index arrays using an ndarray of boolean values on

one axis to specify the indices that we want to access.

>>> a = np.arange(12).reshape(3, 4)
>>> rows_on = np.array([ True, False, True])
>>> a[rows_on , : ] # Rows 0 and 2, all columns
array([[ 0, 1, 2, 3],
[ 8, 9, 10, 11]])

Linear Algebra with NumPy

Vectors

● A vector is a quantity defined by a magnitude and a direction.

● A vector can be represented by an array of numbers called
scalars.

Vectors

For example, say the rocket is going up at a slight angle: it has a

vertical speed of 5,000 m/s, and also a slight speed towards the
East at 10 m/s, and a slight speed towards the North at 50 m/s.
The rocket's velocity may be represented by the following
vector:

velocity = (East: 10 m/s, North: 50 m/s, vertical: 5,000 m/s)
Use of Vectors in Machine Learning
● Vectors have many purposes in Machine Learning, most
notably to represent observations and predictions.
● For example, say we built a Machine Learning system to
classify videos into 3 categories (good, spam, clickbait) based
on what we know about them.
Good

Spam

Clickbait

Use of Vectors in Machine Learning
● For each video, we would have a vector representing what
we know about it, such as:

video = (10.5, 5.2, 3.25, 7.0)

● This vector could represent a video that lasts 10.5 minutes,

but only 5.2% viewers watch for more than a minute, it gets
3.25 views per day on average, and it was flagged 7 times as
spam. As you can see, each axis may have a different
meaning.

Use of Vectors in Machine Learning

● Based on this vector our Machine Learning system may

predict that there is an 80% probability that it is a spam
video, 18% that it is clickbait, and 2% that it is a good video.
This could be represented as the following vector:
class_probabilities = (0.80, 0.18, 0.02)   (Spam, Clickbait, Good)

Representing Vectors in Python

● In python, a vector can be represented in many ways, the

simplest being a regular python list of numbers.
○ [1,1,1,1]
● Since Machine Learning requires lots of scientific calculations,
it is much better to use NumPy's ndarray, which provides a
lot of convenient and optimized implementations of essential
mathematical operations on vectors.
● numpy.array([1,1,1,1])

Vectorized Operations

● Vectorized operations are far more efficient than loops written in Python to do the same thing
● Let’s test it

Vectorized Operations

Matrix multiplication
1. Using for loop
>>> def multiply_loops(A, B):
...     # C[i, j] is the dot product of row i of A and column j of B
...     C = np.zeros((A.shape[0], B.shape[1]))
...     for i in range(A.shape[0]):
...         for j in range(B.shape[1]):
...             for k in range(A.shape[1]):
...                 C[i, j] += A[i, k] * B[k, j]
...     return C

2. Using NumPy's matrix-matrix multiplication operator

>>> def multiply_vector(A, B):
...     return A @ B

Vectorized Operations

Matrix multiplication - Sample data

# Two randomly-generated, 100x100 matrices

>>> X = np.random.random((100, 100))

>>> Y = np.random.random((100, 100))

Vectorized Operations
Matrix multiplication - Loops - timeit

# First, using the explicit loops:
>>> %timeit multiply_loops(X, Y)
4.23 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Result - It took about 4.23 milliseconds (4.23 × 10⁻³ seconds) to perform one matrix-matrix multiplication

Matrix multiplication - Vector - timeit

# Second, the NumPy multiplication:
>>> %timeit multiply_vector(X, Y)
46.6 µs ± 346 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Result - 46.6 microseconds (46.6 × 10⁻⁶ seconds) per multiplication

Conclusion - Two orders of magnitude faster (absolute timings vary with the machine and the loop implementation)
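Outside IPython, where the %timeit magic is not available, a comparable measurement can be sketched with the standard-library timeit module. The loop function below is a textbook triple loop written for this sketch, and the matrices are 30x30 rather than 100x100 so that it finishes quickly; no timing figures are claimed here because they depend on the machine.

```python
import timeit
import numpy as np

def multiply_loops(A, B):
    """Textbook triple-loop matrix product (illustrative, deliberately slow)."""
    C = np.zeros((A.shape[0], B.shape[1]))
    for i in range(A.shape[0]):
        for j in range(B.shape[1]):
            for k in range(A.shape[1]):
                C[i, j] += A[i, k] * B[k, j]
    return C

rng = np.random.default_rng(0)   # seeded only for repeatability
X = rng.random((30, 30))         # smaller than the 100x100 sample data
Y = rng.random((30, 30))

# Both approaches must agree before their speed is compared.
same = np.allclose(multiply_loops(X, Y), X @ Y)

t_loops = timeit.timeit(lambda: multiply_loops(X, Y), number=3)
t_numpy = timeit.timeit(lambda: X @ Y, number=3)
print(same, t_loops, t_numpy)
```

Checking that the two results match first is a useful habit: a fast function that computes something different is not a valid comparison.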

Basic Operations on NumPy arrays

Addition in NumPy arrays

Addition can be performed on NumPy arrays as shown below.

They apply element wise.

>>> a = np.array( [20, 30, 40, 50] )

>>> b = np.arange( 4 )
>>> b
array([0, 1, 2, 3])
>>> c = a + b
>>> c
array([20, 31, 42, 53])

Subtraction in NumPy arrays

Subtraction can be performed on NumPy arrays as shown

below. They apply element wise.
>>> a = np.array( [20, 30, 40, 50] )
>>> b = np.arange( 4 )
>>> b
array([0, 1, 2, 3])
>>> c = a - b
>>> c
array([20, 29, 38, 47])

Element wise product in NumPy arrays

Element wise product can be performed on NumPy arrays as

shown below.
>>> A = np.array( [[1,1],
... [0,1]] )
>>> B = np.array( [[2,0],
... [3,4]] )
>>> A*B # element wise product
array([[2, 0],
[0, 4]])

Matrix Product in NumPy arrays

Matrix product can be performed on NumPy arrays as shown

below.
>>> A = np.array( [[1,1],
... [0,1]] )
>>> B = np.array( [[2,0],
... [3,4]] )
>>> np.dot(A, B) # matrix product
array([[5, 4],
[3, 4]])

Division in NumPy arrays

Division can be performed on NumPy arrays as shown below.

They apply element wise.

>>> a = np.array( [20, 30, 40, 50] )
>>> b = np.arange(1, 5)
>>> c = a / b
>>> c
array([ 20. , 15. , 13.33333333, 12.5 ])

Integer Division in NumPy arrays

Integer (floor) division can be performed on NumPy arrays with the // operator, as shown below.

It applies element wise.

a = np.array( [20, 30, 40, 50] )

b = np.arange(1, 5)
c = a // b
c
array([20, 15, 13, 12])

Modulus in NumPy arrays

Modulus operator can be applied on NumPy arrays as shown

below. They apply element wise.
a = np.array( [20, 30, 40, 50] )
b = np.arange(1, 5)
c = a % b
c
array([0, 0, 1, 2])

Exponents in NumPy arrays

We can find the exponent of each element in a NumPy array

in the following way. It is applied element wise.

a = np.array( [20, 30, 40, 50] )

b = np.arange(1, 5)
c = a ** b
c
array([ 20, 900, 64000, 6250000])

Conditional Operators on NumPy arrays

Conditional operators are also applied element-wise

m = np.array([20, -5, 30, 40])
m < [15, 16, 35, 36]
array([False, True, True, False], dtype=bool)

m < 25
array([ True, True, False, False], dtype=bool)

To get the elements below 25

m[m < 25]
array([20, -5])
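Element-wise conditions can also be combined, and used to replace values rather than select them. A small sketch using the same array m; the names in_range and clipped are illustrative.

```python
import numpy as np

m = np.array([20, -5, 30, 40])

# & is the element-wise AND; the parentheses around each condition are required.
in_range = (m > 0) & (m < 35)
selected = m[in_range]

# np.where picks 0 where the condition holds and the original value elsewhere.
clipped = np.where(m < 25, 0, m)

print(selected)
print(clipped)
```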

Broadcasting in NumPy arrays

What is Broadcasting ?

[[1, 2],     [[0, 2],     [[1, 4],
 [4, 5]]  +   [3, 4]]  =   [7, 9]]

[[1, 2],     [[5],
 [4, 5]]  +   [7]]     =   ???

What is Broadcasting ?

In general, when NumPy expects arrays of the same shape but

finds that this is not the case, it applies the so-called
broadcasting rules.

Basically there are 2 rules of Broadcasting to remember.

First rule of Broadcasting

[[[1, 3]]]  +  [5]  =  [[[6, 8]]]

Shape (1, 1, 2)  +  (1,)  →  (1, 1, 2)

If the arrays do not have the same rank, then a 1 will be

prepended to the smaller ranking arrays until their ranks match.

First rule of Broadcasting

>>> h = np.arange(5).reshape(1, 1, 5)
>>> h
array([[[0, 1, 2, 3, 4]]])
Let's try to add a 1D array of shape (5,) to this 3D array of
shape (1,1,5), applying the first rule of broadcasting.
>>> h + [10, 20, 30, 40, 50] # same as: h + [[[10, 20, 30, 40, 50]]]
array([[[10, 21, 32, 43, 54]]])
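The first rule can be reproduced by hand: reshaping the 1D array to (1, 1, 5) before adding gives the same result as letting NumPy prepend the axes. A minimal sketch; v and v3 are illustrative names.

```python
import numpy as np

h = np.arange(5).reshape(1, 1, 5)
v = np.array([10, 20, 30, 40, 50])       # shape (5,)

# Rule 1 by hand: prepend axes of length 1 until the ranks match.
v3 = v.reshape(1, 1, 5)

result_shape = np.broadcast(h, v).shape  # shape NumPy would produce for h + v
print(result_shape)
print(np.array_equal(h + v, h + v3))
```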

Second rule of Broadcasting

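Arrays with a length of 1 along a particular dimension act as if
they had the size of the array with the largest shape along that
dimension. On adding a 2D array of shape (2, 1) to a 2D ndarray of
shape (2, 3), NumPy will apply the second rule of broadcasting.

>>> k = np.arange(6).reshape(2, 3)
>>> k
array([[0, 1, 2],
       [3, 4, 5]])

>>> k + [[100], [200]] # same as: k + [[100, 100, 100], [200, 200, 200]]

array([[100, 101, 102],
       [203, 204, 205]])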
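For the (2, 1) plus (2, 3) case described above, np.broadcast_to shows what the stretched operand looks like before the addition happens. A small sketch; col and stretched are illustrative names.

```python
import numpy as np

k = np.arange(6).reshape(2, 3)
col = np.array([[100], [200]])             # shape (2, 1)

# The length-1 axis of `col` is stretched to length 3,
# as if each value were repeated along its row.
stretched = np.broadcast_to(col, k.shape)

print(stretched)
print(k + col)
```

No data is actually copied during broadcasting; broadcast_to returns a read-only view that only behaves as if the values were repeated.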

Mathematical and statistical
functions on NumPy arrays

Finding Mean of NumPy array elements

The ndarray object has a method mean() which finds the mean
of all the elements in the array regardless of the shape of the
numpy array.

>>> a = np.array([[-2.5, 3.1, 7], [10, 11, 12]])

>>> print("mean =", a.mean())
mean = 6.76666666667

Other useful ndarray methods

Similar to mean there are other ndarray methods which can be

used for various computations.

min - returns the minimum element in the ndarray

max - returns the maximum element in the ndarray
sum - returns the sum of the elements in the ndarray
prod - returns the product of the elements in the ndarray
std - returns the standard deviation of the elements in the
ndarray.
var - returns the variance of the elements in the ndarray.

Other useful ndarray methods
>>> a = np.array([[-2.5, 3.1, 7], [10, 11, 12]])

>>> for func in (a.min, a.max, a.sum, a.prod, a.std, a.var):
...     print(func.__name__, "=", func())

min = -2.5
max = 12.0
sum = 40.6
prod = -71610.0
std = 5.08483584352
var = 25.8555555556
Summing across different axes
We can sum across different axes of a numpy array by
specifying the axis parameter of the sum function.

>>> c=np.arange(24).reshape(2,3,4)
>>> c
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],

[[12, 13, 14, 15],

[16, 17, 18, 19],
[20, 21, 22, 23]]])

Summing across different axes

>>> c.sum(axis=0) # sum across matrices

array([[12, 14, 16, 18],
[20, 22, 24, 26],
[28, 30, 32, 34]])
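The same array can be summed along the other axes, or along several axes at once by passing a tuple. A minimal sketch showing how the chosen axis disappears from the result's shape:

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

rows_summed = c.sum(axis=1)        # collapses the 3 rows of each matrix
cols_summed = c.sum(axis=2)        # collapses the 4 columns of each matrix
per_row = c.sum(axis=(0, 2))       # sums over matrices and columns together

print(rows_summed.shape)
print(cols_summed.shape)
print(per_row)
```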

Transposing Matrices
The T attribute is equivalent to calling transpose() when the
rank is ≥2

>>> m1 = np.arange(6).reshape(2,3)
>>> m1
array([[0, 1, 2],
[3, 4, 5]])
>>> m1.T
array([[0, 3],
[1, 4],
[2, 5]])

Solving a system of linear scalar equations
The solve function solves a system of linear scalar equations,
such as:

2x + 6y = 6
5x + 3y = -9

Solving a system of linear scalar equations
>>> coeffs = np.array([[2, 6], [5, 3]])
>>> depvars = np.array([6, -9])
>>> solution = np.linalg.solve(coeffs, depvars)
>>> solution
array([-3., 2.])

Solving a system of linear scalar equations
Let’s check the solution.

>>> coeffs.dot(solution), depvars

(array([ 6., -9.]), array([ 6, -9]))
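Because the solution is computed in floating point, a programmatic check is better done with a tolerance than by comparing printed values. A short sketch using np.allclose:

```python
import numpy as np

coeffs = np.array([[2, 6], [5, 3]])
depvars = np.array([6, -9])

solution = np.linalg.solve(coeffs, depvars)

# Floating-point results should be compared with a tolerance, not with ==.
check = np.allclose(coeffs @ solution, depvars)
print(solution, check)
```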

References

● NumPy
○ https://docs.scipy.org/doc/

Questions?
https://discuss.cloudxlab.com
Pandas

What is Pandas?

● One of the most widely used Python libraries in Data Science after
NumPy and Matplotlib
● The Pandas library Provides
○ High-performance
○ Easy-to-use data structures and
○ Data analysis tools

Pandas - DataFrame

● The main data structure is the DataFrame

● In memory 2D table

○ Like Spreadsheet with column names and row label

Pandas - Data Analysis

● Many features available in Excel are available programmatically like

○ Creating pivot tables

○ Computing columns based on other columns

○ Plotting graphs

Pandas - Data Structures

● Series objects

○ 1D array, similar to a column in a spreadsheet

● DataFrame objects

○ 2D table, similar to a spreadsheet

● Panel objects

○ Dictionary of DataFrames (removed in pandas 0.25; use a DataFrame with a MultiIndex instead)

Pandas - Series Objects

Creating a Series
>>> import pandas as pd
>>> s = pd.Series([2,-1,3,5])

Output -
0 2
1 -1
2 3
3 5
dtype: int64

Pandas - Series Objects

Pass as parameters to NumPy functions

>>> import numpy as np
>>> np.square(s)

Output -
0 4
1 1
2 9
3 25
dtype: int64

Pandas - Series Objects

Arithmetic operation on the series

>>> s + [1000,2000,3000,4000]

Output -
0 1002
1 1999
2 3003
3 4005
dtype: int64

Pandas - Series Objects

Broadcasting
>>> s + 1000

Output -
0 1002
1 999
2 1003
3 1005
dtype: int64

Pandas - Series Objects

Binary and conditional operations

>>> s < 0

Output -
0 False
1 True
2 False
3 False
dtype: bool

Pandas - Series Objects

Index labels - Integer location

>>> s2 = pd.Series([68, 83, 112, 68])
>>> print(s2)

Output -
0 68
1 83
2 112
3 68
dtype: int64

Pandas - Series Objects

Index labels - Set Manually

>>> s2 = pd.Series([68, 83, 112, 68],
index=["alice", "bob", "charles", "darwin"])
>>> print(s2)

Output -
alice 68
bob 83
charles 112
darwin 68
dtype: int64

Pandas - Series Objects

Access the items in series

● By specifying integer location

>>> s2[1]

● By specifying label

>>> s2["bob"]

Pandas - Series Objects

Access the items in series - Recommendations

● Use the loc attribute when accessing by label

>>> s2.loc["bob"]

● Use iloc attribute when accessing by integer location

>>> s2.iloc[1]

Pandas - Series Objects

Init from Python dict

>>> weights = {"alice": 68, "bob": 83, "colin": 86,

"darwin": 68}
>>> s3 = pd.Series(weights)
>>> print(s3)

Output -
alice 68
bob 83
colin 86
darwin 68
dtype: int64
Pandas - Series Objects

Control the elements to include and specify their order

>>> s4 = pd.Series(weights, index = ["colin", "alice"])

>>> print(s4)

Output -
colin 86
alice 68
dtype: int64

Pandas - Series Objects

Automatic alignment

● When an operation involves multiple Series objects

● Pandas automatically aligns items by matching index labels

Pandas - Series Objects

Automatic alignment - example

>>> print(s2+s3)
Output -
alice 136.0
bob 166.0
charles NaN
colin NaN
darwin 136.0
dtype: float64

* Note NaN
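When NaN for unmatched labels is not wanted, the Series.add method accepts a fill_value that stands in for the missing side. A minimal sketch that rebuilds s2 and s3 with the same values as above:

```python
import pandas as pd

s2 = pd.Series([68, 83, 112, 68], index=["alice", "bob", "charles", "darwin"])
s3 = pd.Series({"alice": 68, "bob": 83, "colin": 86, "darwin": 68})

# Labels present on only one side are treated as 0 instead of producing NaN.
total = s2.add(s3, fill_value=0)
print(total)
```

Whether 0 is a sensible stand-in depends on the data; for weights it arguably is not, so this is shown only to illustrate the mechanism.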

Pandas - Series Objects

Automatic alignment

Do not forget to set the right index labels, else you may get surprising
results
>>> s5 = pd.Series([1000,1000,1000,1000])
>>> print(s2 + s5)
Output-
alice NaN
bob NaN
charles NaN
darwin NaN
0 NaN
1 NaN
2 NaN
3 NaN
dtype: float64
Pandas - Series Objects

Init with a scalar

>>> meaning = pd.Series(42, ["life", "universe",

"everything"])
>>> print(meaning)

Output-

life 42
universe 42
everything 42
dtype: int64

Pandas - Series Objects

Series name - A Series can have a name

>>> s6 = pd.Series([83, 68], index=["bob", "alice"],

name="weights")
>>> print(s6)

* Here series name is weights

Output-
bob 83
alice 68
Name: weights, dtype: int64

Pandas - Series Objects

Plotting a series

>>> %matplotlib inline

>>> import matplotlib.pyplot as plt
>>> temperatures = [4.4,5.1,6.1,6.2,6.1,6.1,5.7,5.2,4.7,4.1,3.9,3.5]
>>> s7 = pd.Series(temperatures, name="Temperature")
>>> s7.plot()
>>> plt.show()

Pandas - DataFrame Objects

● A DataFrame object represents

○ A spreadsheet,
○ With cell values,
○ Column names
○ And row index labels

● Visualize DataFrame as dictionaries of Series

Pandas - DataFrame Objects

Creating a DataFrame - Pass a dictionary of Series objects

>>> people_dict = {
...     "weight": pd.Series([68, 83, 112],
...                         index=["alice", "bob", "charles"]),
...     "birthyear": pd.Series([1984, 1985, 1992],
...                            index=["bob", "alice", "charles"], name="year"),
...     "children": pd.Series([0, 3], index=["charles", "bob"]),
...     "hobby": pd.Series(["Biking", "Dancing"], index=["alice", "bob"]),
... }
Pandas - DataFrame Objects

Creating a DataFrame

>>> people = pd.DataFrame(people_dict)

>>> people

Pandas - DataFrame Objects

Creating a DataFrame - Important Notes

● The Series were automatically aligned based on their index

● Missing values are represented as NaN
● Series names are ignored (the name "year" was dropped)

Pandas - DataFrame Objects

DataFrame - Access a column

>>> people["birthyear"]

Output -

alice 1985
bob 1984
charles 1992
Name: birthyear, dtype: int64

Pandas - DataFrame Objects

DataFrame - Access the multiple columns

>>> people[["birthyear", "hobby"]]

Output -

Pandas - DataFrame Objects

Creating DataFrame - Include columns and/or rows and

guarantee order

>>> d2 = pd.DataFrame(
people_dict,
columns=["birthyear", "weight", "height"],
index=["bob", "alice", "eugene"]
)
>>> print(d2)

Pandas - DataFrame Objects

DataFrame - Accessing rows

● Using loc
○ people.loc["charles"]
● Using iloc
○ people.iloc[2]
Output -
birthyear 1992
children 0
hobby NaN
weight 112
Name: charles, dtype: object
Pandas - DataFrame Objects

DataFrame - Get a slice of rows

>>> people.iloc[1:3]

Output -

Pandas - DataFrame Objects

DataFrame - Pass a boolean array

>>> people[np.array([True, False, True])]

Output -

Pandas - DataFrame Objects

DataFrame - Pass boolean expression

>>> people[people["birthyear"] < 1990]

Output -

Pandas - DataFrame Objects

DataFrame - Adding and removing columns

>>> # Adds a new column "age"

>>> people["age"] = 2016 - people["birthyear"]

>>> # Adds another column "over 30"

>>> people["over 30"] = people["age"] > 30

>>> # Removes "birthyear" and "children" columns

>>> birthyears = people.pop("birthyear")
>>> del people["children"]

>>> people

Pandas - DataFrame Objects

DataFrame - A new column must have the same number of rows; when a Series is assigned, missing rows are filled with NaN and extra rows are ignored

>>> # alice is missing, eugene is ignored

>>> people["pets"] = pd.Series({

"bob": 0,
"charles": 5,
"eugene":1
})

>>> people

Pandas - DataFrame Objects

DataFrame - Add a new column using insert method after an

existing column

>>> people.insert(1, "height", [172, 181, 185])

>>> people

Pandas - DataFrame Objects

DataFrame - Add new columns using assign method

>>> (people
...     .assign(body_mass_index = lambda df: df["weight"] / (df["height"] / 100) ** 2)
...     .assign(overweight = lambda df: df["body_mass_index"] > 25)
... )
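One point that is easy to miss with assign is that it returns a new DataFrame and leaves the original untouched. The sketch below uses a small stand-in for the people DataFrame; its weight and height values are taken from the slides, but the row pairing is only illustrative.

```python
import pandas as pd

# Small stand-in for the slides' `people` DataFrame (illustrative pairing of values).
people = pd.DataFrame(
    {"weight": [68, 83, 112], "height": [172, 181, 185]},
    index=["alice", "bob", "charles"],
)

with_bmi = people.assign(
    body_mass_index=lambda df: df["weight"] / (df["height"] / 100) ** 2
)

print(list(people.columns))     # original is unchanged
print(list(with_bmi.columns))   # new column exists only on the returned copy
```

To keep the new columns, the result has to be stored, for example by assigning it back to the same name.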

Pandas - DataFrame Objects

DataFrame - Sorting a DataFrame

● Use sort_index method

○ It sorts the rows by their index label
○ In ascending order
○ Reverse the order by passing ascending=False
○ Returns a sorted copy of DataFrame

Pandas - DataFrame Objects

DataFrame - Sorting a DataFrame

>>> people.sort_index(ascending=False)

Pandas - DataFrame Objects

DataFrame - Sorting a DataFrame - inplace argument

>>> people.sort_index(inplace=True)
>>> people

Pandas - DataFrame Objects

DataFrame - Sorting a DataFrame - Sort By Value

>>> people.sort_values(by="age", inplace=True)

>>> people

Pandas - DataFrame Objects

Plotting a DataFrame

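>>> # Assumes people already has a "body_mass_index" column,
>>> # e.g. the result of the assign call was stored back into people
>>> people.plot(
...     kind = "line",
...     x = "body_mass_index",
...     y = ["height", "weight"]
... )
>>> plt.show()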

Pandas - DataFrame Objects

DataFrames - Saving and Loading

● Pandas can save DataFrames to various backends such as

○ CSV
○ Excel (requires openpyxl library)
○ JSON
○ HTML
○ SQL database

Pandas - DataFrame Objects

DataFrames - Saving

Let’s create a new DataFrame my_df and save it in various formats

>>> my_df = pd.DataFrame(

[
["Biking", 68.5, 1985, np.nan],
["Dancing", 83.1, 1984, 3]
],
columns=["hobby","weight","birthyear","children"],
index=["alice", "bob"]
)
>>> my_df

Pandas - DataFrame Objects

DataFrames - Saving

● Save to CSV
○ >>> my_df.to_csv("my_df.csv")
● Save to HTML
○ >>> my_df.to_html("my_df.html")
● Save to JSON
○ >>> my_df.to_json("my_df.json")

Pandas - DataFrame Objects

DataFrames - What was saved?

>>> for filename in ("my_df.csv", "my_df.html", "my_df.json"):
...     print("#", filename)
...     with open(filename, "rt") as f:
...         print(f.read())
...     print()

Pandas - DataFrame Objects

DataFrames - What was saved?

Note that the index is saved as the first column (with no name) in a CSV file

Pandas - DataFrame Objects
DataFrames - What was saved?

Note that the index is saved as <th> tags in HTML

Pandas - DataFrame Objects

DataFrames - What was saved?

Note that the index is saved as keys in JSON

Pandas - DataFrame Objects

DataFrames - Loading

● read_csv # For loading CSV files

● read_html # For loading HTML files

● read_excel # For loading Excel files

Pandas - DataFrame Objects

DataFrames - Load CSV file

>>> my_df_loaded = pd.read_csv("my_df.csv", index_col=0)

>>> my_df_loaded
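The save-and-load round trip can be tried without touching the file system by writing to an in-memory buffer. In this sketch io.StringIO stands in for the my_df.csv file; the DataFrame values are those of my_df.

```python
import io
import numpy as np
import pandas as pd

my_df = pd.DataFrame(
    [["Biking", 68.5, 1985, np.nan], ["Dancing", 83.1, 1984, 3]],
    columns=["hobby", "weight", "birthyear", "children"],
    index=["alice", "bob"],
)

buffer = io.StringIO()          # in-memory stand-in for "my_df.csv"
my_df.to_csv(buffer)
buffer.seek(0)                  # rewind before reading

# index_col=0 turns the unnamed first column back into the index.
loaded = pd.read_csv(buffer, index_col=0)
print(loaded)
```

Without index_col=0 the row labels would come back as an ordinary column and the DataFrame would get a fresh integer index.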

Pandas - DataFrame Objects

DataFrames - Overview

● When dealing with large DataFrames, it is useful to get a quick overview

of its content
● Load housing.csv inside dataset directory to create a DataFrame and
get a quick overview

Pandas - DataFrame Objects

DataFrames - Overview

● Let’s understand below methods

○ head()
○ tail()
○ info()
○ describe()

Pandas - DataFrame Objects

DataFrames - Overview - head()

● The head method returns the top 5 rows

>>> housing = pd.read_csv("dataset/housing.csv")

>>> housing.head()

Pandas - DataFrame Objects

DataFrames - Overview - tail()

● The tail method returns the bottom 5 rows

● We can also pass the number of rows we want

>>> housing.tail(n=2)

Pandas - DataFrame Objects

DataFrames - Overview - info()

● The info method prints out the summary of each column's contents

>>> housing.info()

Pandas - DataFrame Objects

DataFrames - Overview - describe()

● The describe method gives a nice overview of the main aggregated

values over each column
○ count: number of non-null (not NaN) values
○ mean: mean of non-null values
○ std: standard deviation of non-null values
○ min: minimum of non-null values
○ 25%, 50%, 75%: 25th, 50th and 75th percentile of non-null values
○ max: maximum of non-null values
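How describe treats missing values can be seen on a tiny column. This sketch uses a made-up one-column DataFrame (the name rooms and its values are illustrative), since housing.csv is not reproduced here.

```python
import numpy as np
import pandas as pd

# Tiny illustrative column with one missing value.
df = pd.DataFrame({"rooms": [2.0, 4.0, np.nan, 6.0]})

summary = df.describe()
print(summary)
```

The count row reports 3 rather than 4, because NaN is excluded from every statistic.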
References

● Pandas
○ http://pandas.pydata.org/pandas-docs/stable/

Questions?
https://discuss.cloudxlab.com
Matplotlib

Matplotlib - Overview

● Matplotlib is a Python 2D plotting library

● Produces publication quality figures in a variety of
○ Hardcopy formats and
○ Interactive environments

Matplotlib - Overview

● Matplotlib can be used in

○ Python scripts
○ Python and IPython shell
○ Jupyter notebook
○ Web application servers
○ GUI toolkits

Matplotlib - pyplot Module

● matplotlib.pyplot
○ Collection of functions that make matplotlib work like MATLAB
○ Majority of plotting commands in pyplot have MATLAB analogs with
similar arguments

Matplotlib - pyplot Module - plot()

>>> import matplotlib.pyplot as plt

>>> plt.plot([1,2,3,4])
>>> plt.ylabel('some numbers')
>>> plt.show()
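
When plot() receives a single list, matplotlib treats it as the y values and uses the positions 0, 1, 2, ... as the x values. The sketch below checks this by reading the data back from the line object. The "Agg" backend is an assumption made here so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

# plot() returns a list of line objects, one per plotted line.
lines = plt.plot([1, 2, 3, 4])
line = lines[0]

x_values = list(line.get_xdata())  # generated automatically from positions
y_values = list(line.get_ydata())  # the list that was passed in

print(x_values)
print(y_values)
plt.close()
```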

Matplotlib - pyplot Module - plot()

● Plot x versus y: the first list holds the x values, the second the y values
>>> import matplotlib.pyplot as plt
>>> plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
>>> plt.ylabel('some numbers')
>>> plt.show()
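
A slightly extended sketch of the same plot, for cases where a file is more useful than an on-screen window: it adds an x label and a format string, then writes the figure as PNG with savefig(). The "Agg" backend, the labels and the in-memory buffer are assumptions chosen so the snippet runs anywhere. Passing a filename such as "squares.png" to savefig() would write a file instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [value ** 2 for value in x]  # 1, 4, 9, 16

plt.plot(x, y, "ro-")  # format string: red, circle markers, solid line
plt.xlabel("x")
plt.ylabel("x squared")

# Save the current figure as PNG into an in-memory buffer.
png = io.BytesIO()
plt.savefig(png, format="png")
plt.close()
```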

Matplotlib - pyplot Module - Histogram

>>> import matplotlib.pyplot as plt

>>> x = [21, 22, 23, 4, 5, 6, 77, 8, 9, 10, 31, 32, 33, 34, 35, 36, 37, 18,
...      49, 50, 100]
>>> num_bins = 5
>>> plt.hist(x, num_bins, facecolor='blue')
>>> plt.show()
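
hist() also returns the numbers behind the picture: the count in each bin and the bin edges. The sketch below reuses the same data and inspects those return values. The "Agg" backend is an assumption made so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

x = [21, 22, 23, 4, 5, 6, 77, 8, 9, 10, 31, 32, 33, 34, 35, 36, 37, 18,
     49, 50, 100]
num_bins = 5

# hist() returns the per-bin counts, the bin edges and the drawn bars.
counts, bin_edges, patches = plt.hist(x, num_bins, facecolor="blue")

print(counts)     # one count per bin
print(bin_edges)  # num_bins + 1 equally spaced edges from min(x) to max(x)
plt.close()
```

With 5 bins, the range from 4 to 100 is split into equal intervals of width 19.2, and every value falls into exactly one of them.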

References

● Matplotlib
○ https://matplotlib.org/tutorials/index.html

Questions?
https://discuss.cloudxlab.com