Pandas Summarized Visually in 8

Reading and Writing Data with Pandas

Methods to read data are all named pd.read_*, where * is the file type. Series and DataFrames can be saved to disk using their to_* method.

Usage Patterns

• Use pd.read_clipboard() for one-off data extractions.
• Use the other pd.read_* methods in scripts for repeatable analyses.

Reading Text Files into a DataFrame
Colors highlight how different arguments map from the data file to a DataFrame.

# Historical_data.csv
Date, Cs, Rd
2005-01-03, 64.78, -
2005-01-04, 63.79, 201.4
2005-01-05, 64.46, 193.45
...
Data from Lab Z.
Recorded by Agent E

>>> read_table(
        'historical_data.csv',
        sep=',',
        header=1,
        skiprows=1,
        skipfooter=2,
        index_col=0,
        parse_dates=True,
        na_values=['-'])
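The mapping from file to DataFrame can be tried without a file on disk. The following is a minimal sketch, assuming an in-memory copy of the sample data (the `raw` string and the use of `io.StringIO` are illustrative stand-ins for 'historical_data.csv'). It uses `pd.read_csv`, which defaults to `sep=','`, and `header=0` because in this copy the column names sit directly under the skipped comment line. `engine="python"` is passed because `skipfooter` is not supported by the default C parser.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for 'historical_data.csv'.
raw = """# historical_data.csv
Date,Cs,Rd
2005-01-03,64.78,-
2005-01-04,63.79,201.4
2005-01-05,64.46,193.45
Data from Lab Z.
Recorded by Agent E
"""

df = pd.read_csv(
    io.StringIO(raw),
    skiprows=1,         # skip the leading comment line
    header=0,           # the first remaining line holds the column names
    skipfooter=2,       # drop the two trailing note lines
    index_col=0,        # use the Date column as the index
    parse_dates=True,   # parse the index as dates
    na_values=["-"],    # treat '-' as a missing value
    engine="python",    # skipfooter requires the Python parsing engine
)
print(df)
```

The resulting DataFrame has a DatetimeIndex, two float columns, and a NaN where the file contained '-'.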

Other arguments:

• names: set or override column names
• parse_dates: accepts multiple argument types (see below)
• converters: manually process each element in a column
• comment: character indicating commented line
• chunksize: read only a certain number of rows each time

Possible values of parse_dates:

• [0, 2]: Parse columns 0 and 2 as separate dates
• [[0, 2]]: Group columns 0 and 2 and parse as single date
• {'Date': [0, 2]}: Group columns 0 and 2, parse as single date in a column named Date.

Dates are parsed after the converters have been applied.

Parsing Tables from the Web

>>> df_list = read_html(url)
read_html returns a list of DataFrames, one per table found on the page.

Writing Data Structures to Disk

Writing data structures to disk:
> s_df.to_csv(filename)
> s_df.to_excel(filename)

Write multiple DataFrames to single Excel file:
> writer = pd.ExcelWriter(filename)
> df1.to_excel(writer, sheet_name='First')
> df2.to_excel(writer, sheet_name='Second')
> writer.close()

From and To a Database

Read, using SQLAlchemy. Supports multiple databases:
> from sqlalchemy import create_engine
> engine = create_engine(database_url)
> conn = engine.connect()
> df = pd.read_sql(query_str_or_table_name, conn)

Write:
> df.to_sql(table_name, conn)
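A database round trip can be sketched without a server. This example is an illustration only: it assumes an in-memory SQLite database opened with the standard-library `sqlite3` module in place of an SQLAlchemy engine (pandas accepts such a connection for SQLite), and the table name `people` and its columns are made up for the example.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"name": ["Cary", "Lynn", "Sam"], "age": [32, 18, 26]})

# Assumption: in-memory SQLite connection instead of an SQLAlchemy engine.
conn = sqlite3.connect(":memory:")

# Write the DataFrame to a table, leaving out the integer index.
df.to_sql("people", conn, index=False)

# Read back the result of a query as a new DataFrame.
adults = pd.read_sql(
    "SELECT name, age FROM people WHERE age >= 21 ORDER BY age", conn
)
conn.close()
print(adults)
```

With a real database, the same two calls would take the connection created from `create_engine(database_url)`.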

Take your Pandas skills to the next level! Register at www.enthought.com/pandas-mastery-workshop
© 2019 Enthought, Inc., licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/
Pandas Data Structures: Series and DataFrames

A Series, s, maps an index to values. It is:
• Like an ordered dictionary
• A Numpy array with row labels and a name
A DataFrame, df, maps index and column labels to values. It is:
• Like a dictionary of Series (columns) sharing the same index
• A 2D Numpy array with row and column labels
s_df applies to both Series and DataFrames.
Assume that manipulations of Pandas objects return copies.

Creating Series and DataFrames

Series
> pd.Series(values, index=index, name=name)
> pd.Series({'idx1': val1, 'idx2': val2})
Where values, index, and name are sequences or arrays.

Example Series:
Index   Values   Integer location
n1      'Cary'   0
n2      'Lynn'   1
n3      'Sam'    2

DataFrame
> pd.DataFrame(values, index=index, columns=col_names)
> pd.DataFrame({'col1': series1_or_seq,
                'col2': series2_or_seq})
Where values is a sequence of sequences or a 2D array.

Example DataFrame:
Index    Age   Gender
'Cary'   32    M
'Lynn'   18    F
'Sam'    26    M

Manipulating Series and DataFrames

Manipulating Columns
df.rename(columns={old_name: new_name})      Renames column
df.drop(name_or_names, axis='columns')       Drops column name

Manipulating Index
s_df.reindex(new_index)                      Conform to new index
s_df.drop(labels_to_drop)                    Drops index labels
s_df.rename(index={old_label: new_label})    Renames index labels
s_df.sort_index()                            Sorts index labels
df.set_index(column_name_or_names)
s_df.reset_index()                           Inserts index into columns, resets index to default integer index.

Manipulating Values
All row values and the index will follow:
df.sort_values(col_name, ascending=True)
df.sort_values(['X','Y'], ascending=[False, True])

Important Attributes and Methods

s_df.index                     Array-like row labels
df.columns                     Array-like column labels
s_df.values                    Numpy array, data
s_df.shape                     (n_rows, m_cols)
s.dtype, df.dtypes             Type of Series, of each column
len(s_df)                      Number of rows
s_df.head() and s_df.tail()    First/last rows
s.unique()                     Array of unique values
s_df.describe()                Summary stats
df.info()                      Memory usage

Indexing and Slicing

Use these attributes on Series and DataFrames for indexing, slicing, and assignments:
s_df.loc[]             Refers only to the index labels
s_df.iloc[]            Refers only to the integer location, similar to lists or Numpy arrays
s_df.xs(key, level)    Select rows with label key in level level of an object with MultiIndex.

Masking and Boolean Indexing

Create masks with, for example, comparisons:
mask = df['X'] < 0
Or isin, for membership mask:
mask = df['X'].isin(list_valid_values)
Use masks for indexing (must use loc):
df.loc[mask] = 0
Combine multiple masks with bitwise operators (and (&), or (|), xor (^), not (~)) and group them with parentheses:
mask = (df['X'] < 0) & (df['Y'] == 0)

Common Indexing and Slicing Patterns

rows and cols can be values, lists, Series or masks.
s_df.loc[rows]          Some rows (all columns in a DataFrame)
df.loc[:, cols_list]    All rows, some columns
df.loc[rows, cols]      Subset of rows and columns
s_df.loc[mask]          Boolean mask of rows (all columns)
df.loc[mask, cols]      Boolean mask of rows, some columns

Using [ ] on Series and DataFrames

On Series, [ ] refers to the index labels, or to a slice:
s['a']             Value
s[:2]              Series, first 2 rows

On DataFrames, [ ] refers to columns labels:
df['X']            Series
df[['X', 'Y']]     DataFrame
df['new_or_old_col'] = series_or_array

EXCEPT! with a slice or mask:
df[:2]             DataFrame, first 2 rows
df[mask]           DataFrame, rows where mask is True

NEVER CHAIN BRACKETS!
> df[mask]['X'] = 1
SettingWithCopyWarning
> df.loc[mask , 'X'] = 1
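The creation, masking, and indexing patterns on this sheet can be combined in a short sketch. It reuses the Age/Gender example table; the mask condition and the value written through the mask are illustrative choices, not part of the original sheet.

```python
import pandas as pd

# Dictionary of columns plus an explicit row index.
df = pd.DataFrame(
    {"Age": [32, 18, 26], "Gender": ["M", "F", "M"]},
    index=["Cary", "Lynn", "Sam"],
)

# Two comparisons combined with & and grouped with parentheses.
mask = (df["Age"] > 20) & (df["Gender"] == "M")

# Rows selected by mask, column selected by label, in one .loc call.
ages = df.loc[mask, "Age"]

# Assignment goes through a single .loc call instead of df[mask]["Age"] = ...
df.loc[mask, "Age"] = 0

# Positional access: second row, first column.
second_age = df.iloc[1, 0]
print(df)
```

Doing the selection and the assignment in one `.loc` call is what avoids the chained-bracket pitfall flagged above.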

Computation with Series and DataFrames

Pandas objects do not behave exactly like Numpy arrays. They follow three main rules (see below). Aligning objects on the index (or columns) before calculations might be the most important difference. There are built-in methods for most common statistical operations, such as mean or sum, and they apply across one dimension at a time. To apply custom functions, use one of three methods to do tablewise (pipe), row or column-wise (apply) or elementwise (applymap) operations.

The 3 Rules of Binary Operations

Rule 1: Operations between multiple Pandas objects implement auto-alignment based on index first.
Rule 2: Mathematical operators (+ - * / exp, log, ...) apply element by element, on the values.
Rule 3: Reduction operations (mean, std, skew, kurt, sum, prod, ...) are applied column by column by default.

Rule 1: Alignment First

> s1 + s2
> s1.add(s2, fill_value=0)

     s1    s2    s1 + s2    s1.add(s2, fill_value=0)
a    1           NaN        1
b    2     4     6          6
c          5     NaN        5

Use add, sub, mul, div, to set fill value.

Rule 2: Element-By-Element Mathematical Operations

df + 1    df.abs()    np.log(df)

Example, applied in sequence to a DataFrame with rows a, b, c and columns X, Y in which every value is -2: df + 1 gives -1 everywhere, then df.abs() gives 1, then np.log(df) gives 0.

Rule 3: Reduction Operations

>>> df.sum()
Returns a Series with one value per column (X, Y).
Operates across rows by default (axis=0, or axis='rows'). Operate across columns with axis=1 or axis='columns'.

count: Number of non-null observations
sum: Sum of values
mean: Mean of values
mad: Mean absolute deviation
median: Arithmetic median of values
min: Minimum
max: Maximum
mode: Mode
prod: Product of values
std: Bessel-corrected sample standard deviation
var: Unbiased variance
sem: Standard error of the mean
skew: Sample skewness (3rd moment)
kurt: Sample kurtosis (4th moment)
quantile: Sample quantile (Value at %)
value_counts: Count of unique values

Apply a Function to Each Value

Apply a function to each value in a Series or DataFrame:
s.apply(value_to_value)         Series
df.applymap(value_to_value)     DataFrame

Apply a Function to Each Series

Apply series_to_* function to every column by default (across rows):
df.apply(series_to_series)      DataFrame
df.apply(series_to_value)       Series
To apply the function to every row (across columns), set axis=1:
df.apply(series_to_series, axis=1)

Apply a Function to a DataFrame

Apply a function that receives a DataFrame and returns a DataFrame, a Series, or a single value:
df.pipe(df_to_df)               DataFrame
df.pipe(df_to_series)           Series
df.pipe(df_to_value)            Value

What Happens with Missing Values?

Missing values are represented by NaN (not a number) or NaT (not a time).
• They propagate in operations across Pandas objects (1 + NaN → NaN).
• They are ignored in a "sensible" way in computations: they equal 0 in sum, they're ignored in mean, etc.
• They stay NaN with mathematical operations (np.log(NaN) → NaN).
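The three rules can be checked on the small objects shown on this sheet. This is a sketch with the same values as the figures (s1, s2, and a DataFrame filled with -2); the `spread` function at the end is a made-up example of a custom column-wise reduction.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, 2], index=["a", "b"])
s2 = pd.Series([4, 5], index=["b", "c"])

# Rule 1: labels are aligned first; a label missing on one side gives NaN.
aligned = s1 + s2
# The method form lets a missing side count as 0 instead.
filled = s1.add(s2, fill_value=0)

df = pd.DataFrame({"X": [-2, -2, -2], "Y": [-2, -2, -2]}, index=["a", "b", "c"])

# Rule 2: operators and Numpy functions act element by element.
logged = np.log((df + 1).abs())

# Rule 3: reductions return one value per column by default.
col_sums = df.sum()
row_sums = df.sum(axis="columns")

# Custom reduction applied to each column (each column arrives as a Series).
spread = df.apply(lambda col: col.max() - col.min())
print(aligned, filled, col_sums, sep="\n")
```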

Plotting with Pandas Series and DataFrames

Pandas uses Matplotlib to generate figures. Once a figure is generated with Pandas, all of Matplotlib's functions can be used to modify the title, labels, legend, etc. In a Jupyter notebook, all plotting calls for a given plot should be in the same cell.

Setup

Import packages:
> import pandas as pd
> import matplotlib.pyplot as plt
Execute this at IPython prompt to display figures in new windows:
> %matplotlib
Use this in Jupyter notebooks to display static images inline:
> %matplotlib inline
Use this in Jupyter notebooks to display zoomable images inline:
> %matplotlib notebook

Parts of a Figure

An Axes object is what we think of as a “plot”. It has a title and two Axis objects that define data limits. Each Axis can have a label. There can be multiple Axes objects in a Figure.

Plotting with Pandas Objects

Series
With a Series, Pandas plots values against the index:
> ax = s.plot()

DataFrame
With a DataFrame, Pandas creates one line per column:
> ax = df.plot()
When plotting the results of complex manipulations with groupby, it's often useful to stack/unstack the resulting DataFrame to fit the one-line-per-column assumption (see Data Structures cheatsheet).

Labels
Use Matplotlib to override or add annotations:
> ax.set_xlabel('Time')
> ax.set_ylabel('Value')
> ax.set_title('Experiment A')
Pass labels if you want to override the column names and set the legend location:
> ax.legend(labels, loc='best')

Useful Arguments to plot

• subplots=True: one subplot per column, instead of one line
• figsize: set figure size, in inches
• x and y: plot one column against another

Kinds of Plots

df.plot.scatter(x, y)    df.plot.bar()    df.plot.hist()    df.plot.box()
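The plotting calls above fit together as in the following sketch. It assumes Matplotlib is installed and selects the non-interactive "Agg" backend so that it runs without a display; the data values and the legend labels "first" and "second" are made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend, no window is opened

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"X": [1, 3, 2], "Y": [2, 1, 4]}, index=[0, 1, 2])

# One line per column, drawn on a single Axes object.
ax = df.plot(figsize=(6, 4))
ax.set_xlabel("Time")
ax.set_ylabel("Value")
ax.set_title("Experiment A")
# Override the column names shown in the legend.
ax.legend(["first", "second"], loc="best")

# One subplot per column instead of one line per column.
axes = df.plot(subplots=True)

plt.close("all")
```

In a notebook, the backend line would be replaced by one of the `%matplotlib` magics listed under Setup.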

Manipulating Dates and Times

Use a Datetime index for easy time-based indexing and slicing, as well as for powerful resampling and data alignment.

Timestamps vs Periods

Pandas makes a distinction between timestamps, called Datetime objects, and time spans, called Period objects.

[Figure: Timestamps shown as points on a timeline at 2016-01-01, 2016-01-02, 2016-01-03, and 2016-01-04; Periods shown as the spans labeled 2016-01-01, 2016-01-02, and 2016-01-03.]

Converting Objects to Time Objects

Convert different types, for example strings, lists, or arrays to Datetime with:
> pd.to_datetime(value)
Convert timestamps to time spans: set period “duration” with frequency offset (see below).
> date_obj.to_period(freq=freq_offset)

Creating Ranges of Timestamps

> pd.date_range(start=None, end=None,
                periods=None, freq=offset,
                tz='Europe/London')
Specify either a start or end date, or both. Set number of "steps" with periods. Set "step size" with freq; see "Frequency Offsets" for acceptable values. Specify time zones with tz.

Creating Ranges of Periods

> pd.period_range(start=None, end=None,
                  periods=None, freq=offset)

Save Yourself Some Pain: Use ISO 8601 Format

When entering dates, to be consistent and to lower the risk of error or confusion, use ISO 8601 format (YYYY-MM-DD):
>>> pd.to_datetime('12/01/2000') # 1st December
Timestamp('2000-12-01 00:00:00')
>>> pd.to_datetime('13/01/2000') # 13th January!
Timestamp('2000-01-13 00:00:00')
>>> pd.to_datetime('2000-01-13') # 13th January
Timestamp('2000-01-13 00:00:00')

Frequency Offsets

Used by date_range, period_range and resample:
• B: Business day
• D: Calendar day
• W: Weekly
• M: Month end
• MS: Month start
• BM: Business month end
• Q: Quarter end
• A: Year end
• AS: Year start
• H: Hourly
• T, min: Minutely
• S: Secondly
• L, ms: Milliseconds
• U, us: Microseconds
• N: Nanoseconds
For more: Look up "Pandas Offset Aliases" or check out pandas.tseries.offsets, and pandas.tseries.holiday modules.

Resampling

> s_df.resample(freq_offset).mean()
resample returns a groupby-like object that must be aggregated with mean, sum, std, apply, etc. (See also the Split/Apply/Combine cheat sheet.)
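A short sketch ties the date tools together: a daily range, label-based slicing on the Datetime index, resampling, and conversion to a Period. The six-day series and its values are made up for illustration, and only day-based offsets ('D', '2D') are used.

```python
import pandas as pd

# Six daily timestamps starting on 2016-01-01.
idx = pd.date_range(start="2016-01-01", periods=6, freq="D")
s = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Time-based slicing with ISO 8601 strings (both ends are included).
first_days = s.loc["2016-01-02":"2016-01-03"]

# resample groups the rows into 2-day bins; the bins must then be aggregated.
two_day_mean = s.resample("2D").mean()

# Convert a timestamp (a point in time) into a period (a span of time).
period = idx[0].to_period(freq="D")

# ISO 8601 input is parsed without day/month ambiguity.
parsed = pd.to_datetime("2000-01-13")
print(two_day_mean)
```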

Vectorized String Operations

Pandas implements vectorized string operations named after Python's string methods. Access them through the str attribute of string Series.

Some String Methods

> s.str.lower()
> s.str.isupper()
> s.str.len()
> s.str.strip()
> s.str.normalize()
and more…

Index by character position:
> s.str[0]

True if regular expression pattern or string in Series:
> s.str.contains(str_or_pattern)

Splitting and Replacing

split returns a Series of lists:
> s.str.split()
Access an element of each list with get:
> s.str.split(char).str.get(1)
Return a DataFrame instead of a list:
> s.str.split(expand=True)
Find and replace with string or regular expressions:
> s.str.replace(str_or_regex, new)
> s.str.extract(regex)
> s.str.findall(regex)
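The string methods chain naturally, as this sketch shows on a made-up Series of animal names. `regex=True` is passed explicitly to `str.replace` because the default for that argument has differed between pandas versions.

```python
import pandas as pd

s = pd.Series(["  Red Panda", "Giant Panda ", "red fox"])

# Strip surrounding whitespace, then lowercase every element.
clean = s.str.strip().str.lower()

# Split on a space and take the first piece of each list.
first_word = clean.str.split(" ").str.get(0)

# Same split, returned as a DataFrame with one column per piece.
parts = clean.str.split(" ", expand=True)

# Boolean Series: does each element contain the substring?
is_panda = clean.str.contains("panda")

# Regular-expression replace that swaps the two words.
swapped = clean.str.replace(r"^(\w+) (\w+)$", r"\2 \1", regex=True)
print(swapped)
```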

Combining DataFrames

Tools for combining Series and DataFrames together, with SQL-type joins and concatenation. Use join if merging on indices, otherwise use merge.

Concatenating DataFrames

> pd.concat(df_list)
“Stacks” DataFrames on top of each other.
Set ignore_index=True, to replace index with RangeIndex.
Note: Faster than repeated df.append(other_df).

Merge on Column Values

> pd.merge(left, right, how='inner', on='id')
Ignores index, unless on=None. See value of how below.
Use on if merging on same column in both DataFrames, otherwise use left_on, right_on.

Join on Index

> df.join(other)
Merge DataFrames on indexes. Set on=columns to join on index of other and on columns of df. join uses pd.merge under the covers.

Merge Types: The how Keyword

The examples merge left and right with left_on='X', right_on='Y'.

left:
   long  X
0  aaaa  a
1  bbbb  b

right:
   Y  short
0  b  bb
1  c  cc

how="outer"
   long  X    Y    short
0  aaaa  a    NaN  NaN
1  bbbb  b    b    bb
2  NaN   NaN  c    cc

how="inner"
   long  X  Y  short
0  bbbb  b  b  bb

how="left"
   long  X  Y    short
0  aaaa  a  NaN  NaN
1  bbbb  b  b    bb

how="right"
   long  X    Y  short
0  bbbb  b    b  bb
1  NaN   NaN  c  cc
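The four merge types can be reproduced with the left and right tables from the figure. This is a sketch; collecting the results in a dictionary keyed by the `how` value is just a convenience for the example.

```python
import pandas as pd

left = pd.DataFrame({"long": ["aaaa", "bbbb"], "X": ["a", "b"]})
right = pd.DataFrame({"Y": ["b", "c"], "short": ["bb", "cc"]})

# Merge on differently named key columns, once per join type.
merged = {
    how: pd.merge(left, right, how=how, left_on="X", right_on="Y")
    for how in ["outer", "inner", "left", "right"]
}

# Stack two DataFrames vertically and renumber the rows.
stacked = pd.concat([left, left], ignore_index=True)

for how, result in merged.items():
    print(how, len(result))
```

Rows without a match on the other side are kept or dropped depending on `how`, and the unmatched cells are filled with NaN.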

Cleaning Data with Missing Values

Pandas represents missing values as NaN (Not a Number). It comes from Numpy and is of type float64. Pandas has many methods to find and replace missing values.

Find Missing Values

> s_df.isnull() or > pd.isnull(obj)
> s_df.notnull() or > pd.notnull(obj)

Replacing Missing Values

s_df.loc[s_df.isnull()] = 0          Use mask to replace NaN
s_df.interpolate(method='linear')    Interpolate using different methods
s_df.fillna(method='ffill')          Fill forward (last valid value)
s_df.fillna(method='bfill')          Or backward (next valid value)
s_df.dropna(how='any')               Drop rows if any value is NaN
s_df.dropna(how='all')               Drop rows if all values are NaN
s_df.dropna(how='all', axis=1)       Drop across columns instead of rows
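The find-and-replace methods can be compared side by side on one small Series. The values are made up for illustration, and `ffill()` is used as the method form of forward filling.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan])

# Count the missing values with a boolean mask.
n_missing = int(s.isnull().sum())

# Fill the gap between 1.0 and 3.0 by linear interpolation.
interpolated = s.interpolate(method="linear")

# Carry the last valid value forward.
forward = s.ffill()

# Remove missing entries altogether.
dropped = s.dropna()

# Replace NaN through a mask, working on a copy to keep s unchanged.
zeroed = s.copy()
zeroed.loc[zeroed.isnull()] = 0
print(interpolated)
```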

Split / Apply / Combine with DataFrames

1. Split the data based on some criteria.
2. Apply a function to each group to aggregate, transform, or filter.
3. Combine the results.
The apply and combine steps are typically done together in Pandas.

Split/Apply/Combine

[Figure: a DataFrame with columns X and Y is split into one group per value of X (a with Y values 1 and 2, b with 3 and 1, c with 2 and 2); a function is applied to each group; the results are combined into one table indexed by X (a 1.5, b 2, c 2).]

Split:
• Groupby
• Window Functions
Apply and Combine:
• Apply
• Group-specific transformations
• Aggregation
• Group-specific Filtering

Split: Group By

Group by a single column:
> g = df.groupby(col_name)
Grouping with list of column names creates DataFrame with MultiIndex (see “Reshaping DataFrames and Pivot Tables” cheatsheet):
> g = df.groupby(list_col_names)
Pass a function to group based on the index:
> g = df.groupby(function)

[Figure: df.groupby('X') splits rows 0 to 4 into group a (rows 0 and 2), group b (rows 1 and 3), and group c (row 4).]

Split: What’s a GroupBy Object?

It keeps track of which rows are part of which group.
> g.groups
Dictionary, where keys are group names, and values are indices of rows in a given group.
It is iterable:
> for group, sub_df in g:
      ...

Apply/Combine: General Tool: apply

More general than agg, transform, and filter. Can aggregate, transform or filter. The resulting dimensions can change, for example:
> g.apply(lambda x: x.describe())

Apply/Combine: Transformation

The shape and the index do not change.
> g.transform(df_to_df)
Example, normalization:
> def normalize(grp):
      return (grp - grp.mean()) / grp.var()
> g.transform(normalize)

[Figure: g.transform(…) maps the groups a (rows 0, 2), b (rows 1, 3), and c (row 4) to a result that keeps all five rows in their original order.]

Apply/Combine: Aggregation

Perform computations on each group. The shape changes; the categories in the grouping columns become the index. Can use built-in aggregation methods: mean, sum, size, count, std, var, sem, describe, first, last, nth, min, max, for example:
> g.mean()
… or aggregate using custom function:
> g.agg(series_to_value)
… or aggregate with multiple functions at once:
> g.agg([s_to_v1, s_to_v2])
… or use different functions on different columns.
> g.agg({'Y': s_to_v1, 'Z': s_to_v2})

[Figure: g.agg(…) reduces the groups a, b, and c to one row each, indexed by a, b, c, with columns Y and Z.]

Apply/Combine: Filtering

Returns a group only if condition is true.
> g.filter(lambda x: len(x)>1)

[Figure: g.filter(…) keeps rows 0 to 3 (groups a and b) and drops row 4 (group c, a single row).]

Other Groupby-Like Operations: Window Functions

• resample, rolling, and ewm (exponential weighted function) methods behave like GroupBy objects. They keep track of which row is in which “group”. Results must be aggregated with sum, mean, count, etc. (see Aggregation).
• resample is often used before rolling, expanding, and ewm when using a DateTime index.
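The aggregation, transformation, and filtering steps can be compared on one small table. This sketch uses the X/Y grouping from the first figure plus a made-up Z column; the transformation shown is a simple group-wise centering rather than the normalize function above.

```python
import pandas as pd

df = pd.DataFrame({
    "X": ["a", "b", "a", "b", "c"],
    "Y": [1, 3, 2, 1, 2],
    "Z": [10, 30, 20, 10, 20],  # hypothetical second value column
})
g = df.groupby("X")

# Aggregation: one row per group, group labels become the index.
means = g.mean()

# Different aggregation function for each column.
summary = g.agg({"Y": "sum", "Z": "max"})

# Transformation: same number of rows and same index as df.
centered = g.transform(lambda col: col - col.mean())

# Filtering: keep only the rows of groups with more than one row.
multi = g.filter(lambda grp: len(grp) > 1)

# A GroupBy object is iterable as (group name, sub-DataFrame) pairs.
sizes = {name: len(sub_df) for name, sub_df in g}
print(means)
```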

Reshaping DataFrames and Pivot Tables

Tools for reshaping DataFrames from the wide to the long format and back. The long format can be tidy, which means that "each variable is a column, each observation is a row" [1]. Tidy data is easier to filter, aggregate, transform, sort, and pivot. Reshaping operations often produce multi-level indices or columns, which can be sliced and indexed.
[1] Hadley Wickham (2014) "Tidy Data", http://dx.doi.org/10.18637/jss.v059.i10

MultiIndex: A Multi-Level Hierarchical Index

Often created as a result of:
> df.groupby(list_of_columns)
> df.set_index(list_of_columns)
Contiguous labels are displayed together but apply to each row. The concept is similar to multi-level columns.
A MultiIndex allows indexing and slicing one or multiple levels at once. Using the Long example below:
long.loc[1900]                     All 1900 rows
long.loc[(1900, 'March')]          value 2
long.xs('March', level='Month')    All March rows
Simpler than using boolean indexing, for example:
> long[long.Month == 'March']

Long to Wide Format and Back with stack() and unstack()

Pivot column level to index, i.e. "stacking the columns" (wide to long):
> df.stack()
Pivot index level to columns, "unstack the columns" (long to wide):
> df.unstack()
If multiple indices or column levels, use level number or name to stack/unstack:
> df.unstack(1) or > df.unstack('Month')
A common use case for unstacking, plotting group data vs index after groupby:
> (df.groupby(['A', 'B'])['relevant'].mean()
     .unstack().plot())

Wide:
Year  Jan.  Feb.  Mar.
1900  1     7     2
2000  4     3     9

Long (Stack turns Wide into Long, Unstack turns Long into Wide):
Year  Month  Value
1900  Jan.   1
      Feb    7
      Mar.   2
2000  Jan.   4
      Feb    3
      Mar.   9

Pivot Tables

> pd.pivot_table(df,
      index=cols,         (keys to group by for index)
      columns=cols2,      (keys to group by for columns)
      values=cols3,       (columns to aggregate)
      aggfunc='mean')     (what to do with repeated values)
Omitting index, columns, or values will use all remaining columns of df.
You can "pivot" a table manually using groupby, stack and unstack.

Example input:
   Recently updated  Number of stations  Continent code
1  FALSE             1                   EU
2  FALSE             1                   EU
3  FALSE             1                   EU
4  TRUE              1                   EU
5  FALSE             1                   AN
6  TRUE              1                   AN
7  TRUE              1                   AN

pd.pivot_table(df,
    index="Recently updated",
    columns="continent code",
    values="Number of Stations",
    aggfunc=np.sum)

Result:
Continent code     AN  EU
Recently updated
FALSE              1   3
TRUE               2   1

df.pivot() vs pd.pivot_table

df.pivot()          Does not deal with repeated values in index. It's a declarative form of stack and unstack.
pd.pivot_table()    Use if you have repeated values in index (specify aggfunc argument).

From Wide to Long with melt

Specify which columns are identifiers (id_vars, values will be repeated for each row) and which are "measured variables" (value_vars, will become values in variable column. All remaining columns by default).

pd.melt(df, id_vars=id_cols, value_vars=value_columns)

pd.melt(team, id_vars=['Color'],
        value_vars=['A', 'B', 'C'],
        var_name='Team', value_name='Score')

team:
   Color  A  B  C
0  Red    1  3  4
1  Blue   2  -  6

After melt:
   Color  Team  Score
0  Red    A     1
1  Blue   A     2
2  Red    B     3
3  Blue   B     -
4  Red    C     4
5  Blue   C     6
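The reshaping tools can be chained into a round trip. This sketch starts from the team table used in the melt example; the Blue/B score is shown as '-' in the sheet, so the value 5 used here is a made-up placeholder that keeps every cell numeric.

```python
import pandas as pd

team = pd.DataFrame({
    "Color": ["Red", "Blue"],
    "A": [1, 2],
    "B": [3, 5],  # 5 is a hypothetical stand-in for the missing Blue/B score
    "C": [4, 6],
})

# Wide to long: one row per (Color, Team) pair.
long = pd.melt(team, id_vars=["Color"], value_vars=["A", "B", "C"],
               var_name="Team", value_name="Score")

# Long to wide again: works because each (Color, Team) pair is unique.
wide = long.pivot(index="Color", columns="Team", values="Score")

# pivot_table aggregates repeated index values, here the three scores per color.
totals = pd.pivot_table(long, index="Color", values="Score", aggfunc="sum")

# stack moves the column level into the index; unstack moves it back.
stacked = wide.stack()
roundtrip = stacked.unstack()
print(long)
```

`stacked` is a Series with a two-level MultiIndex (Color, Team), so single values can be selected with a tuple such as `stacked.loc[("Red", "A")]`.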
