DataFrame(
    data=None,
    index: vendored_pandas_typing.Axes | None = None,
    columns: vendored_pandas_typing.Axes | None = None,
    dtype: typing.Optional[
        bigframes.dtypes.DtypeString | bigframes.dtypes.Dtype
    ] = None,
    copy: typing.Optional[bool] = None,
    *,
    session: typing.Optional[bigframes.session.Session] = None
)
Two-dimensional, size-mutable, potentially heterogeneous tabular data.
The data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects. The primary pandas data structure.
Properties
T
The transpose of the DataFrame.
All columns must be the same dtype (numerics can be coerced to a common supertype).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df
   col1  col2
0     1     3
1     2     4
<BLANKLINE>
[2 rows x 2 columns]
>>> df.T
      0  1
col1  1  2
col2  3  4
<BLANKLINE>
[2 rows x 2 columns]
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The transposed DataFrame. | 
at
Access a single value for a row/column label pair.
Examples:
>>> import bigframes.pandas as bpd
>>> df = bpd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
...   index=[4, 5, 6], columns=['A', 'B', 'C'])
>>> bpd.options.display.progress_bar = None
>>> df
    A   B   C
4   0   2   3
5   0   4   1
6  10  20  30
<BLANKLINE>
[3 rows x 3 columns]
Get value at specified row/column pair
>>> df.at[4, 'B']
2
Get value within a series
>>> df.loc[5].at['B']
4
| Returns | |
|---|---|
| Type | Description | 
| bigframes.core.indexers.AtDataFrameIndexer | Indexers object. | 
axes
Return a list representing the axes of the DataFrame.
It has the row axis labels and column axis labels as the only members. They are returned in that order.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.axes[1:]
[Index(['col1', 'col2'], dtype='object')]
bqclient
BigQuery REST API Client the DataFrame uses for operations.
columns
The column labels of the DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can access the column labels of a DataFrame via the columns property.
>>> df = bpd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],
...                     'Age': [25, 30, 35],
...                     'Location': ['Seattle', 'New York', 'Kona']},
...                    index=([10, 20, 30]))
>>> df
      Name  Age  Location
10   Alice   25   Seattle
20     Bob   30  New York
30  Aritra   35      Kona
<BLANKLINE>
[3 rows x 3 columns]
>>> df.columns
Index(['Name', 'Age', 'Location'], dtype='object')
You can also set new labels for columns.
>>> df.columns = ["NewName", "NewAge", "NewLocation"]
>>> df
   NewName  NewAge NewLocation
10   Alice      25     Seattle
20     Bob      30    New York
30  Aritra      35        Kona
<BLANKLINE>
[3 rows x 3 columns]
>>> df.columns
Index(['NewName', 'NewAge', 'NewLocation'], dtype='object')
dtypes
Return the dtypes in the DataFrame.
This returns a Series with the data type of each column. The result's index is the original DataFrame's columns. Columns with mixed types aren't supported yet in BigQuery DataFrames.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'float': [1.0], 'int': [1], 'string': ['foo']})
>>> df.dtypes
float             Float64
int                 Int64
string    string[pyarrow]
dtype: object
empty
Indicates whether Series/DataFrame is empty.
True if Series/DataFrame is entirely empty (no items), meaning any of the axes are of length 0.
| Returns | |
|---|---|
| Type | Description | 
| bool | If Series/DataFrame is empty, return True, if not return False. | 
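The rule above can be illustrated with a small sketch. It uses plain pandas instead of BigQuery DataFrames, which is an assumption made so the snippet runs without a BigQuery session; the `empty` property is expected to behave the same way. The variable names are illustrative only. A frame that holds only missing values is still non-empty, because none of its axes has length 0.

```python
import pandas as pd

# Stand-in: plain pandas, assumed to mirror bigframes.pandas for this property.
no_rows = pd.DataFrame({"A": []})       # one column, zero rows
only_na = pd.DataFrame({"A": [None]})   # one row holding a missing value

print(no_rows.empty)  # the row axis has length 0
print(only_na.empty)  # a missing value still counts as an item
```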
iat
Access a single value for a row/column pair by integer position.
Examples:
>>> import bigframes.pandas as bpd
>>> df = bpd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
...                    columns=['A', 'B', 'C'])
>>> bpd.options.display.progress_bar = None
>>> df
    A   B   C
0   0   2   3
1   0   4   1
2  10  20  30
<BLANKLINE>
[3 rows x 3 columns]
Get value at specified row/column pair
>>> df.iat[1, 2]
1
Get value within a series
>>> df.loc[0].iat[1]
2
| Returns | |
|---|---|
| Type | Description | 
| bigframes.core.indexers.IatDataFrameIndexer | Indexers object. | 
iloc
Purely integer-location based indexing for selection by position.
| Returns | |
|---|---|
| Type | Description | 
| bigframes.core.indexers.ILocDataFrameIndexer | Purely integer-location Indexers. | 
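A minimal sketch of position-based selection follows. It is written against plain pandas, on the assumption that BigQuery DataFrames follows the same `iloc` semantics for these access patterns; the frame and variable names are illustrative. Positions are counted from 0 and ignore the index labels.

```python
import pandas as pd

# Stand-in: plain pandas, assumed to mirror bigframes.pandas indexing semantics.
df = pd.DataFrame(
    {"A": [0, 0, 10], "B": [2, 4, 20], "C": [3, 1, 30]},
    index=[4, 5, 6],
)

first_row = df.iloc[0]        # Series for the row at position 0 (label 4)
top_left = df.iloc[0:2, 0:2]  # rows at positions 0-1, columns at positions 0-1
cell = df.iloc[2, 1]          # scalar at row position 2, column position 1

print(first_row)
print(top_left)
print(cell)
```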
index
The index (row labels) of the DataFrame.
The index of a DataFrame is a series of labels that identify each row. The labels can be integers, strings, or any other hashable type. The index is used for label-based access and alignment, and can be accessed or modified using this attribute.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can access the index of a DataFrame via the index property.
>>> df = bpd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],
...                     'Age': [25, 30, 35],
...                     'Location': ['Seattle', 'New York', 'Kona']},
...                    index=([10, 20, 30]))
>>> df
      Name  Age  Location
10   Alice   25   Seattle
20     Bob   30  New York
30  Aritra   35      Kona
<BLANKLINE>
[3 rows x 3 columns]
>>> df.index # doctest: +ELLIPSIS
Index([10, 20, 30], dtype='Int64')
>>> df.index.values
array([10, 20, 30])
You can set a new index with set_index and see the change reflected in the
index property.
>>> df1 = df.set_index(["Name", "Location"])
>>> df1
                 Age
Name   Location
Alice  Seattle    25
Bob    New York   30
Aritra Kona       35
<BLANKLINE>
[3 rows x 1 columns]
>>> df1.index # doctest: +ELLIPSIS
MultiIndex([( 'Alice',  'Seattle'),
            (   'Bob', 'New York'),
            ('Aritra',     'Kona')],
           names=['Name', 'Location'])
>>> df1.index.values
array([('Alice', 'Seattle'), ('Bob', 'New York'), ('Aritra', 'Kona')],
      dtype=object)
| Returns | |
|---|---|
| Type | Description | 
| Index | The index object of the DataFrame. | 
loc
Access a group of rows and columns by label(s) or a boolean array.
| Returns | |
|---|---|
| Type | Description | 
| bigframes.core.indexers.LocDataFrameIndexer | Indexers object. | 
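The following sketch shows the three access patterns named in the description: a single label, lists of labels, and a boolean mask. It uses plain pandas as a stand-in, on the assumption that BigQuery DataFrames follows the same `loc` semantics here; the frame and variable names are illustrative.

```python
import pandas as pd

# Stand-in: plain pandas, assumed to mirror bigframes.pandas indexing semantics.
df = pd.DataFrame(
    {"A": [0, 0, 10], "B": [2, 4, 20], "C": [3, 1, 30]},
    index=[4, 5, 6],
)

row = df.loc[5]                      # the row labeled 5, returned as a Series
subset = df.loc[[4, 6], ["A", "C"]]  # rows labeled 4 and 6, columns A and C
matching = df.loc[df["B"] > 3]       # rows where the boolean mask is True

print(row)
print(subset)
print(matching)
```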
ndim
Return an int representing the number of axes / array dimensions.
| Returns | |
|---|---|
| Type | Description | 
| int | Return 1 if Series. Otherwise return 2 if DataFrame. | 
plot
Make plots of DataFrames.
| Returns | |
|---|---|
| Type | Description | 
| bigframes.operations.plotting.PlotAccessor | An accessor making plots. | 
query_job
BigQuery job metadata for the most recent query.
shape
Return a tuple representing the dimensionality of the DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2, 3],
...                     'col2': [4, 5, 6]})
>>> df.shape
(3, 2)
size
Return an int representing the number of elements in this object.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series({'a': 1, 'b': 2, 'c': 3})
>>> s.size
3
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.size
4
| Returns | |
|---|---|
| Type | Description | 
| int | Return the number of rows if Series. Otherwise return the number of rows times number of columns if DataFrame. | 
sql
Compiles this DataFrame's expression tree to SQL.
values
Return the values of DataFrame in the form of a NumPy array.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.values
array([[1, 3],
       [2, 4]], dtype=object)
Methods
__add__
__add__(other) -> bigframes.dataframe.DataFrame
Get addition of DataFrame and other, column-wise, using arithmetic
operator +.
Equivalent to DataFrame.add(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...         'height': [1.5, 2.6],
...         'weight': [500, 800]
...     },
...     index=['elk', 'moose'])
>>> df
       height  weight
elk       1.5     500
moose     2.6     800
<BLANKLINE>
[2 rows x 2 columns]
Adding a scalar affects all rows and columns.
>>> df + 1.5
       height  weight
elk       3.0   501.5
moose     4.1   801.5
<BLANKLINE>
[2 rows x 2 columns]
You can add another DataFrame with index and columns aligned.
>>> delta = bpd.DataFrame({
...         'height': [0.5, 0.9],
...         'weight': [50, 80]
...     },
...     index=['elk', 'moose'])
>>> df + delta
       height  weight
elk       2.0     550
moose     3.5     880
<BLANKLINE>
[2 rows x 2 columns]
Adding a DataFrame whose index or columns are misaligned results in missing values (<NA>) for the unmatched labels.
>>> delta = bpd.DataFrame({
...         'depth': [0.5, 0.9, 1.0],
...         'weight': [50, 80, 100]
...     },
...     index=['elk', 'moose', 'bison'])
>>> df + delta
       depth  height  weight
elk     <NA>    <NA>     550
moose   <NA>    <NA>     880
bison   <NA>    <NA>    <NA>
<BLANKLINE>
[3 rows x 3 columns]
| Parameter | |
|---|---|
| Name | Description | 
| other | scalar or DataFrame. Object to be added to the DataFrame. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of adding other to DataFrame. | 
__array__
__array__(dtype=None) -> numpy.ndarray
Returns the rows as a NumPy array.
Equivalent to DataFrame.to_numpy(dtype).
Users should not call this directly. Rather, it is invoked by
numpy.array and numpy.asarray.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> import numpy as np
>>> df = bpd.DataFrame({"a": [1, 2, 3], "b": [11, 22, 33]})
>>> np.array(df)
array([[1, 11],
       [2, 22],
       [3, 33]], dtype=object)
>>> np.asarray(df)
array([[1, 11],
       [2, 22],
       [3, 33]], dtype=object)
| Parameter | |
|---|---|
| Name | Description | 
| dtype | str or numpy.dtype, optional. The dtype to use for the resulting NumPy array. By default, the dtype is inferred from the data. | 
| Returns | |
|---|---|
| Type | Description | 
| numpy.ndarray | The rows in the DataFrame converted to a numpy.ndarray with the specified dtype. | 
__array_ufunc__
__array_ufunc__(
    ufunc: numpy.ufunc, method: str, *inputs, **kwargs
) -> bigframes.dataframe.DataFrame
Used to support numpy ufuncs. See: https://numpy.org/doc/stable/reference/ufuncs.html
__bool__
__bool__()
Returns the truth value of the object.
__eq__
__eq__(other) -> bigframes.dataframe.DataFrame
Check equality of DataFrame and other, element-wise, using logical
operator ==.
Equivalent to DataFrame.eq(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...         'a': [0, 3, 4],
...         'b': [360, 0, 180]
...      })
>>> df == 0
       a      b
0   True  False
1  False   True
2  False  False
<BLANKLINE>
[3 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| other | scalar or DataFrame. Object to be compared to the DataFrame for equality. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of comparing other to DataFrame. | 
__floordiv__
__floordiv__(other)
Get integer division of DataFrame by other, using arithmetic operator //.
Equivalent to DataFrame.floordiv(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can divide by a scalar:
>>> df = bpd.DataFrame({"a": [15, 15, 15], "b": [30, 30, 30]})
>>> df // 2
   a   b
0  7  15
1  7  15
2  7  15
<BLANKLINE>
[3 rows x 2 columns]
You can also divide by another DataFrame with index and column labels aligned:
>>> divisor = bpd.DataFrame({"a": [2, 3, 4], "b": [5, 6, 7]})
>>> df // divisor
   a  b
0  7  6
1  5  5
2  3  4
<BLANKLINE>
[3 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| other | scalar or DataFrame. Object to divide the DataFrame by. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of the integer division. | 
__ge__
__ge__(other) -> bigframes.dataframe.DataFrame
Check whether DataFrame is greater than or equal to other, element-wise,
using logical operator >=.
Equivalent to DataFrame.ge(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...         'a': [0, -1, 1],
...         'b': [1, 0, -1]
...      })
>>> df >= 0
       a      b
0   True   True
1  False   True
2   True  False
<BLANKLINE>
[3 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| other | scalar or DataFrame. Object to be compared to the DataFrame. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of comparing other to DataFrame. | 
__getitem__
__getitem__(
    key: typing.Union[
        typing.Hashable,
        typing.Sequence[typing.Hashable],
        pandas.core.indexes.base.Index,
        bigframes.series.Series,
    ]
)
Gets the specified column(s) from the DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     "name" : ["alpha", "beta", "gamma"],
...     "age": [20, 30, 40],
...     "location": ["WA", "NY", "CA"]
... })
>>> df
    name  age location
0  alpha   20       WA
1   beta   30       NY
2  gamma   40       CA
<BLANKLINE>
[3 rows x 3 columns]
You can specify a column label to retrieve the corresponding Series.
>>> df["name"]
0    alpha
1     beta
2    gamma
Name: name, dtype: string
You can specify a list of column labels to retrieve a DataFrame.
>>> df[["name", "age"]]
    name  age
0  alpha   20
1   beta   30
2  gamma   40
<BLANKLINE>
[3 rows x 2 columns]
You can specify a condition as a series of booleans to retrieve matching rows.
>>> df[df["age"] > 25]
    name  age location
1   beta   30       NY
2  gamma   40       CA
<BLANKLINE>
[2 rows x 3 columns]
You can specify a pandas Index with desired column labels.
>>> import pandas as pd
>>> df[pd.Index(["age", "location"])]
   age location
0   20       WA
1   30       NY
2   40       CA
<BLANKLINE>
[3 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| key | index. Index or list of indices. It can be a column label, a list of column labels, a Series of booleans, or a pandas Index of desired column labels. | 
| Returns | |
|---|---|
| Type | Description | 
| Series or Value | Value(s) at the requested index(es). | 
__gt__
__gt__(other) -> bigframes.dataframe.DataFrame
Check whether DataFrame is greater than other, element-wise, using logical
operator >.
Equivalent to DataFrame.gt(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...         'a': [0, -1, 1],
...         'b': [1, 0, -1]
...      })
>>> df > 0
       a      b
0  False   True
1  False  False
2   True  False
<BLANKLINE>
[3 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| other | scalar or DataFrame. Object to be compared to the DataFrame. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of comparing other to DataFrame. | 
__le__
__le__(other) -> bigframes.dataframe.DataFrame
Check whether DataFrame is less than or equal to other, element-wise,
using logical operator <=.
Equivalent to DataFrame.le(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...         'a': [0, -1, 1],
...         'b': [1, 0, -1]
...      })
>>> df <= 0
       a      b
0   True  False
1   True   True
2  False   True
<BLANKLINE>
[3 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| other | scalar or DataFrame. Object to be compared to the DataFrame. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of comparing other to DataFrame. | 
__len__
__len__()
Returns the number of rows in the DataFrame; serves the len operator.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...         'a': [0, 1, 2],
...         'b': [3, 4, 5]
...      })
>>> len(df)
3
__lt__
__lt__(other) -> bigframes.dataframe.DataFrame
Check whether DataFrame is less than other, element-wise, using logical
operator <.
Equivalent to DataFrame.lt(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...         'a': [0, -1, 1],
...         'b': [1, 0, -1]
...      })
>>> df < 0
       a      b
0  False  False
1   True  False
2  False   True
<BLANKLINE>
[3 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| other | scalar or DataFrame. Object to be compared to the DataFrame. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of comparing other to DataFrame. | 
__matmul__
__matmul__(other) -> bigframes.dataframe.DataFrame
Compute the matrix multiplication between the DataFrame and other, using
operator @.
Equivalent to DataFrame.dot(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> left = bpd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
>>> left
   0  1   2   3
0  0  1  -2  -1
1  1  1   1   1
<BLANKLINE>
[2 rows x 4 columns]
>>> right = bpd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> right
    0   1
0   0   1
1   1   2
2  -1  -1
3   2   0
<BLANKLINE>
[4 rows x 2 columns]
>>> left @ right
   0  1
0  1  4
1  2  2
<BLANKLINE>
[2 rows x 2 columns]
The operand can be a Series, in which case the result will also be a Series:
>>> right = bpd.Series([1, 2, -1, 0])
>>> left @ right
0    4
1    2
dtype: Int64
| Parameter | |
|---|---|
| Name | Description | 
| other | DataFrame or Series. Object to be matrix multiplied with the DataFrame. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame or Series | The result of the matrix multiplication. | 
__mod__
__mod__(other)
Get modulo of DataFrame with other, element-wise, using operator %.
Equivalent to DataFrame.mod(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can modulo with a scalar:
>>> df = bpd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df % 3
   a  b
0  1  1
1  2  2
2  0  0
<BLANKLINE>
[3 rows x 2 columns]
You can also modulo with another DataFrame with index and column labels aligned:
>>> modulo = bpd.DataFrame({"a": [2, 2, 2], "b": [3, 3, 3]})
>>> df % modulo
   a  b
0  1  1
1  0  2
2  1  0
<BLANKLINE>
[3 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| other | scalar or DataFrame. Object to modulo the DataFrame by. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of the modulo. | 
__mul__
__mul__(other)
Get multiplication of DataFrame with other, element-wise, using operator *.
Equivalent to DataFrame.mul(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can multiply with a scalar:
>>> df = bpd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df * 3
   a   b
0  3  12
1  6  15
2  9  18
<BLANKLINE>
[3 rows x 2 columns]
You can also multiply with another DataFrame with index and column labels aligned:
>>> df1 = bpd.DataFrame({"a": [2, 2, 2], "b": [3, 3, 3]})
>>> df * df1
   a   b
0  2  12
1  4  15
2  6  18
<BLANKLINE>
[3 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| other | scalar or DataFrame. Object to multiply with the DataFrame. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of the multiplication. | 
__ne__
__ne__(other) -> bigframes.dataframe.DataFrame
Check inequality of DataFrame and other, element-wise, using logical
operator !=.
Equivalent to DataFrame.ne(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...         'a': [0, 3, 4],
...         'b': [360, 0, 180]
...      })
>>> df != 0
       a      b
0  False   True
1   True  False
2   True   True
<BLANKLINE>
[3 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| other | scalar or DataFrame. Object to be compared to the DataFrame for inequality. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of comparing other to DataFrame. | 
__nonzero__
__nonzero__()
Returns the truth value of the object.
__pow__
__pow__(other)
Get exponentiation of DataFrame with other, element-wise, using operator
**.
Equivalent to DataFrame.pow(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can exponentiate with a scalar:
>>> df = bpd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df ** 2
   a   b
0  1  16
1  4  25
2  9  36
<BLANKLINE>
[3 rows x 2 columns]
You can also exponentiate with another DataFrame with index and column labels aligned:
>>> exponent = bpd.DataFrame({"a": [2, 2, 2], "b": [3, 3, 3]})
>>> df ** exponent
   a    b
0  1   64
1  4  125
2  9  216
<BLANKLINE>
[3 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| other | scalar or DataFrame. Object to exponentiate the DataFrame with. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of the exponentiation. | 
__radd__
__radd__(other) -> bigframes.dataframe.DataFrame
Get addition of DataFrame and other, column-wise, using arithmetic
operator +.
Equivalent to DataFrame.add(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...         'height': [1.5, 2.6],
...         'weight': [500, 800]
...     },
...     index=['elk', 'moose'])
>>> df
       height  weight
elk       1.5     500
moose     2.6     800
<BLANKLINE>
[2 rows x 2 columns]
Adding a scalar affects all rows and columns.
>>> df + 1.5
       height  weight
elk       3.0   501.5
moose     4.1   801.5
<BLANKLINE>
[2 rows x 2 columns]
You can add another DataFrame with index and columns aligned.
>>> delta = bpd.DataFrame({
...         'height': [0.5, 0.9],
...         'weight': [50, 80]
...     },
...     index=['elk', 'moose'])
>>> df + delta
       height  weight
elk       2.0     550
moose     3.5     880
<BLANKLINE>
[2 rows x 2 columns]
Adding a DataFrame whose index or columns are misaligned results in missing values (<NA>) for the unmatched labels.
>>> delta = bpd.DataFrame({
...         'depth': [0.5, 0.9, 1.0],
...         'weight': [50, 80, 100]
...     },
...     index=['elk', 'moose', 'bison'])
>>> df + delta
       depth  height  weight
elk     <NA>    <NA>     550
moose   <NA>    <NA>     880
bison   <NA>    <NA>    <NA>
<BLANKLINE>
[3 rows x 3 columns]
| Parameter | |
|---|---|
| Name | Description | 
| other | scalar or DataFrame. Object to be added to the DataFrame. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of adding other to DataFrame. | 
__repr__
__repr__() -> str
Converts a DataFrame to a string. Calls to_pandas.
Only represents the first bigframes.options.display.max_rows rows.
__rfloordiv__
__rfloordiv__(other)
Get integer division of other by DataFrame.
Equivalent to DataFrame.rfloordiv(other).
| Parameter | |
|---|---|
| Name | Description | 
| other | scalar or DataFrame. Object to divide by the DataFrame. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of the integer division. | 
__rmod__
__rmod__(other)
Get modulo of other with DataFrame, element-wise, using operator %.
Equivalent to DataFrame.rmod(other).
| Parameter | |
|---|---|
| Name | Description | 
| other | scalar or DataFrame. Object to modulo by the DataFrame. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of the modulo. | 
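The reflected methods `__rfloordiv__` and `__rmod__` are invoked when the DataFrame is the right-hand operand. A short sketch of that dispatch follows; it uses plain pandas as a stand-in, on the assumption that BigQuery DataFrames evaluates these operators the same way, and the variable names are illustrative.

```python
import pandas as pd

# Stand-in: plain pandas, assumed to mirror bigframes.pandas operator behavior.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# The int on the left cannot handle a DataFrame, so Python falls back to the
# reflected methods defined on the right-hand operand.
floordiv_result = 12 // df  # calls df.__rfloordiv__(12)
mod_result = 7 % df         # calls df.__rmod__(7)

print(floordiv_result)
print(mod_result)
```

Each element of the result is the scalar divided by (or taken modulo) the corresponding DataFrame element, not the other way around.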
__rmul__
__rmul__(other)
Get multiplication of DataFrame with other, element-wise, using operator *.
Equivalent to DataFrame.rmul(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can multiply with a scalar:
>>> df = bpd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df * 3
   a   b
0  3  12
1  6  15
2  9  18
<BLANKLINE>
[3 rows x 2 columns]
You can also multiply with another DataFrame with index and column labels aligned:
>>> df1 = bpd.DataFrame({"a": [2, 2, 2], "b": [3, 3, 3]})
>>> df * df1
   a   b
0  2  12
1  4  15
2  6  18
<BLANKLINE>
[3 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| other | scalar or DataFrame. Object to multiply the DataFrame with. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of the multiplication. | 
__rpow__
__rpow__(other)
Get exponentiation of other with DataFrame, element-wise, using operator
**.
Equivalent to DataFrame.rpow(other).
| Parameter | |
|---|---|
| Name | Description | 
| other | scalar or DataFrame. Object to exponentiate with the DataFrame. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of the exponentiation. | 
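In the reflected form, other is the base and the DataFrame supplies the exponents. The sketch below uses plain pandas as a stand-in, on the assumption that BigQuery DataFrames evaluates `**` the same way; the names are illustrative.

```python
import pandas as pd

# Stand-in: plain pandas, assumed to mirror bigframes.pandas operator behavior.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# 2 ** df calls df.__rpow__(2): 2 is raised to the power of each element.
result = 2 ** df

print(result)
```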
__rsub__
__rsub__(other)
Get subtraction of DataFrame from other, element-wise, using operator -.
Equivalent to DataFrame.rsub(other).
| Parameter | |
|---|---|
| Name | Description | 
| other | scalar or DataFrame. Object to subtract the DataFrame from. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of the subtraction. | 
__rtruediv__
__rtruediv__(other)
Get division of other by DataFrame, element-wise, using operator /.
Equivalent to DataFrame.rtruediv(other).
| Parameter | |
|---|---|
| Name | Description | 
| other | scalar or DataFrame. Object to divide by the DataFrame. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of the division. | 
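Subtraction and division are not commutative, so operand order matters for `__rsub__` and `__rtruediv__`. The sketch below shows both with a scalar on the left. It uses plain pandas as a stand-in, on the assumption that BigQuery DataFrames evaluates these operators the same way; the names are illustrative.

```python
import pandas as pd

# Stand-in: plain pandas, assumed to mirror bigframes.pandas operator behavior.
df = pd.DataFrame({"a": [1, 2, 4], "b": [5, 8, 10]})

sub_result = 10 - df  # calls df.__rsub__(10): 10 minus each element
div_result = 40 / df  # calls df.__rtruediv__(40): 40 divided by each element

print(sub_result)
print(div_result)
```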
__setitem__
__setitem__(key: str, value: SingleItemValue)
Modify or insert a column into the DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     "name" : ["alpha", "beta", "gamma"],
...     "age": [20, 30, 40],
...     "location": ["WA", "NY", "CA"]
... })
>>> df
    name  age location
0  alpha   20       WA
1   beta   30       NY
2  gamma   40       CA
<BLANKLINE>
[3 rows x 3 columns]
You can assign a constant to a new column.
>>> df["country"] = "USA"
>>> df
    name  age location country
0  alpha   20       WA     USA
1   beta   30       NY     USA
2  gamma   40       CA     USA
<BLANKLINE>
[3 rows x 4 columns]
You can assign a Series to a new column.
>>> df["new_age"] = df["age"] + 5
>>> df
    name  age location country  new_age
0  alpha   20       WA     USA       25
1   beta   30       NY     USA       35
2  gamma   40       CA     USA       45
<BLANKLINE>
[3 rows x 5 columns]
You can assign a Series to an existing column.
>>> df["new_age"] = bpd.Series([29, 39, 19], index=[1, 2, 0])
>>> df
    name  age location country  new_age
0  alpha   20       WA     USA       19
1   beta   30       NY     USA       29
2  gamma   40       CA     USA       39
<BLANKLINE>
[3 rows x 5 columns]
| Parameters | |
|---|---|
| Name | Description | 
| key | column index. It can be a new column to be inserted, or an existing column to be modified. | 
| value | scalar or Series. Value to be assigned to the column. | 
__sub__
__sub__(other)
Get subtraction of other from DataFrame, element-wise, using operator -.
Equivalent to DataFrame.sub(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can subtract a scalar:
>>> df = bpd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df - 2
    a  b
0  -1  2
1   0  3
2   1  4
<BLANKLINE>
[3 rows x 2 columns]
You can also subtract another DataFrame with index and column labels aligned:
>>> df1 = bpd.DataFrame({"a": [2, 2, 2], "b": [3, 3, 3]})
>>> df - df1
    a  b
0  -1  1
1   0  2
2   1  3
<BLANKLINE>
[3 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| other | scalar or DataFrame. Object to subtract from the DataFrame. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of the subtraction. | 
__truediv__
__truediv__(other)
Get division of DataFrame by other, element-wise, using operator /.
Equivalent to DataFrame.truediv(other).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can divide by a scalar:
>>> df = bpd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df / 2
     a    b
0  0.5  2.0
1  1.0  2.5
2  1.5  3.0
<BLANKLINE>
[3 rows x 2 columns]
You can also divide by another DataFrame with index and column labels aligned:
>>> denominator = bpd.DataFrame({"a": [2, 2, 2], "b": [3, 3, 3]})
>>> df / denominator
    a         b
0  0.5  1.333333
1  1.0  1.666667
2  1.5       2.0
<BLANKLINE>
[3 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| other | scalar or DataFrame. Object to divide the DataFrame by. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of the division. | 
abs
abs() -> bigframes.dataframe.DataFrame
Return a Series/DataFrame with absolute numeric value of each element.
This function only applies to elements that are all numeric.
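A minimal sketch of `abs` on an all-numeric frame follows. It uses plain pandas as a stand-in, on the assumption that BigQuery DataFrames returns the same values; the names are illustrative.

```python
import pandas as pd

# Stand-in: plain pandas, assumed to mirror bigframes.pandas for this method.
df = pd.DataFrame({"a": [-1, 2, -3], "b": [4.5, -5.5, 0.0]})

# Every element is replaced by its absolute value; the sign is dropped.
result = df.abs()

print(result)
```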
add
add(
    other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
    axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get addition of DataFrame and other, element-wise (binary operator +).
Equivalent to dataframe + other. The reverse version is radd.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
You can use method name:
>>> df['A'].add(df['B'])
0    5
1    7
2    9
dtype: Int64
You can also use arithmetic operator +:
>>> df['A'] + df['B']
0    5
1    7
2    9
dtype: Int64
| Parameters | |
|---|---|
| Name | Description | 
| other | float, int, or Series. Any single or multiple element data structure, or list-like object. | 
| axis | {0 or 'index', 1 or 'columns'}. Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame result of the arithmetic operation. | 
add_prefix
add_prefix(
    prefix: str, axis: int | str | None = None
) -> bigframes.dataframe.DataFrame
Prefix labels with string prefix.
For Series, the row labels are prefixed. For DataFrame, the column labels are prefixed.
| Parameters | |
|---|---|
| Name | Description | 
| prefix | str. The string to add before each label. | 
| axis | int or str or None, default None | 
add_suffix
add_suffix(
    suffix: str, axis: int | str | None = None
) -> bigframes.dataframe.DataFrame
Suffix labels with string suffix.
For Series, the row labels are suffixed. For DataFrame, the column labels are suffixed.
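The relabeling done by add_prefix and add_suffix can be sketched as follows. The snippet uses plain pandas as a stand-in, on the assumption that BigQuery DataFrames relabels columns the same way; the frame and the strings `"col_"` and `"_col"` are illustrative.

```python
import pandas as pd

# Stand-in: plain pandas, assumed to mirror bigframes.pandas for these methods.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

prefixed = df.add_prefix("col_")  # column labels become col_A, col_B
suffixed = df.add_suffix("_col")  # column labels become A_col, B_col

print(list(prefixed.columns))
print(list(suffixed.columns))
```

Both calls return a new object and leave the data values unchanged; only the column labels differ.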
agg
agg(
    func: typing.Union[str, typing.Sequence[str]]
) -> bigframes.dataframe.DataFrame | bigframes.series.Series
Aggregate using one or more operations over columns.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [1, 2, 3]})
>>> df
   A  B
0  3  1
1  1  2
2  2  3
<BLANKLINE>
[3 rows x 2 columns]
Using a single function:
>>> df.agg('sum')
A    6
B    6
dtype: Int64
Using a list of functions:
>>> df.agg(['sum', 'mean'])
        A    B
sum   6.0  6.0
mean  2.0  2.0
<BLANKLINE>
[2 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| func | function. Function to use for aggregating the data. Accepted combinations are: string function name, list of function names, e.g.  | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame or bigframes.series.Series | Aggregated results. | 
aggregate
aggregate(
    func: typing.Union[str, typing.Sequence[str]]
) -> bigframes.dataframe.DataFrame | bigframes.series.Series
Aggregate using one or more operations over columns.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [1, 2, 3]})
>>> df
    A       B
0   3       1
1   1       2
2   2       3
<BLANKLINE>
[3 rows x 2 columns]
Using a single function:
>>> df.agg('sum')
A    6
B    6
dtype: Int64
Using a list of functions:
>>> df.agg(['sum', 'mean'])
        A    B
sum   6.0  6.0
mean  2.0  2.0
<BLANKLINE>
[2 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| func | function. Function to use for aggregating the data. Accepted combinations are: string function name, list of function names, e.g.  | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame or bigframes.series.Series | Aggregated results. | 
align
align(
    other: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    join: str = "outer",
    axis: typing.Optional[typing.Union[str, int]] = None,
) -> typing.Tuple[
    typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
]
Align two objects on their axes with the specified join method.
Join method is specified for each axis Index.
| Parameters | |
|---|---|
| Name | Description | 
| join | {'outer', 'inner', 'left', 'right'}, default 'outer'. Type of alignment to be performed. left: use only keys from left frame, preserve key order. right: use only keys from right frame, preserve key order. outer: use union of keys from both frames, sort keys lexicographically. inner: use intersection of keys from both frames, preserve the order of the left keys. | 
| axis | allowed axis of the other object, default None. Align on index (0), columns (1), or both (None). | 
| Returns | |
|---|---|
| Type | Description | 
| tuple of (DataFrame, type of other) | Aligned objects. | 
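The four join options in the parameter table differ only in which labels survive and in what order. The sketch below models that selection in plain Python for a single axis. `aligned_labels` is a hypothetical helper, not a bigframes function, and it assumes the labels on each side are unique.

```python
# Models how each `join` option picks the labels shared by two aligned
# objects. `aligned_labels` is a hypothetical helper, not bigframes code,
# and it assumes the labels on each side are unique.


def aligned_labels(left, right, join="outer"):
    if join == "left":
        # Only keys from the left object, in their original order.
        return list(left)
    if join == "right":
        # Only keys from the right object, in their original order.
        return list(right)
    if join == "inner":
        # Intersection of keys, in the order of the left keys.
        right_set = set(right)
        return [key for key in left if key in right_set]
    if join == "outer":
        # Union of keys, sorted lexicographically.
        return sorted(set(left) | set(right))
    raise ValueError(f"unsupported join: {join!r}")


left_index = ["b", "a", "d"]
right_index = ["c", "a", "b"]
print(aligned_labels(left_index, right_index, "outer"))  # ['a', 'b', 'c', 'd']
print(aligned_labels(left_index, right_index, "inner"))  # ['b', 'a']
print(aligned_labels(left_index, right_index, "left"))   # ['b', 'a', 'd']
print(aligned_labels(left_index, right_index, "right"))  # ['c', 'a', 'b']
```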
all
all(
    axis: typing.Union[str, int] = 0, *, bool_only: bool = False
) -> bigframes.series.Series
Return whether all elements are True, potentially over an axis.
Returns True unless there is at least one element within a Series or along a DataFrame axis that is False or equivalent (e.g. zero or empty).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [True, True], "B": [False, False]})
>>> df
        A        B
0    True    False
1    True    False
<BLANKLINE>
[2 rows x 2 columns]
Checking if all values in each column are True (the default behavior without an explicit axis parameter):
>>> df.all()
A     True
B    False
dtype: boolean
Checking across rows to see if all values are True:
>>> df.all(axis=1)
0    False
1    False
dtype: boolean
| Parameters | |
|---|---|
| Name | Description | 
| axis | {index (0), columns (1)}. Axis for the function to be applied on. For Series this parameter is unused and defaults to 0. | 
| bool_only | bool, default False. Include only boolean columns. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.series.Series | Series indicating if all elements are True per column. | 
any
any(
    *, axis: typing.Union[str, int] = 0, bool_only: bool = False
) -> bigframes.series.Series
Return whether any element is True, potentially over an axis.
Returns False unless there is at least one element within a Series or along a DataFrame axis that is True or equivalent (e.g. non-zero or non-empty).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [True, True], "B": [False, False]})
>>> df
        A        B
0    True    False
1    True    False
<BLANKLINE>
[2 rows x 2 columns]
Checking if each column contains at least one True element (the default behavior without an explicit axis parameter):
>>> df.any()
A     True
B    False
dtype: boolean
Checking if each row contains at least one True element:
>>> df.any(axis=1)
0    True
1    True
dtype: boolean
| Parameters | |
|---|---|
| Name | Description | 
| axis | {index (0), columns (1)}. Axis for the function to be applied on. For Series this parameter is unused and defaults to 0. | 
| bool_only | bool, default False. Include only boolean columns. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.series.Series | Series indicating if any element is True per column. | 
apply
apply(func, *, args: typing.Tuple = (), **kwargs)
Apply a function along an axis of the DataFrame.
Objects passed to the function are Series objects whose index is
the DataFrame's index (axis=0). The final return type
is inferred from the return type of the applied function.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df
   col1  col2
0     1     3
1     2     4
<BLANKLINE>
[2 rows x 2 columns]
>>> def square(x):
...     return x * x
>>> df.apply(square)
   col1  col2
0     1     9
1     4    16
<BLANKLINE>
[2 rows x 2 columns]
| Parameters | |
|---|---|
| Name | Description | 
| args | tuple. Positional arguments to pass to  | 
| func | function. Function to apply to each column or row. | 
| Returns | |
|---|---|
| Type | Description | 
| pandas.Series or bigframes.DataFrame | Result of applying func along the given axis of the DataFrame. | 
applymap
applymap(
    func, na_action: typing.Optional[str] = None
) -> bigframes.dataframe.DataFrame
Apply a function to a DataFrame element-wise.
This method applies a function that accepts and returns a scalar to every element of a DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Let's use reuse=False flag to make sure a new remote_function
is created every time we run the following code, but you can skip it
to potentially reuse a previously deployed remote_function from
the same user defined function.
>>> @bpd.remote_function(int, float, reuse=False)
... def minutes_to_hours(x):
...     return x/60
>>> df_minutes = bpd.DataFrame(
...     {"system_minutes" : [0, 30, 60, 90, 120],
...      "user_minutes" : [0, 15, 75, 90, 6]})
>>> df_minutes
   system_minutes  user_minutes
0               0             0
1              30            15
2              60            75
3              90            90
4             120             6
<BLANKLINE>
[5 rows x 2 columns]
>>> df_hours = df_minutes.map(minutes_to_hours)
>>> df_hours
   system_minutes  user_minutes
0             0.0           0.0
1             0.5          0.25
2             1.0          1.25
3             1.5           1.5
4             2.0           0.1
<BLANKLINE>
[5 rows x 2 columns]
If there are NA/None values in the data, you can ignore
applying the remote function on such values by specifying
na_action='ignore'.
>>> df_minutes = bpd.DataFrame(
...     {
...         "system_minutes" : [0, 30, 60, None, 90, 120, bpd.NA],
...         "user_minutes" : [0, 15, 75, 90, 6, None, bpd.NA]
...     }, dtype="Int64")
>>> df_hours = df_minutes.map(minutes_to_hours, na_action='ignore')
>>> df_hours
   system_minutes  user_minutes
0             0.0           0.0
1             0.5          0.25
2             1.0          1.25
3            <NA>           1.5
4             1.5           0.1
5             2.0          <NA>
6            <NA>          <NA>
<BLANKLINE>
[7 rows x 2 columns]
| Parameters | |
|---|---|
| Name | Description | 
| func | function. Python function wrapped by  | 
| na_action | Optional[str], default None. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | Transformed DataFrame. | 
assign
assign(**kwargs) -> bigframes.dataframe.DataFrame
Assign new columns to a DataFrame.
Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | A new DataFrame with the new columns in addition to all the existing columns. | 
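The description of assign states two behaviors: the result is a new object, and re-assigned columns are overwritten. The plain-Python sketch below models both with a dict of columns. `assign_columns` is a hypothetical helper, not the bigframes implementation.

```python
# Plain-Python model of the assign semantics described above.
# `assign_columns` is a hypothetical helper, not bigframes code.


def assign_columns(columns, **new_columns):
    """Return a new mapping holding the original columns plus the new ones."""
    result = dict(columns)      # copy, so the input mapping is left untouched
    result.update(new_columns)  # existing labels are overwritten, new ones appended
    return result


frame = {"A": [1, 2], "B": [3, 4]}
updated = assign_columns(frame, B=[30, 40], C=[5, 6])
print(updated)  # {'A': [1, 2], 'B': [30, 40], 'C': [5, 6]}
print(frame)    # {'A': [1, 2], 'B': [3, 4]}
```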
astype
astype(
    dtype: typing.Union[
        typing.Literal[
            "boolean",
            "Float64",
            "Int64",
            "int64[pyarrow]",
            "string",
            "string[pyarrow]",
            "timestamp[us, tz=UTC][pyarrow]",
            "timestamp[us][pyarrow]",
            "date32[day][pyarrow]",
            "time64[us][pyarrow]",
            "decimal128(38, 9)[pyarrow]",
            "decimal256(76, 38)[pyarrow]",
            "binary[pyarrow]",
        ],
        pandas.core.arrays.boolean.BooleanDtype,
        pandas.core.arrays.floating.Float64Dtype,
        pandas.core.arrays.integer.Int64Dtype,
        pandas.core.arrays.string_.StringDtype,
        pandas.core.dtypes.dtypes.ArrowDtype,
        geopandas.array.GeometryDtype,
    ]
) -> bigframes.dataframe.DataFrame
Cast a pandas object to a specified dtype.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Create a DataFrame:
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = bpd.DataFrame(data=d)
>>> df.dtypes
col1    Int64
col2    Int64
dtype: object
Cast all columns to Float64:
>>> df.astype('Float64').dtypes
col1    Float64
col2    Float64
dtype: object
Create a series of type Int64:
>>> ser = bpd.Series([2023010000246789, 1624123244123101, 1054834234120101], dtype='Int64')
>>> ser
0    2023010000246789
1    1624123244123101
2    1054834234120101
dtype: Int64
Convert to Float64 type:
>>> ser.astype('Float64')
0    2023010000246789.0
1    1624123244123101.0
2    1054834234120101.0
dtype: Float64
Convert to pd.ArrowDtype(pa.timestamp("us", tz="UTC")) type:
>>> ser.astype("timestamp[us, tz=UTC][pyarrow]")
0    2034-02-08 11:13:20.246789+00:00
1    2021-06-19 17:20:44.123101+00:00
2    2003-06-05 17:30:34.120101+00:00
dtype: timestamp[us, tz=UTC][pyarrow]
Note that this is equivalent to using to_datetime with unit='us':
>>> bpd.to_datetime(ser, unit='us', utc=True)
0    2034-02-08 11:13:20.246789+00:00
1    2021-06-19 17:20:44.123101+00:00
2    2003-06-05 17:30:34.120101+00:00
dtype: timestamp[us, tz=UTC][pyarrow]
Convert pd.ArrowDtype(pa.timestamp("us", tz="UTC")) type to Int64 type:
>>> timestamp_ser = ser.astype("timestamp[us, tz=UTC][pyarrow]")
>>> timestamp_ser.astype('Int64')
0    2023010000246789
1    1624123244123101
2    1054834234120101
dtype: Int64
| Parameter | |
|---|---|
| Name | Description | 
| dtype | str or pandas.ExtensionDtype. A dtype supported by BigQuery DataFrame include  | 
bfill
bfill(*, limit: typing.Optional[int] = None) -> bigframes.dataframe.DataFrame
Fill NA/NaN values by using the next valid observation to fill the gap.
| Returns | |
|---|---|
| Type | Description | 
| Series/DataFrame or None | Object with missing values filled. | 
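The bfill entry has no example, so here is a plain-Python sketch of backward filling on a single column, with None standing in for NA. `backfill` is a hypothetical helper, not the bigframes implementation. Treating `limit` as the maximum number of consecutive missing values to fill is an assumption borrowed from pandas; this page does not document the parameter.

```python
# Plain-Python model of backward fill on one column; None stands in for NA.
# `backfill` is a hypothetical helper. The meaning of `limit` (maximum number
# of consecutive gaps to fill) is an assumption taken from pandas.


def backfill(values, limit=None):
    filled = list(values)
    next_valid = None  # nearest non-missing value found further down the column
    run = 0            # consecutive gaps already filled from that value
    for i in range(len(filled) - 1, -1, -1):  # walk from the last row upward
        if filled[i] is not None:
            next_valid = filled[i]
            run = 0
        elif next_valid is not None and (limit is None or run < limit):
            filled[i] = next_valid
            run += 1
    return filled


column = [None, None, 3, None, 5, None]
print(backfill(column))           # [3, 3, 3, 5, 5, None]
print(backfill(column, limit=1))  # [None, 3, 3, 5, 5, None]
```

The trailing None stays missing in both calls because there is no later valid observation to copy from.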
cache
cache()
Materializes the DataFrame to a temporary table.
Useful if the DataFrame will be used multiple times, as this will avoid recomputing the shared intermediate value.
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | Self | 
combine
combine(
    other: bigframes.dataframe.DataFrame,
    func: typing.Callable[
        [bigframes.series.Series, bigframes.series.Series], bigframes.series.Series
    ],
    fill_value=None,
    overwrite: bool = True,
    *,
    how: str = "outer"
) -> bigframes.dataframe.DataFrame
Perform column-wise combine with another DataFrame.
Combines a DataFrame with another DataFrame using func
to element-wise combine columns. The row and column indexes of the
resulting DataFrame will be the union of the two.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df1 = bpd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = bpd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2
>>> df1.combine(df2, take_smaller)
   A  B
0  0  3
1  0  3
<BLANKLINE>
[2 rows x 2 columns]
| Parameters | |
|---|---|
| Name | Description | 
| other | DataFrame. The DataFrame to merge column-wise. | 
| func | function. Function that takes two Series as inputs and returns a Series or a scalar. Used to merge the two DataFrames column by column. | 
| fill_value | scalar value, default None. The value to fill NaNs with prior to passing any column to the merge func. | 
| overwrite | bool, default True. If True, columns in  | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | Combination of the provided DataFrames. | 
combine_first
combine_first(other: bigframes.dataframe.DataFrame)
Update null elements with value in the same location in other.
Combine two DataFrame objects by filling null values in one DataFrame with non-null values from the other DataFrame. The row and column indexes of the resulting DataFrame will be the union of the two. After calling first.combine_first(second), the result keeps the value from first wherever both first.loc[index, col] and second.loc[index, col] are non-missing; values from second are used only where first is missing.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df1 = bpd.DataFrame({'A': [None, 0], 'B': [None, 4]})
>>> df2 = bpd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine_first(df2)
     A    B
0  1.0  3.0
1  0.0  4.0
<BLANKLINE>
[2 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| other | DataFrame. Provided DataFrame to use to fill null values. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The result of combining the provided DataFrame with the other object. | 
copy
copy() -> bigframes.dataframe.DataFrame
Make a copy of this object's indices and data.
A new object will be created with a copy of the calling object's data and indices. Modifications to the data or indices of the copy will not be reflected in the original object.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Modification in the original Series will not affect the copy Series:
>>> s = bpd.Series([1, 2], index=["a", "b"])
>>> s
a    1
b    2
dtype: Int64
>>> s_copy = s.copy()
>>> s_copy
a    1
b    2
dtype: Int64
>>> s.loc['b'] = 22
>>> s
a     1
b    22
dtype: Int64
>>> s_copy
a    1
b    2
dtype: Int64
Modification in the original DataFrame will not affect the copy DataFrame:
>>> df = bpd.DataFrame({'a': [1, 3], 'b': [2, 4]})
>>> df
   a  b
0  1  2
1  3  4
<BLANKLINE>
[2 rows x 2 columns]
>>> df_copy = df.copy()
>>> df_copy
   a  b
0  1  2
1  3  4
<BLANKLINE>
[2 rows x 2 columns]
>>> df.loc[df["b"] == 2, "b"] = 22
>>> df
   a   b
0  1  22
1  3   4
<BLANKLINE>
[2 rows x 2 columns]
>>> df_copy
   a  b
0  1  2
1  3  4
<BLANKLINE>
[2 rows x 2 columns]
corr
corr(
    method="pearson", min_periods=None, numeric_only=False
) -> bigframes.dataframe.DataFrame
Compute pairwise correlation of columns, excluding NA/null values.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 2, 3],
...                    'B': [400, 500, 600],
...                    'C': [0.8, 0.4, 0.9]})
>>> df.corr(numeric_only=True)
          A         B         C
A       1.0       1.0  0.188982
B       1.0       1.0  0.188982
C  0.188982  0.188982       1.0
<BLANKLINE>
[3 rows x 3 columns]
| Parameters | |
|---|---|
| Name | Description | 
| method | string, default "pearson". Correlation method to use - currently only "pearson" is supported. | 
| min_periods | int, default None. The minimum number of observations needed to return a result. Non-default values are not yet supported, so a result will be returned for at least two observations. | 
| numeric_only | bool, default False. Include only float, int, boolean, decimal data. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | Correlation matrix. | 
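To make the numbers in the example matrix concrete, the sketch below computes the Pearson coefficient for one pair of columns in plain Python. `pearson` is a hypothetical helper written from the standard formula, not the bigframes implementation, and it assumes the inputs contain no missing values.

```python
# Pearson correlation for two equal-length columns, from the standard formula.
# `pearson` is a hypothetical helper, not bigframes code; missing values are
# assumed to have been removed already.
import math


def pearson(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the sums of squared deviations.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sq_dev_x = sum((x - mean_x) ** 2 for x in xs)
    sq_dev_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(sq_dev_x * sq_dev_y)


a = [1, 2, 3]
b = [400, 500, 600]
c = [0.8, 0.4, 0.9]
print(round(pearson(a, b), 6))  # 1.0
print(round(pearson(a, c), 6))  # 0.188982
```

Columns A and B are perfectly linearly related, which is why the A and B rows of the matrix are identical.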
count
count(*, numeric_only: bool = False) -> bigframes.series.Series
Count non-NA cells for each column.
The values None, NaN, NaT, and optionally numpy.inf (depending
on pandas.options.mode.use_inf_as_na) are considered NA.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, None, 3, 4, 5],
...                     "B": [1, 2, 3, 4, 5],
...                     "C": [None, 3.5, None, 4.5, 5.0]})
>>> df
       A    B          C
0    1.0    1       <NA>
1   <NA>    2        3.5
2    3.0    3       <NA>
3    4.0    4        4.5
4    5.0    5        5.0
<BLANKLINE>
[5 rows x 3 columns]
Counting non-NA values for each column:
>>> df.count()
A    4
B    5
C    3
dtype: Int64
| Parameter | |
|---|---|
| Name | Description | 
| numeric_only | bool, default False. Include only  | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.series.Series | For each column/row the number of non-NA/null entries. If level is specified, returns a DataFrame. | 
cov
cov(*, numeric_only: bool = False) -> bigframes.dataframe.DataFrame
Compute pairwise covariance of columns, excluding NA/null values.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 2, 3],
...                    'B': [400, 500, 600],
...                    'C': [0.8, 0.4, 0.9]})
>>> df.cov(numeric_only=True)
       A        B     C
A    1.0    100.0  0.05
B  100.0  10000.0   5.0
C   0.05      5.0  0.07
<BLANKLINE>
[3 rows x 3 columns]
| Parameter | |
|---|---|
| Name | Description | 
| numeric_only | bool, default False. Include only float, int, boolean, decimal data. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The covariance matrix of the series of the DataFrame. | 
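The values in the example matrix match the sample covariance, which divides by n - 1 rather than n. That normalization is inferred from the example output and from pandas, not stated on this page. The sketch below reproduces individual entries in plain Python; `sample_cov` is a hypothetical helper, not the bigframes implementation.

```python
# Sample covariance (divisor n - 1) for two equal-length columns.
# `sample_cov` is a hypothetical helper, not bigframes code. The n - 1 divisor
# is an assumption inferred from the example output above.


def sample_cov(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return co_dev / (n - 1)


a = [1, 2, 3]
b = [400, 500, 600]
c = [0.8, 0.4, 0.9]
print(round(sample_cov(a, b), 6))  # 100.0
print(round(sample_cov(b, b), 6))  # 10000.0
print(round(sample_cov(a, c), 6))  # 0.05
print(round(sample_cov(c, c), 6))  # 0.07
```

The diagonal entries, such as the covariance of B with itself, are the sample variances of each column.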
cummax
cummax() -> bigframes.dataframe.DataFrame
Return cumulative maximum over columns.
Returns a DataFrame of the same size containing the cumulative maximum.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [1, 2, 3]})
>>> df
    A       B
0   3       1
1   1       2
2   2       3
<BLANKLINE>
[3 rows x 2 columns]
>>> df.cummax()
    A       B
0   3       1
1   3       2
2   3       3
<BLANKLINE>
[3 rows x 2 columns]
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | Return cumulative maximum of DataFrame. | 
cummin
cummin() -> bigframes.dataframe.DataFrame
Return cumulative minimum over columns.
Returns a DataFrame of the same size containing the cumulative minimum.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [1, 2, 3]})
>>> df
    A       B
0   3       1
1   1       2
2   2       3
<BLANKLINE>
[3 rows x 2 columns]
>>> df.cummin()
    A       B
0   3       1
1   1       1
2   1       1
<BLANKLINE>
[3 rows x 2 columns]
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | Return cumulative minimum of DataFrame. | 
cumprod
cumprod() -> bigframes.dataframe.DataFrame
Return cumulative product over columns.
Returns a DataFrame of the same size containing the cumulative product.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [1, 2, 3]})
>>> df
    A       B
0   3       1
1   1       2
2   2       3
<BLANKLINE>
[3 rows x 2 columns]
>>> df.cumprod()
     A    B
0  3.0  1.0
1  3.0  2.0
2  6.0  6.0
<BLANKLINE>
[3 rows x 2 columns]
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | Return cumulative product of DataFrame. | 
cumsum
cumsum()
Return cumulative sum over columns.
Returns a DataFrame of the same size containing the cumulative sum.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [1, 2, 3]})
>>> df
    A       B
0   3       1
1   1       2
2   2       3
<BLANKLINE>
[3 rows x 2 columns]
>>> df.cumsum()
    A       B
0   3       1
1   4       3
2   6       6
<BLANKLINE>
[3 rows x 2 columns]
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | Return cumulative sum of DataFrame. | 
describe
describe() -> bigframes.dataframe.DataFrame
Generate descriptive statistics.
Descriptive statistics include those that summarize the central
tendency, dispersion and shape of a
dataset's distribution, excluding NaN values.
Only supports numeric columns.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [0, 2, 8]})
>>> df
    A       B
0   3       0
1   1       2
2   2       8
<BLANKLINE>
[3 rows x 2 columns]
>>> df.describe()
              A               B
count       3.0             3.0
mean        2.0        3.333333
std         1.0        4.163332
min         1.0             0.0
25%         1.0             0.0
50%         2.0             2.0
75%         3.0             8.0
max         3.0             8.0
<BLANKLINE>
[8 rows x 2 columns]
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | Summary statistics of the Series or Dataframe provided. | 
diff
diff(periods: int = 1) -> bigframes.dataframe.DataFrame
First discrete difference of element.
Calculates the difference of a DataFrame element compared with another element in the DataFrame (default is element in previous row).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [1, 2, 3]})
>>> df
    A       B
0   3       1
1   1       2
2   2       3
<BLANKLINE>
[3 rows x 2 columns]
Calculating difference with default periods=1:
>>> df.diff()
       A       B
0   <NA>    <NA>
1     -2       1
2      1       1
<BLANKLINE>
[3 rows x 2 columns]
Calculating difference with periods=-1:
>>> df.diff(periods=-1)
       A       B
0      2      -1
1     -1      -1
2   <NA>    <NA>
<BLANKLINE>
[3 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| periods | int, default 1. Periods to shift for calculating difference, accepts negative values. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | First differences of the Series. | 
div
div(
    other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
    axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get floating division of DataFrame and other, element-wise (binary operator /).
Equivalent to dataframe / other. With reverse version, rtruediv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
You can use method name:
>>> df['A'].truediv(df['B'])
0    0.25
1     0.4
2     0.5
dtype: Float64
You can also use arithmetic operator /:
>>> df['A'] / (df['B'])
0    0.25
1     0.4
2     0.5
dtype: Float64
| Parameters | |
|---|---|
| Name | Description | 
| other | float, int, or Series. Any single or multiple element data structure, or list-like object. | 
| axis | {0 or 'index', 1 or 'columns'}. Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame result of the arithmetic operation. | 
divide
divide(
    other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
    axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get floating division of DataFrame and other, element-wise (binary operator /).
Equivalent to dataframe / other. With reverse version, rtruediv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
You can use method name:
>>> df['A'].truediv(df['B'])
0    0.25
1     0.4
2     0.5
dtype: Float64
You can also use arithmetic operator /:
>>> df['A'] / (df['B'])
0    0.25
1     0.4
2     0.5
dtype: Float64
| Parameters | |
|---|---|
| Name | Description | 
| other | float, int, or Series. Any single or multiple element data structure, or list-like object. | 
| axis | {0 or 'index', 1 or 'columns'}. Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame result of the arithmetic operation. | 
dot
dot(other: _DataFrameOrSeries) -> _DataFrameOrSeries
Compute the matrix multiplication between the DataFrame and other.
This method computes the matrix product between the DataFrame and the values of another Series or DataFrame.
It can also be called using self @ other.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> left = bpd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
>>> left
   0  1   2   3
0  0  1  -2  -1
1  1  1   1   1
<BLANKLINE>
[2 rows x 4 columns]
>>> right = bpd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> right
    0   1
0   0   1
1   1   2
2  -1  -1
3   2   0
<BLANKLINE>
[4 rows x 2 columns]
>>> left.dot(right)
   0  1
0  1  4
1  2  2
<BLANKLINE>
[2 rows x 2 columns]
You can also use the operator @ for the dot product:
>>> left @ right
   0  1
0  1  4
1  2  2
<BLANKLINE>
[2 rows x 2 columns]
The right input can be a Series, in which case the result will also be a Series:
>>> right = bpd.Series([1, 2, -1, 0])
>>> left @ right
0    4
1    2
dtype: Int64
Any user-defined index of the left matrix and any user-defined columns of the right matrix are reflected in the result.
>>> left = bpd.DataFrame([[1, 2, 3], [2, 5, 7]], index=["alpha", "beta"])
>>> left
       0  1  2
alpha  1  2  3
beta   2  5  7
<BLANKLINE>
[2 rows x 3 columns]
>>> right = bpd.DataFrame([[2, 4, 8], [1, 5, 10], [3, 6, 9]], columns=["red", "green", "blue"])
>>> right
   red  green  blue
0    2      4     8
1    1      5    10
2    3      6     9
<BLANKLINE>
[3 rows x 3 columns]
>>> left.dot(right)
       red  green  blue
alpha   13     32    55
beta    30     75   129
<BLANKLINE>
[2 rows x 3 columns]
| Parameter | |
|---|---|
| Name | Description | 
| other | Series or DataFrame. The other object to compute the matrix product with. | 
| Returns | |
|---|---|
| Type | Description | 
| Series or DataFrame | If other is a Series, return the matrix product between self and other as a Series. If other is a DataFrame, return the matrix product of self and other in a DataFrame. | 
drop
drop(
    labels: typing.Any = None,
    *,
    axis: typing.Union[int, str] = 0,
    index: typing.Any = None,
    columns: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]] = None,
    level: typing.Optional[typing.Hashable] = None
) -> bigframes.dataframe.DataFrame
Drop specified labels from columns.
Remove columns by directly specifying column names.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame(np.arange(12).reshape(3, 4),
...                    columns=['A', 'B', 'C', 'D'])
>>> df
   A  B   C   D
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11
<BLANKLINE>
[3 rows x 4 columns]
Drop columns:
>>> df.drop(['B', 'C'], axis=1)
   A   D
0  0   3
1  4   7
2  8  11
<BLANKLINE>
[3 rows x 2 columns]
>>> df.drop(columns=['B', 'C'])
   A   D
0  0   3
1  4   7
2  8  11
<BLANKLINE>
[3 rows x 2 columns]
Drop rows by index:
>>> df.drop([0, 1])
   A  B   C   D
2  8  9  10  11
<BLANKLINE>
[1 rows x 4 columns]
Drop columns and/or rows of MultiIndex DataFrame:
>>> import pandas as pd
>>> midx = pd.MultiIndex(levels=[['llama', 'cow', 'falcon'],
...                              ['speed', 'weight', 'length']],
...                      codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
...                             [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> df = bpd.DataFrame(index=midx, columns=['big', 'small'],
...                    data=[[45, 30], [200, 100], [1.5, 1], [30, 20],
...                          [250, 150], [1.5, 0.8], [320, 250],
...                          [1, 0.8], [0.3, 0.2]])
>>> df
                 big  small
llama  speed    45.0   30.0
       weight  200.0  100.0
       length    1.5    1.0
cow    speed    30.0   20.0
       weight  250.0  150.0
       length    1.5    0.8
falcon speed   320.0  250.0
       weight    1.0    0.8
       length    0.3    0.2
<BLANKLINE>
[9 rows x 2 columns]
Drop a specific index and column combination from the MultiIndex
DataFrame, i.e., drop the index 'cow' and column 'small':
>>> df.drop(index='cow', columns='small')
                 big
llama  speed    45.0
       weight  200.0
       length    1.5
falcon speed   320.0
       weight    1.0
       length    0.3
<BLANKLINE>
[6 rows x 1 columns]
>>> df.drop(index='length', level=1)
                 big  small
llama  speed    45.0   30.0
       weight  200.0  100.0
cow    speed    30.0   20.0
       weight  250.0  150.0
falcon speed   320.0  250.0
       weight    1.0    0.8
<BLANKLINE>
[6 rows x 2 columns]
| Exceptions | |
|---|---|
| Type | Description | 
| KeyError | If any of the labels is not found in the selected axis. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | DataFrame without the removed column labels. | 
drop_duplicates
drop_duplicates(
    subset: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]] = None,
    *,
    keep: str = "first"
) -> bigframes.dataframe.DataFrame
Return DataFrame with duplicate rows removed.
Considering certain columns is optional. Indexes, including time indexes, are ignored.
| Parameters | |
|---|---|
| Name | Description | 
| subset | column label or sequence of labels, optional. Only consider certain columns for identifying duplicates, by default use all of the columns. | 
| keep | {'first', 'last', Determines which duplicates (if any) to keep. - 'first' : Drop duplicates except for the first occurrence. - 'last' : Drop duplicates except for the last occurrence. -  | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | DataFrame with duplicates removed | 
droplevel
droplevel(
    level: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]],
    axis: int | str = 0,
)
Return DataFrame with requested index / column level(s) removed.
| Parameters | |
|---|---|
| Name | Description | 
| level | int, str, or list-like. If a string is given, must be the name of a level. If list-like, elements must be names or positional indexes of levels. | 
| axis | {0 or 'index', 1 or 'columns'}, default 0. Axis along which the level(s) is removed: * 0 or 'index': remove level(s) in column. * 1 or 'columns': remove level(s) in row. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame with requested index / column level(s) removed. | 
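The droplevel entry has no example. The plain-Python sketch below models what removing levels does to multi-level labels, each written as a tuple. `drop_level` is a hypothetical helper, not the bigframes implementation, and it handles only positional levels; level names are not modeled.

```python
# Plain-Python model of removing levels from multi-level labels, where each
# label is a tuple. `drop_level` is a hypothetical helper, not bigframes code,
# and it supports positional levels only.


def drop_level(labels, level):
    """`level` is one position or a list of positions to remove."""
    positions = {level} if isinstance(level, int) else set(level)
    result = []
    for label in labels:
        kept = tuple(part for pos, part in enumerate(label) if pos not in positions)
        # A single remaining level collapses to a plain label.
        result.append(kept[0] if len(kept) == 1 else kept)
    return result


index = [("llama", "speed"), ("llama", "weight"), ("cow", "speed")]
print(drop_level(index, 0))  # ['speed', 'weight', 'speed']
print(drop_level(index, 1))  # ['llama', 'llama', 'cow']
print(drop_level([("a", 1, "x"), ("b", 2, "y")], [0, 2]))  # [1, 2]
```

As the first call shows, dropping a level can leave labels that are no longer unique.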
dropna
dropna(
    *, axis: int | str = 0, inplace: bool = False, how: str = "any", ignore_index=False
) -> bigframes.dataframe.DataFrame
Remove missing values.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                     "toy": [np.nan, 'Batmobile', 'Bullwhip'],
...                     "born": [bpd.NA, "1940-04-25", bpd.NA]})
>>> df
       name        toy        born
0    Alfred       <NA>        <NA>
1    Batman  Batmobile  1940-04-25
2  Catwoman   Bullwhip        <NA>
<BLANKLINE>
[3 rows x 3 columns]
Drop the rows where at least one element is missing:
>>> df.dropna()
     name        toy        born
1  Batman  Batmobile  1940-04-25
<BLANKLINE>
[1 rows x 3 columns]
Drop the columns where at least one element is missing:
>>> df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman
<BLANKLINE>
[3 rows x 1 columns]
Drop the rows where all elements are missing:
>>> df.dropna(how='all')
       name        toy        born
0    Alfred       <NA>        <NA>
1    Batman  Batmobile  1940-04-25
2  Catwoman   Bullwhip        <NA>
<BLANKLINE>
[3 rows x 3 columns]
| Parameters | |
|---|---|
| Name | Description | 
| axis | {0 or 'index', 1 or 'columns'}, default 0. Determine if rows or columns which contain missing values are removed. * 0, or 'index' : Drop rows which contain missing values. * 1, or 'columns' : Drop columns which contain missing values. | 
| how | {'any', 'all'}, default 'any'. Determine if a row or column is removed from the DataFrame when it has at least one NA or all NA. * 'any' : If any NA values are present, drop that row or column. * 'all' : If all values are NA, drop that row or column. | 
| ignore_index | bool, default False. If True, the resulting axis will be labeled 0, 1, …, n - 1. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | DataFrame with NA entries dropped from it. | 
duplicated
duplicated(subset=None, keep: str = "first") -> bigframes.series.Series
Return boolean Series denoting duplicate rows.
Considering certain columns is optional.
| Parameters | |
|---|---|
| Name | Description | 
| subset | column label or sequence of labels, optional. Only consider certain columns for identifying duplicates; by default, use all of the columns. | 
| keep | {'first', 'last', False}, default 'first'. Determines which duplicates (if any) to mark. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.series.Series | Boolean Series marking each duplicated row. | 
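Since no example accompanies this method, here is a minimal sketch of how the mask changes with `subset` and `keep`. It uses plain pandas so it can run without a session; that bigframes returns the same mask is assumed rather than verified here.

```python
import pandas as pd

df = pd.DataFrame({"k": ["a", "b", "a", "a"], "v": [1, 2, 1, 3]})

# Whole-row comparison: only row 2 repeats an earlier row.
mark_first = df.duplicated().tolist()

# Compare on column k only: rows 2 and 3 repeat the 'a' seen in row 0.
by_key = df.duplicated(subset=["k"]).tolist()

# keep='last' leaves the final 'a' (row 3) unmarked instead of the first.
by_key_last = df.duplicated(subset=["k"], keep="last").tolist()

print(mark_first)   # [False, False, True, False]
print(by_key)       # [False, False, True, True]
print(by_key_last)  # [True, False, True, False]
```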
eq
eq(other: typing.Any, axis: str | int = "columns") -> bigframes.dataframe.DataFrame
Get 'equal to' of DataFrame and other, element-wise (binary operator ==).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison
operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis
(rows or columns) and level for comparison.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can use method name:
>>> df = bpd.DataFrame({'angles': [0, 3, 4],
...        'degrees': [360, 180, 360]},
...       index=['circle', 'triangle', 'rectangle'])
>>> df["degrees"].eq(360)
circle        True
triangle     False
rectangle     True
Name: degrees, dtype: boolean
You can also use comparison operator ==:
>>> df["degrees"] == 360
circle        True
triangle     False
rectangle     True
Name: degrees, dtype: boolean
| Parameters | |
|---|---|
| Name | Description | 
| other | scalar, sequence, Series, or DataFrame. Any single or multiple element data structure, or list-like object. | 
| axis | {0 or 'index', 1 or 'columns'}, default 'columns'. Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | Result of the comparison. | 
equals
equals(
    other: typing.Union[bigframes.series.Series, bigframes.dataframe.DataFrame]
) -> bool
Test whether two objects contain the same elements.
This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal.
The row/column indexes do not need to have the same type, as long as the values are considered equal. Corresponding columns must be of the same dtype.
| Parameter | |
|---|---|
| Name | Description | 
| other | Series or DataFrame. The other Series or DataFrame to be compared with the first. | 
| Returns | |
|---|---|
| Type | Description | 
| bool | True if all elements are the same in both objects, False otherwise. | 
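The NaN and dtype rules described above can be seen in a small sketch. Plain pandas is used here as a local stand-in; whether bigframes matches these results exactly is an assumption.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"x": [1.0, np.nan], "y": [3, 4]})
b = pd.DataFrame({"x": [1.0, np.nan], "y": [3, 4]})
c = b.astype({"y": "float64"})  # same values, different dtype for y

# NaNs in the same location count as equal.
same = a.equals(b)

# Same values but a different column dtype: not equal.
dtype_differs = a.equals(c)

# Element-wise == treats NaN as unequal to NaN, so it disagrees with equals().
elementwise_all = bool((a == b).all().all())

print(same, dtype_differs, elementwise_all)  # True False False
```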
eval
eval(expr: str) -> bigframes.dataframe.DataFrame
Evaluate a string describing operations on DataFrame columns.
Operates on columns only, not specific rows or elements. This allows
eval to run arbitrary code, which can make you vulnerable to code
injection if you pass user input to this function.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})
>>> df
   A   B
0  1  10
1  2   8
2  3   6
3  4   4
4  5   2
<BLANKLINE>
[5 rows x 2 columns]
>>> df.eval('A + B')
0    11
1    10
2     9
3     8
4     7
dtype: Int64
Assignment is allowed, though by default the original DataFrame is not modified.
>>> df.eval('C = A + B')
   A   B   C
0  1  10  11
1  2   8  10
2  3   6   9
3  4   4   8
4  5   2   7
<BLANKLINE>
[5 rows x 3 columns]
>>> df
   A   B
0  1  10
1  2   8
2  3   6
3  4   4
4  5   2
<BLANKLINE>
[5 rows x 2 columns]
Multiple columns can be assigned to using multi-line expressions:
>>> df.eval(
...     '''
... C = A + B
... D = A - B
... '''
... )
   A   B   C  D
0  1  10  11 -9
1  2   8  10 -6
2  3   6   9 -3
3  4   4   8  0
4  5   2   7  3
<BLANKLINE>
[5 rows x 4 columns]
| Parameter | |
|---|---|
| Name | Description | 
| expr | str. The expression string to evaluate. | 
expanding
expanding(min_periods: int = 1) -> bigframes.core.window.Window
Provide expanding window calculations.
| Parameter | |
|---|---|
| Name | Description | 
| min_periods | int, default 1. Minimum number of observations in window required to have a value; otherwise, result is  | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.core.window.Window | Expanding subclass. | 
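To make the effect of `min_periods` concrete, the following sketch computes a running sum with and without a minimum window size. It runs on plain pandas; assuming bigframes produces equivalent values (with its own NA representation) is an assumption.

```python
import pandas as pd

df = pd.DataFrame({"B": [0.0, 1.0, 2.0, 4.0]})

# Each row aggregates everything from the first row up to and including itself.
running_sum = df.expanding().sum()

# With min_periods=2, the first row has too few observations and is missing.
needs_two = df.expanding(min_periods=2).sum()

print(running_sum["B"].tolist())  # [0.0, 1.0, 3.0, 7.0]
```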
explode
explode(
    column: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]],
    *,
    ignore_index: typing.Optional[bool] = False
) -> bigframes.dataframe.DataFrame
Transform each element of an array to a row, replicating index values.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [[0, 1, 2], [], [], [3, 4]],
...                     'B': 1,
...                     'C': [['a', 'b', 'c'], np.nan, [], ['d', 'e']]})
>>> df.explode('A')
    A  B              C
0     0  1  ['a' 'b' 'c']
0     1  1  ['a' 'b' 'c']
0     2  1  ['a' 'b' 'c']
1  <NA>  1             []
2  <NA>  1             []
3     3  1      ['d' 'e']
3     4  1      ['d' 'e']
<BLANKLINE>
[7 rows x 3 columns]
>>> df.explode(list('AC'))
    A  B     C
0     0  1     a
0     1  1     b
0     2  1     c
1  <NA>  1  <NA>
2  <NA>  1  <NA>
3     3  1     d
3     4  1     e
<BLANKLINE>
[7 rows x 3 columns]
| Parameters | |
|---|---|
| Name | Description | 
| column | str, Sequence[str]. Column(s) to explode. For multiple columns, specify a non-empty list in which each element is a str or tuple; the list-like data of all specified columns must have matching length on each row of the frame. | 
| ignore_index | bool, default False. If True, the resulting index will be labeled 0, 1, …, n - 1. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | Exploded lists to rows of the subset columns; index will be duplicated for these rows. | 
ffill
ffill(*, limit: typing.Optional[int] = None) -> bigframes.dataframe.DataFrame
Fill NA/NaN values by propagating the last valid observation to next valid.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame([[np.nan, 2, np.nan, 0],
...                     [3, 4, np.nan, 1],
...                     [np.nan, np.nan, np.nan, np.nan],
...                     [np.nan, 3, np.nan, 4]],
...                    columns=list("ABCD")).astype("Float64")
>>> df
      A     B     C     D
0  <NA>   2.0  <NA>   0.0
1   3.0   4.0  <NA>   1.0
2  <NA>  <NA>  <NA>  <NA>
3  <NA>   3.0  <NA>   4.0
<BLANKLINE>
[4 rows x 4 columns]
Fill NA/NaN values in DataFrames:
>>> df.ffill()
      A    B     C    D
0  <NA>  2.0  <NA>  0.0
1   3.0  4.0  <NA>  1.0
2   3.0  4.0  <NA>  1.0
3   3.0  3.0  <NA>  4.0
<BLANKLINE>
[4 rows x 4 columns]
Fill NA/NaN values in Series:
>>> series = bpd.Series([1, np.nan, 2, 3])
>>> series.ffill()
0    1.0
1    1.0
2    2.0
3    3.0
dtype: Float64
| Returns | |
|---|---|
| Type | Description | 
| Series/DataFrame or None | Object with missing values filled. | 
fillna
fillna(value=None) -> bigframes.dataframe.DataFrame
Fill NA/NaN values using the specified method.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame([[np.nan, 2, np.nan, 0],
...                     [3, 4, np.nan, 1],
...                     [np.nan, np.nan, np.nan, np.nan],
...                     [np.nan, 3, np.nan, 4]],
...                    columns=list("ABCD")).astype("Float64")
>>> df
      A     B     C     D
0  <NA>   2.0  <NA>   0.0
1   3.0   4.0  <NA>   1.0
2  <NA>  <NA>  <NA>  <NA>
3  <NA>   3.0  <NA>   4.0
<BLANKLINE>
[4 rows x 4 columns]
Replace all NA elements with 0s.
>>> df.fillna(0)
     A    B    C    D
0  0.0  2.0  0.0  0.0
1  3.0  4.0  0.0  1.0
2  0.0  0.0  0.0  0.0
3  0.0  3.0  0.0  4.0
<BLANKLINE>
[4 rows x 4 columns]
You can use fill values from another DataFrame:
>>> df_fill = bpd.DataFrame(np.arange(12).reshape(3, 4),
...                         columns=['A', 'B', 'C', 'D'])
>>> df_fill
   A  B   C   D
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11
<BLANKLINE>
[3 rows x 4 columns]
>>> df.fillna(df_fill)
      A    B     C     D
0   0.0  2.0   2.0   0.0
1   3.0  4.0   6.0   1.0
2   8.0  9.0  10.0  11.0
3  <NA>  3.0  <NA>   4.0
<BLANKLINE>
[4 rows x 4 columns]
| Parameter | |
|---|---|
| Name | Description | 
| value | scalar, Series. Value to use to fill holes (e.g. 0), alternately a Series of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the Series will not be filled. This value cannot be a list. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | Object with missing values filled. | 
filter
filter(
    items: typing.Optional[typing.Iterable] = None,
    like: typing.Optional[str] = None,
    regex: typing.Optional[str] = None,
    axis: int | str | None = None,
) -> bigframes.dataframe.DataFrame
Subset the dataframe rows or columns according to the specified index labels.
Note that this routine does not filter a dataframe on its contents. The filter is applied to the labels of the index.
| Parameters | |
|---|---|
| Name | Description | 
| items | list-like. Keep labels from axis which are in items. | 
| like | str. Keep labels from axis for which "like in label == True". | 
| regex | str (regular expression). Keep labels from axis for which re.search(regex, label) == True. | 
| axis | {0 or 'index', 1 or 'columns', None}, default None. The axis to filter on, expressed either as an index (int) or axis name (str). By default this is the info axis, 'columns' for DataFrame. For  | 
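The three selection modes (`items`, `like`, `regex`) act on labels only, never on cell values. A minimal sketch, run against plain pandas as a stand-in (assuming bigframes behaves the same way), with made-up labels:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["mouse", "rabbit"],
    columns=["one", "two", "three"],
)

# Exact label match on the default (columns) axis.
by_items = df.filter(items=["one", "three"])

# Regular expression match: column labels ending in 'e'.
by_regex = df.filter(regex="e$", axis=1)

# Substring match on the row labels.
by_like = df.filter(like="bbi", axis=0)

print(list(by_items.columns))  # ['one', 'three']
print(list(by_regex.columns))  # ['one', 'three']
print(list(by_like.index))     # ['rabbit']
```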
first_valid_index
first_valid_index()
API documentation for first_valid_index method.
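The entry above gives no description, so the sketch below shows what the identically named pandas method does: it returns the label of the first row holding at least one non-NA value, or None when there is no such row. That bigframes follows the same rule is an assumption, and the frame is invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [None, None, 2.0], "B": [None, 3.0, 4.0]},
    index=["x", "y", "z"],
)

# Row 'x' is entirely missing; row 'y' is the first with any valid value.
first_label = df.first_valid_index()

# A frame with no valid values yields None.
empty_label = pd.DataFrame({"A": [None, None]}).first_valid_index()

print(first_label)  # y
print(empty_label)  # None
```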
floordiv
floordiv(
    other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
    axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get integer division of DataFrame and other, element-wise (binary operator //).
Equivalent to dataframe // other. With reverse version, rfloordiv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
You can use method name:
>>> df['A'].floordiv(df['B'])
0    0
1    0
2    0
dtype: Int64
You can also use arithmetic operator //:
>>> df['A'] // (df['B'])
0    0
1    0
2    0
dtype: Int64
| Parameters | |
|---|---|
| Name | Description | 
| other | float, int, or Series. Any single or multiple element data structure, or list-like object. | 
| axis | {0 or 'index', 1 or 'columns'}. Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame result of the arithmetic operation. | 
from_dict
from_dict(
    data: dict, orient: str = "columns", dtype=None, columns=None
) -> bigframes.dataframe.DataFrame
Construct DataFrame from dict of array-like or dicts.
Creates DataFrame object from dictionary by columns or by index allowing dtype specification.
| Parameters | |
|---|---|
| Name | Description | 
| data | dict. Of the form {field : array-like} or {field : dict}. | 
| orient | {'columns', 'index', 'tight'}, default 'columns'. The "orientation" of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass 'columns' (default). Otherwise if the keys should be rows, pass 'index'. If 'tight', assume a dict with keys ['index', 'columns', 'data', 'index_names', 'column_names']. | 
| dtype | dtype, default None. Data type to force after DataFrame construction, otherwise infer. | 
| columns | list, default None. Column labels to use when  | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame. | 
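The difference between the 'columns' and 'index' orientations is shown in the sketch below. It uses the pandas classmethod of the same name so it can run locally; equivalence with bigframes is assumed, and the keys and labels are illustrative only.

```python
import pandas as pd

# Default orient='columns': dict keys become column labels.
data = {"col_1": [3, 2, 1, 0], "col_2": ["a", "b", "c", "d"]}
by_columns = pd.DataFrame.from_dict(data)

# orient='index': dict keys become row labels, and columns names the fields.
rows = {"row_1": [3, 2, 1, 0], "row_2": ["a", "b", "c", "d"]}
by_index = pd.DataFrame.from_dict(rows, orient="index", columns=["A", "B", "C", "D"])

print(by_columns.shape)              # (4, 2)
print(by_index.shape)                # (2, 4)
print(by_index.loc["row_2", "B"])    # b
```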
from_records
from_records(
    data,
    index=None,
    exclude=None,
    columns=None,
    coerce_float: bool = False,
    nrows: typing.Optional[int] = None,
) -> bigframes.dataframe.DataFrame
Convert structured or record ndarray to DataFrame.
Creates a DataFrame object from a structured ndarray, sequence of tuples or dicts, or DataFrame.
| Parameters | |
|---|---|
| Name | Description | 
| data | structured ndarray, sequence of tuples or dicts. Structured input data. | 
| index | str, list of fields, array-like. Field of array to use as the index, alternately a specific set of input labels to use. | 
| exclude | sequence, default None. Columns or fields to exclude. | 
| columns | sequence, default None. Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns). | 
| coerce_float | bool, default False. Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets. | 
| nrows | int, default None. Number of rows to read if data is an iterator. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame. | 
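As a rough illustration of the `columns` and `index` parameters, the sketch below builds frames from a sequence of tuples. It calls the pandas classmethod of the same name (a local stand-in; matching bigframes behavior is assumed), and the field names are invented.

```python
import pandas as pd

records = [(3, "a"), (2, "b"), (1, "c")]

# Tuples carry no names, so columns supplies them.
df = pd.DataFrame.from_records(records, columns=["num", "letter"])

# Promote one field to the index; it no longer appears as a column.
indexed = pd.DataFrame.from_records(records, columns=["num", "letter"], index="num")

print(list(df.columns))       # ['num', 'letter']
print(list(indexed.index))    # [3, 2, 1]
print(list(indexed.columns))  # ['letter']
```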
ge
ge(other: typing.Any, axis: str | int = "columns") -> bigframes.dataframe.DataFrame
Get 'greater than or equal to' of DataFrame and other, element-wise (binary operator >=).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison
operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis
(rows or columns) and level for comparison.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can use method name:
>>> df = bpd.DataFrame({'angles': [0, 3, 4],
...        'degrees': [360, 180, 360]},
...       index=['circle', 'triangle', 'rectangle'])
>>> df["degrees"].ge(360)
circle        True
triangle     False
rectangle     True
Name: degrees, dtype: boolean
You can also use comparison operator >=:
>>> df["degrees"] >= 360
circle        True
triangle     False
rectangle     True
Name: degrees, dtype: boolean
| Parameters | |
|---|---|
| Name | Description | 
| other | scalar, sequence, Series, or DataFrame. Any single or multiple element data structure, or list-like object. | 
| axis | {0 or 'index', 1 or 'columns'}, default 'columns'. Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame of bool. The result of the comparison. | 
get
get(key, default=None)
Get item from object for given key (e.g. DataFrame column).
Returns default value if not found.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame(
...     [
...         [24.3, 75.7, "high"],
...         [31, 87.8, "high"],
...         [22, 71.6, "medium"],
...         [35, 95, "medium"],
...     ],
...     columns=["temp_celsius", "temp_fahrenheit", "windspeed"],
...     index=["2014-02-12", "2014-02-13", "2014-02-14", "2014-02-15"],
... )
>>> df
            temp_celsius  temp_fahrenheit windspeed
2014-02-12          24.3             75.7      high
2014-02-13          31.0             87.8      high
2014-02-14          22.0             71.6    medium
2014-02-15          35.0             95.0    medium
<BLANKLINE>
[4 rows x 3 columns]
>>> df.get(["temp_celsius", "windspeed"])
            temp_celsius windspeed
2014-02-12          24.3      high
2014-02-13          31.0      high
2014-02-14          22.0    medium
2014-02-15          35.0    medium
<BLANKLINE>
[4 rows x 2 columns]
>>> ser = df['windspeed']
>>> ser
2014-02-12      high
2014-02-13      high
2014-02-14    medium
2014-02-15    medium
Name: windspeed, dtype: string
>>> ser.get('2014-02-13')
'high'
If the key is not found, the default value will be used.
>>> df.get(["temp_celsius", "temp_kelvin"])
>>> df.get(["temp_celsius", "temp_kelvin"], default="default_value")
'default_value'
groupby
groupby(
    by: typing.Optional[
        typing.Union[
            typing.Hashable,
            bigframes.series.Series,
            typing.Sequence[typing.Union[typing.Hashable, bigframes.series.Series]],
        ]
    ] = None,
    *,
    level: typing.Optional[
        typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]]
    ] = None,
    as_index: bool = True,
    dropna: bool = True
) -> bigframes.core.groupby.DataFrameGroupBy
Group DataFrame by columns.
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'Animal': ['Falcon', 'Falcon',
...                                'Parrot', 'Parrot'],
...                     'Max Speed': [380., 370., 24., 26.]})
>>> df
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0
<BLANKLINE>
[4 rows x 2 columns]
>>> df.groupby(['Animal'])['Max Speed'].mean()
Animal
Falcon    375.0
Parrot     25.0
Name: Max Speed, dtype: Float64
We can also choose to include NA in group keys or not by setting dropna:
>>> df = bpd.DataFrame([[1, 2, 3],[1, None, 4], [2, 1, 3], [1, 2, 2]],
...                    columns=["a", "b", "c"])
>>> df.groupby(by=["b"]).sum()
     a  c
b
1.0  2  3
2.0  2  5
<BLANKLINE>
[2 rows x 2 columns]
>>> df.groupby(by=["b"], dropna=False).sum()
      a  c
b
1.0   2  3
2.0   2  5
<NA>  1  4
<BLANKLINE>
[3 rows x 2 columns]
We can also choose to return object with group labels or not by setting as_index:
>>> df.groupby(by=["b"], as_index=False).sum()
     b  a  c
0  1.0  2  3
1  2.0  2  5
<BLANKLINE>
[2 rows x 3 columns]
| Parameters | |
|---|---|
| Name | Description | 
| by | str, Sequence[str]. A label or list of labels may be passed to group by the columns in  | 
| level | int, level name, or sequence of such, default None. If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both  | 
| as_index | bool, default True. Return object with group labels as the index. Only relevant for DataFrame input.  | 
| dropna | bool, default True. If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.core.groupby.DataFrameGroupBy | A groupby object that contains information about the groups. | 
gt
gt(other: typing.Any, axis: str | int = "columns") -> bigframes.dataframe.DataFrame
Get 'greater than' of DataFrame and other, element-wise (binary operator >).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison
operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis
(rows or columns) and level for comparison.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'angles': [0, 3, 4],
...        'degrees': [360, 180, 360]},
...       index=['circle', 'triangle', 'rectangle'])
>>> df["degrees"].gt(360)
circle       False
triangle     False
rectangle    False
Name: degrees, dtype: boolean
You can also use comparison operator >:
>>> df["degrees"] > 360
circle       False
triangle     False
rectangle    False
Name: degrees, dtype: boolean
| Parameters | |
|---|---|
| Name | Description | 
| other | scalar, sequence, Series, or DataFrame. Any single or multiple element data structure, or list-like object. | 
| axis | {0 or 'index', 1 or 'columns'}, default 'columns'. Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame of bool: The result of the comparison. | 
head
head(n: int = 5) -> bigframes.dataframe.DataFrame
Return the first n rows.
This function returns the first n rows for the object based
on position. It is useful for quickly testing if your object
has the right type of data in it.
For negative values of n, this function returns
all rows except the last |n| rows, equivalent to df[:n].
If n is larger than the number of rows, this function returns all rows.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',
...                     'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df
    animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
6      shark
7      whale
8      zebra
<BLANKLINE>
[9 rows x 1 columns]
Viewing the first 5 lines:
>>> df.head()
    animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
<BLANKLINE>
[5 rows x 1 columns]
Viewing the first n lines (three in this case):
>>> df.head(3)
    animal
0  alligator
1        bee
2     falcon
<BLANKLINE>
[3 rows x 1 columns]
For negative values of n:
>>> df.head(-3)
    animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
<BLANKLINE>
[6 rows x 1 columns]
| Parameter | |
|---|---|
| Name | Description | 
| n | int, default 5. Number of rows to select. | 
| Returns | |
|---|---|
| Type | Description | 
| same type as caller | The first n rows of the caller object. | 
idxmax
idxmax() -> bigframes.series.Series
Return index of first occurrence of maximum over columns.
NA/null values are excluded.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [1, 2, 3]})
>>> df
   A  B
0  3  1
1  1  2
2  2  3
<BLANKLINE>
[3 rows x 2 columns]
>>> df.idxmax()
A    0
B    2
dtype: Int64
| Returns | |
|---|---|
| Type | Description | 
| Series | Indexes of maxima along the columns. | 
idxmin
idxmin() -> bigframes.series.Series
Return index of first occurrence of minimum over columns.
NA/null values are excluded.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [1, 2, 3]})
>>> df
   A  B
0  3  1
1  1  2
2  2  3
<BLANKLINE>
[3 rows x 2 columns]
>>> df.idxmin()
A    1
B    0
dtype: Int64
| Returns | |
|---|---|
| Type | Description | 
| Series | Indexes of minima along the columns. | 
info
info(
    verbose: typing.Optional[bool] = None,
    buf=None,
    max_cols: typing.Optional[int] = None,
    memory_usage: typing.Optional[bool] = None,
    show_counts: typing.Optional[bool] = None,
)
Print a concise summary of a DataFrame.
This method prints information about a DataFrame, including the index dtype and columns, non-null values, and memory usage.
| Parameters | |
|---|---|
| Name | Description | 
| verbose | bool, optional. Whether to print the full summary. By default, the setting in  | 
| buf | writable buffer, defaults to sys.stdout. Where to send the output. By default, the output is printed to sys.stdout. Pass a writable buffer if you need to further process the output. | 
| max_cols | int, optional. When to switch from the verbose to the truncated output. If the DataFrame has more than  | 
| memory_usage | bool, optional. Specifies whether total memory usage of the DataFrame elements (including the index) should be displayed. By default, this follows the  | 
| show_counts | bool, optional. Whether to show the non-null counts. By default, this is shown only if the DataFrame is smaller than  | 
| Returns | |
|---|---|
| Type | Description | 
| None | This method prints a summary of a DataFrame and returns None. | 
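Because the method returns None, capturing the summary for further processing relies on the `buf` parameter. The sketch below shows the pattern with plain pandas and an in-memory buffer; assuming the bigframes method accepts a buffer the same way is based only on the parameter description above.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2, None], "b": ["x", "y", "z"]})

# Redirect the summary into a string buffer instead of sys.stdout.
buffer = io.StringIO()
result = df.info(buf=buffer)
summary = buffer.getvalue()

# The return value is None; the text lives in the buffer.
print(result is None)
print(summary.splitlines()[0])
```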
interpolate
interpolate(method: str = "linear") -> bigframes.dataframe.DataFrame
Fill NaN values using an interpolation method.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3, None, None, 6],
...     'B': [None, 6, None, 2, None, 3],
...     }, index=[0, 0.1, 0.3, 0.7, 0.9, 1.0])
>>> df.interpolate()
       A     B
0.0  1.0  <NA>
0.1  2.0   6.0
0.3  3.0   4.0
0.7  4.0   2.0
0.9  5.0   2.5
1.0  6.0   3.0
<BLANKLINE>
[6 rows x 2 columns]
>>> df.interpolate(method="values")
            A         B
0.0       1.0      <NA>
0.1       2.0       6.0
0.3       3.0  4.666667
0.7  4.714286       2.0
0.9  5.571429  2.666667
1.0       6.0       3.0
<BLANKLINE>
[6 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| method | str, default 'linear'. Interpolation technique to use. Only 'linear' supported. 'linear': Ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes. 'index', 'values': use the actual numerical values of the index. 'pad': Fill in NaNs using existing values. 'nearest', 'zero', 'slinear': Emulates  | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | Returns the same object type as the caller, interpolated at some or all NaN values. | 
isin
isin(values) -> bigframes.dataframe.DataFrame
Whether each element in the DataFrame is contained in values.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
...                    index=['falcon', 'dog'])
>>> df
        num_legs  num_wings
falcon         2          2
dog            4          0
<BLANKLINE>
[2 rows x 2 columns]
When values is a list, check whether every value in the DataFrame is
present in the list (which animals have 0 or 2 legs or wings).
>>> df.isin([0, 2])
        num_legs  num_wings
falcon      True       True
dog        False       True
<BLANKLINE>
[2 rows x 2 columns]
When values is a dict, we can pass it to check for each column separately:
>>> df.isin({'num_wings': [0, 3]})
        num_legs  num_wings
falcon     False      False
dog        False       True
<BLANKLINE>
[2 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| values | iterable, or dict. The result will only be true at a location if all the labels match. If  | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame of booleans showing whether each element in the DataFrame is contained in values. | 
isna
isna() -> bigframes.dataframe.DataFrame
Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values get mapped to True values. Everything else gets mapped to
False values. Characters such as empty strings '' or
numpy.inf are not considered NA values.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> import numpy as np
>>> df = bpd.DataFrame(dict(
...         age=[5, 6, np.nan],
...         born=[bpd.NA, "1940-04-25", "1940-04-25"],
...         name=['Alfred', 'Batman', ''],
...         toy=[None, 'Batmobile', 'Joker'],
... ))
>>> df
    age        born    name        toy
0   5.0        <NA>  Alfred       <NA>
1   6.0  1940-04-25  Batman  Batmobile
2  <NA>  1940-04-25              Joker
<BLANKLINE>
[3 rows x 4 columns]
Show which entries in a DataFrame are NA:
>>> df.isna()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False
<BLANKLINE>
[3 rows x 4 columns]
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False
<BLANKLINE>
[3 rows x 4 columns]
Show which entries in a Series are NA:
>>> ser = bpd.Series([5, None, 6, np.nan, bpd.NA])
>>> ser
0       5
1    <NA>
2       6
3    <NA>
4    <NA>
dtype: Int64
>>> ser.isna()
0    False
1     True
2    False
3     True
4     True
dtype: boolean
>>> ser.isnull()
0    False
1     True
2    False
3     True
4     True
dtype: boolean
isnull
isnull() -> bigframes.dataframe.DataFrame
Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values get mapped to True values. Everything else gets mapped to
False values. Characters such as empty strings '' or
numpy.inf are not considered NA values.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> import numpy as np
>>> df = bpd.DataFrame(dict(
...         age=[5, 6, np.nan],
...         born=[bpd.NA, "1940-04-25", "1940-04-25"],
...         name=['Alfred', 'Batman', ''],
...         toy=[None, 'Batmobile', 'Joker'],
... ))
>>> df
    age        born    name        toy
0   5.0        <NA>  Alfred       <NA>
1   6.0  1940-04-25  Batman  Batmobile
2  <NA>  1940-04-25              Joker
<BLANKLINE>
[3 rows x 4 columns]
Show which entries in a DataFrame are NA:
>>> df.isna()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False
<BLANKLINE>
[3 rows x 4 columns]
>>> df.isnull()
    age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False
<BLANKLINE>
[3 rows x 4 columns]
Show which entries in a Series are NA:
>>> ser = bpd.Series([5, None, 6, np.nan, bpd.NA])
>>> ser
0       5
1    <NA>
2       6
3    <NA>
4    <NA>
dtype: Int64
>>> ser.isna()
0    False
1     True
2    False
3     True
4     True
dtype: boolean
>>> ser.isnull()
0    False
1     True
2    False
3     True
4     True
dtype: boolean
items
items()
Iterate over (column name, Series) pairs.
Iterates over the DataFrame columns, returning a tuple with the column name and the content as a Series.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
...                     'population': [1864, 22000, 80000]},
...                    index=['panda', 'polar', 'koala'])
>>> df
         species  population
panda       bear        1864
polar       bear       22000
koala  marsupial       80000
<BLANKLINE>
[3 rows x 2 columns]
>>> for label, content in df.items():
...     print(f'--> label: {label}')
...     print(f'--> content:\n{content}')
...
--> label: species
--> content:
panda         bear
polar         bear
koala    marsupial
Name: species, dtype: string
--> label: population
--> content:
panda     1864
polar    22000
koala    80000
Name: population, dtype: Int64
| Returns | |
|---|---|
| Type | Description | 
| Iterator | Iterator of label, Series for each column. | 
iterrows
iterrows() -> typing.Iterable[tuple[typing.Any, pandas.core.series.Series]]
Iterate over DataFrame rows as (index, Series) pairs.
Yields a tuple (index, data), where data contains the row values as a Series.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
>>> index, row = next(df.iterrows())
>>> index
0
>>> row
A    1
B    4
Name: 0, dtype: object
itertuples
itertuples(
    index: bool = True, name: typing.Optional[str] = "Pandas"
) -> typing.Iterable[tuple[typing.Any, ...]]
Iterate over DataFrame rows as namedtuples.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
>>> next(df.itertuples(name="Pair"))
Pair(Index=0, A=1, B=4)
| Parameters | |
|---|---|
| Name | Description | 
| index | bool, default True. If True, return the index as the first element of the tuple. |
| name | str or None, default "Pandas". The name of the returned namedtuples or None to return regular tuples. |
| Returns | |
|---|---|
| Type | Description | 
| iterator | An object to iterate over namedtuples for each row in the DataFrame with the first field possibly being the index and following fields being the column values. | 
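The effect of the index and name parameters can be sketched with the standard library's namedtuple. This is a plain-Python illustration, not the bigframes implementation; the helper iter_named_rows and the local columns and rows data are hypothetical stand-ins for the DataFrame in the example above.

```python
from collections import namedtuple

# Hypothetical local data standing in for the DataFrame in the example above.
columns = ["A", "B"]
rows = [(1, 4), (2, 5), (3, 6)]


def iter_named_rows(index=True, name="Pandas"):
    """Yield one tuple per row, mimicking the documented index/name options."""
    fields = (["Index"] if index else []) + columns
    # name=None means regular tuples instead of namedtuples.
    make_row = namedtuple(name, fields) if name is not None else None
    for position, values in enumerate(rows):
        full = ((position,) if index else ()) + values
        yield make_row(*full) if make_row else full


first = next(iter_named_rows(name="Pair"))
print(first)             # Pair(Index=0, A=1, B=4)
print(first.A, first.B)  # fields are accessible by column name
```

Passing name=None in this sketch yields regular tuples, and index=False drops the leading Index field.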
join
join(
    other: bigframes.dataframe.DataFrame,
    *,
    on: typing.Optional[str] = None,
    how: str = "left"
) -> bigframes.dataframe.DataFrame
Join columns of another DataFrame.
Join columns with the other DataFrame on the index.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Join two DataFrames by specifying how to handle the operation:
>>> df1 = bpd.DataFrame({'col1': ['foo', 'bar'], 'col2': [1, 2]}, index=[10, 11])
>>> df1
   col1  col2
10  foo     1
11  bar     2
<BLANKLINE>
[2 rows x 2 columns]
>>> df2 = bpd.DataFrame({'col3': ['foo', 'baz'], 'col4': [3, 4]}, index=[11, 22])
>>> df2
   col3  col4
11  foo     3
22  baz     4
<BLANKLINE>
[2 rows x 2 columns]
>>> df1.join(df2)
   col1  col2  col3  col4
10  foo     1  <NA>  <NA>
11  bar     2   foo     3
<BLANKLINE>
[2 rows x 4 columns]
>>> df1.join(df2, how="left")
   col1  col2  col3  col4
10  foo     1  <NA>  <NA>
11  bar     2   foo     3
<BLANKLINE>
[2 rows x 4 columns]
>>> df1.join(df2, how="right")
    col1  col2 col3  col4
11  bar      2  foo     3
22  <NA>  <NA>  baz     4
<BLANKLINE>
[2 rows x 4 columns]
>>> df1.join(df2, how="outer")
    col1  col2  col3  col4
10   foo     1  <NA>  <NA>
11   bar     2   foo     3
22  <NA>  <NA>   baz     4
<BLANKLINE>
[3 rows x 4 columns]
>>> df1.join(df2, how="inner")
   col1  col2 col3  col4
11  bar     2  foo     3
<BLANKLINE>
[1 rows x 4 columns]
Alternatively, join on a key column by using the on parameter:
>>> df1.join(df2, on="col1", how="right")
      col1  col2 col3  col4
<NA>    11  <NA>  foo     3
<NA>    22  <NA>  baz     4
<BLANKLINE>
[2 rows x 4 columns]
| Parameter | |
|---|---|
| Name | Description | 
| how | {'left', 'right', 'outer', 'inner'}, default 'left'. How to handle the operation of the two objects.  |
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | A dataframe containing columns from both the caller and other. | 
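As a rough mental model of the how options, an index-based join can be sketched with plain dictionaries keyed by index label. The function join_on_index is a hypothetical helper written for this illustration; it assumes non-empty inputs, uses None where the examples above show <NA>, and does not reflect how bigframes executes joins.

```python
def join_on_index(left, right, how="left"):
    """Join two {index_label: row_dict} mappings by index label."""
    left_cols = list(next(iter(left.values())))
    right_cols = list(next(iter(right.values())))
    if how == "left":
        labels = list(left)
    elif how == "right":
        labels = list(right)
    elif how == "inner":
        labels = [label for label in left if label in right]
    elif how == "outer":
        labels = sorted(set(left) | set(right))
    else:
        raise ValueError(f"unsupported how: {how!r}")
    joined = {}
    for label in labels:
        # Columns from the side that lacks this label are filled with None.
        row = {col: left.get(label, {}).get(col) for col in left_cols}
        row.update({col: right.get(label, {}).get(col) for col in right_cols})
        joined[label] = row
    return joined


df1 = {10: {"col1": "foo", "col2": 1}, 11: {"col1": "bar", "col2": 2}}
df2 = {11: {"col3": "foo", "col4": 3}, 22: {"col3": "baz", "col4": 4}}

for how in ("left", "right", "inner", "outer"):
    print(how, list(join_on_index(df1, df2, how=how)))
```

The labels kept by each option mirror the row labels in the examples above: left keeps 10 and 11, right keeps 11 and 22, inner keeps only 11, and outer keeps all three.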
keys
keys() -> pandas.core.indexes.base.Index
Get the 'info axis'.
This is the index for a Series and the columns for a DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
>>> df.keys()
Index(['A', 'B'], dtype='object')
| Returns | |
|---|---|
| Type | Description | 
| Index | Info axis. | 
kurt
kurt(*, numeric_only: bool = False)
Return unbiased kurtosis over columns.
Kurtosis obtained using Fisher's definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 2, 3, 4, 5],
...                     "B": [3, 4, 3, 2, 1],
...                     "C": [2, 2, 3, 2, 2]})
>>> df
    A       B       C
0   1       3       2
1   2       4       2
2   3       3       3
3   4       2       2
4   5       1       2
<BLANKLINE>
[5 rows x 3 columns]
Calculating the kurtosis value of each column:
>>> df.kurt()
A        -1.2
B   -0.177515
C         5.0
dtype: Float64
| Parameter | |
|---|---|
| Name | Description | 
| numeric_only | bool, default False. Include only float, int, boolean columns. |
| Returns | |
|---|---|
| Type | Description | 
| Series | Series. | 
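To make the description concrete, here is a plain-Python sketch of the standard bias-corrected sample excess kurtosis (Fisher's definition, with the variance normalized by N-1). It assumes this is the formula behind the values above; the function name unbiased_kurtosis is hypothetical and is not part of bigframes.

```python
def unbiased_kurtosis(values):
    """Fisher (excess) kurtosis with the usual small-sample bias correction."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four values are required")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    m4 = sum((v - mean) ** 4 for v in values)
    variance = m2 / (n - 1)  # sample variance, normalized by N-1
    first = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * m4 / variance**2
    second = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return first - second


data = {"A": [1, 2, 3, 4, 5], "B": [3, 4, 3, 2, 1], "C": [2, 2, 3, 2, 2]}
for label, column in data.items():
    print(label, round(unbiased_kurtosis(column), 6))
```

For the three columns in the example above, this sketch agrees with the displayed results (-1.2, -0.177515, and 5.0) to the precision shown.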
kurtosis
kurtosis(*, numeric_only: bool = False)
Return unbiased kurtosis over columns.
Kurtosis obtained using Fisher's definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 2, 3, 4, 5],
...                     "B": [3, 4, 3, 2, 1],
...                     "C": [2, 2, 3, 2, 2]})
>>> df
    A       B       C
0   1       3       2
1   2       4       2
2   3       3       3
3   4       2       2
4   5       1       2
<BLANKLINE>
[5 rows x 3 columns]
Calculating the kurtosis value of each column:
>>> df.kurt()
A        -1.2
B   -0.177515
C         5.0
dtype: Float64
| Parameter | |
|---|---|
| Name | Description | 
| numeric_only | bool, default False. Include only float, int, boolean columns. |
| Returns | |
|---|---|
| Type | Description | 
| Series | Series. | 
le
le(other: typing.Any, axis: str | int = "columns") -> bigframes.dataframe.DataFrame
Get 'less than or equal to' of DataFrame and other, element-wise (binary operator <=).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison
operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis
(rows or columns) and level for comparison.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can use method name:
>>> df = bpd.DataFrame({'angles': [0, 3, 4],
...        'degrees': [360, 180, 360]},
...       index=['circle', 'triangle', 'rectangle'])
>>> df["degrees"].le(180)
circle       False
triangle      True
rectangle    False
Name: degrees, dtype: boolean
You can also use the comparison operator <=:
>>> df["degrees"] <= 180
circle       False
triangle      True
rectangle    False
Name: degrees, dtype: boolean
| Parameters | |
|---|---|
| Name | Description | 
| other | scalar, sequence, Series, or DataFrame. Any single or multiple element data structure, or list-like object. |
| axis | {0 or 'index', 1 or 'columns'}, default 'columns'. Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). |
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame of bool. The result of the comparison. | 
lt
lt(other: typing.Any, axis: str | int = "columns") -> bigframes.dataframe.DataFrame
Get 'less than' of DataFrame and other, element-wise (binary operator <).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison
operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis
(rows or columns) and level for comparison.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can use method name:
>>> df = bpd.DataFrame({'angles': [0, 3, 4],
...        'degrees': [360, 180, 360]},
...       index=['circle', 'triangle', 'rectangle'])
>>> df["degrees"].lt(180)
circle       False
triangle     False
rectangle    False
Name: degrees, dtype: boolean
You can also use the comparison operator <:
>>> df["degrees"] < 180
circle       False
triangle     False
rectangle    False
Name: degrees, dtype: boolean
| Parameters | |
|---|---|
| Name | Description | 
| other | scalar, sequence, Series, or DataFrame. Any single or multiple element data structure, or list-like object. |
| axis | {0 or 'index', 1 or 'columns'}, default 'columns'. Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). |
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame of bool. The result of the comparison. | 
map
map(func, na_action: typing.Optional[str] = None) -> bigframes.dataframe.DataFrame
Apply a function to a DataFrame elementwise.
This method applies a function that accepts and returns a scalar to every element of a DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
The reuse=False flag ensures that a new remote_function
is created every time the following code runs. You can omit it
to potentially reuse a previously deployed remote_function for
the same user-defined function.
>>> @bpd.remote_function(int, float, reuse=False)
... def minutes_to_hours(x):
...     return x/60
>>> df_minutes = bpd.DataFrame(
...     {"system_minutes" : [0, 30, 60, 90, 120],
...      "user_minutes" : [0, 15, 75, 90, 6]})
>>> df_minutes
   system_minutes  user_minutes
0               0             0
1              30            15
2              60            75
3              90            90
4             120             6
<BLANKLINE>
[5 rows x 2 columns]
>>> df_hours = df_minutes.map(minutes_to_hours)
>>> df_hours
   system_minutes  user_minutes
0             0.0           0.0
1             0.5          0.25
2             1.0          1.25
3             1.5           1.5
4             2.0           0.1
<BLANKLINE>
[5 rows x 2 columns]
If the data contains NA/None values, you can skip
applying the remote function to those values by specifying
na_action='ignore'.
>>> df_minutes = bpd.DataFrame(
...     {
...         "system_minutes" : [0, 30, 60, None, 90, 120, bpd.NA],
...         "user_minutes" : [0, 15, 75, 90, 6, None, bpd.NA]
...     }, dtype="Int64")
>>> df_hours = df_minutes.map(minutes_to_hours, na_action='ignore')
>>> df_hours
   system_minutes  user_minutes
0             0.0           0.0
1             0.5          0.25
2             1.0          1.25
3            <NA>           1.5
4             1.5           0.1
5             2.0          <NA>
6            <NA>          <NA>
<BLANKLINE>
[7 rows x 2 columns]
| Parameters | |
|---|---|
| Name | Description | 
| func | function. Python function wrapped by  |
| na_action | Optional[str], default None |
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | Transformed DataFrame. | 
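The na_action='ignore' behavior described above can be sketched locally without deploying a remote function. This is a plain-Python illustration of the semantics only: map_elementwise is a hypothetical helper, a dict of lists stands in for the DataFrame, and None stands in for NA.

```python
def map_elementwise(table, func, na_action=None):
    """Apply func to every cell; with na_action='ignore', None cells pass through."""

    def convert(value):
        if na_action == "ignore" and value is None:
            return None  # leave the missing value untouched
        return func(value)

    return {name: [convert(v) for v in column] for name, column in table.items()}


def minutes_to_hours(x):
    return x / 60


minutes = {"system_minutes": [0, 30, None, 90], "user_minutes": [15, None, 75, 6]}
hours = map_elementwise(minutes, minutes_to_hours, na_action="ignore")
print(hours)
```

Without na_action='ignore', this sketch would call minutes_to_hours(None) and raise a TypeError, which is the kind of failure the option is meant to avoid.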
max
max(
    axis: typing.Union[str, int] = 0, *, numeric_only: bool = False
) -> bigframes.series.Series
Return the maximum of the values over the requested axis.
If you want the index of the maximum, use idxmax. This is
the equivalent of the numpy.ndarray method argmax.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 3], "B": [2, 4]})
>>> df
    A       B
0   1       2
1   3       4
<BLANKLINE>
[2 rows x 2 columns]
Finding the maximum value in each column (the default behavior without an explicit axis parameter).
>>> df.max()
A    3
B    4
dtype: Int64
Finding the maximum value in each row.
>>> df.max(axis=1)
0    2
1    4
dtype: Int64
| Parameters | |
|---|---|
| Name | Description | 
| axis | {index (0), columns (1)}. Axis for the function to be applied on. For Series this parameter is unused and defaults to 0. |
| numeric_only | bool, default False. Include only float, int, boolean columns. |
| Returns | |
|---|---|
| Type | Description | 
| bigframes.series.Series | Series with the maximum of values. |
mean
mean(
    axis: typing.Union[str, int] = 0, *, numeric_only: bool = False
) -> bigframes.series.Series
Return the mean of the values over the requested axis.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 3], "B": [2, 4]})
>>> df
    A       B
0   1       2
1   3       4
<BLANKLINE>
[2 rows x 2 columns]
Calculating the mean of each column (the default behavior without an explicit axis parameter).
>>> df.mean()
A    2.0
B    3.0
dtype: Float64
Calculating the mean of each row.
>>> df.mean(axis=1)
0    1.5
1    3.5
dtype: Float64
| Parameters | |
|---|---|
| Name | Description | 
| axis | {index (0), columns (1)}. Axis for the function to be applied on. For Series this parameter is unused and defaults to 0. |
| numeric_only | bool, default False. Include only float, int, boolean columns. |
| Returns | |
|---|---|
| Type | Description | 
| bigframes.series.Series | Series with the mean of values. | 
median
median(
    *, numeric_only: bool = False, exact: bool = True
) -> bigframes.series.Series
Return the median of the values over columns.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 3], "B": [2, 4]})
>>> df
    A       B
0   1       2
1   3       4
<BLANKLINE>
[2 rows x 2 columns]
Finding the median value of each column.
>>> df.median()
A    2.0
B    3.0
dtype: Float64
| Parameters | |
|---|---|
| Name | Description | 
| numeric_only | bool, default False. Include only float, int, boolean columns. |
| exact | bool, default True. Get the exact median instead of an approximate one. |
| Returns | |
|---|---|
| Type | Description | 
| bigframes.series.Series | Series with the median of values. | 
melt
melt(
    id_vars: typing.Optional[typing.Iterable[typing.Hashable]] = None,
    value_vars: typing.Optional[typing.Iterable[typing.Hashable]] = None,
    var_name: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]] = None,
    value_name: typing.Hashable = "value",
)
Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
This function is useful to massage a DataFrame into a format where one
or more columns are identifier variables (id_vars), while all other
columns, considered measured variables (value_vars), are "unpivoted" to
the row axis, leaving just two non-identifier columns, 'variable' and
'value'.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, None, 3, 4, 5],
...                     "B": [1, 2, 3, 4, 5],
...                     "C": [None, 3.5, None, 4.5, 5.0]})
>>> df
        A       B      C
0     1.0       1   <NA>
1    <NA>       2    3.5
2     3.0       3   <NA>
3     4.0       4    4.5
4     5.0       5    5.0
<BLANKLINE>
[5 rows x 3 columns]
Using melt without optional arguments:
>>> df.melt()
    variable    value
0          A      1.0
1          A     <NA>
2          A      3.0
3          A      4.0
4          A      5.0
5          B      1.0
6          B      2.0
7          B      3.0
8          B      4.0
9          B      5.0
10         C     <NA>
11         C      3.5
12         C     <NA>
13         C      4.5
14         C      5.0
<BLANKLINE>
[15 rows x 2 columns]
Using melt with id_vars and value_vars:
>>> df.melt(id_vars='A', value_vars=['B', 'C'])
      A variable  value
0   1.0        B    1.0
1  <NA>        B    2.0
2   3.0        B    3.0
3   4.0        B    4.0
4   5.0        B    5.0
5   1.0        C   <NA>
6  <NA>        C    3.5
7   3.0        C   <NA>
8   4.0        C    4.5
9   5.0        C    5.0
<BLANKLINE>
[10 rows x 3 columns]
| Parameters | |
|---|---|
| Name | Description | 
| id_vars | tuple, list, or ndarray, optional. Column(s) to use as identifier variables. |
| value_vars | tuple, list, or ndarray, optional. Column(s) to unpivot. If not specified, uses all columns that are not set as  |
| var_name | scalar. Name to use for the 'variable' column. If None it uses  |
| value_name | scalar, default 'value'. Name to use for the 'value' column. |
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | Unpivoted DataFrame. | 
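The wide-to-long reshaping can be sketched in plain Python to show how id_vars and value_vars determine the output rows. The melt function below is a hypothetical stand-in that operates on a dict of equal-length lists; it illustrates the reshaping only and is not the bigframes implementation.

```python
def melt(table, id_vars=(), value_vars=None, var_name="variable", value_name="value"):
    """Unpivot a dict of equal-length columns into a list of long-format rows."""
    if value_vars is None:
        # Default: every column that is not an identifier gets unpivoted.
        value_vars = [name for name in table if name not in id_vars]
    n_rows = len(next(iter(table.values())))
    long_rows = []
    for name in value_vars:
        for i in range(n_rows):
            row = {key: table[key][i] for key in id_vars}
            row[var_name] = name
            row[value_name] = table[name][i]
            long_rows.append(row)
    return long_rows


wide = {"A": [1.0, None, 3.0], "B": [1, 2, 3], "C": [None, 3.5, None]}
for row in melt(wide, id_vars=["A"], value_vars=["B", "C"]):
    print(row)
```

Each unpivoted column contributes one output row per input row, so three input rows and two value columns give six long-format rows, each carrying its identifier value along.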
memory_usage
memory_usage(index: bool = True)
Return the memory usage of each column in bytes.
The memory usage can optionally include the contribution of
the index and elements of object dtype.
This value is displayed in DataFrame.info by default. This can be
suppressed by setting pandas.options.display.memory_usage to False.
| Parameter | |
|---|---|
| Name | Description | 
| index | bool, default True. Specifies whether to include the memory usage of the DataFrame's index in returned Series. If  |
| Returns | |
|---|---|
| Type | Description | 
| Series | A Series whose index is the original column names and whose values are the memory usage of each column in bytes. |
merge
merge(
    right: bigframes.dataframe.DataFrame,
    how: typing.Literal["inner", "left", "outer", "right", "cross"] = "inner",
    on: typing.Optional[
        typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]]
    ] = None,
    *,
    left_on: typing.Optional[
        typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]]
    ] = None,
    right_on: typing.Optional[
        typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]]
    ] = None,
    sort: bool = False,
    suffixes: tuple[str, str] = ("_x", "_y")
) -> bigframes.dataframe.DataFrame
Merge DataFrame objects with a database-style join.
The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Merge DataFrames df1 and df2 by specifying the type of merge:
>>> df1 = bpd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
>>> df1
     a  b
0  foo  1
1  bar  2
<BLANKLINE>
[2 rows x 2 columns]
>>> df2 = bpd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})
>>> df2
     a  c
0  foo  3
1  baz  4
<BLANKLINE>
[2 rows x 2 columns]
>>> df1.merge(df2, how="inner", on="a")
     a  b  c
0  foo  1  3
<BLANKLINE>
[1 rows x 3 columns]
>>> df1.merge(df2, how='left', on='a')
     a  b     c
0  foo  1     3
1  bar  2  <NA>
<BLANKLINE>
[2 rows x 3 columns]
Merge df1 and df2 on the lkey and rkey columns. The value columns have the default suffixes, _x and _y, appended.
>>> df1 = bpd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
...                     'value': [1, 2, 3, 5]})
>>> df1
  lkey  value
0  foo      1
1  bar      2
2  baz      3
3  foo      5
<BLANKLINE>
[4 rows x 2 columns]
>>> df2 = bpd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
...                     'value': [5, 6, 7, 8]})
>>> df2
  rkey  value
0  foo      5
1  bar      6
2  baz      7
3  foo      8
<BLANKLINE>
[4 rows x 2 columns]
>>> df1.merge(df2, left_on='lkey', right_on='rkey')
  lkey  value_x rkey  value_y
0  foo        1  foo        5
1  foo        1  foo        8
2  bar        2  bar        6
3  baz        3  baz        7
4  foo        5  foo        5
5  foo        5  foo        8
<BLANKLINE>
[6 rows x 4 columns]
| Parameters | |
|---|---|
| Name | Description | 
| on | label or list of labels. Columns to join on. It must be found in both DataFrames. Either on or left_on + right_on must be passed in. |
| left_on | label or list of labels. Columns to join on in the left DataFrame. Either on or left_on + right_on must be passed in. |
| right_on | label or list of labels. Columns to join on in the right DataFrame. Either on or left_on + right_on must be passed in. |
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | A DataFrame of the two merged objects. | 
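The left_on/right_on example above, including the _x and _y suffixes, can be reproduced with a small nested-loop join in plain Python. This is a sketch of the inner-merge semantics only: inner_merge is a hypothetical helper, it assumes non-empty lists of row dicts, and it says nothing about how bigframes executes the join.

```python
def inner_merge(left, right, left_on, right_on, suffixes=("_x", "_y")):
    """Inner-join two lists of row dicts, suffixing column names that collide."""
    # Non-key columns present on both sides need a suffix to stay distinct.
    overlap = (set(left[0]) & set(right[0])) - {left_on, right_on}
    merged = []
    for lrow in left:
        for rrow in right:
            if lrow[left_on] != rrow[right_on]:
                continue
            row = {}
            for key, value in lrow.items():
                row[key + suffixes[0] if key in overlap else key] = value
            for key, value in rrow.items():
                row[key + suffixes[1] if key in overlap else key] = value
            merged.append(row)
    return merged


df1 = [{"lkey": k, "value": v} for k, v in [("foo", 1), ("bar", 2), ("baz", 3), ("foo", 5)]]
df2 = [{"rkey": k, "value": v} for k, v in [("foo", 5), ("bar", 6), ("baz", 7), ("foo", 8)]]

for row in inner_merge(df1, df2, left_on="lkey", right_on="rkey"):
    print(row)
```

Because 'foo' appears twice on each side, it contributes four of the six result rows, which is why the merged output has more rows than either input.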
min
min(
    axis: typing.Union[str, int] = 0, *, numeric_only: bool = False
) -> bigframes.series.Series
Return the minimum of the values over the requested axis.
If you want the index of the minimum, use idxmin. This is the
equivalent of the numpy.ndarray method argmin.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 3], "B": [2, 4]})
>>> df
    A       B
0   1       2
1   3       4
<BLANKLINE>
[2 rows x 2 columns]
Finding the minimum value in each column (the default behavior without an explicit axis parameter).
>>> df.min()
A    1
B    2
dtype: Int64
Finding the minimum value in each row.
>>> df.min(axis=1)
0    1
1    3
dtype: Int64
| Parameters | |
|---|---|
| Name | Description | 
| axis | {index (0), columns (1)}. Axis for the function to be applied on. For Series this parameter is unused and defaults to 0. |
| numeric_only | bool, default False. Include only float, int, boolean columns. |
| Returns | |
|---|---|
| Type | Description | 
| bigframes.series.Series | Series with the minimum of the values. | 
mod
mod(
    other: int | bigframes.series.Series | bigframes.dataframe.DataFrame,
    axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get modulo of DataFrame and other, element-wise (binary operator %).
Equivalent to dataframe % other. With reverse version, rmod.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
You can use method name:
>>> df['A'].mod(df['B'])
0    1
1    2
2    3
dtype: Int64
You can also use arithmetic operator %:
>>> df['A'] % (df['B'])
0    1
1    2
2    3
dtype: Int64
| Parameter | |
|---|---|
| Name | Description | 
| axis | {0 or 'index', 1 or 'columns'}. Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on. |
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame result of the arithmetic operation. | 
mul
mul(
    other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
    axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get multiplication of DataFrame and other, element-wise (binary operator *).
Equivalent to dataframe * other. With reverse version, rmul.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
You can use method name:
>>> df['A'].mul(df['B'])
0     4
1    10
2    18
dtype: Int64
You can also use arithmetic operator *:
>>> df['A'] * (df['B'])
0     4
1    10
2    18
dtype: Int64
| Parameters | |
|---|---|
| Name | Description | 
| other | float, int, or Series. Any single or multiple element data structure, or list-like object. |
| axis | {0 or 'index', 1 or 'columns'}. Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on. |
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame result of the arithmetic operation. | 
multiply
multiply(
    other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
    axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get multiplication of DataFrame and other, element-wise (binary operator *).
Equivalent to dataframe * other. With reverse version, rmul.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
You can use method name:
>>> df['A'].mul(df['B'])
0     4
1    10
2    18
dtype: Int64
You can also use arithmetic operator *:
>>> df['A'] * (df['B'])
0     4
1    10
2    18
dtype: Int64
| Parameters | |
|---|---|
| Name | Description | 
| other | float, int, or Series. Any single or multiple element data structure, or list-like object. |
| axis | {0 or 'index', 1 or 'columns'}. Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on. |
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame result of the arithmetic operation. | 
ne
ne(other: typing.Any, axis: str | int = "columns") -> bigframes.dataframe.DataFrame
Get 'not equal to' of DataFrame and other, element-wise (binary operator !=).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison
operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis
(rows or columns) and level for comparison.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can use method name:
>>> df = bpd.DataFrame({'angles': [0, 3, 4],
...        'degrees': [360, 180, 360]},
...       index=['circle', 'triangle', 'rectangle'])
>>> df["degrees"].ne(360)
circle       False
triangle      True
rectangle    False
Name: degrees, dtype: boolean
You can also use the comparison operator !=:
>>> df["degrees"] != 360
circle       False
triangle      True
rectangle    False
Name: degrees, dtype: boolean
| Parameters | |
|---|---|
| Name | Description | 
| other | scalar, sequence, Series, or DataFrame. Any single or multiple element data structure, or list-like object. |
| axis | {0 or 'index', 1 or 'columns'}, default 'columns'. Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). |
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | Result of the comparison. | 
nlargest
nlargest(
    n: int,
    columns: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]],
    keep: str = "first",
) -> bigframes.dataframe.DataFrame
Return the first n rows ordered by columns in descending order.
Return the first n rows with the largest values in columns, in
descending order. The columns that are not specified are returned as
well, but not used for ordering.
This method is equivalent to
df.sort_values(columns, ascending=False).head(n), but more
performant.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 1, 3, 3, 5, 5],
...                     "B": [5, 6, 3, 4, 1, 2],
...                     "C": ['a', 'b', 'a', 'b', 'a', 'b']})
>>> df
    A       B       C
0   1       5       a
1   1       6       b
2   3       3       a
3   3       4       b
4   5       1       a
5   5       2       b
<BLANKLINE>
[6 rows x 3 columns]
Returns rows with the largest value in 'A', including all ties:
>>> df.nlargest(1, 'A', keep = "all")
    A       B       C
4   5       1       a
5   5       2       b
<BLANKLINE>
[2 rows x 3 columns]
Returns the first row with the largest value in 'A', default behavior in case of ties:
>>> df.nlargest(1, 'A')
    A       B       C
4   5       1       a
<BLANKLINE>
[1 rows x 3 columns]
Returns the last row with the largest value in 'A' in case of ties:
>>> df.nlargest(1, 'A', keep = "last")
    A       B       C
5   5       2       b
<BLANKLINE>
[1 rows x 3 columns]
Returns the row with the largest value in 'A', breaking ties by the largest value in 'C':
>>> df.nlargest(1, ['A', 'C'])
    A       B       C
5   5       2       b
<BLANKLINE>
[1 rows x 3 columns]
| Parameters | |
|---|---|
| Name | Description | 
| n | int. Number of rows to return. |
| columns | label or list of labels. Column label(s) to order by. |
| keep | {'first', 'last', 'all'}, default 'first'. Where there are duplicate values: -  |
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The first n rows ordered by the given columns in descending order. |
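The tie-handling behavior of keep can be sketched in plain Python on a single numeric column. The helper nlargest_positions is hypothetical and returns row positions rather than rows; it is written only to mirror the three n=1 examples above and is an assumption about the general tie rule, not the bigframes implementation.

```python
def nlargest_positions(values, n, keep="first"):
    """Return row positions of the n largest values; ties are resolved per keep."""
    if keep not in ("first", "last", "all"):
        raise ValueError(f"unsupported keep: {keep!r}")
    positions = range(len(values))
    if keep == "last":
        # Among equal values, prefer the later position.
        return sorted(positions, key=lambda i: (-values[i], -i))[:n]
    ordered = sorted(positions, key=lambda i: (-values[i], i))
    if keep == "all" and 0 < n < len(ordered):
        # Keep every row tied with the n-th largest value.
        cutoff = values[ordered[n - 1]]
        return [i for i in ordered if values[i] >= cutoff]
    return ordered[:n]


column_a = [1, 1, 3, 3, 5, 5]
print(nlargest_positions(column_a, 1))               # [4]
print(nlargest_positions(column_a, 1, keep="last"))  # [5]
print(nlargest_positions(column_a, 1, keep="all"))   # [4, 5]
```

With keep='all', the result can contain more than n rows, as in the first example above where two rows tie for the largest value in 'A'.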
notna
notna() -> bigframes.dataframe.DataFrame
Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to True. Characters such as empty
strings '' or numpy.inf are not considered NA values.
NA values get mapped to False values.
| Returns | |
|---|---|
| Type | Description | 
| NDFrame | Mask of bool values for each element that indicates whether an element is not an NA value. | 
notnull
notnull() -> bigframes.dataframe.DataFrame
Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to True. Characters such as empty
strings '' or numpy.inf are not considered NA values.
NA values get mapped to False values.
| Returns | |
|---|---|
| Type | Description | 
| NDFrame | Mask of bool values for each element that indicates whether an element is not an NA value. | 
nsmallest
nsmallest(
    n: int,
    columns: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]],
    keep: str = "first",
) -> bigframes.dataframe.DataFrame
Return the first n rows ordered by columns in ascending order.
Return the first n rows with the smallest values in columns, in
ascending order. The columns that are not specified are returned as
well, but not used for ordering.
This method is equivalent to
df.sort_values(columns, ascending=True).head(n), but more
performant.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 1, 3, 3, 5, 5],
...                     "B": [5, 6, 3, 4, 1, 2],
...                     "C": ['a', 'b', 'a', 'b', 'a', 'b']})
>>> df
    A       B       C
0   1       5       a
1   1       6       b
2   3       3       a
3   3       4       b
4   5       1       a
5   5       2       b
<BLANKLINE>
[6 rows x 3 columns]
Returns rows with the smallest value in 'A', including all ties:
>>> df.nsmallest(1, 'A', keep = "all")
    A       B       C
0   1       5       a
1   1       6       b
<BLANKLINE>
[2 rows x 3 columns]
Returns the first row with the smallest value in 'A', default behavior in case of ties:
>>> df.nsmallest(1, 'A')
    A       B       C
0   1       5       a
<BLANKLINE>
[1 rows x 3 columns]
Returns the last row with the smallest value in 'A' in case of ties:
>>> df.nsmallest(1, 'A', keep = "last")
    A       B       C
1   1       6       b
<BLANKLINE>
[1 rows x 3 columns]
Returns the row with the smallest value in 'A', breaking ties by the smallest value in 'C':
>>> df.nsmallest(1, ['A', 'C'])
    A       B       C
0   1       5       a
<BLANKLINE>
[1 rows x 3 columns]
| Parameters | |
|---|---|
| Name | Description | 
| n | int. Number of rows to return. |
| columns | label or list of labels. Column label(s) to order by. |
| keep | {'first', 'last', 'all'}, default 'first'. Where there are duplicate values: -  |
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The first n rows ordered by the given columns in ascending order. |
nunique
nunique() -> bigframes.series.Series
Count number of distinct elements in each column.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [3, 1, 2], "B": [1, 2, 2]})
>>> df
    A       B
0   3       1
1   1       2
2   2       2
<BLANKLINE>
[3 rows x 2 columns]
>>> df.nunique()
A    3
B    2
dtype: Int64
| Returns | |
|---|---|
| Type | Description | 
| bigframes.series.Series | Series with number of distinct elements. | 
pct_change
pct_change(periods: int = 1) -> bigframes.dataframe.DataFrame
Fractional change between the current and a prior element.
Computes the fractional change from the immediately previous row by default. This is useful in comparing the fraction of change in a time series of elements.
| Parameter | |
|---|---|
| Name | Description | 
| periods | int, default 1. Periods to shift for forming percent change. |
| Returns | |
|---|---|
| Type | Description | 
| Series or DataFrame | The same type as the calling object. | 
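Since this entry has no example, here is a plain-Python sketch of the computation for one column, assuming the usual definition of fractional change as current / prior - 1. The function pct_change below is a hypothetical stand-in on a list of non-zero numbers, with None standing in for NA where no prior element exists.

```python
def pct_change(values, periods=1):
    """Fractional change versus the element `periods` positions earlier."""
    result = []
    for i, current in enumerate(values):
        j = i - periods
        if j < 0 or j >= len(values):
            result.append(None)  # no prior element to compare against
        else:
            result.append(current / values[j] - 1)
    return result


prices = [100, 110, 99]
print(pct_change(prices))             # change versus the previous element
print(pct_change(prices, periods=2))  # change versus two elements earlier
```

In this sketch the first periods entries are always None, and the remaining entries are fractions rather than percentages (roughly 0.10 for a 10% increase).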
peek
peek(n: int = 5, *, force: bool = True) -> pandas.core.frame.DataFrame
Preview n arbitrary rows from the DataFrame. No guarantees about row selection or ordering.
DataFrame.peek(force=False) will always be very fast, but will not succeed if the data requires
a full scan. Using force=True will always succeed, but may perform queries.
Query results are cached so that future steps benefit from these queries.
| Parameters | |
|---|---|
| Name | Description | 
| n | int, default 5. The number of rows to select from the dataframe. Which N rows are returned is non-deterministic. |
| force | bool, default True. If the data cannot be peeked efficiently, the dataframe will instead be fully materialized as part of the operation if  |
| Exceptions | |
|---|---|
| Type | Description | 
| ValueError | If force=False and data cannot be efficiently peeked. | 
| Returns | |
|---|---|
| Type | Description | 
| pandas.DataFrame | A pandas DataFrame with n rows. | 
pipe
pipe(func: Callable[..., T] | tuple[Callable[..., T], str], *args, **kwargs) -> T
Apply chainable functions that expect Series or DataFrames.
Examples:
Constructing an income DataFrame from a dictionary.
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> data = [[8000, 1000], [9500, np.nan], [5000, 2000]]
>>> df = bpd.DataFrame(data, columns=['Salary', 'Others'])
>>> df
   Salary  Others
0    8000  1000.0
1    9500    <NA>
2    5000  2000.0
<BLANKLINE>
[3 rows x 2 columns]
Functions that perform tax reductions on an income DataFrame.
>>> def subtract_federal_tax(df):
...     return df * 0.9
>>> def subtract_state_tax(df, rate):
...     return df * (1 - rate)
>>> def subtract_national_insurance(df, rate, rate_increase):
...     new_rate = rate + rate_increase
...     return df * (1 - new_rate)
Instead of writing
>>> subtract_national_insurance(
...     subtract_state_tax(subtract_federal_tax(df), rate=0.12),
...     rate=0.05,
...     rate_increase=0.02)  # doctest: +SKIP
You can write
>>> (
...     df.pipe(subtract_federal_tax)
...     .pipe(subtract_state_tax, rate=0.12)
...     .pipe(subtract_national_insurance, rate=0.05, rate_increase=0.02)
... )
    Salary   Others
0  5892.48   736.56
1  6997.32     <NA>
2   3682.8  1473.12
<BLANKLINE>
[3 rows x 2 columns]
If you have a function that takes the data as (say) the second
argument, pass a tuple indicating which keyword expects the
data. For example, suppose subtract_national_insurance takes its data as df
in the second argument:
>>> def subtract_national_insurance(rate, df, rate_increase):
...     new_rate = rate + rate_increase
...     return df * (1 - new_rate)
>>> (
...     df.pipe(subtract_federal_tax)
...     .pipe(subtract_state_tax, rate=0.12)
...     .pipe(
...         (subtract_national_insurance, 'df'),
...         rate=0.05,
...         rate_increase=0.02
...     )
... )
    Salary   Others
0  5892.48   736.56
1  6997.32     <NA>
2   3682.8  1473.12
<BLANKLINE>
[3 rows x 2 columns]
| Parameters | |
|---|---|
| Name | Description | 
| args | iterable, optionalPositional arguments passed into  | 
| kwargs | mapping, optionalA dictionary of keyword arguments passed into  | 
| func | functionFunction to apply to this object.  | 
pivot
pivot(
    *,
    columns: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]],
    index: typing.Optional[
        typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]]
    ] = None,
    values: typing.Optional[
        typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]]
    ] = None
) -> bigframes.dataframe.DataFrame
Return reshaped DataFrame organized by given index / column values.
Reshape data (produce a "pivot" table) based on column values. Uses
unique values from specified index / columns to form axes of the
resulting DataFrame. This function does not support data
aggregation; multiple values will result in a MultiIndex in the
columns.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     "foo": ["one", "one", "one", "two", "two"],
...     "bar": ["A", "B", "C", "A", "B"],
...     "baz": [1, 2, 3, 4, 5],
...     "zoo": ['x', 'y', 'z', 'q', 'w']
... })
>>> df
    foo     bar     baz     zoo
0   one       A       1       x
1   one       B       2       y
2   one       C       3       z
3   two       A       4       q
4   two       B       5       w
<BLANKLINE>
[5 rows x 4 columns]
Using pivot without optional arguments:
>>> df.pivot(columns='foo')
        bar             baz             zoo
foo  one     two     one     two     one     two
0      A    <NA>       1    <NA>       x    <NA>
1      B    <NA>       2    <NA>       y    <NA>
2      C    <NA>       3    <NA>       z    <NA>
3   <NA>       A    <NA>       4    <NA>       q
4   <NA>       B    <NA>       5    <NA>       w
<BLANKLINE>
[5 rows x 6 columns]
Using pivot with index and values:
>>> df.pivot(columns='foo', index='bar', values='baz')
foo     one     two
bar
A       1         4
B       2         5
C       3      <NA>
<BLANKLINE>
[3 rows x 2 columns]
| Parameters | |
|---|---|
| Name | Description | 
| columns | str or object or a list of strColumn to use to make new frame's columns. | 
| index | str or object or a list of str, optionalColumn to use to make new frame's index. If not given, uses existing index. | 
| values | str, object or a list of the previous, optionalColumn(s) to use for populating new frame's values. If not specified, all remaining columns will be used and the result will have hierarchically indexed columns. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | Returns reshaped DataFrame. | 
pivot_table
pivot_table(
    values: typing.Optional[
        typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]]
    ] = None,
    index: typing.Optional[
        typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]]
    ] = None,
    columns: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]] = None,
    aggfunc: str = "mean",
) -> bigframes.dataframe.DataFrame
Create a spreadsheet-style pivot table as a DataFrame.
The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'Product': ['Product A', 'Product B', 'Product A', 'Product B', 'Product A', 'Product B'],
...     'Region': ['East', 'West', 'East', 'West', 'West', 'East'],
...     'Sales': [100, 200, 150, 100, 200, 150],
...     'Rating': [3, 5, 4, 3, 3, 5]
... })
>>> df
     Product Region  Sales  Rating
0  Product A   East    100       3
1  Product B   West    200       5
2  Product A   East    150       4
3  Product B   West    100       3
4  Product A   West    200       3
5  Product B   East    150       5
<BLANKLINE>
[6 rows x 4 columns]
Using pivot_table with default aggfunc "mean":
>>> pivot_table = df.pivot_table(
...     values=['Sales', 'Rating'],
...     index='Product',
...     columns='Region'
... )
>>> pivot_table
          Rating       Sales
Region      East West   East   West
Product
Product A    3.5  3.0  125.0  200.0
Product B    5.0  4.0  150.0  150.0
<BLANKLINE>
[2 rows x 4 columns]
Using pivot_table with specified aggfunc "max":
>>> pivot_table = df.pivot_table(
...     values=['Sales', 'Rating'],
...     index='Product',
...     columns='Region',
...     aggfunc="max"
... )
>>> pivot_table
          Rating      Sales
Region      East West  East West
Product
Product A      4    3   150  200
Product B      5    5   150  200
<BLANKLINE>
[2 rows x 4 columns]
| Parameters | |
|---|---|
| Name | Description | 
| values | str, object or a list of the previous, optionalColumn(s) to use for populating new frame's values. If not specified, all remaining columns will be used and the result will have hierarchically indexed columns. | 
| index | str or object or a list of str, optionalColumn to use to make new frame's index. If not given, uses existing index. | 
| columns | str or object or a list of strColumn to use to make new frame's columns. | 
| aggfunc | str, default "mean"Aggregation function name to compute summary statistics (e.g., 'sum', 'mean'). | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | An Excel style pivot table. | 
pow
pow(
    other: int | bigframes.series.Series, axis: str | int = "columns"
) -> bigframes.dataframe.DataFrame
Get exponential power of dataframe and other, element-wise (binary operator **).
Equivalent to dataframe ** other, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, rpow.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
You can use method name:
>>> df['A'].pow(df['B'])
0      1
1     32
2    729
dtype: Int64
You can also use arithmetic operator **:
>>> df['A'] ** (df['B'])
0      1
1     32
2    729
dtype: Int64
| Parameters | |
|---|---|
| Name | Description | 
| other | float, int, or SeriesAny single or multiple element data structure, or list-like object. | 
| axis | {0 or 'index', 1 or 'columns'}Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame result of the arithmetic operation. | 
prod
prod(
    axis: typing.Union[str, int] = 0, *, numeric_only: bool = False
) -> bigframes.series.Series
Return the product of the values over the requested axis.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 2, 3], "B": [4.5, 5.5, 6.5]})
>>> df
    A    B
0   1  4.5
1   2  5.5
2   3  6.5
<BLANKLINE>
[3 rows x 2 columns]
Calculating the product of each column (the default behavior without an explicit axis parameter):
>>> df.prod()
A        6.0
B    160.875
dtype: Float64
Calculating the product of each row:
>>> df.prod(axis=1)
0     4.5
1    11.0
2    19.5
dtype: Float64
| Parameters | |
|---|---|
| Name | Description | 
| axis | {index (0), columns (1)}Axis for the function to be applied on. For Series this parameter is unused and defaults to 0. | 
| numeric_only | bool, default FalseInclude only float, int, boolean columns. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.series.Series | Series with the product of the values. | 
product
product(
    axis: typing.Union[str, int] = 0, *, numeric_only: bool = False
) -> bigframes.series.Series
Return the product of the values over the requested axis.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 2, 3], "B": [4.5, 5.5, 6.5]})
>>> df
    A    B
0   1  4.5
1   2  5.5
2   3  6.5
<BLANKLINE>
[3 rows x 2 columns]
Calculating the product of each column (the default behavior without an explicit axis parameter):
>>> df.prod()
A        6.0
B    160.875
dtype: Float64
Calculating the product of each row:
>>> df.prod(axis=1)
0     4.5
1    11.0
2    19.5
dtype: Float64
| Parameters | |
|---|---|
| Name | Description | 
| axis | {index (0), columns (1)}Axis for the function to be applied on. For Series this parameter is unused and defaults to 0. | 
| numeric_only | bool, default FalseInclude only float, int, boolean columns. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.series.Series | Series with the product of the values. | 
quantile
quantile(
    q: typing.Union[float, typing.Sequence[float]] = 0.5, *, numeric_only: bool = False
)
Return values at the given quantile over requested axis.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]),
...                   columns=['a', 'b'])
>>> df.quantile(.1)
a    1.3
b    3.7
Name: 0.1, dtype: Float64
>>> df.quantile([.1, .5])
       a     b
0.1  1.3   3.7
0.5  2.5  55.0
<BLANKLINE>
[2 rows x 2 columns]
| Parameters | |
|---|---|
| Name | Description | 
| q | float or array-like, default 0.5 (50% quantile)Value between 0 <= q <= 1, the quantile(s) to compute. | 
| numeric_only | bool, default FalseInclude only  | 
| Returns | |
|---|---|
| Type | Description | 
| Series or DataFrame | If q is an array, a DataFrame will be returned where the index is q, the columns are the columns of self, and the values are the quantiles. If q is a float, a Series will be returned where the index is the columns of self and the values are the quantiles. | 
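The values in the example above are consistent with linear interpolation between neighbouring sorted values. The sketch below reproduces that arithmetic for a single column in plain Python. It assumes linear interpolation, which this page does not state explicitly, and the helper name `linear_quantile` is hypothetical.

```python
import math
from typing import Sequence


def linear_quantile(values: Sequence[float], q: float) -> float:
    """Quantile of `values` using linear interpolation (assumed method)."""
    if not 0 <= q <= 1:
        raise ValueError("q must satisfy 0 <= q <= 1")
    ordered = sorted(values)
    # Fractional position of the quantile within the sorted values.
    position = q * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    # Interpolate between the two neighbouring sorted values.
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


a = [1, 2, 3, 4]
b = [1, 10, 100, 100]
for q in (0.1, 0.5):
    print(q, round(linear_quantile(a, q), 2), round(linear_quantile(b, q), 2))
```

For column b at q=0.1, the position is 0.3 between the sorted values 1 and 10, giving 1 + 0.3 * 9 = 3.7.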
query
query(expr: str) -> bigframes.dataframe.DataFrame
Query the columns of a DataFrame with a boolean expression.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': range(1, 6),
...                    'B': range(10, 0, -2),
...                    'C C': range(10, 5, -1)})
>>> df
   A   B  C C
0  1  10   10
1  2   8    9
2  3   6    8
3  4   4    7
4  5   2    6
<BLANKLINE>
[5 rows x 3 columns]
>>> df.query('A > B')
   A  B  C C
4  5  2    6
<BLANKLINE>
[1 rows x 3 columns]
The previous expression is equivalent to
>>> df[df.A > df.B]
   A  B  C C
4  5  2    6
<BLANKLINE>
[1 rows x 3 columns]
For columns with spaces in their name, you can use backtick quoting.
>>> df.query('B == `C C`')
   A   B  C C
0  1  10   10
<BLANKLINE>
[1 rows x 3 columns]
The previous expression is equivalent to
>>> df[df.B == df['C C']]
   A   B  C C
0  1  10   10
<BLANKLINE>
[1 rows x 3 columns]
| Parameter | |
|---|---|
| Name | Description | 
| expr | strThe query string to evaluate. You can refer to variables in the environment by prefixing them with an '@' character like  | 
radd
radd(
    other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
    axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get addition of DataFrame and other, element-wise (binary operator +).
Equivalent to other + dataframe. With reverse version, add.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
You can use method name:
>>> df['A'].radd(df['B'])
0    5
1    7
2    9
dtype: Int64
You can also use arithmetic operator +:
>>> df['A'] + df['B']
0    5
1    7
2    9
dtype: Int64
| Parameters | |
|---|---|
| Name | Description | 
| other | float, int, or SeriesAny single or multiple element data structure, or list-like object. | 
| axis | {0 or 'index', 1 or 'columns'}Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame result of the arithmetic operation. | 
rank
rank(
    axis=0,
    method: str = "average",
    numeric_only=False,
    na_option: str = "keep",
    ascending=True,
) -> bigframes.dataframe.DataFrame
Compute numerical data ranks (1 through n) along axis.
By default, equal values are assigned a rank that is the average of the ranks of those values.
| Parameters | |
|---|---|
| Name | Description | 
| method | {'average', 'min', 'max', 'first', 'dense'}, default 'average'How to rank the group of records that have the same value (i.e. ties):  | 
| numeric_only | bool, default FalseFor DataFrame objects, rank only numeric columns if set to True. | 
| na_option | {'keep', 'top', 'bottom'}, default 'keep'How to rank NaN values:  | 
| ascending | bool, default TrueWhether or not the elements should be ranked in ascending order. | 
| Returns | |
|---|---|
| Type | Description | 
| same type as caller | Return a Series or DataFrame with data ranks as values. | 
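The default 'average' tie handling described above can be illustrated in plain Python. This is a sketch of the ranking rule for one column in ascending order, not the bigframes implementation; the helper name `rank_average` is hypothetical and missing values are not handled.

```python
from typing import List, Sequence


def rank_average(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    # Indices of `values`, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end are 0-based, so they cover ranks start+1..end+1.
        average = (start + 1 + end + 1) / 2
        for position in range(start, end + 1):
            ranks[order[position]] = average
        start = end + 1
    return ranks


scores = [7, 1, 7, 3]
print(rank_average(scores))  # the two 7s share ranks 3 and 4, so each gets 3.5
```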
rdiv
rdiv(
    other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
    axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get floating division of DataFrame and other, element-wise (binary operator /).
Equivalent to other / dataframe. With reverse version, truediv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
>>> df['A'].rtruediv(df['B'])
0    4.0
1    2.5
2    2.0
dtype: Float64
It's equivalent to using arithmetic operator: /:
>>> df['B'] / (df['A'])
0    4.0
1    2.5
2    2.0
dtype: Float64
| Parameters | |
|---|---|
| Name | Description | 
| other | float, int, or SeriesAny single or multiple element data structure, or list-like object. | 
| axis | {0 or 'index', 1 or 'columns'}Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame result of the arithmetic operation. | 
reindex
reindex(
    labels=None,
    *,
    index=None,
    columns=None,
    axis: typing.Optional[typing.Union[str, int]] = None,
    validate: typing.Optional[bool] = None
)
Conform DataFrame to new index with optional filling logic.
Places NA in locations having no value in the previous index. A new object is produced.
| Parameters | |
|---|---|
| Name | Description | 
| labels | array-like, optionalNew labels / index to conform the axis specified by 'axis' to. | 
| index | array-like, optionalNew labels for the index. Preferably an Index object to avoid duplicating data. | 
| columns | array-like, optionalNew labels for the columns. Preferably an Index object to avoid duplicating data. | 
| axis | int or str, optionalAxis to target. Can be either the axis name ('index', 'columns') or number (0, 1). | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame with changed index. | 
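The conforming behavior can be pictured with a plain-Python stand-in for a single column. In this sketch a dict maps index labels to values and `None` plays the role of NA; the helper name `reindex_column` is hypothetical and the code is not how bigframes implements reindex.

```python
from typing import Dict, Hashable, List, Optional


def reindex_column(
    column: Dict[Hashable, float], new_index: List[Hashable]
) -> Dict[Hashable, Optional[float]]:
    """Conform a label -> value mapping to `new_index`, producing a new object."""
    # Labels missing from the original column get None (standing in for NA);
    # labels absent from new_index are dropped.
    return {label: column.get(label) for label in new_index}


speeds = {"falcon": 389.0, "parrot": 24.0, "lion": 80.5}
reindexed = reindex_column(speeds, ["lion", "monkey", "falcon"])
print(reindexed)
```

The original mapping is left unchanged, mirroring the statement that a new object is produced.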
reindex_like
reindex_like(
    other: bigframes.dataframe.DataFrame, *, validate: typing.Optional[bool] = None
)
Return an object with matching indices as other object.
Conform the object to the same index on all axes. Optional filling logic, placing Null in locations having no value in the previous index.
| Parameter | |
|---|---|
| Name | Description | 
| other | Object of the same data typeIts row and column indices are used to define the new indices of this object. | 
| Returns | |
|---|---|
| Type | Description | 
| Series or DataFrame | Same type as caller, but with changed indices on each axis. | 
rename
rename(
    *, columns: typing.Mapping[typing.Hashable, typing.Hashable]
) -> bigframes.dataframe.DataFrame
Rename columns.
Dict values must be unique (1-to-1). Labels not contained in a dict will be left as-is. Extra labels listed don't throw an error.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
>>> df
   A  B
0  1  4
1  2  5
2  3  6
<BLANKLINE>
[3 rows x 2 columns]
Rename columns using a mapping:
>>> df.rename(columns={"A": "col1", "B": "col2"})
   col1  col2
0     1     4
1     2     5
2     3     6
<BLANKLINE>
[3 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| columns | MappingDict-like from old column labels to new column labels. | 
| Exceptions | |
|---|---|
| Type | Description | 
| KeyError | If any of the labels is not found. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | DataFrame with the renamed axis labels. | 
rename_axis
rename_axis(
    mapper: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]], **kwargs
) -> bigframes.dataframe.DataFrame
Set the name of the axis for the index.
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | DataFrame with the new index name | 
reorder_levels
reorder_levels(
    order: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]],
    axis: int | str = 0,
)
Rearrange index levels using input order. May not drop or duplicate levels.
| Parameters | |
|---|---|
| Name | Description | 
| order | list of int or list of strList representing new level order. Reference level by number (position) or by key (label). | 
| axis | {0 or 'index', 1 or 'columns'}, default 0Where to reorder levels. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame of rearranged index. | 
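The level reordering rule can be sketched on a MultiIndex represented as a list of tuples. This is a plain-Python illustration under simplifying assumptions (string entries in `order` are treated as level names, integers as positions); the helper name `reorder_index_levels` is hypothetical.

```python
from typing import List, Sequence, Tuple, Union


def reorder_index_levels(
    index_tuples: Sequence[tuple],
    names: List[str],
    order: Sequence[Union[int, str]],
) -> Tuple[List[tuple], List[str]]:
    """Rearrange the levels of a tuple-based index without dropping or duplicating any."""
    # Resolve each entry of `order` to a position, by name or by number.
    positions = [names.index(level) if isinstance(level, str) else level for level in order]
    if sorted(positions) != list(range(len(names))):
        raise ValueError("order must reference every level exactly once")
    new_names = [names[p] for p in positions]
    new_tuples = [tuple(row[p] for p in positions) for row in index_tuples]
    return new_tuples, new_names


index = [("bird", "falcon"), ("mammal", "lion")]
swapped, swapped_names = reorder_index_levels(index, ["class", "name"], ["name", "class"])
print(swapped_names, swapped)
```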
replace
replace(to_replace: typing.Any, value: typing.Any = None, *, regex: bool = False)
Replace values given in to_replace with value.
Values of the Series/DataFrame are replaced with other values dynamically.
This differs from updating with .loc or .iloc, which require
you to specify a location to update with some value.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'int_col': [1, 1, 2, 3],
...     'string_col': ["a", "b", "c", "b"],
...     })
Using scalar to_replace and value:
>>> df.replace("b", "e")
   int_col string_col
0        1          a
1        1          e
2        2          c
3        3          e
<BLANKLINE>
[4 rows x 2 columns]
Using dictionary:
>>> df.replace({"a": "e", 2: 5})
   int_col string_col
0        1          e
1        1          b
2        5          c
3        3          b
<BLANKLINE>
[4 rows x 2 columns]
Using regex:
>>> df.replace("[ab]", "e", regex=True)
   int_col string_col
0        1          e
1        1          e
2        2          c
3        3          e
<BLANKLINE>
[4 rows x 2 columns]
| Parameters | |
|---|---|
| Name | Description | 
| to_replace | str, regex, list, int, float or NoneHow to find the values that will be replaced. numeric: numeric values equal to  | 
| value | scalar, default NoneValue to replace any values matching  | 
| regex | bool, default FalseWhether to interpret  | 
| Returns | |
|---|---|
| Type | Description | 
| Series/DataFrame | Object after replacement. | 
reset_index
reset_index(*, drop: bool = False) -> bigframes.dataframe.DataFrame
Reset the index.
Reset the index of the DataFrame, and use the default one instead.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> import numpy as np
>>> df = bpd.DataFrame([('bird', 389.0),
...                     ('bird', 24.0),
...                     ('mammal', 80.5),
...                     ('mammal', np.nan)],
...                    index=['falcon', 'parrot', 'lion', 'monkey'],
...                    columns=('class', 'max_speed'))
>>> df
         class  max_speed
falcon    bird      389.0
parrot    bird       24.0
lion    mammal       80.5
monkey  mammal       <NA>
<BLANKLINE>
[4 rows x 2 columns]
When we reset the index, the old index is added as a column, and a new sequential index is used:
>>> df.reset_index()
    index   class  max_speed
0  falcon    bird      389.0
1  parrot    bird       24.0
2    lion  mammal       80.5
3  monkey  mammal       <NA>
<BLANKLINE>
[4 rows x 3 columns]
We can use the drop parameter to avoid the old index being added as a column:
>>> df.reset_index(drop=True)
    class  max_speed
0    bird      389.0
1    bird       24.0
2  mammal       80.5
3  mammal       <NA>
<BLANKLINE>
[4 rows x 2 columns]
You can also use reset_index with MultiIndex.
>>> import pandas as pd
>>> index = pd.MultiIndex.from_tuples([('bird', 'falcon'),
...                                    ('bird', 'parrot'),
...                                    ('mammal', 'lion'),
...                                    ('mammal', 'monkey')],
...                                   names=['class', 'name'])
>>> columns = ['speed', 'max']
>>> df = bpd.DataFrame([(389.0, 'fly'),
...                     (24.0, 'fly'),
...                     (80.5, 'run'),
...                     (np.nan, 'jump')],
...                    index=index,
...                    columns=columns)
>>> df
               speed   max
class  name
bird   falcon  389.0   fly
       parrot   24.0   fly
mammal lion     80.5   run
       monkey   <NA>  jump
<BLANKLINE>
[4 rows x 2 columns]
>>> df.reset_index()
    class    name  speed   max
0    bird  falcon  389.0   fly
1    bird  parrot   24.0   fly
2  mammal    lion   80.5   run
3  mammal  monkey   <NA>  jump
<BLANKLINE>
[4 rows x 4 columns]
>>> df.reset_index(drop=True)
   speed   max
0  389.0   fly
1   24.0   fly
2   80.5   run
3   <NA>  jump
<BLANKLINE>
[4 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description | 
| drop | bool, default FalseDo not try to insert index into dataframe columns. This resets the index to the default integer index. | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.dataframe.DataFrame | DataFrame with the new index. | 
rfloordiv
rfloordiv(
    other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
    axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get integer division of DataFrame and other, element-wise (binary operator //).
Equivalent to other // dataframe. With reverse version, floordiv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
>>> df['A'].rfloordiv(df['B'])
0    4
1    2
2    2
dtype: Int64
It's equivalent to using arithmetic operator: //:
>>> df['B'] // (df['A'])
0    4
1    2
2    2
dtype: Int64
| Parameters | |
|---|---|
| Name | Description | 
| other | float, int, or SeriesAny single or multiple element data structure, or list-like object. | 
| axis | {0 or 'index', 1 or 'columns'}Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame result of the arithmetic operation. | 
rmod
rmod(
    other: int | bigframes.series.Series | bigframes.dataframe.DataFrame,
    axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get modulo of DataFrame and other, element-wise (binary operator %).
Equivalent to other % dataframe. With reverse version, mod.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
>>> df['A'].rmod(df['B'])
0    0
1    1
2    0
dtype: Int64
It's equivalent to using arithmetic operator: %:
>>> df['B'] % (df['A'])
0    0
1    1
2    0
dtype: Int64
| Parameters | |
|---|---|
| Name | Description | 
| other | float, int, or SeriesAny single or multiple element data structure, or list-like object. | 
| axis | {0 or 'index', 1 or 'columns'}Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame result of the arithmetic operation. | 
rmul
rmul(
    other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
    axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get multiplication of DataFrame and other, element-wise (binary operator *).
Equivalent to other * dataframe. With reverse version, mul.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
You can use method name:
>>> df['A'].rmul(df['B'])
0     4
1    10
2    18
dtype: Int64
You can also use arithmetic operator *:
>>> df['A'] * (df['B'])
0     4
1    10
2    18
dtype: Int64
| Parameters | |
|---|---|
| Name | Description | 
| other | float, int, or SeriesAny single or multiple element data structure, or list-like object. | 
| axis | {0 or 'index', 1 or 'columns'}Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame result of the arithmetic operation. | 
rolling
rolling(window: int, min_periods=None) -> bigframes.core.window.Window
Provide rolling window calculations.
| Parameters | |
|---|---|
| Name | Description | 
| window | int, timedelta, str, offset, or BaseIndexer subclassSize of the moving window. If an integer, the fixed number of observations used for each window. If a timedelta, str, or offset, the time period of each window. Each window will be variable-sized, based on the observations included in the time period. This is only valid for datetime-like indexes. To learn more about the offsets & frequency strings, please see  | 
| min_periods | int, default NoneMinimum number of observations in window required to have a value; otherwise, result is  | 
| Returns | |
|---|---|
| Type | Description | 
| bigframes.core.window.Window | Window subclass if a win_type is passed. Rolling subclass if win_type is not passed. | 
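The interaction between an integer window and min_periods can be sketched in plain Python with a rolling sum. This is an illustration only, not the bigframes implementation: the helper name `rolling_sum` is hypothetical, `None` stands in for a missing result, and the default of `min_periods` equal to the window size is an assumption borrowed from pandas.

```python
from typing import List, Optional, Sequence


def rolling_sum(
    values: Sequence[Optional[float]], window: int, min_periods: Optional[int] = None
) -> List[Optional[float]]:
    """Sum over a trailing window of fixed size, one result per input row."""
    if min_periods is None:
        # Assumption (pandas behavior for integer windows): require a full window.
        min_periods = window
    result: List[Optional[float]] = []
    for i in range(len(values)):
        # Trailing window ending at row i, truncated at the start of the data.
        window_values = values[max(0, i - window + 1): i + 1]
        observed = [v for v in window_values if v is not None]
        if len(observed) >= min_periods:
            result.append(sum(observed))
        else:
            # Too few observations in the window to produce a value.
            result.append(None)
    return result


data = [1, 2, 3, 4]
print(rolling_sum(data, window=2))
print(rolling_sum(data, window=2, min_periods=1))
```

With `min_periods=1` the first row already has enough observations, so it yields a value instead of a missing result.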
rpow
rpow(
    other: int | bigframes.series.Series, axis: str | int = "columns"
) -> bigframes.dataframe.DataFrame
Get exponential power of dataframe and other, element-wise (binary operator rpow).
Equivalent to other ** dataframe, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, pow.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
>>> df['A'].rpow(df['B'])
0      4
1     25
2    216
dtype: Int64
It's equivalent to using arithmetic operator: **:
>>> df['B'] ** (df['A'])
0      4
1     25
2    216
dtype: Int64
| Parameters | |
|---|---|
| Name | Description | 
| other | float, int, or SeriesAny single or multiple element data structure, or list-like object. | 
| axis | {0 or 'index', 1 or 'columns'}Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame result of the arithmetic operation. | 
rsub
rsub(
    other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
    axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get subtraction of DataFrame and other, element-wise (binary operator -).
Equivalent to other - dataframe. With reverse version, sub.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
>>> df['A'].rsub(df['B'])
0    3
1    3
2    3
dtype: Int64
It's equivalent to using arithmetic operator: -:
>>> df['B'] - (df['A'])
0    3
1    3
2    3
dtype: Int64
| Parameters | |
|---|---|
| Name | Description | 
| other | float, int, or SeriesAny single or multiple element data structure, or list-like object. | 
| axis | {0 or 'index', 1 or 'columns'}Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame result of the arithmetic operation. | 
rtruediv
rtruediv(
    other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
    axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get floating division of DataFrame and other, element-wise (binary operator /).
Equivalent to other / dataframe. With reverse version, truediv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
>>> df['A'].rtruediv(df['B'])
0    4.0
1    2.5
2    2.0
dtype: Float64
It's equivalent to using arithmetic operator: /:
>>> df['B'] / (df['A'])
0    4.0
1    2.5
2    2.0
dtype: Float64
| Parameters | |
|---|---|
| Name | Description | 
| other | float, int, or SeriesAny single or multiple element data structure, or list-like object. | 
| axis | {0 or 'index', 1 or 'columns'}Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on. | 
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame result of the arithmetic operation. | 
sample
sample(
    n: typing.Optional[int] = None,
    frac: typing.Optional[float] = None,
    *,
    random_state: typing.Optional[int] = None,
    sort: typing.Optional[typing.Union[bool, typing.Literal["random"]]] = "random"
) -> bigframes.dataframe.DataFrame
Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'num_legs': [2, 4, 8, 0],
...                     'num_wings': [2, 0, 0, 0],
...                     'num_specimen_seen': [10, 2, 1, 8]},
...                    index=['falcon', 'dog', 'spider', 'fish'])
>>> df
        num_legs  num_wings  num_specimen_seen
falcon         2          2                 10
dog            4          0                  2
spider         8          0                  1
fish           0          0                  8
<BLANKLINE>
[4 rows x 3 columns]
Fetch one random row from the DataFrame (Note that we use random_state
to ensure reproducibility of the examples):
>>> df.sample(random_state=1)
     num_legs  num_wings  num_specimen_seen
dog         4          0                  2
<BLANKLINE>
[1 rows x 3 columns]
A random 50% sample of the DataFrame:
>>> df.sample(frac=0.5, random_state=1)
      num_legs  num_wings  num_specimen_seen
dog          4          0                  2
fish         0          0                  8
<BLANKLINE>
[2 rows x 3 columns]
Extract 3 random elements from the Series df['num_legs']:
>>> s = df['num_legs']
>>> s.sample(n=3, random_state=1)
dog       4
fish      0
spider    8
Name: num_legs, dtype: Int64
| Parameters | |
|---|---|
| Name | Description | 
| n | Optional[int], default None. Number of items from axis to return. Cannot be used with  |
| frac | Optional[float], default None. Fraction of axis items to return. Cannot be used with  |
| random_state | Optional[int], default None. Seed for random number generator. |
| sort | Optional[bool \| Literal["random"]], default "random" |
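As a rough illustration of how `n`, `frac`, and `random_state` interact, here is a pure-Python sketch. It assumes that `n` and `frac` are mutually exclusive and that omitting both selects one row, as in the examples above. The helper `sample_rows` is hypothetical, and the rows it picks for a given seed will not match the rows bigframes picks.

```python
import random


def sample_rows(rows, n=None, frac=None, random_state=None):
    """Pick rows at random; a fixed random_state makes the choice repeatable."""
    if n is not None and frac is not None:
        raise ValueError("Pass either n or frac, not both.")
    if n is None:
        # Default to a single row unless a fraction was requested.
        n = 1 if frac is None else round(frac * len(rows))
    rng = random.Random(random_state)
    return rng.sample(rows, n)


animals = ["falcon", "dog", "spider", "fish"]
first = sample_rows(animals, frac=0.5, random_state=1)
second = sample_rows(animals, frac=0.5, random_state=1)
# first and second contain the same two rows because the seed is the same.
```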
select_dtypes
select_dtypes(include=None, exclude=None) -> bigframes.dataframe.DataFrame
Return a subset of the DataFrame's columns based on the column dtypes.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': ["hello", "world"], 'col3': [True, False]})
>>> df.select_dtypes(include=['Int64'])
   col1
0     1
1     2
<BLANKLINE>
[2 rows x 1 columns]
>>> df.select_dtypes(exclude=['Int64'])
    col2   col3
0  hello   True
1  world  False
<BLANKLINE>
[2 rows x 2 columns]
| Parameters | |
|---|---|
| Name | Description | 
| include | scalar or list-like. A selection of dtypes or strings to be included. |
| exclude | scalar or list-like. A selection of dtypes or strings to be excluded. |
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The subset of the frame including the dtypes in include and excluding the dtypes in exclude. |
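The filtering rule can be sketched in plain Python as a lookup over a column-to-dtype mapping. This is an illustration only: the helper `select_columns` is hypothetical, and the dtype names used for `col2` and `col3` are assumptions rather than values taken from the example above.

```python
def select_columns(dtypes, include=None, exclude=None):
    """Return column names whose dtype is in include and not in exclude."""
    include = set(include or [])
    exclude = set(exclude or [])
    selected = []
    for name, dtype in dtypes.items():
        if include and dtype not in include:
            continue  # an include list was given and this dtype is not in it
        if dtype in exclude:
            continue
        selected.append(name)
    return selected


# Assumed dtype names for the three example columns.
dtypes = {"col1": "Int64", "col2": "string", "col3": "boolean"}
only_ints = select_columns(dtypes, include=["Int64"])
no_ints = select_columns(dtypes, exclude=["Int64"])
```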
set_index
set_index(
    keys: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]],
    append: bool = False,
    drop: bool = True,
) -> bigframes.dataframe.DataFrame
Set the DataFrame index using existing columns.
Set the DataFrame index (row labels) using one or more existing columns. The new index can replace the existing index.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'month': [1, 4, 7, 10],
...                     'year': [2012, 2014, 2013, 2014],
...                     'sale': [55, 40, 84, 31]})
>>> df
   month  year  sale
0      1  2012    55
1      4  2014    40
2      7  2013    84
3     10  2014    31
<BLANKLINE>
[4 rows x 3 columns]
Set the 'month' column to become the index:
>>> df.set_index('month')
       year  sale
month
1      2012    55
4      2014    40
7      2013    84
10     2014    31
<BLANKLINE>
[4 rows x 2 columns]
Create a MultiIndex using columns 'year' and 'month':
>>> df.set_index(['year', 'month'])
            sale
year month
2012 1        55
2014 4        40
2013 7        84
2014 10       31
<BLANKLINE>
[4 rows x 1 columns]
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | Changed row labels. | 
shift
shift(periods: int = 1) -> bigframes.dataframe.DataFrame
Shift index by desired number of periods.
Shifts the index without realigning the data.
| Returns | |
|---|---|
| Type | Description | 
| NDFrame | Copy of input object, shifted. | 
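Since this entry has no example, here is a minimal pure-Python sketch of what shifting does to one column of values. It assumes pandas-like behavior, where row labels stay in place, values move down (positive `periods`) or up (negative `periods`), and vacated positions become missing. The helper `shift` on a list is hypothetical and uses `None` to stand in for NA.

```python
def shift(values, periods=1, fill=None):
    """Move values by `periods` positions, padding vacated slots with `fill`."""
    n = len(values)
    if periods == 0:
        return list(values)
    if abs(periods) >= n:
        return [fill] * n
    if periods > 0:
        return [fill] * periods + list(values[: n - periods])
    return list(values[-periods:]) + [fill] * (-periods)


sales = [55, 40, 84, 31]
down_one = shift(sales, 1)   # [None, 55, 40, 84]
up_two = shift(sales, -2)    # [84, 31, None, None]
```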
skew
skew(*, numeric_only: bool = False)
Return unbiased skew over columns.
Normalized by N-1.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 2, 3, 4, 5],
...                    'B': [5, 4, 3, 2, 1],
...                    'C': [2, 2, 3, 2, 2]})
>>> df
   A  B  C
0  1  5  2
1  2  4  2
2  3  3  3
3  4  2  2
4  5  1  2
<BLANKLINE>
[5 rows x 3 columns]
Calculating the skewness of each column.
>>> df.skew()
A         0.0
B         0.0
C    2.236068
dtype: Float64
| Parameter | |
|---|---|
| Name | Description | 
| numeric_only | bool, default False. Include only float, int, boolean columns. |
| Returns | |
|---|---|
| Type | Description | 
| Series | Series with the skew of each column. |
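To make the numbers above concrete, the sketch below computes skewness in plain Python. It assumes the adjusted Fisher-Pearson estimator (the usual "unbiased" sample skewness), which reproduces the value shown for column C; the helper `adjusted_skew` is hypothetical and expects at least three values that are not all equal.

```python
import math


def adjusted_skew(values):
    """Adjusted Fisher-Pearson sample skewness of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5                            # biased skewness
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)   # small-sample correction


skew_a = adjusted_skew([1, 2, 3, 4, 5])  # symmetric data, so no skew
skew_c = adjusted_skew([2, 2, 3, 2, 2])  # one high outlier, positive skew
```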
sort_index
sort_index(
    ascending: bool = True, na_position: typing.Literal["first", "last"] = "last"
) -> bigframes.dataframe.DataFrame
Sort object by labels (along an axis).
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The original DataFrame sorted by the labels. | 
sort_values
sort_values(
    by: typing.Union[str, typing.Sequence[str]],
    *,
    ascending: typing.Union[bool, typing.Sequence[bool]] = True,
    kind: str = "quicksort",
    na_position: typing.Literal["first", "last"] = "last"
) -> bigframes.dataframe.DataFrame
Sort by the values along row axis.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'col1': ['A', 'A', 'B', bpd.NA, 'D', 'C'],
...     'col2': [2, 1, 9, 8, 7, 4],
...     'col3': [0, 1, 9, 4, 2, 3],
...     'col4': ['a', 'B', 'c', 'D', 'e', 'F']
... })
>>> df
   col1  col2  col3 col4
0     A     2     0    a
1     A     1     1    B
2     B     9     9    c
3  <NA>     8     4    D
4     D     7     2    e
5     C     4     3    F
<BLANKLINE>
[6 rows x 4 columns]
Sort by col1:
>>> df.sort_values(by=['col1'])
   col1  col2  col3 col4
0     A     2     0    a
1     A     1     1    B
2     B     9     9    c
5     C     4     3    F
4     D     7     2    e
3  <NA>     8     4    D
<BLANKLINE>
[6 rows x 4 columns]
Sort by multiple columns:
>>> df.sort_values(by=['col1', 'col2'])
   col1  col2  col3 col4
1     A     1     1    B
0     A     2     0    a
2     B     9     9    c
5     C     4     3    F
4     D     7     2    e
3  <NA>     8     4    D
<BLANKLINE>
[6 rows x 4 columns]
Sort descending:
>>> df.sort_values(by='col1', ascending=False)
   col1  col2  col3 col4
4     D     7     2    e
5     C     4     3    F
2     B     9     9    c
0     A     2     0    a
1     A     1     1    B
3  <NA>     8     4    D
<BLANKLINE>
[6 rows x 4 columns]
Putting NAs first:
>>> df.sort_values(by='col1', ascending=False, na_position='first')
   col1  col2  col3 col4
3  <NA>     8     4    D
4     D     7     2    e
5     C     4     3    F
2     B     9     9    c
0     A     2     0    a
1     A     1     1    B
<BLANKLINE>
[6 rows x 4 columns]
| Parameters | |
|---|---|
| Name | Description | 
| by | str or Sequence[str]. Name or list of names to sort by. |
| ascending | bool or Sequence[bool], default True. Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, it must match the length of by. |
| kind | str, default 'quicksort'. Choice of sorting algorithm. Accepts 'quicksort', 'mergesort', 'heapsort', 'stable'. Ignored except when determining whether to sort stably. 'mergesort' or 'stable' will result in a stable reorder. |
| na_position | {'first', 'last'}, default 'last' |
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame with sorted values. | 
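The examples above show that missing values are placed according to `na_position` regardless of `ascending`. A pure-Python sketch of that rule, using the `col1` values from the example, might look like the following. The helper `sort_labels` is hypothetical, `None` stands in for NA, and the stable tie-breaking is an assumption that happens to match the displayed output.

```python
def sort_labels(values, ascending=True, na_position="last"):
    """Return row positions ordered by value, with missing values kept apart."""
    rows = list(enumerate(values))
    present = [(i, v) for i, v in rows if v is not None]
    missing = [(i, v) for i, v in rows if v is None]
    # Python's sort is stable, so ties keep their original relative order.
    present.sort(key=lambda pair: pair[1], reverse=not ascending)
    ordered = missing + present if na_position == "first" else present + missing
    return [i for i, _ in ordered]


col1 = ["A", "A", "B", None, "D", "C"]
asc = sort_labels(col1)
desc = sort_labels(col1, ascending=False)
desc_na_first = sort_labels(col1, ascending=False, na_position="first")
```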
stack
stack(level: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]] = -1)
Stack the prescribed level(s) from columns to index.
Return a reshaped DataFrame or Series having a multi-level index with one or more new inner-most levels compared to the current DataFrame. The new inner-most levels are created by pivoting the columns of the current dataframe:
- if the columns have a single level, the output is a Series;
- if the columns have multiple levels, the new index level(s) is (are) taken from the prescribed level(s) and the output is a DataFrame.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 3], 'B': [2, 4]}, index=['foo', 'bar'])
>>> df
     A  B
foo  1  2
bar  3  4
<BLANKLINE>
[2 rows x 2 columns]
>>> df.stack()
foo  A    1
     B    2
bar  A    3
     B    4
dtype: Int64
| Parameter | |
|---|---|
| Name | Description | 
| level | int, str, or list of these, default -1 (last level). Level(s) to stack from the column axis onto the index axis. |
| Returns | |
|---|---|
| Type | Description | 
| DataFrame or Series | Stacked dataframe or series. | 
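For single-level columns, stacking amounts to turning each (row label, column name) pair into one entry of a longer, two-level result. The pure-Python sketch below illustrates this with the data from the example; the helper `stack` on a dict of lists is hypothetical and does not handle multi-level columns.

```python
def stack(columns, index):
    """Pivot a dict of columns into {(row_label, column_name): value}."""
    return {
        (row_label, col_name): values[pos]
        for pos, row_label in enumerate(index)
        for col_name, values in columns.items()
    }


stacked = stack({"A": [1, 3], "B": [2, 4]}, index=["foo", "bar"])
# Keys appear row by row, with the column name as the innermost level.
```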
std
std(
    axis: typing.Union[str, int] = 0, *, numeric_only: bool = False
) -> bigframes.series.Series
Return sample standard deviation over columns.
Normalized by N-1 by default.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 2, 3, 4, 5],
...                     "B": [3, 4, 3, 2, 1],
...                     "C": [2, 2, 3, 2, 2]})
>>> df
   A  B  C
0  1  3  2
1  2  4  2
2  3  3  3
3  4  2  2
4  5  1  2
<BLANKLINE>
[5 rows x 3 columns]
Calculating the standard deviation of each column:
>>> df.std()
A    1.581139
B    1.140175
C    0.447214
dtype: Float64
| Parameter | |
|---|---|
| Name | Description | 
| numeric_only | bool, default False. Include only float, int, boolean columns. |
| Returns | |
|---|---|
| Type | Description | 
| bigframes.series.Series | Series with sample standard deviation. | 
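"Normalized by N-1" means the sum of squared deviations is divided by N-1 rather than N. Python's standard library makes the difference easy to see: `statistics.stdev` divides by N-1 and reproduces the values above, while `statistics.pstdev` divides by N. This sketch uses only the standard library, not bigframes.

```python
import statistics

columns = {
    "A": [1, 2, 3, 4, 5],
    "B": [3, 4, 3, 2, 1],
    "C": [2, 2, 3, 2, 2],
}

# Sample standard deviation (divides by N-1), as in the example output.
sample_std = {name: statistics.stdev(vals) for name, vals in columns.items()}

# Population standard deviation (divides by N), shown for contrast.
population_std = {name: statistics.pstdev(vals) for name, vals in columns.items()}
```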
sub
sub(
    other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
    axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get subtraction of DataFrame and other, element-wise (binary operator -).
Equivalent to dataframe - other. With reverse version, rsub.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
You can use method name:
>>> df['A'].sub(df['B'])
0    -3
1    -3
2    -3
dtype: Int64
You can also use arithmetic operator -:
>>> df['A'] - (df['B'])
0    -3
1    -3
2    -3
dtype: Int64
| Parameters | |
|---|---|
| Name | Description | 
| other | float, int, or Series. Any single or multiple element data structure, or list-like object. |
| axis | {0 or 'index', 1 or 'columns'}. Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on. |
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame result of the arithmetic operation. | 
subtract
subtract(
    other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
    axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get subtraction of DataFrame and other, element-wise (binary operator -).
Equivalent to dataframe - other. With reverse version, rsub.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, *, /, //, %, **.
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
You can use method name:
>>> df['A'].sub(df['B'])
0    -3
1    -3
2    -3
dtype: Int64
You can also use arithmetic operator -:
>>> df['A'] - (df['B'])
0    -3
1    -3
2    -3
dtype: Int64
| Parameters | |
|---|---|
| Name | Description | 
| other | float, int, or Series. Any single or multiple element data structure, or list-like object. |
| axis | {0 or 'index', 1 or 'columns'}. Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on. |
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame result of the arithmetic operation. | 
sum
sum(
    axis: typing.Union[str, int] = 0, *, numeric_only: bool = False
) -> bigframes.series.Series
Return the sum of the values over the requested axis.
This is equivalent to the method numpy.sum.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 3], "B": [2, 4]})
>>> df
   A  B
0  1  2
1  3  4
<BLANKLINE>
[2 rows x 2 columns]
Calculating the sum of each column (the default behavior without an explicit axis parameter).
>>> df.sum()
A    4
B    6
dtype: Int64
Calculating the sum of each row.
>>> df.sum(axis=1)
0    3
1    7
dtype: Int64
| Parameters | |
|---|---|
| Name | Description | 
| axis | {index (0), columns (1)}. Axis for the function to be applied on. For Series this parameter is unused and defaults to 0. |
| numeric_only | bool, default False. Include only float, int, boolean columns. |
| Returns | |
|---|---|
| Type | Description | 
| bigframes.series.Series | Series with the sum of values. | 
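The two axis settings can be pictured in plain Python as summing down each column versus across each row. This is only a sketch of the arithmetic on a dict of lists, not bigframes code:

```python
columns = {"A": [1, 3], "B": [2, 4]}

# axis=0 (the default): one total per column.
column_sums = {name: sum(values) for name, values in columns.items()}

# axis=1: one total per row, pairing up the values at each position.
row_sums = [sum(row) for row in zip(*columns.values())]
```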
swaplevel
swaplevel(i: int = -2, j: int = -1, axis: int | str = 0)
Swap levels i and j in a MultiIndex.
Default is to swap the two innermost levels of the index.
| Parameters | |
|---|---|
| Name | Description | 
| i | int or str. Levels of the indices to be swapped. Can pass level name as string. |
| j | int or str. Levels of the indices to be swapped. Can pass level name as string. |
| axis | {0 or 'index', 1 or 'columns'}, default 0. The axis to swap levels on. 0 or 'index' for row-wise, 1 or 'columns' for column-wise. |
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame with levels swapped in MultiIndex. | 
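As an illustration, a MultiIndex can be modeled as a list of tuples, one tuple per row. Swapping levels then reorders the entries inside each tuple and leaves the row order unchanged. The helper `swaplevel` below is a hypothetical pure-Python sketch that accepts only integer level positions, applied to the (year, month) labels from the set_index example.

```python
def swaplevel(labels, i=-2, j=-1):
    """Swap positions i and j inside every index tuple."""
    swapped = []
    for label in labels:
        parts = list(label)
        parts[i], parts[j] = parts[j], parts[i]
        swapped.append(tuple(parts))
    return swapped


index = [(2012, 1), (2014, 4), (2013, 7), (2014, 10)]
month_first = swaplevel(index)  # defaults swap the two innermost levels
```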
tail
tail(n: int = 5) -> bigframes.dataframe.DataFrame
Return the last n rows.
This function returns last n rows from the object based on
position. It is useful for quickly verifying data, for example,
after sorting or appending rows.
For negative values of n, this function returns all rows except
the first |n| rows, equivalent to df[|n|:].
If n is larger than the number of rows, this function returns all rows.
| Parameter | |
|---|---|
| Name | Description | 
| n | int, default 5. Number of rows to select. |
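The three cases described above (positive `n`, negative `n`, and `n` larger than the number of rows) map directly onto list slicing. The following pure-Python sketch shows this; the helper `tail` on a list is hypothetical, and returning no rows for `n=0` is an assumption carried over from pandas.

```python
def tail(rows, n=5):
    """Return the last n rows, or all but the first |n| rows when n is negative."""
    if n == 0:
        return []
    if n > 0:
        return rows[-n:]      # slicing already caps at the full list
    return rows[abs(n):]      # drop the first |n| rows


rows = list(range(10))
last_three = tail(rows, 3)        # [7, 8, 9]
all_but_three = tail(rows, -3)    # [3, 4, 5, 6, 7, 8, 9]
everything = tail(rows, 20)       # n exceeds the row count, so all rows
```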
to_csv
to_csv(
    path_or_buf: str, sep=",", *, header: bool = True, index: bool = True
) -> None
Write object to a comma-separated values (csv) file on Cloud Storage.
| Parameters | |
|---|---|
| Name | Description | 
| path_or_buf | str. A destination URI of Cloud Storage file(s) to store the extracted dataframe in format of  |
| index | bool, default True. If True, write row names (index). |
| Returns | |
|---|---|
| Type | Description | 
| None | String output not yet supported. | 
to_dict
to_dict(orient: typing.Literal['dict', 'list', 'series', 'split', 'tight', 'records', 'index'] = 'dict', into: type[dict] = <class 'dict'>, **kwargs) -> dict | list[dict]
Convert the DataFrame to a dictionary.
The type of the key-value pairs can be customized with the parameters (see below).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_dict()
{'col1': {0: 1, 1: 2}, 'col2': {0: 3, 1: 4}}
You can specify the return orientation.
>>> df.to_dict('series')
{'col1': 0    1
1    2
Name: col1, dtype: Int64,
'col2': 0    3
1    4
Name: col2, dtype: Int64}
>>> df.to_dict('split')
{'index': [0, 1], 'columns': ['col1', 'col2'], 'data': [[1, 3], [2, 4]]}
>>> df.to_dict("tight")
{'index': [0, 1],
'columns': ['col1', 'col2'],
'data': [[1, 3], [2, 4]],
'index_names': [None],
'column_names': [None]}
| Parameters | |
|---|---|
| Name | Description | 
| orient | str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}. Determines the type of the values of the dictionary. 'dict' (default): dict like {column -> {index -> value}}. 'list': dict like {column -> [values]}. 'series': dict like {column -> Series(values)}. 'split': dict like {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}. 'tight': dict like {'index' -> [index], 'columns' -> [columns], 'data' -> [values], 'index_names' -> [index.names], 'column_names' -> [column.names]}. 'records': list like [{column -> value}, ... , {column -> value}]. 'index': dict like {index -> {column -> value}}. |
| into | class, default dict. The collections.abc.Mapping subclass used for all Mappings in the return value. Can be the actual class or an empty instance of the mapping type you want. If you want a collections.defaultdict, you must pass it initialized. |
| index | bool, default True. Whether to include the index item (and index_names item if  |
| Returns | |
|---|---|
| Type | Description | 
| dict or list of dict | Return a collections.abc.Mapping object representing the DataFrame. The resulting transformation depends on the orient parameter. |
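The examples above do not show the 'list', 'records', or 'index' orientations. As a sketch of the shapes described in the parameter table, the pure-Python code below derives them from the default 'dict' result. It illustrates the layouts only and does not call bigframes.

```python
# The default orientation, as returned by df.to_dict() in the example above.
nested = {"col1": {0: 1, 1: 2}, "col2": {0: 3, 1: 4}}

index_labels = list(next(iter(nested.values())))

# 'list': {column -> [values]}
as_list = {col: list(cells.values()) for col, cells in nested.items()}

# 'records': [{column -> value}, ...], one dict per row
as_records = [{col: nested[col][i] for col in nested} for i in index_labels]

# 'index': {index -> {column -> value}}
as_index = {i: {col: nested[col][i] for col in nested} for i in index_labels}
```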
to_excel
to_excel(excel_writer, sheet_name: str = "Sheet1", **kwargs) -> None
Write DataFrame to an Excel sheet.
To write a single DataFrame to an Excel .xlsx file it is only necessary to
specify a target file name. To write to multiple sheets it is necessary to
create an ExcelWriter object with a target file name, and specify a sheet
in the file to write to.
Multiple sheets may be written to by specifying unique sheet_name.
With all data written to the file it is necessary to save the changes.
Note that creating an ExcelWriter object with a file name that already
exists will result in the contents of the existing file being erased.
Examples:
>>> import bigframes.pandas as bpd
>>> import tempfile
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_excel(tempfile.TemporaryFile())
| Parameters | |
|---|---|
| Name | Description | 
| excel_writer | path-like, file-like, or ExcelWriter object. File path or existing ExcelWriter. |
| sheet_name | str, default 'Sheet1'. Name of sheet which will contain DataFrame. |
to_gbq
to_gbq(
    destination_table: typing.Optional[str] = None,
    *,
    if_exists: typing.Optional[typing.Literal["fail", "replace", "append"]] = None,
    index: bool = True,
    ordering_id: typing.Optional[str] = None,
    clustering_columns: typing.Union[
        pandas.core.indexes.base.Index, typing.Iterable[typing.Hashable]
    ] = ()
) -> str
Write a DataFrame to a BigQuery table.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Write a DataFrame to a BigQuery table.
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> # destination_table = PROJECT_ID + "." + DATASET_ID + "." + TABLE_NAME
>>> df.to_gbq("bigframes-dev.birds.test-numbers", if_exists="replace")
'bigframes-dev.birds.test-numbers'
Write a DataFrame to a temporary BigQuery table in the anonymous dataset.
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> destination = df.to_gbq(ordering_id="ordering_id")
>>> # The table created can be read outside of the current session.
>>> bpd.close_session()  # Optional, to demonstrate a new session.
>>> bpd.read_gbq(destination, index_col="ordering_id")
             col1  col2
ordering_id
0               1     3
1               2     4
<BLANKLINE>
[2 rows x 2 columns]
Write a DataFrame to a BigQuery table with clustering columns:
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]})
>>> clustering_cols = ['col1', 'col3']
>>> df.to_gbq(
...             "bigframes-dev.birds.test-clusters",
...             if_exists="replace",
...             clustering_columns=clustering_cols,
...           )
'bigframes-dev.birds.test-clusters'
| Parameters | |
|---|---|
| Name | Description | 
| destination_table | Optional[str]. Name of table to be written, in the form  |
| if_exists | Optional[str]. Behavior when the destination table exists. When  |
| index | bool, default True. Whether to write row names (index). |
| ordering_id | Optional[str], default None. If set, write the ordering of the DataFrame as a column in the result table with this name. |
| clustering_columns | Union[pd.Index, Iterable[Hashable]], default (). Specifies the columns for clustering in the BigQuery table. The order of columns in this list is significant for clustering hierarchy. Index columns may be included in clustering if the  |
| Returns | |
|---|---|
| Type | Description | 
| str | The fully-qualified ID for the written table, in the form project.dataset.tablename. | 
to_html
to_html(
    buf=None,
    columns: typing.Optional[typing.Sequence[str]] = None,
    col_space=None,
    header: bool = True,
    index: bool = True,
    na_rep: str = "NaN",
    formatters=None,
    float_format=None,
    sparsify: bool | None = None,
    index_names: bool = True,
    justify: str | None = None,
    max_rows: int | None = None,
    max_cols: int | None = None,
    show_dimensions: bool = False,
    decimal: str = ".",
    bold_rows: bool = True,
    classes: str | list | tuple | None = None,
    escape: bool = True,
    notebook: bool = False,
    border: int | None = None,
    table_id: str | None = None,
    render_links: bool = False,
    encoding: str | None = None,
) -> str
Render a DataFrame as an HTML table.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> print(df.to_html())
<table border="1" class="dataframe">
<thead>
    <tr style="text-align: right;">
    <th></th>
    <th>col1</th>
    <th>col2</th>
    </tr>
</thead>
<tbody>
    <tr>
    <th>0</th>
    <td>1</td>
    <td>3</td>
    </tr>
    <tr>
    <th>1</th>
    <td>2</td>
    <td>4</td>
    </tr>
</tbody>
</table>
| Parameters | |
|---|---|
| Name | Description | 
| buf | str, Path or StringIO-like, optional, default None. Buffer to write to. If None, the output is returned as a string. |
| columns | sequence, optional, default None. The subset of columns to write. Writes all columns by default. |
| col_space | str or int, list or dict of int or str, optional. The minimum width of each column in CSS length units. An int is assumed to be px units. |
| header | bool, optional. Whether to print column labels, default True. |
| index | bool, optional, default True. Whether to print index (row) labels. |
| na_rep | str, optional, default 'NaN'. String representation of NaN to use. |
| formatters | list, tuple or dict of one-param. functions, optional. Formatter functions to apply to columns' elements by position or name. The result of each function must be a unicode string. List/tuple must be of length equal to the number of columns. |
| float_format | one-parameter function, optional, default None. Formatter function to apply to columns' elements if they are floats. This function must return a unicode string and will be applied only to the non-NaN elements, with NaN being handled by na_rep. |
| sparsify | bool, optional, default True. Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row. |
| index_names | bool, optional, default True. Prints the names of the indexes. |
| justify | str, default None. How to justify the column labels. If None, uses the option from the print configuration (controlled by set_option), 'right' out of the box. Valid values are 'left', 'right', 'center', 'justify', 'justify-all', 'start', 'end', 'inherit', 'match-parent', 'initial', 'unset'. |
| max_rows | int, optional. Maximum number of rows to display in the console. |
| max_cols | int, optional. Maximum number of columns to display in the console. |
| show_dimensions | bool, default False. Display DataFrame dimensions (number of rows by number of columns). |
| decimal | str, default '.'. Character recognized as decimal separator, e.g. ',' in Europe. |
| bold_rows | bool, default True. Make the row labels bold in the output. |
| classes | str or list or tuple, default None. CSS class(es) to apply to the resulting HTML table. |
| escape | bool, default True. Convert the characters <, >, and & to HTML-safe sequences. |
| notebook | bool, default False. Whether the generated HTML is for IPython Notebook. |
| border | int. A border=border attribute is included in the opening <table> tag. Default pd.options.display.html.border. |
| table_id | str, optional. A CSS id is included in the opening <table> tag if specified. |
| render_links | bool, default False. Convert URLs to HTML links. |
| encoding | str, default "utf-8". Set character encoding. |
| Returns | |
|---|---|
| Type | Description | 
| str or None | If buf is None, returns the result as a string. Otherwise returns None. | 
to_json
to_json(
    path_or_buf: str,
    orient: typing.Literal[
        "split", "records", "index", "columns", "values", "table"
    ] = "columns",
    *,
    lines: bool = False,
    index: bool = True
) -> None
Convert the object to a JSON string, written to Cloud Storage.
Note: NaN and None values are converted to null, and datetime objects are converted to UNIX timestamps.
| Parameters | |
|---|---|
| Name | Description | 
| path_or_buf | str. A destination URI of Cloud Storage file(s) to store the extracted dataframe in format of  |
| orient | Indication of expected JSON string format. Series: default is 'index'; allowed values are {'split', 'records', 'index', 'table'}. DataFrame: default is 'columns'; allowed values are {'split', 'records', 'index', 'columns', 'values', 'table'}. The format of the JSON string: 'split': dict like {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}; 'records': list like [{column -> value}, ... , {column -> value}]; 'index': dict like {index -> {column -> value}}; 'columns': dict like {column -> {index -> value}}; 'values': just the values array; 'table': dict like {'schema': {schema}, 'data': {data}}, describing the data, where data component is like  |
| index | bool, default True. If True, write row names (index). |
| lines | bool, default False. If 'orient' is 'records', write out line-delimited JSON format. Will throw ValueError if incorrect 'orient' since others are not list-like. |
| Returns | |
|---|---|
| Type | Description | 
| None | String output not yet supported. | 
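To clarify what line-delimited output means for `orient='records'` with `lines=True`, the standard-library sketch below serializes the same two records both ways. It does not call bigframes or write to Cloud Storage, and the exact whitespace that bigframes emits may differ from what is shown here.

```python
import json

records = [{"col1": 1, "col2": 3}, {"col1": 2, "col2": 4}]
compact = {"separators": (",", ":")}

# orient='records', lines=False: a single JSON array.
as_array = json.dumps(records, **compact)

# orient='records', lines=True: one JSON object per line, no enclosing array.
as_lines = "\n".join(json.dumps(record, **compact) for record in records)
```

Each line of the second form parses on its own, which is why only the list-like 'records' orientation can be written this way.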
to_latex
to_latex(
    buf=None,
    columns: typing.Optional[typing.Sequence] = None,
    header: typing.Union[bool, typing.Sequence[str]] = True,
    index: bool = True,
    **kwargs
) -> str | None
Render object to a LaTeX tabular, longtable, or nested table.
Requires \usepackage{booktabs}. The output can be copied and pasted
into a main LaTeX document or read from an external file
with \input{table.tex}.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> print(df.to_latex())
\begin{tabular}{lrr}
\toprule
& col1 & col2 \\
\midrule
0 & 1 & 3 \\
1 & 2 & 4 \\
\bottomrule
\end{tabular}
<BLANKLINE>
| Parameters | |
|---|---|
| Name | Description | 
| buf | str, Path or StringIO-like, optional, default None. Buffer to write to. If None, the output is returned as a string. |
| columns | list of label, optional. The subset of columns to write. Writes all columns by default. |
| header | bool or list of str, default True. Write out the column names. If a list of strings is given, it is assumed to be aliases for the column names. |
| index | bool, default True. Write row names (index). |
to_markdown
to_markdown(buf=None, mode: str = "wt", index: bool = True, **kwargs) -> str | None
Print DataFrame in Markdown-friendly format.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> print(df.to_markdown())
|    |   col1 |   col2 |
|---:|-------:|-------:|
|  0 |      1 |      3 |
|  1 |      2 |      4 |
| Parameters | |
|---|---|
| Name | Description | 
| buf | str, Path or StringIO-like, optional, default None. Buffer to write to. If None, the output is returned as a string. |
| mode | str, optional. Mode in which file is opened. |
| index | bool, optional, default True. Add index (row) labels. |
| Returns | |
|---|---|
| Type | Description | 
| str or None | DataFrame in Markdown-friendly format. |
to_numpy
to_numpy(dtype=None, copy=False, na_value=None, **kwargs) -> numpy.ndarray
Convert the DataFrame to a NumPy array.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_numpy()
array([[1, 3],
       [2, 4]], dtype=object)
| Parameters | |
|---|---|
| Name | Description | 
| dtype | None. The dtype to pass to  |
| copy | bool, default False. Whether to ensure that the returned value is not a view on another array. |
| na_value | Any, default None. The value to use for missing values. The default value depends on dtype and the dtypes of the DataFrame columns. |
| Returns | |
|---|---|
| Type | Description | 
| numpy.ndarray | The converted NumPy array. | 
to_orc
to_orc(path=None, **kwargs) -> bytes | None
Write a DataFrame to the ORC format.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> import tempfile
>>> df.to_orc(tempfile.TemporaryFile())
| Parameter | |
|---|---|
| Name | Description | 
| path | str, file-like object or None, default None. If a string, it will be used as Root Directory path when writing a partitioned dataset. By file-like object, we refer to objects with a write() method, such as a file handle (e.g. via builtin open function). If path is None, a bytes object is returned. |
to_pandas
to_pandas(
    max_download_size: typing.Optional[int] = None,
    sampling_method: typing.Optional[str] = None,
    random_state: typing.Optional[int] = None,
    *,
    ordered: bool = True
) -> pandas.core.frame.DataFrame
Write DataFrame to pandas DataFrame.
| Parameters | |
|---|---|
| Name | Description | 
| max_download_size | int, default None. Download size threshold in MB. If max_download_size is exceeded when downloading data (e.g., to_pandas()), the data will be downsampled if bigframes.options.sampling.enable_downsampling is True; otherwise, an error will be raised. If set to a value other than None, this will supersede the global config. |
| sampling_method | str, default None. Downsampling algorithm to use. The choices are: "head": This algorithm returns a portion of the data from the beginning. It is fast and requires minimal computations to perform the downsampling; "uniform": This algorithm returns uniform random samples of the data. If set to a value other than None, this will supersede the global config. |
| random_state | int, default None. The seed for the uniform downsampling algorithm. If provided, the uniform method may take longer to execute and require more computation. If set to a value other than None, this will supersede the global config. |
| ordered | bool, default True. Determines whether the resulting pandas dataframe will be deterministically ordered. In some cases, unordered may result in a faster-executing query. |
| Returns | |
|---|---|
| Type | Description | 
| pandas.DataFrame | A pandas DataFrame with all rows and columns of this DataFrame if the data_sampling_threshold_mb is not exceeded; otherwise, a pandas DataFrame with downsampled rows and all columns of this DataFrame. | 
to_pandas_batches
to_pandas_batches() -> typing.Iterable[pandas.core.frame.DataFrame]
Stream DataFrame results to an iterable of pandas DataFrames.
to_parquet
to_parquet(
    path: str,
    *,
    compression: typing.Optional[typing.Literal["snappy", "gzip"]] = "snappy",
    index: bool = True
) -> None
Write a DataFrame to the binary Parquet format.
This function writes the DataFrame as a Parquet file
(https://parquet.apache.org/) to Cloud Storage.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> gcs_bucket = "gs://bigframes-dev-testing/sample_parquet*.parquet"
>>> df.to_parquet(path=gcs_bucket)
| Parameters | |
|---|---|
| Name | Description | 
| path | str. Destination URI(s) of Cloud Storage file(s) to store the extracted dataframe in format of  |
| compression | str, default 'snappy'. Name of the compression to use. Use  |
| index | bool, default True. If  |
to_pickle
to_pickle(path, **kwargs) -> None
Pickle (serialize) object to file.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> gcs_bucket = "gs://bigframes-dev-testing/sample_pickle_gcs.pkl"
>>> df.to_pickle(path=gcs_bucket)
| Parameter | |
|---|---|
| Name | Description | 
| path | str. File path where the pickled object will be stored. |
to_records
to_records(
    index: bool = True, column_dtypes=None, index_dtypes=None
) -> numpy.recarray
Convert DataFrame to a NumPy record array.
Index will be included as the first field of the record array if requested.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_records()
rec.array([(0, 1, 3), (1, 2, 4)],
          dtype=[('index', '<i8'), ('col1', '<i8'), ('col2', '<i8')])
| Parameters | |
|---|---|
| Name | Description | 
| index | bool, default True. Include index in resulting record array, stored in 'index' field or using the index label, if set. |
| column_dtypes | str, type, dict, default None. If a string or type, the data type to store all columns. If a dictionary, a mapping of column names and indices (zero-indexed) to specific data types. |
| index_dtypes | str, type, dict, default None. If a string or type, the data type to store all index levels. If a dictionary, a mapping of index level names and indices (zero-indexed) to specific data types. This mapping is applied only if  |
| Returns | |
|---|---|
| Type | Description | 
| np.recarray | NumPy ndarray with the DataFrame labels as fields and each row of the DataFrame as entries. | 
to_string
to_string(
    buf=None,
    columns: typing.Optional[typing.Sequence[str]] = None,
    col_space=None,
    header: typing.Union[bool, typing.Sequence[str]] = True,
    index: bool = True,
    na_rep: str = "NaN",
    formatters=None,
    float_format=None,
    sparsify: bool | None = None,
    index_names: bool = True,
    justify: str | None = None,
    max_rows: int | None = None,
    max_cols: int | None = None,
    show_dimensions: bool = False,
    decimal: str = ".",
    line_width: int | None = None,
    min_rows: int | None = None,
    max_colwidth: int | None = None,
    encoding: str | None = None,
) -> str | None
Render a DataFrame to a console-friendly tabular output.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> print(df.to_string())
   col1  col2
0     1     3
1     2     4
| Parameters | |
|---|---|
| Name | Description | 
| buf | str, Path or StringIO-like, optional, default None. Buffer to write to. If None, the output is returned as a string. |
| columns | sequence, optional, default None. The subset of columns to write. Writes all columns by default. |
| col_space | int, list or dict of int, optional. The minimum width of each column. |
| header | bool or sequence, optional. Write out the column names. If a list of strings is given, it is assumed to be aliases for the column names. |
| index | bool, optional, default True. Whether to print index (row) labels. |
| na_rep | str, optional, default 'NaN'. String representation of NaN to use. |
| formatters | list, tuple or dict of one-param. functions, optional. Formatter functions to apply to columns' elements by position or name. The result of each function must be a unicode string. List/tuple must be of length equal to the number of columns. |
| float_format | one-parameter function, optional, default None. Formatter function to apply to columns' elements if they are floats. The result of this function must be a unicode string. |
| sparsify | bool, optional, default True. Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row. |
| index_names | bool, optional, default True. Prints the names of the indexes. |
| justify | str, default None. How to justify the column labels. If None, uses the option from the print configuration (controlled by set_option), which is 'right' out of the box. Valid values are 'left', 'right', 'center', 'justify', 'justify-all', 'start', 'end', 'inherit', 'match-parent', 'initial', 'unset'. |
| max_rows | int, optional. Maximum number of rows to display in the console. |
| min_rows | int, optional. The number of rows to display in the console in a truncated repr (when number of rows is above  |
| max_cols | int, optional. Maximum number of columns to display in the console. |
| show_dimensions | bool, default False. Display DataFrame dimensions (number of rows by number of columns). |
| decimal | str, default '.'. Character recognized as decimal separator, e.g. ',' in Europe. |
| line_width | int, optional. Width to wrap a line in characters. |
| max_colwidth | int, optional. Max width to truncate each column in characters. By default, no limit. |
| encoding | str, default "utf-8". Set character encoding. |
| Returns | |
|---|---|
| Type | Description | 
| str or None | If buf is None, returns the result as a string. Otherwise returns None. | 
transpose
transpose() -> bigframes.dataframe.DataFrame
Transpose index and columns.
Reflect the DataFrame over its main diagonal by writing rows as columns
and vice-versa. The property .T is an accessor to the method
transpose.
All columns must be the same dtype (numerics can be coerced to a common supertype).
Examples:
**Square DataFrame with homogeneous dtype**
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> d1 = {'col1': [1, 2], 'col2': [3, 4]}
>>> df1 = bpd.DataFrame(data=d1)
>>> df1
   col1  col2
0     1     3
1     2     4
<BLANKLINE>
[2 rows x 2 columns]
>>> df1_transposed = df1.T  # or df1.transpose()
>>> df1_transposed
      0  1
col1  1  2
col2  3  4
<BLANKLINE>
[2 rows x 2 columns]
When the dtype is homogeneous in the original DataFrame, we get a
transposed DataFrame with the same dtype:
>>> df1.dtypes
col1    Int64
col2    Int64
dtype: object
>>> df1_transposed.dtypes
0    Int64
1    Int64
dtype: object
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | The transposed DataFrame. | 
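To make "writing rows as columns and vice-versa" concrete, the following plain-Python sketch transposes a dict of equal-length columns. It only illustrates the reshaping and is not the bigframes implementation. The function name `transpose_columns` is hypothetical, and row labels are assumed to be the positions 0..n-1.

```python
def transpose_columns(columns):
    """Transpose {column label: [values]} into {row position: {column label: value}}."""
    labels = list(columns)
    n_rows = len(next(iter(columns.values()), []))
    # Each original row position becomes a column label, and each
    # original column label becomes a row label.
    return {
        row: {label: columns[label][row] for label in labels}
        for row in range(n_rows)
    }


d1 = {"col1": [1, 2], "col2": [3, 4]}
transposed = transpose_columns(d1)
```

Here `transposed[0]` holds the values 1 and 3 under the labels col1 and col2, matching column 0 of df1_transposed in the example above.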
truediv
truediv(
    other: float | int | bigframes.series.Series | bigframes.dataframe.DataFrame,
    axis: str | int = "columns",
) -> bigframes.dataframe.DataFrame
Get floating division of DataFrame and other, element-wise (binary operator /).
Equivalent to dataframe / other. The reverse version is rtruediv.
This method is one of the flexible wrappers (add, sub, mul, div, mod, pow) around the
arithmetic operators: +, -, *, /, //, %, **.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     })
You can use the method name:
>>> df['A'].truediv(df['B'])
0    0.25
1     0.4
2     0.5
dtype: Float64
You can also use the arithmetic operator /:
>>> df['A'] / (df['B'])
0    0.25
1     0.4
2     0.5
dtype: Float64
| Parameters | |
|---|---|
| Name | Description | 
| other | float, int, or Series. Any single or multiple element data structure, or list-like object. |
| axis | {0 or 'index', 1 or 'columns'}. Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on. |
| Returns | |
|---|---|
| Type | Description | 
| DataFrame | DataFrame result of the arithmetic operation. | 
unstack
unstack(
    level: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]] = -1
)
Pivot a level of the (necessarily hierarchical) index labels.
Returns a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.
If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 3], 'B': [2, 4]}, index=['foo', 'bar'])
>>> df
        A   B
foo     1   2
bar     3   4
<BLANKLINE>
[2 rows x 2 columns]
>>> df.unstack()
A   foo    1
    bar    3
B   foo    2
    bar    4
dtype: Int64
| Parameter | |
|---|---|
| Name | Description | 
| level | int, str, or list of these, default -1 (last level). Level(s) of index to unstack, can pass level name. |
| Returns | |
|---|---|
| Type | Description | 
| DataFrame or Series | DataFrame or Series. | 
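For the single-level index case shown in the example, the result is a Series keyed by (column label, index label) pairs. A plain-Python sketch of that reshaping follows. The helper `unstack_flat` is hypothetical and covers only this non-MultiIndex case.

```python
def unstack_flat(columns):
    """Flatten {column: {row label: value}} into {(column, row label): value}."""
    return {
        (column, row): value
        for column, values in columns.items()
        for row, value in values.items()
    }


df = {"A": {"foo": 1, "bar": 3}, "B": {"foo": 2, "bar": 4}}
series = unstack_flat(df)
```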
update
update(other, join: str = "left", overwrite=True, filter_func=None)
Modify in place using non-NA values from another DataFrame.
Aligns on indices. There is no return value.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 2, 3],
...                    'B': [400, 500, 600]})
>>> new_df = bpd.DataFrame({'B': [4, 5, 6],
...                        'C': [7, 8, 9]})
>>> df.update(new_df)
>>> df
   A  B
0  1  4
1  2  5
2  3  6
<BLANKLINE>
[3 rows x 2 columns]
| Parameters | |
|---|---|
| Name | Description | 
| other | DataFrame, or object coercible into a DataFrame. Should have at least one matching index/column label with the original DataFrame. If a Series is passed, its name attribute must be set, and that will be used as the column name to align with the original DataFrame. |
| join | {'left'}, default 'left'. Only left join is implemented, keeping the index and columns of the original object. |
| overwrite | bool, default True. How to handle non-NA values for overlapping keys: True: overwrite original DataFrame's values with values from  |
| filter_func | callable(1d-array) -> bool 1d-array, optional. Can choose to replace values other than NA. Return True for values that should be updated. |
| Returns | |
|---|---|
| Type | Description | 
| None | This method directly changes calling object. | 
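The default behavior (left join, overwrite=True) can be sketched in plain Python with nested dicts, using None as a stand-in for NA. This is an illustration of the semantics described above, not the bigframes implementation, and the helper `update_in_place` is hypothetical.

```python
def update_in_place(target, other):
    """Overwrite values in `target` with non-None values from `other`.

    Both arguments map column label -> {row label: value}. Only columns
    and rows already present in `target` are touched (a left join).
    """
    for column, values in target.items():
        new_values = other.get(column, {})
        for row in values:
            candidate = new_values.get(row)
            if candidate is not None:
                values[row] = candidate


df = {"A": {0: 1, 1: 2, 2: 3}, "B": {0: 400, 1: 500, 2: 600}}
new_df = {"B": {0: 4, 1: None, 2: 6}, "C": {0: 7, 1: 8, 2: 9}}
update_in_place(df, new_df)
```

Column C is ignored because it does not exist in the target, and the None at row 1 of B leaves the original 500 in place.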
value_counts
value_counts(
    subset: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]] = None,
    normalize: bool = False,
    sort: bool = True,
    ascending: bool = False,
    dropna: bool = True,
)
Return a Series containing counts of unique rows in the DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'num_legs': [2, 4, 4, 6, 7],
...                     'num_wings': [2, 0, 0, 0, bpd.NA]},
...                    index=['falcon', 'dog', 'cat', 'ant', 'octopus'],
...                    dtype='Int64')
>>> df
         num_legs  num_wings
falcon          2          2
dog             4          0
cat             4          0
ant             6          0
octopus         7       <NA>
<BLANKLINE>
[5 rows x 2 columns]
value_counts sorts the result by counts in a descending order by default:
>>> df.value_counts()
num_legs  num_wings
4         0          2
2         2          1
6         0          1
Name: count, dtype: Int64
You can normalize the counts to return relative frequencies by setting normalize=True:
>>> df.value_counts(normalize=True)
num_legs  num_wings
4         0             0.5
2         2            0.25
6         0            0.25
Name: proportion, dtype: Float64
You can get the rows in the ascending order of the counts by setting ascending=True:
>>> df.value_counts(ascending=True)
num_legs  num_wings
2         2          1
6         0          1
4         0          2
Name: count, dtype: Int64
You can include the counts of the rows with NA values by setting dropna=False:
>>> df.value_counts(dropna=False)
num_legs  num_wings
4         0            2
2         2            1
6         0            1
7         <NA>         1
Name: count, dtype: Int64
| Parameters | |
|---|---|
| Name | Description | 
| subset | label or list of labels, optional. Columns to use when counting unique combinations. |
| normalize | bool, default False. Return proportions rather than frequencies. |
| sort | bool, default True. Sort by frequencies. |
| ascending | bool, default False. Sort in ascending order. |
| dropna | bool, default True. Don’t include counts of rows that contain NA values. |
| Returns | |
|---|---|
| Type | Description | 
| Series | Series containing counts of unique rows in the DataFrame. |
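The interaction of normalize, ascending, and dropna can be sketched in plain Python with collections.Counter, treating each row as a tuple and None as NA. This is an illustration of the counting logic only. The helper `count_rows_by_value` is hypothetical, and its tie ordering (first seen wins) is an assumption that bigframes does not necessarily share.

```python
from collections import Counter


def count_rows_by_value(rows, normalize=False, ascending=False, dropna=True):
    """Count unique rows (tuples); None stands in for NA."""
    if dropna:
        # Drop any row that contains an NA value.
        rows = [row for row in rows if None not in row]
    counts = Counter(rows)
    total = sum(counts.values())
    # sorted() is stable, so rows with equal counts keep first-seen order.
    items = sorted(counts.items(), key=lambda kv: kv[1], reverse=not ascending)
    if normalize:
        return [(row, count / total) for row, count in items]
    return items


rows = [(2, 2), (4, 0), (4, 0), (6, 0), (7, None)]
default_counts = count_rows_by_value(rows)
proportions = count_rows_by_value(rows, normalize=True)
with_na = count_rows_by_value(rows, dropna=False)
```

Note that with dropna=True the proportions are computed over the four remaining rows, which is why the (4, 0) row accounts for 0.5 in the example above.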
var
var(
    axis: typing.Union[str, int] = 0, *, numeric_only: bool = False
) -> bigframes.series.Series
Return unbiased variance over requested axis.
Normalized by N-1 by default.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({"A": [1, 3], "B": [2, 4]})
>>> df
    A       B
0   1       2
1   3       4
<BLANKLINE>
[2 rows x 2 columns]
Calculating the variance of each column (the default behavior without an explicit axis parameter).
>>> df.var()
A    2.0
B    2.0
dtype: Float64
Calculating the variance of each row.
>>> df.var(axis=1)
0    0.5
1    0.5
dtype: Float64
| Parameters | |
|---|---|
| Name | Description | 
| axis | {index (0), columns (1)}. Axis for the function to be applied on. For Series this parameter is unused and defaults to 0. |
| numeric_only | bool, default False. Include only float, int, boolean columns. |
| Returns | |
|---|---|
| Type | Description | 
| bigframes.series.Series | Series with unbiased variance over requested axis. |
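"Normalized by N-1" means the sum of squared deviations from the mean is divided by one less than the number of values. The plain-Python sketch below reproduces the numbers from the example above. The helper `sample_variance` is hypothetical and is not the bigframes implementation.

```python
def sample_variance(values):
    """Unbiased variance: squared deviations from the mean, divided by N - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


columns = {"A": [1, 3], "B": [2, 4]}

# Default axis (0): one variance per column.
by_column = {label: sample_variance(vals) for label, vals in columns.items()}

# axis=1: one variance per row.
rows = list(zip(*columns.values()))
by_row = [sample_variance(row) for row in rows]
```

For column A the mean is 2, the squared deviations sum to 2, and dividing by N - 1 = 1 gives 2.0. For the row (1, 2) the mean is 1.5, the squared deviations sum to 0.5, and the result is 0.5.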