How to Drop Row in Polars - Python

Polars is a fast and efficient DataFrame library designed for handling large datasets in Python. While Pandas is the go-to for many, Polars is gaining traction due to its performance advantages, especially with larger datasets. If we're transitioning from Pandas or exploring Polars for our data manipulation tasks, understanding how to drop rows from a DataFrame is essential. In this article, we'll explore different methods to drop rows in a Polars DataFrame.

Installing Polars

First, make sure that Polars is installed. If not, we can install it using pip:

pip install polars

Creating a Polars DataFrame

Let’s start by creating a simple DataFrame in Polars:

Python

import polars as pl

# Creating a sample DataFrame
df = pl.DataFrame({
    "Name": ["Amit", "Raj", "Sita", "Pooja"],
    "Age": [25, 30, 35, 40],
    "City": ["Mumbai", "Delhi", "Kolkata", "Chennai"]
})

print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
shape: (4, 3)
┌───────┬─────┬─────────┐
│ Name  ┆ Age ┆ City    │
│ ---   ┆ --- ┆ ---     │
│ str   ┆ i64 ┆ str     │
╞═══════╪═════╪═════════╡
│ Amit  ┆ 25  ┆ Mumbai  │
│ Raj   ┆ 30  ┆ Delhi   │
│ Sita  ┆ 35  ┆ Kolkata │
│ Pooja ┆ 40  ┆ Chennai │
└───────┴─────┴─────────┘

1. Dropping Rows Based on Condition

The usual way to drop rows in Polars is the filter method, which keeps only the rows where a condition evaluates to True. To drop rows, we therefore filter on the opposite condition. For example, to drop rows where the age is greater than 30, we keep the rows where it is 30 or less:

Python

# Continuing with the df created above
# Dropping rows where Age > 30 (i.e. keeping rows where Age <= 30)
df_filtered = df.filter(pl.col("Age") <= 30)

print("\nDataFrame after Dropping Rows where Age > 30:")
print(df_filtered)

Output:

DataFrame after Dropping Rows where Age > 30:
shape: (2, 3)
┌──────┬─────┬────────┐
│ Name ┆ Age ┆ City   │
│ ---  ┆ --- ┆ ---    │
│ str  ┆ i64 ┆ str    │
╞══════╪═════╪════════╡
│ Amit ┆ 25  ┆ Mumbai │
│ Raj  ┆ 30  ┆ Delhi  │
└──────┴─────┴────────┘

2. Dropping Rows by Index

Polars DataFrames have no index, so there is no dedicated method for dropping a row by its position. We can work around this by building a boolean mask from the row positions and passing it to filter:

Python

# Dropping the row at index 2 (Sita)
indexes_to_drop = [2]  # This is the index list to drop
df_dropped = df.filter(~pl.Series(range(len(df))).is_in(indexes_to_drop))

print("\nDataFrame after Dropping Row at Index 2:")
print(df_dropped)

Output:

DataFrame after Dropping Row at Index 2:
shape: (3, 3)
┌───────┬─────┬─────────┐
│ Name  ┆ Age ┆ City    │
│ ---   ┆ --- ┆ ---     │
│ str   ┆ i64 ┆ str     │
╞═══════╪═════╪═════════╡
│ Amit  ┆ 25  ┆ Mumbai  │
│ Raj   ┆ 30  ┆ Delhi   │
│ Pooja ┆ 40  ┆ Chennai │
└───────┴─────┴─────────┘

3. Dropping Rows with Missing Values

Missing values are represented as null in Polars. The drop_nulls method removes every row that contains a null in any column:

Python

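# Continuing with the df created above
# Adding a row of missing values. Using df.schema keeps the column dtypes consistent,
# and pl.concat returns a new DataFrame (df.extend would modify df in place).
null_row = pl.DataFrame({"Name": [None], "Age": [None], "City": [None]}, schema=df.schema)
df_with_missing = pl.concat([df, null_row])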

print("\nDataFrame with Missing Values:")
print(df_with_missing)

# Dropping rows with missing values
df_no_missing = df_with_missing.drop_nulls()

print("\nDataFrame after Dropping Rows with Missing Values:")
print(df_no_missing)

Output:

DataFrame with Missing Values:
shape: (5, 3)
┌───────┬──────┬─────────┐
│ Name  ┆ Age  ┆ City    │
│ ---   ┆ ---  ┆ ---     │
│ str   ┆ i64  ┆ str     │
╞═══════╪══════╪═════════╡
│ Amit  ┆ 25   ┆ Mumbai  │
│ Raj   ┆ 30   ┆ Delhi   │
│ Sita  ┆ 35   ┆ Kolkata │
│ Pooja ┆ 40   ┆ Chennai │
│ null  ┆ null ┆ null    │
└───────┴──────┴─────────┘

DataFrame after Dropping Rows with Missing Values:
shape: (4, 3)
┌───────┬─────┬─────────┐
│ Name  ┆ Age ┆ City    │
│ ---   ┆ --- ┆ ---     │
│ str   ┆ i64 ┆ str     │
╞═══════╪═════╪═════════╡
│ Amit  ┆ 25  ┆ Mumbai  │
│ Raj   ┆ 30  ┆ Delhi   │
│ Sita  ┆ 35  ┆ Kolkata │
│ Pooja ┆ 40  ┆ Chennai │
└───────┴─────┴─────────┘

4. Dropping Duplicate Rows

To drop duplicate rows in Polars, we can use the unique method. By default it compares all columns, and the subset parameter restricts the comparison to specific columns. Because unique does not guarantee the original row order unless maintain_order=True is passed, we set it here so the output is predictable:

Python

# Create a DataFrame with duplicate rows
df_with_duplicates = pl.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Bob", "Alice"],
    "Age": [24, 19, 34, 19, 24],
    "City": ["New York", "Los Angeles", "Chicago", "Los Angeles", "New York"]
})

# Drop duplicate rows, keeping the remaining rows in their original order
df_no_duplicates = df_with_duplicates.unique(maintain_order=True)

print(df_no_duplicates)

Output:

shape: (3, 3)
┌─────────┬─────┬─────────────┐
│ Name    ┆ Age ┆ City        │
│ ---     ┆ --- ┆ ---         │
│ str     ┆ i64 ┆ str         │
╞═════════╪═════╪═════════════╡
│ Alice   ┆ 24  ┆ New York    │
│ Bob     ┆ 19  ┆ Los Angeles │
│ Charlie ┆ 34  ┆ Chicago     │
└─────────┴─────┴─────────────┘

Conclusion

Polars is a powerful tool for data manipulation in Python, and it covers the common ways of dropping rows: filter for conditions and row positions, drop_nulls for missing values, and unique for duplicates. Each of these returns a new DataFrame and leaves the original unchanged.

With its focus on performance and parallelism, Polars is an excellent choice for working with large datasets. As we explore Polars further, we'll discover even more advanced features that can help streamline our data manipulation tasks.

By understanding how to drop rows in Polars, we're well on our way to mastering this powerful DataFrame library and making our data processing tasks more efficient.
