Food delivery services have become an integral part of daily life, with platforms like Zomato, Swiggy, and Foodpanda leading the way. These companies generate large amounts of data that can be analyzed for insights into delivery performance and customer satisfaction. In this article, we demonstrate a comprehensive analysis of a food delivery dataset in the R programming language.
Project Overview
In this project, we will analyze a food delivery dataset to uncover patterns and trends within the food delivery industry. The dataset contains several key variables, including Delivery Person Age, Ratings, Order Date, Time Taken, and Weather Conditions.
We will:
- Import necessary libraries and load the dataset.
- Clean the dataset by handling missing values and formatting issues.
- Conduct exploratory data analysis (EDA) to visualize key patterns and trends.
- Compute descriptive statistics to better understand delivery times, ratings, and other key variables.
- Analyze various factors that affect food delivery performance, including weather, traffic conditions, and time of day.
By the end of this analysis, we will gain valuable insights that can help businesses optimize their delivery services and improve customer satisfaction.
Dataset Link: Food Delivery Data
1. Importing Libraries
We will begin by loading the libraries needed for data manipulation and visualization. These are dplyr for data processing, ggplot2 for visualization, and forecast and car for time series and regression work. We also install the hms package because the cleaning step uses it to parse order times.
install.packages(c("forecast", "dplyr", "car", "ggplot2", "hms"))
library(dplyr)
library(ggplot2)
library(forecast)
library(car)
2. Loading the Dataset
Next, we will load the food delivery dataset into R using the read.csv() function. We will display the first few rows to get an overview of the data structure.
food_delivery_data <- read.csv("food_delivery_data.csv")
head(food_delivery_data)
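Later sections refer to a column called `Time_taken.min.`. A name like that usually means read.csv() rewrote the raw header: with its default `check.names = TRUE`, it runs names through make.names(), which replaces characters such as parentheses with dots. The sketch below assumes a raw header of `Time_taken(min)`. That header is an assumption, because the original CSV header is not shown. The example also uses a tiny inline CSV in place of the real file.

```r
# Hypothetical two-row CSV; the header "Time_taken(min)" is an assumption
raw_csv <- "ID,Time_taken(min)\nA1,24\nB2,33"

# check.names = TRUE (the default) sanitizes headers with make.names()
sample_data <- read.csv(text = raw_csv)
print(names(sample_data))

# check.names = FALSE keeps the header exactly as written in the file
raw_names <- read.csv(text = raw_csv, check.names = FALSE)
print(names(raw_names))

# str() shows each column's type, a quick sanity check after loading
str(sample_data)
```

Running str() on the real data right after loading is a cheap way to see whether numeric-looking columns were read as text.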
3. Cleaning the Dataset
We will clean the dataset to ensure its accuracy and prepare it for analysis:
- Remove duplicate rows, fill missing values in numeric columns with the column mean, and convert Order_Date and Time_Orderd into date and time formats.
- Extract the hour from Time_Orderd to analyze peak ordering hours.
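# Remove exact duplicate rows
food_delivery_data <- food_delivery_data %>% distinct()
# Impute missing values in numeric columns only (mean() is undefined for text columns)
food_delivery_data <- food_delivery_data %>%
mutate(across(where(is.numeric), ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))
# Convert Order_Date (day-month-year) and Time_Orderd ("HH:MM:SS") to date/time types
food_delivery_data$Order_Date <- as.Date(food_delivery_data$Order_Date, format = "%d-%m-%Y")
food_delivery_data$Time_Orderd <- hms::as_hms(food_delivery_data$Time_Orderd)
# hms stores seconds since midnight, so integer division by 3600 gives the hour
food_delivery_data$order_hour <- as.numeric(food_delivery_data$Time_Orderd) %/% 3600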
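The following base-R sketch runs the same cleaning steps on a small hypothetical data frame, which makes the logic easier to check. It swaps in unique() for distinct(). It also pulls the hour out of the time string with a regular expression, where the code above uses the hms package. The regex assumes times are written as "H:MM:SS" or "HH:MM:SS".

```r
# Hypothetical toy data standing in for food_delivery_data (row 4 duplicates row 1)
toy <- data.frame(
  Delivery_person_Age = c(25, NA, 35, 25),
  Weatherconditions   = c("Sunny", NA, "Fog", "Sunny"),
  Time_Orderd         = c("11:30:00", "19:45:00", "8:05:00", "11:30:00"),
  stringsAsFactors = FALSE
)

# Drop exact duplicate rows (base-R counterpart of dplyr::distinct())
toy <- unique(toy)

# Replace NAs in numeric columns with the column mean; text columns are untouched
num_cols <- vapply(toy, is.numeric, logical(1))
toy[num_cols] <- lapply(toy[num_cols], function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
})

# The hour is the text before the first colon
toy$order_hour <- as.integer(sub(":.*$", "", toy$Time_Orderd))
print(toy)
```

The missing age becomes 30, the mean of 25 and 35. The missing weather label stays NA because a mean has no meaning for text.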
4. Performing Exploratory Data Analysis (EDA)
EDA helps us understand the underlying characteristics of the dataset. We will visualize important patterns and distributions.
4.1. Histogram of Delivery Time
We start with a histogram of delivery times, grouped into 5-minute bins, to see how long deliveries typically take.
ggplot(food_delivery_data, aes(x = Time_taken.min.)) +
geom_histogram(binwidth = 5, fill = "darkgreen", color = "black") +
labs(title = "Distribution of Delivery Time", x = "Delivery Time (min)",y = "Frequency")
The histogram shows the distribution of delivery times, revealing the most common delivery durations and providing insights into delivery performance.
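The sketch below makes `binwidth = 5` concrete. It uses cut() and table() to count hypothetical delivery times into five-minute bins, then adds the mean and median as descriptive statistics. Unless `boundary` or `center` is set, ggplot2 chooses its own bin edges, so its bars may be offset from the [10, 15), [15, 20), ... intervals used here.

```r
# Hypothetical delivery times in minutes
delivery_min <- c(12, 18, 22, 24, 26, 29, 31, 33, 38, 44)

# Five-minute bins closed on the left: [10,15), [15,20), ..., [40,45)
bins <- cut(delivery_min, breaks = seq(10, 45, by = 5), right = FALSE)
bin_counts <- table(bins)
print(bin_counts)

# Descriptive statistics that complement the histogram
cat("mean:", mean(delivery_min), " median:", median(delivery_min), "\n")
print(summary(delivery_min))
```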
4.2. Delivery Person Ratings Distribution
To understand the feedback given to delivery personnel, we will create a bar plot showing how often each rating value occurs.
ggplot(food_delivery_data, aes(x = Delivery_person_Ratings)) +
geom_bar(fill = "red", color = "black") +
labs(title = "Delivery Person Ratings", x = "Ratings", y = "Count")
The bar plot reveals the distribution of ratings, indicating how delivery personnel are generally rated by customers.
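geom_bar() counts the rows at each distinct x value, so the bar heights match a frequency table of the ratings. The quick base-R check below, on hypothetical ratings, gives the same counts as numbers. That helps when several bars are close in height.

```r
# Hypothetical ratings; geom_bar() would draw one bar per distinct value
ratings <- c(4.5, 4.7, 4.7, 4.9, 5.0, 4.7)
rating_counts <- table(ratings)
print(rating_counts)

# Share of each rating value, as percentages
print(round(100 * prop.table(rating_counts), 1))
```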
4.3. Orders by Road Traffic Density
Next, we will create a pie chart showing the proportion of orders placed under each road traffic condition.
traffic_density_counts <- food_delivery_data %>%
count(Road_traffic_density) %>%
mutate(percentage = n / sum(n) * 100)
ggplot(traffic_density_counts, aes(x = "", y = percentage,
fill = Road_traffic_density)) +
geom_bar(stat = "identity", width = 1) +
coord_polar("y") +
labs(title = "Orders by Road Traffic Density", x = "", y = "") +
theme_void()
The pie chart shows the share of orders placed under each traffic density level, which indicates how often deliveries have to contend with heavy traffic.
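The pie slices come from the percentages computed by `count()` followed by `mutate(percentage = n / sum(n) * 100)`. Here is the same arithmetic in base R, applied to a hypothetical set of traffic labels (one label per order).

```r
# Hypothetical traffic labels, one per order
traffic <- c("Low", "Jam", "Low", "Medium", "High", "Jam", "Low", "Jam")

# Count each level and convert the counts to percentages of all orders
traffic_pct <- 100 * prop.table(table(traffic))
print(traffic_pct)
```

The percentages always sum to 100, which is what lets them serve directly as pie slices.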
4.4. Distribution of Multiple Deliveries
Next, we will visualize how many orders fall under each value of multiple_deliveries.
ggplot(food_delivery_data, aes(x = factor(multiple_deliveries))) +
geom_bar(fill = "darkgreen", color = "black") +
labs(title = "Distribution of Multiple Deliveries", x = "Multiple Deliveries",
y = "Count")
The bar plot shows how often delivery personnel handle multiple deliveries at once, which is relevant to delivery efficiency.
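Wrapping multiple_deliveries in factor() makes ggplot2 treat each value as its own category instead of a point on a continuous axis. Missing values then appear as a separate NA bar. The base-R counterpart below tabulates hypothetical values and keeps NA in the count.

```r
# Hypothetical multiple_deliveries values, including one missing entry
multi <- c(0, 1, 1, 2, 0, 1, 3, NA)

# factor() turns the numbers into categories; useNA = "ifany" keeps NA visible
multi_counts <- table(factor(multi), useNA = "ifany")
print(multi_counts)

# Share of non-missing orders with a value above zero
share_above_zero <- mean(multi > 0, na.rm = TRUE)
print(share_above_zero)
```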
4.5. Average Delivery Person Ratings by City
Next, we will compute the average delivery person rating for each city and visualize it.
rating_by_city <- food_delivery_data %>%
group_by(City) %>%
summarise(avg_rating = mean(Delivery_person_Ratings, na.rm = TRUE))
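# A constant y value gives one row of tiles (one per city), colored by the average rating
ggplot(rating_by_city, aes(x = City, y = "Average rating", fill = avg_rating)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "darkgreen", high = "lightgreen") +
labs(title = "Average Delivery Person Ratings by City", x = "City",
y = "", fill = "Average Rating")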
This heatmap visualizes the average ratings of delivery persons in different cities, highlighting cities with higher or lower average ratings.
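The `group_by(City) %>% summarise(mean(..., na.rm = TRUE))` step can also be written in base R with tapply(). The sketch below uses hypothetical city labels and ratings, including one missing rating, and ranks the cities by their average.

```r
# Hypothetical city labels and ratings (one rating is missing)
city   <- c("Urban", "Metropolitan", "Urban", "Semi-Urban", "Metropolitan")
rating <- c(4.8, 4.5, 4.6, 4.9, NA)

# Mean rating per city, ignoring missing ratings
avg_rating <- tapply(rating, city, mean, na.rm = TRUE)

# Cities ranked from highest to lowest average rating
print(sort(avg_rating, decreasing = TRUE))
```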
4.6. Delivery Person Ratings by Weather Conditions
Next, we will visualize how delivery person ratings vary across weather conditions.
# stat_summary() plots the mean rating per condition; geom_bar(stat = "identity") would stack and sum them
ggplot(food_delivery_data, aes(x = Weatherconditions, y = Delivery_person_Ratings)) +
stat_summary(fun = mean, geom = "bar", fill = "purple") +
labs(title = "Delivery Person Ratings by Weather Conditions",
x = "Weather Conditions", y = "Average Ratings")
This bar plot displays the average delivery person rating under each weather condition, showing whether ratings tend to shift with the weather.
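If raw ratings are plotted with `geom_bar(stat = "identity")`, each order adds its own bar segment and the segments stack. Bar height is then the sum of the ratings, so conditions with more orders get taller bars whatever their ratings are. A chart of averages needs the ratings averaged per condition first, for example with `stat_summary(fun = mean)` or a `summarise()` step. The toy comparison below, on hypothetical data, shows how far apart the two quantities can be.

```r
# Hypothetical weather labels and ratings
weather <- c("Sunny", "Sunny", "Fog", "Fog", "Fog")
rating  <- c(4.8, 4.6, 4.5, 4.7, 4.6)

# What stacked bars show: the total of all ratings per condition
rating_sum  <- tapply(rating, weather, sum)
# What an "Average Ratings" axis implies: the mean rating per condition
rating_mean <- tapply(rating, weather, mean)

print(rbind(sum = rating_sum, mean = rating_mean))
```

Fog has the taller summed bar only because it has more orders, even though its average rating is lower than Sunny's.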
4.7. Delivery Ratings by Age Group
Next, we will group delivery persons into five-year age bands and compare their ratings.
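# Bin ages into five-year groups; include.lowest = TRUE keeps age 20 in the first group,
# while ages outside 20-45 become NA
food_delivery_data <- food_delivery_data %>%
mutate(Age_Group = cut(Delivery_person_Age, breaks = c(20, 25, 30, 35, 40, 45),
labels = c("20-25", "25-30", "30-35", "35-40", "40-45"), include.lowest = TRUE))
# Leave out rows outside the binned age range before plotting
ggplot(filter(food_delivery_data, !is.na(Age_Group)), aes(x = Age_Group, y = Delivery_person_Ratings)) +
geom_boxplot(fill = "yellow") +
labs(title = "Delivery Ratings by Age Group", x = "Age Group", y = "Ratings")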
This box plot shows the distribution of delivery person ratings for each age group, allowing us to compare the ratings received by different age groups of delivery persons.
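By default, cut() builds right-closed intervals. With breaks starting at 20, a 20-year-old therefore gets NA, and so does any age above 45. The sketch below uses hypothetical ages to show what `include.lowest = TRUE` changes. Note that a label such as "20-25" covers (20, 25], so age 25 falls in the first group rather than in "25-30".

```r
# Hypothetical ages, including the boundary values 20, 25 and 45 and an out-of-range 50
ages <- c(20, 22, 25, 31, 44, 45, 50)
age_breaks <- c(20, 25, 30, 35, 40, 45)
age_labels <- c("20-25", "25-30", "30-35", "35-40", "40-45")

# Default right-closed intervals (20,25], ..., so age 20 falls outside every bin
default_groups <- cut(ages, breaks = age_breaks, labels = age_labels)
# include.lowest = TRUE closes the first interval: [20,25]
inclusive_groups <- cut(ages, breaks = age_breaks, labels = age_labels,
                        include.lowest = TRUE)

print(data.frame(ages, default_groups, inclusive_groups))
```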
Conclusion
From our analysis, we:
- Identified trends in delivery time, ratings, and traffic conditions.
- Explored how ratings vary with weather conditions and how often multiple deliveries occur.
- Visualized regional and age-related differences in delivery performance.
We concluded that delivery times, weather, and traffic appear to play an important role in customer satisfaction, although these charts show associations rather than statistically tested effects. By understanding these patterns, businesses can optimize routes, manage delivery times more effectively, and improve service quality, ultimately enhancing customer retention.