A Machine Learning Model for Loan Status Prediction

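Problem

We need to determine whether a person can be granted a loan, based on their income, education level, work experience, previous loans, and other factors.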

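Analysis

  • This loan status prediction dataset contains records of past applicants who applied for a property loan.
  • The bank decides whether to lend to an applicant based on factors such as the applicant's income, the loan amount, prior credit history, and the co-applicant's income.
  • Our goal is to build a machine learning model that predicts whether an applicant's loan will be approved or rejected.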

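Column descriptions

Loan_ID: unique loan ID.
Gender: Male or Female.
Married: marital status (Yes/No).
Dependents: number of people who depend on the applicant (0, 1, 2, 3+).
Education: applicant's education (Graduate or Not Graduate).
Self_Employed: whether the applicant is self-employed (Yes/No).
ApplicantIncome: applicant's income.
CoapplicantIncome: co-applicant's income.
LoanAmount: loan amount, in thousands.
Loan_Amount_Term: loan term, in months.
Credit_History: whether the credit history meets the guidelines (1 if it does, 0 otherwise).
Property_Area: where the applicant lives (Urban, Semiurban, or Rural).
Loan_Status: whether the loan was approved (Y/N).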

1. Import packages

This model uses numpy, pandas, matplotlib, and seaborn.

import numpy as np 
import pandas as pd 

import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings("ignore")

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

2. Load the data

df = pd.read_csv("/kaggle/input/loan-status-prediction/loan_data.csv")
df.head()
    Loan_ID  Gender  Married  Dependents     Education  Self_Employed  ApplicantIncome  CoapplicantIncome  LoanAmount  Loan_Amount_Term  Credit_History  Property_Area  Loan_Status
0  LP001003    Male      Yes           1      Graduate             No             4583             1508.0       128.0             360.0             1.0          Rural            N
1  LP001005    Male      Yes           0      Graduate            Yes             3000                0.0        66.0             360.0             1.0          Urban            Y
2  LP001006    Male      Yes           0  Not Graduate             No             2583             2358.0       120.0             360.0             1.0          Urban            Y
3  LP001008    Male       No           0      Graduate             No             6000                0.0       141.0             360.0             1.0          Urban            Y
4  LP001013    Male      Yes           0  Not Graduate             No             2333             1516.0        95.0             360.0             1.0          Urban            Y
df = df.drop(['Loan_ID'], axis=1)
# Number of rows and columns in the dataset
df.shape
(381, 12)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 381 entries, 0 to 380
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Gender             376 non-null    object 
 1   Married            381 non-null    object 
 2   Dependents         373 non-null    object 
 3   Education          381 non-null    object 
 4   Self_Employed      360 non-null    object 
 5   ApplicantIncome    381 non-null    int64  
 6   CoapplicantIncome  381 non-null    float64
 7   LoanAmount         381 non-null    float64
 8   Loan_Amount_Term   370 non-null    float64
 9   Credit_History     351 non-null    float64
 10  Property_Area      381 non-null    object 
 11  Loan_Status        381 non-null    object 
dtypes: float64(4), int64(1), object(7)
memory usage: 35.8+ KB

3. Handle missing values

df.isnull().sum()
Gender                5
Married               0
Dependents            8
Education             0
Self_Employed        21
ApplicantIncome       0
CoapplicantIncome     0
LoanAmount            0
Loan_Amount_Term     11
Credit_History       30
Property_Area         0
Loan_Status           0
dtype: int64
df['Gender'] = df['Gender'].fillna(df['Gender'].mode().iloc[0])
df['Self_Employed'] = df['Self_Employed'].fillna(df['Self_Employed'].mode().iloc[0])
df['Loan_Amount_Term'] = df['Loan_Amount_Term'].fillna(df['Loan_Amount_Term'].mode().iloc[0]).astype(int)
df['Credit_History'] = df['Credit_History'].fillna(df['Credit_History'].mode().iloc[0]).astype(int)

df['Dependents'] = df['Dependents'].replace(['0', '1', '2', '3+'], [0,1,2,3,])
df['Dependents'] = df['Dependents'].fillna(df['Dependents'].mode().iloc[0])

df['CoapplicantIncome'] = df['CoapplicantIncome'].astype(int)
df['LoanAmount'] = df['LoanAmount'].astype(int)
df.isnull().sum()
Gender               0
Married              0
Dependents           0
Education            0
Self_Employed        0
ApplicantIncome      0
CoapplicantIncome    0
LoanAmount           0
Loan_Amount_Term     0
Credit_History       0
Property_Area        0
Loan_Status          0
dtype: int64
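
The mode-based fill used above can be tried in isolation. The following is a minimal sketch on a made-up two-column frame (the `toy` name and its values are assumptions for illustration, not rows from the loan dataset); it shows why `.mode().iloc[0]` is needed, since `mode()` returns a Series rather than a single value.

```python
import numpy as np
import pandas as pd

# Toy frame with gaps; the values are invented for illustration only.
toy = pd.DataFrame({
    "Gender": ["Male", "Female", None, "Male"],
    "Credit_History": [1.0, np.nan, 1.0, 0.0],
})

# mode() returns a Series (several values can tie), so take its first entry.
for col in toy.columns:
    toy[col] = toy[col].fillna(toy[col].mode().iloc[0])

n_missing = int(toy.isnull().sum().sum())
print(toy)
print("missing values left:", n_missing)
```

Filling with the most frequent value keeps categorical columns valid, but it also strengthens the majority class, so it is worth checking how many rows were filled in each column.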

4. Convert categorical data to numeric form

def cat_to_num(df, c_var):
    # Encode each binary column as 0/1, in order of first appearance.
    for i in c_var:
        uniques_value = df[i].unique()
        df[i] = df[i].replace(uniques_value, [0, 1])

    # Property_Area has three categories, encoded as 0/1/2.
    for i in ['Property_Area']:
        uniques_value = df[i].unique()
        df[i] = df[i].replace(uniques_value, [0, 1, 2])

c_variables = ['Gender', 'Married', 'Education', 'Self_Employed', 'Loan_Status']

cat_to_num(df, c_variables)
df.head()
   Gender  Married  Dependents  Education  Self_Employed  ApplicantIncome  CoapplicantIncome  LoanAmount  Loan_Amount_Term  Credit_History  Property_Area  Loan_Status
0       0        0         1.0          0              0             4583               1508         128               360               1              0            0
1       0        0         0.0          0              1             3000                  0          66               360               1              1            1
2       0        0         0.0          1              0             2583               2358         120               360               1              1            1
3       0        1         0.0          0              0             6000                  0         141               360               1              1            1
4       0        0         0.0          1              0             2333               1516          95               360               1              1            1
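
Because `cat_to_num` assigns codes in order of first appearance, the same category could receive a different number if the rows were shuffled. One possible alternative, sketched here on a made-up frame, is to spell the codes out in dictionaries and apply them with `Series.map`. The `encodings` dictionary and the specific codes in it are assumptions chosen for illustration.

```python
import pandas as pd

# Toy rows; invented for illustration, not taken from the loan dataset.
toy = pd.DataFrame({
    "Married": ["Yes", "No", "Yes"],
    "Property_Area": ["Rural", "Urban", "Semiurban"],
    "Loan_Status": ["N", "Y", "Y"],
})

# Hypothetical explicit codes: the mapping no longer depends on row order.
encodings = {
    "Married": {"Yes": 0, "No": 1},
    "Property_Area": {"Rural": 0, "Urban": 1, "Semiurban": 2},
    "Loan_Status": {"N": 0, "Y": 1},
}

for col, mapping in encodings.items():
    # Categories missing from the mapping would become NaN, which makes typos visible.
    toy[col] = toy[col].map(mapping)

print(toy)
```

With explicit dictionaries the meaning of each code (for example, that 1 means an approved loan) is documented in the code itself.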

5. Data visualization

Analyze the encoded categorical columns

fig, ax = plt.subplots(3, 2, figsize=(12,15))

for index, cat_col in enumerate(c_variables):
    row, col = index//2, index%2
    sns.countplot(x=cat_col, data=df, hue='Loan_Status', ax=ax[row, col])

plt.subplots_adjust(hspace=1)

Analyze the numerical columns

numerical_columns = ['ApplicantIncome', 'CoapplicantIncome', 'LoanAmount']

fig,axes = plt.subplots(1,3,figsize=(17,5))
for idx,cat_col in enumerate(numerical_columns):
    sns.boxplot(y=cat_col,data=df,x='Loan_Status',ax=axes[idx])

print(df[numerical_columns].describe())
plt.subplots_adjust(hspace=1)

6. Data preprocessing

X = df.drop(['Loan_Status'], axis=1)
y = df['Loan_Status']
X.shape, y.shape
((381, 11), (381,))

Split into training and test sets

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train.shape, y_train.shape, X_test.shape, y_test.shape
((304, 11), (304,), (77, 11), (77,))
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# Fit the scaler on the training set only, then reuse its statistics on the test set.
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
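
One way to make sure the scaler only ever learns its mean and variance from the training rows is to bundle it with the model in a scikit-learn pipeline. The sketch below uses synthetic data (`X_demo`, `y_demo` and the other names are assumptions for illustration, not the loan dataset) and is not necessarily how the original notebook was run.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 200 rows, 4 features, label driven by the first feature.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = (X_demo[:, 0] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# fit() fits the scaler on X_tr only; score() applies those same statistics to X_te.
pipe = make_pipeline(
    StandardScaler(),
    DecisionTreeClassifier(max_depth=3, random_state=0),
)
pipe.fit(X_tr, y_tr)
test_accuracy = pipe.score(X_te, y_te)
print(f"Test accuracy: {test_accuracy:0.2f}")
```

Decision trees split on thresholds, so they do not need scaled inputs; the pipeline mainly pays off if the tree is later swapped for a scale-sensitive model.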

7. Model: decision tree classifier

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score,roc_auc_score
model = DecisionTreeClassifier(max_depth=3,min_samples_leaf = 35)
model.fit(X_train,y_train)
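
A shallow tree like this one is small enough to read directly. The sketch below shows one possible way to print the learned rules with `sklearn.tree.export_text`. It trains on synthetic data in which the label simply follows a made-up `Credit_History` column, so the data and the resulting rules are assumptions for illustration, not the rules learned from the loan dataset.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data: two features, label copied from the first one.
feature_names = ["Credit_History", "ApplicantIncome"]
rng = np.random.default_rng(0)
credit = rng.integers(0, 2, size=200)
income = rng.integers(1000, 8000, size=200)
X_demo = np.column_stack([credit, income])
y_demo = credit

# Same hyperparameters as the model above; random_state fixed for repeatability.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=35, random_state=0)
tree.fit(X_demo, y_demo)

# export_text returns the tree as indented if/else rules.
rules = export_text(tree, feature_names=feature_names)
print(rules)
```

`min_samples_leaf=35` forbids any split that would leave fewer than 35 training rows in a leaf, which keeps the tree coarse on a dataset with only a few hundred rows.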

8. Testing

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
roc_score = roc_auc_score(y_test, y_pred)

print(f'Accuracy Score: {accuracy*100:0.2f}%')
print(f'Roc Score: {roc_score*100:0.2f}%')
Accuracy Score: 81.82%
Roc Score: 66.67%
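
Note that `roc_auc_score` above receives hard 0/1 predictions. The metric is usually computed from scores or probabilities, which lets it rank examples at every threshold. The sketch below compares the two on synthetic data (all names and data here are assumptions for illustration); it does not reproduce the figures reported above.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: the label depends on the first feature plus noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

# AUC from hard labels: only one operating point on the ROC curve.
auc_labels = roc_auc_score(y_te, clf.predict(X_te))
# AUC from the predicted probability of class 1: uses every threshold.
auc_scores = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

print(f"AUC from labels:        {auc_labels:0.3f}")
print(f"AUC from probabilities: {auc_scores:0.3f}")
```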
pd.crosstab(y_test, y_pred, rownames=['True'], colnames=['Predicted'], margins=True)
Predicted  0   1  All
True
0          7  14   21
1          0  56   56
All        7  70   77
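
Both reported scores can be recomputed by hand from the cross-tabulation above. The short sketch below does the arithmetic; the variable names are my own. With hard 0/1 predictions the ROC curve has a single operating point, so the AUC reduces to the average of the recalls of the two classes.

```python
# Counts read from the cross-tabulation above.
tn, fp = 7, 14   # true class 0: predicted 0 / predicted 1
fn, tp = 0, 56   # true class 1: predicted 0 / predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 63 / 77
tpr = tp / (tp + fn)                         # recall of class 1: 56 / 56
tnr = tn / (tn + fp)                         # recall of class 0: 7 / 21
auc_from_labels = (tpr + tnr) / 2            # AUC when predictions are hard labels

print(f"Accuracy Score: {accuracy*100:0.2f}%")        # 81.82%
print(f"Roc Score: {auc_from_labels*100:0.2f}%")      # 66.67%
```

The gap between the two numbers comes from class 0: the model approves every applicant who was actually approved, but it also approves 14 of the 21 applicants who were rejected.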