Predicting Defaults on Credit Card Payments

This model will predict the probability that a credit card holder will default on their payment given their payment history and demographic information.

Load libraries:

library(readxl)
library(xgboost)
library(caTools)
library(config)

Load config file:

model_config <- get("model_config")

This notebook refers to model-a.

Load data

Load the data and view its contents:

# Data from https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
df <- read_excel("data/default-of-credit-card clients.xls", skip = 1)
df

Split data into training and testing sets

Split the data set into 80% training and 20% testing portions:

set.seed(123)
sample <- sample.split(df, SplitRatio = 0.80)

train <- as.matrix(subset(df, sample == TRUE))
x_train <- train[,-25]
y_train  <- train[, 25, drop=FALSE]

test <- as.matrix(subset(df, sample == FALSE))
x_test <- test[,-25]
y_test  <- test[, 25, drop=FALSE]

Train model

Train the model using a binary classification algoritm:

bst <- xgboost(data = x_train,
               label = y_train,
               max.depth = 2,
               eta = 1,
               nthread = 2,
               nrounds = 2,
               booster = "gbtree",
               objective = "binary:logistic")
[1] train-error:0.181125 
[2] train-error:0.181125 

Show the most important factors and their weights:

imp_matrix <- xgb.importance(feature_names = colnames(x_train), model = bst)
xgb.plot.importance(importance_matrix = imp_matrix)

Serialize model

Serialize the trained model to a file for later use:

model_save_path = paste0(model_config$path_prefix, model_config$id, model_config$path_suffix)
xgb.save(bst, model_save_path)
[1] TRUE

Model Testing

Generate model predictions on the test data and calculate the accuracy of the model:

pred <- predict(bst, x_test)
err <- mean(as.numeric(pred > 0.5) != y_test)
print(paste("Test Accuracy =", 1-err))
[1] "Test Accuracy = 0.8225"

Test model predictions

Generate a prediction for an account that we know has a good payment history:

test_data <- matrix(c(35, 500000, 1, 1, 1, 58, -2, -2, -2, -2, -2, -2, 13709, 5006, 31130, 3180, 0, 5293, 5006, 31178, 3180, 0, 5293, 768), nrow = 1)
pred_good <- predict(bst, test_data)
pred_good
[1] 0.1215914

This account has a 0.1215914 probability of defaulting on their payment.


Generate a prediction for an account that we know defaults on their payment:

test_data <- matrix(c(1, 20000, 2, 2, 1, 24, 2, 2, -1, -1, -2, -2, 3913, 3102, 689, 0, 0, 0, 0, 689, 0, 0, 0, 0), nrow = 1)
pred_bad <- predict(bst, test_data)
pred_bad
[1] 0.7043071

This account has a 0.7043071 probability of defaulting on their payment.

LS0tCnRpdGxlOiAiQ3JlZGl0IFJpc2sgTW9kZWxpbmcgaW4gUlN0dWRpbyIKcmVzb3VyY2VfZmlsZXM6Ci0gY29uZmlnLnltbApvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgojIyMgUHJlZGljdGluZyBEZWZhdWx0cyBvbiBDcmVkaXQgQ2FyZCBQYXltZW50cwoKVGhpcyBtb2RlbCB3aWxsIHByZWRpY3QgdGhlIHByb2JhYmlsaXR5IHRoYXQgYSBjcmVkaXQgY2FyZCBob2xkZXIgd2lsbCBkZWZhdWx0IG9uIHRoZWlyIHBheW1lbnQgZ2l2ZW4gdGhlaXIgcGF5bWVudCBoaXN0b3J5IGFuZCBkZW1vZ3JhcGhpYyBpbmZvcm1hdGlvbi4KCkxvYWQgbGlicmFyaWVzOgoKYGBge3IgbWVzc2FnZT1GQUxTRX0KbGlicmFyeShyZWFkeGwpCmxpYnJhcnkoeGdib29zdCkKbGlicmFyeShjYVRvb2xzKQpsaWJyYXJ5KGNvbmZpZykKYGBgCgpMb2FkIGNvbmZpZyBmaWxlOgoKYGBge3J9Cm1vZGVsX2NvbmZpZyA8LSBnZXQoIm1vZGVsX2NvbmZpZyIpCmBgYAoKVGhpcyBub3RlYm9vayByZWZlcnMgdG8gKipgciBtb2RlbF9jb25maWckaWRgKiouCgojIyMgTG9hZCBkYXRhCgpMb2FkIHRoZSBkYXRhIGFuZCB2aWV3IGl0cyBjb250ZW50czoKCmBgYHtyfQojIERhdGEgZnJvbSBodHRwczovL2FyY2hpdmUuaWNzLnVjaS5lZHUvbWwvZGF0YXNldHMvZGVmYXVsdCtvZitjcmVkaXQrY2FyZCtjbGllbnRzCmRmIDwtIHJlYWRfZXhjZWwoImRhdGEvZGVmYXVsdC1vZi1jcmVkaXQtY2FyZCBjbGllbnRzLnhscyIsIHNraXAgPSAxKQpkZgpgYGAKCiMjIyBTcGxpdCBkYXRhIGludG8gdHJhaW5pbmcgYW5kIHRlc3Rpbmcgc2V0cwoKU3BsaXQgdGhlIGRhdGEgc2V0IGludG8gODAlIHRyYWluaW5nIGFuZCAyMCUgdGVzdGluZyBwb3J0aW9uczoKCmBgYHtyIHdhcm5pbmc9RkFMU0V9CnNldC5zZWVkKDEyMykKc2FtcGxlIDwtIHNhbXBsZS5zcGxpdChkZiwgU3BsaXRSYXRpbyA9IDAuODApCgp0cmFpbiA8LSBhcy5tYXRyaXgoc3Vic2V0KGRmLCBzYW1wbGUgPT0gVFJVRSkpCnhfdHJhaW4gPC0gdHJhaW5bLC0yNV0KeV90cmFpbiAgPC0gdHJhaW5bLCAyNSwgZHJvcD1GQUxTRV0KCnRlc3QgPC0gYXMubWF0cml4KHN1YnNldChkZiwgc2FtcGxlID09IEZBTFNFKSkKeF90ZXN0IDwtIHRlc3RbLC0yNV0KeV90ZXN0ICA8LSB0ZXN0WywgMjUsIGRyb3A9RkFMU0VdCmBgYAoKIyMjIFRyYWluIG1vZGVsCgpUcmFpbiB0aGUgbW9kZWwgdXNpbmcgYSBiaW5hcnkgY2xhc3NpZmljYXRpb24gYWxnb3JpdG06CgpgYGB7cn0KYnN0IDwtIHhnYm9vc3QoZGF0YSA9IHhfdHJhaW4sCiAgICAgICAgICAgICAgIGxhYmVsID0geV90cmFpbiwKICAgICAgICAgICAgICAgbWF4LmRlcHRoID0gMiwKICAgICAgICAgICAgICAgZXRhID0gMSwKICAgICAgICAgICAgICAgbnRocmVhZCA9IDIsCiAgICAgICAgICAgICAgIG5yb3VuZHMgPSAyLAogICAgICAgICAgICAgICBib29zdGVyID0gImdidHJlZSIsCiAgICAgICAgICAgICAgIG9iamVjdGl2ZSA9ICJiaW5hcnk6bG9naXN0aWMiKQpgYGAKClNob3cgdGhlIG1vc3QgaW1wb3J0YW50IGZhY3RvcnMgYW5kIHRoZWlyIHdlaWdodHM6CgpgYGB7cn0KaW1wX21hdHJpeCA8LSB4Z2IuaW1wb3J0YW5jZShmZWF0dXJlX25hbWVzID0gY29sbmFtZXMoeF90cmFpbiksIG1vZGVsID0gYnN0KQp4Z2IucGxvdC5pbXBvcnRhbmNlKGltcG9ydGFuY2VfbWF0cml4ID0gaW1wX21hdHJpeCkKYGBgCgojIyMgU2VyaWFsaXplIG1vZGVsCgpTZXJpYWxpemUgdGhlIHRyYWluZWQgbW9kZWwgdG8gYSBmaWxlIGZvciBsYXRlciB1c2U6CgpgYGB7cn0KbW9kZWxfc2F2ZV9wYXRoID0gcGFzdGUwKG1vZGVsX2NvbmZpZyRwYXRoX3ByZWZpeCwgbW9kZWxfY29uZmlnJGlkLCBtb2RlbF9jb25maWckcGF0aF9zdWZmaXgpCnhnYi5zYXZlKGJzdCwgbW9kZWxfc2F2ZV9wYXRoKQpgYGAKCiMjIyBNb2RlbCBUZXN0aW5nCgpHZW5lcmF0ZSBtb2RlbCBwcmVkaWN0aW9ucyBvbiB0aGUgdGVzdCBkYXRhIGFuZCBjYWxjdWxhdGUgdGhlIGFjY3VyYWN5IG9mIHRoZQptb2RlbDoKCmBgYHtyfQpwcmVkIDwtIHByZWRpY3QoYnN0LCB4X3Rlc3QpCmVyciA8LSBtZWFuKGFzLm51bWVyaWMocHJlZCA+IDAuNSkgIT0geV90ZXN0KQpwcmludChwYXN0ZSgiVGVzdCBBY2N1cmFjeSA9IiwgMS1lcnIpKQpgYGAKCiMjIyBUZXN0IG1vZGVsIHByZWRpY3Rpb25zCgpHZW5lcmF0ZSBhIHByZWRpY3Rpb24gZm9yIGFuIGFjY291bnQgdGhhdCB3ZSBrbm93IGhhcyBhIGdvb2QgcGF5bWVudCBoaXN0b3J5OgoKYGBge3J9CnRlc3RfZGF0YSA8LSBtYXRyaXgoYygzNSwgNTAwMDAwLCAxLCAxLCAxLCA1OCwgLTIsIC0yLCAtMiwgLTIsIC0yLCAtMiwgMTM3MDksIDUwMDYsIDMxMTMwLCAzMTgwLCAwLCA1MjkzLCA1MDA2LCAzMTE3OCwgMzE4MCwgMCwgNTI5MywgNzY4KSwgbnJvdyA9IDEpCnByZWRfZ29vZCA8LSBwcmVkaWN0KGJzdCwgdGVzdF9kYXRhKQpwcmVkX2dvb2QKYGBgCgpUaGlzIGFjY291bnQgaGFzIGEgYHIgcHJlZF9nb29kYCBwcm9iYWJpbGl0eSBvZiBkZWZhdWx0aW5nIG9uIHRoZWlyIHBheW1lbnQuCgotLS0KCkdlbmVyYXRlIGEgcHJlZGljdGlvbiBmb3IgYW4gYWNjb3VudCB0aGF0IHdlIGtub3cgZGVmYXVsdHMgb24gdGhlaXIgcGF5bWVudDoKCmBgYHtyfQp0ZXN0X2RhdGEgPC0gbWF0cml4KGMoMSwgMjAwMDAsIDIsIDIsIDEsIDI0LCAyLCAyLCAtMSwgLTEsIC0yLCAtMiwgMzkxMywgMzEwMiwgNjg5LCAwLCAwLCAwLCAwLCA2ODksIDAsIDAsIDAsIDApLCBucm93ID0gMSkKcHJlZF9iYWQgPC0gcHJlZGljdChic3QsIHRlc3RfZGF0YSkKcHJlZF9iYWQKYGBgCgpUaGlzIGFjY291bnQgaGFzIGEgYHIgcHJlZF9iYWRgIHByb2JhYmlsaXR5IG9mIGRlZmF1bHRpbmcgb24gdGhlaXIgcGF5bWVudC4K