Model Evaluation
Overview
There are several ways to measure how well your model is performing. Here are some of the most commonly used metrics and techniques:
- Accuracy
- Confusion matrix
- Balanced accuracy
- Cross-validation
- Hyperparameter tuning
Accuracy
Accuracy is the ratio of correct predictions to total predictions. While it’s simple, it may not always give an accurate picture, especially with imbalanced data.
- Easy to misinterpret: a high score can hide poor performance on rare classes
- Standard accuracy = number of correct predictions / total number of predictions
- Standard accuracy can be unhelpful on imbalanced data
For example, if 99% of the samples are negative, a model that always predicts the negative class scores 99% accuracy while never identifying a single positive case.
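As a minimal sketch (with made-up class counts), a model that always predicts the majority class can still score very high on standard accuracy:
from sklearn.metrics import accuracy_score
# Imbalanced toy labels: 99 negatives and 1 positive (illustrative values)
y_true = [0] * 99 + [1]
# A model that always predicts the negative class
y_pred = [0] * 100
print(accuracy_score(y_true, y_pred))  # 0.99, yet the single positive case is never found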
Confusion Matrix
A confusion matrix helps to evaluate binary classification models. It shows the actual vs predicted labels in a 2x2 table:
- True positives - Correct positive predictions
- True negatives - Correct negative predictions
- False positives - Negative cases incorrectly predicted as positive
- False negatives - Positive cases incorrectly predicted as negative
This matrix highlights where the model is making errors.
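A minimal sketch using sklearn's confusion_matrix (the labels below are made-up examples):
from sklearn.metrics import confusion_matrix
# Toy actual and predicted labels (illustrative values)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))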
Balanced Accuracy
Balanced accuracy averages the per-class accuracy (recall) of the two classes. It’s useful when classes are imbalanced, ensuring the model doesn’t favor one class too much.
Balanced accuracy = (sensitivity + specificity) / 2 = (TP / (TP + FN) + TN / (TN + FP)) / 2
Consequently, balanced accuracy is often more reliable than standard accuracy.
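For instance, sklearn's balanced_accuracy_score makes the difference visible on an imbalanced toy example (made-up values):
from sklearn.metrics import balanced_accuracy_score
# 95 negatives and 5 positives; the model finds only 1 of the 5 positives (illustrative values)
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 95 + [1, 0, 0, 0, 0]
# Sensitivity = 1/5 = 0.2, specificity = 95/95 = 1.0
# Balanced accuracy = (0.2 + 1.0) / 2 = 0.6, while standard accuracy is 96/100 = 0.96
print(balanced_accuracy_score(y_true, y_pred))  # 0.6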
Cross-validation
Cross-validation divides the data into multiple groups (folds) and evaluates the model performance on each split. It provides a more reliable estimate of the model's ability to generalize.
- A resampling procedure
- Ensures more robust performance estimates
k-fold cross-validation
k-fold cross-validation helps evaluate models when data is limited. It splits the data into k groups, training the model on k-1 parts and testing it on the remaining part.
- Divides data into k groups
- Trains on k-1 parts
- Tests on 1 part
- Repeats for each group
sklearn
sklearn provides an easy way to implement k-fold cross-validation using the KFold class.
- Set the number of splits (k) with KFold
- Use cross_val_score to calculate the scores
- Pass in the model, KFold object, dataset features, and target to evaluate the model
This allows for efficient model evaluation with different data splits.
from sklearn.model_selection import cross_val_score, KFold
# Split the data into 5 equal parts
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
# Get the balanced accuracy for each fold
# (model, X, and y are assumed to be the estimator, feature matrix, and target)
cv_results = cross_val_score(model, X, y, cv=kfold, scoring='balanced_accuracy')
print(cv_results)
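Each element of cv_results is the balanced accuracy on one fold; taking the mean and standard deviation summarizes how well the model generalizes, for example:
print(f"Mean: {cv_results.mean():.4f} (+/- {cv_results.std():.4f})")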
Hyperparameter Tuning
Adjusting hyperparameters can improve model performance. You can test different values of a hyperparameter to find the best one for your model.
- A hyperparameter is a model-level setting chosen before training
- It doesn't change during training
For example, in logistic regression, the hyperparameter C controls the regularization strength (in sklearn, smaller values of C mean stronger regularization).
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, KFold
# Hyperparameter values to test
C_values = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
# KFold cross-validation setup
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
# Manually iterate over the hyperparameter values
# (X and y are assumed to be the feature matrix and target)
for C in C_values:
    model = LogisticRegression(max_iter=200, C=C)
    # cross_val_score fits the model on each training fold, so no separate fit is needed
    scores = cross_val_score(model, X, y, cv=kfold, scoring='balanced_accuracy')
    print(f"C = {C}, Balanced accuracy: {scores.mean():.4f} (+/- {scores.std():.4f})")