We’ll build a hybrid classification model that combines Logistic Regression, Random Forest, and a Support Vector Machine (SVM) using scikit-learn’s VotingClassifier.
🔧 Problem: Predict if a person has diabetes based on health data
We’ll use the famous PIMA Indians Diabetes Dataset. It is not bundled with sklearn, so we load it from a public CSV with pandas.read_csv.
✅ Step-by-Step Code
# Step 1: Import Libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
# Step 2: Load Dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
columns = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness',
           'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']
df = pd.read_csv(url, names=columns)
print(df.head())
# Step 3: Splitting & Feature Scaling
X = df.drop('Outcome', axis=1)
y = df['Outcome']

# Split first, then fit the scaler on the training set only,
# so that test-set statistics don't leak into preprocessing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Step 4: Define Base Models
log_clf = LogisticRegression(random_state=42)
rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
svm_clf = SVC(probability=True, kernel='rbf', random_state=42)  # probability=True is required for soft voting
# Step 5: Hybrid Model using Voting Classifier
voting_clf = VotingClassifier(
    estimators=[
        ('lr', log_clf),
        ('rf', rf_clf),
        ('svc', svm_clf)
    ],
    voting='soft'  # soft = average the predicted probabilities
)
# Step 6: Train and Evaluate
voting_clf.fit(X_train, y_train)
y_pred = voting_clf.predict(X_test)
print("\n🔍 Accuracy of Hybrid Model:", accuracy_score(y_test, y_pred))
print("\n📊 Classification Report:\n", classification_report(y_test, y_pred))
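To see what the ensemble buys over its parts, it helps to score each base model on the same split. Here is a self-contained sketch of that comparison; it uses a synthetic dataset from make_classification as a stand-in for the diabetes data (same shape: 8 features, binary target), so the printed scores are illustrative, not the real-data results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the PIMA data: 768 samples, 8 features, binary target
X, y = make_classification(n_samples=768, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale using training-set statistics only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Score each base model individually on the held-out split
models = {
    'lr': LogisticRegression(random_state=42),
    'rf': RandomForestClassifier(n_estimators=100, random_state=42),
    'svc': SVC(probability=True, kernel='rbf', random_state=42),
}
for name, clf in models.items():
    print(name, clf.fit(X_train, y_train).score(X_test, y_test))

# Then score the soft-voting ensemble built from the same estimators
voting = VotingClassifier(estimators=list(models.items()), voting='soft')
print('voting', voting.fit(X_train, y_train).score(X_test, y_test))
```

The ensemble is not guaranteed to beat every base model on every split, but when the base models make different kinds of errors, averaging their probabilities tends to smooth out individual mistakes.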
📌 How This Hybrid Model Works
- Logistic Regression captures linear trends.
- Random Forest captures non-linear relationships and handles noisy features.
- SVM handles high-dimensional spaces and decision boundaries well.
- The VotingClassifier combines predictions from all three models:
- Hard Voting: each model casts one vote; the majority class wins.
- Soft Voting: the predicted probabilities are averaged — more flexible, and often more accurate when the base models produce well-calibrated probabilities.
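The difference between the two voting modes shows up when the models disagree in confidence. In the small numeric sketch below (the probabilities are made up purely for illustration), two models weakly favour class 0 while one strongly favours class 1, so hard and soft voting reach different answers:

```python
import numpy as np

# Hypothetical per-class probabilities from the three base models for one sample
# (rows: lr, rf, svc; columns: P(class 0), P(class 1)) — illustrative numbers only
probs = np.array([
    [0.51, 0.49],   # logistic regression leans slightly toward class 0
    [0.51, 0.49],   # random forest leans slightly toward class 0
    [0.10, 0.90],   # SVM is very confident in class 1
])

# Hard voting: each model casts one vote for its most likely class
hard_votes = probs.argmax(axis=1)             # votes: [0, 0, 1]
hard_pred = np.bincount(hard_votes).argmax()  # majority -> class 0

# Soft voting: average the probabilities first, then take the argmax
avg = probs.mean(axis=0)                      # roughly [0.37, 0.63]
soft_pred = avg.argmax()                      # -> class 1
```

Soft voting lets one confident model outweigh two barely-decided ones, which is also why SVC needs probability=True in the ensemble above.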
📈 Output Example
🔍 Accuracy of Hybrid Model: 0.81
📊 Classification Report:
              precision    recall  f1-score   support

           0       0.84      0.86      0.85       107
           1       0.74      0.70      0.72        47

    accuracy                           0.81       154
   macro avg       0.79      0.78      0.78       154
weighted avg       0.81      0.81      0.81       154