Code for preprocessing, feature engineering, and training six classifiers on CICIDS2017 and 2018. Implements static and adaptive ensembles, including confidence-based weighting and meta-learning, to boost intrusion detection accuracy and robustness.

snigdhasv/Adaptive-Ensemble-Learning-for-Intrusion-Detection-Using-CICIDS2017-and-CICIDS2018

Adaptive-Ensemble-Learning-for-Intrusion-Detection-Using-CICIDS2017-and-CICIDS2018

This repository provides a complete workflow for working with the CICIDS2017 and CICIDS2018 datasets — from raw data loading and cleaning, through exploratory data analysis (EDA), to building and evaluating dynamic ensemble learning models for Intrusion Detection Systems (IDS).


🗂️ Contents

  • Part 1: 📌 CICIDS2017 Preprocessing & EDA
  • Part 2: 📌 CICIDS2018 Preprocessing & EDA
  • Part 3: 🚀 Dynamic Ensemble Performance Evaluation

📊 Overview

This project helps you:

  • ✅ Load large network traffic CSVs efficiently
  • ✅ Clean, optimize, and engineer features
  • ✅ Visualize and understand attack distributions
  • ✅ Train, load, and test multiple ML models
  • ✅ Combine base models using static and adaptive ensembling
  • ✅ Evaluate performance across datasets and techniques

⚙️ Requirements

Install once for all modules:

pip install pandas numpy scikit-learn matplotlib seaborn missingno joblib

Or inside a Colab cell:

!pip install pandas numpy scikit-learn matplotlib seaborn missingno joblib

📌 Part 1 — CICIDS2017 Data Preprocessing & EDA

✅ Key Steps:

  1. Mount Drive

    from google.colab import drive
    drive.mount('/content/drive')
  2. Load and Clean

    data = load_cicids_data('/content/drive/MyDrive/Capstone/CICIDS2017')
    data = optimize_dtypes(data)
    data.drop_duplicates(inplace=True)
  3. Handle Missing Values

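    import numpy as np
    data.replace([np.inf, -np.inf], np.nan, inplace=True)
    # numeric_only=True keeps median() from failing on non-numeric columns such as Label
    data.fillna(data.median(numeric_only=True), inplace=True)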
  4. Label Engineering

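    from sklearn.preprocessing import LabelEncoder
    # attack_map: dict from raw Label strings to attack categories
    data['Attack Type'] = data['Label'].map(attack_map)
    le = LabelEncoder()
    data['Attack Number'] = le.fit_transform(data['Attack Type'])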
  5. EDA

    import missingno as msno
    import seaborn as sns
    msno.bar(data)  # non-null count per column
    sns.heatmap(data.corr(numeric_only=True))  # correlations between numeric features
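
The helpers `load_cicids_data` and `optimize_dtypes` are not shown in this README. As a rough sketch of what steps 2 and 3 could amount to, the snippet below downcasts numeric columns and imputes medians on a toy frame. The function names `optimize_dtypes_sketch` and `clean_numeric_sketch`, and the toy values, are assumptions made for illustration and are not the repository's code.

```python
import numpy as np
import pandas as pd


def optimize_dtypes_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Downcast numeric columns to the smallest dtype that holds their values."""
    out = df.copy()
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    for col in out.select_dtypes(include=[np.floating]).columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    return out


def clean_numeric_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows, turn +/-inf into NaN, then fill NaN with column medians."""
    out = df.drop_duplicates().reset_index(drop=True)
    out = out.replace([np.inf, -np.inf], np.nan)
    medians = out.median(numeric_only=True)  # skips text columns such as Label
    return out.fillna(medians)


# Toy data (hypothetical values) with one duplicate row and one infinite rate.
toy = pd.DataFrame(
    {
        "Flow Duration": [10, 10, 2000, 35],
        "Flow Bytes/s": [1.5, 1.5, np.inf, 4.5],
        "Label": ["BENIGN", "BENIGN", "DDoS", "BENIGN"],
    }
)
cleaned = optimize_dtypes_sketch(clean_numeric_sketch(toy))
print(cleaned.dtypes)
print(cleaned)
```

Downcasting after cleaning matters here: replacing infinities first keeps the float downcast from being driven by values that will be discarded anyway.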

📌 Part 2 — CICIDS2018 Data Preprocessing & EDA

✅ Key Steps:

  1. Mount Drive

    from google.colab import drive
    drive.mount('/content/drive')
  2. Combine Multiple CSVs

    import pandas as pd
    df1 = pd.read_csv('/path/to/file1.csv')
    df2 = pd.read_csv('/path/to/file2.csv')
    # ignore_index=True rebuilds a single 0..n-1 index across both files
    data = pd.concat([df1, df2], ignore_index=True)
  3. Fix Data Types

    data = fixDataType(data)
    data = optimize_dtypes(data)
  4. Label Encoding

    from sklearn.preprocessing import LabelEncoder
    attack_map = {...}
    data['Attack Type'] = data['Label'].map(attack_map)
    le = LabelEncoder()
    data['Attack Number'] = le.fit_transform(data['Attack Type'])
  5. Visual EDA

    import missingno as msno
    import seaborn as sns
    msno.bar(data)
    sns.boxplot(x='Attack Type', y='Flow Duration', data=data)
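
The label-encoding step has one pitfall worth showing: `Series.map` returns NaN for any label missing from the dictionary. The sketch below uses a small hypothetical mapping (`toy_attack_map`) and example label strings. Neither is taken from the repository, so check `data['Label'].unique()` for the actual values before writing a real map.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical mapping for illustration only; the repository's attack_map is elided.
toy_attack_map = {
    "Benign": "Normal",
    "DoS attacks-Hulk": "DoS",
    "DoS attacks-GoldenEye": "DoS",
    "Bot": "Botnet",
}

labels = pd.DataFrame(
    {"Label": ["Benign", "DoS attacks-Hulk", "Bot", "DoS attacks-GoldenEye", "Infilteration"]}
)
labels["Attack Type"] = labels["Label"].map(toy_attack_map)

# .map() yields NaN for labels absent from the dictionary, so list them before encoding.
unmapped = labels.loc[labels["Attack Type"].isna(), "Label"].unique().tolist()
print("Unmapped labels:", unmapped)

# Here the unmapped rows are dropped; extending the map is the other option.
labels = labels.dropna(subset=["Attack Type"]).reset_index(drop=True)

le = LabelEncoder()
labels["Attack Number"] = le.fit_transform(labels["Attack Type"])

# classes_ is sorted, so integer codes follow the alphabetical order of class names.
print(list(le.classes_))
print(labels)
```

Keeping the fitted `le` around makes it possible to call `le.inverse_transform` on model predictions later and report attack names instead of integers.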

🚀 Part 3 — Dynamic Ensemble Performance Evaluation

✅ Main Features

  • Load trained models for 2017 & 2018

  • Combine models: average, weighted, max-voting

  • Adaptive ensembling with:

    • Confidence metrics
    • Meta-learner (RandomForestRegressor)
  • Evaluate all pairwise model combinations

  • Generate comparison tables & plots
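
To make the three static combination rules concrete, here is a minimal sketch that operates on class-probability arrays such as those returned by a scikit-learn `predict_proba` call. The function name `combine_probabilities` is hypothetical, and "max-voting" is read here as taking the larger of the two models' probabilities per class; the repository's `EnsemblePredictor` may define it differently.

```python
import numpy as np


def combine_probabilities(p1, p2, method="average", weights=(0.5, 0.5)):
    """Combine two (n_samples, n_classes) probability arrays into class predictions."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    if method == "average":
        combined = (p1 + p2) / 2.0
    elif method == "weighted":
        w1, w2 = weights
        combined = (w1 * p1 + w2 * p2) / (w1 + w2)
    elif method == "max":
        # Assumed reading of max-voting: per class, keep the higher probability.
        combined = np.maximum(p1, p2)
    else:
        raise ValueError(f"unknown method: {method}")
    return combined.argmax(axis=1)


# Toy probabilities for three samples and two classes (made-up numbers).
p_a = [[0.9, 0.1], [0.4, 0.6], [0.55, 0.45]]
p_b = [[0.6, 0.4], [0.7, 0.3], [0.2, 0.8]]

print(combine_probabilities(p_a, p_b, "average"))
print(combine_probabilities(p_a, p_b, "weighted", weights=(0.8, 0.2)))
print(combine_probabilities(p_a, p_b, "max"))
```

On the second sample the two models disagree, and the weighted rule follows the first model only because it is given four times the weight. This is the kind of fixed choice that adaptive ensembling replaces with per-sample weights.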


🧩 How to Run

1️⃣ Mount Google Drive

from google.colab import drive
drive.mount('/content/drive')

2️⃣ Run the Pipeline

if __name__ == "__main__":
    runner, standard_summary, adaptive_summary = main()

🧩 Core Classes

Class                       Role
ModelLoader                 Loads models and test splits
EnsemblePredictor           Static ensembling
AdaptiveEnsemblePredictor   Confidence and meta-learning
EvaluationMetrics           Accuracy, F1, recall, precision
VisualizationTools          Confusion matrices, bar plots, heatmaps
EnsembleExperimentRunner    Runs all experiments and reporting
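
Evaluating every pairwise combination amounts to looping over unordered pairs of models. The sketch below shows one way to do that with `itertools.combinations`. The function `evaluate_all_pairs`, the placeholder model names, and the callback signature are assumptions, not the interface of `EnsembleExperimentRunner`.

```python
from itertools import combinations


def evaluate_all_pairs(models, evaluate_pair):
    """Call evaluate_pair(name_a, model_a, name_b, model_b) for every unordered pair."""
    results = {}
    for (name_a, model_a), (name_b, model_b) in combinations(models.items(), 2):
        results[(name_a, name_b)] = evaluate_pair(name_a, model_a, name_b, model_b)
    return results


# Placeholder objects stand in for six trained classifiers.
models = {f"model_{i}": object() for i in range(1, 7)}

# The callback here only labels the pair; a real one would return metrics.
results = evaluate_all_pairs(models, lambda na, ma, nb, mb: f"{na}+{nb}")
print(len(results))  # 15 unordered pairs from six models
```

With six base models this gives 15 pairs per combination method, so caching each model's predictions once and reusing them across pairs keeps the run time manageable.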

📈 Example: Run an Adaptive Ensemble

adaptive = AdaptiveEnsemblePredictor()
preds, confs, weights = adaptive.predict_ensemble(
    X_input, model1, model2,
    method='meta_learner',
    X_train=X_train_subset, y_train=y_train_subset
)
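
The call above uses the meta-learner path. For the confidence-based path, a minimal sketch of the idea is to weight each model per sample by how confident it is, taking confidence as the highest class probability. The function below is hypothetical and only mirrors the `(preds, confs, weights)` return shape; it is not the repository's `predict_ensemble`.

```python
import numpy as np


def confidence_weighted_ensemble(p1, p2):
    """Blend two (n_samples, n_classes) probability arrays with per-sample weights."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)

    # Confidence of each model on each sample: its highest class probability.
    c1 = p1.max(axis=1)
    c2 = p2.max(axis=1)

    # Normalize so the two weights sum to 1 for every sample.
    total = c1 + c2
    w1 = c1 / total
    w2 = c2 / total

    combined = w1[:, None] * p1 + w2[:, None] * p2
    preds = combined.argmax(axis=1)
    confs = combined.max(axis=1)
    weights = np.column_stack([w1, w2])
    return preds, confs, weights


# Toy probabilities (made-up numbers): the models disagree on both samples.
p_first = [[0.9, 0.1], [0.5, 0.5]]
p_second = [[0.4, 0.6], [0.1, 0.9]]

preds, confs, weights = confidence_weighted_ensemble(p_first, p_second)
print(preds)
print(confs)
print(weights)
```

Unlike a fixed weighted average, the weights change from row to row: the first model dominates where it is sure of itself, and the second takes over on the sample where the first is undecided.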

📊 Outputs

  • Individual model metrics (accuracy, F1, precision, recall)
  • Confusion matrix comparisons
  • Top-k model combination heatmaps
  • CSV-style DataFrame of results
  • Summary reports comparing standard vs adaptive ensembles
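
As an illustration of how the per-model metrics could be collected into a results DataFrame, here is a short sketch built on `sklearn.metrics`. The function name `summarize_predictions`, the macro averaging, and the toy labels are assumptions; the repository's `EvaluationMetrics` class may compute or average these differently.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def summarize_predictions(y_true, predictions, average="macro"):
    """Return one row of accuracy/precision/recall/F1 per named prediction vector."""
    rows = []
    for name, y_pred in predictions.items():
        rows.append(
            {
                "model": name,
                "accuracy": accuracy_score(y_true, y_pred),
                "precision": precision_score(y_true, y_pred, average=average, zero_division=0),
                "recall": recall_score(y_true, y_pred, average=average, zero_division=0),
                "f1": f1_score(y_true, y_pred, average=average, zero_division=0),
            }
        )
    # Best F1 first, so the top rows are the strongest models or ensembles.
    return pd.DataFrame(rows).sort_values("f1", ascending=False).reset_index(drop=True)


# Toy labels and predictions (made-up values).
y_true = [0, 0, 1, 1]
predictions = {
    "ensemble_b": [0, 1, 1, 1],
    "ensemble_a": [0, 0, 1, 1],
}
summary = summarize_predictions(y_true, predictions)
print(summary)
```

Macro averaging treats every class equally, which tends to be the stricter choice on imbalanced intrusion data; switching `average` to `"weighted"` would let the majority class dominate the scores.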

🏷️ License

Academic & research use only. Please cite CICIDS2017 and CICIDS2018.


✍️ Author

This project was built for security researchers working on real-time Intrusion Detection using ensemble learning techniques.

