An unsupervised, image-inspired PCA approach to distinguish benign vs. multiple attack types in NetFlow data.
Project Overview
Traditional signature-based detection often misses novel or subtle threats. Inspired by the “eigenfaces” technique from facial recognition, we reshape each 77-feature NetFlow record into a 300×300 grayscale array and train group-specific PCA models (“eigenprofiles”) for benign traffic and four attack families: credential abuse, denial-of-service, exploit/malware, and application-layer abuse. By measuring the reconstruction error (L2 norm) of a flow against each profile, we can both flag anomalies and infer the attack type without needing labels at inference time.
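The reconstruction-error idea can be sketched in a few lines. This is a minimal illustration rather than the project's actual code: it uses NumPy's SVD instead of any particular PCA library, runs on synthetic data, and skips the 300×300 image encoding by working on flat 77-feature vectors. The function names, the component count, and the 99th-percentile threshold are all assumptions made for the example.

```python
import numpy as np

def fit_eigenprofile(X, n_components):
    """Fit a PCA basis to the rows of X (one flattened flow per row)."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(X, profile):
    """L2 norm of what is left after projecting each row onto the profile."""
    mean, components = profile
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return np.linalg.norm(centered - reconstructed, axis=1)

# Synthetic stand-in data (assumption): "benign" flows lie near a
# 5-dimensional subspace of the 77-feature space; "other" flows do not.
rng = np.random.default_rng(0)
basis = rng.normal(size=(5, 77))
benign = rng.normal(size=(200, 5)) @ basis + 0.01 * rng.normal(size=(200, 77))
other = rng.normal(size=(200, 77))

benign_profile = fit_eigenprofile(benign, n_components=5)
err_benign = reconstruction_error(benign, benign_profile)
err_other = reconstruction_error(other, benign_profile)

# Hypothetical anomaly rule: flag anything above the 99th percentile
# of the benign reconstruction errors.
threshold = np.percentile(err_benign, 99)
flags = err_other > threshold
print(f"benign mean error: {err_benign.mean():.3f}, other mean error: {err_other.mean():.3f}")
```

Flows that follow the structure captured by a profile reconstruct with a small residual, while flows that do not leave a large one. That gap is what the threshold exploits.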
Key Highlights
- Eigenprofile Modeling: Converts flow features into image-like inputs for PCA, enabling interpretable basis vectors.
- Unsupervised Detection: Differentiates benign from attack traffic solely via reconstruction error; no labels are needed at inference time.
- Attack-Type Profiling: Four group-specific PCA models each best reconstruct their own attack class.
- Clear Separation: Reconstruction error distributions show strong separation between benign and every attack family.
- Scalable & Lightweight: For a fixed feature dimension and component count, PCA fitting and inference scale linearly with the number of flows, making the approach suitable for high-volume SOC pipelines.
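Attack-type profiling can be illustrated by fitting one profile per group and assigning each flow to the profile that reconstructs it with the smallest error. The sketch below is an assumption-laden example, not the notebook's implementation: the group identifiers, the helper names, the component count, and the synthetic data (each group placed near its own 4-dimensional subspace) are all hypothetical, and the flows are kept as flat 77-feature vectors.

```python
import numpy as np

GROUPS = ["benign", "credential_abuse", "denial_of_service",
          "exploit_malware", "application_layer_abuse"]

def fit_profile(X, n_components):
    """PCA via SVD: return the group mean and its top principal directions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def error_matrix(X, profiles):
    """L2 reconstruction error of every row of X against every profile."""
    columns = []
    for mean, components in profiles.values():
        centered = X - mean
        residual = centered - centered @ components.T @ components
        columns.append(np.linalg.norm(residual, axis=1))
    return np.column_stack(columns)  # shape: (n_samples, n_profiles)

def infer_group(X, profiles):
    """Assign each row to the profile that reconstructs it with least error."""
    names = np.array(list(profiles))
    return names[error_matrix(X, profiles).argmin(axis=1)]

# Synthetic stand-in data (assumption): every group gets its own random
# 4-dimensional subspace and offset in the 77-feature space.
rng = np.random.default_rng(0)

def make_sampler():
    basis, offset = rng.normal(size=(4, 77)), rng.normal(size=77)
    return lambda n: (rng.normal(size=(n, 4)) @ basis + offset
                      + 0.01 * rng.normal(size=(n, 77)))

samplers = {name: make_sampler() for name in GROUPS}
profiles = {name: fit_profile(samplers[name](100), n_components=4)
            for name in GROUPS}

# Held-out samples: 20 per group, in the same order as GROUPS.
test_X = np.vstack([samplers[name](20) for name in GROUPS])
test_y = np.repeat(GROUPS, 20)
predicted = infer_group(test_X, profiles)
accuracy = (predicted == test_y).mean()
print(f"accuracy on synthetic data: {accuracy:.2f}")
```

The synthetic groups are separable by construction, so this only demonstrates the mechanics of minimum-error assignment. It says nothing about how well real NetFlow groups separate.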
Pipeline Overview
Example Visualizations
Selected outputs from the analysis:
Application-Layer Attack Samples


Original vs. PCA Reconstruction

Reconstruction Error by Group

Notebook
Complete Jupyter notebook demonstrating data prep, PCA modeling, and error-based classification.
View Notebook →
Full Report
Complete writeup including problem statement, detailed methodology and results.
Read the full report here →
Key Takeaways
- PCA-based eigenprofiling robustly separates benign and multiple attack classes via error metrics.
- Image-style encoding of flow data unlocks proven computer-vision tools for network security.
- Unsupervised approach reduces reliance on labeled signatures and improves detection of novel threats.