Detector Module
1. Introduction
The Detector Module (detector.py) module is an essential component of the Network Traffic Analysis Toolkit, designed to analyze and flag potentially malicious network traffic.
By leveraging heuristic-based analysis, the detector identifies:
- Unusual port activity (e.g., SSH brute-force attempts, unauthorized SMB access).
- Large packet sizes, indicative of data exfiltration or denial-of-service (DoS) attacks.
- Suspicious payload content, detecting common attacker commands (e.g., wget, powershell, /bin/bash).
- Traffic direction classification, distinguishing inbound, outbound, internal, and external traffic patterns.
This module plays a crucial role in automated cybersecurity monitoring, aiding security analysts in identifying and mitigating threats efficiently.
2. Design Considerations and Rationale
2.1. Why Heuristic-Based Detection?
Instead of relying on signature-based detection (like traditional antivirus software), this module applies heuristics to detect anomalies. The rationale behind this approach includes:
- Flexibility: Can detect unknown threats without predefined signatures.
- Low Overhead: Runs efficiently on large datasets without requiring complex machine learning models.
- Actionable Insights: Clearly identifies why a packet is suspicious.
2.2. Key Detection Criteria
| Feature | Reason for Inclusion |
|---|---|
| Port-based filtering | Identifies traffic on commonly abused ports (e.g., SSH-22, RDP-3389, SMB-445). |
| Packet size analysis | Large packets may indicate data exfiltration or DoS attacks. |
| Payload inspection | Detects attack commands embedded in network traffic. |
| Traffic direction analysis | Determines if traffic is inbound, outbound, internal, or external, aiding investigation. |
2.3. Design Choices
| Decision | Reasoning |
|---|---|
| Removed source_ip & dest_ip | The dataset already included source and destination, making these redundant. |
| Used on_bad_lines=”skip” when reading CSV | Prevents parsing failures on malformed data. |
| Converted NaN values in payload to empty strings | Ensures payload detection does not fail on missing values. |
3. Implementation Details
The implementation consists of several key functions, each handling a different aspect of traffic analysis. Below are some of the major components:
3.1. Port-Based Threat Detection
A predefined set of commonly targeted ports is used to detect suspicious traffic:
1
2
3
4
SUSPICIOUS_PORTS = {21, 22, 23, 53, 80, 443, 445, 1433, 1521, 3306, 3389}
df.loc[df["sport"].isin(map(str, SUSPICIOUS_PORTS)), "suspicious_reason"] += "Suspicious source port; "
df.loc[df["dport"].isin(map(str, SUSPICIOUS_PORTS)), "suspicious_reason"] += "Suspicious destination port; "
3.2. Large Packet Detection
Packets exceeding a predefined size threshold are flagged as potentially suspicious:
1
2
LARGE_PACKET_SIZE = 1500
df.loc[df["length"] > LARGE_PACKET_SIZE, "suspicious_reason"] += "Large packet size detected; "
3.3. Payload Inspection
The module searches for known malicious commands in packet payloads, detecting potential remote code execution attempts:
1
2
3
SUSPICIOUS_KEYWORDS = ["cmd.exe", "powershell", "wget", "curl", "/bin/sh", "/bin/bash"]
df["suspicious_payload"] = df["payload"].fillna("").apply(contains_suspicious_payload)
df.loc[df["suspicious_payload"], "suspicious_reason"] += "Suspicious payload detected; "
3.4. Traffic Direction Classification
To better understand attack patterns, the module classifies network traffic as inbound, outbound, internal, or external:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
def classify_traffic_direction(row):
src = row["source"]
dst = row["destination"]
src_internal = is_internal_ip(src)
dst_internal = is_internal_ip(dst)
if src_internal and not dst_internal:
return "outbound"
elif not src_internal and dst_internal:
return "inbound"
elif src_internal and dst_internal:
return "internal"
else:
return "external"
df["traffic_direction"] = df.apply(classify_traffic_direction, axis=1)
4. Debugging and Challenges
4.1. Malformed CSV Parsing Error:
4.1 Malformed CSV Parsing Errors
Issue: Some CSV files contained corrupt or malformed data, causing Pandas to throw a ParserError.
Fix: Skipped problematic lines:
1
df = pd.read_csv(file, engine="python", on_bad_lines="skip")
4.2. Missing Payload Column
Issue: Some datasets did not include a payload column, causing the payload analysis function to fail.
Fix: Ensured the column exists before processing:
1
2
if "payload" not in df.columns:
df["payload"] = ""
4.3 Handling NaN Values in Payload Detection
Issue: Applying the contains_suspicious_payload() function on missing values caused filtering issues.
Fix: Converted NaN values to empty strings before analysis:
1
df["suspicious_payload"] = df["payload"].fillna("").apply(contains_suspicious_payload)
4.4. Traffic Direction Analysis Fails on Invalid IPs
Issue: Some records contained invalid or missing IPs, breaking the is_internal_ip() function.
Fix: Wrapped the function in a try-except block:
1
2
3
4
5
try:
ip_obj = ipaddress.ip_address(ip)
return any(ip_obj in net for net in INTERNAL_NETWORKS)
except ValueError:
return False
5. Deployment Considerations
5.1. Instllation Requirements
The following dependencies must be installed:
1
pip install pandas ipaddress
5.2. Running the Detector
1
python3 detector.py <input_csv> <output_folder>
5.3. Expected Output
The detector produces:
- CSV file (malicious_traffic.csv) – Lists all suspicious packets.
- Summary report (malicious_summary.txt) – Provides a high-level overview of detected threats.
Example CSV file output
| Time | Source | Destination | Suspicious Reason | Traffic Direction |
|---|---|---|---|---|
| 10:01:05 | 192.168.1.2 | 8.8.8.8 | Large packet size detected | Outbound |
| 10:02:10 | 192.168.1.2 | 192.168.1.1 | Suspicious payload detected | Internal |
6. Conclusion and next steps
6.1. Key Takeaways
The detector.py module successfully implements multiple heuristic techniques for identifying malicious network activity. Key benefits include:
- Detects multiple threat indicators (suspicious ports, large packets, payload anomalies).
- Traffic direction analysis provides additional context for investigations.
- Automated processing of large datasets with minimal overhead.
6.2. Future Enhancements
- Integrate real-time network monitoring to analyze live traffic instead of just PCAP files.
- Enhance payload analysis using machine learning to detect previously unseen attack patterns.
- Add external threat intelligence feeds to enrich analysis results.
- Create a bash script (
setup_detector.sh) for easy setup
Appendix: Full Python Script
For full implementation details, visit the GitHub Repository