Detector Module

1. Introduction

The Detector Module (detector.py) is an essential component of the Network Traffic Analysis Toolkit, designed to analyze and flag potentially malicious network traffic.

Using heuristic-based analysis, the detector checks each packet for:

  • Unusual port activity (e.g., SSH brute-force attempts, unauthorized SMB access).
  • Large packet sizes, which can indicate data exfiltration or denial-of-service (DoS) attacks.
  • Suspicious payload content, such as common attacker commands (e.g., wget, powershell, /bin/bash).
  • Traffic direction, classifying each packet as inbound, outbound, internal, or external.

The module supports automated security monitoring by giving analysts a pre-filtered list of packets, each annotated with the reason it was flagged and its traffic direction.

2. Design Considerations and Rationale

2.1. Why Heuristic-Based Detection?

Instead of relying on signature-based detection (like traditional antivirus software), this module applies heuristics to detect anomalies. The rationale behind this approach includes:

  • Flexibility: Can detect unknown threats without predefined signatures.
  • Low Overhead: Runs efficiently on large datasets without requiring complex machine learning models.
  • Actionable Insights: Clearly identifies why a packet is suspicious.

2.2. Key Detection Criteria

| Feature | Reason for Inclusion |
|---|---|
| Port-based filtering | Identifies traffic on commonly abused ports (e.g., SSH-22, RDP-3389, SMB-445). |
| Packet size analysis | Large packets may indicate data exfiltration or DoS attacks. |
| Payload inspection | Detects attack commands embedded in network traffic. |
| Traffic direction analysis | Determines if traffic is inbound, outbound, internal, or external, aiding investigation. |

2.3. Design Choices

| Decision | Reasoning |
|---|---|
| Removed source_ip & dest_ip | The dataset already included source and destination, making these redundant. |
| Used on_bad_lines="skip" when reading CSV | Prevents parsing failures on malformed data. |
| Converted NaN values in payload to empty strings | Ensures payload detection does not fail on missing values. |

3. Implementation Details

The implementation consists of several key functions, each handling a different aspect of traffic analysis. Below are some of the major components:

3.1. Port-Based Threat Detection

A predefined set of commonly targeted ports is used to detect suspicious traffic:

SUSPICIOUS_PORTS = {21, 22, 23, 53, 80, 443, 445, 1433, 1521, 3306, 3389}

df.loc[df["sport"].isin(map(str, SUSPICIOUS_PORTS)), "suspicious_reason"] += "Suspicious source port; "
df.loc[df["dport"].isin(map(str, SUSPICIOUS_PORTS)), "suspicious_reason"] += "Suspicious destination port; "
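
The snippet above relies on a suspicious_reason column that already holds strings and on port columns stored as text. A minimal, self-contained sketch of the same check on a toy DataFrame follows; the flag_suspicious_ports helper and the sample rows are illustrative assumptions, not code from detector.py.

```python
import pandas as pd

SUSPICIOUS_PORTS = {21, 22, 23, 53, 80, 443, 445, 1433, 1521, 3306, 3389}


def flag_suspicious_ports(df: pd.DataFrame) -> pd.DataFrame:
    """Append a reason to rows whose source or destination port is on the watch list."""
    df = df.copy()  # leave the caller's DataFrame untouched
    if "suspicious_reason" not in df.columns:
        df["suspicious_reason"] = ""  # start empty so += can append reasons
    for column, label in (("sport", "source"), ("dport", "destination")):
        # Coerce to numbers so "22", 22 and 22.0 all match; unparsable values become NaN.
        ports = pd.to_numeric(df[column], errors="coerce")
        df.loc[ports.isin(SUSPICIOUS_PORTS), "suspicious_reason"] += f"Suspicious {label} port; "
    return df


# Hypothetical sample: row 0 targets SMB (445), row 1 originates from SSH (22).
sample = pd.DataFrame({"sport": ["51512", "22"], "dport": ["445", "8080"]})
flagged = flag_suspicious_ports(sample)
print(flagged["suspicious_reason"].tolist())
```

Coercing with pd.to_numeric is a design choice of this sketch: it tolerates ports read as integers, floats, or strings, whereas a string comparison only matches when the column actually holds text such as "22".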

3.2. Large Packet Detection

Packets exceeding a predefined size threshold are flagged as potentially suspicious:

LARGE_PACKET_SIZE = 1500
df.loc[df["length"] > LARGE_PACKET_SIZE, "suspicious_reason"] += "Large packet size detected; "

3.3. Payload Inspection

The module searches for known malicious commands in packet payloads, detecting potential remote code execution attempts:

SUSPICIOUS_KEYWORDS = ["cmd.exe", "powershell", "wget", "curl", "/bin/sh", "/bin/bash"]
df["suspicious_payload"] = df["payload"].fillna("").apply(contains_suspicious_payload)
df.loc[df["suspicious_payload"], "suspicious_reason"] += "Suspicious payload detected; "
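
The contains_suspicious_payload helper is not shown in this post. One plausible shape for it, offered as an assumption rather than the module's actual code, is a case-insensitive substring check against SUSPICIOUS_KEYWORDS:

```python
SUSPICIOUS_KEYWORDS = ["cmd.exe", "powershell", "wget", "curl", "/bin/sh", "/bin/bash"]


def contains_suspicious_payload(payload) -> bool:
    """Return True if any keyword appears in the payload (case-insensitive).

    Hypothetical implementation: the real helper in detector.py may differ.
    """
    if not isinstance(payload, str):
        return False  # NaN, None and other non-text values carry no commands
    text = payload.lower()  # keywords are lowercase, so lowercase the payload once
    return any(keyword in text for keyword in SUSPICIOUS_KEYWORDS)


print(contains_suspicious_payload("GET /x; WGET http://example.test/a.sh"))  # True
print(contains_suspicious_payload("hello world"))  # False
print(contains_suspicious_payload(float("nan")))  # False
```

Plain substring matching is cheap but coarse: a payload containing "curly" would also match "curl", so a check like this is best treated as a hint for an analyst and not as a verdict.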

3.4. Traffic Direction Classification

To better understand attack patterns, the module classifies network traffic as inbound, outbound, internal, or external:

def classify_traffic_direction(row):
    src = row["source"]
    dst = row["destination"]
    src_internal = is_internal_ip(src)
    dst_internal = is_internal_ip(dst)

    if src_internal and not dst_internal:
        return "outbound"
    elif not src_internal and dst_internal:
        return "inbound"
    elif src_internal and dst_internal:
        return "internal"
    else:
        return "external"

df["traffic_direction"] = df.apply(classify_traffic_direction, axis=1)

4. Debugging and Challenges

4.1. Malformed CSV Parsing Errors

Issue: Some CSV files contained corrupt or malformed data, causing Pandas to throw a ParserError.

Fix: Skipped problematic lines:

df = pd.read_csv(file, engine="python", on_bad_lines="skip")
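
To illustrate what skipping does, the sketch below feeds read_csv an in-memory CSV in which one row has an extra field. The column names and values are made up for the example.

```python
import io

import pandas as pd

# Three-column CSV in which the second data row has an unexpected fourth field.
raw = io.StringIO(
    "source,destination,length\n"
    "192.168.1.2,8.8.8.8,60\n"
    "192.168.1.3,8.8.4.4,60,unexpected\n"
    "192.168.1.4,1.1.1.1,1514\n"
)

# The malformed row is dropped instead of raising pandas.errors.ParserError.
df = pd.read_csv(raw, engine="python", on_bad_lines="skip")
print(len(df))
print(df["source"].tolist())
```

Skipping is silent, so dropped rows never reach the detector. Passing on_bad_lines="warn" instead keeps the same behavior but reports each skipped line, which can help when judging how much data a capture lost.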

4.2. Missing Payload Column

Issue: Some datasets did not include a payload column, causing the payload analysis function to fail.

Fix: Ensured the column exists before processing:

if "payload" not in df.columns:
    df["payload"] = ""

4.3. Handling NaN Values in Payload Detection

Issue: Applying the contains_suspicious_payload() function on missing values caused filtering issues.

Fix: Converted NaN values to empty strings before analysis:

df["suspicious_payload"] = df["payload"].fillna("").apply(contains_suspicious_payload)

4.4. Traffic Direction Analysis Fails on Invalid IPs

Issue: Some records contained invalid or missing IPs, breaking the is_internal_ip() function.

Fix: Wrapped the function in a try-except block:

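import ipaddress

# INTERNAL_NETWORKS (not shown here) holds the ipaddress networks treated as internal.
def is_internal_ip(ip):
    # Invalid or missing addresses raise ValueError and are treated as not internal.
    try:
        ip_obj = ipaddress.ip_address(ip)
        return any(ip_obj in net for net in INTERNAL_NETWORKS)
    except ValueError:
        return False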
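
The post does not show how INTERNAL_NETWORKS is defined. The sketch below assumes it contains the RFC 1918 private ranges plus IPv4 loopback, and shows how the guarded function behaves on valid, invalid, and missing values.

```python
import ipaddress

# Assumed definition: RFC 1918 private ranges plus IPv4 loopback.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("127.0.0.0/8"),
]


def is_internal_ip(ip) -> bool:
    """Return True for addresses inside INTERNAL_NETWORKS, False for anything else."""
    try:
        ip_obj = ipaddress.ip_address(ip)
    except ValueError:
        return False  # empty strings, NaN, hostnames and other non-addresses
    return any(ip_obj in net for net in INTERNAL_NETWORKS)


for value in ["192.168.1.2", "8.8.8.8", "", "not-an-ip", None]:
    print(repr(value), is_internal_ip(value))
```

Because invalid addresses come back as False, classify_traffic_direction treats them like external hosts: a record with an unparsable source and an internal destination is labeled "inbound", and a record with two unparsable addresses is labeled "external". That keeps the pipeline running, at the cost of hiding bad records among genuine external traffic.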

5. Deployment Considerations

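5.1. Installation Requirements

pandas (version 1.3 or later, which introduced on_bad_lines) must be installed. ipaddress is part of the Python 3 standard library, so it does not need a separate install:

pip install pandas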

5.2. Running the Detector

python3 detector.py <input_csv> <output_folder>

5.3. Expected Output

The detector produces:

  • CSV file (malicious_traffic.csv) – Lists all suspicious packets.
  • Summary report (malicious_summary.txt) – Provides a high-level overview of detected threats.
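
The post does not show how these two files are written. As a rough sketch only, the step could look like the code below; the write_outputs helper, the summary layout, and the sample rows are all assumptions, and only the two file names come from the list above.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_outputs(df: pd.DataFrame, output_folder: str) -> pd.DataFrame:
    """Write flagged rows and a short text summary; return the flagged rows."""
    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)

    # A row counts as suspicious once at least one reason has been appended.
    flagged = df[df["suspicious_reason"].str.len() > 0]
    flagged.to_csv(out_dir / "malicious_traffic.csv", index=False)

    # Hypothetical summary layout: totals, then a count per traffic direction.
    lines = [f"Total packets: {len(df)}", f"Suspicious packets: {len(flagged)}"]
    for direction, count in flagged["traffic_direction"].value_counts().items():
        lines.append(f"  {direction}: {count}")
    (out_dir / "malicious_summary.txt").write_text("\n".join(lines) + "\n")
    return flagged


sample = pd.DataFrame({
    "source": ["192.168.1.2", "192.168.1.3"],
    "destination": ["8.8.8.8", "192.168.1.1"],
    "suspicious_reason": ["Large packet size detected; ", ""],
    "traffic_direction": ["outbound", "internal"],
})
with tempfile.TemporaryDirectory() as folder:
    flagged = write_outputs(sample, folder)
    summary = (Path(folder) / "malicious_summary.txt").read_text()
print(summary)
```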

Example CSV file output

| Time | Source | Destination | Suspicious Reason | Traffic Direction |
|---|---|---|---|---|
| 10:01:05 | 192.168.1.2 | 8.8.8.8 | Large packet size detected | Outbound |
| 10:02:10 | 192.168.1.2 | 192.168.1.1 | Suspicious payload detected | Internal |

6. Conclusion and Next Steps

6.1. Key Takeaways

The detector.py module successfully implements multiple heuristic techniques for identifying malicious network activity. Key benefits include:

  • Detects multiple threat indicators (suspicious ports, large packets, payload anomalies).
  • Traffic direction analysis provides additional context for investigations.
  • Automated processing of large datasets with minimal overhead.

6.2. Future Enhancements

  • Integrate real-time network monitoring to analyze live traffic instead of just PCAP files.
  • Enhance payload analysis using machine learning to detect previously unseen attack patterns.
  • Add external threat intelligence feeds to enrich analysis results.
  • Create a bash script (setup_detector.sh) for easy setup.

Appendix: Full Python Script

For full implementation details, visit the GitHub Repository

This post is licensed under CC BY 4.0 by the author.