How I Built My Own Custom Antivirus Using AI - Sentinel-AI

Important Educational Disclaimer

This project is purely educational and experimental. Building a production-ready antivirus system using AI is still far too complex and requires deep expertise in cybersecurity, kernel-level programming, malware analysis, and threat intelligence. Commercial antivirus solutions are developed by teams of hundreds of security experts over many years, with access to massive threat intelligence databases, advanced machine learning models, and kernel-level drivers.

This project demonstrates the potential of AI-assisted development for rapid prototyping and learning, but it should never be used as a replacement for professional security solutions. The limitations discussed in this article highlight why AI alone cannot yet replace the sophisticated multi-layered defense systems used by commercial security vendors.

Summary / Introduction

In an era where cyber threats evolve in minutes, I decided to explore a fascinating question: Can we use LLM-powered IDEs to build bespoke security tools? Using Cursor, I developed a functional, lightweight, and custom-tailored Antivirus (EDR prototype) for my machine.

This project wasn't just about "generating code"—it was about architecting a multi-layered defense system using AI as a high-speed co-pilot. Here is the breakdown of how I built it, the prompts I used, and what I learned about the limits of AI in cybersecurity.

Educational Value: While this prototype demonstrates the power of AI-assisted development, it's crucial to understand that creating a truly production-ready antivirus requires expertise far beyond what AI can currently provide. This project serves as an excellent learning tool to understand the fundamental concepts of endpoint protection, but it should be viewed as an educational exercise rather than a practical security solution.

The Architecture

The system, which I've named "Sentinel-AI", is built in Python and operates on three fundamental pillars of endpoint protection:

Real-time File System Monitoring

The watchdog library subscribes to the operating system's file system events. This allows the system to monitor changes in real time, detecting when files are created, modified, or deleted in the monitored directories.

Signature-based Detection

A hashing engine that compares new files against a database of known threats (SHA-256). This is the most basic form of malware detection, similar to how traditional antivirus software identifies known malicious files.

Behavioral Heuristics

A process monitor powered by psutil that flags suspicious CPU spikes and unauthorized execution patterns. This helps detect potentially malicious behavior even when file signatures aren't recognized.

The "Prompt Engineering" Phase

I didn't ask the AI for a "complete antivirus" in one go. I used a Modular Prompting strategy to ensure code quality and avoid "hallucinated" security holes. This approach is crucial when building security-critical applications, as AI can sometimes generate code that appears correct but contains subtle vulnerabilities.

The Master Prompt

Why Modular Prompting? Breaking down complex security requirements into smaller, focused prompts helps ensure that each component is properly implemented and tested. This is especially important in security applications where a single vulnerability could compromise the entire system.

Subject: Development of Local Security Monitoring System Prototype (Basic EDR)

I want to develop an educational prototype of a protection system for my Mac machine. 
The project must be written in Python and must consist of the following modules:

1. File Watcher:
   - Use the watchdog library to monitor in real-time the 'Downloads' folder and Desktop
   - If a new executable file is created, the system must intercept it immediately
   - Support monitoring of multiple directories simultaneously
   - Handle file creation, modification, and deletion events

2. Integrity Check:
   - For each new file detected, calculate the SHA-256 hash
   - Compare the hash with a local JSON file called malware_db.json 
     (which will function as our blacklist of signatures)
   - Implement efficient hash lookup using Set or Dictionary data structures
   - Support both file-based and in-memory hash database for performance

3. Process Monitor:
   - Implement periodic monitoring (using psutil) that lists active processes
   - Signal if a process is consuming more than 90% CPU for more than 10 seconds, 
     suggesting potential anomalous behavior
   - Track process creation and termination events
   - Monitor memory usage and network connections for suspicious patterns
   - Generate alerts for processes that spawn multiple child processes rapidly

4. Logging & Alert:
   - Write every suspicious event to a security_log.txt file
   - Include timestamp, event type, file path/process name, and severity level
   - Show system notifications (use plyer or similar library)
   - Support different alert levels: INFO, WARNING, CRITICAL
   - Implement log rotation to prevent file size issues

Technical Requirements:

1. Generate modular and commented code:
   - Separate each module into its own class/file
   - Include comprehensive docstrings for all functions
   - Add type hints for better code clarity
   - Implement proper error handling throughout

2. Add a function to my project that connects to MalwareBazaar (abuse.ch) APIs 
   or downloads their CSV/text list of the most recent SHA-256 hashes:
   - The system must download the list, extract only the hashes
   - Save them in my malware_db.json file
   - Include a check to avoid duplicates
   - Implement automatic periodic updates (e.g., daily)
   - Handle API rate limiting and connection errors gracefully
   - Support both API and CSV/text file formats

3. Data Structure Optimization:
   - For the antivirus to be fast, do not save hashes as a simple list, 
     but as a Set or Dictionary
   - Here's how the JSON file should appear to be efficient (example):

{
    "last_update": "2023-10-27",
    "version": "1.0",
    "hash_count": 2,
    "hashes": [
        "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8",
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
    ]
}

   - Use Set for O(1) hash lookup performance
   - Implement hash validation (must be 64-character hexadecimal strings)
   - Support incremental updates without full database reload

4. Exception Handling:
   - Include exception handling to prevent the program from crashing 
     if it doesn't have permissions to access certain system files
   - Handle file system errors, network errors, and permission errors gracefully
   - Implement retry logic for transient failures
   - Log all exceptions with full stack traces for debugging

5. Additional Features:
   - Implement a configuration file (config.json) for customizable settings
   - Add command-line arguments for different operation modes
   - Support daemon/service mode for background operation
   - Include a simple CLI interface for status checking and manual scans
   - Implement graceful shutdown handling (SIGTERM/SIGINT)

6. Installation & Execution:
   - Explain briefly how to install dependencies (requirements.txt)
   - Explain how to start the monitor in administrator mode
   - Include setup instructions for macOS permissions (Full Disk Access, etc.)
   - Provide example usage scenarios and test cases

Security Considerations:
- Do not run with root/admin privileges unless absolutely necessary
- Validate all file paths to prevent directory traversal attacks
- Sanitize all log entries to prevent log injection
- Use secure file permissions for sensitive files (malware_db.json, security_log.txt)
- Implement rate limiting for API calls to avoid abuse

Note: This expanded prompt demonstrates the level of detail required when using AI to build security-critical applications. Each requirement must be explicit, and edge cases must be considered to avoid vulnerabilities.
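
To make the "Data Structure Optimization" requirement concrete, here is a minimal sketch of loading the database. JSON has no set type, so the file stores the hashes as an array and the program converts them to a Python set once at startup. The function name load_hash_db is hypothetical, and the field names follow the example layout in the prompt.

```python
import json
import re
from pathlib import Path
from typing import Set

# A SHA-256 digest written in hex is exactly 64 hexadecimal characters.
SHA256_RE = re.compile(r"[0-9a-fA-F]{64}")


def load_hash_db(db_path: str) -> Set[str]:
    """Load malware_db.json and return its valid hashes as a lowercase set."""
    data = json.loads(Path(db_path).read_text(encoding="utf-8"))
    hashes: Set[str] = set()
    for value in data.get("hashes", []):
        # Skip anything that is not a well-formed SHA-256 hex string.
        if isinstance(value, str) and SHA256_RE.fullmatch(value):
            hashes.add(value.lower())
    return hashes
```

Membership tests against the returned set (some_hash in hashes) take constant time on average, which is what the prompt asks for.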

The Threat Intelligence Prompt

To make the tool "real," I prompted Cursor to build a Threat Feed Integrator:

"Write a Python script that connects to the MalwareBazaar (Abuse.ch) API. It should pull the latest 100 SHA-256 hashes of confirmed malware and update my local malware_db.json automatically while avoiding duplicates."

Note on Threat Intelligence: While this approach works for educational purposes, production antivirus systems use much more sophisticated threat intelligence feeds that include behavioral indicators, network signatures, and machine learning models trained on millions of samples. The 100-hash database used here is minuscule compared to the millions of known threats that commercial solutions track.
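
The duplicate-avoiding update step could look roughly like the sketch below. It deliberately leaves out the network call: how the hashes are fetched from MalwareBazaar (endpoint, parameters, authentication) should be taken from the abuse.ch documentation. The function name merge_hashes is hypothetical, and the JSON fields mirror the example layout from the master prompt.

```python
import json
import re
from datetime import date
from pathlib import Path
from typing import Iterable

SHA256_RE = re.compile(r"[0-9a-fA-F]{64}")


def merge_hashes(db_path: str, new_hashes: Iterable[str]) -> int:
    """Merge new_hashes into the JSON database and return how many were added."""
    path = Path(db_path)
    if path.exists():
        db = json.loads(path.read_text(encoding="utf-8"))
    else:
        # Start a fresh database if none exists yet.
        db = {"version": "1.0", "hashes": []}

    known = {h.lower() for h in db.get("hashes", [])}
    before = len(known)
    for candidate in new_hashes:
        candidate = candidate.strip().lower()
        # Only accept well-formed SHA-256 hex strings; the set drops duplicates.
        if SHA256_RE.fullmatch(candidate):
            known.add(candidate)

    db["hashes"] = sorted(known)
    db["hash_count"] = len(known)
    db["last_update"] = date.today().isoformat()
    path.write_text(json.dumps(db, indent=4), encoding="utf-8")
    return len(known) - before
```

Calling merge_hashes twice with the same feed adds nothing the second time, so the update job can be re-run safely.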

Technical Implementation Details

Hashing Engine

I used Python's hashlib module to generate SHA-256 signatures. I chose SHA-256 over MD5 because practical collision attacks against MD5 are well known, while SHA-256 remains collision-resistant and is the industry standard for modern IoCs (Indicators of Compromise).

Limitation: While SHA-256 is cryptographically secure, signature-based detection has fundamental weaknesses. Polymorphic malware can change its hash with minimal code modifications, rendering signature databases ineffective against advanced threats.
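
As an illustration of the hashing step, the sketch below reads a file in chunks so that large downloads are never loaded into memory at once. The function names and the chunk size are assumptions made for this example, not the exact code Cursor generated.

```python
import hashlib
from typing import Set


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_known_malware(path: str, known_hashes: Set[str]) -> bool:
    """Check a file's digest against an in-memory set of lowercase hashes."""
    return sha256_of_file(path) in known_hashes
```

hexdigest() returns lowercase hex, so the signature set should store lowercase strings as well.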

The Observer Pattern

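The watchdog.observers module lets the program block until the operating system reports a file system event (through native notification APIs such as FSEvents on macOS or inotify on Linux) instead of polling directories in a loop, which keeps it light on system resources.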

Architecture Note: This user-space monitoring approach is efficient for learning, but production security solutions often require kernel-level drivers to intercept system calls before malicious code can execute. Without kernel-level access, sophisticated malware can bypass user-space monitors.
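
A minimal sketch of the watcher is shown below, assuming the watchdog package is installed. The extension list, the looks_executable helper, and the start_monitor function are hypothetical names chosen for illustration. The watchdog imports sit inside start_monitor only so that the helper can be tested on its own.

```python
import os
from typing import Iterable

# Hypothetical list of extensions treated as "executable" in this sketch.
EXECUTABLE_SUFFIXES = (".sh", ".command", ".pkg", ".dmg", ".app", ".exe")


def looks_executable(path: str) -> bool:
    """Return True if the name or the permission bits suggest an executable."""
    if path.lower().endswith(EXECUTABLE_SUFFIXES):
        return True
    # os.access reports whether the current user may execute the file.
    return os.path.isfile(path) and os.access(path, os.X_OK)


def start_monitor(directories: Iterable[str]):
    """Start a watchdog observer on each directory and return it."""
    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    class NewFileHandler(FileSystemEventHandler):
        def on_created(self, event):
            # Ignore new folders; only report files that look executable.
            if not event.is_directory and looks_executable(event.src_path):
                print(f"[WARNING] new executable: {event.src_path}")

    observer = Observer()
    handler = NewFileHandler()
    for directory in directories:
        observer.schedule(handler, directory, recursive=False)
    observer.start()  # events are delivered on a background thread
    return observer
```

The caller would pass the expanded paths of the Downloads and Desktop folders, keep the main thread alive, and call observer.stop() followed by observer.join() on shutdown.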

Process Telemetry

The system doesn't just look at files; it looks at telemetry. If a process exhibits "Ransomware-like" behavior (high CPU + rapid file modifications), the system triggers an immediate alert.

Behavioral Analysis: While this heuristic-based approach can catch some threats, modern malware is designed to evade such detection by operating slowly, using legitimate system processes, or encrypting files in ways that mimic normal system activity. Advanced behavioral analysis requires machine learning models trained on vast datasets of both malicious and benign behavior.
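
The "sustained high CPU" rule can be sketched as a small tracker that is independent of how samples are collected. The class name and default thresholds are assumptions based on the 90% for 10 seconds rule in the master prompt. In the real monitor the samples would come from psutil, for example by iterating psutil.process_iter(["pid", "cpu_percent"]) at a fixed interval.

```python
import time
from typing import Dict, Optional


class SustainedCpuTracker:
    """Flag processes that stay above a CPU threshold for a minimum duration."""

    def __init__(self, threshold: float = 90.0, min_seconds: float = 10.0) -> None:
        self.threshold = threshold
        self.min_seconds = min_seconds
        # Maps pid -> time at which usage first exceeded the threshold.
        self._high_since: Dict[int, float] = {}

    def update(self, pid: int, cpu_percent: float, now: Optional[float] = None) -> bool:
        """Record one sample; return True once the process has been hot long enough."""
        now = time.monotonic() if now is None else now
        if cpu_percent <= self.threshold:
            # Usage dropped back, so forget any earlier spike for this pid.
            self._high_since.pop(pid, None)
            return False
        started = self._high_since.setdefault(pid, now)
        return now - started > self.min_seconds
```

A single spike never triggers an alert by itself. The process has to stay above the threshold across consecutive samples, which reduces noise from short bursts such as compilation or video export, although long legitimate workloads will still be flagged.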

Pros & Cons: The Reality Check

Building a security tool with AI provides a unique perspective on the "Defender's Dilemma." However, it's essential to understand both the capabilities and limitations of AI-assisted security development.

The Pros (The Power of AI)

Rapid Prototyping: I went from an idea to a functional monitor in under 30 minutes. This speed is invaluable for learning and experimentation.
Bespoke Logic: Unlike commercial AVs, I can tell my tool to never allow an executable to run from a specific temporary folder, no exceptions. This customization is perfect for specific use cases.
Educational Depth: I now understand much better how a user-space Python process receives events from the Windows or macOS kernel. This hands-on experience is hard to get from reading alone.
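
As an example of such bespoke logic, the sketch below checks whether a file sits inside a directory I have decided to distrust. It is an illustration under assumptions: the function name is hypothetical, and a user-space check like this can only flag a file, not prevent it from running.

```python
from pathlib import Path
from typing import Iterable


def is_in_blocked_dir(file_path: str, blocked_dirs: Iterable[str]) -> bool:
    """Return True if file_path resolves to a location inside a blocked directory."""
    # resolve() normalizes ".." segments and symlinks before comparing.
    resolved = Path(file_path).resolve()
    for blocked in blocked_dirs:
        try:
            resolved.relative_to(Path(blocked).resolve())
            return True
        except ValueError:
            # relative_to raises ValueError when the path is outside 'blocked'.
            continue
    return False
```

Resolving both paths first matters on macOS, where /tmp is a symlink to /private/tmp.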

The Cons (Why Big Tech isn't worried yet)

Signature Fragility: A single byte change in a malware file changes the SHA-256 hash, rendering my "Database" useless against polymorphic threats. This is why commercial solutions use multiple detection methods.
Kernel Limitations: Without a Kernel Mode Driver (which is extremely difficult to write with AI alone), my AV is just another user-space process, so malware running with the same or higher privileges can simply terminate it. This is a fundamental architectural limitation.
The "False Positive" Trap: AI-generated heuristics can be aggressive, flagging legitimate system updates as "suspicious." Tuning detection rules requires extensive testing and domain expertise.
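
The signature fragility point is easy to demonstrate. The short sketch below hashes two byte strings that differ only in their last byte. The payloads are made-up placeholders, not real malware samples.

```python
import hashlib

original = b"example payload"
modified = b"example payloae"  # only the last byte differs

digest_a = hashlib.sha256(original).hexdigest()
digest_b = hashlib.sha256(modified).hexdigest()

# Count how many of the 64 hex characters differ between the two digests.
differing = sum(a != b for a, b in zip(digest_a, digest_b))

print(digest_a)
print(digest_b)
print(f"{differing} of {len(digest_a)} hex characters differ")
```

The two digests share no useful structure, so a database entry for the first payload says nothing about the second.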

Why Building a Production Antivirus with AI is Still Too Complex

While AI tools like Cursor are incredibly powerful for rapid prototyping and learning, creating a production-ready antivirus requires capabilities that AI cannot yet provide:

Multi-Layered Defense Requirements

Production antivirus systems use multiple detection layers:

Kernel-Level Drivers: Intercept system calls before malicious code executes
Sandboxing: Execute suspicious files in isolated environments
Machine Learning Models: Trained on millions of samples over years
Cloud-Based Threat Intelligence: Real-time updates from global threat networks
Behavioral Analysis: Advanced heuristics that understand context and intent
Network Traffic Analysis: Detect command-and-control communications

Each of these layers requires deep expertise that goes far beyond what AI can generate from prompts.

Team and Resource Requirements

Commercial antivirus solutions are developed by:

Hundreds of security researchers and malware analysts
Years of continuous development and refinement
Access to massive threat intelligence databases
Extensive testing infrastructure
Compliance with security certifications and standards

No AI tool can replicate this level of expertise and infrastructure.

Advanced Threat Evasion

Modern malware uses sophisticated techniques to evade detection:

Polymorphic and metamorphic code that changes with each infection
Fileless malware that runs entirely in memory
Living-off-the-land attacks using legitimate system tools
Zero-day exploits that target unknown vulnerabilities
AI-generated malware designed to evade AI detection

Detecting these threats requires advanced techniques that are beyond the scope of simple signature-based or heuristic-based systems.

Conclusion & Future Steps

This project suggests that AI tools like Cursor are force multipliers. For prototyping and learning purposes, they allow a single developer to build tools that would previously have required a small team of security engineers.

Educational Value: The real value of this project lies in the learning experience. By building a basic antivirus prototype, I gained deep insights into:

How endpoint detection systems work at a fundamental level
The relationship between user-space and kernel-space security
The limitations of signature-based detection
The complexity of behavioral analysis
Why commercial security solutions require such extensive resources

What's next? I plan to integrate a YARA-rule engine for better pattern matching and perhaps a local LLM (like Llama 3) to analyze suspicious PowerShell scripts in real-time. However, these enhancements will still be for educational purposes, as building a truly production-ready security solution requires expertise and resources far beyond what AI-assisted development can currently provide.

Final Note: This project demonstrates the exciting potential of AI in software development, but it also highlights the critical importance of understanding the limitations. Security is not an area where "good enough" is acceptable—production security tools must be built by experts with deep knowledge of threats, defenses, and the ever-evolving landscape of cybersecurity.

Need More Information?

If you have specific questions about this project, need more information about the topic, or want to discuss it personally, please send me an email or leave a comment. I will reply as soon as I am available.