Cyber Army

Cyber Crawler — Internet-scale threat discovery

Cyber Army's autonomous, AI-powered scanner continuously scans the entire public internet to discover exposed assets, identify vulnerabilities, map misconfigurations, and detect publicly visible risks—helping organizations fix issues before attackers can exploit them.

  • Fully RFC-compliant
  • Respects robots.txt
  • Non-intrusive & passive observation only
  • Clear user-agent identification

How It Works

  • AI-assisted discovery & classification
  • DNS enumeration & TLS fingerprinting
  • Passive vulnerability mapping
  • Real-time visibility across the internet
  • Shadow IT & unknown asset detection

What is Cyber Crawler?

Cyber Crawler is Cyber Army's autonomous, AI-powered External Attack Surface and Vulnerability Scanner that continuously scans the entire public internet. It discovers exposed assets, identifies vulnerabilities, maps misconfigurations, and detects publicly visible risks—helping organizations fix issues before attackers can exploit them.

Cyber Crawler does not require organizations to sign up or enroll to be scanned. It only observes information that is already publicly visible on the internet, much as search engines index websites. All scanning is passive, legal, and safe, and it adheres to responsible scanning standards, including respect for robots.txt, rate limiting, and clear user-agent identification.

Using large-scale internet reconnaissance, Cyber Crawler provides real-time visibility across domains, IP addresses, APIs, cloud assets, certificates, and open services—revealing shadow IT, unknown assets, and exposed infrastructure.

How Cyber Crawler Works

Cyber Crawler uses AI-assisted discovery, DNS enumeration, TLS fingerprinting, metadata extraction, and passive vulnerability mapping to build a complete picture of the public attack surface.

Responsible Crawling Practices:

  • Fully RFC-compliant
  • Respects robots.txt where applicable
  • Uses a clear and identifiable user-agent
  • Honors caching, retry, and backoff signals
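The robots.txt handling described above can be sketched with Python's standard `urllib.robotparser`. The `CyberCrawler` user-agent string is taken from the opt-out example later on this page; the policy file and URL are illustrative, not Cyber Crawler's actual implementation:

```python
from urllib import robotparser

# Hypothetical robots.txt a site might serve to exclude the crawler.
ROBOTS_TXT = """\
User-agent: CyberCrawler
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks the policy before every fetch.
print(rp.can_fetch("CyberCrawler", "https://example.com/admin"))  # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/"))       # True
```

Because the rule targets only the `CyberCrawler` user-agent, other agents remain unaffected, which matches how per-crawler exclusions normally work.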

Non-Intrusive & Passive:

  • Observes publicly visible information only
  • No brute force, password guessing, exploitation, or authentication attempts

Adaptive Scan Control:

  • Automatic throttling and congestion awareness
  • Host sensitivity detection to avoid impact
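One plausible way to implement automatic throttling is exponential backoff with jitter: each retry waits roughly twice as long as the last, up to a cap, with randomness to avoid synchronized bursts. The parameters below are illustrative, not Cyber Crawler's actual policy:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay (seconds) before the next request after `attempt` failures.

    Doubles the base delay per attempt, caps it, then applies jitter
    so many crawler workers do not retry in lockstep.
    """
    delay = min(cap, base * (2 ** attempt))
    return delay * random.uniform(0.5, 1.0)
```

A crawler would sleep for `backoff_delay(n)` after the n-th timeout or 429/503 response, resetting the counter once a host responds normally again.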

AI-Driven Classification:

  • Server and technology fingerprinting
  • Certificate and TLS security validation
  • Cloud exposure identification
  • API discovery and analysis
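Server and technology fingerprinting from public response headers can be sketched as a simple rule-based classifier. This is a toy stand-in for the AI-driven model; the rules and header values below are illustrative only:

```python
def classify_server(headers: dict) -> str:
    """Infer the serving technology from publicly visible HTTP headers.

    Looks at the `Server` header first, then falls back to
    `X-Powered-By`; returns "unknown" when nothing matches.
    """
    server = headers.get("Server", "").lower()
    if "nginx" in server:
        return "nginx"
    if "apache" in server:
        return "Apache httpd"
    if "X-Powered-By" in headers:
        return headers["X-Powered-By"]
    return "unknown"

print(classify_server({"Server": "nginx/1.25.3"}))   # nginx
print(classify_server({"X-Powered-By": "PHP/8.2"}))  # PHP/8.2
```

A production classifier would combine many such signals (TLS parameters, banner text, favicon hashes) rather than a single header.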

Vulnerability & Misconfiguration Detection

Cyber Crawler performs non-intrusive vulnerability detection using version fingerprints, TLS behavior, protocol metadata, and security header analysis.

It can identify:

  • Outdated software versions across servers and services
  • Known CVEs via version and fingerprint mapping
  • Weak protocols (SSLv3, TLS 1.0/1.1) and insecure cipher suites
  • Missing HTTP security headers (HSTS, CSP, X-Frame-Options)
  • Exposed RDP/SSH/FTP/DB ports
  • Vulnerable CMS versions (WordPress, Drupal, Joomla)
  • Public cloud misconfigurations (AWS S3, Azure, GCP)
  • Exposed admin dashboards or management interfaces
  • Vulnerable JavaScript libraries and frameworks
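The security-header check in the list above is straightforward to illustrate. The required-header set below mirrors the examples given (HSTS, CSP, X-Frame-Options) and is an assumption for the sketch, not Cyber Crawler's full policy:

```python
# Headers the sketch treats as required; real scanners check many more.
REQUIRED = {
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Frame-Options",
}

def missing_security_headers(headers: dict) -> list:
    """Return the recommended headers absent from a response.

    Header-name comparison is case-insensitive, since HTTP
    header names are case-insensitive on the wire.
    """
    present = {name.title() for name in headers}
    return sorted(REQUIRED - present)

print(missing_security_headers(
    {"strict-transport-security": "max-age=31536000"}
))  # ['Content-Security-Policy', 'X-Frame-Options']
```

This kind of check needs only the response headers already visible to any client, which is consistent with the passive-observation model described above.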

Active vulnerability testing is performed only for Cyber Army customers under a signed and authorized Pentesting or Red Team engagement.

What Information Cyber Crawler Collects

✓ We Collect (Public Only)

  • Domain names and subdomains
  • DNS records and IP address mappings
  • Public IPs and exposed ports
  • Service banners and protocol metadata
  • TLS certificate information and cipher details
  • HTTP headers and web technology fingerprints
  • Public cloud storage visibility (if accessible)

✗ We Do NOT Collect

  • User data or private content
  • Credentials or internal system data
  • Information behind authentication
  • Data behind access controls

Cyber Crawler never attempts to bypass security controls, authenticate, or exploit vulnerabilities.

Opt-Out Instructions

If you prefer your domains or IPs not to be scanned, Cyber Army provides two opt-out mechanisms:

1. Email Opt-Out

Send a request to:

crawler-optout@cyberarmy.tech

Include:

  • Organization name
  • Contact name and role
  • Domains, subdomains, IPs, or CIDR ranges to exclude
  • Proof of ownership (may be required)

2. robots.txt Exclusion

Cyber Crawler honors robots.txt directives:

robots.txt:

    User-agent: CyberCrawler
    Disallow: /

Note: robots.txt applies only to web crawling, not basic IP visibility or port discovery.

Important Notes About Opt-Out

  • Opt-out removes assets from future scans
  • You may stop receiving exposure notifications
  • Data already collected from public sources cannot be erased retroactively
  • Minimal audit/compliance records may be retained