Suspicious User-Agent Triage: Bots, Scanners, and False Positives

A Weird User-Agent in Your Logs

You grep your access logs and find a request from a User-Agent you do not recognize: "Mozilla/5.0 (compatible; Nimbostratus-Bot/v1.3.2; +http://example.com/bot)". Is this a threat, a legitimate crawler, or a broken client?

Use OpsCheck User Agent Parser to extract the browser, OS, and device category from any UA string. Then cross-reference the IP with OpsCheck HTTP Headers to see if the request carries other suspicious headers.

# Top 10 bot-like User-Agents (assumes nginx "combined" format, where the UA is the 6th quote-delimited field)
awk -F'"' '{print $6}' /var/log/nginx/access.log \
  | grep -iE "bot|crawl|spider|scrap" | sort | uniq -c | sort -rn | head -10

# Check whether a UA claims to be a known search engine (the string alone is easy to spoof)
echo "Mozilla/5.0 (compatible; Googlebot/2.1)" | grep -qi googlebot \
  && echo "search crawler" || echo "unknown"

Classification Framework

# Extract User-Agent strings from nginx access log (combined format: UA is the last quoted field)
awk -F'"' '{print $6}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20

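# Filter for known scanner signatures in the User-Agent field only,
# so that a URL path containing e.g. "burp" does not produce a false positive
awk -F'"' 'tolower($6) ~ /nmap|nikto|sqlmap|burp|nessus|zgrab|masscan/' /var/log/nginx/access.log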

# Look up who owns the IP (whois shows ownership, not reputation)
whois 203.0.113.45 | grep -iE "^OrgName|^netname|^descr"
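
The one-off greps above can be folded into a single first-pass classifier. What follows is a minimal sketch, not a definitive taxonomy: the function name `classify_ua`, the category labels, and the HTTP library list (`curl`, `wget`, `go-http-client`) are assumptions added for illustration, while the scanner and bot patterns are the ones already used in this article.

```shell
# Rough first-pass classification of a single User-Agent string.
# Category names and the library list are illustrative assumptions.
classify_ua() {
  local ua
  # Lowercase once so every pattern below is effectively case-insensitive.
  ua=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$ua" in
    ''|-)
      echo "empty" ;;                 # nginx logs a missing UA as "-"
    *nmap*|*nikto*|*sqlmap*|*burp*|*nessus*|*zgrab*|*masscan*)
      echo "scanner" ;;               # checked first: scanners may also say "bot"
    *bot*|*crawl*|*spider*|*scrap*)
      echo "declared-bot" ;;          # self-declared, still needs DNS verification
    python-requests/*|curl/*|wget/*|go-http-client/*)
      echo "http-library" ;;
    mozilla/*)
      echo "browser-like" ;;          # claims to be a browser; may be spoofed
    *)
      echo "unknown" ;;
  esac
}
```

A label from this function is only a starting point for triage. To tag every distinct UA in a log, one option is `awk -F'"' '{print $6}' /var/log/nginx/access.log | sort -u | while IFS= read -r ua; do printf '%s\t%s\n' "$(classify_ua "$ua")" "$ua"; done`.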

Categories of Suspicious UAs

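# 1. Legitimate crawlers: identify by reverse DNS, then forward-confirm the hostname
dig -x 66.249.66.1 +short  # Googlebot IPs should resolve to *.googlebot.com or *.google.com
dig -x 40.77.167.0 +short  # Bingbot IPs should resolve to *.search.msn.com
# A PTR record alone can be forged: the returned hostname must resolve back to the same IP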

# 2. Vulnerability scanners — often use default UA strings
# "Nikto", "Nessus", "OpenVAS", "ZmEu", "zgrab"

# 3. Broken clients: ancient browser versions or corrupted strings
# "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" is a real IE6 string, but check the request pattern before assuming a person is behind it

# 4. Scrapers — custom UAs, high request rate, no images/CSS loaded
# Check if they request robots.txt, favicon, or CSS — scrapers usually skip these
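
The "no images/CSS" signal from category 4 can be measured per client. The sketch below assumes the nginx "combined" log format, where the first field is the client IP and the seventh is the request path. The function name `asset_profile` and the list of static file extensions are illustrative assumptions.

```shell
# Summarize, per client IP, total requests versus requests for static
# assets and robots.txt. Reads nginx "combined" format lines on stdin.
asset_profile() {
  awk '
    {
      ip = $1
      path = $7                       # request path in combined format
      total[ip]++
      if (path ~ /^\/robots\.txt/) robots[ip]++
      # Extension list is an assumption; adjust it to what the site serves.
      if (path ~ /\.(css|js|png|jpe?g|gif|svg|ico|woff2?)([?]|$)/) assets[ip]++
    }
    END {
      for (ip in total)
        printf "%s total=%d assets=%d robots=%d\n", ip, total[ip], assets[ip], robots[ip]
    }
  ' | sort -t= -k2,2 -rn              # busiest clients first
}
```

Run it as `asset_profile < /var/log/nginx/access.log | head -20`. A client with a high `total` and `assets=0` fits the scraper profile, although an API consumer or a client behind a CDN that caches static files would look the same, so treat it as one signal among several.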

Real-World Scenario

An e-commerce site saw 50,000 requests in one hour with the User-Agent "python-requests/2.28.1". The IP belonged to a legitimate data analytics company, but the request pattern (product pages fetched sequentially, with no images or CSS loaded) pointed to a scraper rather than a broken integration. Blocking at the CDN level stopped the traffic, and the company then reached out through official channels for API access.

The OpsCheck Blacklist Checker confirmed the IP was not on any spam blacklists: this was a grey-area scraper, not a known bad actor. Context matters. A python-requests UA from a university IP making 10 requests/hour is probably research; the same UA from a commercial datacenter making 500 requests/minute is scraping.
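
The rate comparison above can be checked directly from the access log. The following sketch assumes the nginx "combined" format, buckets requests by client IP and minute, and prints each IP whose busiest minute reaches a threshold. The function name `peak_rate` is hypothetical, and the default of 500 only mirrors the figure in the scenario; it is not a recommended limit.

```shell
# Print client IPs whose peak requests-per-minute meets a threshold.
# Reads nginx "combined" format lines on stdin; threshold defaults to 500.
peak_rate() {
  local threshold="${1:-500}"
  awk -v limit="$threshold" '
    {
      # $4 looks like "[10/Oct/2024:13:55:36"; keep it up to the minute.
      minute = substr($4, 2, 17)
      count[$1 SUBSEP minute]++
    }
    END {
      # Reduce the per-minute counts to the busiest minute per IP.
      for (key in count) {
        split(key, part, SUBSEP)
        if (count[key] > peak[part[1]]) peak[part[1]] = count[key]
      }
      for (ip in peak)
        if (peak[ip] >= limit) printf "%s peak_per_minute=%d\n", ip, peak[ip]
    }
  '
}
```

For example, `peak_rate 500 < /var/log/nginx/access.log` lists the clients worth a closer look. Bucketing by calendar minute is a simplification: a burst that straddles a minute boundary is split across two buckets, so a sliding window would be more precise.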

Triage Checklist

Parse the UA — does it declare itself as a bot or a browser?
Check the IP: residential, datacenter, or cloud? Datacenter and cloud IPs are more likely to be scrapers than residential ones
Look at request rate and pattern — sequential product IDs? No CSS/images?
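Verify crawler identity via reverse DNS, with a forward lookup to confirm, if it claims to be Google/Bing/etc
Check whether the client respects robots.txt: did any IP in the same range fetch it, and do later requests stay out of disallowed paths?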
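
The reverse DNS step in the checklist is only reliable when the hostname is also resolved forward again. Below is a minimal sketch of that forward-confirmed lookup, assuming `dig` is installed and the client is on IPv4. The function names are hypothetical, and the suffix list covers only Googlebot and Bingbot; other crawlers would need their own published domains added.

```shell
# Does a hostname fall under a domain used by a major search crawler?
# Suffix list covers Googlebot and Bingbot only (an assumption to extend).
is_crawler_host() {
  local host="${1%.}"                 # dig returns names with a trailing dot
  case "$host" in
    *.googlebot.com|*.google.com|*.search.msn.com) return 0 ;;
    *) return 1 ;;
  esac
}

# Forward-confirmed reverse DNS: PTR lookup, suffix check, then confirm
# that the hostname resolves back to the original IPv4 address.
verify_crawler_ip() {
  local ip="$1" host
  host=$(dig -x "$ip" +short | head -n 1)
  [ -n "$host" ] || return 1          # no PTR record at all
  is_crawler_host "$host" || return 1 # PTR points outside crawler domains
  dig "$host" A +short | grep -qxF "$ip"
}
```

Usage: `verify_crawler_ip 66.249.66.1 && echo "verified crawler" || echo "not verified"`. The suffix check is anchored to the end of the name, so a hostname such as `googlebot.com.evil.example` does not pass.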
