katana
Fast web crawler from ProjectDiscovery for collecting URLs and endpoints.
Quickstart
# Crawl single URL
katana -u https://target.com
# Crawl with JS rendering
katana -u https://target.com -headless
# Crawl list of URLs
katana -list urls.txt
# Pipe to nuclei
katana -u https://target.com -silent | nuclei -silent
Core Concepts
| Concept | Description |
| --- | --- |
| Crawling | Follow links and discover endpoints |
| Headless | Use a browser for JS-heavy sites |
| Scope | Control what gets crawled |
| Passive | Extract URLs without making requests to the target |
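The Passive concept maps to katana's passive mode; a minimal sketch, assuming a build that ships the -passive flag (v1.1.0+):
# Collect URLs from passive sources (Wayback, Common Crawl) without touching the target
katana -u https://target.com -passive -silent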
Syntax
katana -u <url> [options]
katana -list <file> [options]
Options
Input
| Option | Description |
| --- | --- |
| -u <url> | Single URL |
| -list <file> | URL list |
| (stdin) | Read URLs from piped stdin |
| -resume <file> | Resume from file |
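Piped stdin is the usual way to chain ProjectDiscovery tools; a quick sketch (resume.cfg is a placeholder file name):
# Feed URLs via stdin
cat urls.txt | katana -silent
# Resume an interrupted crawl from its resume file
katana -list urls.txt -resume resume.cfg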
Crawling
| Option | Description |
| --- | --- |
| -d <n> | Max depth (default 3) |
| -jc | Parse and crawl JS files |
| -ct <duration> | Max crawl duration per target |
| -kf <type> | Crawl known files (all, robots, sitemapxml) |
| -ef <ext> | Exclude extensions from output |
| -em <ext> | Match only given extensions |
| -fs <field> | Field scope: dn, rdn, fqdn (default rdn) |
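A sketch combining the crawling options above (values are illustrative; older builds take -ct in plain seconds rather than a duration):
# Depth 5, parse JS, cap the crawl at 10 minutes, skip static assets
katana -u https://target.com -d 5 -jc -ct 10m -ef png,jpg,gif,css,woff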
Headless
| Option | Description |
| --- | --- |
| -headless | Enable headless browser crawling |
| -hl | Shorthand for -headless |
| -sc | Use system Chrome |
| -xhr | Extract XHR requests |
| -ws | Extract WebSocket URLs |
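Headless flags are usually stacked; a sketch assuming Chrome/Chromium is installed locally:
# Render JS with the system Chrome and capture XHR endpoints
katana -u https://target.com -headless -sc -xhr -silent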
Scope
| Option | Description |
| --- | --- |
| -cs <regex> | In-scope URL regex to follow |
| -do | Display out-of-scope URLs |
| -fs <field> | Field scope: dn, rdn, fqdn (default rdn) |
| -sf <field> | Field(s) to store in per-host output |
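Field scope and regex scope can be combined or used separately; a sketch (the regex is illustrative):
# Exact hostname only
katana -u https://target.com -fs fqdn
# Follow only URLs matching a regex, but still display out-of-scope links found
katana -u https://target.com -cs ".*\.target\.com.*" -do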
Output
| Option | Description |
| --- | --- |
| -o <file> | Output file |
| -json | JSONL output (newer releases use -jsonl / -j) |
| -silent | Silent mode (results only) |
| -nc | No color |
| -v | Verbose |
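A common output combination; a sketch, hedged on the JSON flag name (newer builds use -jsonl / -j, older ones -json):
# Structured output to a file, quiet terminal
katana -u https://target.com -jsonl -o crawl.jsonl -silent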
Performance
| Option | Description |
| --- | --- |
| -c <n> | Concurrency: fetchers per target (default 10) |
| -p <n> | Parallelism: targets processed at once (default 10) |
| -rl <n> | Rate limit (requests/sec) |
| -timeout <sec> | Request timeout (default 10) |
| -retry <n> | Retries per request |
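A tuning sketch for a large target list (numbers are illustrative starting points):
# More fetchers per target, fewer parallel targets, capped request rate
katana -list urls.txt -c 20 -p 5 -rl 100 -timeout 5 -retry 1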
Request
| Option | Description |
| --- | --- |
| -H "Header: val" | Custom header/cookie (repeatable) |
| -proxy <url> | HTTP/SOCKS5 proxy |
| -xhr | XHR extraction (headless) |
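-H is repeatable, so auth headers and cookies can be sent together; a sketch with placeholder values:
# Authenticated crawl through a local intercepting proxy
katana -u https://target.com \
  -H "Authorization: Bearer TOKEN" \
  -H "Cookie: session=abc123" \
  -proxy http://127.0.0.1:8080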
Recipes
Basic Crawling
# Simple crawl
katana -u https://target.com
# Deeper crawl
katana -u https://target.com -d 5
# Silent output
katana -u https://target.com -silent
# Multiple targets
katana -list urls.txt -silent
JS-Heavy Sites
# Headless crawling
katana -u https://target.com -headless
# With XHR extraction
katana -u https://target.com -headless -xhr
# System Chrome
katana -u https://target.com -headless -sc
Endpoint Discovery
# Crawl + JS parsing
katana -u https://target.com -jc
# Crawl known files (robots.txt, sitemap.xml)
katana -u https://target.com -kf all
# Extract forms (emitted in JSONL output)
katana -u https://target.com -fx -jsonl
Scope Control
# Exact hostname only
katana -u https://target.com -fs fqdn
# Root domain + subdomains (default)
katana -u https://target.com -fs rdn
# Exclude file types
katana -u https://target.com -ef png,jpg,gif,css,woff
Pipeline Integration
# katana → nuclei
katana -u https://target.com -silent | nuclei -silent
# subfinder → httpx → katana
subfinder -d target.com -silent | httpx -silent | katana -silent
# katana → gf (pattern extract)
katana -u https://target.com -silent | gf xss
# Crawl and find params
katana -u https://target.com -silent | grep "?" | sort -u
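A useful follow-on: probe crawled endpoints with httpx and keep only live ones (assumes httpx is installed; -mc matches status codes):
# katana → httpx (keep endpoints answering 200)
katana -u https://target.com -silent | httpx -silent -mc 200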
API Endpoint Discovery
# Find API endpoints
katana -u https://target.com -silent | grep -E "/api/|/v[0-9]/"
# JSON output for parsing
katana -u https://target.com -json -o crawl.json
# Extract unique paths
katana -u https://target.com -silent | \
sed 's/\?.*//' | sort -u
Through Proxy
# Burp/Caido proxy
katana -u https://target.com -proxy http://127.0.0.1:8080
# With headers
katana -u https://target.com -H "Authorization: Bearer token"
Output & Parsing
# JSON output
katana -u https://target.com -json -o results.json
# Parse JSON
cat results.json | jq -r '.request.endpoint'
# Extract unique endpoints
katana -u https://target.com -silent | sort -u > endpoints.txt
# Filter by pattern
katana -u https://target.com -silent | grep -E "\.(php|asp|jsp)"
Troubleshooting
| Issue | Solution |
| --- | --- |
| Missing JS endpoints | Use -headless and -jc |
| Too slow | Reduce -d, increase -c |
| Stuck on a site | Set -ct to cap crawl duration |
| Scope issues | Check -fs / -cs settings |
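The slow/stuck rows usually combine into one tuned command; values are illustrative, and older builds take -ct in plain seconds:
# Shallower, faster crawl with a hard cap on crawl time
katana -u https://target.com -d 2 -c 25 -rl 150 -ct 5m -timeout 5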
References