OSINT and Recon Methodology: A Practical Guide for Security Professionals

Before an attacker fires a single exploit, they already know your external IP ranges, which employees use which SaaS tools, and whether any developer committed credentials to a public repository six months ago. The recon phase is silent, leaves no logs on your systems, and often yields more actionable intelligence than any technical vulnerability scan.

TL;DR

OSINT recon is almost entirely passive: most techniques send no traffic to the target’s systems

The goal is mapping the attack surface: domains, IPs, employees, credentials, secrets

Shodan, theHarvester, Amass, and GitHub dorking cover most of the surface area

Leaked credentials and exposed secrets are often the fastest path to initial access

Blue teams can run the same methodology defensively to find what attackers will find first

Why Recon Is the Most Important Phase

Most security testing jumps straight to scanning and exploitation. Professional attackers don’t. They spend the majority of their time on reconnaissance — gathering intelligence that makes every subsequent step faster, quieter, and more targeted.

Good recon answers three questions before any active testing begins:

What is exposed? — external infrastructure, domains, services, cloud assets
Who works there? — employees, roles, email formats, LinkedIn profiles
What was leaked? — credentials, API keys, source code, configuration files

The answers shape the entire engagement. A single leaked credential can beat weeks of exploitation attempts. A misconfigured subdomain can bypass perimeter controls entirely.

The Methodology: Five Phases

Recon isn’t a list of tools to run — it’s a structured process. Each phase feeds the next.

1. Scope definition      → what are you allowed to investigate?
2. Passive discovery     → what exists publicly without touching targets?
3. Infrastructure mapping → IPs, ASNs, cloud footprint, certificates
4. People & org recon    → employees, email formats, social profiles
5. Credential & secrets hunt → leaked passwords, exposed API keys, public repos

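The first phase is critical for authorized engagements: define exactly what’s in scope before starting. For threat intelligence work (studying an adversary’s infrastructure), there is no client-defined scope, so the limits come from law and your organization’s policy rather than from rules of engagement. The sections below combine passive discovery and infrastructure mapping under one heading, so their phase numbers differ from this list.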
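As a minimal sketch of how a scope definition can be enforced in tooling, the snippet below filters candidate hostnames against an allow-list of domains and a list of explicitly excluded hosts. The function name `in_scope` and the sample hostnames are hypothetical, chosen only for illustration.

```python
from typing import Iterable


def in_scope(hostname: str,
             allowed_domains: Iterable[str],
             excluded_hosts: Iterable[str] = ()) -> bool:
    """Return True if hostname is an allowed domain (or a subdomain of one) and not excluded."""
    # Normalize case and strip a trailing dot from fully qualified names.
    host = hostname.strip().lower().rstrip(".")
    if host in {h.lower() for h in excluded_hosts}:
        return False
    # Match on a label boundary so "example.com.evil.net" does not pass.
    return any(host == d.lower() or host.endswith("." + d.lower())
               for d in allowed_domains)


# Hypothetical scope: everything under example.com except the payments host.
ALLOWED = {"example.com"}
EXCLUDED = {"payments.example.com"}

candidates = ["www.example.com", "payments.example.com",
              "example.com.evil.net", "EXAMPLE.com."]
print([h for h in candidates if in_scope(h, ALLOWED, EXCLUDED)])
```

Running every discovered hostname through a check like this before any later phase keeps out-of-scope assets from leaking into the results.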

Phase 1: Domain and Infrastructure Discovery

Starting with the Domain

Every engagement starts with the organization’s primary domain. From there, the attack surface expands outward.

Amass — the industry standard for passive subdomain enumeration. It queries certificate transparency logs, DNS databases, and dozens of passive sources:

# Passive enumeration only (no active DNS queries to target)
amass enum -passive -d example.com -o subdomains.txt

# More thorough: include active DNS resolution
amass enum -d example.com -o subdomains.txt

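Certificate Transparency logs are public records of nearly every publicly trusted TLS certificate issued in recent years. They reveal subdomains the target never intended to make public, such as staging environments, internal tools, and forgotten assets. Wildcard and privately issued certificates hide individual hostnames, so the list is not exhaustive. Query the logs directly at crt.sh: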

# Search crt.sh for all certificates issued to *.example.com (%25 is a URL-encoded % wildcard)
curl -s "https://crt.sh/?q=%25.example.com&output=json"
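The JSON that crt.sh returns usually needs deduplicating before it is useful. The sketch below assumes each record carries a `name_value` field holding one or more newline-separated names, which is how crt.sh has formatted its JSON output. The sample data is invented, and `extract_subdomains` is a hypothetical helper name.

```python
import json


def extract_subdomains(crtsh_json: str, domain: str) -> list:
    """Collect unique hostnames under `domain` from crt.sh JSON output."""
    domain = domain.lower()
    suffix = "." + domain
    names = set()
    for record in json.loads(crtsh_json):
        # name_value may hold several names separated by newlines.
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # keep the parent of a wildcard entry
            if name == domain or name.endswith(suffix):
                names.add(name)
    return sorted(names)


# Invented sample shaped like a crt.sh response (most fields omitted).
SAMPLE = json.dumps([
    {"name_value": "www.example.com\nexample.com"},
    {"name_value": "*.staging.example.com"},
    {"name_value": "www.example.com"},
    {"name_value": "unrelated.example.org"},
])

print(extract_subdomains(SAMPLE, "example.com"))
```

In practice the input would be the body saved from the crt.sh query rather than the inline sample.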

theHarvester aggregates data from search engines, DNS records, and public sources — email addresses, subdomains, IP ranges, and employee names in a single run:

# Harvest from multiple sources (available source names change between releases; run theHarvester -h to list them)
theHarvester -d example.com -b google,bing,linkedin,certspotter,dnsdumpster -f output.html

Mapping IP Ranges and ASNs

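Once you have subdomains, map them to IP addresses and identify the organization’s network footprint. An Autonomous System Number (ASN) identifies a network that announces its own routes on the internet, and looking it up reveals the IP prefixes announced under it. Not every organization operates its own ASN; many sit entirely inside a hosting or cloud provider’s address space. Where one exists, the lookup looks like this: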

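# Find ASN from a known IP (field names differ between registries, so match the common ones)
whois 93.184.216.34 | grep -iE "^(originas|origin|aut-num|orgname|org-name|netname):"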

# Query BGP data for all prefixes owned by an ASN
curl -s "https://api.bgpview.io/asn/AS15169/prefixes" | jq '.data.ipv4_prefixes[].prefix'

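BGP.he.net and BGPView provide IP prefix data for any ASN, which is useful for scoping the address space the organization announces itself. Assets hosted in a cloud provider’s ranges will not appear under the organization’s ASN and have to be found through DNS and certificate data instead.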
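Once a prefix list is in hand, a common next step is checking which resolved IPs actually fall inside it. The following sketch uses Python's standard `ipaddress` module. The prefixes and the helper name `owning_prefix` are illustrative assumptions, not data about any real organization.

```python
import ipaddress
from typing import Iterable, Optional


def owning_prefix(ip: str, prefixes: Iterable[str]) -> Optional[str]:
    """Return the most specific prefix containing ip, or None if no prefix matches."""
    addr = ipaddress.ip_address(ip)
    networks = (ipaddress.ip_network(p) for p in prefixes)
    # Compare only networks of the same IP version as the address.
    matches = [n for n in networks if n.version == addr.version and addr in n]
    if not matches:
        return None
    # The longest prefix length is the most specific match.
    return str(max(matches, key=lambda n: n.prefixlen))


# Hypothetical prefix list, e.g. saved from an ASN lookup.
PREFIXES = ["93.184.216.0/24", "93.184.0.0/16", "2606:2800:220::/48"]

print(owning_prefix("93.184.216.34", PREFIXES))
print(owning_prefix("8.8.8.8", PREFIXES))
```

An IP that matches no prefix is a hint that the host lives in third-party or cloud address space rather than on the organization's own network.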

Shodan: The Internet-Wide Scanner

Shodan indexes internet-facing services globally — servers, routers, IoT devices, industrial control systems, cloud instances. Unlike a port scanner (which touches the target), querying Shodan is entirely passive.

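# Find all services indexed within an IP range
shodan search "net:93.184.216.0/24"

# Look up a single IP (shodan host takes one address, not a CIDR range)
shodan host 93.184.216.34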

# Find exposed services for a specific organization
shodan search "org:\"Example Corp\""

# Find Elasticsearch instances (often unauthenticated)
shodan search "product:elastic port:9200"

# Combine: specific org + specific service
shodan search "org:\"Example Corp\" http.title:\"Kibana\""

Shodan reveals what the target exposes to the internet — often including services the security team doesn’t know are public. Exposed admin panels, unpatched VPN gateways, and unauthenticated databases appear regularly.
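Queries like the ones above are plain strings of `filter:value` pairs, so they are easy to assemble programmatically when the same checks run against many organizations. This is a small sketch of that idea. `build_shodan_query` is a hypothetical helper and does not call Shodan itself.

```python
def build_shodan_query(filters: dict) -> str:
    """Join filter/value pairs into a Shodan search string, quoting values that contain spaces."""
    parts = []
    for name, value in filters.items():
        text = str(value)
        if '"' in text:
            # Keep the sketch simple: refuse values that would need escaping.
            raise ValueError(f"value for {name!r} must not contain double quotes")
        if any(ch.isspace() for ch in text):
            text = f'"{text}"'
        parts.append(f"{name}:{text}")
    return " ".join(parts)


print(build_shodan_query({"org": "Example Corp", "http.title": "Kibana"}))
print(build_shodan_query({"product": "elastic", "port": 9200}))
```

The resulting string can be passed as the single argument to `shodan search`.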

Phase 2: People and Organization Recon

Employee Discovery

LinkedIn is the most complete public database of an organization’s employees. It reveals org structure, technology stack (from job postings), and key personnel. Direct LinkedIn scraping violates terms of service, but several legitimate approaches exist:

Google dorking: site:linkedin.com/in "Example Corp" "security engineer"
theHarvester with LinkedIn source (metadata only)
OSINT Framework categories for social media lookups

Job postings are often overlooked as intelligence sources. A job posting for “Senior AWS Security Engineer familiar with GuardDuty and CloudTrail” tells you the organization runs AWS, uses GuardDuty and CloudTrail, and is still building out its cloud security team, which may point to gaps in its cloud security posture.

Email Format Discovery

Once you have employee names, identifying the email format enables phishing simulation and credential hunting. Common formats: first.last@, flast@, firstl@.

Hunter.io identifies the email format for any domain and often returns verified addresses. theHarvester scrapes email addresses from search engines and public sources automatically.

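Verify a discovered format with SMTP verification (checking whether an address exists without sending mail) or by finding confirmed addresses in data breach databases. SMTP verification connects to the target’s mail server, so it is an active technique that can be logged, and catch-all servers that accept every address make its results unreliable.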
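To make the format step concrete, here is a minimal sketch that generates candidate addresses in the three common formats listed above and infers which one an organization uses from a single confirmed address. The function names and the sample person are hypothetical, and names with spaces, hyphens, or non-ASCII characters would need extra handling.

```python
from typing import Optional


def candidate_emails(first: str, last: str, domain: str) -> dict:
    """Build candidate addresses for the first.last@, flast@, and firstl@ formats."""
    f, l = first.strip().lower(), last.strip().lower()
    if not f or not l:
        raise ValueError("first and last name are both required")
    return {
        "first.last": f"{f}.{l}@{domain}",
        "flast": f"{f[0]}{l}@{domain}",
        "firstl": f"{f}{l[0]}@{domain}",
    }


def detect_format(confirmed_email: str, first: str, last: str) -> Optional[str]:
    """Return the format name that reproduces a confirmed address, or None."""
    local, _, domain = confirmed_email.strip().lower().partition("@")
    for name, address in candidate_emails(first, last, domain).items():
        if address == f"{local}@{domain}":
            return name
    return None


print(candidate_emails("Jane", "Doe", "example.com"))
print(detect_format("JDoe@example.com", "Jane", "Doe"))
```

Once one confirmed address pins down the format, the same function can generate addresses for the rest of the employee list.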

For targeted engagements, social profiles reveal:

Technologies used — badges, certifications, posts about tools
Physical locations — offices, remote work patterns
Relationships — who knows who, org hierarchy
Operational patterns — travel announcements, conference attendance

This intelligence feeds social engineering and spear-phishing scenarios. A security engineer posting “excited to present at BSides next week” has announced when and where they will be away from the office, which helps an attacker time a pretext or a physical intrusion attempt.

Phase 3: Credential and Secrets Hunting

This phase is often where engagements end — not because there’s nothing left to find, but because leaked credentials provide direct access.

GitHub Dorking

Developers frequently commit secrets to public repositories — API keys, database passwords, private keys, internal URLs. GitHub’s search syntax makes systematic hunting straightforward:

# Search for AWS keys associated with a domain
"example.com" "AKIA"

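# Find private keys (current code search uses path globs; legacy search used extension:pem)
"example.com" path:*.pem

# Find hardcoded passwords in .env files
"example.com" "password" path:*.env

# Find API keys referenced alongside internal services
"example.com" "internal" "api_key"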

Trufflehog automates secret scanning across GitHub organizations and repositories:

# Scan an entire GitHub organization for secrets
trufflehog github --org=examplecorp --only-verified

# Scan a specific repository including full git history
trufflehog git https://github.com/examplecorp/backend-api

Git history is critical — a secret deleted last month is still in the commit history unless the repository was scrubbed with git filter-repo. Most aren’t.
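The "AKIA" dork works because long-term AWS access key IDs follow a recognizable pattern: the prefix `AKIA` followed by 16 uppercase letters or digits. The sketch below applies that pattern to arbitrary text, such as a file pulled from a commit. It is a simplified illustration of pattern matching only; scanners like Trufflehog cover many more secret types and can verify findings. The sample uses the placeholder key from AWS documentation.

```python
import re

# Long-term AWS access key IDs: "AKIA" plus 16 uppercase alphanumerics.
AWS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_aws_key_ids(text: str) -> list:
    """Return the unique AWS access key IDs found in text, sorted."""
    return sorted(set(AWS_KEY_ID.findall(text)))


# Invented file contents using the AWS documentation example key.
SAMPLE = (
    "aws_access_key_id = AKIAIOSFODNN7EXAMPLE\n"
    "not_a_key = AKIA123\n"
    "duplicate = AKIAIOSFODNN7EXAMPLE\n"
)

print(find_aws_key_ids(SAMPLE))
```

A match only shows that something looks like a key ID; whether it is live still has to be established within the rules of the engagement.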

Breach Data and Credential Databases

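Have I Been Pwned (HIBP) allows querying email addresses against known data breaches. Its domain search reveals how many employee accounts appear in breach databases and which breaches they came from, but it requires proving control of the domain, so in practice the organization itself (or a tester working with the client) has to run it.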
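Domain-level breach results are easier to act on once they are grouped by breach. The sketch below assumes the results have already been exported as a mapping from email alias to a list of breach names. That shape is an assumption made for illustration, as are the sample data and the helper name `breaches_by_count`.

```python
from collections import Counter


def breaches_by_count(domain_results: dict) -> list:
    """Rank breaches by how many of the domain's accounts appear in each."""
    counts = Counter(
        breach
        for breaches in domain_results.values()
        for breach in breaches
    )
    # Most affected accounts first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Invented sample: alias (local part of the address) -> breaches it appears in.
SAMPLE = {
    "alias1": ["Adobe"],
    "alias2": ["Adobe", "Gawker", "Stratfor"],
    "alias3": ["AshleyMadison"],
}

for breach, affected in breaches_by_count(SAMPLE):
    print(f"{breach}: {affected}")
```

Breaches that affect many accounts and exposed passwords are the ones to check first for credential reuse.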

Dehashed and similar services provide full credential records for authorized investigations. The combination of a valid username, a previously breached password, and the same password reused on a VPN login is one of the most common initial access vectors in real engagements.

Paste Sites and Dark Web Monitoring

Credentials and internal data frequently appear on Pastebin, GitHub Gist, and similar paste sites before they’re removed. IntelligenceX indexes paste site and dark web content, grep.app searches code in public Git repositories, and Pulsedive aggregates threat intelligence indicators. For ongoing monitoring, BreachDirectory and commercial threat intelligence platforms track credential exposures in near-real-time.

Phase 4: Putting It Together — A Sample Workflow

Here’s how these tools chain together in a real engagement:

1. amass enum -passive -d target.com -o subdomains.txt
   → 47 subdomains discovered

2. httpx -l subdomains.txt -status-code -title
   → 12 live hosts, 3 with interesting titles: "Admin Panel", "Jenkins", "Grafana"

3. shodan host [IP of Jenkins instance]
   → Jenkins 2.346, exposed on port 8080, no authentication header in response

4. theHarvester -d target.com -b linkedin
   → 23 employee names, email format: first.last@target.com

5. trufflehog github --org=targetcorp
   → AWS key found in commit 8 months ago (still in history)

6. HIBP domain search: target.com (run with the client, since HIBP requires domain verification)
   → 847 accounts in breach databases

The Jenkins instance, the leaked AWS key, and 847 potentially-reused credentials represent three distinct initial access paths. Only the httpx probe in step 2 sent requests to the target’s hosts; everything else came from third-party data sources.

Blue Team: What Does Your Organization Expose?

OSINT methodology works in both directions. Run it against your own organization before attackers do.

Weekly checks worth running:

Certificate transparency monitoring — alerts when new certificates are issued for your domain (unexpected subdomains = shadow IT or forgotten assets)
GitHub secret scanning — enable GitHub Advanced Security’s secret scanning for your organization
Shodan monitoring — set up email alerts for new results matching your IP ranges or organization name
HIBP domain monitoring — get notified when employee emails appear in new breaches

One-time audit:

Run Amass and theHarvester against your own domain — compare results against your known asset inventory
Search GitHub for your domain, company name, and internal service names
Query breach databases for your domain
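Comparing recon output against the asset inventory comes down to a set difference. The following is a minimal sketch with invented hostnames. `diff_inventory` is a hypothetical helper, and in practice both lists would be read from files such as the Amass output and an inventory export.

```python
from typing import Iterable


def diff_inventory(discovered: Iterable[str], inventory: Iterable[str]) -> dict:
    """Compare discovered hostnames with the known asset inventory."""
    def normalize(hosts):
        # Lowercase, trim whitespace and trailing dots, drop empty lines.
        return {h.strip().lower().rstrip(".") for h in hosts if h.strip()}

    found, known = normalize(discovered), normalize(inventory)
    return {
        # Publicly visible but not in the inventory: shadow IT or forgotten assets.
        "unknown": sorted(found - known),
        # In the inventory but not seen externally: possibly retired or internal only.
        "not_seen": sorted(known - found),
    }


DISCOVERED = ["www.example.com", "staging.example.com", "VPN.example.com"]
INVENTORY = ["www.example.com", "vpn.example.com", "mail.example.com"]

print(diff_inventory(DISCOVERED, INVENTORY))
```

The `unknown` list is the one to triage first, because it holds assets that attackers can see and the security team is not tracking.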

The goal is to find what attackers find — and fix it before they do. The Non-Human Identity Security article covers what happens when service accounts and API keys discovered via OSINT are left unrotated. GitHub Secrets Management Crisis 2026 goes deep on the scale of the credential exposure problem in public repositories.

Tool Reference

Tool	Purpose	Free?
Amass	Subdomain enumeration, DNS mapping	✅
theHarvester	Email, subdomain, employee harvest	✅
Shodan	Internet-facing service discovery	Freemium
crt.sh	Certificate transparency search	✅
Trufflehog	Secret scanning in git repos	✅
Hunter.io	Email format discovery	Freemium
Maltego	Visual link analysis	Freemium
Recon-ng	Modular recon framework	✅
HIBP	Breach database lookup	✅
ExifTool	Metadata extraction from files	✅
BGPView	ASN and IP range lookup	✅
IntelligenceX	Paste site and dark web search	Freemium

BloodHound from First Run to Domain Admin — OSINT feeds directly into BloodHound enumeration; combine external recon with internal AD mapping once inside
AD Attack Chains: From Initial Access to Domain Admin — how recon-derived credentials and exposed services translate into full domain compromise
GitHub Secrets Management Crisis 2026 — the scale of the secret exposure problem in public repositories
Non-Human Identity Security: The Biggest Blind Spot of 2026 — what attackers do with the API keys and service account credentials discovered via OSINT