TL;DR
Repository secret leaks are not edge cases: 65% of Forbes AI 50 companies had confirmed credential exposures on GitHub, and Verizon's DBIR puts the median remediation time for leaked GitHub secrets at 94 days. Most leaked secrets (71%) tie to web application infrastructure and CI/CD pipelines, creating direct attack paths. This guide provides blue team detection strategies, prevention workflows, and incident response procedures based on 2025-2026 breach data.
Table of Contents
- The Magnitude Problem: Who Is Leaking What
- Why 94 Days Is Lethal
- Where Secrets Hide (And Attackers Look First)
- Detection Engineering: Find Secrets Before Attackers Do
- Prevention Architecture: Stop Secrets at Commit Time
- Incident Response: The First 4 Hours
- The Non-Human Identity Problem
- Summary
- Sources
- Important Links
The Magnitude Problem: Who Is Leaking What
Wiz’s November 2025 research analyzed the Forbes AI 50 companies—the most well-funded, security-conscious organizations in tech—and found that 65% had confirmed secret leaks on GitHub. Not “potential” or “false positive.” Confirmed exposures of API keys, tokens, and credentials.
The leaked material wasn’t limited to active repositories. Attackers routinely scan:
- Deleted forks (material persists in Git history)
- Gists (treated as “scratch pads” with production credentials)
- Secondary repositories (POC/test repos with copy-pasted production configs)
What Gets Leaked
Verizon’s 2025 Data Breach Investigations Report (DBIR), covering 22,052 incidents and 12,195 confirmed breaches, breaks down scanner-detected secrets by infrastructure type:
| Infrastructure Type | Percentage | Most Common Secret Type |
|---|---|---|
| Web Application Infrastructure | 39% | JWTs (66% of web app secrets) |
| CI/CD Pipelines | 32% | Service account tokens |
| Cloud Infrastructure | 15% | Google Cloud API keys (43% of cloud secrets) |
| Databases | 5% | Connection strings |
| Other | 9% | Mixed credentials |
That’s 71% web app and CI/CD. These aren’t obscure credential types—they’re the backbone of modern development workflows.
Well-Funded Companies Aren't Immune
This isn't a "small company" problem. When security teams with substantial budgets and full-time AppSec engineers show a 65% leak rate, the rest of the industry is almost certainly doing worse.
If your posture is “we scan repos, so we’re fine,” this data should unsettle you. Scanning is reactive. By the time your scanner fires, the secret has been committed, pushed, and potentially harvested.
Why 94 Days Is Lethal
Verizon DBIR reports a 94-day median time to remediate leaked GitHub secrets. That’s not time-to-detection. That’s time from detection to remediation—meaning the secret is rotated, scope is validated, and access is revoked.
What Happens in 94 Days
Here’s what attackers accomplish with a single valid credential in that window:
Week 1-2: Reconnaissance
- Validate credential scope and permissions
- Map accessible resources (databases, S3 buckets, API endpoints)
- Establish persistence with secondary access methods
- Avoid obvious activity that triggers alerts
Week 3-4: Lateral Movement
- Use compromised service account to access adjacent systems
- Pivot through API integrations
- Enumerate additional credentials stored in configuration
Week 5-12: Data Exfiltration
- Slow, gradual data pulls to avoid anomaly detection
- Stage data in attacker-controlled cloud storage (often legitimate services like Google Drive)
- Catalog sensitive data for later monetization
Week 13+: Decision Point
- Ransom demand (if target appears capable of paying)
- Silent long-term access (APT-style persistence)
- Credential resale on dark web markets
The Real Cost
ReliaQuest’s 2025 Annual Cyber-Threat Report shows that 80% of breaches involved data exfiltration. In cases with confirmed exfiltration:
- 60% used mainstream cloud storage (Google Drive, Mega, Amazon S3)
- 40% used C2 infrastructure
The use of legitimate cloud services is deliberate. It’s difficult to block Google Drive or S3 at the network edge without breaking legitimate workflows. Attackers know this.
Where Secrets Hide (And Attackers Look First)
Secret leaks follow predictable patterns. Attackers use automated scanners that search these locations systematically:
Primary Targets
1. Commit History
- Secrets removed in later commits remain in Git history (see the sketch at the end of this section)
- Rewriting history doesn’t help if forks exist
- Public repositories preserve history forever
2. Pull Request Discussions
- Developers post configuration snippets for troubleshooting
- API keys embedded in error messages or debug output
- Often overlooked during security reviews
3. Issue Tracker Comments
- Support requests include credential dumps
- “Here’s my config file, why isn’t this working?”
- Issues remain public even after the offending code is removed
4. Gists and Snippets
- Treated as temporary but indexed by search engines
- No organizational oversight
- Often contain production credentials from debugging sessions
5. GitHub Actions Logs
- Build logs may echo environment variables
- Secrets printed during failed deployments
- Accessible to anyone with repository read access
Secondary Targets
Deleted Repositories
- GitHub preserves forks even after the parent repository is deleted
- Forks maintain complete commit history
- Attackers specifically search for “[original-repo]-fork” patterns
Dependency Files
- Hardcoded credentials in package.json, requirements.txt, and Gemfile
- Base64-encoded secrets (easily decoded)
- Environment files (.env, config.yaml) accidentally committed
Documentation
- Setup guides with example credentials that are actually production keys
- Architecture diagrams with credential paths
- Runbooks with embedded service account tokens
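Worth internalizing from the commit-history point: nothing in a repository's past is really gone. Below is a minimal Python sketch of a local history audit, run from the repository root; the regexes are illustrative starting points only, and a dedicated scanner such as TruffleHog (covered in the next section) does this far more thoroughly.
# scan_history.py - sketch: search every object reachable from any ref for common key patterns
import re
import subprocess
PATTERNS = {
    "aws_access_key": re.compile(rb"AKIA[0-9A-Z]{16}"),
    "github_pat": re.compile(rb"ghp_[A-Za-z0-9]{36}"),
    "private_key": re.compile(rb"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}
def main():
    # Every object reachable from any ref, one "<sha> [<path>]" entry per line
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    for entry in objects:
        sha = entry.split()[0]
        # Read the raw object as bytes so binary blobs don't crash the scan
        blob = subprocess.run(["git", "cat-file", "-p", sha], capture_output=True).stdout
        for name, pattern in PATTERNS.items():
            if pattern.search(blob):
                print(f"{name} pattern found in object {sha} ({entry})")
if __name__ == "__main__":
    main()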
Detection Engineering: Find Secrets Before Attackers Do
Scanning is necessary but insufficient. Here’s a layered detection strategy based on how attackers actually operate.
Layer 1: Pre-Commit Scanning
Block secrets before they enter Git history:
Tool: git-secrets or Talisman
# Install git-secrets globally
git secrets --install ~/.git-templates/git-secrets
git config --global init.templateDir ~/.git-templates/git-secrets
# Add AWS patterns
git secrets --register-aws --global
# Add custom patterns
git secrets --add --global 'AKIA[0-9A-Z]{16}' # AWS Access Key ID
git secrets --add --global '[0-9a-f]{40}' # Generic 40-char hex token (noisy; tune before enforcing)
Why this matters: Prevention at commit time is the only control that avoids remediation cost. Once a secret enters history, you’re in incident response mode.
Layer 2: Repository Scanning
Detect secrets in existing repositories:
TruffleHog v3 Configuration
# Scan entire repository history
trufflehog github --repo https://github.com/org/repo \
--only-verified \
--json
# High-confidence results only with specific detectors
trufflehog github --repo https://github.com/org/repo \
--only-verified \
--json \
--include-detectors="aws,github,slack,stripe"
# Scan entire organization including archived repos
trufflehog github --org your-org \
--only-verified \
--json \
--include-archived
Critical: Use the --only-verified flag. It tests detected credentials against live APIs to confirm validity. A valid credential is not a false positive.
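Because the --json output is line-delimited, triage is easy to script. Here is a short sketch that tallies verified findings per detector, assuming the scan output was redirected to a file named findings.json; the field names reflect TruffleHog v3's JSON output and may differ between versions.
# triage_trufflehog.py - sketch: count verified findings per detector from line-delimited JSON output
import json
from collections import Counter
counts = Counter()
with open("findings.json") as f:  # assumed: scan output redirected to this file
    for line in f:
        line = line.strip()
        if not line:
            continue
        finding = json.loads(line)
        # "Verified" and "DetectorName" are fields in TruffleHog v3's JSON output
        if finding.get("Verified"):
            counts[finding.get("DetectorName", "unknown")] += 1
for detector, count in counts.most_common():
    print(f"{detector}: {count} verified finding(s)")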
Layer 3: Attack Surface Monitoring
Monitor for leaked secrets outside your control:
What to Monitor:
- GitHub’s public search API for your organization’s domain patterns
- Pastebin, GitLab Snippets, Bitbucket public repos
- Docker Hub public images (often contain embedded credentials)
- Stack Overflow and developer forums
Detection Query Example:
# Search GitHub for potential credential patterns mentioning your org
curl -H "Authorization: Bearer GITHUB_TOKEN" \
"https://api.github.com/search/code?q=yourorg.com+password"
# Search for AWS keys mentioning your org
curl -H "Authorization: Bearer GITHUB_TOKEN" \
"https://api.github.com/search/code?q=AKIA+yourorg"
Layer 4: Behavioral Detection
Credential usage anomalies often indicate compromise:
Red Flags:
- API key used from unexpected geographic regions
- Service account authentication outside normal business hours
- Sudden spike in API call volume
- Access to resources never previously touched
- Multiple failed authentication attempts followed by success
SIEM Query (Splunk Example):
index=cloudtrail eventName=AssumeRole
| stats count by sourceIPAddress, userAgent, awsRegion
| where count > 100
| where awsRegion!="us-east-1" AND awsRegion!="eu-west-1"
This surfaces AssumeRole activity at abnormal volume from regions outside your normal footprint, a common pattern after attackers harvest credentials from repositories.
Prevention Architecture: Stop Secrets at Commit Time
Detection is reactive. Prevention requires architectural changes to how credentials are handled.
Principle 1: Secrets Never Enter Code
Use Secret Management Services:
- AWS Secrets Manager / Azure Key Vault / GCP Secret Manager for cloud
- HashiCorp Vault for on-premise or hybrid
- Doppler / Infisical for development environment management
Implementation Pattern:
# ❌ WRONG: Hardcoded credential
api_key = "sk_live_51Hx..."
# ✅ CORRECT: Runtime secret retrieval
import boto3
secrets_client = boto3.client('secretsmanager')
response = secrets_client.get_secret_value(SecretId='prod/api-key')
api_key = response['SecretString']
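One practical refinement to the pattern above: fetching from Secrets Manager on every request adds latency and API cost, so most services cache the value briefly. Here is a minimal sketch of that idea; the secret name and TTL are illustrative, and AWS also ships a dedicated caching client you may prefer.
# cached_secret.py - sketch: cache a Secrets Manager value briefly instead of fetching per request
import time
import boto3
_client = boto3.client("secretsmanager")
_cache = {}  # secret_id -> (value, fetched_at)
CACHE_TTL_SECONDS = 300  # illustrative TTL
def get_secret(secret_id: str) -> str:
    cached = _cache.get(secret_id)
    if cached and time.time() - cached[1] < CACHE_TTL_SECONDS:
        return cached[0]
    value = _client.get_secret_value(SecretId=secret_id)["SecretString"]
    _cache[secret_id] = (value, time.time())
    return value
api_key = get_secret("prod/api-key")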
Principle 2: Short-Lived Credentials
Static credentials with no expiration are persistent attack vectors.
Implement:
- AWS STS AssumeRole with 1-hour session tokens
- GitHub Actions OIDC tokens (no long-lived secrets needed)
- OAuth refresh tokens with 15-minute access token expiry
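A minimal sketch of the first item above, requesting a 1-hour STS session instead of using static keys; the role ARN and session name are placeholders.
# sts_session.py - sketch: exchange a role for 1-hour temporary credentials instead of static keys
import boto3
sts = boto3.client("sts")
resp = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/DeployRole",  # placeholder role ARN
    RoleSessionName="deploy-session",
    DurationSeconds=3600,  # 1-hour session
)
creds = resp["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken, Expiration
# The temporary credentials are scoped to the role and expire on their own
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)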
Example: GitHub Actions OIDC (No Secrets Required):
name: Deploy
on: push
permissions:
id-token: write # OIDC token generation
contents: read
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789:role/GitHubActions
aws-region: us-east-1
# No AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY needed
Principle 3: Least Privilege by Default
Credential Scoping Checklist:
- Service account can only access required resources
- API key has narrowest possible scope (read-only when possible)
- Database user has minimal table-level permissions
- Cloud role uses explicit resource ARNs, not wildcards
- CI/CD service account cannot modify production infrastructure
Example: AWS IAM Policy (Scoped):
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": ["s3:PutObject", "s3:GetObject"],
"Resource": "arn:aws:s3:::specific-bucket/specific-prefix/*"
}]
}
Compare to overprivileged pattern:
{
"Effect": "Allow",
"Action": "s3:*",
"Resource": "*" // ❌ Full account access
}
Principle 4: Secret Rotation Automation
Manual rotation fails. Automate it.
Rotation Cadence:
- Service accounts: 90 days maximum
- API keys: 60 days
- Database passwords: 30 days
- CI/CD tokens: 14 days
Automation Tools:
- AWS Secrets Manager has built-in rotation for RDS, Redshift, DocumentDB
- HashiCorp Vault supports dynamic secret generation with TTLs
- Custom rotation: Use AWS Lambda or Azure Functions triggered by scheduled events
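As a concrete example of the custom-rotation approach, here is a hedged sketch of a scheduled Lambda handler that issues a new IAM access key, stores it in Secrets Manager, and removes the old one. The user name and secret ID are placeholders, and a production version should verify the new key works before deleting its predecessor.
# rotate_access_key.py - sketch: scheduled Lambda that rotates an IAM user's access key
import json
import boto3
iam = boto3.client("iam")
secrets = boto3.client("secretsmanager")
USER_NAME = "ci-deploy-user"           # placeholder service account
SECRET_ID = "prod/ci-deploy-user/key"  # placeholder secret name
def handler(event, context):
    # Create the replacement key first so consumers never see a gap
    new_key = iam.create_access_key(UserName=USER_NAME)["AccessKey"]
    secrets.put_secret_value(
        SecretId=SECRET_ID,
        SecretString=json.dumps({
            "AccessKeyId": new_key["AccessKeyId"],
            "SecretAccessKey": new_key["SecretAccessKey"],
        }),
    )
    # Remove every other key for this user (IAM allows at most two per user)
    for key in iam.list_access_keys(UserName=USER_NAME)["AccessKeyMetadata"]:
        if key["AccessKeyId"] != new_key["AccessKeyId"]:
            iam.delete_access_key(UserName=USER_NAME, AccessKeyId=key["AccessKeyId"])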
Incident Response: The First 4 Hours
Despite prevention efforts, leaks happen. Speed of response determines blast radius.
Hour 1: Immediate Actions
Minute 0-15: Rotate Compromised Credential
- Generate new credential in secret manager
- Update production services to use new credential
- Revoke old credential immediately
- Do not wait to assess scope—assume compromise
Minute 15-30: Kill Active Sessions
- AWS: Revoke STS sessions via IAM policy update (see the sketch at the end of Hour 1)
- Google Cloud: Revoke service account keys
- GitHub: Regenerate personal access tokens
- Database: Kill active connections (e.g., SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE usename = 'compromised_user' in PostgreSQL)
Minute 30-60: Lock Down Blast Radius
- Identify all resources accessible via leaked credential
- Apply temporary deny-all policies to compromised service account
- Enable enhanced logging on affected resources
- Notify incident response team
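For the AWS session-revocation step above, the standard mechanism (the same one behind the IAM console's "Revoke active sessions" button) is an inline deny policy conditioned on aws:TokenIssueTime, which invalidates every session issued before the cutoff. A hedged sketch follows; the role name is a placeholder.
# revoke_sessions.py - sketch: deny all actions for sessions issued before "now" on a compromised role
import json
from datetime import datetime, timezone
import boto3
iam = boto3.client("iam")
cutoff = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Action": "*",
        "Resource": "*",
        # Any session token issued before the cutoff is denied everything
        "Condition": {"DateLessThan": {"aws:TokenIssueTime": cutoff}},
    }],
}
iam.put_role_policy(
    RoleName="CompromisedServiceRole",  # placeholder role name
    PolicyName="AWSRevokeOlderSessions",
    PolicyDocument=json.dumps(policy),
)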
Hour 2-3: Scope Assessment
Query CloudTrail / Cloud Audit Logs:
# AWS: Find all API calls using compromised access key
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=AccessKeyId,AttributeValue=AKIA... \
--max-items 1000 \
--start-time 2026-01-01T00:00:00Z
# Focus on:
# - Data access (GetObject, DescribeTable, Query)
# - Privilege escalation (AttachUserPolicy, CreateAccessKey)
# - Persistence (CreateUser, CreateRole)
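When event volume is large, the same lookup is easier to triage in code. Here is a minimal boto3 sketch that pages through events for the compromised key and flags the high-risk API calls noted above; the access key ID and time window are placeholders.
# triage_cloudtrail.py - sketch: page through CloudTrail events for one key and flag high-risk calls
from datetime import datetime, timezone
import boto3
HIGH_RISK = {
    "AttachUserPolicy", "CreateAccessKey",  # privilege escalation
    "CreateUser", "CreateRole",             # persistence
    "GetObject", "Query", "DescribeTable",  # data access
}
cloudtrail = boto3.client("cloudtrail")
paginator = cloudtrail.get_paginator("lookup_events")
pages = paginator.paginate(
    LookupAttributes=[{"AttributeKey": "AccessKeyId", "AttributeValue": "AKIAEXAMPLEKEYID"}],  # placeholder key
    StartTime=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
for page in pages:
    for event in page["Events"]:
        label = "HIGH RISK" if event["EventName"] in HIGH_RISK else "info"
        print(f"[{label}] {event['EventTime']} {event['EventName']} {event.get('Username', '')}")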
Critical Questions:
- What resources were accessed?
- Were any resources modified or created?
- Did attacker establish persistence mechanisms?
- Was data exfiltrated? (Look for large data transfer volumes)
Hour 3-4: Remediation
If Data Access Occurred:
- Inventory accessed data (databases, S3 buckets, API endpoints)
- Determine data classification (PII, PHI, financial, intellectual property)
- Assess regulatory notification requirements (GDPR, CCPA, HIPAA)
- Preserve logs for forensic analysis
If Persistence Detected:
- Enumerate all IAM users/roles created by compromised credential
- Check for new access keys, SSH keys, OAuth applications
- Review newly created Lambda functions or automation
- Scan for backdoored code commits if repository write access existed
Communication:
- Notify security leadership
- Prepare incident report for legal/compliance
- If data breach confirmed, engage incident response retainer
The Non-Human Identity Problem
Cloud Security Alliance’s 2025 State of SaaS Security Report found that 46% of organizations struggle to monitor non-human identities. This gap is a root cause of prolonged secret exposure.
What Are Non-Human Identities?
- Service accounts
- API keys
- OAuth applications
- CI/CD pipeline credentials
- Bot accounts
- Machine-to-machine authentication tokens
Why They’re Dangerous
Human accounts have clear owners, get offboarded when employees leave, and trigger MFA prompts during suspicious activity.
Non-human accounts:
- No MFA (often)
- No clear ownership (which team manages this service account?)
- Long-lived or never-expiring credentials
- Overprivileged by default (“just give it admin to make it work”)
Fixing the Gap
Implement Non-Human Identity Management:
- Inventory all service accounts (AWS IAM, GCP service accounts, Azure service principals)
- Assign ownership (which team/person is responsible for each)
- Enforce TTLs (no credentials older than 90 days)
- Require justification (why does this service account need this permission?)
- Audit quarterly (remove unused service accounts)
Tool: AWS IAM Access Analyzer
# List analyzers, then review findings for unused or externally shared access
aws accessanalyzer list-analyzers
aws accessanalyzer list-findings --analyzer-arn "arn:..."
aws accessanalyzer get-finding --analyzer-arn "arn:..." --id "..."
# Check last-used dates for all IAM users (the report must be generated first)
aws iam generate-credential-report
aws iam get-credential-report
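The credential report is returned as CSV content, so it is easy to post-process. Below is a short sketch that flags access keys older than the 90-day TTL recommended above; the threshold is this guide's recommendation rather than an AWS default, and report generation is asynchronous, so the sketch polls until it is ready.
# stale_keys.py - sketch: parse the IAM credential report and flag access keys older than 90 days
import csv
import io
import time
from datetime import datetime, timezone, timedelta
import boto3
MAX_AGE = timedelta(days=90)
iam = boto3.client("iam")
# Generating the report is asynchronous; poll until it is ready
while iam.generate_credential_report()["State"] != "COMPLETE":
    time.sleep(2)
report = iam.get_credential_report()["Content"].decode("utf-8")
now = datetime.now(timezone.utc)
for row in csv.DictReader(io.StringIO(report)):
    for field in ("access_key_1_last_rotated", "access_key_2_last_rotated"):
        value = row.get(field, "N/A")
        if value in ("N/A", "not_supported"):
            continue
        age = now - datetime.fromisoformat(value)
        if age > MAX_AGE:
            print(f"{row['user']}: {field} is {age.days} days old")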
Summary
Key Findings:
- 65% of Forbes AI 50 companies leaked credentials on GitHub—this is not a fringe problem
- 94-day median remediation time gives attackers a full quarter to exploit leaked secrets
- 71% of leaks involve web app infrastructure and CI/CD pipelines, creating direct production access
- 46% of organizations cannot effectively monitor non-human identities
Defensive Actions:
- Pre-commit scanning (git-secrets, Talisman) blocks secrets before they enter history
- Repository scanning (TruffleHog, GitGuardian) detects existing exposures
- Secret managers (AWS Secrets Manager, Vault) eliminate hardcoded credentials
- Short-lived tokens (STS, OIDC) reduce blast radius of compromised credentials
- Automated rotation (90-day max for service accounts) limits exposure window
- Non-human identity governance ensures service accounts have owners and expiration
Incident Response Checklist:
- Rotate compromised credential within 15 minutes
- Kill active sessions immediately
- Query cloud audit logs for attacker activity
- Assess data access and privilege escalation
- Remove attacker persistence mechanisms
- Preserve logs for forensic analysis
Repository secret leaks are not inevitable. They’re a process failure. Fix the process.
Sources
Important Links
TruffleHog v3 - Secret Scanner - Open-source credential scanner with API verification
git-secrets - Pre-commit Hook - AWS tool to prevent secrets in Git commits
GitGuardian - Repository Scanning - Commercial secrets detection with developer remediation workflow
AWS Secrets Manager - Managed secret storage and rotation service
HashiCorp Vault - Self-hosted secrets management and dynamic credential generation
Doppler SecretOps Platform - Developer-friendly secrets management for applications
GitHub Actions OIDC Guide - Keyless authentication for CI/CD
AWS IAM Access Analyzer - Identify unused credentials and overprivileged access
Semgrep - Code Security Scanning - Static analysis with custom rules for secret patterns
Infisical - Open Source Secrets Management - Self-hosted alternative to commercial solutions
