zipguard: Safe ZIP Extraction With Zero Dependencies

You download a ZIP from a ticket, a CI artifact, or a vendor — and extract it. Two seconds later, a file has landed in C:\Windows\System32. You never noticed.

That’s ZipSlip. It has been publicly documented since 2018, and tools are still falling for it in 2026.

TL;DR

ZIP archives carry several serious attack vectors beyond their obvious content

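Standard tools (7-Zip, WinRAR, Python’s zipfile) largely trust the archive by default: Python’s zipfile strips .. components from entry paths, for example, but enforces no size, count, or file-type limits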

zipguard is a zero-dependency Python CLI that enforces a security policy before writing anything to disk

It blocks ZipSlip, ZIP bombs, executable drops, RTLO spoofing, ZIP64 manipulation, and more

Install: pip install git+https://github.com/Mhacker1020/zipguard.git

Why ZIP Is a Security Problem

ZIP is everywhere: build artifacts, email attachments, software packages, CTF challenges, malware samples, vendor deliveries. The format dates from 1989 and was never designed with adversarial inputs in mind.

The problem isn’t ZIP itself — it’s that extraction tools trust the archive. They read filenames, paths, and sizes from the archive’s own metadata. An attacker who controls the archive controls what your tool believes.

Here are the attacks that actually happen:

ZipSlip — Writing Outside the Target Directory

A ZIP entry can contain a filename like ../../etc/cron.d/backdoor. When a naive extractor joins this with the output directory, the file lands outside it entirely.

target_dir = /tmp/extracted
entry.name = ../../etc/cron.d/backdoor
result     = /etc/cron.d/backdoor   ← outside target_dir

This works on Windows too: ..\..\Windows\System32\evil.dll. The original research by Snyk in 2018 found this in thousands of projects including Apache, npm ecosystem packages, and CI tools.
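One common way to catch this is to resolve the joined path and confirm it still sits under the output directory. The sketch below assumes Python 3 and uses a hypothetical helper name, `stays_inside`; it illustrates the check and is not taken from zipguard’s source.

```python
from pathlib import Path


def stays_inside(target_dir: Path, entry_name: str) -> bool:
    """Return True only if entry_name resolves to a path under target_dir."""
    base = target_dir.resolve()
    # Treat backslashes as separators so Windows-style traversal is caught on any OS.
    normalized = entry_name.replace("\\", "/")
    # Joining with an absolute entry name discards base, so absolute paths fail too.
    candidate = (base / normalized).resolve()
    return base in candidate.parents


print(stays_inside(Path("/tmp/extracted"), "docs/readme.txt"))
print(stays_inside(Path("/tmp/extracted"), "../../etc/cron.d/backdoor"))
```

Comparing resolved paths, rather than scanning the name for `..`, also covers absolute entry names and mixed separators with a single rule.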

ZIP Bombs — Exhausting Disk or Memory

A ZIP bomb is a small compressed file that decompresses to an enormous amount of data. The classic “42.zip” is 42 KB compressed, 4.5 petabytes uncompressed across nested layers.

The attack works because ZIP’s compression ratio can be extreme for repetitive data. An archive reporting 1 MB of content might actually contain 100 GB. Tools that check file_size from the ZIP metadata before extraction get fooled — that field is attacker-controlled.

The only reliable protection is counting actual bytes during streaming extraction and aborting mid-write when limits are hit.
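A minimal sketch of that idea using Python’s standard zipfile module: bytes are counted as they come out of the decompressor, and the copy stops as soon as the limit is crossed. The names `copy_capped` and `SizeLimitExceeded` are hypothetical and chosen for illustration; zipguard’s own implementation may differ.

```python
import io
import zipfile


class SizeLimitExceeded(Exception):
    """Raised when an entry produces more bytes than the policy allows."""


def copy_capped(src, dst, max_bytes: int, chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst in chunks, aborting as soon as max_bytes is exceeded."""
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return written
        written += len(chunk)
        if written > max_bytes:
            raise SizeLimitExceeded(f"entry exceeded {max_bytes} bytes")
        dst.write(chunk)


# Build a small in-memory archive whose single entry inflates to 10 MiB of zeros.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("zeros.bin", b"\0" * (10 * 1024 * 1024))

# Extract with a 1 MiB cap: the copy stops long before 10 MiB is produced.
with zipfile.ZipFile(archive) as zf, zf.open("zeros.bin") as src:
    try:
        copy_capped(src, io.BytesIO(), max_bytes=1024 * 1024)
    except SizeLimitExceeded as exc:
        print("aborted:", exc)
```

The limit is applied to bytes actually produced, so the declared size in the archive never enters the decision.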

Executable Drops — Delivering Malware as “Attachments”

This one is simple: you extract a ZIP and it contains setup.exe, update.ps1, or install.bat. If your pipeline, your analyst machine, or your user’s desktop auto-runs or accidentally double-clicks these, you have a problem.

Defenders handling malware samples, CI pipelines processing build artifacts, and SOC analysts receiving suspicious files all benefit from a policy that blocks or quarantines executables before they reach disk in runnable form.

RTLO Filename Spoofing

Unicode’s Right-to-Left Override character (U+202E) reverses the display direction of text. A file named document‮gpj.exe displays as documentexe.jpg in most file managers. The actual extension is .exe.

This is a classic phishing technique in email attachments, now also seen in archive-based delivery.
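Detecting this is mostly a matter of scanning the filename for explicit bidirectional formatting characters. The following is a sketch under that assumption; the helper name `has_bidi_control` and the exact character set are illustrative choices, not zipguard’s documented rule.

```python
# Explicit bidirectional formatting characters: embeddings, overrides, isolates.
BIDI_CONTROLS = frozenset(
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)


def has_bidi_control(name: str) -> bool:
    """Return True if the filename contains any bidi formatting character."""
    return any(ch in BIDI_CONTROLS for ch in name)


print(has_bidi_control("document\u202egpj.exe"))
print(has_bidi_control("holiday.jpg"))
```

Rejecting the whole family of controls, not only U+202E, avoids having to reason about which combinations can reorder an extension.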

ZIP64 Manipulation

ZIP64 is an extension for archives and entries that exceed the 4 GB limits of the original format. It adds an “extra field” to each entry that can contain its own size values. A crafted archive can carry a small, innocent-looking size in the main header, which passes pre-extraction checks, and a vastly different size in the ZIP64 extra field, which the decompressor actually uses.

This is how some archive bomb variants bypass scanners that only check metadata before decompressing.
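One possible consistency check follows from the ZIP specification, which says a ZIP64 value should appear only when the matching header field holds the placeholder 0xFFFFFFFF (0xFFFF for the disk number). The sketch below applies a strict reading of that rule to raw central directory values. The function names are hypothetical, the inputs are assumed to be the raw header fields before any parser has substituted ZIP64 values, and this is not necessarily the check zipguard performs.

```python
import struct

ZIP64_HEADER_ID = 0x0001
SENTINEL_32 = 0xFFFFFFFF
SENTINEL_16 = 0xFFFF


def zip64_payload_length(extra: bytes) -> int:
    """Return the payload length of the ZIP64 record, or 0 if there is none."""
    offset = 0
    # The extra field is a sequence of (header ID, size, payload) records.
    while offset + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, offset)
        if offset + 4 + size > len(extra):
            raise ValueError("truncated extra field record")
        if header_id == ZIP64_HEADER_ID:
            return size
        offset += 4 + size
    return 0


def zip64_inconsistent(file_size, compress_size, header_offset, disk_start, extra):
    """Flag entries whose ZIP64 record does not match the header placeholders."""
    fields = (file_size, compress_size, header_offset)
    # Each placeholder in the main header calls for one 8-byte ZIP64 value.
    expected = 8 * sum(value == SENTINEL_32 for value in fields)
    if disk_start == SENTINEL_16:
        expected += 4
    return zip64_payload_length(extra) != expected


# A header that claims 1 KiB while a ZIP64 record carries a 100 GiB size.
hidden = struct.pack("<HHQ", ZIP64_HEADER_ID, 8, 100 * 2**30)
print(zip64_inconsistent(1024, 512, 0, 0, hidden))
```

Some legitimate writers are looser than the specification here, so a real policy might report such entries for review instead of aborting outright.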

Double Extensions

document.pdf.exe is not a PDF. But in Windows Explorer with “hide known file extensions” enabled (the default), it displays as document.pdf. Combine this with a ZIP that triggers extraction on open, and the user has run an executable they thought was a document.
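A simple heuristic for this pattern is to look at the last two suffixes: an executable extension directly preceded by a harmless-looking one. The sketch below assumes that rule; the decoy list and the helper name `is_double_extension` are illustrative assumptions, and the executable list reuses the extensions named in this article.

```python
from pathlib import PurePosixPath

EXECUTABLE_EXTS = {".exe", ".ps1", ".bat", ".vbs"}
# Assumed set of "harmless-looking" extensions used as decoys.
DECOY_EXTS = {".pdf", ".doc", ".docx", ".jpg", ".png", ".txt"}


def is_double_extension(name: str) -> bool:
    """Return True for names like document.pdf.exe."""
    suffixes = [s.lower() for s in PurePosixPath(name).suffixes]
    return (
        len(suffixes) >= 2
        and suffixes[-1] in EXECUTABLE_EXTS
        and suffixes[-2] in DECOY_EXTS
    )


print(is_double_extension("document.pdf.exe"))
print(is_double_extension("archive.tar.gz"))
```

Requiring a decoy suffix keeps ordinary multi-part names such as `archive.tar.gz` from being flagged.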

zipguard

We built zipguard to enforce a security policy on ZIP extraction. Every entry is validated before anything is written to disk. The result is a clear audit table showing every decision made.

$ zipguard suspicious.zip --out ./analysis
Extracting suspicious.zip → ./analysis

  Decision    File                        Reason
 ──────────────────────────────────────────────────────────────────────────
  BLOCKED     ../../evil.txt              Path traversal detected
  RENAMED     dropper.exe                 executable extension blocked by policy (.exe)
  BLOCKED     document.pdf.exe            Double extension spoofing detected
  BLOCKED     report‮gpj.exe              Unsafe Unicode bidirectional character

  0 allowed  1 renamed  3 blocked

The exit code is 1 when anything is blocked or renamed, which is useful for CI pipelines that should fail on suspicious archives.

What zipguard Enforces

Attack	zipguard
ZipSlip (path traversal)	Blocked
Absolute paths (`/etc/passwd`, `C:\...`)	Blocked
UNC paths (`\\server\share\...`)	Blocked
Archive bomb (per-file size)	Aborted
Archive bomb (total extraction size)	Aborted
Archive bomb (file count)	Aborted
Forged metadata (real bytes counted during streaming)	Aborted
ZIP64 extra field inconsistency	Aborted
Duplicate entry names	Aborted
Executable extensions (`.exe`, `.ps1`, `.bat`, `.vbs`…)	Renamed/blocked
RTLO filename spoofing	Blocked
Double extension (`document.pdf.exe`)	Blocked
Symlinks and hardlinks	Blocked
Encrypted entries	Clear error

Rename mode is the default for executables: instead of deleting dropper.exe, zipguard renames it to dropper.exe.blocked. The file is preserved for analysis — it just can’t be accidentally executed.
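The renaming rule described above can be sketched in a few lines. This is an illustration only: `quarantine_name` is a hypothetical helper, and the extension set is limited to the ones this article names.

```python
from pathlib import PurePosixPath

BLOCKED_EXTS = {".exe", ".ps1", ".bat", ".vbs"}


def quarantine_name(name: str) -> str:
    """Append .blocked to names whose final extension is on the blocklist."""
    if PurePosixPath(name).suffix.lower() in BLOCKED_EXTS:
        return name + ".blocked"
    return name


print(quarantine_name("dropper.exe"))
print(quarantine_name("notes.txt"))
```

Appending a suffix, instead of replacing the extension, keeps the original name visible to the analyst.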

Atomic Writes

One detail worth highlighting: zipguard uses atomic writes for every file. Extraction goes to a temporary file in the same directory. The final filename only appears once all bytes have been written and all streaming checks have passed.

If extraction is aborted mid-stream (size limit hit, error thrown), no partial file is left at the destination. The temporary file is cleaned up in a finally block.

This matters in pipelines where partial files could be picked up by a downstream process before the extraction completes.
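The pattern can be sketched with the standard library: write to a temporary file next to the destination, rename it into place only on success, and clean up in a finally block. The helper name `atomic_write` and its parameters are assumptions for illustration, not zipguard’s API.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(dest: Path, chunks, max_bytes: int) -> None:
    """Write chunks to dest so that dest only ever appears fully written."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as dest, so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=".tmp-")
    try:
        written = 0
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                written += len(chunk)
                if written > max_bytes:
                    raise ValueError("size limit exceeded")
                tmp.write(chunk)
        os.replace(tmp_name, dest)  # the final name appears in a single step
    finally:
        # After a successful replace the temp name is gone and this is a no-op.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)


with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "report.txt"
    atomic_write(target, [b"hello ", b"world"], max_bytes=1024)
    print(target.read_bytes())
```

Keeping the temporary file in the destination directory matters because a rename across filesystems is not atomic.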

Usage

Install

pip install git+https://github.com/Mhacker1020/zipguard.git

PyPI release coming soon.

Basic extraction

# Extract to ./extracted/ (default)
zipguard archive.zip

# Specify output directory
zipguard archive.zip --out ./output

Dry run — inspect without extracting

zipguard archive.zip --dry-run --verbose

Useful for pre-screening archives before committing to extraction.

JSON output for automation

zipguard archive.zip --format json | jq '.entries[] | select(.decision == "blocked")'
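The same filter can be written in Python when jq is not available. This sketch assumes only what the jq expression above implies, namely a top-level `entries` list whose items have a `decision` field; the `file` key in the sample data is a guess used for illustration.

```python
import json


def blocked_entries(report_json: str) -> list:
    """Return the entries whose decision is "blocked"."""
    report = json.loads(report_json)
    return [
        entry
        for entry in report.get("entries", [])
        if entry.get("decision") == "blocked"
    ]


# Hypothetical sample report; real output may carry more fields.
sample = json.dumps(
    {
        "entries": [
            {"file": "readme.txt", "decision": "allowed"},
            {"file": "../../evil.txt", "decision": "blocked"},
        ]
    }
)
print(blocked_entries(sample))
```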

Save audit log

zipguard archive.zip --log audit-$(date +%Y%m%d).json

Custom policy

# Extra size limit
zipguard archive.zip --max-size 50MB

# Block additional extensions
zipguard archive.zip --block-ext .py,.sh

# Load full policy from file
zipguard archive.zip --config policy.json

Exit codes

Code	Meaning
`0`	All entries allowed
`1`	One or more entries blocked or renamed
`2`	Extraction aborted (archive bomb, malformed archive)

Use as a Library

zipguard also works as a Python library — useful for integrating into your own tools or pipelines:

from pathlib import Path
from zipguard import SafeExtractor, ExtractionPolicy

policy = ExtractionPolicy(
    max_file_size=50 * 1024 * 1024,   # 50 MB per file
    block_extensions=[".exe", ".ps1", ".bat"],
    rename_blocked=True,               # rename instead of hard block
)

report = SafeExtractor(policy).extract(
    Path("artifact.zip"),
    Path("./staging")
)

if report.aborted or report.blocked_count > 0:
    raise RuntimeError(f"Unsafe archive: {report.abort_reason or 'blocked entries'}")

print(report.to_json())

This is how we integrated it with gate, our supply chain scanner — gate validates packages from registries, zipguard safely unpacks them for inspection.

zipguard vs safezip

safezip is a Python library that adds ZipSlip and ZIP bomb protection as a drop-in replacement for zipfile. It’s a solid choice if you’re building an application and want safe extraction in your code.

zipguard has a different focus:

Feature	safezip	zipguard
Interface	Python library	CLI + library
Executable blocking	❌	✅
RTLO detection	❌	✅
Double extension detection	❌	✅
SHA-256 audit log	❌	✅
Rename-vs-block mode	❌	✅
Dry run / JSON output	❌	✅
Human-readable decision table	❌	✅
Atomic writes	✅	✅
ZIP64 consistency checks	✅	✅
ZipSlip / ZIP bomb protection	✅	✅
Recursive nested ZIP extraction	✅	❌
Zero dependencies	✅	✅

Use safezip if you need a lightweight library embedded in your application. Use zipguard if you’re a security analyst or DevOps engineer, or you run a CI pipeline that handles untrusted archives, and you need full visibility into every decision.

What You Can Do Today

If you work with untrusted ZIPs (malware samples, CTF challenges, vendor deliveries, user uploads):

Install zipguard and replace your current extraction workflow
Use --dry-run --verbose before extracting anything suspicious
Save audit logs with --log for incident documentation

If you run CI/CD pipelines that process build artifacts or downloaded packages:

- name: Safe artifact extraction
  run: |
    pip install git+https://github.com/Mhacker1020/zipguard.git
    zipguard artifact.zip --out ./artifact --format json --log audit.json
    # Step fails on a nonzero exit code: 1 if anything is blocked or renamed, 2 if extraction is aborted

If you build Python tools that extract archives:

# Replace this:
import zipfile
zipfile.ZipFile("archive.zip").extractall("./output")

# With this:
from pathlib import Path
from zipguard import SafeExtractor
SafeExtractor().extract(Path("archive.zip"), Path("./output"))

Zombie ZIP: How a Malformed Archive Header Blinds 98% of Antivirus Engines — how crafted ZIP headers evade AV scanners, directly relevant to why metadata can’t be trusted
We Built a Supply Chain Scanner — Here’s What We Learned — gate, our supply chain scanner that pairs with zipguard for artifact inspection
The Package You Trusted: How the Axios Supply Chain Attack Happened — why you can’t trust downloaded packages or their contents