
Extend EPM policies to HuggingFace artifacts

By now it’s clear that the use of GenAI tooling like Cursor and Claude has fundamentally changed how code is written. This shift, which we explored in depth in our previous post, moves the security perimeter beyond just the generated code. Today, every engineering team building an AI-native application must equally prioritise securing the AI models and LLMs they pull into their tech stack, and that forces us to rethink supply chain security for the AI era.
This post will dive deep into a concrete solution: leveraging Cloudsmith's Enterprise Policy Manager (EPM) to enforce rigorous security and governance policies on LLMs and datasets sourced from the Hugging Face Hub and consumed via Cloudsmith.
A Custom Data Model for Hugging Face
Effective governance hinges on detailed metadata. Cloudsmith addresses this by providing a customised data model specifically for Hugging Face models and datasets. This is the crucial enabler, allowing developers and SecOps teams to write granular policies in Rego that target attributes unique to this package type.
The following examples showcase how EPM policies can be constructed to tackle common, yet critical, supply chain risks associated with adopting LLMs.
| Policy Name | Security Objective | Rego |
|---|---|---|
| Whitelist of trusted publishers | Isolates risk by enforcing a pre-approved list of model creators (eg: NVIDIA, Microsoft) | https://github.com/cloudsmith-io/rego-recipes/blob/main/huggingface-recipes/trusted_publishers.rego |
| Block models with unsafe files | Quarantines models based on aggregated security scan data from Hugging Face Hub (ClamAV, Protect AI, etc.) | https://github.com/cloudsmith-io/rego-recipes/blob/main/huggingface-recipes/security_scan.rego |
| Policy based on model card data | Enables governance based on model provenance (eg: blocking models trained on a specific, prohibited dataset) | https://github.com/cloudsmith-io/rego-recipes/blob/main/huggingface-recipes/model_card.rego |
| Block models with risky file formats | Mitigates deserialisation attacks by quarantining models containing inherently risky formats like pickle. | https://github.com/cloudsmith-io/rego-recipes/blob/main/huggingface-recipes/risky_files.rego |
Whitelisting trusted publishers to enforce provenance
For mature organisations, time-to-market often depends on leveraging high-quality, pre-vetted artifacts from established entities like NVIDIA, Microsoft, or Apple. Instead of subjecting these known-good sources to redundant policy checks, you can use an EPM policy to create an allowlist.
The Strategy:
- Create a terminal policy with the highest precedence (`precedence: 0`).
- If the model's publisher is on the approved list, the policy immediately sets the package state to `AVAILABLE` and applies a `trusted-publisher` tag package action, as sketched below. All subsequent policies are bypassed, dramatically streamlining ingestion for high-confidence artifacts.
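For orientation, here is a minimal sketch of what such an allowlist could look like in Rego. It is illustrative only: it assumes the conventions used by Cloudsmith's published recipes (a `cloudsmith` package exposing a boolean `match`), and the attribute path for the Hub publisher is an assumption; the linked trusted_publishers.rego is authoritative.

```rego
# Illustrative sketch only, NOT the published recipe. The attribute
# path for the Hub publisher is an assumption.
package cloudsmith

import rego.v1

# Pre-approved model creators (lower-cased Hub namespaces).
trusted_publishers := {"nvidia", "microsoft", "apple"}

default match := false

# Match when the package's Hugging Face publisher is on the allowlist.
match if {
	publisher := lower(input.v0.package.huggingface.publisher)
	publisher in trusted_publishers
}
```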
Below is the trusted_publishers.rego and the associated policy payload for the implementation:
```bash
wget https://raw.githubusercontent.com/cloudsmith-io/rego-recipes/refs/heads/main/huggingface-recipes/trusted_publishers.rego
escaped_policy=$(jq -Rs . < trusted_publishers.rego)
cat <<EOF > payload.json
{
  "name": "Huggingface Trusted Publishers",
  "description": "A whitelist for models & datasets from trusted publishers on Hugging Face Hub.",
  "rego": $escaped_policy,
  "enabled": true,
  "is_terminal": true,
  "precedence": 0
}
EOF
```

Note: The trusted_publishers.rego policy primarily targets packages pulled via a Hugging Face upstream and ignores packages that are pushed directly into Cloudsmith. This ensures reliable traceability, as packages pushed directly to Cloudsmith lack the verified publisher metadata from the Hub.
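With payload.json in place, the policy can be created via the Cloudsmith API. The following is a hypothetical sketch: the endpoint path and workspace placeholder are assumptions, so consult the Cloudsmith API documentation for the authoritative route.

```bash
# Hypothetical sketch: the EPM endpoint path is an assumption; consult
# the Cloudsmith API documentation for the authoritative URL.
curl -X POST "https://api.cloudsmith.io/v2/workspaces/YOUR-WORKSPACE/policies/" \
  -H "X-Api-Key: $CLOUDSMITH_API_KEY" \
  -H "Content-Type: application/json" \
  -d @payload.json
```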
Blocking models with unsafe files via security scans
The concept of a security vulnerability extends beyond traditional CVEs when dealing with LLMs. Models can contain malicious code or dangerous file formats. Hugging Face Hub mitigates this by providing public security data powered by tools like ClamAV, Picklescan, and Protect AI.
Cloudsmith captures this crucial upstream security data and exposes it to EPM. This allows you to write a policy that enforces a zero-tolerance stance on models flagged as unsafe by any of the integrated scanners.
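The published recipe evaluates the full scanner output; conceptually, the match condition boils down to something like the following sketch. The shape of the scan data shown here is an assumption, so treat the linked security_scan.rego as authoritative.

```rego
# Illustrative sketch only. The shape of the scan data (a list of
# per-file scanner results with a `status` field) is an assumption.
package cloudsmith

import rego.v1

default match := false

# Match when any Hub scanner has flagged any file as unsafe.
match if {
	some scan in input.v0.package.huggingface.security_scans
	scan.status == "unsafe"
}
```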
Download the security_scan.rego and create the associated payload.json with the commands below:

```bash
wget https://raw.githubusercontent.com/cloudsmith-io/rego-recipes/refs/heads/main/huggingface-recipes/security_scan.rego
escaped_policy=$(jq -Rs . < security_scan.rego)
cat <<EOF > payload.json
{
  "name": "Huggingface Hub Security Scan",
  "description": "Match models & datasets where the security scan data from Hugging Face Hub indicates unsafe content.",
  "rego": $escaped_policy,
  "enabled": true,
  "is_terminal": false,
  "precedence": 3
}
EOF
```

After the policy has been created, associate an Action with the Policy to SetPackageState to `QUARANTINE`.
If a package matches the policy, you can use the decision logs to view the detailed results of its security scan; the decision log contains the full output of each scanner from Hugging Face.
Policy based on Model Card data
Hugging Face models and datasets come with Model Cards. Think of a Model Card as the Software Bill of Materials (SBOM) for LLMs: it provides essential metadata about the model’s intended use, limitations, and, critically, its training data provenance. Model Cards are metadata authored by the model’s publisher to provide better documentation of, and transparency into, the model’s characteristics. Cloudsmith parses this information and exposes it to EPM so you can write policies against it.
Model Cards currently come in two forms: one for datasets and one for models. In EPM's OpenAPI spec, the types PolicyHuggingfaceModelCard and PolicyHuggingfaceDatasetCard describe the data each can contain.
As an example, Hugging Face publishes a language model called SmolLM-135M. If you visit the model's README.md, you will see a metadata section encoded in YAML stating that the model was trained on the dataset HuggingFaceTB/smollm-corpus. Imagine your organisation needs to block any model trained using the smollm-corpus dataset due to licensing or compliance concerns. The model_card.rego policy below uses the data exposed in the PolicyHuggingfaceModelCard (or PolicyHuggingfaceDatasetCard) type to look for that specific training-set reference and quarantines the package if a match is found.
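Conceptually, the check amounts to something like this sketch; the attribute path to the model card's dataset list is an assumption here, and the linked model_card.rego is authoritative.

```rego
# Illustrative sketch only. The `datasets` attribute path mirrors the
# model card YAML but is an assumption; see model_card.rego for the
# real implementation against PolicyHuggingfaceModelCard.
package cloudsmith

import rego.v1

prohibited_datasets := {"HuggingFaceTB/smollm-corpus"}

default match := false

# Match when the model card lists a prohibited training dataset.
match if {
	some dataset in input.v0.package.huggingface.model_card.datasets
	dataset in prohibited_datasets
}
```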
```bash
wget https://raw.githubusercontent.com/refs/heads/main/huggingface-recipes/model_card.rego
escaped_policy=$(jq -Rs . < model_card.rego)
cat <<EOF > payload.json
{
  "name": "Huggingface Hub Model Card Training Set",
  "description": "Prohibit models trained with the smollm-corpus dataset.",
  "rego": $escaped_policy,
  "enabled": true,
  "is_terminal": false,
  "precedence": 2
}
EOF
```

In short, this EPM policy example prohibits the use of SmolLM-135M because its Model Card metadata references a prohibited training dataset.
Mitigating serialisation attacks by blocking risky file formats
This is arguably the most critical security policy for LLM adoption. Many of the file formats used by models and datasets are vulnerable to deserialisation attacks. For example, Pickle is a popular file format used in Hugging Face models that has well-known exploits. Further, some formats, such as Keras, can be deserialised securely but may carry embedded code extensions (eg: the Keras Lambda layer) that allow arbitrary code execution. Alternative, safer model file formats have been developed, such as safetensors from Hugging Face and ONNX, which do not suffer from these attacks. For background on these attack vectors, see here.
The following risky_files.rego policy will match models coming from Hugging Face Hub that contain risky formats, such as pickle-based files, as well as other risky files such as zip archives, PyTorch checkpoints, Keras models, and TensorFlow H5 models. After the policy has been created, associate an Action with the Policy to SetPackageState to `QUARANTINE`.
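As a rough illustration, the recipe's logic resembles the following extension-based sketch; the extension list and attribute paths here are assumptions, and the linked risky_files.rego is authoritative on the full set of prohibited formats.

```rego
# Illustrative sketch only. The file-list attribute path and the
# extension set are assumptions; see risky_files.rego for the real
# implementation.
package cloudsmith

import rego.v1

risky_extensions := {".pkl", ".pickle", ".bin", ".pt", ".pth", ".h5", ".keras", ".zip"}

default match := false

# Match when any file in the model repository uses a risky extension.
match if {
	some file in input.v0.package.files
	some ext in risky_extensions
	endswith(lower(file.filename), ext)
}
```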
```bash
wget https://raw.githubusercontent.com/cloudsmith-io/rego-recipes/refs/heads/main/huggingface-recipes/risky_files.rego
escaped_policy=$(jq -Rs . < risky_files.rego)
cat <<EOF > payload.json
{
  "name": "Huggingface Hub Prohibited Formats",
  "description": "Prohibit models with risky file formats.",
  "rego": $escaped_policy,
  "enabled": true,
  "is_terminal": false,
  "precedence": 1
}
EOF
```

The shift to AI-native development necessitates a robust approach to software artifact governance. By combining EPM with the rich metadata of the Hugging Face ecosystem, engineering, security, and operations teams can implement fine-grained, automated, and declarative policies (via Rego) to secure the AI supply chain. This integration ensures that only trusted, compliant, and safe models reach production.
To start building your EPM policies in Cloudsmith, explore our easy-to-follow Rego Cookbook, which provides relevant copy-and-paste code snippets for policy-as-code design. We also provide a range of other useful reports and guides for understanding the state of modern software artifact security.