DataHub

How DataHub eliminated distribution friction and automated secure software delivery with Cloudsmith. DataHub is the leading open source metadata platform, providing AI-powered discovery and governance solutions for more than 3,000 companies globally.

Profile

    • Founded 2021
    • Cloudsmith customer since 2025
    • Leading open source metadata platform

Industry

    • Technology

Cloudsmith Solution

    • Software distribution

Results

    • Eliminated distribution friction
    • Predictable costs

Executive summary

DataHub, the leading open source metadata platform, faced significant hurdles in distributing its software to a global user base. Its legacy distribution system brought prohibitive costs, rigid access controls, and a fragmented user experience. By migrating to Cloudsmith, DataHub streamlined its software delivery through a unified, fully managed artifact management platform. Entitlement tokens gave the team granular control over software access and helped eliminate deployment friction. The transition also reduced operational overhead and strengthened the company's security posture through automated image scanning.

Company description

DataHub offers an open source metadata platform designed to unify AI-powered discovery, governance, and observability. The platform serves as a central search engine and organizational map, helping employees at more than 3,000 companies find and trust their internal data. DataHub tracks data lineage, ownership, and usage to ensure organizational accuracy and security. DataHub delivers two products: DataHub Core, a self-managed open source solution, and DataHub Cloud, a fully SOC2-compliant SaaS version used by enterprises and AI-native companies.

The business problem

For an early-stage startup like DataHub, software distribution is a critical business function that directly impacts customer adoption and retention. The company needed to distribute its “Remote Executor” agent, a component that collects metadata locally within a customer's infrastructure, to many enterprise customers.

However, the incumbent software distribution system, Docker Hub, presented a significant financial and operational burden. High per-seat pricing meant that projected growth would eventually make the distribution model unsustainable. To remain fiscally responsible while maintaining a “best in breed” experience, DataHub required a solution that facilitated seamless software updates without cost-prohibitive barriers.

Distributing the Remote Executor in a safe and reliable way was a very important goal.

Esteban Gutierrez

Platform Engineer at DataHub

The technical problem

DataHub originally relied on Docker Hub for software image distribution, but the platform lacked the sophisticated features required for enterprise-grade delivery. The team encountered several technical limitations:

  • Lack of access control: The system lacked entitlement tokens, preventing DataHub from managing specific customer access levels effectively.
  • Insufficient analytics: DataHub could not adequately track usage by software version or individual customer, leaving them with limited visibility into how users consumed their software.
  • Poor user experience: The interface was often confusing for end-users, leading to increased support inquiries and friction during the distribution of the Remote Executor agent.

Facing these obstacles, the engineering team realized they had to either build a custom distribution platform from scratch, which was a massive undertaking, or find a specialized alternative.

We used to have Docker [Hub] as a main mechanism to distribute the Remote Executor, but … Docker [Hub] … did not allow for a large customer base or controls on the artifacts that are being distributed.

Esteban Gutierrez

Platform Engineer at DataHub

The solution

DataHub chose Cloudsmith for its ability to handle complex, global distribution needs through a cloud-native SaaS architecture. The transition focused on several key technical areas:

  • Entitlement tokens: This feature served as the primary differentiator, allowing DataHub to issue unique tokens for granular, per-customer access control to its Remote Executor images.
  • Automated security and SBOMs: The Cloudsmith platform integrated Software Bill of Materials (SBOM) access and automated vulnerability scanning, ensuring that all distributed images were signed and verified.
  • Multi-format support: While DataHub started with containers, Cloudsmith’s support for over 30 package formats provided a future-proof foundation for other software components as DataHub’s distribution needs evolve and grow.

Cloudsmith implemented this feature called entitlement tokens, and that is exactly what we wanted … a simple way to tell our customers how to download the artifacts without adding too much complexity into the … pipelines to deploy the product.

Esteban Gutierrez

Platform Engineer at DataHub
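
To make the idea of per-customer entitlement tokens concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Cloudsmith's API or DataHub's implementation: the class name EntitlementRegistry and its methods are hypothetical, chosen only to illustrate how one token per customer can gate downloads and record which versions each customer pulls.

```python
import secrets
from collections import Counter


class EntitlementRegistry:
    """Toy model of per-customer entitlement tokens (hypothetical, in-memory)."""

    def __init__(self):
        self._customers = {}      # token -> customer name
        self._pulls = Counter()   # (customer, version) -> number of pulls

    def issue(self, customer):
        """Create a unique token tied to a single customer."""
        token = secrets.token_urlsafe(16)
        self._customers[token] = customer
        return token

    def revoke(self, token):
        """Remove access for one customer without affecting the others."""
        self._customers.pop(token, None)

    def pull(self, token, version):
        """Allow the download only for a known token, and record the usage."""
        customer = self._customers.get(token)
        if customer is None:
            return False
        self._pulls[(customer, version)] += 1
        return True

    def usage(self, customer):
        """Return pull counts per version for one customer."""
        return {v: n for (c, v), n in self._pulls.items() if c == customer}
```

Because each token maps to exactly one customer, revoking it cuts off that customer alone, and every successful pull can be attributed to a customer and a version.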

The results

The adoption of Cloudsmith transformed DataHub’s distribution workflow from a point of friction into a strategic advantage.

  • Eliminated deployment friction: Since moving to branded hostnames and Cloudsmith’s global infrastructure, the team has received almost no support questions regarding image downloads.
  • Enhanced visibility and control: Entitlement tokens now feed detailed user data back to DataHub, allowing them to track exactly who is using which software version.
  • Strengthened security posture: DataHub now assembles software with a well-understood vulnerability surface. Scanning internal artifacts ensures they do not introduce new attack vectors during the delivery process.
  • Operational efficiency: By choosing a fully managed SaaS solution, DataHub avoided the costs and engineering hours associated with building and maintaining an in-house distribution platform.

[Cloudsmith] has made it much more simple for our customers, which is our primary use case... the main technical challenge which was to distribute artifacts to our customers, primarily the Remote Executor, was resolved.

Esteban Gutierrez

Platform Engineer at DataHub

Next steps

Following the success of the Remote Executor distribution, DataHub plans to further optimize the distribution experience with private broadcasts and a custom portal. Branded distribution endpoints on its own domain name will provide a professional and recognizable experience for customers.

DataHub also plans to expand its use of Cloudsmith to manage Chainguard hardened images and libraries internally. Additionally, the team intends to implement automated lifecycle and cleanup policies to manage which software versions remain available for distribution. These controls will allow DataHub to define and enforce formal end-of-life (EOL) guidance, ensuring customers always have access to the most secure and performant versions of the platform.
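
As an illustration of what a lifecycle rule of this kind might look like, the Python sketch below splits released versions into supported and end-of-life sets. It is an assumption-laden example rather than DataHub's or Cloudsmith's actual policy: the function name, the 365-day support window, and the "always keep the latest three" rule are all hypothetical.

```python
from datetime import date, timedelta


def split_by_support_window(releases, today, support_days=365, keep_latest=3):
    """Split versions into (supported, end_of_life) lists.

    releases maps a version string to its release date. A version stays
    supported if it is among the keep_latest newest releases or was released
    within the last support_days days. Both thresholds are hypothetical.
    """
    cutoff = today - timedelta(days=support_days)
    # Newest release first, so the index doubles as a recency rank.
    ordered = sorted(releases, key=releases.get, reverse=True)
    supported, end_of_life = [], []
    for rank, version in enumerate(ordered):
        if rank < keep_latest or releases[version] >= cutoff:
            supported.append(version)
        else:
            end_of_life.append(version)
    return supported, end_of_life
```

A cleanup job could then remove or hide the end-of-life versions, while the supported list doubles as the published EOL guidance.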

The goal is to use Cloudsmith as the gold standard for all the artifacts we use to build and distribute the product.

Esteban Gutierrez

Platform Engineer at DataHub
