Private Package Repositories Part 1: What’s a package again?

Package repositories were never something I thought about as a developer unless something didn’t work. For example, if it was slow, wouldn’t connect, wouldn’t install, or was overly complicated to configure. Mostly I wanted something I barely noticed. Something simple and easy to use.

When I started my career in the late 2000s, I was developing Windows apps. My packages were the exes and dlls produced by my lovely big Visual Studio IDE. Our dependencies were either official Windows libraries or proprietary, licensed third-party libraries. The exes and dlls were stored in folders on an FTP server and could be accessed using Microsoft access controls, so I suppose the FTP server was my package repository. The software ran on on-premises hardware, and the users were an in-house team that I knew well. I owned the application and the entire software workflow.

That software development workflow is very different from a more modern workflow where:

I work on a small part of a bigger project which uses multiple frameworks, languages, and containers.
My code is dependent on open source software (OSS) hosted externally.
Multiple developers work on the same codebase simultaneously and make frequent commits to the code repository.
Code could be built, scanned, and deployed many times a day via a CI/CD pipeline.
The output of the build (the package) is deployed to the cloud.
Distribution of the software package to customers is tracked and controlled.

As a developer, I want my package repository:

To store my packages somewhere securely
To have one place to view and control all my software packages in all formats: a single source of truth
To push/pull packages super fast in every region where my teams are based
To handle distributing my software to customers.
To be able to sign my packages and have that managed for me
To integrate easily with my other build and security tools in an automated way.
To be simple and easy to use, with extensive documentation
To come with good support when something goes wrong

This was meant to be one blog post, but it turns out package management is a dark horse and needed two!

This post explains a few of the terms that we in the package management industry throw around. Our next article on the topic, Private Package Repositories Part 2: The Influencers, goes into the things that have influenced package management.

What is a Software Package?

A software package, artifact, or image is the output of building software. It groups together the files containing your software, along with metadata about the software and its dependencies, in a well-defined format. Packages are typically versioned so that it is clear exactly which software is being deployed.

Packages promote the reuse of code because they can be dropped into another application and used easily. Packages are created using a package manager and are usually stored in a repository, like Cloudsmith. The table below details some common packages.

Package Metadata

Package metadata describes the package with information about the author, repository location, repository version, file type, license, package dependencies, and more. Metadata can also have information about the CI/CD build like who triggered the build, the build time, approval information, vulnerability information, or user-created metadata.
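To make this concrete, here is a minimal sketch of reading package metadata. It assumes the "Key: value" header layout that Python packages use for their metadata files; the package name, author, and dependencies are invented for illustration and do not describe a real project.

```python
from email import message_from_string

# Hypothetical metadata for an imaginary package (not a real project).
SAMPLE_METADATA = """\
Metadata-Version: 2.1
Name: example-widgets
Version: 1.4.2
Author: Jane Developer
License: MIT
Requires-Dist: requests>=2.25
Requires-Dist: click>=8.0
"""


def read_metadata(text):
    """Parse 'Key: value' metadata headers into a plain dictionary."""
    msg = message_from_string(text)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "author": msg["Author"],
        "license": msg["License"],
        # A header can repeat, so collect every Requires-Dist entry.
        "dependencies": msg.get_all("Requires-Dist") or [],
    }


info = read_metadata(SAMPLE_METADATA)
print(info["name"], info["version"], info["dependencies"])
```

Other package formats store the same kind of information differently (JSON, XML, control files), so treat this as one example of the idea rather than a general parser.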

What is a Package Manager?

A package manager is software that creates, uploads, installs, upgrades, and configures software packages for a language, container, or OS. Most package types have their own package manager: Debian’s deb packages use apt-get, Red Hat’s RPM packages use yum or dnf, Node’s packages use npm, Python’s packages use pip, etc.

Some packages have more than one package manager to choose from. For example, .NET’s NuGet packages can use Chocolatey or the native NuGet package manager. Similarly, Java’s Maven packages can use the native Maven package manager, Gradle, or Ivy. Some package managers are more stable, are easier to use, have faster build times, or have access to different packages in their repositories.

What is a Package Repository?

A package repository, registry, or feed is a place to store all of your packages.

Package repositories work closely with package managers, and the two terms often get mixed up when talking about tools, like Cloudsmith or JFrog’s Artifactory, that support most software package formats. The terms blur because these tools have the functionality of a package manager to upload, download, and configure packages, and they also host the packages in repositories.

It is quite a task to provide support for many formats as every language/OS has its unique package manager.

Dependencies and Dependency resolution

If a package uses another package, that package is called a dependency. Almost every project uses third-party packages as libraries and/or frameworks.

Resolving dependencies in a package is no joke. Specifying and resolving the dependencies and relationships between libraries and packages is NP-hard in the general case: version constraints on packages mean package managers have to solve a problem equivalent to Boolean satisfiability (SAT).

On top of dependency resolution being a complex problem to solve, different package managers resolve conflicts or missing dependencies in different ways, and some package types have deep dependency trees (I’m looking at you, npm). All in all, dependency solving is a toughie.
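To illustrate why this is hard, here is a toy backtracking resolver. It is a sketch only: the package index, the package names, and the integer version numbers are all hypothetical, and real package managers use far more sophisticated solvers and constraint syntax.

```python
# Hypothetical index: package -> version -> {dependency: allowed versions}.
INDEX = {
    "app": {1: {"web": {1, 2}, "db": {2}}},
    "web": {1: {"json": {1}}, 2: {"json": {2}}},
    "db": {2: {"json": {1}}},
    "json": {1: {}, 2: {}},
}


def resolve(requirements, chosen=None):
    """Return one consistent {package: version} assignment, or None.

    requirements is a list of (package, allowed_versions) pairs
    that still have to be satisfied.
    """
    chosen = dict(chosen or {})
    if not requirements:
        return chosen
    (name, allowed), rest = requirements[0], requirements[1:]
    if name in chosen:
        # Already pinned: the pin must satisfy this constraint as well.
        return resolve(rest, chosen) if chosen[name] in allowed else None
    # Try the newest version first; backtrack if it leads to a conflict.
    for version in sorted(allowed & set(INDEX[name]), reverse=True):
        deps = list(INDEX[name][version].items())
        result = resolve(rest + deps, {**chosen, name: version})
        if result is not None:
            return result
    return None


solution = resolve([("app", {1})])
print(solution)
```

Here the resolver first tries version 2 of the hypothetical web package, finds that it needs json 2 while db needs json 1, and backtracks to web 1. With many packages and many versions, the number of combinations to explore grows very quickly, which is the heart of the problem.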

What are Multiformat Repositories?

Multi-format repositories allow you to store packages of different types in one repository. Many package managers don’t let you store different packages in the same repository. Multi-format repositories are especially useful if your tech stack uses multiple languages and containers - read more in our article on why modern tech stacks need multi-format repositories.

Multi-format repositories mean fewer repositories to manage, and I think that is a good thing.

Public vs. Private Package Repositories

Many languages and containers provide a public repository to host your packages. npm, for example, provides the public npm registry, and Python provides the Python Package Index (PyPI).

Publicly available packages have made it so much easier to use Open Source Software (OSS) and have changed how software is built and deployed forever.

The benefits of OSS for organizations are numerous. Still, community-controlled public repositories cannot guarantee availability, bring an increased risk of introducing security vulnerabilities, generally host only one package format, and do not let you control who downloads your package.

Many organizations need a private repository for their packages for security, compliance, availability, or reliability. On top of that, private repositories provide additional features required by enterprises, such as:

Single Sign-On
Custom Domains
Access Controls for Teams and Entitlement tokens
Multiformat packaging
Software Distribution
Logging and metric data
Integration with CI/CD and Security tools
Upstreams
Tech Support
Service Level Agreements guaranteeing uptimes

What are Upstreams?

Package upstreams allow users to consume packages hosted elsewhere, in public repositories like Maven Central, PyPI, NuGet.org, npmjs.com, or Debian’s package registry. When a repository has an upstream configured, the service regularly checks for new upstream packages and stores them in the private repository.

The rules around the order and precedence of what repositories to search and what packages to select will determine what packages are used. Generally, packages in the repository itself supersede packages from the upstream.

Upstreams allow dependencies to be isolated from untrusted 3rd party sources to protect you from outages and slowness of external services.
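The precedence rule described above can be sketched with a toy in-memory repository. Everything here is hypothetical (class, package names, contents), and the sketch caches a package the first time it is requested, which is an assumption made for brevity rather than a description of how any particular service syncs its upstreams.

```python
class Repository:
    """A toy repository with ordered upstreams (all names hypothetical)."""

    def __init__(self, name, packages=None, upstreams=None):
        self.name = name
        self.packages = dict(packages or {})    # (name, version) -> content
        self.upstreams = list(upstreams or [])  # searched in this order

    def fetch(self, pkg, version):
        """Return (content, source name), or None if nothing has the package."""
        key = (pkg, version)
        # Packages stored in the repository itself take precedence.
        if key in self.packages:
            return self.packages[key], self.name
        for upstream in self.upstreams:
            found = upstream.fetch(pkg, version)
            if found is not None:
                content, _source = found
                # Keep a copy so later requests do not depend on the upstream.
                self.packages[key] = content
                return content, upstream.name
        return None


public = Repository("public", {("left-pad", "1.3.0"): b"public build"})
private = Repository(
    "private", {("internal-lib", "2.0.0"): b"ours"}, upstreams=[public]
)

first = private.fetch("left-pad", "1.3.0")   # served via the upstream
public.packages.clear()                      # simulate an upstream outage
second = private.fetch("left-pad", "1.3.0")  # served from the private copy
print(first, second)
```

The second fetch succeeds even though the simulated public repository is empty, which is the isolation benefit described above.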

Signing Packages

The whole point of signing a package is to be able to trust that a package is safe to download or use as a dependency.

Many software organizations use their own GPG/RSA keys for signing their metadata and packages, and those keys are usually managed by the private repository. Signing a package with your organization’s key lets developers know that the package was written and approved by your organization.

Lately, the software community has been coming to grips with how OSS can be the point of entry for supply chain attacks. It’s a lot harder to trust code that was not written and signed by your organization. Signing OSS packages can help, but even if an OSS package is signed, it is not clear whether you should trust it.

In the absence of using a trusted signed OSS package, package repositories can scan OSS packages for known vulnerabilities and extract metadata information like version, who wrote the code, results from scans, or license information which can provide insight into the provenance of the software package.

Recently, the Sigstore project, which is part of the Open Source Security Foundation (OpenSSF), has been working to improve trust in OSS by improving transparency and simplifying the signing process. It provides a service for package signing in the same spirit as Let’s Encrypt’s service for enabling HTTPS.

Private package repositories need to be able to manage signing packages and maintain keys, work with a package’s native signing tools, and work towards integrating with new tools to sign and trust OSS packages.

SBOM and SPDX

The Software Bill of Materials (SBOM) is essentially a list of all the components contained in a software product, including their licenses and dependencies. Most software includes dependencies sourced from the open-source community or from commercial software. In the US, the 2021 executive order on improving the nation’s cybersecurity led to new requirements to provide an SBOM for software used by US government departments.

Related read: What is an SBOM

The end-user of software can use the SBOM to perform vulnerability and license analysis of their software packages which can evaluate risk in a software product.

The Software Package Data Exchange (SPDX) is an open standard for communicating the SBOM, and it is internationally recognized as ISO/IEC 5962:2021. SPDX is integral to generating an SBOM that can be easily shared and processed automatically.
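As a rough illustration, the sketch below builds a simplified SPDX-style SBOM as JSON. The field names follow my understanding of the SPDX 2.3 JSON layout, but the output is not validated against the specification, and the product, components, namespace URL, and tool name are all hypothetical.

```python
import json
from datetime import datetime, timezone


def build_sbom(product, version, components):
    """Build a simplified SPDX-style document as a dictionary.

    components is a list of (name, version, license_id) tuples.
    """
    root_id = "SPDXRef-Package-root"
    packages = [{
        "SPDXID": root_id,
        "name": product,
        "versionInfo": version,
        "downloadLocation": "NOASSERTION",
    }]
    relationships = []
    for i, (name, ver, license_id) in enumerate(components):
        spdx_id = f"SPDXRef-Package-{i}"
        packages.append({
            "SPDXID": spdx_id,
            "name": name,
            "versionInfo": ver,
            "downloadLocation": "NOASSERTION",
            "licenseDeclared": license_id,
        })
        # Record that the product depends on this component.
        relationships.append({
            "spdxElementId": root_id,
            "relationshipType": "DEPENDS_ON",
            "relatedSpdxElement": spdx_id,
        })
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "spdxVersion": "SPDX-2.3",
        "dataLicense": "CC0-1.0",
        "SPDXID": "SPDXRef-DOCUMENT",
        "name": f"{product}-{version}",
        # Hypothetical namespace; a real one should be unique per document.
        "documentNamespace": f"https://example.com/spdx/{product}-{version}",
        "creationInfo": {
            "created": created,
            "creators": ["Tool: example-sbom-sketch"],
        },
        "packages": packages,
        "relationships": relationships,
    }


sbom = build_sbom(
    "example-app", "1.0.0",
    [("requests", "2.31.0", "Apache-2.0"), ("click", "8.1.7", "BSD-3-Clause")],
)
print(json.dumps(sbom, indent=2))
```

In practice an SBOM is usually generated by build or scanning tools rather than written by hand; the point here is only to show what kind of information the document carries.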

Explore the current state of software supply chains, and why Continuous Packaging is critical to secure CI/CD pipelines in our newly released report with O'Reilly.

Download the O'Reilly report for free: The Rise of Continuous Packaging

Package License Compliance

A software license agreement is a legal document, chosen by the software company or developer, that sets out how a user can use the software. It should be included in the package. There are many software licenses with different legal terms, support agreements, limitations, and costs.

Most software licenses fall into one of two groups:

Free and Open Source Software (FOSS) licenses, e.g., GNU General Public License (GPL), Apache, BSD, and MIT
Proprietary software licenses, e.g., an End User License Agreement (EULA)

A package repository should match the license defined within a package’s metadata to a known license as accurately as possible. For example, a BSD license specified within a package’s metadata can be checked against the list of valid SPDX license identifiers. This license reporting functionality gives developers more visibility, control, and management across all aspects of their package management.
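A minimal sketch of that matching step follows. The identifiers are a small subset of the SPDX license list, the alias table is invented for illustration, and this is not a description of how Cloudsmith or any other product implements license detection.

```python
# A small subset of SPDX license identifiers; the full list is much longer.
KNOWN_SPDX_IDS = {
    "MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause",
    "GPL-2.0-only", "GPL-3.0-only",
}

# Hypothetical aliases for free-form license strings found in metadata.
ALIASES = {
    "mit license": "MIT",
    "apache 2.0": "Apache-2.0",
    "apache license 2.0": "Apache-2.0",
    "new bsd": "BSD-3-Clause",
    "simplified bsd": "BSD-2-Clause",
}


def match_license(declared):
    """Return the SPDX identifier for a declared license, or None."""
    cleaned = declared.strip().lower()
    by_lower = {spdx_id.lower(): spdx_id for spdx_id in KNOWN_SPDX_IDS}
    if cleaned in by_lower:
        return by_lower[cleaned]
    # Fall back to the alias table; unknown strings stay unmatched.
    return ALIASES.get(cleaned)


for declared in ["bsd-3-clause", "Apache 2.0", "BSD"]:
    print(declared, "->", match_license(declared))
```

A bare "BSD" is left unmatched on purpose: it could mean more than one BSD variant, so flagging it for review is safer than guessing.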

Software Distribution

Software vendors can use private package repositories to distribute their software. Private repositories that can distribute software packages eliminate the need to rehost the package elsewhere for customers and all the management associated with that.

Software vendors distributing software may need their private repository to:

Provide a reliable and fast software package distribution.
Control who downloads your package. Cloudsmith does this with entitlement tokens.
Gather metrics on downloads.
Manage license agreements.
Create custom domains.
Provide Service Level Agreements (SLAs) to guarantee service levels.

Package Delivery Network

A content delivery network (CDN) refers to a geographically distributed group of servers that work together to provide fast delivery of Internet content.

At Cloudsmith, we developed what we call the Package Delivery Network, or PDN. It’s like a highly customized CDN that knows that it deals with packages, package authentication, and client package management tooling. It helps deliver packages faster to distributed users.

Package Repositories

Package management was a lot simpler at the start of my tech career when it was just me, an exe, and an FTP server!

Package management has always been complex, even in a more straightforward landscape with a small number of dependent packages and only one package format. Modern package repositories need to host many formats and handle complicated dependencies from many feeds, while also dealing with the problems of scaling, distribution, and security.

Check out part 2 of this blog below, where we delve into trends in the software landscape that have changed what developers and organizations want from a package repository.