Private Package Repositories Part 2: The Influencers

Jan 25 2022

In Part 1 of our package repositories series, we explained important terms like packages, metadata, dependencies, and upstreams. In Part 2, we take it further, diving into trends in the software landscape that have changed what developers and organizations want from a package repository.

In recent years we’ve seen a push toward managed services in the cloud, automation, and supply chain security. These practices and challenges have influenced what package repositories and package management mean in 2021 and what they will mean for the future of software delivery.

Cloud-Native

The movement towards the cloud is one of the most significant changes in computing for organizations over the last ten years. At a minimum, cloud infrastructure and development have created new package formats such as Docker, Terraform, and Helm charts.

Much more than that, developers and organizations don’t just want package management software; they want a ‘managed’ package management service. A managed service eliminates the cost of supporting an in-house system, makes access to packages more reliable, and scales as the organization grows.

Organizations and developers don’t want to worry about infrastructure, patching, upgrades, replication, or scaling. They want their package repository service to be highly available and to be managed and accessed securely in the cloud. For a package management solution to exploit the flexibility, scalability, and resilience of cloud computing, it needs to be architected to be cloud-native.

Automation

Continuous integration and continuous delivery (CI/CD) is a method to frequently deliver builds by introducing automation into the stages of software development. The whole purpose of this is to release quality code faster.

CI achieves continuous flow for code. CD achieves continuous flow for delivery. But what glues them together? Continuous packaging (CP) is the term to describe maximizing process and flow in software packaging using automation. Without CP, CI/CD is missing continuous flow for the process of packaging (creating, fetching, inspecting, and managing packages). CP means that assets are always traceable, deployable, and built in the same way.

The process of packaging includes creating packages, assembling external packages, inspecting/managing artifacts, token creation, downloading, installing artifacts, event logging, and metadata extraction. For CP to work, developers, CI/CD systems, and scanning tools need to be able to interact with the process of packaging easily and programmatically using well-documented APIs, CLIs, and integrations.

Adding CP to your software process avoids the ad hoc construction or retrieval of assets, and gives a traceable and visible history of promotion from the source (developers and external) right through to delivery (whether internal or external).
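As a rough illustration of the traceable promotion history described above, the sketch below keeps one record per artifact and appends an event each time it moves a stage. The `PackageRecord` class, its field names, the stage names, and the sample package are all hypothetical; a real repository would expose this through its own API or CLI.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


@dataclass
class PackageRecord:
    """Hypothetical record a packaging step might keep for one artifact."""
    name: str
    version: str
    sha256: str
    source_commit: str
    history: list = field(default_factory=list)  # ordered promotion events

    def promote(self, stage: str, actor: str) -> None:
        # Append rather than overwrite so the full path stays visible.
        self.history.append({"stage": stage, "actor": actor})


def record_artifact(data: bytes, name: str, version: str, commit: str) -> PackageRecord:
    # The digest ties the record to the exact bytes that were built.
    digest = hashlib.sha256(data).hexdigest()
    return PackageRecord(name, version, digest, commit)


# Illustrative values only.
record = record_artifact(b"example wheel bytes", "billing-client", "1.4.0", "abc1234")
record.promote("ci-build", "ci-bot")
record.promote("staging", "ci-bot")
record.promote("production", "release-manager")
print(json.dumps(asdict(record), indent=2))
```

Because every promotion is appended to the same record, anyone inspecting the artifact can see which commit it came from and which stages it passed through.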

Distributed Teams

Distributed teams were always quite common in software development, but Covid has supercharged their adoption, even in small companies.

How does this affect package repositories? Before joining Cloudsmith, I worked in a few distributed teams where I experienced serious lag when pushing and pulling packages compared to my colleagues in other regions. A typical problem would be having a limited number of licenses for our private repository: it might be deployed on servers in the US, but not in Europe. It was frustrating, affected collaboration, and slowed down testing and building.

It’s not acceptable for some teams to experience low latency while other geographically distributed teams have to put up with significant delays. Package repository tools in the past dealt with this by replicating servers globally, but this becomes difficult to manage and troubleshoot as the number of regions increases. Cloud-native package repositories deal with this problem more elegantly, as they can use techniques such as package delivery networks (PDNs) with edge caching to store commonly used packages as close to the users as possible, anywhere in the world.
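To make the edge-caching idea concrete, here is a minimal sketch of a single regional cache that serves repeat requests locally and only goes back to the origin on a miss. The `EdgeCache` class, the least-recently-used eviction policy, and the package names are assumptions for illustration, not a description of how any particular service works.

```python
from collections import OrderedDict


class EdgeCache:
    """Tiny LRU cache standing in for one regional edge node."""

    def __init__(self, origin, capacity=2):
        self.origin = origin          # dict acting as the far-away origin store
        self.capacity = capacity
        self.items = OrderedDict()
        self.origin_fetches = 0

    def get(self, name):
        if name in self.items:
            self.items.move_to_end(name)    # mark as most recently used
            return self.items[name]
        self.origin_fetches += 1            # slow path: go back to the origin
        data = self.origin[name]
        self.items[name] = data
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used
        return data


# Illustrative package names only.
origin = {"libfoo-1.0": b"foo", "libbar-2.1": b"bar", "libbaz-0.3": b"baz"}
edge = EdgeCache(origin, capacity=2)
for name in ["libfoo-1.0", "libfoo-1.0", "libbar-2.1", "libfoo-1.0"]:
    edge.get(name)
print(edge.origin_fetches)  # 2: only the first request for each package left the region
```

Commonly requested packages stay in the cache, so most pulls from a region never pay the latency of a round trip to the origin.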

Emphasis on Supply Chain Security

The software supply chain (SSC) is all of the steps that go into deploying or distributing your software, from the initial development stage to testing, packaging, and deployment. It includes your code, scripts, environment variables, IDEs, plugins, source code repositories, CI/CD tools, scanning tools, and of course package repositories. The attack surface for the software supply chain is vast. Recent attacks like SolarWinds and CodeCov, for example, prompted efforts to improve the security of software supply chains. Where you push and pull your software artifacts from is intrinsic to securing the supply chain, which has highlighted the importance of package repositories.

Robust Security

First things first: package repositories need strong security features to prove they are trustworthy:

Robust access control with 2FA for distribution and development
Event logs
High availability
Encryption of all communication and storage, in transit and at rest

A Single Source of Truth

Private repositories that support many formats provide one single place to track, manage, distribute, and understand all software pulled into your stack. A central trusted store forces you to apply processes and controls to the ingress and egress of software packages.

Provenance of Packages

Package repositories can secure your packages and interrogate their provenance. To do that, they need to:

Extract, store, and surface package metadata. This includes information on dependencies, licenses, versions, who wrote the code, results from vulnerability scans, and information from CI tools, all of which is intrinsic to resolving the provenance of software packages.
Attest (prove to outside parties) to the provenance of all the software assets and their dependencies by signing and verifying every package uploaded.
Provide event logs on package usage.
Provide upstreams for outside packages hosted elsewhere, to protect against outages in third-party repositories.
Provide all of the packages needed in a Software Bill of Materials (SBOM).
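The idea of signing and verifying every uploaded package can be sketched in a few lines. This example uses HMAC-SHA256 from the Python standard library as a stand-in for a real signature scheme (package ecosystems typically use asymmetric keys, which the standard library does not provide); the key, function names, and package contents are all hypothetical.

```python
import hashlib
import hmac


def sign_package(data: bytes, key: bytes) -> str:
    # HMAC-SHA256 stands in for a real signature scheme in this sketch.
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_package(data: bytes, key: bytes, signature: str) -> bool:
    expected = sign_package(data, key)
    # compare_digest does a constant-time comparison of the two values.
    return hmac.compare_digest(expected, signature)


# Illustrative values only; never hard-code a real key.
key = b"demo-key-not-for-production"
package = b"contents of example-1.0.0.tar.gz"
signature = sign_package(package, key)

print(verify_package(package, key, signature))                # True
print(verify_package(package + b"tampered", key, signature))  # False
```

The point is the workflow rather than the primitive: a signature is recorded at upload, and any later change to the package bytes makes verification fail.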

Automation

Package repositories should promote automation by applying Continuous Packaging (CP) techniques to integrate programmatically with CI, CD, and scanning tools. Automating as much of the software supply chain as possible, and making that automation easy, can significantly reduce the possibility of human error, improve quality and traceability, and help make builds more reproducible.

Your package repository can help you build trust in your software supply chain by giving you automated visibility and control over every single package in your software: the single source of truth for all your software artifacts. Even in situations where the supply chain has been compromised, if you have visibility and control, you’ll be in a much better place to identify the who, how, where, why, and what of the compromise, and have a much greater chance of fixing the issue or minimizing its impact.

Languages with Community-Based Package Management

Before the adoption of community-based package managers, public language repositories, e.g., PEAR for PHP, were slow to include new packages and subject to a review board populated by a few of the language's elder statesmen. Languages with community-based package management, e.g., npm for JavaScript and PyPI for Python, make publishing and consuming packages easy. This ease of use has made them popular and has accelerated both the development process and the use of OSS, but it has also introduced some security issues.

Popular package repositories such as npm, PyPI, RubyGems, Go, and others have been impacted by malicious attacks such as dependency confusion and typosquatting. In addition, the public repositories that host these packages can’t guarantee uptime; private repositories with upstreams can protect against outages. These issues are related to the previous section on securing the supply chain.
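Both attacks can be checked for mechanically. The sketch below flags a requested name that is suspiciously close to, but not the same as, a well-known package (typosquatting), and flags an internal package name that would be resolved from anywhere other than the private repository (dependency confusion). The package lists, the 0.85 similarity threshold, and the `"private"` source label are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

KNOWN_PACKAGES = {"requests", "numpy", "pandas"}  # illustrative list of trusted names
INTERNAL_PACKAGES = {"acme-billing"}              # hypothetical private package names


def looks_like_typosquat(name, known=KNOWN_PACKAGES, threshold=0.85):
    """Return the known package this name imitates, or None."""
    if name in known:
        return None  # exact matches are the real thing
    for candidate in sorted(known):
        if SequenceMatcher(None, name, candidate).ratio() >= threshold:
            return candidate
    return None


def is_dependency_confusion(name, source, internal=INTERNAL_PACKAGES):
    """True if an internal name would be fetched from a non-private source."""
    return name in internal and source != "private"


print(looks_like_typosquat("reqeusts"))                    # requests
print(looks_like_typosquat("requests"))                    # None
print(is_dependency_confusion("acme-billing", "public"))   # True
print(is_dependency_confusion("acme-billing", "private"))  # False
```

A check like this is only a first filter: a similarity threshold will miss some lookalike names and flag some legitimate ones, so it complements rather than replaces scanning and signing.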

Node and npm were my first experience of a community-hosted OSS package repository. When vetting new npm packages, I was always worried about adding an unmaintained package or code that could damage the wider project. Is it enough to check the git link, license type, date last updated, the number of downloads, and the listed dependencies? Not really. There needs to be an automated, reproducible way to trust that a package and its dependencies are not malicious.

Signing can be used to build up trust in packages, but as we discussed in Part 1 of this blog series, signing OSS packages has problems. Work is being done to sign OSS packages in a transparent way, but currently, community-hosted OSS packages are not commonly signed. In the absence of a trusted, signed OSS package, package repositories can scan OSS packages for known vulnerabilities and extract metadata such as version, who wrote the code, results from scans, or license information, which can provide insight into the provenance of the software package.

Design Patterns

Design patterns such as REST encouraged developing a strong interface for other programs to use over HTTP. RESTful services made using other web services easy and more reliable. Each web service could potentially be written in a different language as long as the interface was maintained.

More recently, the microservices design pattern gave more teams or individual software developers the confidence to use new languages to develop new services within the same product. One of the possible downsides of the microservices design pattern is that it can produce many packages in different formats. Having many package formats is only a downside if your package repository doesn’t support your chosen package type and you need to manage another repository.

Modern package repositories need to be able to manage and host multiple package formats.

What do I want from a Package Repository?

Package repositories have had to evolve as software development changed, influenced by cloud adoption, DevOps, OSS, changing software practices, new security threats, and the rise of new package formats.

So, what do I want from a package repository tool? I want it to:

Store all formats of my packages for languages, OSs, and containers.
Allow me to distribute packages to customers.
Be easy to automate and integrate with CI/CD and security tools.
Provide strong and intuitive security access controls.
Help me attest to the packages in the software supply chain.
Have no loss of speed no matter where my team is.
Oh, and be simple to use with great docs and support.

To do this I need a package management solution that:

Is entirely cloud-native
Is universal: it can host any package, anywhere in the world
Can work with dependencies located in other repositories and help make what goes into my software more transparent
Applies continuous packaging techniques to improve my CI/CD pipeline
Is a central, trusted store that forces me to apply processes and controls to the ingress and egress of software packages
Is built by a company that values support.
