An Introduction to Conda
At Cloudsmith, our mission is to be the universal package management solution for teams and enterprises. As a result, we continually expand and improve the package formats we support. We are delighted to be able to introduce support for public and private Conda package repositories.
In this article, we will help you to understand:
- What is Conda?
- What is a Conda package?
- How to install and use Conda packages
- Common use cases for Conda packages
- Some valuable resources for Conda
- The difference between Anaconda and Conda
What exactly is Conda?
Conda is a multi-platform open-source package management system. It was initially created to solve package management problems for Python data scientists and is now a popular package manager for Python and R packages, but you can also use it to manage packages for any language. Like all package management systems, Conda helps you create, find, and install packages you need.
Conda is also an environment manager. If you need to use a package that requires a different version of Python, you can easily set up and switch to a development environment that uses that specific version, without changing the version of Python in your usual environment.
Conda started as part of the Anaconda Python Distribution. It gained popularity in its own right for uses beyond Python and R package management, and was then spun out as a separate project under a BSD licence.
What is a Conda Package?
A Conda package is typically a compressed tarball file (.tar.bz2) or, from Conda version 4.7 onwards, a .conda file. The .conda format was introduced as a smaller and more performant alternative to the tarball. Either kind of file usually contains:
- System-level libraries.
- Python or other modules.
- Executable programs and other components.
- Metadata under the info/ directory.
- A collection of files that are installed directly into an install prefix.
The structure of a Conda package looks like this:
├── bin
│   └── pyflakes
├── info
│   ├── LICENSE.txt
│   ├── files
│   ├── index.json
│   ├── paths.json
│   └── recipe
└── lib
    └── python3.5
bin contains relevant binaries for the package.
lib contains the relevant library files (e.g. the .py files).
info contains package metadata.
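To make this layout concrete, here is a minimal Python sketch that builds a tiny .tar.bz2 archive in memory with bin/, lib/ and info/ directories and then reads info/index.json back out. The package name, version and dependency list are made up for illustration, and a real index.json carries more fields than shown. The sketch only covers the tarball format, not .conda files.

```python
import io
import json
import tarfile


def build_demo_package() -> bytes:
    """Build an in-memory .tar.bz2 that mimics the layout of a Conda package."""
    # Hypothetical metadata; real index.json files contain additional fields.
    index = {"name": "demo-pkg", "version": "1.0.0", "depends": ["python >=3.8"]}
    members = {
        "bin/demo": b"#!/bin/sh\necho demo\n",
        "lib/python3.5/site-packages/demo.py": b"print('demo')\n",
        "info/index.json": json.dumps(index).encode("utf-8"),
    }
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:bz2") as archive:
        for path, payload in members.items():
            entry = tarfile.TarInfo(name=path)
            entry.size = len(payload)
            archive.addfile(entry, io.BytesIO(payload))
    return buffer.getvalue()


def read_index(package_bytes: bytes) -> dict:
    """Return the parsed info/index.json from a .tar.bz2 package archive."""
    with tarfile.open(fileobj=io.BytesIO(package_bytes), mode="r:bz2") as archive:
        return json.load(archive.extractfile("info/index.json"))


def top_level_dirs(package_bytes: bytes) -> list:
    """List the top-level directories found inside the archive."""
    with tarfile.open(fileobj=io.BytesIO(package_bytes), mode="r:bz2") as archive:
        return sorted({name.split("/")[0] for name in archive.getnames()})


if __name__ == "__main__":
    package = build_demo_package()
    print(top_level_dirs(package))
    print(read_index(package))
```

To inspect a real package on disk, the same read_index logic can be pointed at the file with tarfile.open(path, mode="r:bz2").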
Where can I get Conda packages?
As with other package managers, Conda packages are published in shared collections. Other package formats often use terms like “repository”, “feed”, or “registry” for these collections, but Conda packages are stored in “channels”.
A channel is a location where Conda packages are stored. In practice, channels are URLs to directories containing Conda packages.
The default channel for Conda is https://repo.anaconda.com/pkgs/, which contains thousands of Conda packages, including those maintained by Anaconda. You can also modify the channels you wish to search for packages, and add alternatives such as Conda-Forge (a community that provides Conda packages for a wide range of software) or even your own private or internal Conda channels, such as a private Cloudsmith repository.
For more information on configuring and managing Conda channels, please see the Conda channel documentation.
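Because a channel is just a URL to a directory tree, it can help to see how an index URL is put together. The sketch below assumes the usual channel layout, in which packages are grouped into per-platform subdirectories (such as linux-64, plus noarch for platform-independent packages), each with its own repodata.json index. The helper names are hypothetical, and the conda-forge URL is used purely as an example.

```python
from typing import List


def repodata_url(channel_url: str, subdir: str) -> str:
    """Return the URL of the repodata.json index for one channel subdirectory."""
    return f"{channel_url.rstrip('/')}/{subdir}/repodata.json"


def index_urls(channel_url: str, platform: str) -> List[str]:
    """Return the index URLs for a platform plus the shared noarch subdirectory."""
    return [repodata_url(channel_url, subdir) for subdir in (platform, "noarch")]


if __name__ == "__main__":
    # Example channel; a private channel URL would be handled the same way.
    for url in index_urls("https://conda.anaconda.org/conda-forge", "linux-64"):
        print(url)
```

The sketch performs no network access; it only shows how a channel URL and a platform name combine into the index locations.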
Who uses Conda?
The most common users of Conda are people developing and working with large-scale data science, analysis and machine learning applications.
Why? Well, these users often face unique problems with package management that general-purpose package managers do not solve, such as needing to use multiple different versions of a package.
A core difference between Conda and the standard package manager for Python, pip, is in how any package dependencies are managed. Because Conda is also an environment manager, it can install different versions of any requested packages and their dependencies without conflicting with any existing installed packages.
To put this another way, pip is the general-purpose package manager for Python packages, and it installs packages within any environment. Conda installs packages within Conda environments.
While you can also achieve this isolation of environments via other solutions (such as Python virtualenv), it is standard and built-in when using Conda. So that means less to set up.
How do I install a Conda package?
Once you have Conda installed, you can verify that your installation is working correctly with a command such as:
conda info
The conda info command displays all the details of your current Conda installation.
Next, you can create a new Conda environment with the conda create command:
conda create --name demo-env
You can then activate this environment with the conda activate command:
conda activate demo-env
Activating an environment is vital to ensure that any installed Conda packages work correctly and are isolated from other package versions that you may have on your system or in other projects. If an environment is not active, specific libraries and packages may not be found, which is a common source of errors.
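Since a missing activation is easy to overlook, a script can check for it before doing any work. The following is a minimal sketch that assumes conda activate sets the CONDA_PREFIX and CONDA_DEFAULT_ENV environment variables, as current conda releases do. The function name is hypothetical.

```python
import os
from typing import Mapping, Optional


def active_conda_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the active Conda environment, or None if none is active."""
    prefix = environ.get("CONDA_PREFIX")
    if not prefix:
        return None
    # Fall back to the last path component if the name variable is missing.
    return environ.get("CONDA_DEFAULT_ENV") or os.path.basename(prefix)


if __name__ == "__main__":
    name = active_conda_env()
    if name is None:
        print("No Conda environment is active; run 'conda activate <env>' first.")
    else:
        print(f"Active Conda environment: {name}")
```

Passing the environment in as a mapping keeps the check easy to test without touching the real process environment.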
Once your environment is active, you can install Conda packages with the conda install command. For example:
conda install numpy
This command will install the numpy package and its associated dependencies into the environment. You are now ready to go and use the functionality provided by the package, and the specific version of this package you are using is isolated from other packages outside of this environment.
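If you want to script these steps rather than type them, one possible approach is to drive the conda command line from Python. The sketch below is an illustration under stated assumptions, not an official API: the function names are hypothetical, it targets the environment with --name instead of activating it (activation relies on shell integration that a subprocess does not have), and it passes --yes to skip the confirmation prompt. By default it only returns the commands it would run.

```python
import shutil
import subprocess
from typing import List, Optional


def create_env_command(env_name: str, python_version: Optional[str] = None) -> List[str]:
    """Build a 'conda create' command, optionally pinning a Python version."""
    command = ["conda", "create", "--name", env_name, "--yes"]
    if python_version:
        command.append(f"python={python_version}")
    return command


def install_command(env_name: str, packages: List[str]) -> List[str]:
    """Build a 'conda install' command that targets a named environment."""
    return ["conda", "install", "--name", env_name, "--yes", *packages]


def run_workflow(env_name: str, packages: List[str], dry_run: bool = True) -> List[List[str]]:
    """Create the environment and install packages; return the commands used."""
    commands = [create_env_command(env_name), install_command(env_name, packages)]
    # Only execute when explicitly requested and when conda is on the PATH.
    if not dry_run and shutil.which("conda") is not None:
        for command in commands:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    for command in run_workflow("demo-env", ["numpy"]):
        print(" ".join(command))
```

Calling run_workflow("demo-env", ["numpy"], dry_run=False) would execute the commands, provided conda is installed and available on the PATH.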
Conda Communities & Tooling
Several distinct communities utilize the Conda package format. Each of these communities has its own unique use case and tooling, and we've listed some of our favorites below:
PyData is an international community of users and developers of data analysis tools.
Anaconda is the home of the Anaconda Python distribution.
The Anaconda repository is the default channel for Conda packages.
Conda-Forge is a community-led collection of recipes, build infrastructure and distributions for the conda package manager.
Conda is a versatile package manager, suitable for a wide variety of uses but particularly suited to data analysis, data science and machine learning applications in Python or R. Conda is widely supported and has an active community of users and contributors.
The combination of package management and environment management reduces package version conflicts and enables effective isolation when working on multiple projects or tasks.
If you’re curious about Conda package management, start a free trial with Cloudsmith today. You can open your first repository within a matter of minutes.
Securely host your Conda packages alongside other package formats in the same single repository (read more about our multi-format repositories), and don't forget to take a look at our detailed Conda documentation.
Is Conda a Language?
No - it’s a package format and the name of the associated command-line tool for interacting with conda repositories.
What is the difference between Anaconda and Conda?
Anaconda is a distribution of the Python and R languages with a specific focus on scientific computing applications like data science and machine learning. A software distribution is a pre-configured collection of packages you can use on a system.
Anaconda is a full distribution of the primary software in the PyData ecosystem, and it includes Python and packages for hundreds of the most popular open-source projects. It is developed and maintained by Anaconda, Inc. It is available both in free editions for personal use and in Team and Enterprise editions for commercial use.
Conda is a distinct entity. It is the tool you use to interact with Conda repositories: to build, push, and install packages, and to manage your local Conda environments.
What is Miniconda?
Miniconda is a bare-bones version of the Anaconda Python Distribution. The Anaconda Python Distribution consists of conda, Python, over 250 conda packages and the Anaconda Navigator (a GUI alternative to the command line tooling); Miniconda just contains conda, Python (and their dependencies) and a small number of additional packages.