🤖 Bridging Two Worlds: How portage-pip-fuse Brings 750,000+ Python Packages to Gentoo


Every Gentoo user who works with Python knows the friction: a package exists on PyPI but not in ::gentoo, so you either write an ebuild or reach for pip install --user and accept that Portage will never know about it. I got tired of this and built something to fix it.

portage-pip-fuse is a FUSE-based virtual filesystem that presents the entire PyPI ecosystem as a Portage overlay. Mount it, and emerge dev-python/requests works — for any of the 750,000+ packages on PyPI that ship source distributions. Ebuilds are generated on the fly from PyPI metadata, either by querying the API directly or from an optional local SQLite database. Nothing is stored on disk. The overlay looks like any other Portage repository, but its contents are computed at the moment Portage requests them.

This is also a security advantage. When pip installs a package, the build runs with your full privileges and network access. When Portage installs the same package, its sandbox constrains the build: filesystem writes are restricted to the build directories and network access is blocked. The isolation the security community is asking for in Python packaging already exists in Portage.

The tool includes a pip subcommand that translates pip syntax to emerge, a virtual/{project} ebuild generator for managing project dependencies without installing the project itself, and a .sys/ virtual control plane for declaratively patching generated ebuilds — fixing compatibility issues, adjusting dependencies, configuring builds — without maintaining a local overlay.

Source code at github.com/Miriup/portage-pip-fuse. GPL-2.0, alpha-stage, contributions welcome.

Ebuilds that don’t exist until you look

Rather than pre-generating ebuilds — the approach taken by g-sorcery and gs-pypi, which doesn’t scale — portage-pip-fuse generates them at the moment Portage requests them, through a FUSE virtual filesystem mounted at /var/db/repos/pypi. The overlay looks and behaves like any other Portage repository, but its contents are computed, not stored.

By default, the tool queries PyPI’s JSON API directly. On a reasonably fast connection, Portage operates over the full 750,000+ packages without any local database. For faster lookups or offline use, you can optionally download PyPI’s daily SQLite metadata dump (~1 GB compressed) — but this is an optimisation, not a requirement.

The tool filters out packages that ship only binary wheels (Gentoo builds from source) and packages with no overlap with Portage’s allowed PYTHON_TARGETS. Everything else is translated on the fly: PYTHON_COMPAT from classifiers, dependencies mapped to Gentoo atoms, version numbers converted, Manifest files generated with PyPI checksums.

Security through Portage’s sandbox

This is also a security story. Installation time is the critical vulnerability in the PyPI supply chain: setup.py execution, PEP 517 build backends, even metadata extraction can trigger arbitrary code. When you pip install a package, pip runs the build with your user’s full privileges and network access. When you emerge that same package through portage-pip-fuse, Portage’s sandbox constrains what the build process can do: filesystem writes are restricted to the build directories and network access is blocked. The isolation the security community is asking for in Python packaging already exists in Portage.

Developer workflow with virtual/ packages

The pip subcommand translates pip syntax to emerge commands. For requirements files, it creates a virtual/{project} ebuild in a separate overlay — emerge virtual/odoo installs all of Odoo’s dependencies through Portage while Odoo itself stays out of the system. You work on the application in your development tree with all dependencies satisfied system-wide. Only the dependency footprint gets installed; the application you’re developing never does.

Declarative patching through .sys/

The mounted filesystem exposes a .sys/ directory for modifying generated ebuilds at runtime. Need to remove Python 3.13 compatibility from a package? Fix a dependency conflict caused by Gentoo revision bumps? Configure a package to use system libraries instead of bundled ones? Write to the appropriate path under .sys/ and the generated ebuild changes accordingly — no local overlay, no forked ebuilds, no waiting for upstream.

This gives you the same kind of declarative control over PyPI packages that /etc/portage/patches/ gives you over Portage packages. The modifications are transparent, reproducible, and version-controllable.

The source code is at github.com/Miriup/portage-pip-fuse. GPL-2.0, alpha-stage, contributions welcome.

The core idea: ebuilds that don’t exist until you look

The predecessor tools — g-sorcery and gs-pypi — attempted this bridge by generating ebuilds and storing them on disk. That approach doesn’t scale. You cannot pre-generate ebuilds for three quarters of a million packages and keep them current.

portage-pip-fuse takes a different path. It generates ebuilds at the moment Portage requests them, through a FUSE virtual filesystem mounted at /var/db/repos/pypi. When Portage reads a directory like dev-python/requests/, the filesystem constructs the ebuild files from PyPI metadata on the fly. When Portage moves on, nothing persists on disk. The overlay looks and behaves like any other Portage repository, but its contents are computed, not stored.
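The idea of computing directory contents on request can be illustrated without any FUSE bindings. The following is a minimal sketch, not the tool's actual code: the class name `LazyOverlay`, the `lookup_versions` callable, and the placeholder ebuild body are all assumptions made for illustration.

```python
from typing import Callable, List


class LazyOverlay:
    """Toy stand-in for the FUSE layer: content is computed per request, never stored."""

    def __init__(self, lookup_versions: Callable[[str], List[str]]) -> None:
        # lookup_versions is a hypothetical metadata source (API client or SQLite query).
        self.lookup_versions = lookup_versions

    def readdir(self, path: str) -> List[str]:
        """List the ebuild files for a path such as 'dev-python/requests'."""
        _category, name = path.strip("/").split("/")
        return [f"{name}-{version}.ebuild" for version in self.lookup_versions(name)]

    def read(self, path: str) -> str:
        """Build the text of one ebuild at the moment it is read."""
        _category, name, filename = path.strip("/").split("/")
        version = filename[len(name) + 1 : -len(".ebuild")]
        if version not in self.lookup_versions(name):
            raise FileNotFoundError(path)
        # Placeholder body; a real generated ebuild carries far more metadata.
        return f'EAPI=8\nDESCRIPTION="{name} {version} (generated from PyPI metadata)"\n'
```

A real FUSE filesystem would expose the same two operations through its `readdir` and `read` callbacks; the point of the sketch is only that neither method writes anything to disk.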

The default mode requires no preparation at all. When Portage requests a package, portage-pip-fuse queries PyPI’s JSON API directly, fetches the metadata, and generates the ebuild on the spot. I developed this on a server with a reasonably fast internet connection, and it works well — Portage operates over the full collection of 750,000+ packages without any local database.
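For readers unfamiliar with PyPI's JSON API, a lookup can be sketched roughly as follows. The endpoint `https://pypi.org/pypi/<name>/json` is PyPI's public one; the helper names and the choice of fields to keep are assumptions for illustration, not how portage-pip-fuse is necessarily structured.

```python
import json
import urllib.request

PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def metadata_url(name: str) -> str:
    """Return the JSON API URL for one project."""
    return PYPI_JSON_URL.format(name=name)


def fetch_metadata(name: str, timeout: float = 10.0) -> dict:
    """Download and decode the metadata document (needs network access)."""
    with urllib.request.urlopen(metadata_url(name), timeout=timeout) as response:
        return json.load(response)


def summarise(metadata: dict) -> dict:
    """Pick out the fields an ebuild generator would plausibly need."""
    info = metadata["info"]
    return {
        "name": info["name"],
        "version": info["version"],
        "summary": info.get("summary") or "",
        # requires_dist is null in the API when a project declares no dependencies.
        "requires_dist": info.get("requires_dist") or [],
    }
```

Calling `summarise(fetch_metadata("requests"))` performs one HTTP request per package, which is why a fast connection matters in this mode.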

For systems where you want faster lookups or offline capability, there is a second mode. The pypi-data project publishes PyPI’s entire metadata catalogue as a daily SQLite dump, roughly 1 GB compressed and 10 GB uncompressed. Download it once with portage-pip-fuse sync, and all lookups become local. But this is an optimisation, not a requirement. The tool works out of the box with nothing but a network connection.

Not everything on PyPI makes sense on Gentoo

Gentoo builds from source. A large portion of PyPI packages ship only binary wheels — precompiled artifacts that bypass the build process entirely. portage-pip-fuse filters these out, showing only packages with source distributions. It further filters out packages that have no overlap at all with the Python versions allowed by Portage’s PYTHON_TARGETS — if a package supports none of the Python versions Portage recognises, there is no point showing it.
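The two filters described above might look something like this. It is a sketch under assumptions: the `packagetype` field and the "Programming Language :: Python :: X.Y" classifiers are real PyPI metadata, but the function names are hypothetical and the tool's real handling of projects without version classifiers is not covered by the article.

```python
import re
from typing import Iterable, List, Set

_CLASSIFIER_PREFIX = "Programming Language :: Python :: "


def has_sdist(files: Iterable[dict]) -> bool:
    """True if at least one release file is a source distribution."""
    return any(f.get("packagetype") == "sdist" for f in files)


def declared_targets(classifiers: List[str]) -> Set[str]:
    """Map 'Programming Language :: Python :: 3.12' to 'python3_12'."""
    targets = set()
    for classifier in classifiers:
        if classifier.startswith(_CLASSIFIER_PREFIX):
            version = classifier[len(_CLASSIFIER_PREFIX):]
            # Skip generic entries such as '3' or '3 :: Only'.
            if re.fullmatch(r"\d+\.\d+", version):
                targets.add("python" + version.replace(".", "_"))
    return targets


def is_visible(files: Iterable[dict], classifiers: List[str], allowed: Set[str]) -> bool:
    """Show a package only if it has an sdist and shares a Python target with Portage."""
    return has_sdist(files) and bool(declared_targets(classifiers) & allowed)
```

Here `allowed` stands for the set of implementations Portage accepts in PYTHON_TARGETS, for example `{"python3_12", "python3_13"}`.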

The dynamic ebuild generation handles the translation work between PyPI’s conventions and Gentoo’s: PYTHON_COMPAT declarations derived from package classifiers, dependency atoms mapped from PyPI’s requirement syntax to Portage’s, version numbers converted (PyPI’s 2.0a1 becomes Gentoo’s 2.0_alpha1), and Manifest files with checksums sourced from PyPI’s API.
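The version conversion can be sketched with a single regular expression. Only the `2.0a1` to `2.0_alpha1` case comes from the article; mapping `.postN` to `_pN` and `.devN` to `_preN` is an assumption based on Gentoo's usual version suffixes, and epochs and local version labels are deliberately left unhandled.

```python
import re

_PEP440 = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)"
    r"(?:(?P<pre_tag>a|b|rc)(?P<pre_num>\d+))?"
    r"(?:\.post(?P<post>\d+))?"
    r"(?:\.dev(?P<dev>\d+))?$"
)

_PRE_SUFFIX = {"a": "_alpha", "b": "_beta", "rc": "_rc"}


def to_gentoo_version(version: str) -> str:
    """Translate a simple PEP 440 version into Gentoo's suffix notation."""
    match = _PEP440.match(version.strip().lower())
    if match is None:
        raise ValueError(f"unsupported version: {version!r}")
    result = match["release"]
    if match["pre_tag"]:
        result += _PRE_SUFFIX[match["pre_tag"]] + match["pre_num"]
    if match["post"] is not None:
        result += "_p" + match["post"]  # assumed mapping
    if match["dev"] is not None:
        result += "_pre" + match["dev"]  # assumed mapping
    return result
```

Versions the pattern does not recognise raise `ValueError` rather than being guessed at, which seems the safer default for a generator that feeds Portage.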

Why this is also a security story

I recently published a taxonomy of over 100 attack vectors against the Python/PyPI supply chain. One structural insight from that research is that installation time is the critical vulnerability — setup.py execution, PEP 517 build backends, even metadata extraction can trigger arbitrary code. A single source distribution anywhere in the dependency graph is sufficient.

This is where Portage’s architecture becomes a security advantage. Portage already builds packages in a sandboxed environment with FEATURES="sandbox usersandbox network-sandbox". The sandbox restricts filesystem writes to the build directories, and network-sandbox cuts the build off from the network. When you pip install a package, pip runs the build with your user’s full privileges and network access. When you emerge that same package through portage-pip-fuse, Portage’s sandbox constrains what the build process can do. The isolation that the security community is asking for in Python packaging already exists in Portage, and has existed for over two decades.

This doesn’t eliminate all risk. The attack vectors article documents threats that survive sandboxing: side-channel attacks, data-only exploits, kernel bypass techniques. But it does address the most common class of PyPI malware — packages that phone home during installation, exfiltrate credentials, or establish persistence. Portage’s network sandbox stops these cold.

The pip command translator

One of the features I’m most pleased with is the pip subcommand. Python documentation and tutorials universally use pip install syntax. portage-pip-fuse translates these commands into their Portage equivalents:

portage-pip-fuse pip install django flask
# → emerge --ask dev-python/django dev-python/flask

portage-pip-fuse pip install "django>=4.0" "celery~=5.3.0"
# → emerge --ask ">=dev-python/django-4.0" ">=dev-python/celery-5.3.0"
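The translation shown above can be approximated in a few lines. This is a sketch that reproduces the two example outputs and nothing more: the function names are hypothetical, only single-clause specifiers are handled, and `~=` is mapped to `>=` because that is what the example output shows, which drops the upper bound that pip's compatible-release operator implies.

```python
import re
from typing import List

_SPEC = re.compile(
    r"^(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)\s*"
    r"(?:(?P<op>>=|<=|==|~=|>|<)\s*(?P<version>[A-Za-z0-9.]+))?$"
)

# pip operator -> Portage atom prefix
_OPERATORS = {">=": ">=", "<=": "<=", ">": ">", "<": "<", "==": "=", "~=": ">="}


def to_atom(spec: str, category: str = "dev-python") -> str:
    """Turn 'django>=4.0' into '>=dev-python/django-4.0'."""
    match = _SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"unsupported requirement: {spec!r}")
    # Normalise the project name: lower case, runs of -_. collapsed to '-'.
    name = re.sub(r"[-_.]+", "-", match["name"]).lower()
    if match["op"] is None:
        return f"{category}/{name}"
    return f"{_OPERATORS[match['op']]}{category}/{name}-{match['version']}"


def to_emerge(specs: List[str]) -> List[str]:
    """Build the emerge argument list for a pip-style install request."""
    return ["emerge", "--ask"] + [to_atom(spec) for spec in specs]
```

Returning an argument list rather than a joined string avoids shell quoting problems with atoms that begin with `>` or `<`.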

The more powerful case is requirements files. When you pass -r requirements.txt, the tool creates an actual ebuild in a separate “manual” overlay as virtual/{project} — for example, virtual/odoo. This is a deliberate design choice: Portage sets don’t support the full range of dependency declarations you need in practice, like USE dependencies. A real ebuild does. The generated virtual package pulls in all your project’s Python dependencies with proper version constraints, USE flag requirements, and dependency atoms.

The purpose is a clean developer workflow: emerge virtual/odoo installs all of Odoo’s dependencies through Portage — sandboxed, tracked, updateable — while Odoo itself stays out of the system. You work on it in your development tree, with all its dependencies satisfied system-wide. The application you’re developing never gets installed into Portage; only its dependency footprint does.

Declarative patching through .sys/

This is where portage-pip-fuse goes beyond convenience into something I consider genuinely important: a declarative interface for modifying how PyPI packages are built, without maintaining a local overlay.

The mounted filesystem exposes a .sys/ directory — a virtual control plane. Writing to specific paths under .sys/ modifies the ebuilds that portage-pip-fuse generates, without persisting any ebuilds on disk. The modifications themselves live in the .sys/ namespace and can be version-controlled, shared, and reasoned about.

For example, when psycopg2-2.9.5 fails to build with Python 3.13 because of removed C API functions:

echo '-- python3_13' > /var/db/repos/pypi/.sys/python-compat-patch/dev-python/psycopg/2.9.5.patch

When a dependency conflict arises because PyPI specifies exact versions but Gentoo has revision bumps:

echo '-> =dev-python/httpx-0.28.1[${PYTHON_USEDEP}] >=dev-python/httpx-0.28.1[${PYTHON_USEDEP}]' \
  > /var/db/repos/pypi/.sys/RDEPEND-patch/dev-python/open-webui/0.8.5.patch

When a package bundles libraries that Gentoo provides system-wide and you want to use the system versions:

touch /var/db/repos/pypi/.sys/iuse/dev-python/gevent/_all/embed_cares
touch '/var/db/repos/pypi/.sys/DEPEND/dev-python/gevent/_all/net-dns::c-ares'
echo 'export GEVENTSETUP_EMBED_CARES=0' \
  > /var/db/repos/pypi/.sys/ebuild-append/dev-python/gevent/_all/src_configure

The .sys/ interface covers dependency modification (RDEPEND, DEPEND), Python version compatibility, USE flags, custom ebuild phase functions, and PEP 517 backend overrides. Everything is addressable by package, version (or _all for all versions), and concern.
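To make the patch files in the examples above more concrete, here is a sketch of how the two operations that appear in them could be applied to a list of entries. It assumes, from the examples only, that `--` removes an entry and `->` replaces one entry with another; the function name is hypothetical and the real patch format may support more operations than these.

```python
from typing import List


def apply_patch(entries: List[str], patch_text: str) -> List[str]:
    """Apply '-- entry' (remove) and '-> old new' (replace) lines to a list."""
    result = list(entries)
    for raw_line in patch_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        operation, _, rest = line.partition(" ")
        if operation == "--":
            target = rest.strip()
            result = [entry for entry in result if entry != target]
        elif operation == "->":
            # Atoms contain no whitespace, so a plain split yields old and new.
            old, new = rest.split()
            result = [new if entry == old else entry for entry in result]
        else:
            raise ValueError(f"unknown patch operation: {operation!r}")
    return result
```

The same function serves both PYTHON_COMPAT entries and dependency atoms, since each is treated as an opaque string.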

This matters because it gives you the same kind of declarative control over PyPI packages that /etc/portage/patches/ gives you over Portage packages. You can fix things without forking, without maintaining overlay ebuilds, and without waiting for upstream. The modifications are transparent and reproducible.

Practical setup

Getting started takes a few commands:

git clone https://github.com/Miriup/portage-pip-fuse
cd portage-pip-fuse
pip install -e .

sudo portage-pip-fuse install    # creates repos.conf entry
portage-pip-fuse mount           # mounts the virtual overlay

emerge -av dev-python/requests   # done

That’s it. No sync step required — the tool queries PyPI directly. If you want local lookups, run portage-pip-fuse sync to download the SQLite database. For memory-constrained systems, the sync process can be split: download the compressed database (~1 GB), mount an overlayfs with tmpfs on top, then decompress into the ephemeral layer.

What I built this with and why it matters

I wrote portage-pip-fuse in one week with Claude Code. The project has 99 commits and covers FUSE filesystem operations, SQLite query optimisation, PyPI metadata parsing, Portage ebuild generation, dependency resolution mapping, and a complete CLI with subcommands for mount, sync, install, and pip translation.

I mention this not as an aside. The intersection of deep system knowledge and AI-assisted development is exactly the kind of capability I work with. Understanding Portage internals well enough to generate correct ebuilds, understanding PyPI’s metadata model well enough to map it, and understanding FUSE well enough to make the filesystem reliable — that knowledge had to be there. Claude Code accelerated the implementation, but the architecture came from two decades of Gentoo experience.

The source code is at github.com/Miriup/portage-pip-fuse. It’s GPL-2.0, alpha-stage, and contributions are welcome.
