LiteLLM: One Interface for 100+ LLMs — And a Cautionary Supply Chain Tale
If you’ve ever had to wrangle multiple LLM providers — switching between OpenAI, Anthropic, Google, and Azure depending on the task — you know the pain. Different SDKs, different response formats, different retry logic, different billing dashboards. It’s a mess.
LiteLLM solves exactly that problem. One interface, 100+ models, and it all looks like OpenAI.
This week it also became the target of a sophisticated supply chain attack that’s worth understanding — not just as a security story, but as a window into how fragile the open-source AI ecosystem really is.
What LiteLLM Actually Does
At its core, LiteLLM is a translation layer. You write code using the standard OpenAI SDK format, and LiteLLM routes it to whatever model you want — GPT-4o, Claude 3.5, Gemini 1.5, Llama 3, Mistral, Bedrock, Groq, Cohere, and 100+ others.
from litellm import completion
# Switch between models with one line change
response = completion(model="gpt-4o", messages=[...])
response = completion(model="claude-3-5-sonnet", messages=[...])
response = completion(model="gemini/gemini-1.5-pro", messages=[...])
Every model returns the same response format. Your application code doesn’t change — only the model string does.
But the real power is in what you get beyond basic routing.
Built-in Reliability
LiteLLM handles retries and fallbacks automatically. If OpenAI goes down or rate-limits you, it can automatically switch to Claude or Gemini. You define the priority order; LiteLLM handles the rest.
response = completion(
    model="gpt-4o",
    fallbacks=["claude-3-5-sonnet", "gemini/gemini-1.5-pro"],
    messages=[...]
)
Cost Tracking and Budget Caps
This is where it gets genuinely useful for teams. LiteLLM tracks spend per user, per team, per project — and lets you enforce hard limits.
Set a $500/month cap per team. Once they hit it, requests stop (or fall back to a cheaper model). No surprise bills. No angry Slack messages from your CFO.
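The enforcement logic is simple enough to sketch. This is an illustrative toy of the hard-cap-with-fallback behavior described above, not LiteLLM's code — the class and its parameters are hypothetical:

```python
# Toy sketch of hard-cap budget enforcement with a cheaper fallback
# model. Illustrates the behavior, not LiteLLM's implementation.
class TeamBudget:
    def __init__(self, cap_usd, fallback_model=None):
        self.cap = cap_usd
        self.spent = 0.0
        self.fallback_model = fallback_model

    def route(self, model, est_cost):
        """Return the model to use for this request, or raise if capped."""
        if self.spent + est_cost <= self.cap:
            self.spent += est_cost
            return model
        if self.fallback_model is not None:
            return self.fallback_model   # degrade instead of failing
        raise RuntimeError("monthly budget exhausted")
```

In the real proxy you'd configure caps per key or per team rather than writing this yourself; the point is that the cap is enforced at routing time, before the provider is ever called.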
Virtual API Keys
Instead of sharing your master OpenAI key with every developer, LiteLLM issues virtual keys. Each key can have its own model access, rate limits, and budget. Revoke one without touching the others.
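The idea is easy to model: each virtual key carries its own scope and can be revoked without touching the master credential or anyone else's key. A toy sketch of that concept (not LiteLLM's key store — the names here are hypothetical):

```python
import secrets

# Toy model of virtual keys: each key has its own team, model access,
# and budget, and revoking one leaves the others untouched.
class KeyStore:
    def __init__(self):
        self._keys = {}

    def issue(self, team, allowed_models, max_budget):
        key = "sk-virt-" + secrets.token_hex(8)
        self._keys[key] = {"team": team,
                           "models": set(allowed_models),
                           "budget": max_budget}
        return key

    def authorize(self, key, model):
        meta = self._keys.get(key)
        return meta is not None and model in meta["models"]

    def revoke(self, key):
        self._keys.pop(key, None)   # master key never changes
```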
Load Balancing
Spread traffic across multiple deployments — Azure East, Azure West, OpenAI direct — with round-robin or latency-based routing. At scale this matters: LiteLLM claims 8ms P95 latency at 1,000 requests per second.
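Round-robin is the simplest of these strategies — rotate through interchangeable deployments in order. A minimal sketch (deployment names are illustrative, and this is not LiteLLM's router):

```python
from itertools import cycle

# Round-robin routing across interchangeable deployments of the same
# model. Latency-based routing would instead pick the deployment with
# the lowest recent response time.
class RoundRobinRouter:
    def __init__(self, deployments):
        self._cycle = cycle(deployments)

    def pick(self):
        return next(self._cycle)
```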
The Proxy Gateway
You can run LiteLLM as a self-hosted proxy server. Your entire team points their OpenAI SDK at http://your-litellm-server instead of https://api.openai.com, and you get centralized logging, cost tracking, guardrails, PII redaction, and caching — without changing a line of application code.
# Run the proxy
litellm --model gpt-4o --port 8000
# Your app now talks to localhost instead of OpenAI
client = OpenAI(base_url="http://localhost:8000", api_key="your-virtual-key")
There’s also an admin dashboard UI for monitoring all of this.
Why This Matters
The “single interface” pattern isn’t new — but LiteLLM has executed it well enough that it’s become infrastructure for thousands of AI applications. With 3.4 million daily PyPI downloads, it’s in a lot of production pipelines.
That ubiquity is exactly what made it a target.
The Supply Chain Attack (March 24, 2026)
Here’s where the story gets dark — and instructive.
On March 24, 2026, two versions of LiteLLM on PyPI (1.82.7 and 1.82.8) were found to contain malicious code. They were live for approximately three hours before PyPI quarantined them.
The attack wasn’t a direct breach of LiteLLM. It was a cascade:
Step 1: Compromise Trivy
Trivy is a popular open-source security scanner. In late February 2026, an attacker submitted a malicious pull request against Trivy’s CI pipeline, exploiting a pull_request_target workflow to exfiltrate credentials from aqua-bot, Trivy’s CI service account.
By March 19, threat actor TeamPCP had rewritten Trivy’s GitHub Action tags to point to a malicious release — meaning any project using aquasecurity/trivy-action in their CI/CD was now running attacker-controlled code.
Step 2: Steal LiteLLM’s PyPI Credentials
LiteLLM used Trivy in its CI/CD security scanning workflow. When the pipeline ran with the compromised Trivy action, TeamPCP harvested LiteLLM’s PyPI publishing credentials.
Step 3: Publish Backdoored Packages
With PyPI credentials in hand, TeamPCP published litellm==1.82.7 and 1.82.8 containing a four-part payload:
- Credential harvester (cloud keys, API tokens, environment variables)
- Encrypted exfiltration to models.litellm.cloud (a domain registered the day before, March 23)
- Persistent backdoor via Python startup hooks (litellm_init.pth)
- Kubernetes worm for lateral movement
Wiz’s head of threat exposure summarized it bluntly: “Trivy gets compromised → LiteLLM gets compromised → credentials from tens of thousands of environments end up in attacker hands → and those credentials lead to the next compromise.”
A recursive security failure: the tool you use to find vulnerabilities becomes the vector for introducing them.
If you’re affected: Versions ≤ 1.82.6 are safe. The malicious versions have been removed from PyPI. If you ran 1.82.7 or 1.82.8, treat your environment as compromised — rotate all API keys, cloud credentials, and secrets that were accessible in that environment.
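If you want to check an environment programmatically, something like this works — the bad-version list comes straight from the advisory above:

```python
from importlib.metadata import version

# The two compromised releases named in the advisory.
COMPROMISED = {"1.82.7", "1.82.8"}

def litellm_is_compromised(installed=None):
    """True if the installed (or given) litellm version is a known-bad release."""
    installed = installed or version("litellm")
    return installed in COMPROMISED
```

Remember that a clean version check only tells you what's installed now — if a bad version ever ran, you still need to rotate credentials.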
LiteLLM has paused new releases pending a full supply chain review. Their official post-mortem is at docs.litellm.ai/blog/security-update-march-2026.
The Broader Lesson
This attack illustrates something that’s been true for years but is increasingly dangerous in the AI era: open source supply chains are deeply interconnected, and that interdependence is a liability.
LiteLLM didn’t do anything obviously wrong. They used a widely trusted security tool in their CI/CD. That tool got compromised. Their credentials got stolen. Their package got backdoored.
A few practices that would have helped:
- Pin dependencies in CI/CD — use commit SHAs instead of mutable tags for GitHub Actions
- Separate publishing credentials — use short-lived OIDC tokens for PyPI rather than long-lived API keys
- Monitor PyPI publish events — alert when a new package version is published
- Use Docker images over pip for production — LiteLLM’s official Docker image was not affected because it pins dependencies in requirements.txt
That last point is worth noting: users running the official LiteLLM Proxy Docker image were protected. The attack targeted pip installs specifically.
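In workflow terms, pinning an action means referencing a full commit SHA instead of a mutable tag or branch. The SHA below is a placeholder, not a real Trivy release commit:

```yaml
# Risky: mutable reference -- whoever controls the tag controls your CI
- uses: aquasecurity/trivy-action@master

# Better: pin to a full commit SHA; the trailing comment is informational
# (the SHA here is a placeholder, not an actual release commit)
- uses: aquasecurity/trivy-action@<full-commit-sha>  # vX.Y.Z
```

A tag like @master or @v1 can be rewritten to point at attacker-controlled code, which is exactly what happened here; a commit SHA cannot.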
Getting Started
If you want to try LiteLLM:
pip install litellm # make sure you're on ≥ 1.82.9 when releases resume
Or run the proxy via Docker (the safer production path):
docker run -e OPENAI_API_KEY=your-key \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-latest \
--model gpt-4o --port 4000
The GitHub repo is github.com/BerriAI/litellm — 20k+ stars, Apache 2.0 licensed.
The irony isn’t lost: a tool designed to make AI infrastructure simpler and more manageable got hit through the exact kind of complexity and trust that security tooling is supposed to guard against. It’s a good reminder that in open source, your security posture is only as strong as the weakest link in your dependency chain — including the tools you use to find weak links.
Use LiteLLM. It’s excellent. Just pin your versions.