apfel: The LLM Was Already on Your Mac — Now You Can Actually Use It

By Prahlad Menon

Every Apple Silicon Mac running macOS 26 Tahoe ships with a built-in LLM. It’s the same model that powers Apple Intelligence — Siri, Writing Tools, notification summaries, all of it. Apple provides developer access through the FoundationModels framework, but only for building features within Apple’s ecosystem. There’s no way to just… use it from the terminal.

apfel changes that. It’s a native Swift CLI and OpenAI-compatible server that wraps Apple’s on-device foundation model and exposes it as a first-class tool. All inference runs on your machine. No network calls. No API keys. No per-token billing. The model is already there — apfel just lets you use it.

What It Exposes

CLI — pipe-friendly Unix tool:

# Single prompt
apfel "What is the capital of Austria?"

# Stream output
apfel --stream "Write a haiku about code"

# Pipe input (works with any Unix tool)
echo "Summarize: $(cat README.md)" | apfel

# Attach files
apfel -f README.md "Summarize this project"

# Multiple files — diff review against conventions
git diff HEAD~1 | apfel -f CONVENTIONS.md "Review this diff against our conventions"

# JSON output for scripting
apfel -o json "Translate to German: hello" | jq .content

# Quiet mode for shell scripts
result=$(apfel -q "Capital of France? One word.")

OpenAI-compatible server — drop-in replacement:

apfel --serve
# Listens at localhost:11434

Then any OpenAI SDK works unchanged, pointed at localhost:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
resp = client.chat.completions.create(
    model="apple-foundationmodel",
    messages=[{"role": "user", "content": "What is 1+1?"}],
)
print(resp.choices[0].message.content)

The same endpoint works with any tool that speaks OpenAI’s API — LangChain, LlamaIndex, Continue, any MCP client that accepts an OpenAI-compatible backend.

Context Management

The 4096-token context window is managed automatically with configurable strategies — useful for long chat sessions:

apfel --chat --context-strategy newest-first     # default: keep recent turns
apfel --chat --context-strategy summarize        # compress old turns via on-device model
apfel --chat --context-strategy sliding-window --context-max-turns 6
apfel --chat --context-strategy strict           # error on overflow, no trimming

The summarize strategy is particularly clever: it uses the same on-device model to compress older turns in a long conversation, keeping the context window useful without truncating history — and the compression itself runs locally.
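Conceptually, a sliding-window strategy is just a trimming pass over the message history before each request. Here is a minimal Python sketch of that idea — the function name and turn layout are illustrative, not apfel's actual internals:

```python
def apply_sliding_window(messages, max_turns):
    """Keep the system message (if any) plus the last `max_turns`
    user/assistant exchanges, dropping older history."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    # One turn = a user message plus the assistant reply: 2 messages.
    return system + turns[-(max_turns * 2):]

# Build a 10-turn history behind a system prompt.
history = [{"role": "system", "content": "Be terse."}] + [
    {"role": role, "content": f"turn {i}"}
    for i in range(10)
    for role in ("user", "assistant")
]

trimmed = apply_sliding_window(history, max_turns=3)
# The system message survives; only the 3 most recent turns remain.
```

The strict strategy would instead raise an error when the trimmed history still exceeds the token budget, and summarize would replace the dropped turns with a model-generated digest rather than discarding them.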

Tool Calling

apfel supports function calling with schema conversion and full round-trip handling — the same tool-use pattern you’d use with Claude or GPT. This means you can use Apple’s on-device model as the LLM backend for agents that rely on tool calling, entirely locally.
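The round trip has the standard OpenAI shape: the model returns a tool call, your code executes it, and the result goes back as a "tool" message on the next request. A minimal sketch of the client-side half (the tool name and dispatcher are illustrative; only the message format follows OpenAI's spec):

```python
import json

# OpenAI-style tool definition, as sent in the request's `tools` field.
tools = [{
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Return the current time for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def run_tool(tool_call, registry):
    """Execute one tool call from the model and build the 'tool'
    message to append before re-sending the conversation."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    result = registry[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }

# Simulated tool call as it would appear in the model's response.
call = {"id": "call_1",
        "function": {"name": "get_time",
                     "arguments": '{"city": "Vienna"}'}}
registry = {"get_time": lambda city: {"city": city, "time": "12:00"}}

reply = run_tool(call, registry)
# `reply` is appended to `messages` and the request is sent again;
# the model then answers using the tool's result.
```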

Installation

brew tap Arthur-Ficial/tap
brew install apfel

Building from source requires the macOS 26.4 SDK (which ships Swift 6.3). No Xcode required — just the Command Line Tools.

Requirements and Honest Limitations

Requirements:

  • Apple Silicon Mac (M1 or later)
  • macOS 26 Tahoe or newer
  • Apple Intelligence enabled in System Settings

This is the main constraint. macOS 26 Tahoe is current as of 2026 but not universal — if you’re on macOS 15 or earlier, this doesn’t apply to you yet. Apple Intelligence also requires a device language and region that Apple supports (US English and others, with the list expanding).

The model itself: Apple hasn’t published detailed specs on the foundation model’s capabilities or parameter count. From practical use, it handles summarization, code review, translation, and Q&A well. It’s not Claude Sonnet or GPT-4o — it’s an on-device model optimized for the thermal and power constraints of a MacBook. But it’s capable, it’s fast, and it’s free.

Why This Matters

The pattern here is increasingly common: AI capabilities get shipped to consumer hardware — on-device models on iPhones and Macs, NPUs in Windows Copilot+ PCs, local inference via Ollama — but the API surface for developers is deliberately constrained by the platform vendor.

apfel is in the same lineage as tools that crack open closed-by-default capabilities: Ollama makes local models accessible; apfel makes Apple’s built-in model accessible. The difference is there’s nothing to download — the model is already installed, already running, already trained on Apple’s infrastructure. You’re not adding an AI; you’re accessing one that was already there.

For developers on Apple Silicon who want a zero-cost local inference backend for prototyping, scripting, or running agents that don’t need frontier model capability, this is worth knowing about. It’s already on your machine.

MIT license. GitHub → | Docs →