SoulSearch v0.3: Ollama Support, Session Memory, and Web Search — Open Source Only

By Prahlad Menon · 5 min read

TL;DR: SoulSearch v0.3 brings Ollama support (run local LLMs), session-specific memory (separate from global Git memory), and Brave Search as a proper tool. These features are open-source only — the Chrome Web Store version stays at v0.2 for stability. Get the new features from GitHub.

Video Demo

Watch SoulSearch v0.3 in action — browsing pages, finding certifications, and using the new Side Panel agent mode:

Watch on Google Drive if the embed doesn’t load.


This is a follow-up to our original SoulSearch announcement. If you haven’t read that yet, start there for the full overview.

What’s New in v0.3?

Version 0.3 adds eight major features, all focused on privacy, flexibility, and local-first AI:

| Feature | What It Does |
| --- | --- |
| Ollama Support | Run any local LLM — no cloud API required |
| Session Memory | Keep notes per-session, separate from global |
| Memory Strategy | Choose Truncate (fast) or RLM (compress) |
| Brave Search Tool | Agent can search the web when needed |
| Separate Agent Model | Use different models for chat vs agent |
| Separate Agent Provider | Mix providers — e.g., Ollama for chat, Claude for agent |
| Side Panel Agent | Persistent agent UI that stays open during page interactions |
| Tabbed Memory Panel | View Session and Global memory separately |

Why Open Source Only?

The Chrome Web Store version (v0.2) uses cloud LLMs like Claude and GPT-4. It’s stable, tested, and works out of the box.

The open-source version on GitHub is where we experiment. Ollama support, session memory, and the new agent features require more technical setup — running Ollama, configuring CORS, choosing tool-capable models. We want users who grab these features to know what they’re signing up for.

If you want plug-and-play: Install from the Chrome Web Store.

If you want cutting-edge + local: Clone from GitHub and run the feat/ollama-support branch (soon to merge to main).

How Does Ollama Support Work?

SoulSearch now supports Ollama as a provider. Configure it in Settings:

  1. Provider: Ollama
  2. Ollama URL: http://localhost:11434 (default)
  3. Model: llama3.2, qwen2.5, or any installed model

No API key needed. Your queries stay entirely on your machine.
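To make the "no API key" point concrete, here is a minimal sketch of what a call to a local Ollama server looks like. The `/api/chat` endpoint and its request shape come from Ollama's REST API; the helper names (`buildChatRequest`, `chat`) are illustrative, not SoulSearch's actual source:

```typescript
// Minimal sketch of talking to a local Ollama server (helper names are
// hypothetical; only the /api/chat endpoint shape is Ollama's).
const OLLAMA_URL = "http://localhost:11434";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build the JSON body for Ollama's /api/chat endpoint.
function buildChatRequest(model: string, messages: ChatMessage[]) {
  return { model, messages, stream: false };
}

async function chat(model: string, messages: ChatMessage[]): Promise<string> {
  // Note: no Authorization header or API key — the request never
  // leaves your machine.
  const res = await fetch(`${OLLAMA_URL}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatRequest(model, messages)),
  });
  const data = await res.json();
  return data.message.content;
}
```

Swapping models is just a string change in the request body — `llama3.2`, `qwen2.5`, or anything you have pulled locally.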

Important: Start Ollama with CORS enabled for Chrome extensions:

OLLAMA_ORIGINS="chrome-extension://*" ollama serve

For agent mode (browser automation), you need a tool-capable model. Vision models like llama3.2-vision don’t support tool calling. Use llama3.2 or qwen2.5 for agent tasks.

What Is Session-Specific Memory?

Previously, all saved memories went to MEMORY.md — your global, Git-backed memory shared across all sessions.

Now you have two options when saving an AI response:

  • 💾 Session — Saves to this session only. Not synced to Git. Disappears when you delete the session.
  • 🌐 Global — Saves to MEMORY.md. Synced to your Git repo. Available in all sessions.

The memory panel now has tabs:

  • 📝 Session — Shows memory for the current session
  • 🌐 Global — Shows your full MEMORY.md

The session dropdown also shows a 🧠 indicator with memory count (e.g., “Session 1 (5) 🧠3” means 5 messages, 3 saved memories).
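The two-tier model above can be sketched in a few lines. This is an illustrative data structure, not the extension's real storage code — the actual implementation persists to Chrome storage and MEMORY.md:

```typescript
// Sketch of the two-tier memory model: per-session notes that die with
// the session, and global notes destined for Git-backed MEMORY.md.
// All names here are assumptions for illustration.
interface MemoryStore {
  session: Map<string, string[]>; // keyed by session ID, never Git-synced
  global: string[];               // lines appended to MEMORY.md
}

function saveMemory(
  store: MemoryStore,
  scope: "session" | "global",
  sessionId: string,
  note: string,
): void {
  if (scope === "session") {
    const notes = store.session.get(sessionId) ?? [];
    notes.push(note);
    store.session.set(sessionId, notes); // 💾 disappears with the session
  } else {
    store.global.push(note); // 🌐 synced to your Git repo on push
  }
}

function deleteSession(store: MemoryStore, sessionId: string): void {
  store.session.delete(sessionId); // global memory is untouched
}
```

Deleting a session wipes only its 💾 notes; everything saved with 🌐 survives in MEMORY.md.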

How Does the Memory Strategy Setting Work?

When your memory gets too long, it can exceed the model’s context window. v0.3 adds a toggle:

  • Truncate (fast) — Keeps newest memories, drops oldest. No extra API calls. Default.
  • RLM (thorough) — Compresses memory using the LLM before sending. Slower but preserves more context.

The threshold is 6000 characters. Below that, full memory is included. Above that, the strategy kicks in.
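The Truncate strategy is simple enough to sketch: walk from newest to oldest, keeping memories until the 6000-character budget is spent. The function name and exact walk order are assumptions; only the threshold and keep-newest/drop-oldest behavior come from the post:

```typescript
// Sketch of the Truncate strategy: keep the newest memories, drop the
// oldest once the total exceeds the threshold. Illustrative only.
const MEMORY_THRESHOLD = 6000; // characters, per the v0.3 default

function truncateMemory(
  memories: string[],
  limit: number = MEMORY_THRESHOLD,
): string[] {
  const kept: string[] = [];
  let total = 0;
  // Walk newest-first; stop once the next memory would bust the budget.
  for (let i = memories.length - 1; i >= 0; i--) {
    if (total + memories[i].length > limit) break;
    kept.unshift(memories[i]); // preserve chronological order
    total += memories[i].length;
  }
  return kept;
}
```

No API calls, no latency — which is why it is the default. RLM instead hands the overflow to the LLM for summarization, trading speed for retained context.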

How Does Brave Search Work?

If you add a Brave Search API key in Settings, the agent gains a web_search tool.

When you ask about current events, news, or anything not on the current page, the agent can search the web:

User: What's the latest news on the Iran situation?
Agent: [calls web_search("Iran news latest")]
Agent: Here's what I found: ...

No regex pattern matching. The model decides when to search based on your question.

Get a free API key at api.search.brave.com.
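Under the hood, a `web_search` tool like this is a thin wrapper over Brave's web search endpoint. The endpoint URL and `X-Subscription-Token` header follow Brave's API; the helper names and result shaping are assumptions, not SoulSearch's actual code:

```typescript
// Sketch of a web_search tool backed by the Brave Search API.
// Endpoint and auth header per Brave's docs; helpers are hypothetical.
function buildBraveRequest(query: string, apiKey: string) {
  const url = new URL("https://api.search.brave.com/res/v1/web/search");
  url.searchParams.set("q", query);
  return {
    url: url.toString(),
    headers: { "X-Subscription-Token": apiKey, Accept: "application/json" },
  };
}

async function webSearch(query: string, apiKey: string) {
  const { url, headers } = buildBraveRequest(query, apiKey);
  const res = await fetch(url, { headers });
  const data = await res.json();
  // Hand back title + URL pairs so the model can cite its sources.
  return data.web.results.map((r: any) => ({ title: r.title, url: r.url }));
}
```

The tool is only registered when a key is present, so without one the agent simply never sees `web_search` in its tool list.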

What About Vision Models?

Ollama offers vision models like llama3.2-vision that can “see” images. However, these models don’t support tool calling — which agent mode requires.

Solution: v0.3 adds a separate Agent Model setting.

  • Chat Model: llama3.2-vision — for visual understanding
  • Agent Model: llama3.2 — for browser automation with tools

If you leave Agent Model empty, it uses your chat model. If that model doesn’t support tools, you’ll get a clear error message telling you to set a tool-capable agent model.
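The fallback rule above fits in a few lines. The tool-capable model list and function name here are illustrative (taken from the models this post mentions), not the extension's real capability check:

```typescript
// Sketch of the agent-model fallback: empty Agent Model reuses the
// chat model, and non-tool-capable models fail loudly. The model list
// is an assumption drawn from this post, not an exhaustive check.
const TOOL_CAPABLE = new Set(["llama3.2", "qwen2.5", "mistral"]);

interface ModelSettings {
  chatModel: string;
  agentModel?: string; // optional — falls back to chatModel
}

function resolveAgentModel(s: ModelSettings): string {
  const model = s.agentModel || s.chatModel;
  if (!TOOL_CAPABLE.has(model)) {
    throw new Error(
      `${model} does not support tool calling — set a tool-capable Agent Model`,
    );
  }
  return model;
}
```

So `llama3.2-vision` for chat plus `llama3.2` for the agent gives you image understanding and working browser automation in the same session.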

How Do I Get v0.3?

Clone the repo and checkout the feature branch:

git clone https://github.com/menonpg/soulsearch.git
cd soulsearch
git checkout feat/ollama-support

Load unpacked in Chrome:

  1. Go to chrome://extensions/
  2. Enable Developer Mode
  3. Click “Load unpacked” → select the soulsearch folder

Start Ollama with CORS:

OLLAMA_ORIGINS="chrome-extension://*" ollama serve

Configure in Settings:

  • Provider: Ollama
  • Model: llama3.2 (or your preferred model)
  • Agent Model: llama3.2 (if using a vision model for chat)
  • Brave API Key: (optional, for web search)

Frequently Asked Questions

Does SoulSearch v0.3 work offline?

Yes, with Ollama. Your LLM runs locally, memory is stored locally (and optionally in your Git repo). No internet required except for initial model download.

Which Ollama models support agent mode?

Most text models support tool calling: llama3.2, qwen2.5, mistral. Vision models like llama3.2-vision do NOT support tools. Use a text model for agent tasks.

Is my data sent to any servers?

Only if you choose a cloud provider (Anthropic, OpenAI). With Ollama, everything stays on your machine. Git sync only happens when you explicitly push.

Will v0.3 come to the Chrome Web Store?

Eventually, yes — once it’s battle-tested. For now, we’re keeping the store version stable while the open-source version evolves faster.

Can I use both session and global memory together?

Yes. Session memory is always included in that session’s context. Global memory is included based on your memory strategy setting.

What happened to the regex-based search detection?

Removed. The old version tried to detect search intent with pattern matching (“search for X”, “look up Y”). Now the model decides via proper tool calling.

Links: