PaperBanana: Google's Multi-Agent System for Publication-Ready Academic Illustrations

By Prahlad Menon 3 min read

If you’ve ever spent hours wrestling with Figma, TikZ, or PowerPoint trying to create that perfect methodology diagram for a paper submission, Google’s new tool might change your workflow entirely.

The Illustration Bottleneck

Despite the explosion of AI tools that can help researchers write, review, and even generate code, one critical bottleneck has remained stubbornly manual: creating publication-ready figures.

Every researcher knows the pain:

  • Spending 3-4 hours on a single architecture diagram
  • Iterating endlessly on colors, spacing, and alignment
  • Realizing the camera-ready deadline is tomorrow and your figures still look amateur

PaperBanana tackles this head-on with a multi-agent approach that transforms methodology descriptions into polished academic illustrations.

How It Works: The Agent Orchestra

PaperBanana isn’t a single model—it’s an orchestrated system of specialized agents, each handling a distinct part of the illustration pipeline:

  1. Reference Retrieval Agent: Searches for relevant diagram examples from existing literature to establish style baselines
  2. Content Planning Agent: Structures the visual hierarchy and determines what elements need to appear
  3. Style Planning Agent: Decides on layout, color schemes, and visual consistency
  4. Image Generation Agent: Renders the actual illustration using state-of-the-art VLMs and image generation models
  5. Critique Agent: Reviews the output and triggers refinement iterations

This division of labor mirrors how a human designer might approach the task—but executes in minutes rather than hours.

PaperBananaBench: A Rigorous Evaluation

The Google team didn’t just release a tool—they created a proper benchmark to evaluate it. PaperBananaBench includes:

  • 292 test cases for methodology diagrams
  • Curated from NeurIPS 2025 publications
  • Covering diverse research domains and illustration styles

In blind evaluations, human reviewers preferred PaperBanana outputs 75% of the time compared to leading baselines. The framework excelled across four key dimensions:

MetricWhat It Measures
FaithfulnessDoes the diagram accurately represent the methodology?
ConcisenessIs it free of unnecessary visual clutter?
ReadabilityCan viewers quickly understand the flow?
AestheticsDoes it look publication-ready?

Beyond Methodology Diagrams

While architecture diagrams are the headline feature, PaperBanana also handles statistical plots—bar charts, line graphs, confusion matrices, and the visualizations that fill results sections. This suggests potential for end-to-end figure generation across an entire paper.

Practical Implications

For researchers, the implications are significant:

Time savings: What currently takes hours could take minutes. More time for actual research, less for figure formatting.

Democratization: Researchers without design skills can produce professional-quality figures. No more apologizing for ugly diagrams in otherwise strong papers.

Consistency: Multi-figure papers often suffer from inconsistent styling. An agentic system can enforce visual coherence automatically.

Iteration speed: Quick regeneration means you can actually explore different visual approaches rather than committing to the first thing that works.

The Catch?

The paper is available on arXiv, but there’s no public release or API yet. Given Google’s track record, this could remain a research contribution rather than a shipped product—or it could eventually surface in Google’s AI Studio or Colab ecosystem.

Read the Paper

The full 49-page PDF dives deep into the agent architecture, ablation studies, and failure case analysis:

📄 arXiv: arxiv.org/abs/2601.23265


The Menon Lab tracks emerging AI tools for researchers and builders. For more coverage of academic AI tools, check out our posts on AutoFigure and Paper-QA.