Google IO 2025 Update for Gemma 3n - The Next-Gen On-Device AI Model

May 29, 2025

首先感謝 Google Developer Expert program 的贊助，讓我能夠前往現場，參與 Google IO 2025，並與產品開發團隊進行深入交流，了解更多更新與產品回饋。

前一篇介紹了 GenAI SDK 的更新，包括新增 Java 支援，以及如何呼叫 Gemini-2.5-flash-preview-05-20。

這一篇介紹 Google IO 2025 推出的 Gemma 3n，以及應用場景。

Gemma 3n 是 Google IO 2025 推出的最新一代 on-device AI 模型，主要應用場景在於 on-device AI 應用，特別是在移動裝置上。(ref: https://ai.google.dev/gemma/docs/gemma-3n)

Key Features

Multimodal Input

音訊輸入：支援語音識別、翻譯和音頻數據分析
視覺與文字輸入：多模態能力可同時處理視覺、聲音和文字，幫助理解和分析周圍環境

Efficient Architecture

PLE Caching：支援 Per-Layer Embedding (PLE) 參數快取，可將參數儲存在快速本地存儲中，降低運行記憶體成本
MatFormer Architecture：採用 Matryoshka Transformer 架構，可根據請求選擇性激活模型參數，降低計算成本和響應時間
Conditional Parameter Loading：可選擇性跳過載入視覺和音頻參數，減少總載入參數量，節省記憶體資源

其他特色

廣泛語言支援：支援超過 140 種語言
32K token 上下文：提供充足的輸入上下文，便於數據分析和處理任務

Model Parameter Management & Efficiency

Effective Parameters & Model Size

Gemma 3n 採用 E2B 和 E4B 等參數規格，這些數字代表模型在高效運作時的有效參數量。例如：

E2B 模型在標準執行時會載入超過 5 billion 參數
但透過參數跳過和 PLE 快取技術，可以將記憶體佔用降低至約 19.1 billion 參數
這種設計讓模型能在資源有限的設備上高效運行

Parameter Types & Selective Loading

Gemma 3n 的參數分為四大類：

Text Parameters
Visual Parameters
Audio Parameters
Per-Layer Embedding Parameters

Dynamic Resource Management

Conditional Parameter Loading: 可根據任務需求動態載入特定類型的參數
Resource-aware Execution: 根據設備能力自動調整載入的參數量

Advanced Technical Highlights

MatFormer 架構:採用 Matryoshka Transformer / MatFormer 架構設計，內含多個嵌套子模型

這些技術讓 Gemma 3n 能夠大幅降低對硬體資源的需求，特別適合在移動設備和邊緣計算環境中部署。

相關的 Developer Keynote 是由 Gemma Models Product Manager Gus Martins 演講:

亮點 1: "A model that can run on as little as 2GB of RAM"

Shares the same architecture as Gemini Nano

亮點 2: "Unmatched AI performance"

Much faster and learner mobile hardware compared to Gemma 3

亮點 3: "Truly multimodal"

Added audio understanding

目前直接在 Google AI Studio and Google AI Edge. 也放在 HuggingFace, Ollama, and Unsloth

亮點 4: "A powerful foundation for new domains"

ex. Healthcare -> MedGemma, SignGemma (手語)

接著，Gus 開始利用 Unsloth 搭配 Gemma 3 1B 模型 (偷偷喵了一下，應該是直接用 Unsloth unsloth/gemma-3-1b-it-unsloth-bnb-4bit)，範例使用 Emoji 作為 Fine-Tuning 的資料，來訓練一個可以理解 Emoji 的模型。