Ollama RAG Demonstration with EmbeddingGemma and Gemma3n

Built by: JimmyLiao and Gemini-CLI

Sep 05, 2025

Repo: https://github.com/jimmyliao/lab-embeddinggemma

This project demonstrates a basic Retrieval-Augmented Generation (RAG) pipeline that uses Ollama for both embedding and large language model (LLM) inference. It uses `EmbeddingGemma` to generate embeddings from Chinese text and `gemma3n:e2b` for augmented generation, with the Python environment managed by `uv`.

Features

Ollama Integration: Interact with Ollama's local API for text embeddings and LLM generation.
Embedding Generation: Utilize `EmbeddingGemma` to convert text into numerical vector representations.
Simple RAG Pipeline: Implement a basic RAG workflow:
1. Retrieval: Find semantically similar sentences from a knowledge base based on a query.
2. Augmented Generation: Use a large language model (`gemma3n:e2b`) to generate a coherent response, grounded in the retrieved context.
Chinese Language Support: Demonstrate `EmbeddingGemma` and `gemma3n:e2b` working on Chinese text.
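
The retrieval step listed above can be sketched as a cosine-similarity ranking over precomputed sentence vectors. This is a minimal illustration, not the repository's code: the function names `cosine_similarity` and `top_k` are hypothetical, cosine similarity is assumed as the scoring metric, and plain Python is used in place of `numpy` to keep the sketch dependency-free.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, sentences, sentence_vecs, k=3):
    """Return the k (score, sentence) pairs most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vec, vec), sentence)
        for sentence, vec in zip(sentences, sentence_vecs)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

In a full pipeline, `query_vec` and `sentence_vecs` would come from the embedding model, and the returned sentences would be passed to the LLM as context.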

Prerequisites

Before running this project, ensure you have the following installed and set up:

Ollama: Download and install Ollama from [ollama.ai](https://ollama.ai/).
Ollama Models: With the Ollama server running (`ollama serve`), pull the required models using the Ollama CLI:

    ollama pull embeddinggemma
    ollama pull gemma3n:e2b

`uv`: Install `uv` (a fast Python package installer and resolver):

    pip install uv
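
To confirm the Ollama prerequisites before running the scripts, a small check against the server's model list can help. The sketch below assumes Ollama's default local address (`http://localhost:11434`) and its `/api/tags` endpoint, which lists installed models; the helper names `missing_models` and `check_ollama` are hypothetical and not part of the repository.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default local address
REQUIRED_MODELS = ["embeddinggemma", "gemma3n:e2b"]


def missing_models(tags_response, required=REQUIRED_MODELS):
    """Return the required models that are absent from an /api/tags response."""
    installed = {m["name"] for m in tags_response.get("models", [])}
    missing = []
    for name in required:
        # A model pulled without a tag is listed as "<name>:latest".
        candidates = {name} if ":" in name else {name, f"{name}:latest"}
        if not candidates & installed:
            missing.append(name)
    return missing


def check_ollama(base_url=OLLAMA_URL):
    """Ask the running server which required models are still missing."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return missing_models(json.load(resp))
```

With the server running, `check_ollama()` returns an empty list when both models have been pulled; a connection error suggests the server is not running.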

Setup

Clone the repository:

    git clone https://github.com/jimmyliao/lab-embeddinggemma
    cd lab-embeddinggemma

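Create a virtual environment and install the dependencies listed in `requirements.txt`, for example with `uv venv` followed by `uv pip install -r requirements.txt`.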

Project Structure

`main.py`: A simple script demonstrating how to get embeddings from Ollama.
`embedding_utils.py`: Contains the `get_embedding` function, extracted for modularity.
`lang_test.py`: Implements the RAG pipeline, including text chunking, similarity search, and augmented generation.
`intro_embeddinggemma.txt`: A sample Chinese text document used as the knowledge base for the RAG system.
`requirements.txt`: Lists the Python dependencies (`requests`, `numpy`).
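
The chunking step in `lang_test.py` is not shown here. Judging from the example log, where the knowledge base yields 16 short pieces including a title and a `source:` line, one plausible approach is one chunk per non-empty line. The sketch below illustrates that assumption; `split_into_sentences` and `load_knowledge_base` are hypothetical names, and the actual script may split differently.

```python
from pathlib import Path


def split_into_sentences(text):
    """Split a document into stripped, non-empty lines, one chunk per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_knowledge_base(path="intro_embeddinggemma.txt"):
    """Read the UTF-8 knowledge base file and return its chunks."""
    return split_into_sentences(Path(path).read_text(encoding="utf-8"))
```

Line-based chunks keep each retrieved item short, but headings and source lines become chunks too and can compete with content sentences during retrieval.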

Usage

Running the Embedding Test (`main.py`)

This script demonstrates how to get an embedding for a simple English prompt.

    uv run python main.py
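
For orientation, fetching an embedding from a local Ollama server can be sketched as follows. This is a stand-in rather than the repository's `get_embedding`: it uses only the standard library instead of `requests`, and the helper `build_embedding_request`, the `base_url` parameter, the default port 11434, and the choice of Ollama's `/api/embed` endpoint are assumptions.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default local address


def build_embedding_request(text, model="embeddinggemma"):
    """Build the JSON payload for Ollama's /api/embed endpoint."""
    return {"model": model, "input": text}


def get_embedding(text, model="embeddinggemma", base_url=OLLAMA_URL):
    """POST the text to the local server and return one embedding vector."""
    payload = json.dumps(build_embedding_request(text, model)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as resp:
        # /api/embed returns a list of vectors, one per input.
        return json.load(resp)["embeddings"][0]
```

With the server running, `len(get_embedding("Hello, world"))` gives the dimensionality of the returned vector.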

Running the RAG Demonstration (`lang_test.py`)

This script runs the full RAG pipeline: it embeds a Chinese knowledge base, embeds each query, retrieves the most similar sentences, and generates a response using `gemma3n:e2b`.

    uv run python lang_test.py
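
The augmented-generation step can be sketched as building a prompt from the retrieved sentences and sending it to Ollama's `/api/generate` endpoint. The prompt wording, the helper names `build_prompt` and `generate_answer`, and the default address are assumptions for illustration; the prompt used in `lang_test.py` may differ.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default local address


def build_prompt(query, context_sentences):
    """Combine retrieved sentences and the user query into a grounded prompt."""
    context = "\n".join(f"- {s}" for s in context_sentences)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


def generate_answer(query, context_sentences, model="gemma3n:e2b", base_url=OLLAMA_URL):
    """Send the grounded prompt to the local server and return the generated text."""
    payload = json.dumps({
        "model": model,
        "prompt": build_prompt(query, context_sentences),
        "stream": False,  # request a single JSON object instead of a stream
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.load(resp)["response"]
```

Instructing the model to admit when the context is insufficient keeps answers tied to what retrieval actually returned.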

Example log


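Generating embeddings for the sentences in the knowledge base...
Processing sentence 1/16: EmbeddingGemma model overview...
Processing sentence 2/16: EmbeddingGemma is based on Ge...
Processing sentence 3/16: This technology is optimized for phones, laptops, and...
Processing sentence 4/16: The model produces numerical representations of text for information retrieval...
Processing sentence 5/16: EmbeddingGemma includes the following key...
Processing sentence 6/16: Multilingual support: understands data in many languages, and...
Processing sentence 7/16: Flexible output dimensions: uses Matryoshka...
Processing sentence 8/16: 2K token context: provides a large input context for processing...
Processing sentence 9/16: Storage efficient: with quantization, under ...
Processing sentence 10/16: Low latency: generates embeddings on EdgeTPU...
Processing sentence 11/16: Offline and secure: generates document embeddings directly on hardware,...
Processing sentence 12/16: Tip: use Gemma 3n to deploy E...
Processing sentence 13/16: To get started, see the quickstart RAG...
Processing sentence 14/16: Get it on Hugging Face...
Processing sentence 15/16: Like other Gemma models, Embed...
Processing sentence 16/16: source: https://ai.g...
Knowledge base embedding complete.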

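--- Processing query: How many languages does EmbeddingGemma support ---
Most relevant sentences:
  Similarity: 0.8920 - EmbeddingGemma model overview
  Similarity: 0.8073 - EmbeddingGemma includes the following key features:
  Similarity: 0.7409 - source: https://ai.google.dev/gemma/docs/embeddinggemma?hl=zh-tw

Generating response...
Generated response:
Based on the provided context, I cannot determine how many languages EmbeddingGemma supports. The context only mentions that it is a model and points to a link with more information.

Therefore, I do not know how many languages EmbeddingGemma supports.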


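--- Processing query: What is the relationship between EmbeddingGemma and Gemma3n ---
Most relevant sentences:
  Similarity: 0.8633 - EmbeddingGemma model overview
  Similarity: 0.7809 - EmbeddingGemma includes the following key features:
  Similarity: 0.7601 - EmbeddingGemma is a 308-million-parameter multilingual text embedding model based on Gemma 3

Generating response...
Generated response:
EmbbeedingGemma is a 308-million-parameter multilingual text embedding model based on Gemma 3.

Therefore, the relationship between EmbbeedingGemma and Gemma3n is: **EmbbeedingGemma is based on Gemma 3.**

More specifically, the design and parameters of EmbbeedingGemma come from the Gemma 3 model.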


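--- Processing query: What are the main features of EmbeddingGemma ---
Most relevant sentences:
  Similarity: 0.9186 - EmbeddingGemma includes the following key features:
  Similarity: 0.8369 - EmbeddingGemma model overview
  Similarity: 0.6742 - 2K token context: provides a large input context for processing text data and documents directly on hardware

Generating response...
Generated response:
The main function of EmbbeedingGemma is to provide a large input context for processing text data and documents directly on hardware.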

License

This project is open-sourced under the MIT License. See the `LICENSE` file for more details.