ELI5 GenAI Day 08 - The Developer Perspective of using Gemma part 1: Hugging Face

Aug 27, 2024

名詞解釋

身為開發者，在挑選 Generative AI 的來源與使用方式時，常會聽到幾個名詞:

Hugging Face: 可以視為科學家的 GitHub，提供了許多 NLP 相關的模型、資料集、工具等。當然也提供 Generative AI 的模型。
OpenAI: 有名的 GPT 系列模型，包括 GPT-2、GPT-3 等。是目前最有名的 Generative AI 之一。目前僅提供 API 服務。
Google Vertex AI: 也有提供一些 Generative AI 的模型，Gemini, Gemma 等。
Azure AI: 也有提供一些 Generative AI 的模型，如 Azure Cognitive Services 等。
Claudia AI: 另一個主流之一個 Generative AI 模型提供者。

通常，模型發佈會先以 Hugging Face 為主，因此我們先來看看如何使用 Hugging Face 的模型。

先以這個[範例](https://github.com/google-gemini/gemma-cookbook/blob/main/Gemma/RAG_PDF_Search_in_multiple_documents_on_Colab.ipynb) 來拆解幾個部分。

範例說明

記得先有 Hugging Face Token

通常會有幾個步驟:

1. 透過 Hugging Face Token 來取得模型

2. 設定模型與 tokenizer

3. 載入模型

4. 載入 tokenizer

5. 設定模型的輸入

6. 輸入資料

步驟 1 ~ 4

Setting up the model and tokenizer

HUGGING_FACE_ACCESS_TOKEN = userdata.get('HF_TOKEN')

model_name = 'google/gemma-2-2b-it'

model = AutoModelForCausalLM.from_pretrained(

 model_name,

 torch_dtype=torch.float16,

 token=HUGGING_FACE_ACCESS_TOKEN

 ).to('cuda')

tokenizer = AutoTokenizer.from_pretrained(model_name, token=HUGGING_FACE_ACCESS_TOKEN)

設定模型的輸入與回應 helper function

def generate_response(query, context, max_length=1000):

 prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"

 input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to('cuda')

with torch.no_grad():

 output = model.generate(input_ids, max_new_tokens=max_length, num_return_sequences=1)

 decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)

# Extracting the answer part by removing the prompt portion

 answer_start = decoded_output.find("Answer:") + len("Answer:")

 answer = decoded_output[answer_start:].strip()

return answer

步驟 5: 設定查找文件的 helper function - 透過模型回答問題

def query_documents(query):

 similar_chunks = find_most_similar_chunks(query)

 context = " ".join([result['chunk'].replace("\n", "") for result in similar_chunks])

 response = generate_response(query, context)

return response, similar_chunks

步驟 6: 輸入資料

query = "How many types of regular Train Car cards are there?"

answer, relevant_chunks = query_documents(query)

print(f"Query: {query}\n\n-----\n")

print(f"Generated answer: {answer}\n\n-----\n")

print("Relevant chunks:")

for chunk in relevant_chunks:

print(f"Document: {chunk['document']}")

print(f"Chunk: {chunk['chunk']}".replace("\n", ""))

print(f"Distance: {chunk['distance']}")

print()

結論

透過 Hugging Face 載入模型會是身為開發者的第一種方式，因為 Hugging Face 提供了許多模型與資料集，並且提供了許多工具，讓開發者可以快速的使用。然而，對於 Software Engineer 來說，載入，讀取的效能將會取決於 Hubbing Face Library 以及 Hugging Face 網路速度。下一篇我們會介紹透過常見的 RESTful API 來使用 Generative AI 模型。

本篇同步發表於 [iThome] 與 [個人電子報]

#Gemma #GemmaSprint