On February 21, 2024, Google released Gemma 2B, a compact yet powerful language model that’s changing the game for developers and researchers worldwide. It’s part of the Gemma family of open models derived from the research and technology behind Google’s Gemini series.
In this article, we’re taking a closer look at this 2-billion-parameter model.
Important links:
- Gemma: Open Models Based on Gemini Research and Technology (the paper)
- https://huggingface.co/google/gemma-2b (Hugging Face model card)
- https://ai.google.dev/gemma/docs/model_card_2 (Google AI for Developers model card)
- https://www.kaggle.com/models/google/gemma-2 (Kaggle model card)
Overview of Gemma 2B
Gemma 2B is a decoder-only language model with 2 billion parameters. It’s designed to be lightweight enough for deployment on consumer hardware while still offering impressive performance across a wide range of natural language processing tasks.
Key features:
- 2 billion parameters
- 8,192 token context length
- Trained on approximately 2 trillion tokens
- Available in both pretrained (PT) and instruction-tuned (IT) variants
- Publicly available weights, released under the Gemma Terms of Use rather than a traditional open-source license
Architecture and training
Gemma 2B utilizes a transformer decoder architecture with several optimizations (a minimal sketch of three of them follows this list):
- Multi-Query Attention: Unlike its larger 7B sibling, Gemma 2B uses multi-query attention with num_kv_heads = 1, which has been shown to work well at smaller scales.
- Rotary Positional Embeddings (RoPE): Used instead of absolute positional embeddings.
- GeGLU Activations: Replaces the standard ReLU non-linearity.
- RMSNorm: Used for layer normalization to stabilize training.
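To make these pieces concrete, here is a minimal PyTorch sketch, my own illustration rather than Google’s implementation, of three of these components: multi-query attention, GeGLU, and RMSNorm. The dimensions are placeholders, and RoPE is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Normalizes by the root-mean-square of the features (no mean subtraction)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight

class GeGLU(nn.Module):
    """Feed-forward layer gated by GELU, replacing a plain ReLU MLP."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.down(F.gelu(self.gate(x)) * self.up(x))

class MultiQueryAttention(nn.Module):
    """Many query heads share a single key/value head (num_kv_heads = 1)."""
    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.num_heads, self.head_dim = num_heads, dim // num_heads
        self.q = nn.Linear(dim, dim, bias=False)
        self.k = nn.Linear(dim, self.head_dim, bias=False)  # one shared key head
        self.v = nn.Linear(dim, self.head_dim, bias=False)  # one shared value head
        self.o = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        b, t, d = x.shape
        q = self.q(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k(x).view(b, t, 1, self.head_dim).transpose(1, 2)
        v = self.v(x).view(b, t, 1, self.head_dim).transpose(1, 2)
        # Broadcast the single K/V head across every query head
        k = k.expand(b, self.num_heads, t, self.head_dim)
        v = v.expand(b, self.num_heads, t, self.head_dim)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o(out.transpose(1, 2).reshape(b, t, d))
```

Sharing one key/value head shrinks the KV cache by roughly a factor of the head count, which is part of why the 2B model is cheap to run on modest hardware.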
Training details:
- Hardware: TPUv5e pods (512 TPUv5e chips across 2 pods)
- Software: JAX and ML Pathways
- Training data: Web documents, code, and mathematics (primarily English)
Performance benchmarks
Despite its compact size, Gemma 2B demonstrates impressive capabilities across various benchmarks:
Benchmark | Metric | Gemma 2B Score |
---|---|---|
MMLU | 5-shot, top-1 | 42.3 |
HellaSwag | 0-shot | 71.4 |
PIQA | 0-shot | 77.3 |
SocialIQA | 0-shot | 49.7 |
BoolQ | 0-shot | 69.4 |
WinoGrande | partial score | 65.4 |
CommonsenseQA | 7-shot | 65.3 |
OpenBookQA | – | 47.8 |
ARC-e | – | 73.2 |
ARC-c | – | 42.1 |
TriviaQA | 5-shot | 53.2 |
Natural Questions | 5-shot | 12.5 |
HumanEval | pass@1 | 22.0 |
MBPP | 3-shot | 29.2 |
GSM8K | maj@1 | 17.7 |
MATH | 4-shot | 11.8 |
AGIEval | – | 24.2 |
BIG-Bench | – | 35.2 |
These results showcase Gemma 2B’s applicability across tasks like question answering, common sense reasoning, and even basic coding challenges.
Responsible AI and safety
Google has implemented several measures to ensure Gemma 2B is developed and used responsibly:
1. Data preprocessing:
- Rigorous filtering for CSAM (Child Sexual Abuse Material)
- Removal of personal information and sensitive data
- Content quality and safety filtering
2. Safety evaluations: The Gemma 2B IT (Instruction Tuned) model underwent extensive safety testing. Here are the results from key safety benchmarks:
Benchmark | Metric | Gemma 2B IT |
---|---|---|
RealToxicity | average | 8.16 |
CrowS-Pairs | top-1 | 37.67 |
BBQ Ambig | 1-shot, top-1 | 83.20 |
BBQ Disambig | top-1 | 69.31 |
Winogender | top-1 | 52.91 |
TruthfulQA | – | 43.72 |
Winobias 1_2 | – | 59.28 |
Winobias 2_2 | – | 88.57 |
Toxigen | – | 48.32 |
The ethics and safety evaluation results are within Google’s acceptable thresholds for meeting internal policies in categories such as child safety, content safety, representational harms, memorization, and large-scale harms.
3. Dangerous capability evaluations: Google assessed Gemma models for potential misuse in areas like offensive cybersecurity, self-proliferation, and persuasion. While specific results for the 2B model aren’t provided, these evaluations inform the model’s development and usage guidelines.
Deployment and usage
One of Gemma 2B’s key advantages is its ability to run on consumer-grade hardware. Here’s how you can get started:
Basic CPU usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Download the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

input_text = "Explain the concept of neural networks in simple terms."
input_ids = tokenizer(input_text, return_tensors="pt")

# generate() defaults to a short completion; raise max_new_tokens for longer output
outputs = model.generate(**input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```
GPU acceleration
```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
# device_map="auto" places the model on the available GPU(s) automatically
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto")

input_text = "Explain the concept of neural networks in simple terms."
# Move the tokenized inputs to the same device as the model
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```
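If your GPU supports it, you can also load the weights in half precision, which roughly halves memory use compared to float32. This is a small variant of the snippet above; bfloat16 availability depends on your hardware:

```python
import torch
from transformers import AutoModelForCausalLM

# Load the weights in bfloat16 instead of the default float32
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
```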
4-bit quantization for reduced memory usage
```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the weights to 4 bits on load, cutting memory use substantially
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", quantization_config=quantization_config)

input_text = "Explain the concept of neural networks in simple terms."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```
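All of the snippets above load the pretrained (PT) checkpoint. For conversational use, the instruction-tuned variant (`google/gemma-2b-it`) expects prompts formatted with its chat template. Here is a minimal sketch of that path:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto")

# Gemma's chat template wraps each turn in <start_of_turn>/<end_of_turn> markers
chat = [{"role": "user", "content": "Explain neural networks in simple terms."}]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
```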
Fine-tuning Gemma 2B
Google provides resources for fine-tuning Gemma models, which can be adapted for the 2B variant:
- Supervised Fine-Tuning (SFT) on UltraChat dataset using QLoRA
- SFT using FSDP on TPU devices
- A Google Colab notebook for SFT on an English quotes dataset
These resources can be found in the `examples/` directory of the `google/gemma-7b` repository. When using them for Gemma 2B, simply change the model ID to `google/gemma-2b`.
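As a rough sketch of what the QLoRA route looks like with the Hugging Face `peft` library (the LoRA hyperparameters and target modules below are illustrative choices on my part, not Google’s recipe):

```python
# pip install peft bitsandbytes accelerate
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# QLoRA trains small adapters on top of frozen 4-bit quantized weights
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

lora_config = LoraConfig(
    r=8,                # adapter rank (illustrative)
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapters are trainable
```

From here, the wrapped model can be handed to a standard training loop or trainer, as in the linked examples.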
Limitations and ethical considerations
While Gemma 2B is impressive, it’s crucial to understand its limitations:
- Training data biases: The model may reflect biases present in its training data.
- Task complexity: Performance may vary based on the clarity of instructions and task difficulty.
- Language nuances: It may struggle with subtle linguistic features like sarcasm or idioms.
- Factual accuracy: As a language model, it may generate incorrect or outdated information.
- Reasoning capabilities: Complex, multi-step logical reasoning may be limited, and fluent output should not be mistaken for true understanding.
Ethical considerations include:
- Potential for generating biased or harmful content
- Risk of misuse for misinformation
- Privacy concerns related to training data and outputs
Google provides a Responsible Generative AI Toolkit and a Gemma Prohibited Use Policy to guide ethical implementation.
Conclusion
Gemma 2B represents a significant step towards democratizing access to powerful language models. Its compact size, coupled with impressive performance, makes it an excellent choice for researchers, developers, and hobbyists looking to explore AI capabilities without the need for extensive computational resources.
Whether you’re building a chatbot, exploring natural language understanding, or diving into AI research, Gemma 2B offers a powerful and accessible starting point. As the AI community continues to evolve, we can expect to see exciting applications and further refinements of this technology in the coming years.