On February 21, 2024, Google released Gemma 2B, a compact yet powerful language model that’s changing the game for developers and researchers worldwide. It’s part of the Gemma family of open models built from the same research and technology behind Google’s Gemini series.

In this article, we’re taking a closer look at this 2-billion parameter model.

Overview of Gemma 2B

Gemma 2B is a decoder-only language model with 2 billion parameters. It’s designed to be lightweight enough for deployment on consumer hardware while still offering impressive performance across a wide range of natural language processing tasks.

Key features:

  • 2 billion parameters
  • 8,192 token context length
  • Trained on approximately 2 trillion tokens
  • Available in both pretrained (PT) and instruction-tuned (IT) variants
  • Openly released weights (an open-weights model rather than fully open-source)
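
To sanity-check these figures locally, you can inspect the published configuration with Hugging Face Transformers. This is a minimal sketch; the attribute names follow the Transformers GemmaConfig and may shift between library versions:

from transformers import AutoConfig

# Downloads only the small config.json, not the full weights
config = AutoConfig.from_pretrained("google/gemma-2b")
print(config.max_position_embeddings)  # context length: 8192
print(config.num_key_value_heads)      # 1 => multi-query attention
print(config.num_hidden_layers)        # number of transformer layers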

Architecture and training

Gemma 2B utilizes a transformer decoder architecture with several optimizations:

  1. Multi-Query Attention: Unlike its larger 7B sibling, Gemma 2B uses multi-query attention with num_kv_heads = 1, which has been shown to work well at smaller scales (a minimal sketch follows the figure below).
  2. Rotary Positional Embeddings (RoPE): Used instead of absolute positional embeddings.
  3. GeGLU Activations: Replaces the standard ReLU non-linearity.
  4. RMSNorm: Used for layer normalization to stabilize training.

Gemma 2B model architecture

Key model parameters of Gemma 2B
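
To make the multi-query attention idea concrete, here is a minimal, illustrative PyTorch sketch. It is not Gemma’s actual implementation (it omits causal masking, RoPE, and the output projection), but it shows the key point: all query heads attend against a single shared key/value head:

import torch

def multi_query_attention(x, w_q, w_k, w_v, num_heads):
    # x: (batch, seq, d_model); w_q: (d_model, d_model);
    # w_k, w_v: (d_model, head_dim) -- a single shared KV head
    b, s, d = x.shape
    head_dim = d // num_heads
    q = (x @ w_q).view(b, s, num_heads, head_dim).transpose(1, 2)  # (b, heads, s, hd)
    k = (x @ w_k).view(b, s, 1, head_dim).transpose(1, 2)          # (b, 1, s, hd)
    v = (x @ w_v).view(b, s, 1, head_dim).transpose(1, 2)
    # The single K/V head broadcasts across all query heads
    scores = (q @ k.transpose(-2, -1)) / head_dim**0.5             # (b, heads, s, s)
    attn = scores.softmax(dim=-1)
    return (attn @ v).transpose(1, 2).reshape(b, s, d)

Compared with standard multi-head attention, this shrinks the KV cache by a factor of num_heads, which is part of what makes the 2B model memory-friendly at inference time.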

Training details:

  • Hardware: TPUv5e pods (512 TPUv5e chips across 2 pods)
  • Software: JAX and ML Pathways
  • Training data: Web documents, code, and mathematics (primarily English)

Parameter counts for the Gemma 2B and 7B models
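
The counts can also be computed directly from the checkpoint. A small sketch (note that this downloads the full weights; the total includes the large embedding matrix, so it comes out somewhat above 2 billion):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
print(f"{model.num_parameters():,}")  # total parameters, embeddings included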

Performance benchmarks

Despite its compact size, Gemma 2B demonstrates impressive capabilities across various benchmarks:

Benchmark           Metric          Gemma 2B Score
MMLU                5-shot, top-1   42.3
HellaSwag           0-shot          71.4
PIQA                0-shot          77.3
SocialIQA           0-shot          49.7
BoolQ               0-shot          69.4
WinoGrande          partial score   65.4
CommonsenseQA       7-shot          65.3
OpenBookQA          –               47.8
ARC-e               –               73.2
ARC-c               –               42.1
TriviaQA            5-shot          53.2
Natural Questions   5-shot          12.5
HumanEval           pass@1          22.0
MBPP                3-shot          29.2
GSM8K               maj@1           17.7
MATH                4-shot          11.8
AGIEval             –               24.2
BIG-Bench           –               35.2

These results showcase Gemma 2B’s applicability across tasks like question answering, common sense reasoning, and even basic coding challenges.
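
If you want to reproduce numbers in this ballpark yourself, EleutherAI’s lm-evaluation-harness is a common tool. Below is a minimal sketch using its Python API; it assumes lm-eval 0.4+, and scores will not match Google’s internal evaluation harness exactly:

# pip install lm-eval
import lm_eval

# Evaluate the Hugging Face checkpoint on two of the 0-shot benchmarks above
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=google/gemma-2b",
    tasks=["hellaswag", "piqa"],
    num_fewshot=0,
)
print(results["results"])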

Responsible AI and safety

Google has implemented several measures to ensure Gemma 2B is developed and used responsibly:

1. Data preprocessing:

  • Rigorous filtering for CSAM (Child Sexual Abuse Material)
  • Removal of personal information and sensitive data
  • Content quality and safety filtering

2. Safety evaluations: The Gemma 2B IT (Instruction Tuned) model underwent extensive safety testing. Here are the results from key safety benchmarks:

Benchmark      Metric          Gemma 2 IT 2B
RealToxicity   average         8.16
CrowS-Pairs    top-1           37.67
BBQ Ambig      1-shot, top-1   83.20
BBQ Disambig   top-1           69.31
Winogender     top-1           52.91
TruthfulQA     –               43.72
Winobias 1_2   –               59.28
Winobias 2_2   –               88.57
Toxigen        –               48.32

The ethics and safety evaluation results fall within acceptable thresholds for meeting Google’s internal policies across categories such as child safety, content safety, representational harms, memorization, and large-scale harms.

3. Dangerous capability evaluations: Google assessed Gemma models for potential misuse in areas like offensive cybersecurity, self-proliferation, and persuasion. While specific results for the 2B model aren’t provided, these evaluations inform the model’s development and usage guidelines.

Deployment and usage

One of Gemma 2B’s key advantages is its ability to run on consumer-grade hardware. Here’s how you can get started:

Basic CPU usage


from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the pretrained (base) checkpoint; weights download on first run
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

input_text = "Explain the concept of neural networks in simple terms."
input_ids = tokenizer(input_text, return_tensors="pt")

# Without max_new_tokens, generate() stops at a short default length
outputs = model.generate(**input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

GPU acceleration


# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
# device_map="auto" places the weights on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto")

input_text = "Explain the concept of neural networks in simple terms."
# Move the tokenized inputs to the same device as the model
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

4-bit Quantization for reduced memory usage


# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the weights to 4-bit on load, cutting memory use roughly 4x vs fp16
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",
    quantization_config=quantization_config,
    device_map="auto",  # bitsandbytes quantization requires a CUDA device
)

input_text = "Explain the concept of neural networks in simple terms."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
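
Using the instruction-tuned variant

For chat-style prompts, the instruction-tuned checkpoint is usually the better choice. A minimal sketch; it assumes the google/gemma-2b-it repository and the tokenizer’s built-in chat template:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto")

messages = [
    {"role": "user", "content": "Explain the concept of neural networks in simple terms."},
]
# apply_chat_template wraps the message in Gemma's expected turn markers
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))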

Fine-tuning Gemma 2B

Google provides resources for fine-tuning Gemma models, which can be adapted for the 2B variant:

  1. Supervised Fine-Tuning (SFT) on UltraChat dataset using QLoRA
  2. SFT using FSDP on TPU devices
  3. A Google Colab notebook for SFT on an English quotes dataset

These resources can be found in the examples/ directory of the google/gemma-7b repository. When using these for Gemma 2B, simply change the model ID to google/gemma-2b.
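
As a starting point, here is a minimal QLoRA-style sketch with the trl and peft libraries, using the same English quotes dataset (Abirate/english_quotes) as Google’s Colab. Argument names follow older trl releases (newer versions move several of them into SFTConfig), so treat this as a sketch rather than a drop-in script:

# pip install trl peft bitsandbytes accelerate datasets
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTTrainer

# Load the base model in 4-bit so fine-tuning fits on a single consumer GPU
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

# Train small low-rank adapters on the attention projections
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

dataset = load_dataset("Abirate/english_quotes", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=lora_config,
    dataset_text_field="quote",  # column containing the training text
)
trainer.train()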

💡 For those interested in pushing Gemma 2B even further, check out our article on finetuning Gemma 2 with UnslothAI, which demonstrates impressive speed improvements and reduced VRAM usage.

Limitations and ethical considerations

While Gemma 2B is impressive, it’s crucial to understand its limitations:

  1. Training data biases: The model may reflect biases present in its training data.
  2. Task complexity: Performance may vary based on the clarity of instructions and task difficulty.
  3. Language nuances: It may struggle with subtle linguistic features like sarcasm or idioms.
  4. Factual accuracy: As a language model, it may generate incorrect or outdated information.
  5. Reasoning capabilities: Complex logical reasoning or true understanding may be limited.

Ethical considerations include:

  • Potential for generating biased or harmful content
  • Risk of misuse for misinformation
  • Privacy concerns related to training data and outputs

Google provides a Responsible Generative AI Toolkit and a Gemma Prohibited Use Policy to guide ethical implementation.

Conclusion

Gemma 2B represents a significant step towards democratizing access to powerful language models. Its compact size, coupled with impressive performance, makes it an excellent choice for researchers, developers, and hobbyists looking to explore AI capabilities without the need for extensive computational resources.

Whether you’re building a chatbot, exploring natural language understanding, or diving into AI research, Gemma 2B offers a powerful and accessible starting point. As the AI community continues to evolve, we can expect to see exciting applications and further refinements of this technology in the coming years.

Last Update: 09/08/2024