
Count the parameters in LLaMA V1 model


Let's load the model:

```python
from transformers import LlamaModel, LlamaConfig

model = LlamaModel.from_pretrained("llama-7b-hf-path")

def count_params(model, is_human: bool = False):
    params: int = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return f"{params / 1e6:.2f}M" if is_human else params

print(model)
print("Total # of params:", count_params(model, is_human=True))
```

Print out the layers:

```
LlamaModel(
  (embed_tokens): Embedding(32000, 4096, padding_idx=0)
  (layers): ModuleList(
    (0-31): 32 x LlamaDecoderLayer(
      (self_attn): LlamaSdpaAttention(
        (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (rotary_emb): LlamaRotaryEmbedding()
      )
      (mlp): LlamaMLP(
        (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
        (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
        (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
        (act_fn): SiLU()
      )
      (input_layernorm): LlamaRMSNorm()
      (post_attention_layernorm): LlamaRMSNorm()
    )
  )
  (norm): LlamaRMSNorm()
)
```

Total # of params: 6607.34M
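As a sanity check, the same number can be recovered by hand from the shapes in the printout above (a quick sketch; the variable names below are just shorthand for the printed dimensions):

```python
# Recompute the parameter count from the shapes printed above.
hidden, vocab, inter, n_layers = 4096, 32000, 11008, 32

embed = vocab * hidden          # embed_tokens
attn  = 4 * hidden * hidden     # q/k/v/o projections, no bias
mlp   = 3 * hidden * inter      # gate/up/down projections, no bias
norms = 2 * hidden              # two RMSNorm weight vectors per layer

total = embed + n_layers * (attn + mlp + norms) + hidden  # "+ hidden" is the final norm
print(f"{total / 1e6:.2f}M")    # 6607.34M, matching count_params(model, is_human=True)
```

Note that the rotary embedding carries no trainable weights, and `LlamaModel` excludes the `lm_head`; adding the 32000 × 4096 output head back gives the ~6.74B total usually quoted for LLaMA-7B.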

Notes on LLM technologies (keep updating)


Brief notes on LLM technologies.

Models

GPT2

Model structure

The GPT model employs a repeated stack of Transformer blocks, each containing two sub-layers: a Masked Multi-Head Attention (MMHA) layer and a position-wise feed-forward network. The MMHA is the central component of the block. It splits the projected input into multiple "heads", each of which learns to attend to different positions within the input sequence, allowing the model to focus on different aspects of the input simultaneously.
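A minimal PyTorch sketch of a masked multi-head attention layer (the function name, weight layout, and shapes are illustrative assumptions, not GPT-2's actual implementation):

```python
import torch
import torch.nn.functional as F

def masked_multi_head_attention(x, w_qkv, w_out, n_heads):
    """Illustrative masked multi-head self-attention (not GPT-2's exact code).

    x:      (batch, seq_len, d_model) input activations
    w_qkv:  (d_model, 3 * d_model) combined query/key/value projection
    w_out:  (d_model, d_model) output projection
    """
    b, t, d = x.shape
    head_dim = d // n_heads

    # Project once, then split into query, key, value.
    q, k, v = (x @ w_qkv).split(d, dim=-1)
    # Reshape to (batch, heads, seq_len, head_dim): each head attends in its own subspace.
    q, k, v = (z.view(b, t, n_heads, head_dim).transpose(1, 2) for z in (q, k, v))

    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    # Causal mask: position i may only attend to positions <= i.
    mask = torch.triu(torch.ones(t, t, device=x.device), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))

    out = F.softmax(scores, dim=-1) @ v          # (batch, heads, seq_len, head_dim)
    out = out.transpose(1, 2).reshape(b, t, d)   # concatenate the heads back together
    return out @ w_out

# Example: 8 heads over a 64-dimensional model, sequence length 16.
x = torch.randn(2, 16, 64)
y = masked_multi_head_attention(x, torch.randn(64, 192), torch.randn(64, 64), n_heads=8)
print(y.shape)  # torch.Size([2, 16, 64])
```

The upper-triangular mask is what makes the attention "masked": each token can attend only to itself and earlier positions, while each head works in its own `head_dim`-sized slice of the projected representation.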