Count the parameters in LLaMA V1 model
Let's load the model:

```python
from transformers import LlamaModel, LlamaConfig

model = LlamaModel.from_pretrained("llama-7b-hf-path")

def count_params(model, is_human: bool = False):
    params: int = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return f"{params / 1e6:.2f}M" if is_human else params

print(model)
print("Total # of params:", count_params(model, is_human=True))
```

Print out the layers:

```
LlamaModel(
  (embed_tokens): Embedding(32000, 4096, padding_idx=0)
  (layers): ModuleList(
    (0-31): 32 x LlamaDecoderLayer(
      (self_attn): LlamaSdpaAttention(
        (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (rotary_emb): LlamaRotaryEmbedding()
      )
      (mlp): LlamaMLP(
        (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
        (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
        (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
        (act_fn): SiLU()
      )
      (input_layernorm): LlamaRMSNorm()
      (post_attention_layernorm): LlamaRMSNorm()
    )
  )
  (norm): LlamaRMSNorm()
)
```

Total # of params: 6607.
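As a sanity check, the total can be reproduced by hand from the shapes in the printout above. The sketch below is plain Python arithmetic (the variable names are mine, not from the model); it sums the token embedding, the 32 decoder layers, and the final RMSNorm, and lands on the same ~6607M figure. Note that `rotary_emb` holds only non-trainable buffers, so it contributes nothing to the count.

```python
# Back-of-the-envelope parameter count for LlamaModel (LLaMA-7B without the LM head),
# using the shapes printed above. Variable names are illustrative.
vocab, hidden, inter, n_layers = 32000, 4096, 11008, 32

embed = vocab * hidden        # embed_tokens
attn  = 4 * hidden * hidden   # q/k/v/o projections, bias=False
mlp   = 3 * hidden * inter    # gate/up/down projections, bias=False
norms = 2 * hidden            # input_layernorm + post_attention_layernorm (weight only)

per_layer = attn + mlp + norms
total = embed + n_layers * per_layer + hidden  # + final RMSNorm

print(f"{total / 1e6:.2f}M")  # -> 6607.34M, consistent with count_params above
```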
Notes on LLM technologies (keep updating)
Brief notes on LLM technologies.

Models

GPT2

Model structure

The GPT model employs a repeated structure of Transformer Blocks, each containing two sub-layers: a Masked Multi-Head Attention (MMHA) layer and a position-wise Feed-Forward Network. The MMHA is a central component of the model. It operates by splitting the input into multiple "heads", each of which learns to attend to different positions within the input sequence, allowing the model to focus on different aspects of the input simultaneously.
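To make this concrete, here is a minimal PyTorch sketch of masked multi-head self-attention. It is not GPT-2's exact implementation; the module name and the defaults (d_model=768, n_heads=12, taken from GPT-2 small) are illustrative. It splits the hidden dimension into heads, applies a causal mask so each position attends only to earlier positions, and recombines the heads.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedMultiHeadAttention(nn.Module):
    """Minimal causal multi-head self-attention (illustrative sketch, not GPT-2's exact code)."""

    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv_proj = nn.Linear(d_model, 3 * d_model)  # joint Q, K, V projection
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape  # batch, sequence length, hidden size
        q, k, v = self.qkv_proj(x).chunk(3, dim=-1)
        # Split the hidden dimension into heads: (B, n_heads, T, d_head)
        q, k, v = (t.view(B, T, self.n_heads, self.d_head).transpose(1, 2) for t in (q, k, v))
        # Scaled dot-product scores with a causal mask: position i attends only to j <= i
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        causal_mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        scores = scores.masked_fill(causal_mask, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        # Recombine heads and project back to d_model
        out = (attn @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.out_proj(out)

# Usage: a batch of 2 sequences of length 16 with hidden size 768
x = torch.randn(2, 16, 768)
print(MaskedMultiHeadAttention()(x).shape)  # torch.Size([2, 16, 768])
```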