| Symbol | Description |
|---|---|
| M | GPU memory, expressed in gigabytes (GB) |
| P | The number of parameters in the model, e.g. a 7B model has 7 billion parameters |
| 4B | 4 bytes, the number of bytes used per parameter |
| 32 | There are 32 bits in 4 bytes |
| Q | The number of bits the model should be loaded in, e.g. 16 bits, 8 bits, or 4 bits |
| 1.2 | Represents a 20% overhead for loading additional things into GPU memory |
Putting the symbols together gives M = (P * 4B) / (32/Q) * 1.2. For Llama 70B loaded in 16-bit mode:

70 * 4bytes / (32/16) * 1.2 = 168GB
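The calculation above can be sketched as a small helper function. This is an illustrative sketch, not part of any library; the function name and parameters are assumptions for the example.

```python
def estimate_gpu_memory_gb(params_billions: float,
                           quant_bits: int,
                           overhead: float = 1.2) -> float:
    """Estimate GPU memory (GB) needed to load a model.

    Implements M = (P * 4B) / (32/Q) * 1.2, where P is the parameter
    count in billions and Q is the bit width used to load the model.
    """
    bytes_per_param = 4  # a full-precision (32-bit) parameter takes 4 bytes
    return params_billions * bytes_per_param / (32 / quant_bits) * overhead

# Llama 70B in 16-bit mode:
print(estimate_gpu_memory_gb(70, 16))  # → 168.0 GB
# The same model in 8-bit mode needs half as much:
print(estimate_gpu_memory_gb(70, 8))   # → 84.0 GB
```

Note that this estimates only the memory needed to hold the weights for inference; fine-tuning requires substantially more memory for optimizer states and gradients.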
Equation credit: this equation is attributed to Sam Stoelinga.