| Symbol | Description |
|---|---|
| M | GPU memory required, expressed in gigabytes (GB) |
| P | The number of parameters in the model, e.g. a 7B model has 7 billion parameters |
| 4B | 4 bytes, the memory used per parameter at full 32-bit precision |
| 32 | The number of bits in 4 bytes |
| Q | The number of bits used for loading the model, e.g. 16, 8, or 4 bits |
| 1.2 | Represents a 20% overhead for additional things loaded into GPU memory |
Putting the symbols together, the formula is:

M = (P * 4B) / (32 / Q) * 1.2

For Llama 70B loaded in 16-bit mode:

70 * 4 bytes / (32/16) * 1.2 = 168 GB
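As a quick sanity check, the calculation can be wrapped in a small Python helper. This is a sketch; the function name, parameter names, and default overhead value are my own, not from the original article:

```python
def gpu_memory_gb(params_billion: float, quant_bits: int, overhead: float = 1.2) -> float:
    """Estimate the GPU memory (GB) needed to load a model.

    params_billion: P, the model size in billions of parameters (e.g. 70 for Llama 70B)
    quant_bits: Q, the number of bits per parameter when loading (16, 8, or 4)
    overhead: multiplier for the ~20% of extra GPU memory used beyond the weights
    """
    bytes_per_param = 4  # 4B: full 32-bit precision uses 4 bytes per parameter
    return params_billion * bytes_per_param / (32 / quant_bits) * overhead

# Llama 70B in 16-bit mode: 70 * 4 / (32/16) * 1.2 = 168 GB
print(gpu_memory_gb(70, 16))
```

Running the same model at 4-bit quantization (`gpu_memory_gb(70, 4)`) drops the estimate to 42 GB, which shows why quantization matters for fitting large models on fewer GPUs.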
Equation credit: This equation is attributed to Sam Stoelinga.