LLM RAM Calculator

Equation

\[M = \frac{P \cdot 4B}{32 / Q} \cdot 1.2\]

Definitions

| Symbol | Description |
| --- | --- |
| M | GPU memory required, expressed in gigabytes (GB). |
| P | The number of parameters in the model, e.g. a 7B model has 7 billion parameters. |
| 4B | 4 bytes, the memory used per parameter at full 32-bit precision. |
| 32 | There are 32 bits in 4 bytes. |
| Q | The number of bits used for loading the model, e.g. 16, 8, or 4 bits. |
| 1.2 | Represents a 20% overhead for additional things loaded into GPU memory. |

Example: Llama 70B

For Llama 70B loaded in 16-bit mode:

\[M = \frac{70 \cdot 4B}{32 / 16} \cdot 1.2 = 168\ \text{GB}\]
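The calculation above can be sketched as a small Python helper. The function name and signature are illustrative, not from the original:

```python
def llm_gpu_memory_gb(params_billion: float, quant_bits: int, overhead: float = 1.2) -> float:
    """Estimate the GPU memory (GB) needed to load an LLM.

    params_billion: model size in billions of parameters (P)
    quant_bits:     bits per parameter used for loading (Q)
    overhead:       multiplier for extra GPU memory (1.2 = 20% overhead)
    """
    bytes_per_param = 4  # 4 bytes per parameter at full 32-bit precision
    return params_billion * bytes_per_param / (32 / quant_bits) * overhead

# Llama 70B loaded in 16-bit mode:
print(llm_gpu_memory_gb(70, 16))  # ≈ 168 GB
```

Loading the same model in 4-bit mode instead would bring the estimate down to `llm_gpu_memory_gb(70, 4)` ≈ 42 GB.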

Credit

Equation credit: This equation is attributed to Sam Stoelinga.