| Symbol | Description |
|---|---|
| M | GPU memory required, expressed in gigabytes (GB) |
| P | The number of parameters in the model, e.g. a 7B model has 7 billion parameters |
| 4B | 4 bytes, the memory used per parameter at full 32-bit precision |
| 32 | The number of bits in 4 bytes |
| Q | The number of bits used for loading the model, e.g. 16, 8, or 4 bits |
| 1.2 | Represents a 20% overhead for additional things loaded into GPU memory |
Putting the symbols together, the formula is:

M = (P * 4B) / (32 / Q) * 1.2

For Llama 70B loaded in 16-bit mode:

70 * 4 bytes / (32/16) * 1.2 = 168 GB
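As a quick sanity check, the calculation can be wrapped in a small Python helper. This is a sketch; the function name, parameter names, and default overhead value are my own, not from the original article:

```python
def gpu_memory_gb(params_billion: float, quant_bits: int, overhead: float = 1.2) -> float:
    """Estimate the GPU memory (GB) needed to load a model.

    params_billion: P, the model size in billions of parameters (e.g. 70 for Llama 70B)
    quant_bits: Q, the number of bits per parameter when loading (16, 8, or 4)
    overhead: multiplier for the ~20% of extra GPU memory used beyond the weights
    """
    bytes_per_param = 4  # 4B: full 32-bit precision uses 4 bytes per parameter
    return params_billion * bytes_per_param / (32 / quant_bits) * overhead

# Llama 70B in 16-bit mode: 70 * 4 / (32/16) * 1.2 = 168 GB
print(gpu_memory_gb(70, 16))
```

Running the same model at 4-bit quantization (`gpu_memory_gb(70, 4)`) drops the estimate to 42 GB, which shows why quantization matters for fitting large models on fewer GPUs.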
Equation credit: This equation is attributed to Sam Stoelinga.