Introduction to Quantization and Sharding for LLMs
Sharding: Dividing the Load
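Sharding distributes a model's weights across multiple devices so that no single device must hold the full model. As a minimal sketch of the idea, the following simulates tensor-parallel sharding of one linear layer with NumPy: the weight matrix is split row-wise across two "devices", each computes its slice of the output, and the partial results are concatenated (the role an all-gather plays in a real multi-GPU setup). The array shapes and two-way split are illustrative assumptions, not taken from any particular framework.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))   # full weight matrix of one linear layer
x = rng.standard_normal(8)        # input activation vector

# Split W's output dimension into two shards, one per simulated "device".
shards = np.split(W, 2, axis=0)   # two (4, 8) shards

# Each device computes its slice of the output independently...
partial_outputs = [shard @ x for shard in shards]

# ...and concatenating the slices (an all-gather, in a real cluster)
# reconstructs the full layer output.
y_sharded = np.concatenate(partial_outputs)
y_full = W @ x
print("sharded output matches full output:", np.allclose(y_sharded, y_full))
```

Each shard here needs only half the weight memory, which is the core benefit: memory cost per device shrinks linearly with the number of shards, at the price of communication to reassemble outputs.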
Quantization: Shrinking the Model
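Quantization reduces model size by storing weights in a lower-precision format, typically int8 or int4 instead of float32. As a minimal sketch under simple assumptions, the following implements symmetric per-tensor int8 quantization: a single scale factor maps the largest-magnitude weight onto the int8 range, and dequantization recovers an approximation of the original values. Real systems usually quantize per-channel or per-group; per-tensor is used here only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(16).astype(np.float32)

# Symmetric per-tensor quantization: scale maps the largest-magnitude
# weight onto the int8 range [-127, 127].
scale = np.max(np.abs(w)) / 127.0
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# Dequantize to recover an approximation of the original weights.
w_hat = q.astype(np.float32) * scale

# Rounding error per weight is bounded by half a quantization step.
max_err = np.max(np.abs(w - w_hat))
print(f"max reconstruction error: {max_err:.5f} (step = {scale:.5f})")
```

Storing `q` (int8) plus one float scale cuts weight memory roughly 4x versus float32, at the cost of the bounded rounding error shown above.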
Pruning: Compressing the Model
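Pruning compresses a model by removing weights that contribute little to its output. The simplest variant is magnitude pruning: zero out the weights with the smallest absolute values. The sketch below is an illustrative, unstructured version with an assumed 50% sparsity target; production pruning is often structured (whole rows, heads, or N:M patterns) so that hardware can actually exploit the zeros.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4))

# Magnitude pruning: zero out the fraction `sparsity` of weights
# with the smallest absolute values.
sparsity = 0.5
threshold = np.quantile(np.abs(w), sparsity)
mask = np.abs(w) >= threshold
w_pruned = w * mask

print("fraction of zeros:", float(np.mean(w_pruned == 0)))
```

The resulting sparse matrix can be stored in a compressed format (e.g. CSR), and in practice a short fine-tuning pass usually follows pruning to recover any lost accuracy.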
Conclusion