How good is a particular LLM? The answer is subjective and depends on the evaluation criteria used.
In this lecture, we discuss several ways to evaluate an LLM, covering the following topics:
Extrinsic and intrinsic evaluation
Popular benchmarks
Challenges in evaluating LLMs
Preventing data contamination with a live evaluation platform
Meta-leaderboard
Other benchmarks
Some libraries for evaluating LLMs
Conclusion