How good is a particular LLM? The question is highly subjective, and its answer depends on the evaluation criteria.

In this lecture, we discuss several ways to evaluate an LLM.

Extrinsic and intrinsic evaluation

Popular benchmarks

Challenges in evaluating LLMs

Preventing data contamination with a live evaluation platform

Meta-leaderboard

Other benchmarks

Some libraries for evaluating LLMs

Conclusion