From dd320f7755d582e38f228f97e3ef655765a88b22 Mon Sep 17 00:00:00 2001
From: Kai Wu
Date: Mon, 30 Sep 2024 13:14:51 -0700
Subject: add a line to link the eval reproduce recipe (#123)

---
 models/llama3_1/eval_details.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/models/llama3_1/eval_details.md b/models/llama3_1/eval_details.md
index 0637a62..f126f00 100644
--- a/models/llama3_1/eval_details.md
+++ b/models/llama3_1/eval_details.md
@@ -6,7 +6,8 @@ This document contains some additional context on the settings and methodology f
 
 ## Language auto-eval benchmark notes:
 
-For a given benchmark, we strive to use consistent evaluation settings across all models, including external models. We make every effort to achieve optimal scores for external models, including addressing any model-specific parsing and tokenization requirements. Where the scores are lower for external models than self-reported scores on comparable or more conservative settings, we report the self-reported scores for external models. We are also releasing the data generated as part of evaluations with publicly available benchmarks which can be found on [Llama 3.1 Evals Huggingface collection](https://huggingface.co/collections/meta-llama/llama-31-evals-66a2c5a14c2093e58298ac7f).
+For a given benchmark, we strive to use consistent evaluation settings across all models, including external models. We make every effort to achieve optimal scores for external models, including addressing any model-specific parsing and tokenization requirements. Where the scores are lower for external models than self-reported scores on comparable or more conservative settings, we report the self-reported scores for external models. We are also releasing the data generated as part of evaluations with publicly available benchmarks which can be found on [Llama 3.1 Evals Huggingface collection](https://huggingface.co/collections/meta-llama/llama-31-evals-66a2c5a14c2093e58298ac7f). We have also developed a [eval reproduction recipe](https://github.com/meta-llama/llama-recipes/tree/b5f64c0b69d7ff85ec186d964c6c557d55025969/tools/benchmarks/llm_eval_harness/meta_eval_reproduce) that demonstrates how to closely reproduce the Llama 3.1 reported benchmark numbers using the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/main) library and the datasets in [3.1 evals collections](https://huggingface.co/collections/meta-llama/llama-31-evals-66a2c5a14c2093e58298ac7f) on selected tasks.
+
 
 ### MMLU
 
--
cgit v1.2.3-70-g09d2
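
For readers who want to try the reproduction flow described in the paragraph this patch adds, here is a minimal sketch for pulling and inspecting the released evals data with the Hugging Face `datasets` library. The repository and configuration names below are assumptions based on the linked 3.1 evals collection (check the collection page for the exact identifiers), and the gated meta-llama dataset repos require approved access plus a `huggingface-cli login` beforehand.

```python
# Minimal sketch (assumed identifiers): inspect the released Llama 3.1 evals
# data from the Hugging Face collection linked in the patch above.
# Requires `pip install datasets` and granted access to the gated repos.
from datasets import load_dataset

# Assumed repo/config names for the 8B Instruct MMLU details; other models and
# benchmarks in the collection use similar "<model>-evals__<task>__details"
# style identifiers, so adjust these to the task you want to reproduce.
REPO_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct-evals"
CONFIG = "Meta-Llama-3.1-8B-Instruct-evals__mmlu__details"

evals = load_dataset(REPO_ID, name=CONFIG)

# Print the available splits and one example row; the rows contain the prompts,
# model outputs, and scoring fields that the reproduction recipe compares
# against lm-evaluation-harness results.
print(evals)
first_split = next(iter(evals.values()))
print(first_split[0])
```

The linked recipe itself drives lm-evaluation-harness over these same datasets for the selected tasks; the sketch above only covers downloading and inspecting the published evaluation data.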