Basque LLM Evaluation

A public benchmark dashboard for comparing the performance of locally run LLMs on Basque-language tasks.

Table 1. Main comparative results

# | Model | Quantization | Overall | Evals (grouped) | N
Accuracy is reported as mean ± standard deviation across random seeds.
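As a rough illustration of how a "mean ± std" cell could be derived from per-seed runs, the sketch below aggregates a list of accuracies. The function names (`summarize_accuracy`, `format_accuracy`) and the input values are hypothetical. The dashboard does not say whether it uses the sample or the population standard deviation, so the sample form (n - 1) is assumed here.

```python
from statistics import mean, stdev


def summarize_accuracy(per_seed_accuracy):
    """Return (mean, std) for accuracies from independently seeded runs."""
    if not per_seed_accuracy:
        raise ValueError("need at least one per-seed accuracy")
    m = mean(per_seed_accuracy)
    # Assumption: sample standard deviation (n - 1 denominator).
    # With a single seed there is no spread to estimate, so report 0.0.
    s = stdev(per_seed_accuracy) if len(per_seed_accuracy) > 1 else 0.0
    return m, s


def format_accuracy(per_seed_accuracy, digits=1):
    """Format accuracies given as fractions in [0, 1] as 'mean ± std' in percent."""
    m, s = summarize_accuracy(per_seed_accuracy)
    return f"{100 * m:.{digits}f} ± {100 * s:.{digits}f}"


if __name__ == "__main__":
    # Illustrative values only, not results from the dashboard.
    print(format_accuracy([0.5, 0.7]))
```

If the dashboard used the population standard deviation instead, `statistics.pstdev` would be the drop-in replacement for `stdev`.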

Figures

Figure 1. Overall accuracy by model

Figure 2. Accuracy profile by eval

Figure 3. Overall accuracy by release date

Evaluation protocol

Family | Benchmark | What is measured | Metric | Label space