CLUE is a Chinese Language Understanding Evaluation benchmark. It consists of different NLU datasets. It is a community-driven project that brings together 9 tasks spanning several well-established single-sentence/sentence-pair classification tasks, as well as machine reading comprehension, all on original Chinese text.
95 PAPERS • 8 BENCHMARKS
The MQ2008 dataset is a dataset for Learning to Rank. It contains 800 queries with labelled documents.
28 PAPERS • NO BENCHMARKS YET
A dataset on asking Questions for Lack of Clarity in open-domain information-seeking conversations. Qulac presents the first dataset and offline evaluation framework for studying clarifying questions in open-domain information-seeking conversational search systems.
18 PAPERS • NO BENCHMARKS YET
The datasets are machine learning data, in which queries and urls are represented by IDs. The datasets consist of feature vectors extracted from query-url pairs along with relevance judgment labels:
7 PAPERS • NO BENCHMARKS YET
DareCzech DaReCzech is a dataset for text relevance ranking in Czech. The dataset consists of more than 1.6M annotated query-documents pairs, which makes it one of the largest available datasets for this task.
2 PAPERS • 1 BENCHMARK
The Istella LETOR full dataset is composed of 33,018 queries and 220 features representing each query-document pair. It consists of 10,454,629 examples labeled with relevance judgments ranging from 0 (irrelevant) to 4 (perfectly relevant). The average number of per-query examples is 316. It has been splitted in train and test sets according to a 80%-20% scheme.
1 PAPER • NO BENCHMARKS YET