NLG Evaluation
23 papers with code • 0 benchmarks • 0 datasets
Evaluating the text generated by NLG (Natural Language Generation) systems, such as large language models
Most implemented papers
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets
Precisely assessing the progress in natural language generation (NLG) tasks is challenging, and human evaluation to establish a preference for one model's output over another's is often necessary.
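The core NND check can be read as: given pairs of a higher-quality output and a "near-negative" output drawn from existing human evaluation data, count how often the model under test scores the better one higher. A minimal sketch of that pass-rate computation, assuming a user-supplied `score_fn` (e.g., model log-likelihood); the toy scorer and example pairs are illustrative, not from the paper:

```python
# Near-negative-distinction style test: over (better, worse) output
# pairs, measure how often the model scores the better output higher.
def nnd_pass_rate(score_fn, pairs):
    """pairs: list of (better_text, worse_text) tuples."""
    passed = sum(1 for better, worse in pairs
                 if score_fn(better) > score_fn(worse))
    return passed / len(pairs)

if __name__ == "__main__":
    # Toy scorer that penalizes a marked error token (illustrative only).
    toy_score = lambda text: -text.count("<err>")
    pairs = [("the cat sat", "the cat sat <err>"),
             ("a dog ran", "a dog <err> ran")]
    print(nnd_pass_rate(toy_score, pairs))  # 1.0
```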
EffEval: A Comprehensive Evaluation of Efficiency for MT Evaluation Metrics
In this work, we provide a comprehensive evaluation of efficiency for MT evaluation metrics.
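Measuring a metric's efficiency, as in the excerpt above, largely reduces to timing it over a batch of hypothesis-reference pairs. A minimal sketch using `time.perf_counter`; the stand-in metric is invented for illustration and is not one of the metrics the paper benchmarks:

```python
import time

# Wall-clock cost of a metric over a batch of (hypothesis, reference)
# pairs; reports the best of several repeats to reduce timer noise.
def time_metric(metric_fn, pairs, repeats=3):
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for hyp, ref in pairs:
            metric_fn(hyp, ref)
        best = min(best, time.perf_counter() - start)
    return best / len(pairs)  # seconds per segment

# Trivial stand-in metric: token-overlap count (illustrative only).
toy_metric = lambda hyp, ref: len(set(hyp.split()) & set(ref.split()))
pairs = [("a b c", "a b d")] * 1000
print(f"{time_metric(toy_metric, pairs) * 1e6:.1f} us/segment")
```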
Not All Errors are Equal: Learning Text Generation Metrics using Stratified Error Synthesis
Is it possible to build a general and automatic natural language generation (NLG) evaluation metric?
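The title's "stratified error synthesis" suggests training data built by corrupting clean references at graded severities. A hedged sketch of that general idea with simple token-level perturbations; the operations and severity levels here are invented for illustration and are not the paper's recipe:

```python
import random

# Corrupt a reference with n_errors random token-level edits, producing
# a (severity, corrupted_text) pair that could supervise a learned metric.
def synthesize_errors(reference, n_errors):
    tokens = reference.split()
    for _ in range(n_errors):
        op = random.choice(["drop", "repeat", "swap"])
        i = random.randrange(len(tokens))
        if op == "drop" and len(tokens) > 1:
            tokens.pop(i)
        elif op == "repeat":
            tokens.insert(i, tokens[i])
        elif op == "swap" and len(tokens) > 1:
            j = random.randrange(len(tokens))
            tokens[i], tokens[j] = tokens[j], tokens[i]
    return " ".join(tokens)

reference = "the quick brown fox jumps over the lazy dog"
for severity in (1, 2, 4):
    print(severity, synthesize_errors(reference, severity))
```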
CLSE: Corpus of Linguistically Significant Entities
Using the CLSE's entities and a small number of human translations, we create a linguistically representative NLG evaluation benchmark in three languages: French (high-resource), Marathi (low-resource), and Russian (highly inflected language).
Is ChatGPT a Good NLG Evaluator? A Preliminary Study
In detail, we regard ChatGPT as a human evaluator and give task-specific (e.g., summarization) and aspect-specific (e.g., relevance) instructions to prompt ChatGPT to evaluate the generated results of NLG models.
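A minimal sketch of this kind of aspect-specific prompting, assuming the official OpenAI Python client; the prompt template, model name, and 1-5 scale are illustrative, not the paper's exact instructions:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask an LLM to rate one aspect of a summary; the wording is an
# illustrative template, not the paper's prompt.
def llm_rate(source, summary, aspect="relevance"):
    prompt = (
        f"Score the following summary for {aspect} on a 1-5 scale.\n"
        f"Article: {source}\nSummary: {summary}\n"
        "Reply with a single integer."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return int(resp.choices[0].message.content.strip())
```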
Describe me an Aucklet: Generating Grounded Perceptual Category Descriptions
Human speakers can generate descriptions of perceptual concepts, abstracted from the instance-level.
Evaluating Evaluation Metrics: A Framework for Analyzing NLG Evaluation Metrics using Measurement Theory
We address a fundamental challenge in Natural Language Generation (NLG) model evaluation -- the design and evaluation of evaluation metrics.
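One elementary instance of "evaluating evaluation metrics" is checking a metric's validity by correlating its scores with human judgments on the same outputs. The sketch below shows that generic meta-evaluation step with SciPy; it is standard practice, not the paper's measurement-theory framework itself, and the scores are toy values:

```python
from scipy.stats import kendalltau, pearsonr

# Toy metric scores and human ratings for the same five outputs.
metric_scores = [0.71, 0.42, 0.90, 0.55, 0.63]
human_scores = [4.0, 2.5, 4.5, 3.0, 3.5]  # e.g., mean annotator ratings

tau, _ = kendalltau(metric_scores, human_scores)
r, _ = pearsonr(metric_scores, human_scores)
print(f"Kendall tau={tau:.2f}, Pearson r={r:.2f}")
```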
DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering
Existing evaluation metrics for natural language generation (NLG) tasks face challenges in generalization ability and interpretability.
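A hedged sketch of the decomposition idea: replace one holistic quality question with per-sentence yes/no subquestions, answer each with a QA model, and aggregate. `yes_probability` is a hypothetical stand-in for an instruction-tuned model's probability of answering "yes"; the decomposition below is illustrative, not DecompEval's exact formulation:

```python
# Score a text by averaging yes-probabilities over per-sentence
# subquestions about one evaluation aspect.
def decomposed_score(generated_text, aspect, yes_probability):
    sentences = [s.strip() for s in generated_text.split(".") if s.strip()]
    subquestions = [
        f"Is the sentence '{s}' consistent with the {aspect} "
        f"of the full text?" for s in sentences
    ]
    answers = [yes_probability(q, context=generated_text)
               for q in subquestions]
    return sum(answers) / len(answers)

# Placeholder model for demonstration; a real setup would query an LM.
toy = lambda q, context: 0.5
print(decomposed_score("A fox runs. It is fast.", "coherence", toy))
```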
LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models
Current developments in large language models (LLMs) have enabled impressive zero-shot capabilities across various natural language tasks.
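Comparative assessment reduces to asking an LLM judge which of two outputs is better and aggregating the pairwise outcomes into a ranking. A minimal win-count sketch; `judge(a, b)` is a hypothetical LLM call, replaced here by a toy length-based preference for demonstration:

```python
from itertools import permutations

# Rank candidates by wins over all ordered pairs; evaluating both
# orders (a, b) and (b, a) mitigates the position bias of LLM judges.
def rank_by_wins(candidates, judge):
    wins = {c: 0 for c in candidates}
    for a, b in permutations(candidates, 2):
        if judge(a, b):  # True if the judge prefers a over b
            wins[a] += 1
    return sorted(candidates, key=lambda c: wins[c], reverse=True)

# Toy judge preferring longer outputs, for demonstration only.
print(rank_by_wins(["short", "a bit longer", "mid len"],
                   lambda a, b: len(a) > len(b)))
```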
Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation
To address this issue, we propose to utilize multiple references to enhance the consistency between these metrics and human evaluations.
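Standard overlap metrics already support the multi-reference setting: NLTK's `sentence_bleu`, for example, accepts a list of references and matches n-grams against all of them, so adding diverse references can only increase the matched counts. This sketch illustrates the general multi-reference idea only, not the paper's specific metrics or data:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1  # avoid zero-count warnings on short texts
hypothesis = "the cat sits on the mat".split()
single_ref = ["a cat is on the mat".split()]
multi_refs = single_ref + ["the cat sits on a rug".split(),
                           "there is a cat on the mat".split()]

# BLEU against one reference vs. several; the multi-reference score is
# typically at least as high because n-grams can match any reference.
print(sentence_bleu(single_ref, hypothesis, smoothing_function=smooth))
print(sentence_bleu(multi_refs, hypothesis, smoothing_function=smooth))
```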