no code implementations • 22 Apr 2024 • Subhojyoti Mukherjee, Anusha Lalitha, Kousha Kalantari, Aniket Deshmukh, Ge Liu, Yifei Ma, Branislav Kveton
Learning of preference models from human feedback has been central to recent advances in artificial intelligence.
no code implementations • 12 Apr 2024 • Subhojyoti Mukherjee, Ge Liu, Aniket Deshmukh, Anusha Lalitha, Yifei Ma, Branislav Kveton
We design the LLM prompt by adaptively choosing few-shot examples for a given inference query.
no code implementations • 23 Oct 2023 • Subhojyoti Mukherjee, Ruihao Zhu, Branislav Kveton
We propose CODE, a bandit algorithm based on a Constrained Optimal DEsign, that is interpretable and maximally reduces the uncertainty.
no code implementations • 29 Jan 2023 • Subhojyoti Mukherjee, Qiaomin Xie, Josiah Hanna, Robert Nowak
In this paper, we study the problem of optimal data collection for policy evaluation in linear bandits.
no code implementations • 27 May 2022 • Subhojyoti Mukherjee
We provide regret bounds for our algorithms and show that the bounds are comparable to their counterparts from the safe bandit and piecewise i.i.d. settings.
no code implementations • 9 Mar 2022 • Subhojyoti Mukherjee, Josiah P. Hanna, Robert Nowak
This paper studies the problem of data collection for policy evaluation in Markov decision processes (MDPs).
no code implementations • 2 Nov 2021 • Blake Mason, Romain Camilleri, Subhojyoti Mukherjee, Kevin Jamieson, Robert Nowak, Lalit Jain
The threshold value $\alpha$ can either be \emph{explicit} and provided a priori, or \emph{implicit} and defined relative to the optimal function value, i.e. $\alpha = (1-\epsilon)f(x_\ast)$ for a given $\epsilon > 0$, where $f(x_\ast)$ is the maximal function value and is unknown.
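As a concrete reading of the implicit case, the threshold is just a fixed fraction of the (unknown) optimum; a one-line sketch:

```python
def implicit_threshold(f_max, eps):
    """Implicit threshold alpha = (1 - eps) * f(x*) from the snippet above.

    f_max stands in for the unknown maximal value f(x*); in the actual
    problem this quantity is not observed and must be handled by the
    algorithm."""
    return (1 - eps) * f_max
```

For example, with $f(x_\ast)=10$ and $\epsilon=0.05$, the target level set is every $x$ with $f(x) \ge 9.5$.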
no code implementations • 15 Dec 2020 • Subhojyoti Mukherjee, Ardhendu Tripathy, Robert Nowak
Active learning can reduce the number of samples needed to perform a hypothesis test and to estimate the parameters of a model.
no code implementations • 30 May 2019 • Subhojyoti Mukherjee, Odalric-Ambrym Maillard
The second strategy, ImpCPD, makes use of the knowledge of $T$ to achieve the order-optimal regret bound of $\min\big\lbrace O(\sum\limits_{i=1}^{K} \sum\limits_{g=1}^{G}\frac{\log(T/H_{1, g})}{\Delta^{opt}_{i, g}}), O(\sqrt{GT})\big\rbrace$ (where $H_{1, g}$ is the problem complexity), thereby closing an important gap with respect to the lower bound in a specific challenging setting.
no code implementations • 18 Oct 2018 • Samarth Gupta, Shreyas Chaudhari, Subhojyoti Mukherjee, Gauri Joshi, Osman Yağan
We consider a finite-armed structured bandit problem in which mean rewards of different arms are known functions of a common hidden parameter $\theta^*$.
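To make the setting concrete, here is a minimal sketch (not the paper's algorithm) of exploiting the shared structure: estimate $\theta$ by least squares over a candidate grid, then pull the arm that looks best under the estimate. The function names and the grid of candidate parameters are illustrative assumptions.

```python
import random

def structured_bandit(mean_fns, theta_star, thetas, horizon, seed=0):
    """Greedy sketch for a structured bandit: each arm's mean is a known
    function mean_fns[a] of a shared hidden parameter theta.  We estimate
    theta from observed rewards and then play the arm that is best under
    the estimate.  Illustrative only -- not the paper's exact method."""
    rng = random.Random(seed)
    k = len(mean_fns)
    history = []  # (arm, reward) pairs
    for t in range(horizon):
        if t < k:
            arm = t  # initialise with one pull per arm
        else:
            # candidate theta minimising squared prediction error on history
            theta_hat = min(
                thetas,
                key=lambda th: sum((mean_fns[a](th) - r) ** 2
                                   for a, r in history),
            )
            arm = max(range(k), key=lambda a: mean_fns[a](theta_hat))
        reward = mean_fns[arm](theta_star) + rng.gauss(0, 0.1)
        history.append((arm, reward))
    return history
```

The point of the structure is that pulling *any* arm is informative about $\theta^*$, and hence about every other arm, which is what allows regret to be much smaller than in an unstructured bandit.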
no code implementations • 9 Nov 2017 • Subhojyoti Mukherjee, K. P. Naveen, Nandan Sudarsanam, Balaraman Ravindran
We propose a novel variant of the UCB algorithm (referred to as Efficient-UCB-Variance (EUCBV)) for minimizing cumulative regret in the stochastic multi-armed bandit (MAB) setting.
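The key ingredient in variance-aware UCB variants is a confidence bonus that shrinks with the empirical variance of an arm. The sketch below uses UCB-V-style indices as an illustration; the actual EUCBV algorithm additionally employs arm elimination and different constants, so treat this as a generic variance-aware baseline, not the paper's method.

```python
import math
import random

def ucb_v_bandit(means, horizon, seed=0):
    """Illustrative UCB-with-variance bandit loop (UCB-V-style bonus:
    sqrt(2 * var * log t / n) + 3 log t / n).  Returns per-arm pull counts
    and the cumulative pseudo-regret."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    sq_sums = [0.0] * k
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialise
        else:
            def index(i):
                mu = sums[i] / counts[i]
                var = max(sq_sums[i] / counts[i] - mu * mu, 0.0)
                bonus = (math.sqrt(2 * var * math.log(t) / counts[i])
                         + 3 * math.log(t) / counts[i])
                return mu + bonus
            arm = max(range(k), key=index)
        reward = means[arm] + rng.gauss(0, 0.1)
        counts[arm] += 1
        sums[arm] += reward
        sq_sums[arm] += reward * reward
        regret += best - means[arm]
    return counts, regret
```

Because the bonus scales with the estimated variance rather than with a worst-case range, low-variance suboptimal arms are discarded after far fewer pulls.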
no code implementations • 7 Apr 2017 • Subhojyoti Mukherjee, K. P. Naveen, Nandan Sudarsanam, Balaraman Ravindran
In this paper we propose the Augmented-UCB (AugUCB) algorithm for a fixed-budget version of the thresholding bandit problem (TBP), where the objective is to identify a set of arms whose quality is above a threshold.
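For contrast with adaptive methods such as AugUCB, the fixed-budget TBP has a trivial uniform-allocation baseline: spend the budget evenly and report the arms whose empirical mean clears the threshold. This sketch is that baseline, not AugUCB itself (which allocates adaptively using variance estimates and arm elimination).

```python
import random

def threshold_bandit_uniform(means, tau, budget, seed=0):
    """Uniform-allocation baseline for the fixed-budget thresholding
    bandit: pull arms round-robin, then return the set of arm indices
    whose empirical mean is at least the threshold tau."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(budget):
        arm = t % k  # round-robin allocation of the budget
        sums[arm] += means[arm] + rng.gauss(0, 0.1)
        counts[arm] += 1
    return {i for i in range(k) if sums[i] / counts[i] >= tau}
```

Adaptive algorithms improve on this by concentrating pulls on the arms whose means lie close to $\tau$, since those are the only ones that are hard to classify.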