no code implementations • 6 Jun 2024 • Yihe Dong, Sercan Arik, Nathanael Yoder, Tomas Pfister
Feature engineering has demonstrated substantial utility for many machine learning workflows, such as in the small data regime or when distribution shifts are severe.
1 code implementation • 1 Nov 2023 • Chuizheng Meng, Yihe Dong, Sercan Ö. Arik, Yan Liu, Tomas Pfister
Estimation of temporal counterfactual outcomes from observed history is crucial for decision-making in many domains such as healthcare and e-commerce, particularly when randomized controlled trials (RCTs) suffer from high cost or impracticality.
1 code implementation • 26 May 2023 • Sayna Ebrahimi, Sercan O. Arik, Yihe Dong, Tomas Pfister
To bridge this gap, we propose LANISTR, an attention-based framework to learn from LANguage, Image, and STRuctured data.
no code implementations • 6 Apr 2023 • Yihe Dong, Sercan O. Arik
Feature selection has been widely used to reduce compute requirements during training, improve model interpretability, and enhance model generalizability.
1 code implementation • 7 Oct 2022 • Rui Wang, Yihe Dong, Sercan Ö. Arik, Rose Yu
Temporal distributional shifts, with underlying dynamics changing over time, frequently occur in real-world time series and pose a fundamental challenge for deep neural networks (DNNs).
1 code implementation • 5 Mar 2021 • Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas
Attention-based architectures have become ubiquitous in machine learning, yet our understanding of the reasons for their effectiveness remains limited.
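The architectures the paper analyzes are built on scaled dot-product attention. As a point of reference, here is a minimal single-query sketch of that operation in plain Python (an illustration of the standard mechanism, not the paper's analysis code):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Single-query scaled dot-product attention.

    query: list of d floats; keys/values: lists of n length-d vectors.
    Scores are query-key dot products scaled by sqrt(d); the output is the
    softmax-weighted average of the value vectors.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]
```

With two identical keys, the weights are uniform and the output is the plain average of the values.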
no code implementations • NeurIPS 2020 • Yihe Dong, Will Sawin
We introduce COPT, a novel distance metric between graphs defined via an optimization routine, computing a coordinated pair of optimal transport maps simultaneously.
1 code implementation • 22 Jun 2020 • Yihe Dong, Will Sawin, Yoshua Bengio
Hypergraphs provide a natural representation for many real world datasets.
3 code implementations • NeurIPS 2020 • Sourav Biswas, Yihe Dong, Gautam Kamath, Jonathan Ullman
We present simple differentially private estimators for the mean and covariance of multivariate sub-Gaussian data that are accurate at small sample sizes.
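For context, the textbook baseline for private mean estimation is the Gaussian mechanism applied to clipped samples. The sketch below shows that baseline in one dimension; it is not the paper's estimator, which improves on this approach at small sample sizes:

```python
import math
import random

def dp_mean(samples, bound, epsilon, delta):
    """(epsilon, delta)-DP mean of 1-D samples via the Gaussian mechanism.

    Clipping each sample to [-bound, bound] caps any one record's influence,
    so the empirical mean has sensitivity 2 * bound / n; Gaussian noise
    calibrated to that sensitivity yields the privacy guarantee.
    """
    n = len(samples)
    clipped = [max(-bound, min(bound, x)) for x in samples]
    sensitivity = 2.0 * bound / n
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    return sum(clipped) / n + random.gauss(0.0, sigma)
```

The weakness at small n is visible in the formula: the noise scale grows with the a priori bound, which the paper's iterative estimators avoid paying in full.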
1 code implementation • 3 May 2020 • Yihe Dong, Yu Gao, Richard Peng, Ilya Razenshteyn, Saurabh Sawlani
We investigate the problem of efficiently computing optimal transport (OT) distances, which is equivalent to the node-capacitated minimum cost maximum flow problem in a bipartite graph.
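The general problem requires a flow solver, but the one-dimensional special case has a closed form that illustrates the OT objective: with uniform weights and |x - y| cost, the optimal plan matches points in sorted order. A short sketch (an illustrative special case, not the paper's algorithm):

```python
def ot_cost_1d(a, b):
    """Optimal transport cost between two equal-size 1-D point sets
    with uniform weights and absolute-difference ground cost.

    In one dimension the optimal coupling is monotone, so sorting both
    sets and matching pointwise is exact -- no flow solver needed.
    """
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)
```

In higher dimensions no such ordering exists, which is why OT reduces to the min-cost flow formulation the paper studies.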
1 code implementation • ICML 2020 • Arturs Backurs, Yihe Dong, Piotr Indyk, Ilya Razenshteyn, Tal Wagner
Our extensive experiments on real-world text and image datasets show that Flowtree improves over various baselines and existing methods in either running time or accuracy.
1 code implementation • NeurIPS 2019 • Yihe Dong, Samuel B. Hopkins, Jerry Li
In robust mean estimation the goal is to estimate the mean $\mu$ of a distribution on $\mathbb{R}^d$ given $n$ independent samples, an $\varepsilon$-fraction of which have been corrupted by a malicious adversary.
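To see why the problem is nontrivial: a single adversarial sample can move the empirical mean arbitrarily far. A classical one-dimensional defense is the trimmed mean, sketched below as a baseline (the paper's estimator, based on quantum entropy scoring, handles the much harder high-dimensional case):

```python
def trimmed_mean(samples, eps):
    """Trimmed mean: drop the eps-fraction smallest and largest values,
    then average the rest. Robust to an eps-fraction of outliers in 1-D,
    but naive coordinate-wise trimming loses dimension-dependent accuracy
    in high dimensions -- the regime the paper targets.
    """
    n = len(samples)
    k = int(eps * n)
    s = sorted(samples)
    kept = s[k:n - k] if k > 0 else s
    return sum(kept) / len(kept)
```

On 90 clean zeros plus 10 outliers at 1000, the naive mean is 100 while the 10%-trimmed mean recovers 0.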
no code implementations • 3 Apr 2019 • Hao Chen, Ilaria Chillotti, Yihe Dong, Oxana Poburinnaya, Ilya Razenshteyn, M. Sadegh Riazi
In this paper, we introduce SANNS, a system for secure $k$-NNS that keeps the client's query and the search result confidential.
1 code implementation • ICLR 2020 • Yihe Dong, Piotr Indyk, Ilya Razenshteyn, Tal Wagner
Space partitions of $\mathbb{R}^d$ underlie a vast and important class of fast nearest neighbor search (NNS) algorithms.
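A classical data-independent instance of such a partition is random hyperplane hashing: each hyperplane through the origin splits space in two, and points are bucketed by the pattern of sides they fall on. A minimal sketch for contrast (the paper's contribution is to *learn* the partition from data rather than draw it at random):

```python
def hyperplane_hash(point, normals):
    """Bucket a point by which side of each hyperplane (through the
    origin, given by its normal vector) it lies on. Nearby points tend
    to land in the same cell, enabling fast candidate retrieval for NNS.
    """
    bits = []
    for normal in normals:
        dot = sum(p * c for p, c in zip(point, normal))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)
```

With the two axis-aligned normals in 2-D, the four cells are exactly the four quadrants.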