Search Results for author: Zeke Wang

Found 4 papers, 2 papers with code

DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference

no code implementations • 30 Mar 2024 • Jinwei Yao, Kaiqi Chen, Kexun Zhang, Jiaxuan You, Binhang Yuan, Zeke Wang, Tao Lin

Given the increasing demand for tree-structured interactions with LLMs, we introduce DeFT (Decoding with Flash Tree-Attention), an IO-aware tree attention algorithm tailored for tree-structured inference.

MARS: Exploiting Multi-Level Parallelism for DNN Workloads on Adaptive Multi-Accelerator Systems

no code implementations • 23 Jul 2023 • Guan Shen, Jieru Zhao, Zeke Wang, Zhe Lin, Wenchao Ding, Chentao Wu, Quan Chen, Minyi Guo

Along with the fast evolution of deep neural networks, the hardware system is also developing rapidly.

Benchmarking High Bandwidth Memory on FPGAs

2 code implementations • 9 May 2020 • Zeke Wang, Hongjing Huang, Jie Zhang, Gustavo Alonso

FPGAs are starting to be enhanced with High Bandwidth Memory (HBM) as a way to reduce the memory bandwidth bottleneck encountered in some applications and to give the FPGA more capacity to deal with application state.

Hardware Architecture

Accelerating Generalized Linear Models with MLWeaving: A One-Size-Fits-All System for Any-precision Learning (Technical Report)

1 code implementation • 8 Mar 2019 • Zeke Wang, Kaan Kara, Hantian Zhang, Gustavo Alonso, Onur Mutlu, Ce Zhang

Learning from the data stored in a database is an important function increasingly available in relational engines.

Quantization, Retrieval
