Search Results for author: Zeke Wang

Found 4 papers, 2 papers with code

DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference

no code implementations • 30 Mar 2024 • Jinwei Yao, Kaiqi Chen, Kexun Zhang, Jiaxuan You, Binhang Yuan, Zeke Wang, Tao Lin

Given the increasing demand for tree-structured interactions with LLMs, we introduce DeFT (Decoding with Flash Tree-Attention), an IO-aware tree attention algorithm tailored for tree-structured inference.

Benchmarking High Bandwidth Memory on FPGAs

2 code implementations • 9 May 2020 • Zeke Wang, Hongjing Huang, Jie Zhang, Gustavo Alonso

FPGAs are starting to be enhanced with High Bandwidth Memory (HBM) as a way to reduce the memory bandwidth bottleneck encountered in some applications and to give the FPGA more capacity to deal with application state.

Hardware Architecture
