Search Results for author: Sirui Zhao

Found 11 papers, 6 papers with code

Exploring User Retrieval Integration towards Large Language Models for Cross-Domain Sequential Recommendation

no code implementations • 5 Jun 2024 • Tingjia Shen, Hao Wang, Jiaqing Zhang, Sirui Zhao, Liangyue Li, Zulong Chen, Defu Lian, Enhong Chen

To this end, we propose a novel framework named URLLM, which aims to improve CDSR performance by simultaneously exploring a user retrieval approach and domain grounding on the LLM.

Contrastive Learning · Language Modelling · +4

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

no code implementations • 31 May 2024 • Chaoyou Fu, Yuhan Dai, Yongdong Luo, Lei Li, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, Peixian Chen, Yanwei Li, Shaohui Lin, Sirui Zhao, Ke Li, Tong Xu, Xiawu Zheng, Enhong Chen, Rongrong Ji, Xing Sun

With Video-MME, we extensively evaluate various state-of-the-art MLLMs, including the GPT-4 series and Gemini 1.5 Pro, as well as open-source image models like InternVL-Chat-V1.5 and video models like LLaVA-NeXT-Video.

Dataset Regeneration for Sequential Recommendation

no code implementations • 28 May 2024 • Mingjia Yin, Hao Wang, Wei Guo, Yong Liu, Suojuan Zhang, Sirui Zhao, Defu Lian, Enhong Chen

The sequential recommender (SR) system is a crucial component of modern recommender systems, as it aims to capture the evolving preferences of users.

Sequential Recommendation

Learning Partially Aligned Item Representation for Cross-Domain Sequential Recommendation

no code implementations • 21 May 2024 • Mingjia Yin, Hao Wang, Wei Guo, Yong Liu, Zhi Li, Sirui Zhao, Defu Lian, Enhong Chen

Cross-domain sequential recommendation (CDSR) aims to uncover and transfer users' sequential preferences across multiple recommendation domains.

Multi-Task Learning · Self-Supervised Learning · +1

APGL4SR: A Generic Framework with Adaptive and Personalized Global Collaborative Information in Sequential Recommendation

1 code implementation • 6 Nov 2023 • Mingjia Yin, Hao Wang, Xiang Xu, Likang Wu, Sirui Zhao, Wei Guo, Yong Liu, Ruiming Tang, Defu Lian, Enhong Chen

To this end, we propose a graph-driven framework, named Adaptive and Personalized Graph Learning for Sequential Recommendation (APGL4SR), that incorporates adaptive and personalized global collaborative information into sequential recommendation systems.

Graph Learning · Multi-Task Learning · +1

Woodpecker: Hallucination Correction for Multimodal Large Language Models

1 code implementation • 24 Oct 2023 • Shukang Yin, Chaoyou Fu, Sirui Zhao, Tong Xu, Hao Wang, Dianbo Sui, Yunhang Shen, Ke Li, Xing Sun, Enhong Chen

Hallucination is a big shadow hanging over the rapidly evolving Multimodal Large Language Models (MLLMs), referring to the phenomenon in which the generated text is inconsistent with the image content.

Hallucination

A Solution to CVPR'2023 AQTC Challenge: Video Alignment for Multi-Step Inference

1 code implementation • 26 Jun 2023 • Chao Zhang, Shiwei Wu, Sirui Zhao, Tong Xu, Enhong Chen

In this paper, we present a solution for enhancing video alignment to improve multi-step inference.

Video Alignment

A Survey on Multimodal Large Language Models

1 code implementation • 23 Jun 2023 • Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, Enhong Chen

Recently, Multimodal Large Language Models (MLLMs), represented by GPT-4V, have become a rising research hotspot; they use powerful Large Language Models (LLMs) as a brain to perform multimodal tasks.

Hallucination · In-Context Learning · +5

AU-aware graph convolutional network for Macro- and Micro-expression spotting

1 code implementation • 16 Mar 2023 • Shukang Yin, Shiwei Wu, Tong Xu, Shifeng Liu, Sirui Zhao, Enhong Chen

Automatic Micro-Expression (ME) spotting in long videos is a crucial step in ME analysis but also a challenging task due to the short duration and low intensity of MEs.

Micro-Expression Spotting

More is Better: A Database for Spontaneous Micro-Expression with High Frame Rates

no code implementations • 3 Jan 2023 • Sirui Zhao, Huaying Tang, Xinglong Mao, Shifeng Liu, Hanqing Tao, Hao Wang, Tong Xu, Enhong Chen

To address the problem of ME data scarcity, we construct DFME (Dynamic Facial Micro-expressions), currently the largest spontaneous ME dataset, which includes 7,526 well-labeled ME videos induced from 671 participants and annotated by more than 20 annotators over three years.
