
Academic Radar

2026-04-05

Conference DDLs: ccfddl.com

SOTA Models: arena.ai

Industry News

0 papers in total
arXiv
A Prefetch-Enhanced Cache Reuse System for Low-Latency RAG ...

Retrieval-Augmented Generation (RAG) systems enhance the performance of large language models (LLMs) by incorporating supplementary retrieved documents,...
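The abstract above is truncated, but the prefetch-plus-cache-reuse idea it names can be sketched as a toy: cache retrieved documents per query, and use a predictor of likely follow-up queries to warm the cache ahead of time. All names (`PrefetchCache`, `retrieve`, `predict_next`) and the LRU policy are illustrative assumptions, not the paper's actual system.

```python
from collections import OrderedDict

class PrefetchCache:
    """Hypothetical sketch of cache reuse with prefetching for RAG retrieval.

    Retrieved document lists are cached per query; a user-supplied
    predictor suggests likely follow-up queries, which are fetched
    in advance so later lookups hit the cache instead of the slow path.
    """

    def __init__(self, retrieve, predict_next, capacity=128):
        self.retrieve = retrieve          # query -> list of docs (slow path)
        self.predict_next = predict_next  # query -> likely follow-up queries
        self.capacity = capacity
        self.cache = OrderedDict()        # query -> docs, in LRU order
        self.hits = 0
        self.misses = 0

    def _put(self, query, docs):
        self.cache[query] = docs
        self.cache.move_to_end(query)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used

    def get(self, query):
        if query in self.cache:
            self.hits += 1
            self.cache.move_to_end(query)
            docs = self.cache[query]
        else:
            self.misses += 1
            docs = self.retrieve(query)
            self._put(query, docs)
        # Prefetch likely follow-ups so the next turn is a cache hit.
        for nxt in self.predict_next(query):
            if nxt not in self.cache:
                self._put(nxt, self.retrieve(nxt))
        return docs
```

A real system would key the cache on embedding similarity rather than exact query strings and run the prefetch asynchronously; this sketch only shows the hit/miss/prefetch control flow.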

arXiv
Optimizing RAG Rerankers with LLM Feedback via Reinforcement ...

In this work, we propose RRPO to address the fundamental misalignment between static retrieval metrics and the dynamic needs of LLM readers in RAG systems....

arXiv
A Survey of Frontiers in LLM Reasoning: Inference Scaling ...

Prior works explore various strategies that enable agents to synergize their actions and optimize overall system reasoning and problem-solving...

arXiv
Multi-agent RAG with Evolving Orchestration and Agent Prompts

Recent advances in Large Language Model (LLM)-based Multi-Agent Systems (Li et al., 2024) have enhanced Retrieval-Augmented Generation (RAG) (Lewis et al., 2020...

arXiv
Understand and Accelerate Memory Processing Pipeline for ...

We demonstrate this approach on a GPU-FPGA system by offloading sparse, irregular, and memory-bounded operations to FPGAs while retaining compute-intensive operations on GPUs. Evaluated on an AMD MI210 GPU and an Alveo U55C FPGA, our system is 1.04∼2.2× faster a...

arXiv
A Memory-Boosted Framework for Cost-Aware LLM Inference

In this paper, we introduce MemBoost, a memory-boosted architecture with three components: (1) an Associative Memory Engine (AME) that performs fast semantic retrieval and supports write-back of newly generated answers; (2) a high-capability Large-LLM Oracle that provides an accu...
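The cost-saving pattern this abstract describes (answer from an associative memory when possible, otherwise consult the expensive oracle and write the answer back) can be sketched minimally. The class name, the token-overlap similarity, and the threshold are all illustrative assumptions, not MemBoost's actual components.

```python
class MemoryBoostedRouter:
    """Hypothetical sketch of the memory-boosted pattern: serve a cached
    answer when a sufficiently similar question has been seen before,
    otherwise fall back to a costly large-LLM oracle and write the new
    answer back for future reuse. Similarity is a toy token-overlap
    (Jaccard) score, standing in for real semantic retrieval."""

    def __init__(self, oracle, threshold=0.8):
        self.oracle = oracle        # expensive fallback: question -> answer
        self.threshold = threshold  # minimum similarity for a memory hit
        self.memory = []            # list of (question, answer) pairs
        self.oracle_calls = 0

    @staticmethod
    def _similarity(a, b):
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / max(len(ta | tb), 1)

    def answer(self, question):
        # Fast path: retrieve the most similar stored question.
        best = max(self.memory,
                   key=lambda qa: self._similarity(question, qa[0]),
                   default=None)
        if best and self._similarity(question, best[0]) >= self.threshold:
            return best[1]
        # Slow path: large-LLM oracle, then write back into memory.
        self.oracle_calls += 1
        ans = self.oracle(question)
        self.memory.append((question, ans))
        return ans
```

The `oracle_calls` counter makes the cost saving observable: repeated or near-duplicate questions never reach the oracle again.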

arXiv
Adaptive Stopping for Multi-Turn LLM Reasoning

We demonstrate MiCP on adaptive RAG and ReAct, where it achieves the target coverage on both single-hop and multi-hop question answering benchmarks while...
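Adaptive stopping in the generic sense used here can be sketched as a loop that ends a multi-turn reasoning trace once a calibrated stopping rule fires or a turn budget runs out. The function names and the threshold-style rule below are illustrative stand-ins, not the MiCP procedure itself.

```python
def adaptive_multi_turn(step, confident, max_turns=5):
    """Hypothetical sketch of adaptive stopping for multi-turn reasoning:
    run turns until the stopping criterion fires or the budget is spent.

    step(t)       -> (answer, score) for turn t  (e.g. one RAG/ReAct step)
    confident(s)  -> True when score s clears a calibrated threshold
    """
    answer, turns = None, 0
    for turns in range(1, max_turns + 1):
        answer, score = step(turns)
        if confident(score):  # calibrated stop: answer is good enough
            break
    return answer, turns
```

Calibrating `confident` (e.g. via conformal methods on held-out data) is what lets such a rule hit a target coverage level instead of using an ad hoc cutoff.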
