Databases Papers
@UFCS
Covers database management, data mining, and data processing - new submissions to https://t.co/nAsAct7zgt (not affiliated with arXiv)
Joined November 2010
SHARP: Shared State Reduction for Efficient Matching of Sequential Patterns.
arxiv.org
The detection of sequential patterns in data is a basic functionality of modern data processing systems for complex event processing (CEP), OLAP, and retrieval-augmented generation (RAG). In...
TCSR-SQL: Towards Table Content-aware Text-to-SQL with Self-retrieval.
arxiv.org
Large Language Model-based (LLM-based) Text-to-SQL methods have achieved important progress in generating SQL queries for real-world applications. When confronted with table content-aware...
Are We Asking the Right Questions? On Ambiguity in Natural Language Queries for Tabular Data Analysis.
arxiv.org
Natural language interfaces to tabular data must handle ambiguities inherent to queries. Instead of treating ambiguity as a deficiency, we reframe it as a feature of cooperative interaction, where...
RUST-BENCH: Benchmarking LLM Reasoning on Unstructured Text within Structured Tables.
arxiv.org
Existing tabular reasoning benchmarks mostly test models on small, uniform tables, underrepresenting the complexity of real-world data and giving an incomplete view of Large Language Models'...
Coordination-Free Lane Partitioning for Convergent ANN Search.
arxiv.org
Production vector search systems often fan out each query across parallel lanes (threads, replicas, or shards) to meet latency service-level objectives (SLOs). In practice, these lanes rediscover...
BAPPA: Benchmarking Agents, Plans, and Pipelines for Automated Text-to-SQL Generation.
arxiv.org
Text-to-SQL systems provide a natural language interface that can enable even laymen to access information stored in databases. However, existing Large Language Models (LLMs) struggle with SQL...
Learning Filter-Aware Distance Metrics for Nearest Neighbor Search with Multiple Filters.
arxiv.org
Filtered Approximate Nearest Neighbor (ANN) search retrieves the closest vectors for a query vector from a dataset. It enforces that a specified set of discrete labels $S$ for the query must be...
OptiMA: A Transaction-Based Framework with Throughput Optimization for Very Complex Multi-Agent Systems.
arxiv.org
In recent years, research on multi-agent systems has moved toward exploring larger and more complex models to fulfill sophisticated tasks. We point out two possible pitfalls that might...
EntroGD: Efficient Compression and Accurate Direct Analytics on Compressed Data.
arxiv.org
Generalized Deduplication (GD) enables lossless compression with direct analytics on compressed data by dividing data into bases and deviations and performing dictionary encoding on...
GPU-Based Floating-point Adaptive Lossless Compression.
arxiv.org
Domains such as IoT (Internet of Things) and HPC (High Performance Computing) generate a torrential influx of floating-point time-series data. Compressing these data while preserving their...
Aegis: A Correlation-Based Data Masking Advisor for Data Sharing Ecosystems.
arxiv.org
Data-sharing ecosystems enable entities, such as providers, consumers, and intermediaries, to access, exchange, and utilize data for various downstream tasks and applications. Due to privacy...
Why Isn't Relational Learning Taking Over the World?
arxiv.org
Artificial intelligence seems to be taking over the world with systems that model pixels, words, and phonemes. The world is arguably made up, not of pixels, words, and phonemes but of entities...
L2T-Tune: LLM-Guided Hybrid Database Tuning with LHS and TD3.
arxiv.org
Configuration tuning is critical for database performance. Although recent advancements in database tuning have shown promising results in throughput and latency improvement, challenges remain....
Differentially Private Data Generation with Missing Data.
arxiv.org
Although several works succeed in generating synthetic data with differential privacy (DP) guarantees, they are inadequate for generating high-quality synthetic data when the input data has...
In-Memory Indexing and Querying of Provenance in Data Preparation Pipelines.
arxiv.org
Data provenance has numerous applications in the context of data preparation pipelines. It can be used for debugging faulty pipelines, interpreting results, verifying fairness, and identifying...
HERP: Hardware for Energy Efficient and Realtime DB Search and Cluster Expansion in Proteomics.
arxiv.org
Database (DB) search and clustering are fundamental in proteomics but conventional full clustering and search approaches demand high resources and incur long latency. We propose a lightweight...
Formalizing ETLT and ELTL Design Patterns and Proposing Enhanced Variants: A Systematic Framework for Modern Data Engineering.
arxiv.org
Traditional ETL and ELT design patterns struggle to meet modern requirements of scalability, governance, and real-time data processing. Hybrid approaches such as ETLT...
Explainable Graph Neural Architecture Search via Monte-Carlo Tree Search (Full version).
arxiv.org
The number of graph neural network (GNN) architectures has increased rapidly due to the growing adoption of graph analysis. Although we use GNNs in a wide range of application scenarios, it is a laborious...
Relational Deep Dive: Error-Aware Queries Over Unstructured Data.
arxiv.org
Unstructured data is pervasive, but analytical queries demand structured representations, creating a significant extraction challenge. Existing methods like RAG lack schema awareness and struggle...