BotBeat

RESEARCH · UC Berkeley · 2026-07-02

UC Berkeley's DocETL Brings Declarative LLM-Powered Data Processing to VLDB 2025

Key Takeaways

  • DocETL makes LLM-powered data processing more accessible by letting users define operations in natural language rather than writing code
  • Automatic optimization reduces computational cost and improves accuracy by selecting models, rewriting prompts, and substituting code where beneficial
  • The framework supports multiple LLM providers (OpenAI, Anthropic, etc.) and offers both a Python API and a low-code declarative YAML syntax
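
The optimization takeaway can be pictured as choosing among candidate versions of the same pipeline step. The sketch below is a toy illustration only, not DocETL's optimizer: the `Plan` class, the `choose_plan` function, and every accuracy and cost figure are invented for the example. It assumes each candidate has already been scored on a small sample and simply picks the cheapest one that clears an accuracy floor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """One hypothetical way to run a pipeline step."""
    name: str
    accuracy: float  # assumed score on a small validation sample, 0..1
    cost_usd: float  # assumed cost per 1,000 documents


def choose_plan(plans: list[Plan], min_accuracy: float) -> Plan:
    """Return the cheapest plan meeting the accuracy floor.

    If no plan meets the floor, fall back to the most accurate one.
    """
    eligible = [p for p in plans if p.accuracy >= min_accuracy]
    if not eligible:
        return max(plans, key=lambda p: p.accuracy)
    return min(eligible, key=lambda p: p.cost_usd)


# Invented numbers, for illustration only.
CANDIDATES = [
    Plan("large model, original prompt", accuracy=0.93, cost_usd=12.00),
    Plan("small model, rewritten prompt", accuracy=0.91, cost_usd=1.50),
    Plan("plain code instead of an LLM", accuracy=0.78, cost_usd=0.01),
]

best = choose_plan(CANDIDATES, min_accuracy=0.90)
print(best.name)  # small model, rewritten prompt
```

The three candidates mirror the three levers named above: swapping the model, rewriting the prompt, and substituting code. A real optimizer would also have to generate and evaluate such candidates, which this sketch leaves out.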
Source: Hacker News, https://github.com/ucbepic/docetl

Summary

Researchers at UC Berkeley's EPIC Data Lab have released DocETL, an open-source framework for building complex data processing pipelines with large language models. Instead of writing individual LLM calls and optimizing them by hand, users declare operations in natural language, such as "pull out every complaint in this ticket", and DocETL runs them through map-reduce operators with automatic parallelization. The framework automatically tunes accuracy, cost, and latency by swapping models, rewriting prompts, and replacing LLM subtasks with code where appropriate.

The DocETL paper was published at VLDB 2025, one of the top database systems conferences. A companion paper on DocWrangler, an interactive UI for visual pipeline development, received a Best Paper Honorable Mention at UIST 2025.
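
To make the declarative map-reduce idea concrete, here is a minimal sketch in plain Python. It does not use DocETL's actual API or YAML schema: the `PIPELINE` structure, the `run_pipeline` runner, and the `fake_llm` stub are all hypothetical names invented for this example. The stub ignores the prompt and uses a keyword match so the example runs without a model, and the reduce step simply concatenates lists where a real system might call an LLM again.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def fake_llm(prompt: str, text: str) -> list[str]:
    """Hypothetical stand-in for an LLM call.

    Ignores the prompt and keeps sentences that look like complaints.
    """
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [s for s in sentences if "broken" in s or "not " in s]


# The pipeline is declared as data; the runner decides how to execute it.
PIPELINE = [
    {"type": "map", "prompt": "Pull out every complaint in this ticket.",
     "output": "complaints"},
    {"type": "reduce", "key": "product", "input": "complaints",
     "output": "all_complaints"},
]


def run_pipeline(docs, pipeline, llm=fake_llm, max_workers=4):
    for op in pipeline:
        if op["type"] == "map":
            # Map: apply the prompt to every document in parallel.
            def apply(doc, op=op):
                return {**doc, op["output"]: llm(op["prompt"], doc["text"])}
            with ThreadPoolExecutor(max_workers=max_workers) as pool:
                docs = list(pool.map(apply, docs))  # input order is preserved
        elif op["type"] == "reduce":
            # Reduce: group documents by key and merge their mapped values.
            groups = defaultdict(list)
            for doc in docs:
                groups[doc[op["key"]]].extend(doc[op["input"]])
            docs = [{op["key"]: k, op["output"]: v} for k, v in groups.items()]
        else:
            raise ValueError(f"unknown operation type: {op['type']}")
    return docs


TICKETS = [
    {"product": "router", "text": "The router is broken. Support was friendly."},
    {"product": "router", "text": "Wi-Fi does not reach the kitchen."},
    {"product": "modem", "text": "Setup was easy."},
]

for row in run_pipeline(TICKETS, PIPELINE):
    print(row)
```

Because the pipeline is plain data, the runner is free to change how each step executes, for example by adjusting the worker count or swapping the `llm` callable, without the pipeline author rewriting anything.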

  • DocETL is distributed as open source, with an interactive playground (the DocWrangler UI) and a Claude Code integration for quick pipeline development

Editorial Opinion

DocETL addresses a real pain point in the LLM application space: the gap between simple one-off LLM calls and production-grade data pipelines. By combining declarative syntax with automatic optimization, it could significantly lower the barrier to entry for building sophisticated data processing workflows. The paired publications at VLDB and UIST suggest the authors have thought carefully about both the systems architecture and the user experience, a combination that is rare in academic work.

Generative AI · AI Agents · Data Science & Analytics · MLOps & Infrastructure · Science & Research · Open Source
