[SIP-166] AI Assistant #33215
[SIP-166] Proposal for AI Assistant
Motivation
An accurate text-to-SQL translator (AI Assistant) can greatly improve the SQL Lab user experience by increasing productivity, supporting users with limited SQL knowledge, and making it easier to discover and access data in SQL Lab.
Proposed Change
We propose implementing a text-to-SQL translator that is intentionally simple: it avoids RAG, vector databases, and agentic LLM frameworks. This approach is designed to maximize compatibility across diverse database types and sizes, provide flexible configuration options, and leverage user-supplied context filtering when available. The system is built to degrade gracefully, so it keeps operating when some functionality is unavailable or only partially supported.
We believe that by intentionally keeping this solution simple and avoiding complex dependencies, it will be easier for the community to reach consensus and approve its inclusion. This practical and accessible first implementation of the AI Assistant is designed to accelerate its adoption and help it materialize sooner as an official Superset OSS release.
The AI Assistant was developed in alignment with the guiding principles outlined above, within a dedicated fork of the Superset repository, based on the 5.0.0rc2 tag. For a comprehensive overview of its features and configuration, refer to the AI Assistant documentation.
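To make the "no RAG, no vector database" idea concrete, here is a minimal sketch of how a prompt could be assembled directly from database metadata, with optional user-supplied schema filtering. It is not the fork's actual implementation: the context shape and the `filter_context` and `build_prompt` helpers are hypothetical names chosen for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical in-memory shape of the database metadata context:
# {schema_name: {table_name: [(column_name, column_type), ...]}}
DbContext = Dict[str, Dict[str, List[Tuple[str, str]]]]


def filter_context(context: DbContext,
                   schemas: Optional[Iterable[str]] = None) -> DbContext:
    """Keep only the schemas the user selected; no selection keeps everything."""
    if not schemas:
        return context
    wanted = set(schemas)
    return {name: tables for name, tables in context.items() if name in wanted}


def build_prompt(question: str, context: DbContext, dialect: str) -> str:
    """Render the metadata as plain text and append the user's question."""
    lines = [f"You write {dialect} SQL. Use only the tables listed below.", ""]
    for schema, tables in sorted(context.items()):
        for table, columns in sorted(tables.items()):
            cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
            lines.append(f"{schema}.{table}({cols})")
    lines += ["", f"Question: {question}", "SQL:"]
    return "\n".join(lines)
```

Because the whole context is passed as text, filtering by schema is the main lever for keeping the prompt within a model's context window on large databases.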
New or Changed Public Interfaces
React Components:
REST Endpoints:
sqllab/generate_db_context: Initiates a rebuild of the database metadata LLM context.
sqllab/generate_sql: Sends user prompts to the LLM provider to generate SQL queries.
sqllab/db_context_status: Retrieves the status of the database metadata context and the context builder worker.
database/{db_id}/schema_tables: Returns all schemas and tables for a specified database.
Dashboards or Visualizations:
No changes.
Superset CLI:
No changes.
Deployment:
No changes.
To simplify the setup of a custom Docker Compose deployment (e.g. deploying this fork), we have provided a shell script and configuration files. Detailed instructions and resources can be found here.
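As a rough illustration of how a client might address the REST endpoints listed in this section, the sketch below builds full URLs from the relative paths given above. The `api/v1/` prefix is an assumption (the proposal only lists relative paths), and `endpoint_url` is a hypothetical helper. Request and response payloads are not specified in this proposal, so none are shown.

```python
from urllib.parse import urljoin

# Assumption: the new endpoints are mounted under this prefix.
API_PREFIX = "api/v1/"

# Relative paths exactly as listed in the proposal.
ENDPOINTS = {
    "generate_db_context": "sqllab/generate_db_context",
    "generate_sql": "sqllab/generate_sql",
    "db_context_status": "sqllab/db_context_status",
    "schema_tables": "database/{db_id}/schema_tables",
}


def endpoint_url(base_url: str, name: str, **path_params: object) -> str:
    """Return the absolute URL for a named endpoint, filling path parameters."""
    path = ENDPOINTS[name].format(**path_params)
    # Normalize the base so urljoin appends instead of replacing a path segment.
    return urljoin(base_url.rstrip("/") + "/", API_PREFIX + path)
```

For example, `endpoint_url("http://localhost:8088", "schema_tables", db_id=3)` yields a URL ending in `database/3/schema_tables`.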
New dependencies
The new dependencies introduced are primarily related to integration with supported LLM API providers and data structure validation for building the database metadata context JSON file:
google-genai: Python SDK for Google Generative AI.
openai: Python SDK for OpenAI models.
anthropic: Python SDK for Anthropic models.
pydantic: Used for robust data validation and serialization.
These dependencies are required to enable AI Assistant functionality and ensure reliable handling of LLM-related data.
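To show the kind of validation pydantic could provide for the database metadata context JSON, here is a minimal sketch. The model names and fields (`ColumnContext`, `TableContext`, `SchemaContext`) are hypothetical; the actual context file format is defined in the fork and is not described in this proposal.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class ColumnContext(BaseModel):
    # Hypothetical fields; the real context schema may differ.
    name: str
    data_type: str
    description: Optional[str] = None


class TableContext(BaseModel):
    name: str
    columns: List[ColumnContext]


class SchemaContext(BaseModel):
    name: str
    tables: List[TableContext]


def load_context(raw: dict) -> Optional[SchemaContext]:
    """Return a validated schema context, or None if the payload is malformed."""
    try:
        return SchemaContext(**raw)
    except ValidationError:
        return None
```

Validating at load time means a stale or partially written context file is rejected up front instead of producing a malformed prompt later.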
Migration Plan and Compatibility
Since these are additive changes, migration should be straightforward.
Changes to metadata database tables:
llm_provider
llm_model
llm_api_key
llm_enabled
llm_context_options
No breaking changes are expected, and existing deployments can be upgraded without data loss. Standard database migration procedures apply.