---
license: mit
datasets:
- tahoebio/Tahoe-100M
tags:
- tahoe-deepdive
- hackathon
- tahoe-100M
---

# Team Name
**Kepler**

## Members
- Ashton Teng
- Quinn Leng

# Project

## Title
Kepler: Natural Language AI Agent for Tahoe-100M Exploration

## Overview
Kepler lets biologists query the Tahoe-100M dataset in plain English, automating data access, analysis, and visualization without coding.

## Motivation
High-dimensional datasets like Tahoe-100M require heavy compute setup, tool expertise, and programming skill. These barriers slow scientific insight. We demonstrate that an AI agent can let users perform simple analyses using natural language alone.

## Methods
- Extracted a pseudobulked subset with Vision differential expression scores.
- Loaded metadata tables for cell lines, drugs, and gene sets.
- Built an AI agent to translate natural-language queries into analysis code and visual outputs.

## Results
Demo query: “Which pathways are upregulated in BRAF.V600E mutant models after inhibitor treatment?”

The agent automatically filtered the data, ran the analysis, and generated plots with interpretations.

## Discussion
- **Scalability:** Move initial subsetting to DuckDB or Databricks for larger subsets.
- **Knowledge alignment:** Enhance the agent’s scientific context for broader, valid analyses.
- **Next steps:** Expand to full Tahoe-100M and optimize compute pipeline.
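
To make the demo query concrete, here is a minimal sketch of the kind of analysis code an agent might emit for it: filter a pseudobulked score table by mutation and drug, then rank pathways by mean score. This is not Kepler's actual implementation. The field names (`cell_line`, `mutations`, `drug`, `pathway`, `score`), the function `upregulated_pathways`, and every value in the toy table are hypothetical placeholders, not Tahoe-100M data.

```python
from collections import defaultdict
from statistics import mean

# Toy stand-in for a pseudobulked score table.
# All names and numbers are made up for illustration; none come from Tahoe-100M.
SCORES = [
    {"cell_line": "LINE_1", "mutations": {"BRAF.V600E"}, "drug": "inhibitor_X", "pathway": "PATHWAY_A", "score": 1.0},
    {"cell_line": "LINE_2", "mutations": {"BRAF.V600E"}, "drug": "inhibitor_X", "pathway": "PATHWAY_A", "score": 2.0},
    {"cell_line": "LINE_1", "mutations": {"BRAF.V600E"}, "drug": "inhibitor_X", "pathway": "PATHWAY_B", "score": -0.5},
    {"cell_line": "LINE_2", "mutations": {"BRAF.V600E"}, "drug": "inhibitor_X", "pathway": "PATHWAY_B", "score": -1.5},
    {"cell_line": "LINE_3", "mutations": set(), "drug": "inhibitor_X", "pathway": "PATHWAY_B", "score": 4.0},
    {"cell_line": "LINE_1", "mutations": {"BRAF.V600E"}, "drug": "DMSO", "pathway": "PATHWAY_C", "score": 3.0},
    {"cell_line": "LINE_1", "mutations": {"BRAF.V600E"}, "drug": "inhibitor_X", "pathway": "PATHWAY_C", "score": 0.5},
]


def upregulated_pathways(rows, mutation, drug, threshold=0.0):
    """Return (pathway, mean score) pairs whose mean exceeds threshold, highest first."""
    scores_by_pathway = defaultdict(list)
    for row in rows:
        # Keep only rows from models carrying the mutation and treated with the drug.
        if mutation in row["mutations"] and row["drug"] == drug:
            scores_by_pathway[row["pathway"]].append(row["score"])

    # Average across cell lines, then keep pathways above the threshold.
    means = {pathway: mean(values) for pathway, values in scores_by_pathway.items()}
    ranked = [(pathway, value) for pathway, value in means.items() if value > threshold]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for pathway, value in upregulated_pathways(SCORES, "BRAF.V600E", "inhibitor_X"):
        print(f"{pathway}\t{value:.2f}")
```

On a larger subset, the same filter-then-aggregate step could be pushed into DuckDB or Databricks, as the Discussion suggests, leaving the agent to generate the query and plot the ranked result.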