ashtonteng committed 91ff058 (verified)
Parent: 638a44c

Upload 2 files

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +42 -0
  3. Tahoe-100M.pdf +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ Tahoe-100M.pdf filter=lfs diff=lfs merge=lfs -text
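The added rule routes the PDF through Git LFS. As a rough sketch of which paths the rules visible in this hunk would catch, the Python below checks a path's basename against those four patterns. It assumes slash-free patterns, which git matches against the file name at any directory depth, and it ignores lines 1 to 32 of the real file, negation, and `**`. The `tracked_by_lfs` helper is hypothetical, not part of git or Hugging Face tooling.

```python
import posixpath
from fnmatch import fnmatchcase

# Only the four patterns visible in this diff hunk; the earlier lines of the
# real .gitattributes (e.g. saved_model/**/*) are deliberately left out.
LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "Tahoe-100M.pdf"]


def tracked_by_lfs(path: str) -> bool:
    """Approximate gitattributes matching for slash-free patterns.

    Such a pattern is compared against the basename at any depth. This sketch
    does not handle anchored patterns, "**", or attribute negation.
    """
    name = posixpath.basename(path)
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)


if __name__ == "__main__":
    for p in ["Tahoe-100M.pdf", "README.md", "runs/events.out.tfevents.123", "data/x.zst"]:
        print(p, tracked_by_lfs(p))
```

`fnmatchcase` is used instead of `fnmatch` so matching stays case-sensitive on every platform, as git's does by default.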
README.md ADDED
@@ -0,0 +1,42 @@
+ ---
+ license: mit
+ datasets:
+ - tahoebio/Tahoe-100M
+ tags:
+ - tahoe-deepdive
+ - hackathon
+ - tahoe-100M
+ ---
+
+ # Team Name
+ **Kepler**
+ ## Members
+ - Ashton Teng @ashtonteng
+ - Quinn Leng
+ - [Affiliation, GitHub handles if applicable]
+
+ # Project
+ ## Title
+ Kepler: Natural Language AI Agent for Tahoe-100M Exploration
+
+ ## Overview
+ Kepler lets biologists query the Tahoe-100M dataset in plain English, automating data access, analysis, and visualization without writing code.
+
+ ## Motivation
+ High-dimensional datasets like Tahoe-100M demand substantial compute setup, tool expertise, and programming skill, all of which are barriers that slow scientific insight.
+
+ We demonstrate that the agent lets users perform simple analyses using natural language alone.
+
+ ## Methods
+ - Extracted a pseudobulked subset with Vision differential expression scores.
+ - Loaded metadata tables for cell lines, drugs, and gene sets.
+ - Built an AI agent that translates natural-language queries into analysis code and visual outputs.
+
+ ## Results
+ Demo query: “Which pathways are upregulated in BRAF.V600E mutant models after inhibitor treatment?”
+ The agent automatically filtered the data, ran the analysis, and generated plots with interpretations.
+
+ ## Discussion
+ - **Scalability:** Move initial subsetting to DuckDB or Databricks to handle larger subsets.
+ - **Knowledge alignment:** Give the agent richer scientific context so it supports a broader range of valid analyses.
+ - **Next steps:** Expand to the full Tahoe-100M dataset and optimize the compute pipeline.
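The README's Methods and Results describe the pipeline only at a high level. As a rough illustration of the two steps behind the demo query, the sketch below pseudobulks cell-level counts by (cell line, drug), then filters pathway scores to BRAF.V600E models treated with an inhibitor and ranks pathways by mean score. Everything here is an assumption: the record layouts, the mechanism labels, the toy data, and the `pseudobulk` and `upregulated_pathways` helpers are hypothetical and are neither Kepler's code nor the Tahoe-100M schema. The `SCORES` rows merely stand in for the precomputed Vision scores.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy inputs: names, labels, and values are illustrative only.
# Cell-level counts: (cell_line, drug, {gene: raw_count})
CELLS = [
    ("CL1", "drug_a", {"G1": 3, "G2": 5}),
    ("CL1", "drug_a", {"G1": 1, "G3": 2}),
    ("CL2", "vehicle", {"G1": 6}),
]
# Cell-line metadata: cell_line -> set of mutation labels
CELL_LINES = {"CL1": {"BRAF.V600E"}, "CL2": {"BRAF.V600E"}, "CL3": set()}
# Drug metadata: drug -> hypothetical mechanism-of-action label
DRUGS = {"drug_a": "kinase inhibitor", "drug_b": "kinase inhibitor", "vehicle": "control"}
# Pathway-level scores per (cell_line, drug), standing in for precomputed DE scores
SCORES = [
    ("CL1", "drug_a", "pathway_A", 1.5),
    ("CL1", "drug_a", "pathway_B", -0.8),
    ("CL2", "drug_a", "pathway_A", 0.5),
    ("CL2", "drug_b", "pathway_B", 0.4),
    ("CL3", "drug_a", "pathway_A", 3.0),   # dropped: CL3 lacks BRAF.V600E
    ("CL1", "vehicle", "pathway_B", 2.0),  # dropped: vehicle is not an inhibitor
]


def pseudobulk(cells):
    """Sum raw counts over all cells that share a (cell_line, drug) pair."""
    bulk = defaultdict(lambda: defaultdict(int))
    for cell_line, drug, counts in cells:
        for gene, n in counts.items():
            bulk[(cell_line, drug)][gene] += n
    return {key: dict(genes) for key, genes in bulk.items()}


def upregulated_pathways(scores, cell_lines, drugs, mutation, moa_keyword):
    """Average each pathway's score over models carrying `mutation` and drugs
    whose label contains `moa_keyword`; return positive means, highest first."""
    by_pathway = defaultdict(list)
    for cell_line, drug, pathway, score in scores:
        if mutation in cell_lines.get(cell_line, set()) and moa_keyword in drugs.get(drug, ""):
            by_pathway[pathway].append(score)
    ranked = [(pathway, mean(values)) for pathway, values in by_pathway.items()]
    return sorted((r for r in ranked if r[1] > 0), key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    print(pseudobulk(CELLS))
    print(upregulated_pathways(SCORES, CELL_LINES, DRUGS, "BRAF.V600E", "inhibitor"))
```

Moving the subsetting into a SQL engine such as DuckDB, as the Discussion suggests, would turn the filtering loop into a `WHERE` clause plus `GROUP BY pathway`; the logic would otherwise stay the same.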
Tahoe-100M.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2fe23a77dd0c8186edbad785da55dba15e11e3bb9227fa7d3452573c86d9f478
+ size 528392
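The three added lines are a Git LFS pointer, not the PDF itself. The repository stores this text stub, and the 528,392-byte file is fetched from LFS storage by its SHA-256 object ID. A minimal standard-library sketch of reading such a pointer and checking a downloaded copy against it follows. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of the git-lfs client, and the parser only handles the `version`/`oid`/`size` keys shown here.

```python
import hashlib
import os

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:2fe23a77dd0c8186edbad785da55dba15e11e3bb9227fa7d3452573c86d9f478
size 528392
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into its fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer: dict) -> bool:
    """Compare a local file's byte size and SHA-256 digest with the pointer."""
    if pointer["algo"] != "sha256" or os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large files are never fully loaded into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    info = parse_lfs_pointer(POINTER)
    print(info["algo"], info["size"])
```

The cheap size check runs first, so a truncated or wrong download is rejected before any hashing.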