RAG pipeline for synthesizing peer-reviewed scientific literature using open-source LLMs with directional ablation, deployed on private infrastructure.
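Directional ablation, in the formulation used in research on refusal directions, removes one direction r̂ from a model's hidden activations via x' = x − r̂ r̂ᵀ x. Applying the same projection to the weight matrices that write into the residual stream, W' = (I − r̂ r̂ᵀ) W, bakes the change into the weights. The pure-Python sketch below shows only the projection step, under the assumption that this project follows that formulation. How the direction is chosen and which layers are edited are not described here, and the function name and vectors are hypothetical.

```python
import math
from typing import Sequence


def ablate_direction(x: Sequence[float], r: Sequence[float]) -> list[float]:
    """Remove the component of x along direction r: x - (x . r_hat) r_hat."""
    norm = math.sqrt(sum(v * v for v in r))
    if norm == 0.0:
        raise ValueError("direction vector must be non-zero")
    r_hat = [v / norm for v in r]                 # unit vector along r
    coeff = sum(a * b for a, b in zip(x, r_hat))  # scalar projection of x onto r_hat
    return [a - coeff * b for a, b in zip(x, r_hat)]


x = [3.0, 4.0, 5.0]
r = [0.0, 0.0, 2.0]  # hypothetical direction to ablate
print(ablate_direction(x, r))  # -> [3.0, 4.0, 0.0]
```

After ablation, the result has zero component along r̂, so downstream layers can no longer read that feature from it.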
11 async data sources, deduplication by DOI, paragraph chunking, and MiniLM embeddings stored in ChromaDB.
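Deduplication by DOI and paragraph chunking can be sketched as follows. The `Paper` record, the prefix list, and the blank-line splitting rule are assumptions for illustration, not the project's actual data model. The DOI is lower-cased because DOI names are case-insensitive. The embedding and ChromaDB storage steps are left out because they require a model download and a running store.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional


# Hypothetical record shape; the project's real source adapters may carry more fields.
@dataclass
class Paper:
    title: str
    doi: Optional[str]
    text: str
    source: str


_DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                 "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common URL/scheme prefixes."""
    d = doi.strip().lower()
    for prefix in _DOI_PREFIXES:
        if d.startswith(prefix):
            return d[len(prefix):].strip()
    return d


def dedupe_by_doi(papers: Iterable[Paper]) -> list[Paper]:
    """Keep the first record seen for each DOI; records without a DOI pass through."""
    seen: set[str] = set()
    unique: list[Paper] = []
    for paper in papers:
        if paper.doi:
            key = normalize_doi(paper.doi)
            if key in seen:
                continue  # same work already returned by another source
            seen.add(key)
        unique.append(paper)
    return unique


def paragraph_chunks(text: str) -> list[str]:
    """Split on blank lines, collapse internal whitespace, drop empty chunks."""
    blocks = re.split(r"\n\s*\n", text)
    cleaned = (" ".join(block.split()) for block in blocks)
    return [chunk for chunk in cleaned if chunk]


papers = [
    Paper("A", "10.1000/XYZ123", "Intro.\n\nMethods\nhere.", "pubmed"),
    Paper("A (duplicate)", "https://doi.org/10.1000/xyz123", "Same work.", "semantic_scholar"),
    Paper("B", None, "No DOI recorded.", "pubmed"),
]
unique = dedupe_by_doi(papers)
print([p.title for p in unique])         # -> ['A', 'B']
print(paragraph_chunks(unique[0].text))  # -> ['Intro.', 'Methods here.']
```

Keeping the first record means source order decides which metadata survives. A real pipeline might instead merge fields across duplicates.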
Dense vector search + BM25 sparse retrieval, reciprocal rank fusion, cross-encoder reranking.
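Reciprocal rank fusion combines the dense and BM25 result lists using ranks alone. Each chunk scores Σ 1/(k + rank) over the lists it appears in, with 1-based ranks. The sketch below uses k = 60, the constant from the original RRF paper. Whether this project uses the same k is an assumption, and the chunk IDs are hypothetical.

```python
from collections import defaultdict
from typing import Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]],
                           k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs into one list sorted by RRF score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # 1-based ranks
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["c3", "c1", "c7"]   # hypothetical chunk IDs from vector search
sparse = ["c1", "c9", "c3"]  # hypothetical chunk IDs from BM25
fused = reciprocal_rank_fusion([dense, sparse])
print([doc for doc, _ in fused])  # -> ['c1', 'c3', 'c9', 'c7']
```

Because RRF ignores raw scores, cosine similarities and BM25 scores never have to be normalized onto a common scale. A cross-encoder would then rescore the top fused candidates against the query; that step is not shown here.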
RunPod serverless inference with ablated open-weights LLMs. RAG-grounded output with inline citations.
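One common way to get RAG-grounded output with inline citations is to number each retrieved chunk in the prompt and instruct the model to cite by number. The template and function name below are hypothetical. The project's actual prompt and its RunPod request payload are not documented here, so this sketch only shows the numbering idea.

```python
from typing import Sequence


def build_grounded_prompt(question: str, chunks: Sequence[str]) -> str:
    """Number each retrieved chunk so the model can cite it inline as [n]."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer using only the numbered sources below. "
        "Cite every claim inline as [n]; if the sources do not support an answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


prompt = build_grounded_prompt("What does source 2 report?", ["first chunk", "second chunk"])
print(prompt)
```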
Citation verification, hallucination detection, uncertainty quantification, human review gate.
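A first, purely structural pass at citation verification can flag citation numbers that point at no retrieved source, sources that were never cited, and sentences with no citation at all. This sketch assumes the `[n]` citation format and a naive sentence splitter. It does not check that a cited chunk actually supports the claim; that would need a model-based check, and how the project does it, along with its uncertainty quantification and human review gate, is not described here.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # naive: split after ., ! or ? plus whitespace


def check_citations(answer: str, num_sources: int) -> dict[str, list]:
    """Report structural citation problems in a generated answer."""
    cited = sorted({int(n) for n in CITATION.findall(answer)})
    sentences = [s for s in SENTENCE_END.split(answer.strip()) if s]
    return {
        "cited": cited,
        # numbers outside 1..num_sources refer to no retrieved source
        "invalid": [n for n in cited if not 1 <= n <= num_sources],
        "unused": [n for n in range(1, num_sources + 1) if n not in cited],
        # candidates for hallucination review: claims with no citation at all
        "uncited_sentences": [s for s in sentences if not CITATION.search(s)],
    }


report = check_citations("Claim A is supported [1]. Claim B follows [4]. Claim C has no source.", 3)
print(report["invalid"], report["uncited_sentences"])  # -> [4] ['Claim C has no source.']
```

Answers with invalid citations or uncited sentences are natural candidates to route to the human review gate.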
```bash
# Clone and install
$ git clone https://github.com/opensynthesislabs/open-synthesis.git
$ cd open-synthesis
$ uv sync

# List available data sources
$ open-synthesis sources

# Ingest papers on a topic
$ open-synthesis ingest "psilocybin depression" --sources semantic_scholar,pubmed

# Run a synthesis (requires RunPod endpoint)
$ open-synthesis synthesize "What is the evidence for psilocybin as a treatment for MDD?"
```