A Retrieval-Augmented Generation (RAG) chatbot powered by Mistral-7B-Instruct, designed to deliver thoughtful, down-to-earth responses inspired by Daoist philosophy (Laozi, Zhuangzi). Built using FastAPI and optimized for lightweight, locally-hosted usage.
- RAG Architecture: Retrieves relevant passages from a corpus of Daoist notes using semantic search with FAISS.
- Philosophy-Grounded Prompting: Uses a system message that simulates a grounded Daoist mentor — gentle, honest, and reflective.
- Quantized Model Inference: Runs Mistral-7B in 4-bit using bitsandbytes for faster generation on consumer GPUs.
- Dynamic Knowledge Base: Loads and chunks
.md
notes from Daoist texts into embeddings at startup. - API Endpoint: Exposes a clean
POST /chat
route for frontend integration (e.g. Unity or web).
Component | Technology |
---|---|
Language Model | mistralai/Mistral-7B-Instruct-v0.2 |
Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
Vector Search | FAISS |
API Framework | FastAPI |
Quantization | bitsandbytes (4-bit NF4) |
Tokenizer & Model | transformers |
git clone https://github.com/Awakuruf/rag-chatbot.git
cd rag-bot
1.Create a virtual environment:
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
- Install dependencies:
pip install -r requirements.txt
- Run the FastAPI server:
cd .\app\
uvicorn main:app --reload
- The Unity game will POST messages to http://127.0.0.1:8000/chat.
rag-chatbot/
│
├── app/
│ ├── main.py # FastAPI app with /chat endpoint
│ ├── rag_pipeline.py # Core RAG logic
│ ├── ingest.py # Loads + embeds PDF/Markdown/web docs
│
├── data/
│ ├── daodejing.pdf
│ ├── daodejing_notes.md
| ├── zhuangzi.pdf
│ └── zhuangzi_notes.md
│
├── requirements.txt
├── example_responses.txt
└── README.md
-
Long response times from AI? Consider:
- Reducing max_new_tokens
- Using smaller models like mistral-7b-instruct in 4-bit mode
- Chunking your documents more efficiently
MIT License. Feel free to remix or adapt for educational and non-commercial purposes.