Knowledge Settings (RAG)
Knowledge settings control how your agent retrieves and processes information from its assigned Knowledge Buckets. Texterz uses an advanced Retrieval-Augmented Generation (RAG) architecture to ensure your agents provide accurate, data-backed responses.
1. Retrieval Strategy
When a user asks a question, the system does not send all your documents to the AI (which would be too slow and expensive). Instead:
- Search: It searches your buckets for the most relevant "chunks" of text.
- Context Construction: It picks the top results and injects them into the AI's prompt as "Context".
- Generation: The AI answers based only on that context.
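The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Texterz's actual implementation: the function names (`cosine_similarity`, `retrieve`, `build_prompt`), the 3-dimensional toy vectors, the sample chunks, and the prompt wording are all made up for the example, and cosine similarity is assumed as the relevance measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query_vec, chunks, top_k=2):
    """Search step: score every (text, vector) chunk and keep the best top_k."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

def build_prompt(question, retrieved):
    """Context construction step: inject the retrieved chunks into the prompt."""
    context = "\n".join(f"- {text}" for _, text in retrieved)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Hypothetical chunks with toy 3-dimensional vectors (real embeddings are much larger).
chunks = [
    ("Refunds are issued within 14 days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on Sundays.", [0.0, 0.2, 0.9]),
    ("Refund requests require an order number.", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]  # stand-in for the embedded user question

top = retrieve(query_vec, chunks, top_k=2)
prompt = build_prompt("How do refunds work?", top)
print(prompt)  # this prompt is what would be sent to the AI for the generation step
```

Only the two refund-related chunks reach the prompt; the unrelated chunk about office hours is never sent to the AI.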
2. Similarity Threshold
The Similarity Threshold (range 0.0 to 1.0) is the most important setting for controlling accuracy.
| Value | Behavior | Best For |
|---|---|---|
| High (0.8+) | Strict. The AI only uses information that is a near-perfect match. | Technical manuals, legal docs, pricing lists. |
| Medium (0.5 - 0.7) | Balanced. The AI uses relevant info but allows for slight variations in phrasing. | General FAQs, customer service, sales. |
| Low (< 0.4) | Creative. The AI pulls in even loosely related info. | Very broad topics only. Carries a risk of hallucination. |
Pro Tip: If your bot often says "I don't know," lower the threshold by 0.1. If it makes up facts, increase it.
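To make the effect of the threshold concrete, here is a minimal sketch of how a threshold might filter scored chunks. The scores and texts are invented for illustration, `filter_by_threshold` is a hypothetical helper, and keeping chunks whose score is at or above the threshold (rather than strictly above) is an assumption.

```python
def filter_by_threshold(scored_chunks, threshold):
    """Keep only (score, text) pairs whose score meets the threshold.

    Assumption: a chunk scoring exactly at the threshold is kept.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return [(score, text) for score, text in scored_chunks if score >= threshold]

# Hypothetical similarity scores for one user question.
scored_chunks = [
    (0.91, "Plan A costs $20 per month."),
    (0.62, "Plans can be upgraded at any time."),
    (0.35, "We were founded in 2015."),
]

strict = filter_by_threshold(scored_chunks, 0.8)    # near-perfect matches only
balanced = filter_by_threshold(scored_chunks, 0.5)  # also allows looser phrasing
loose = filter_by_threshold(scored_chunks, 0.3)     # pulls in loosely related text

print(len(strict), len(balanced), len(loose))
```

If the filter returns nothing, the AI has no context to answer from, which is when lowering the threshold helps. If it returns loosely related chunks, the AI may build an answer on them, which is when raising it helps.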
3. Web Crawling & Scraping
When adding websites to a bucket, you can fine-tune how our crawler behaves:
Crawl Depth
- Single Page: Only the exact URL is indexed.
- Directory: All pages under that path (e.g., `/docs/*`).
- Domain: The entire website (use with caution to avoid storage limits).
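The three crawl depths can be thought of as a scope check applied to every link the crawler discovers. The sketch below is one possible reading, not the crawler's real logic: the function `in_scope` and the depth names `"single_page"`, `"directory"`, and `"domain"` are hypothetical, and ignoring the URL scheme and fragment is an assumption.

```python
from urllib.parse import urlparse

def in_scope(start_url, candidate_url, depth):
    """Return True if candidate_url should be indexed for the given crawl depth."""
    start = urlparse(start_url)
    cand = urlparse(candidate_url)

    # Assumption: every depth stays on the host of the starting URL.
    if cand.netloc != start.netloc:
        return False

    if depth == "domain":
        return True
    if depth == "single_page":
        # Only the exact URL (same path and query string).
        return (cand.path, cand.query) == (start.path, start.query)
    if depth == "directory":
        # Everything under the starting path, e.g. /docs/*.
        base = start.path if start.path.endswith("/") else start.path + "/"
        return cand.path == start.path or cand.path.startswith(base)
    raise ValueError(f"unknown crawl depth: {depth!r}")

start = "https://example.com/docs"
print(in_scope(start, "https://example.com/docs/setup", "directory"))   # True
print(in_scope(start, "https://example.com/docs-old/x", "directory"))   # False
print(in_scope(start, "https://example.com/blog", "domain"))            # True
print(in_scope(start, "https://example.com/docs/setup", "single_page")) # False
```

Appending a trailing slash before the prefix check keeps a sibling path such as `/docs-old/` from being treated as part of `/docs/`.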
Exclusion Patterns
You can prevent the crawler from indexing specific areas using regex or simple paths:
- `/admin/*`
- `/login`
- `*?session_id=*`
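As a rough sketch of how simple path patterns like these could be evaluated, the example below reads the sample as three separate patterns (`/admin/*`, `/login`, `*?session_id=*`), treats `*` as "any run of characters", and treats every other character, including `?`, literally. Both the split and the matching semantics are assumptions, and `compile_pattern` and `is_excluded` are hypothetical helpers, so check the crawler's actual behavior before relying on them.

```python
import re

def compile_pattern(pattern):
    """Turn a simple path pattern into a regex: '*' is a wildcard, the rest is literal."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(parts))

def is_excluded(url_path, patterns):
    """Return True if url_path fully matches any exclusion pattern."""
    return any(compile_pattern(p).fullmatch(url_path) is not None for p in patterns)

# Assumed reading of the sample exclusion patterns.
patterns = ["/admin/*", "/login", "*?session_id=*"]

for path in ["/admin/users", "/login", "/pricing?session_id=abc123", "/docs/setup"]:
    status = "skip" if is_excluded(path, patterns) else "index"
    print(f"{path}: {status}")
```

Escaping everything except `*` matters here: in many glob dialects `?` matches any single character, which would make the session pattern broader than intended.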
4. Notion Integration
Connect your Notion workspace to import pages and databases directly into a Knowledge Bucket.
- Selective Sync: Choose exactly which pages or sub-pages the bot should access.
- Automatic Updates: When you edit a page in Notion, the Knowledge Bucket can be refreshed to stay up to date.
- Support Documentation: Perfect for syncing internal wiki pages or public help centers hosted on Notion.
5. Document Processing (Automatic)
Texterz automatically optimizes your data during upload:
- Chunking: Large files are split into ~500-1000 character pieces with overlapping margins to preserve context.
- Cleaning: Boilerplate (headers, footers, navigation) is stripped from web pages to keep the AI focused.
- Embedding: Every chunk is converted into a 1536-dimensional vector for high-speed semantic search.
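The chunking step can be illustrated with a simple character-based splitter. This is a sketch only: the function `chunk_text`, the 800-character chunk size, and the 100-character overlap are assumptions chosen to fall within the range described above, and a production chunker would likely also respect sentence or paragraph boundaries.

```python
def chunk_text(text, chunk_size=800, overlap=100):
    """Split text into fixed-size pieces where each piece repeats the
    last `overlap` characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks

# A 2000-character stand-in for an uploaded document.
document = "".join(str(i % 10) for i in range(2000))
pieces = chunk_text(document)

print([len(p) for p in pieces])
print(pieces[0][-100:] == pieces[1][:100])  # the shared margin between neighbors
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.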
Best Practices
For a guide on organizing your data, see 1. Managing Knowledge.