Knowledge Settings (RAG)

Knowledge settings control how your agent retrieves and processes information from its assigned Knowledge Buckets. Texterz uses a Retrieval-Augmented Generation (RAG) architecture so that your agents' responses are accurate and grounded in your own data.

1. Retrieval Strategy

When a user asks a question, the system does not send all your documents to the AI (which would be too slow and expensive). Instead:

  1. Search: It searches your buckets for the most relevant "chunks" of text.
  2. Context Construction: It picks the top results and injects them into the AI's prompt as "Context".
  3. Generation: The AI answers based only on that context.
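The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not Texterz's implementation: the function names (`search`, `build_prompt`), the `top_k` parameter, the prompt wording, the made-up chunks, and the toy 3-dimensional embeddings are all assumptions for the example, and cosine similarity is assumed as the scoring function. The generation step is omitted because it depends on which model is called.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, chunks, top_k=2):
    """Step 1 (Search): rank every chunk by similarity and keep the top_k."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(question, results):
    """Step 2 (Context Construction): inject retrieved chunks as "Context"."""
    context = "\n".join(f"- {text}" for _, text in results)
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Made-up bucket of (chunk text, embedding). Real embeddings come from an
# embedding model and have far more dimensions; 3-D vectors keep this readable.
bucket = [
    ("Invoices are emailed on the first day of each month.", [0.9, 0.1, 0.0]),
    ("Password resets are done from the login page.", [0.2, 0.9, 0.1]),
    ("Support is available on weekdays.", [0.0, 0.2, 0.9]),
]

query_vec = [0.85, 0.2, 0.05]  # pretend embedding of the question below
results = search(query_vec, bucket, top_k=1)
prompt = build_prompt("When are invoices sent?", results)
# Step 3 (Generation) would send `prompt` to the language model; omitted here.
print(prompt)
```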

2. Similarity Threshold

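The Similarity Threshold (range 0.0 to 1.0) sets the minimum relevance score a chunk of text must reach before it is passed to the AI as context. It is the most important setting for controlling accuracy.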

| Value | Behavior | Best For |
|---|---|---|
| High (0.8+) | Strict. The AI only uses information that is a near-perfect match. | Technical manuals, legal docs, pricing lists. |
| Medium (0.5-0.7) | Balanced. The AI uses relevant info but allows for slight variations in phrasing. | General FAQs, customer service, sales. |
| Low (< 0.4) | Creative. The AI pulls in even loosely related info. | Very broad topics only. Risk of hallucination. |

Pro Tip: If your bot often says "I don't know," lower the threshold by 0.1. If it makes up facts, increase it.
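To make the threshold concrete, here is a minimal sketch assuming each retrieved chunk already carries a similarity score between 0.0 and 1.0 and that chunks scoring at or above the threshold are kept. The function name `filter_by_threshold` and the scores are hypothetical; whether Texterz compares with `>=` or `>` is not stated in this page.

```python
def filter_by_threshold(scored_chunks, threshold):
    """Keep only (score, text) pairs whose score is at or above the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return [(score, text) for score, text in scored_chunks if score >= threshold]


# Made-up similarity scores for a single query.
scored = [
    (0.91, "Exact answer from the pricing table"),
    (0.74, "Related paragraph from the FAQ"),
    (0.46, "Loosely related blog post"),
]

strict = filter_by_threshold(scored, 0.8)
balanced = filter_by_threshold(scored, 0.7)  # strict threshold lowered by 0.1
loose = filter_by_threshold(scored, 0.4)

print(len(strict), len(balanced), len(loose))  # 1 2 3
```

Lowering the threshold from 0.8 to 0.7, as the Pro Tip suggests, lets the FAQ paragraph through; dropping to 0.4 also admits the loosely related post, which is where the risk of hallucination grows.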

3. Web Crawling & Scraping

When adding websites to a bucket, you can fine-tune how our crawler behaves:

Crawl Depth

  • Single Page: Only the exact URL is indexed.
  • Directory: All pages under that path (e.g., /docs/*).
  • Domain: The entire website. Use with caution, as indexing a whole site can exceed your storage limits.
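The three depth modes amount to a scope check applied to each link the crawler discovers. The sketch below is hypothetical: the mode names (`single_page`, `directory`, `domain`), the rule that the crawler never leaves the starting host (so subdomains count as separate sites), and treating a Directory start URL as a path prefix are assumptions for illustration.

```python
from urllib.parse import urlsplit


def in_scope(start_url, candidate_url, mode):
    """Return True if candidate_url should be crawled under the given depth mode."""
    start = urlsplit(start_url)
    cand = urlsplit(candidate_url)
    if cand.netloc != start.netloc:
        return False  # assumption: never leave the starting host
    if mode == "single_page":
        # Only the exact page; query strings and fragments are ignored here.
        return cand.path == start.path
    if mode == "directory":
        # Treat the start path as a directory prefix, e.g. /docs/ covers /docs/*.
        prefix = start.path if start.path.endswith("/") else start.path + "/"
        return cand.path == start.path or cand.path.startswith(prefix)
    if mode == "domain":
        return True  # any page on the same host
    raise ValueError(f"unknown crawl mode: {mode}")


print(in_scope("https://example.com/docs/", "https://example.com/docs/setup", "directory"))
```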

Exclusion Patterns

You can prevent the crawler from indexing specific areas using regex or simple paths:

  • /admin/*
  • /login
  • *?session_id=*
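A simple-path pattern such as `*?session_id=*` can be read as a wildcard pattern in which `*` matches any run of characters. The sketch below assumes that reading, treats every other character (including `?`) literally, and matches against the URL path plus query string. The helper names are hypothetical, and it does not cover the regex form of exclusion the crawler also accepts.

```python
import re
from urllib.parse import urlsplit

# Example patterns from the list above.
EXCLUDE_PATTERNS = ["/admin/*", "/login", "*?session_id=*"]


def wildcard_to_regex(pattern):
    """Compile a simple wildcard pattern: only "*" is special, all else is literal."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(parts))


COMPILED = [wildcard_to_regex(p) for p in EXCLUDE_PATTERNS]


def is_excluded(url):
    """True if the URL's path (plus query string, if any) matches a pattern."""
    parts = urlsplit(url)
    target = parts.path + (f"?{parts.query}" if parts.query else "")
    return any(rx.fullmatch(target) for rx in COMPILED)


print(is_excluded("https://example.com/docs/intro?session_id=42"))
```

Python's `fnmatch` module would treat `?` as a single-character wildcard, which is why this sketch builds its own translation instead.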

4. Notion Integration

Connect your Notion workspace to import pages and databases directly into a Knowledge Bucket.

  • Selective Sync: Choose exactly which pages or sub-pages the bot should access.
  • Automatic Updates: When you edit a page in Notion, the Knowledge Bucket can be refreshed to stay up to date.
  • Support Documentation: Perfect for syncing internal wiki pages or public help centers hosted on Notion.
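One way to decide which pages need refreshing is to compare each page's last edit time with the time it was last imported. Notion page objects expose a `last_edited_time` timestamp; everything else here (the `pages_to_resync` helper, the `last_synced` store, the page IDs, and the selective-sync set) is a hypothetical illustration, not a description of how Texterz schedules refreshes.

```python
from datetime import datetime


def parse_notion_time(value):
    """Parse an ISO 8601 timestamp such as "2024-05-01T10:30:00.000Z"."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def pages_to_resync(pages, last_synced, selected_ids):
    """Return IDs of selected pages edited since their last sync.

    pages: dicts with "id" and "last_edited_time".
    last_synced: hypothetical store mapping page ID -> ISO timestamp of last sync.
    selected_ids: pages chosen via Selective Sync.
    """
    stale = []
    for page in pages:
        if page["id"] not in selected_ids:
            continue  # not part of the selective sync
        edited = parse_notion_time(page["last_edited_time"])
        synced = last_synced.get(page["id"])
        if synced is None or edited > parse_notion_time(synced):
            stale.append(page["id"])  # never imported, or edited since import
    return stale


# Made-up example data.
pages = [
    {"id": "pricing", "last_edited_time": "2024-05-02T09:00:00.000Z"},
    {"id": "faq", "last_edited_time": "2024-04-01T09:00:00.000Z"},
    {"id": "onboarding", "last_edited_time": "2024-05-02T09:00:00.000Z"},
    {"id": "drafts", "last_edited_time": "2024-05-03T09:00:00.000Z"},
]
last_synced = {"pricing": "2024-05-01T00:00:00.000Z", "faq": "2024-05-01T00:00:00.000Z"}
selected = {"pricing", "faq", "onboarding"}
print(pages_to_resync(pages, last_synced, selected))  # ['pricing', 'onboarding']
```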

5. Document Processing (Automatic)

Texterz automatically optimizes your data during upload:

  • Chunking: Large files are split into pieces of roughly 500-1000 characters, with neighboring pieces overlapping slightly so that context at the boundaries is preserved.
  • Cleaning: Boilerplate (headers, footers, navigation) is stripped from web pages to keep the AI focused.
  • Embedding: Every chunk is converted into a 1536-dimensional vector for high-speed semantic search.
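Overlapping chunking can be sketched as a sliding window over the text. The 800-character window and 100-character overlap below are illustrative values chosen from within the documented range; the actual sizes, and whether Texterz splits on sentence or paragraph boundaries rather than raw character offsets, are assumptions.

```python
def chunk_text(text, chunk_size=800, overlap=100):
    """Split text into character windows where neighbors share `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2000))
print([len(c) for c in chunk_text(sample)])  # [800, 800, 600]
```

Because each window starts `chunk_size - overlap` characters after the previous one, the last 100 characters of one chunk reappear at the start of the next, so text near a boundary is present in both neighboring chunks.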

Best Practices

For a guide on organizing your data, see 1. Managing Knowledge.