How to Build a RAG Chatbot on Azure UK South Using Your Own Company Documents Without a Data Science Team

10 min read

Retrieval-augmented generation lets a chatbot answer questions using your own company documents rather than general internet knowledge. Azure now makes this buildable in a day, with all data staying in the UK South region. This guide walks through the three Azure services you need — Azure AI Search for retrieval, Azure OpenAI Service for the language model, and Azure AI Foundry as the orchestration layer — from a Basic-tier pilot costing around £120 per month to a production deployment handling thousands of daily queries. No Python scripting is required for the initial build: Foundry's guided 'On Your Data' experience handles indexing, chunking and prompt configuration through a browser interface. The article covers architecture options, realistic cost breakdowns for 50-person and 200-person organisations, failure modes worth planning for, and a 12-point checklist to run before going live.

Written by Thomas Burke

Bottom line: You can build a working RAG chatbot on Azure UK South in a single day using Azure AI Foundry's guided setup. The three services you need are Azure AI Search (retrieval), Azure OpenAI Service (language model) and Azure AI Foundry (orchestration). A Basic-tier pilot costs around £120 per month. All data stays in the UK South region. No Python, no Jupyter notebooks, no data science team required for the initial build — though you will want an engineer involved when you move to production.

What RAG Actually Is and Why It Matters for UK Data Residency

Retrieval-augmented generation is a pattern, not a product. It works in two steps. First, a search index finds the chunks of your documents that are relevant to the user's question. Then, a large language model reads those chunks and generates an answer grounded in your actual data rather than its general training.
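The two-step pattern is easy to see in code. This is a toy sketch: a keyword-overlap scorer stands in for the real search index, and a stub stands in for the model call, with invented document chunks for illustration.

```python
# Toy sketch of the RAG pattern: retrieve relevant chunks, then generate a
# grounded answer. The keyword-overlap scorer stands in for Azure AI Search;
# generate_answer stands in for the GPT-4o call.

def retrieve(question: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Score each chunk by word overlap with the question and return the best."""
    q_words = set(question.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def generate_answer(question: str, context: list[str]) -> str:
    """Stand-in for the model: a real system sends question + context to GPT-4o."""
    return f"Answer to {question!r} grounded in {len(context)} retrieved chunk(s)."

chunks = [
    "Annual leave entitlement is 25 days plus bank holidays.",
    "Expense claims must be submitted within 30 days.",
    "The office closes at 18:00 on Fridays.",
]
context = retrieve("How many days of annual leave do I get?", chunks, top_k=1)
print(generate_answer("How many days of annual leave do I get?", context))
```

The real pipeline swaps the scorer for a hybrid keyword-plus-vector search and the stub for a chat completion, but the shape is identical.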

The reason this matters for UK organisations is data residency. When you deploy all three components — search index, language model and orchestration — in Azure UK South (London), your documents never leave the UK. The language model does not train on your data. Azure OpenAI Service in UK South currently offers GPT-4o, GPT-4o mini and text-embedding-3-large, which is everything you need for a production RAG pipeline.

This is not a theoretical architecture. Microsoft's own Foundry platform now offers a guided browser-based experience called 'On Your Data' that handles document ingestion, chunking, vector embedding and prompt configuration without writing a line of code. The floor for getting started has dropped from 'hire a machine learning engineer' to 'have someone comfortable with the Azure Portal'.

The Three Services You Need

Every Azure RAG chatbot uses the same three building blocks.

Azure AI Search is the retrieval layer. It stores your documents as searchable chunks — both as traditional keyword indexes and as vector embeddings. When a user asks a question, Azure AI Search finds the top 5-10 relevant chunks. The Basic tier supports up to 2 GB of index storage and 3 replicas, which handles roughly 15,000 to 25,000 pages of documents comfortably.

Azure OpenAI Service provides the language model. GPT-4o is the current best option for answer quality. It reads the retrieved chunks, reasons across them, and generates a grounded answer with citations. The embedding model — text-embedding-3-large — converts your documents and queries into vectors for semantic search. Both are available in UK South on pay-as-you-go pricing.
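To see why embeddings enable semantic search: documents and queries become vectors, and relevance is the cosine similarity between them. The three-dimensional vectors below are invented for illustration; text-embedding-3-large actually returns 3,072-dimensional vectors, but the comparison works the same way.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 = similar meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Invented 3-dimensional vectors standing in for real 3,072-dimensional embeddings.
query_vec   = [0.9, 0.1, 0.0]   # embedding of "annual leave policy"
leave_doc   = [0.8, 0.2, 0.1]   # chunk about holiday entitlement
expense_doc = [0.1, 0.1, 0.9]   # chunk about expense claims

# The leave chunk scores far higher than the expenses chunk for this query.
assert cosine_similarity(query_vec, leave_doc) > cosine_similarity(query_vec, expense_doc)
```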

Azure AI Foundry (previously called Azure AI Studio) is the orchestration layer. It connects the search index to the language model, manages prompt templates, handles conversation history and provides evaluation tools. Since November 2025, Foundry also offers Foundry IQ, a managed knowledge-base feature that handles the full RAG pipeline with minimal configuration.

| Service | Role | UK South Available | Starting Tier |
| --- | --- | --- | --- |
| Azure AI Search | Retrieval + vector index | Yes | Basic (~£60/month) |
| Azure OpenAI Service | GPT-4o + embeddings | Yes | Pay-as-you-go |
| Azure AI Foundry | Orchestration + evaluation | Yes | Free (portal-based) |
| Azure App Service | Front-end hosting (optional) | Yes | Basic (~£10/month) |

Architecture Options from Pilot to Production

There are three tiers worth considering, depending on your document volume and user count.

Tier 1 — Quick Pilot (1-3 users, under 5,000 pages)

Use Azure AI Foundry's 'On Your Data' guided experience. Upload your PDFs, Word documents or text files directly. Foundry handles chunking, embedding and index creation. Deploy a test chat interface from the Foundry portal. Total build time: 2-4 hours. No code required.

Tier 2 — Department Rollout (10-50 users, 5,000-50,000 pages)

Upgrade Azure AI Search from Basic to Standard S1 for higher query throughput and larger indexes. Deploy a proper front-end using Azure App Service or integrate with Microsoft Teams via a Power Virtual Agents connector. Add authentication through Microsoft Entra ID. An engineer spends 2-3 days on setup and testing.

Tier 3 — Production (50+ users, 50,000+ pages, SLA required)

Move to Standard S2 search with multiple replicas for high availability. Add a content pipeline that automatically re-indexes when source documents change. Build custom prompt engineering for domain-specific accuracy. Add logging, monitoring and feedback loops through Azure Application Insights. This is where you either upskill an internal engineer or bring in a partner for a 2-4 week engagement.

| Tier | Users | Documents | Azure AI Search | Monthly Cost (Est.) | Build Time |
| --- | --- | --- | --- | --- | --- |
| Pilot | 1-3 | Under 5,000 pages | Basic | £120-150 | Half a day |
| Department | 10-50 | 5,000-50,000 pages | Standard S1 | £300-500 | 2-3 days |
| Production | 50+ | 50,000+ pages | Standard S2+ | £800-2,000+ | 2-4 weeks |

Step-by-Step Build in Azure AI Foundry

This walkthrough covers the Tier 1 pilot build. You need an Azure subscription with the OpenAI service enabled and access to the UK South region.

Step 1 — Create the Azure AI Search resource. In the Azure Portal, search for 'AI Search' and create a new resource. Select UK South as the region. Choose the Basic pricing tier. The deployment takes 2-3 minutes.

Step 2 — Create the Azure OpenAI Service resource. Search for 'Azure OpenAI' in the Portal and create a new resource in UK South. Once deployed, go to Model deployments and deploy two models: GPT-4o (for chat completion) and text-embedding-3-large (for vector search). Name them something clear — 'gpt-4o-chat' and 'embedding-3-large' work fine.

Step 3 — Open Azure AI Foundry. Go to ai.azure.com and create a new project linked to your Azure OpenAI resource. Select UK South as the region.

Step 4 — Add your data. In the Foundry project, select 'Chat' from the left navigation, then click 'Add your data'. Choose Azure AI Search as the data source. Connect it to the search resource you created in Step 1. Upload your documents — PDF, DOCX, TXT, PPTX and HTML are all supported. Foundry will chunk the documents, generate embeddings using your text-embedding-3-large deployment, and create the search index automatically.

Step 5 — Configure the system prompt. The system prompt tells the model how to behave. For a company knowledge base, something like: 'You are a helpful assistant that answers questions using only the provided company documents. If the answer is not in the documents, say so. Always cite the source document.' Keep it direct and specific.

Step 6 — Test in the Foundry playground. Ask questions against your documents. Check that answers cite the right sources. Test edge cases — questions where the answer spans multiple documents, questions where the answer is not in the data, and questions that are ambiguous.

Step 7 — Deploy. Click 'Deploy to' and choose either a web app (Azure App Service) or an API endpoint. The web app option gives you a working chat interface with authentication in under 10 minutes. The API option lets your developers integrate the chatbot into existing applications.
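For developers taking the API route, the call is a standard chat completion with an extra data_sources block pointing at your search index. The sketch below only builds the request body; the endpoint, index name and key are placeholders, and the exact field names should be verified against the current Azure OpenAI 'On Your Data' API reference before use.

```python
# Sketch of a request body for an Azure OpenAI chat completion grounded in an
# Azure AI Search index ("On Your Data"). All connection values are placeholders;
# verify the schema against the current API reference.

def build_rag_request(question: str, search_endpoint: str,
                      index_name: str, search_key: str) -> dict:
    return {
        "messages": [
            {"role": "system", "content": (
                "You are a helpful assistant that answers questions using only "
                "the provided company documents. If the answer is not in the "
                "documents, say so. Always cite the source document."
            )},
            {"role": "user", "content": question},
        ],
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": search_endpoint,
                    "index_name": index_name,
                    "authentication": {"type": "api_key", "key": search_key},
                },
            }
        ],
    }

payload = build_rag_request(
    "What is our annual leave policy?",
    "https://my-search.search.windows.net",  # placeholder endpoint
    "company-docs-index",                    # placeholder index name
    "<search-admin-key>",                    # placeholder key
)
```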

Realistic Pricing for UK Organisations

Azure pricing is consumption-based, which makes forecasting awkward until you have real usage data. Here are realistic monthly estimates based on mid-market organisations running RAG chatbots in production.

GPT-4o on pay-as-you-go costs $2.50 per million input tokens and $10 per million output tokens. A typical RAG query sends around 3,000-5,000 tokens of context plus the user question (input) and receives 300-500 tokens back (output), which is roughly $0.01-$0.02 per query. At 100-150 queries per day, realistic for a small pilot, that works out to roughly £30-50 per month for the language model alone.
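The arithmetic is worth making explicit, since context size and query volume are the biggest cost levers. The sketch below uses the pay-as-you-go prices quoted above; the exchange rate and the query profile are assumptions, so plug in your own numbers.

```python
# Back-of-envelope GPT-4o cost model using the pay-as-you-go prices quoted in
# the text. Exchange rate and query profile are assumptions; adjust for your usage.

USD_PER_M_INPUT = 2.50    # GPT-4o input price per million tokens
USD_PER_M_OUTPUT = 10.00  # GPT-4o output price per million tokens
GBP_PER_USD = 0.79        # assumed exchange rate

def monthly_model_cost_gbp(queries_per_day: int,
                           input_tokens: int = 4000,
                           output_tokens: int = 400,
                           days: int = 30) -> float:
    per_query_usd = (input_tokens * USD_PER_M_INPUT +
                     output_tokens * USD_PER_M_OUTPUT) / 1_000_000
    return queries_per_day * days * per_query_usd * GBP_PER_USD

print(round(monthly_model_cost_gbp(125), 2))   # small pilot: roughly £41/month
print(round(monthly_model_cost_gbp(500), 2))   # heavier usage: roughly £166/month
```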

Text-embedding-3-large costs $0.13 per million tokens. Initial indexing of 10,000 pages costs roughly £2-3. Re-indexing is incremental and negligible.

Azure AI Search is the fixed-cost anchor. Basic is approximately £60 per month. Standard S1 jumps to roughly £196 per month but adds higher throughput, more storage and replica support.

| Cost Component | Pilot (100-150 queries/day) | Department (~500 queries/day) | Production (~2,000 queries/day) |
| --- | --- | --- | --- |
| Azure AI Search | £60 (Basic) | £196 (S1) | £590+ (S2) |
| GPT-4o (pay-as-you-go) | £30-50 | £120-200 | £400-800 |
| Embeddings | £3 (one-off) | £5 | £10 |
| App Service / front-end | £10 | £40 | £100 |
| Monthly total | £103-123 | £361-441 | £1,100-1,500 |

Provisioned Throughput Units (PTUs) make sense at the production tier if you need guaranteed latency. PTU pricing starts at roughly $2,448 per month for 50 PTUs of GPT-4o — overkill for anything below enterprise scale.

What Can Go Wrong and How to Fix It

Poor answer quality from bad chunking. The default chunking strategy in Foundry splits documents into 1,024-token chunks with 128-token overlaps. This works for general documents but can split tables, numbered lists and multi-part policies across chunks, losing context. Fix: test with smaller chunk sizes (512 tokens) for structured documents, or use semantic chunking if your documents have clear section headers.
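Chunk size and overlap are easier to reason about with a sketch. The sliding-window splitter below works on words as a rough stand-in for tokens; Foundry's real chunker is token-based, so treat this as an approximation of the idea rather than its implementation.

```python
# Sliding-window chunker: fixed-size chunks with overlap, so content near a
# boundary appears in two chunks. Words stand in for tokens here; Foundry's
# defaults are 1,024 tokens per chunk with 128-token overlap.

def chunk(words: list[str], size: int = 1024, overlap: int = 128) -> list[list[str]]:
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break
    return chunks

doc = [f"w{i}" for i in range(2000)]
pieces = chunk(doc)
# 2,000 words -> three chunks starting at words 0, 896 and 1792; the 128-word
# overlap means words 896-1023 appear in both of the first two chunks.
```

Halving `size` to 512 produces more, smaller chunks, which keeps tables and short policy clauses intact at the cost of a larger index.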

The chatbot 'hallucinates' despite RAG. RAG reduces hallucination but does not eliminate it. If the retrieved chunks are marginally relevant, the model may still generate plausible-sounding but wrong answers. Fix: tune the search relevance threshold — Foundry lets you set a minimum search score below which chunks are discarded. Also add explicit instructions in the system prompt: 'If the provided documents do not contain enough information to answer the question, say: I could not find the answer in the available documents.'
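The minimum-score filter is simple to express. The scores and threshold below are invented for illustration; in Foundry this is a portal setting rather than code you write, but the logic it applies is the same.

```python
# Discard marginally relevant chunks before they reach the model, and fall back
# to an explicit "not found" answer when nothing clears the bar. Scores and the
# threshold are illustrative values.

def filter_chunks(results: list[tuple[str, float]], min_score: float = 0.5) -> list[str]:
    """Keep only chunks whose search relevance score clears the threshold."""
    return [text for text, score in results if score >= min_score]

results = [
    ("Annual leave is 25 days.", 0.91),
    ("Expense claims need receipts.", 0.34),   # weak match, discarded
]
kept = filter_chunks(results)

FALLBACK = "I could not find the answer in the available documents."
answer = FALLBACK if not kept else f"Grounded answer using {len(kept)} chunk(s)."
```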

Slow response times. The main bottleneck is usually Azure AI Search query latency at scale. Basic tier has no SLA and can be slow under concurrent load. Fix: upgrade to Standard with replicas. Each replica adds query capacity. Two replicas give you read high availability.

Stale answers from outdated documents. If your source documents change regularly — policy updates, product specifications, pricing — the search index needs refreshing. Foundry IQ handles this automatically for managed knowledge bases. For custom builds, set up an Azure Function on a timer to re-index changed documents.
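For a custom build, the timer-triggered function only needs to find documents modified since the last indexing run. The sketch below shows that selection logic in plain Python with invented fields; the real version would run inside an Azure Function and hand the changed documents to the search indexer.

```python
from datetime import datetime, timezone

# Pick out documents changed since the last indexing run. The document fields
# are invented; inside an Azure Function this result would drive re-indexing.

def docs_to_reindex(docs: list[dict], last_run: datetime) -> list[str]:
    return [d["name"] for d in docs if d["modified"] > last_run]

docs = [
    {"name": "leave-policy.pdf",   "modified": datetime(2025, 11, 20, tzinfo=timezone.utc)},
    {"name": "expenses-guide.pdf", "modified": datetime(2025, 9, 1, tzinfo=timezone.utc)},
]
last_run = datetime(2025, 11, 1, tzinfo=timezone.utc)
# Only leave-policy.pdf changed after the last run, so only it is re-indexed.
```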

Users asking questions outside the document scope. Without guardrails, the model may answer general knowledge questions using its training data rather than your documents. Fix: constrain the system prompt strictly and consider adding a classifier that detects out-of-scope queries before they reach the model.
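A crude out-of-scope gate can sit in front of the model. The keyword heuristic below is purely illustrative, with an invented topic list; a production classifier would use embeddings or a small model rather than word matching.

```python
# Illustrative scope gate: refuse questions with no overlap with the indexed
# topics before they ever reach GPT-4o. A real gate would compare embeddings.

SCOPE_TERMS = {"leave", "holiday", "expenses", "policy", "payroll"}  # invented topic list

def in_scope(question: str) -> bool:
    return bool(SCOPE_TERMS & set(question.lower().split()))

assert in_scope("What is the holiday policy?")
assert not in_scope("Who won the World Cup in 1966?")
```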

When to Bring in Specialists

The Tier 1 pilot genuinely does not need a data science team. Any engineer or technically confident IT administrator who can work the Azure Portal can get it running in half a day.

Bring in specialist help when:

  • You need to connect the chatbot to live data sources (APIs, databases, SharePoint libraries that update daily)
  • Your documents require custom parsing — scanned PDFs, handwritten notes, complex table extraction
  • You need to fine-tune response quality for domain-specific accuracy (legal, medical, financial)
  • You are processing personal data through the chatbot and need a Data Protection Impact Assessment
  • You want to add multi-turn conversation memory beyond the default context window
  • You need to meet a specific SLA for uptime and response latency

For a mid-market organisation, the typical engagement with a Microsoft partner runs 2-4 weeks and costs £15,000-£40,000 for a production-grade RAG deployment including authentication, monitoring, feedback loops and document pipeline automation.

Pre-Build Checklist

1. Confirm your Azure subscription has Azure OpenAI Service access approved — this still requires an application for new subscriptions
2. Check that GPT-4o and text-embedding-3-large are available in UK South for your subscription (model availability varies by subscription type)
3. Decide which documents go into the index first — start with a focused collection of 50-100 documents, not the entire SharePoint
4. Confirm who owns the Azure subscription and has Contributor access to create resources
5. Check your organisation's data classification policy — are all the target documents approved for cloud processing in Azure UK South?
6. Decide on authentication — Microsoft Entra ID for internal users is the obvious choice
7. Set a test plan: prepare 20-30 questions with known correct answers from your documents before building
8. Budget for three months of pilot running costs before deciding whether to scale
9. Identify who will maintain the index when source documents change
10. If processing personal data, involve your Data Protection Officer before uploading anything
11. Agree on success criteria: what answer accuracy rate justifies moving from pilot to department rollout?
12. Document the architecture for your IT team — even a one-page diagram of which services connect to what
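The test plan in point 7 is easy to automate. The harness below scores a simple substring match against known answers; `ask_bot` is a placeholder stub standing in for whatever interface you deploy, and the canned answers are invented.

```python
# Minimal evaluation harness for the known-answer test plan: run each question
# through the chatbot and report the fraction answered correctly. `ask_bot` is
# a stub standing in for your deployed chatbot.

def ask_bot(question: str) -> str:
    canned = {"How many days of annual leave?": "You get 25 days of annual leave."}
    return canned.get(question, "I could not find the answer in the available documents.")

def accuracy(test_set: list[tuple[str, str]]) -> float:
    """Fraction of questions whose expected phrase appears in the bot's answer."""
    hits = sum(1 for q, expected in test_set
               if expected.lower() in ask_bot(q).lower())
    return hits / len(test_set)

test_set = [
    ("How many days of annual leave?", "25 days"),
    ("What is the expenses deadline?", "30 days"),  # not in the stub, so a miss
]
print(accuracy(test_set))  # 0.5 with this two-question stub
```

Substring matching is deliberately strict and cheap; for production evaluation, Foundry's built-in evaluation tools or a model-graded rubric give a fuller picture.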

Frequently Asked Questions

Does my company data leave the UK in this setup?

No. When you deploy Azure AI Search, Azure OpenAI Service and Azure AI Foundry all in the UK South region (London), your documents and queries stay within UK borders. Azure OpenAI Service does not use your data for model training. Check your Azure resource configuration to confirm all three services are set to UK South.

Do I need to know Python to build a RAG chatbot on Azure?

Not for the initial pilot. Azure AI Foundry's 'On Your Data' experience handles document upload, chunking, embedding and deployment through a browser interface. You will need coding skills — or a developer — if you want to customise the pipeline, connect live data sources, or build a custom front-end beyond the default web app.

How accurate are the answers from a RAG chatbot?

Accuracy depends on your document quality, chunking strategy and prompt engineering. A well-configured RAG chatbot typically answers 80-90% of in-scope questions correctly with proper source citations. The remaining 10-20% are usually edge cases where answers span multiple documents or where the question is ambiguous. Test with at least 30 known-answer questions before going live.

Can I connect the RAG chatbot to SharePoint or Teams?

Yes. Azure AI Search has native connectors for SharePoint Online, Azure Blob Storage and Azure SQL Database. To add the chatbot to Teams, you can deploy the chatbot as a Teams app using Power Virtual Agents or the Azure Bot Service. The SharePoint connector can automatically re-index when documents change.

What is the difference between Azure AI Foundry and Azure OpenAI Service?

Azure OpenAI Service is the API that hosts the language models (GPT-4o, embeddings). Azure AI Foundry is the platform that sits on top — it provides the project workspace, the 'On Your Data' setup wizard, evaluation tools and deployment options. Think of OpenAI Service as the engine and Foundry as the dashboard.

How long does it take to move from pilot to production?

Typical timeline is 6-8 weeks for a mid-market organisation. The pilot takes half a day to build. Testing and refinement takes 2-3 weeks. Adding authentication, monitoring, feedback collection and a document pipeline for production adds another 2-4 weeks, depending on whether you do it in-house or engage a Microsoft partner.

About the Author

Thomas Burke

With a background in Film Studies, I bring a cinematic approach to corporate communications. I don't believe in simply pointing a camera; I believe in a full 360° support system. This means I work closely with marketing teams and IT leaders on: Pre-production strategy to clarify the message. Media training to ensure executives are comfortable and authoritative. End-to-end production that is cost-effective and seamless. My work is defined by absolute professionalism and high standards; a commitment that has led to successful projects for the world’s largest IT companies and the British Royal Family.