Bottom line: You can build a working RAG chatbot on Azure UK South in a single day using Azure AI Foundry's guided setup. The three services you need are Azure AI Search (retrieval), Azure OpenAI Service (language model) and Azure AI Foundry (orchestration). A Basic-tier pilot costs roughly £200-£270 per month at 500 queries a day, and less at lighter pilot volumes. All data stays in the UK South region. No Python, no Jupyter notebooks, no data science team required for the initial build — though you will want an engineer involved when you move to production.
## What RAG Actually Is and Why It Matters for UK Data Residency
Retrieval-augmented generation is a pattern, not a product. It works in two steps. First, a search index finds the chunks of your documents that are relevant to the user's question. Then, a large language model reads those chunks and generates an answer grounded in your actual data rather than its general training.
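The two-step pattern can be sketched in a few lines of Python. The keyword scorer below is a toy stand-in for Azure AI Search's ranking, and the assembled prompt is what a GPT-4o deployment would receive; the document texts are invented for illustration.

```python
# Minimal sketch of the RAG pattern: retrieve relevant chunks, then
# ground the model's answer in them. The retriever is a toy keyword
# scorer standing in for Azure AI Search; in production the assembled
# prompt would be sent to your GPT-4o deployment.

DOCUMENTS = [
    "Annual leave entitlement is 25 days plus bank holidays.",
    "Expense claims must be submitted within 30 days of purchase.",
    "The office VPN requires multi-factor authentication.",
]

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the question (toy scoring)."""
    q_words = set(question.lower().split())
    scored = sorted(
        DOCUMENTS,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble the grounded prompt the language model would receive."""
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        "Answer using only these documents:\n"
        f"{context}\n\nQuestion: {question}"
    )

chunks = retrieve("How many days of annual leave do I get?")
print(build_prompt("How many days of annual leave do I get?", chunks))
```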
The reason this matters for UK organisations is data residency. When you deploy all three components — search index, language model and orchestration — in Azure UK South (London), your documents never leave the UK. The language model does not train on your data. Azure OpenAI Service in UK South currently offers GPT-4o, GPT-4o mini and text-embedding-3-large, which is everything you need for a production RAG pipeline.
This is not a theoretical architecture. Microsoft's own Foundry platform now offers a guided browser-based experience called 'On Your Data' that handles document ingestion, chunking, vector embedding and prompt configuration without writing a line of code. The floor for getting started has dropped from 'hire a machine learning engineer' to 'have someone comfortable with the Azure Portal'.
## The Three Services You Need
Every Azure RAG chatbot uses the same three building blocks.
Azure AI Search is the retrieval layer. It stores your documents as searchable chunks — both as traditional keyword indexes and as vector embeddings. When a user asks a question, Azure AI Search finds the top 5-10 relevant chunks. The Basic tier supports up to 2 GB of index storage and 3 replicas, which handles roughly 15,000 to 25,000 pages of documents comfortably.
Azure OpenAI Service provides the language model. GPT-4o is the current best option for answer quality. It reads the retrieved chunks, reasons across them, and generates a grounded answer with citations. The embedding model — text-embedding-3-large — converts your documents and queries into vectors for semantic search. Both are available in UK South on pay-as-you-go pricing.
Azure AI Foundry (previously called Azure AI Studio) is the orchestration layer. It connects the search index to the language model, manages prompt templates, handles conversation history and provides evaluation tools. Since November 2025, Foundry also offers Foundry IQ, a managed knowledge-base feature that handles the full RAG pipeline with minimal configuration.
| Service | Role | UK South Available | Starting Tier |
| --- | --- | --- | --- |
| Azure AI Search | Retrieval + vector index | Yes | Basic (~£60/month) |
| Azure OpenAI Service | GPT-4o + embeddings | Yes | Pay-as-you-go |
| Azure AI Foundry | Orchestration + evaluation | Yes | Free (portal-based) |
| Azure App Service | Front-end hosting (optional) | Yes | Basic (~£10/month) |
## Architecture Options from Pilot to Production
There are three tiers worth considering, depending on your document volume and user count.
**Tier 1 — Quick Pilot (1-3 users, under 5,000 pages).** Use Azure AI Foundry's 'On Your Data' guided experience. Upload your PDFs, Word documents or text files directly. Foundry handles chunking, embedding and index creation. Deploy a test chat interface from the Foundry portal. Total build time: 2-4 hours. No code required.

**Tier 2 — Department Rollout (10-50 users, 5,000-50,000 pages).** Upgrade Azure AI Search from Basic to Standard S1 for higher query throughput and larger indexes. Deploy a proper front-end using Azure App Service or integrate with Microsoft Teams via a Power Virtual Agents connector. Add authentication through Microsoft Entra ID. An engineer spends 2-3 days on setup and testing.

**Tier 3 — Production (50+ users, 50,000+ pages, SLA required).** Move to Standard S2 search with multiple replicas for high availability. Add a content pipeline that automatically re-indexes when source documents change. Build custom prompt engineering for domain-specific accuracy. Add logging, monitoring and feedback loops through Azure Application Insights. This is where you either upskill an internal engineer or bring in a partner for a 2-4 week engagement.
| Tier | Users | Documents | Azure AI Search | Monthly Cost (Est.) | Build Time |
| --- | --- | --- | --- | --- | --- |
| Pilot | 1-3 | Under 5,000 pages | Basic | £200-270 | Half a day |
| Department | 10-50 | 5,000-50,000 pages | Standard S1 | £750-1,050 | 2-3 days |
| Production | 50+ | 50,000+ pages | Standard S2+ | £3,300-4,700 | 2-4 weeks |
## Step-by-Step Build in Azure AI Foundry
This walkthrough covers the Tier 1 pilot build. You need an Azure subscription with the OpenAI service enabled and access to the UK South region.
Step 1 — Create the Azure AI Search resource. In the Azure Portal, search for 'AI Search' and create a new resource. Select UK South as the region. Choose the Basic pricing tier. The deployment takes 2-3 minutes.
Step 2 — Create the Azure OpenAI Service resource. Search for 'Azure OpenAI' in the Portal and create a new resource in UK South. Once deployed, go to Model deployments and deploy two models: GPT-4o (for chat completion) and text-embedding-3-large (for vector search). Name them something clear — 'gpt-4o-chat' and 'embedding-3-large' work fine.
Step 3 — Open Azure AI Foundry. Go to ai.azure.com and create a new project linked to your Azure OpenAI resource. Select UK South as the region.
Step 4 — Add your data. In the Foundry project, select 'Chat' from the left navigation, then click 'Add your data'. Choose Azure AI Search as the data source. Connect it to the search resource you created in Step 1. Upload your documents — PDF, DOCX, TXT, PPTX and HTML are all supported. Foundry will chunk the documents, generate embeddings using your text-embedding-3-large deployment, and create the search index automatically.
Step 5 — Configure the system prompt. The system prompt tells the model how to behave. For a company knowledge base, something like: 'You are a helpful assistant that answers questions using only the provided company documents. If the answer is not in the documents, say so. Always cite the source document.' Keep it direct and specific.
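As a configuration sketch, this is how that prompt would sit in the message list a chat-completions call sends; the user turn here is illustrative.

```python
# Sketch of the message structure a chat-completions request carries.
# The system prompt constrains the model to the retrieved documents;
# the user message is an invented example.
system_prompt = (
    "You are a helpful assistant that answers questions using only the "
    "provided company documents. If the answer is not in the documents, "
    "say so. Always cite the source document."
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "What is our annual leave policy?"},
]
```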
Step 6 — Test in the Foundry playground. Ask questions against your documents. Check that answers cite the right sources. Test edge cases — questions where the answer spans multiple documents, questions where the answer is not in the data, and questions that are ambiguous.
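That manual testing can also be scripted. The sketch below runs known question-answer pairs through a stubbed `ask()` function and reports a pass rate; in practice `ask()` would call your deployed endpoint, and the questions and canned replies here are invented.

```python
# Minimal scripted check for the playground testing step: run questions
# with known answers through the bot and measure the pass rate. The
# ask() stub stands in for a real call to the deployed chat endpoint.

TEST_CASES = [
    ("How many days of annual leave?", "25 days"),
    ("What is the expense deadline?", "30 days"),
]

def ask(question: str) -> str:
    """Stub for the chatbot call; replace with a real endpoint request."""
    canned = {
        "How many days of annual leave?": "You get 25 days plus bank holidays.",
        "What is the expense deadline?": "Claims are due within 30 days.",
    }
    return canned.get(question, "I could not find the answer.")

def run_tests() -> float:
    """Fraction of test cases whose expected answer appears in the reply."""
    passed = sum(exp.lower() in ask(q).lower() for q, exp in TEST_CASES)
    return passed / len(TEST_CASES)

print(f"Accuracy: {run_tests():.0%}")
```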
Step 7 — Deploy. Click 'Deploy to' and choose either a web app (Azure App Service) or an API endpoint. The web app option gives you a working chat interface with authentication in under 10 minutes. The API option lets your developers integrate the chatbot into existing applications.
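For the API option, a request body along chat-completions lines is typically what the endpoint expects. The field values below are illustrative and the exact schema depends on your deployment, so check the endpoint's reference before integrating.

```python
# Sketch of a chat-turn request body in the chat-completions shape.
# Values are illustrative; verify the schema your deployment exposes.
import json

payload = {
    "messages": [
        {"role": "user", "content": "What is the expenses policy?"}
    ],
    "temperature": 0.2,   # low temperature keeps answers close to the sources
    "max_tokens": 400,    # cap the answer length
}
print(json.dumps(payload, indent=2))
```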
## Realistic Pricing for UK Organisations
Azure pricing is consumption-based, which makes forecasting awkward until you have real usage data. Here are realistic monthly estimates based on mid-market organisations running RAG chatbots in production.
GPT-4o on pay-as-you-go costs $2.50 per million input tokens and $10 per million output tokens. A typical RAG query sends around 3,000-5,000 tokens of context plus the user question (input) and receives 300-500 tokens back (output). At 500 queries per day, that works out to roughly £130-£200 per month for the language model alone.
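A worked example of that arithmetic, assuming a 30-day month, a mid-range 4,000-token context, 400-token answers and an exchange rate of about $1.27 to the pound:

```python
# Monthly GPT-4o cost at the stated pay-as-you-go rates.
# Assumptions: 30-day month, 4,000 input / 400 output tokens per query,
# exchange rate of $1.27 per pound.
queries_per_day = 500
input_tokens, output_tokens = 4_000, 400      # per query
input_price, output_price = 2.50, 10.00       # $ per million tokens

monthly_queries = queries_per_day * 30
input_cost = monthly_queries * input_tokens / 1e6 * input_price
output_cost = monthly_queries * output_tokens / 1e6 * output_price
total_gbp = (input_cost + output_cost) / 1.27

print(f"Input: ${input_cost:.0f}, output: ${output_cost:.0f}, "
      f"total: about £{total_gbp:.0f}/month")
```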
Text-embedding-3-large costs $0.13 per million tokens. Initial indexing of 10,000 pages costs roughly £2-3. Re-indexing is incremental and negligible.
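The indexing estimate works the same way, assuming roughly 2,000 tokens per page and the same exchange rate:

```python
# One-off embedding cost for initial indexing at the stated rate.
# Assumptions: 2,000 tokens per page, $1.27 per pound.
pages, tokens_per_page = 10_000, 2_000
cost_usd = pages * tokens_per_page / 1e6 * 0.13   # $0.13 per million tokens
print(f"about £{cost_usd / 1.27:.2f}")
```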
Azure AI Search is the fixed-cost anchor. Basic is approximately £60 per month. Standard S1 jumps to roughly £196 per month but adds higher throughput, more storage and replica support.
| Cost Component | Pilot (500 queries/day) | Department (2,000 queries/day) | Production (10,000 queries/day) |
| --- | --- | --- | --- |
| Azure AI Search | £60 (Basic) | £196 (S1) | £590+ (S2) |
| GPT-4o (pay-as-you-go) | £130-200 | £520-800 | £2,600-4,000 |
| Embeddings | £3 (one-off) | £5 | £10 |
| App Service / front-end | £10 | £40 | £100 |
| Monthly total | £203-273 | £761-1,041 | £3,300-4,700 |
Provisioned Throughput Units (PTUs) make sense at the production tier. PTU pricing starts at roughly $2,448 per month for 50 PTUs of GPT-4o, which buys guaranteed latency and, at production query volumes, can also undercut pay-as-you-go. Below that scale it is overkill.
## What Can Go Wrong and How to Fix It
Poor answer quality from bad chunking. The default chunking strategy in Foundry splits documents into 1,024-token chunks with 128-token overlaps. This works for general documents but can split tables, numbered lists and multi-part policies across chunks, losing context. Fix: test with smaller chunk sizes (512 tokens) for structured documents, or use semantic chunking if your documents have clear section headers.
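The fixed-size-with-overlap strategy can be sketched as a sliding window. The code below uses word counts as a stand-in for model tokens, and the 512/64 split is illustrative of the smaller sizes suggested for structured documents:

```python
# Fixed-size chunking with overlap: each window shares its tail with
# the next window's head, so context survives chunk boundaries.
# Word counts approximate tokens here; Foundry counts model tokens.

def chunk(words: list[str], size: int = 512, overlap: int = 64) -> list[list[str]]:
    """Split a document into overlapping windows of `size` words."""
    step = size - overlap
    return [words[i:i + size] for i in range(0, len(words), step)]

doc = [f"word{i}" for i in range(1000)]
chunks = chunk(doc)
print(len(chunks), "chunks; last starts at", chunks[-1][0])
```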
The chatbot 'hallucinates' despite RAG. RAG reduces hallucination but does not eliminate it. If the retrieved chunks are marginally relevant, the model may still generate plausible-sounding but wrong answers. Fix: tune the search relevance threshold — Foundry lets you set a minimum search score below which chunks are discarded. Also add explicit instructions in the system prompt: 'If the provided documents do not contain enough information to answer the question, say: I could not find the answer in the available documents.'
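The score gate amounts to a simple filter before the chunks reach the model; the scores and the 0.02 cut-off below are illustrative, not Foundry defaults:

```python
# Minimum-relevance gate: discard retrieved chunks whose search score
# falls below a threshold, and signal "nothing found" when no chunk
# survives. Scores and the 0.02 cut-off are illustrative values.

def filter_chunks(results: list[tuple[str, float]], min_score: float = 0.02):
    """Keep only chunks at or above the relevance threshold."""
    kept = [(text, score) for text, score in results if score >= min_score]
    if not kept:
        return None  # tell the model to answer "I could not find..."
    return kept

results = [("Leave policy...", 0.034), ("Canteen menu...", 0.011)]
print(filter_chunks(results))
```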
Slow response times. The main bottleneck is usually Azure AI Search query latency at scale. Basic tier has no SLA and can be slow under concurrent load. Fix: upgrade to Standard with replicas. Each replica adds query capacity. Two replicas give you read high availability.
Stale answers from outdated documents. If your source documents change regularly — policy updates, product specifications, pricing — the search index needs refreshing. Foundry IQ handles this automatically for managed knowledge bases. For custom builds, set up an Azure Function on a timer to re-index changed documents.
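For a custom build, the timer job's core check can be as simple as comparing file modification times against the last indexing run. The sketch below uses a temporary folder and a PDF glob purely for demonstration:

```python
# Incremental re-indexing check: find documents modified since the
# previous indexing pass. A scheduled job (e.g. an Azure Function on a
# timer) would call this and re-index only what it returns.
import tempfile
import time
from pathlib import Path

def changed_since(folder: Path, last_run: float) -> list[Path]:
    """Return documents modified after the previous indexing pass."""
    return [p for p in folder.rglob("*.pdf") if p.stat().st_mtime > last_run]

# Demo with a throwaway folder: one new file appears, and nothing is
# newer than a timestamp in the future.
folder = Path(tempfile.mkdtemp())
(folder / "policy.pdf").touch()
print(len(changed_since(folder, 0)))                  # the new file
print(len(changed_since(folder, time.time() + 60)))   # nothing newer
```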
Users asking questions outside the document scope. Without guardrails, the model may answer general knowledge questions using its training data rather than your documents. Fix: constrain the system prompt strictly and consider adding a classifier that detects out-of-scope queries before they reach the model.
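A toy version of such a classifier checks the question against a hand-picked term list; a real build might compare embedding similarity against the index instead. The domain terms below are invented:

```python
# Toy out-of-scope pre-filter: flag questions with no overlap with the
# document domain before they reach the model. The term list is an
# illustrative stand-in for a learned or embedding-based classifier.

DOMAIN_TERMS = {"leave", "expenses", "policy", "vpn", "onboarding", "payroll"}

def in_scope(question: str) -> bool:
    """True if the question shares at least one word with the domain."""
    return bool(DOMAIN_TERMS & set(question.lower().split()))

print(in_scope("What is the leave policy?"))   # True
print(in_scope("Who won the World Cup?"))      # False
```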
## When to Bring in Specialists
The Tier 1 pilot genuinely does not need a data science team. Any engineer or technically confident IT administrator who can work the Azure Portal can get it running in half a day.
Bring in specialist help when:
- You need to connect the chatbot to live data sources (APIs, databases, SharePoint libraries that update daily)
- Your documents require custom parsing — scanned PDFs, handwritten notes, complex table extraction
- You need to fine-tune response quality for domain-specific accuracy (legal, medical, financial)
- You are processing personal data through the chatbot and need a Data Protection Impact Assessment
- You want to add multi-turn conversation memory beyond the default context window
- You need to meet a specific SLA for uptime and response latency
For a mid-market organisation, the typical engagement with a Microsoft partner runs 2-4 weeks and costs £15,000-£40,000 for a production-grade RAG deployment including authentication, monitoring, feedback loops and document pipeline automation.
## Pre-Build Checklist
1. Confirm your Azure subscription has Azure OpenAI Service access approved — this still requires an application for new subscriptions
2. Check that GPT-4o and text-embedding-3-large are available in UK South for your subscription (model availability varies by subscription type)
3. Decide which documents go into the index first — start with a focused collection of 50-100 documents, not the entire SharePoint
4. Confirm who owns the Azure subscription and has Contributor access to create resources
5. Check your organisation's data classification policy — are all the target documents approved for cloud processing in Azure UK South?
6. Decide on authentication — Microsoft Entra ID for internal users is the obvious choice
7. Set a test plan: prepare 20-30 questions with known correct answers from your documents before building
8. Budget for three months of pilot running costs before deciding whether to scale
9. Identify who will maintain the index when source documents change
10. If processing personal data, involve your Data Protection Officer before uploading anything
11. Agree on success criteria: what answer accuracy rate justifies moving from pilot to department rollout?
12. Document the architecture for your IT team — even a one-page diagram of which services connect to what

