
Technical documentation

AI Platform Engineering on AWS.

How we design, secure and run GenAI services in production on AWS, from the cloud account to the agent platform.

For technical teams (architecture, IT, cloud, security). Choices are tailored to your context; the following describes our reference architecture.

AI as a product, on a reusable foundation.

The goal isn't to run a model, but to deliver a reliable GenAI service that is integrated with the information system (IS), secured and measured, and then to replicate it. That requires a cloud foundation and engineering practices, not a series of POCs.

Security & governance from the start

Isolated accounts, least privilege, encryption, traceability: compliance is a starting condition, not a patch.

Data where it lives

We connect existing sources (S3, databases, ERP/CRM/DMS) without needless copying, with EU data residency.

Everything as Infrastructure as Code

Landing zone, networks, models, agents: provisioned and versioned (Terraform / CDK), reproducible across environments.

Observability & FinOps

Every model call is traced, measured and budgeted; cost per use case and per token is tracked and kept within budget.

Human in the loop

Sensitive actions require explicit human validation; guardrails constrain what the model accepts and returns.

Reference architecture.

A useful GenAI use case relies on layers that work together, from cloud foundation to adoption. Security, observability and FinOps are cross-cutting.

06 Adoption & value
Business assistants · Usage metrics · KPIs · Scale-out
05 Agents & applications
API Gateway · Lambda · ECS / Fargate · Step Functions · EventBridge
04 GenAI core, Amazon Bedrock
Bedrock (Claude) · Knowledge Bases (RAG) · Guardrails · AgentCore · EU inference
03 Data & IS integration
S3 · OpenSearch Serverless · Aurora pgvector · DynamoDB · Glue · MCP connectors
02 Network & security
VPC · PrivateLink · KMS · Secrets Manager · GuardDuty · WAF
01 Landing Zone & accounts
AWS Organizations · Control Tower · SCPs · IAM Identity Center
Cross-cutting across all layers
Security · Compliance · Observability (CloudWatch · X-Ray) · Traceability (CloudTrail) · FinOps (Budgets · Cost Explorer)

Setting up the cloud environment (Landing Zone).

Before any GenAI use case, we lay a sound cloud foundation: isolated environments, centralised security and automatic guardrails. That's what makes the rest reproducible and auditable.

  • Multi-account by default

    Separating production, non-production, security and tooling limits blast radius and clarifies responsibilities.

  • Automatic guardrails (SCPs)

    Organization-level rules (service control policies) block dangerous configurations, for example the use of unapproved regions or the creation of unencrypted resources.

  • Centralised identity (SSO)

    Federated access via IAM Identity Center, temporary credentials and least privilege.

  • Centralised security & logs

    CloudTrail, Config, GuardDuty and Security Hub aggregated in a dedicated account.

  • Controlled network

    Private VPCs, PrivateLink to Bedrock and services, no needless Internet exposure.
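To make the SCP guardrail above concrete, here is a minimal sketch that builds a region-restriction policy document as JSON. The region list and the set of exempted global services are assumptions chosen for illustration; the condition key `aws:RequestedRegion` and the `Deny` / `NotAction` structure are the standard way such a policy is usually written, but the exact exemptions should be adapted to each organisation.

```python
import json

# Hypothetical allow-list: replace with the regions your organisation approves.
ALLOWED_REGIONS = ["eu-west-1", "eu-west-3", "eu-central-1"]

# Global services are exempted through NotAction because their API calls are
# not tied to one of the allowed regions. Illustrative, not exhaustive.
GLOBAL_SERVICE_ACTIONS = ["iam:*", "organizations:*", "sts:*", "support:*"]


def build_region_scp(allowed_regions):
    """Return an SCP document that denies requests outside the allowed regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                "NotAction": GLOBAL_SERVICE_ACTIONS,
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": allowed_regions}
                },
            }
        ],
    }


scp = build_region_scp(ALLOWED_REGIONS)
print(json.dumps(scp, indent=2))
```

The resulting JSON is what would be attached to an organizational unit, typically through the same Infrastructure as Code used for the rest of the landing zone.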

AWS Organizations

Management

  • Organizations
  • Billing
  • Control Tower

Security

  • Log Archive
  • Audit
  • GuardDuty · Security Hub

Infrastructure

  • Network (Transit Gateway)
  • Shared VPCs
  • Shared services

Workloads

  • GenAI, Dev
  • GenAI, Prod

Foundation deployed via AWS Control Tower + Infrastructure as Code, adaptable to an existing landing zone.

The AWS services we use.

Chosen according to need and your context, with no gratuitous complexity.

GenAI

  • Amazon Bedrock

    Managed models (incl. Claude), no servers to run.

  • Bedrock Knowledge Bases

    Managed RAG: ingestion, embeddings, retrieval.

  • Bedrock Guardrails

    Content/topic filtering, PII masking.

  • Bedrock AgentCore

    Agents: orchestration, memory, identity, tools.

Data & vectors

  • Amazon S3

    Storage for documents and source data.

  • OpenSearch Serverless

    Vector search for RAG.

  • Aurora PostgreSQL / pgvector

    Vectors and relational data.

  • DynamoDB · Glue

    Agent state/memory, data ingestion.

Integration & runtime

  • API Gateway · Lambda

    Private APIs and serverless functions.

  • ECS / Fargate

    Containers for long-running workloads.

  • Step Functions · EventBridge

    Orchestration and events.

  • MCP connectors

    Agent tools to ERP, CRM, DMS, APIs.

Security & identity

  • IAM · IAM Identity Center

    Least privilege, federated access (SSO).

  • KMS · Secrets Manager

    Encryption and secrets management.

  • PrivateLink

    Private access to Bedrock and services.

  • GuardDuty · Security Hub · WAF

    Detection, posture, app protection.

Governance & landing zone

  • AWS Organizations · Control Tower

    Multi-account and guardrails.

  • AWS Config

    Continuous resource compliance.

  • CloudTrail

    Audit trail of API activity across accounts.

  • Service Catalog

    Approved self-service templates.

Observability & FinOps

  • CloudWatch · X-Ray

    Metrics, logs, request traces.

  • Cost Explorer · Budgets

    Cost tracking and alerts.

  • Bedrock logging

    Model invocation tracing.

  • FinOps tags

    Cost per use, team and environment.
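Cost allocation per use, team and environment only works if resources actually carry the tags. As a minimal sketch, a check like the following could run in CI or against an inventory export. The tag keys and resource names are hypothetical.

```python
# Tag keys are hypothetical: align them with your own tagging policy.
REQUIRED_TAGS = ("use-case", "team", "environment")


def missing_tags(resource_tags):
    """Return the required tag keys that are absent or empty on a resource."""
    return [key for key in REQUIRED_TAGS if not resource_tags.get(key)]


# Hypothetical inventory: resource name -> tags.
resources = {
    "lambda/assistant-api": {
        "use-case": "hr-assistant",
        "team": "people-ops",
        "environment": "prod",
    },
    "ecs/mcp-crm": {"team": "sales-it"},
}

for name, tags in resources.items():
    gaps = missing_tags(tags)
    print(f"{name}: {'ok' if not gaps else 'missing ' + ', '.join(gaps)}")
```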

Agentic GenAI patterns.

The building blocks we assemble per use case, always governed.

RAG (retrieval-augmented)

Bedrock Knowledge Bases + vector store: answers grounded in your documents, with source citations.
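The retrieval step behind this pattern can be illustrated without any AWS dependency. The sketch below is a toy stand-in, not how Bedrock Knowledge Bases is implemented: the three-dimensional vectors, document names and prompt wording are invented for the example, and a real system would obtain embeddings from a model and search a vector store.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vector, chunks, top_k=2):
    """Return the top_k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, chunk["vector"]),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, passages):
    """Assemble a grounded prompt that asks for numbered source citations."""
    context = "\n".join(
        f"[{i}] {p['text']} (source: {p['source']})"
        for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer using only the passages below and cite them as [n].\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Toy corpus with made-up embeddings.
CHUNKS = [
    {"source": "hr-policy.pdf#p3", "text": "Employees get 25 days of annual leave.", "vector": [0.9, 0.1, 0.0]},
    {"source": "it-charter.pdf#p1", "text": "Laptops are encrypted by default.", "vector": [0.1, 0.9, 0.0]},
    {"source": "hr-policy.pdf#p7", "text": "Leave requests go through the HR portal.", "vector": [0.7, 0.2, 0.1]},
]

passages = retrieve([1.0, 0.0, 0.0], CHUNKS, top_k=2)
print(build_prompt("How many days of annual leave do employees get?", passages))
```

Keeping the source identifier next to each passage is what lets the final answer cite where a statement came from.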

Agents & tools (MCP)

An agent (AgentCore) calls tools (search, business APIs, actions) via MCP, with scoped permissions.
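Scoped permissions mean that each agent can only call the tools it has been granted. The following sketch shows the idea with a plain allow-list. It does not use an MCP SDK or AgentCore, and the agent names, tool names and tool bodies are hypothetical.

```python
class ToolNotAllowed(Exception):
    """Raised when an agent calls a tool outside its scope."""


def search_documents(query):
    return f"results for {query!r}"


def create_crm_ticket(title):
    return f"ticket created: {title}"


# Registry of available tools (hypothetical).
TOOLS = {
    "search_documents": search_documents,
    "create_crm_ticket": create_crm_ticket,
}

# Per-agent allow-list: an agent only sees the tools it needs.
AGENT_SCOPES = {
    "faq-assistant": {"search_documents"},
    "support-agent": {"search_documents", "create_crm_ticket"},
}


def call_tool(agent_id, tool_name, **arguments):
    """Run a tool on behalf of an agent, refusing anything outside its scope."""
    if tool_name not in AGENT_SCOPES.get(agent_id, set()):
        raise ToolNotAllowed(f"{agent_id} may not call {tool_name}")
    return TOOLS[tool_name](**arguments)


print(call_tool("support-agent", "create_crm_ticket", title="Invoice mismatch"))
try:
    call_tool("faq-assistant", "create_crm_ticket", title="Invoice mismatch")
except ToolNotAllowed as error:
    print(f"denied: {error}")
```

In a deployed platform the same rule would normally be enforced twice: in the tool gateway, and in the IAM role that the tool server runs under.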

Guardrails

Content and topic filtering, sensitive data (PII) masking on input and output.

Human in the loop

Risky actions (write, send, decide) require explicit, traced validation.
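A minimal sketch of such a gate, assuming the three risky action types named above and an in-memory audit log, could look like this. The class name, field names and the e-mail example are hypothetical; a production version would persist the log and route approvals through a real workflow.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Action types that must not run without a named human approver.
RISKY_ACTIONS = {"write", "send", "decide"}


@dataclass
class ApprovalGate:
    audit_log: list = field(default_factory=list)

    def run(self, action, payload, executor, approved_by=None):
        """Execute the action only if it is safe or explicitly approved."""
        needs_approval = action in RISKY_ACTIONS
        allowed = not needs_approval or approved_by is not None
        # Every attempt is traced, whether or not it is executed.
        self.audit_log.append(
            {
                "time": datetime.now(timezone.utc).isoformat(),
                "action": action,
                "approved_by": approved_by,
                "executed": allowed,
            }
        )
        if not allowed:
            return {"status": "pending_approval"}
        return {"status": "done", "result": executor(payload)}


def send_email(payload):
    return f"sent to {payload['to']}"


gate = ApprovalGate()
print(gate.run("send", {"to": "customer@example.com"}, send_email))
print(gate.run("send", {"to": "customer@example.com"}, send_email, approved_by="j.doe"))
```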

Evaluation & quality

Test sets, relevance measurement and regression checks before and after go-live.
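As an illustration of a regression check, the sketch below scores answers against a small test set and compares the result with a stored baseline. Keyword matching is a deliberately crude relevance proxy chosen for brevity, not the metric this approach prescribes. The questions, the stub answer function and the thresholds are all hypothetical.

```python
def keyword_score(answer, expected_keywords):
    """Fraction of expected keywords found in the answer (case-insensitive)."""
    answer_lower = answer.lower()
    hits = sum(1 for keyword in expected_keywords if keyword.lower() in answer_lower)
    return hits / len(expected_keywords)


def evaluate(answer_fn, test_set):
    """Mean keyword score of answer_fn over the whole test set."""
    scores = [
        keyword_score(answer_fn(case["question"]), case["expected_keywords"])
        for case in test_set
    ]
    return sum(scores) / len(scores)


def passes_regression(current, baseline, tolerance=0.05):
    """True if the current score has not dropped more than tolerance below baseline."""
    return current >= baseline - tolerance


TEST_SET = [
    {"question": "How many days of annual leave?", "expected_keywords": ["25", "days"]},
    {"question": "Who approves expenses?", "expected_keywords": ["manager", "finance"]},
]

# Stub standing in for the deployed assistant.
CANNED_ANSWERS = {
    "How many days of annual leave?": "Employees get 25 days of annual leave.",
    "Who approves expenses?": "Your manager approves expenses.",
}


def stub_assistant(question):
    return CANNED_ANSWERS[question]


score = evaluate(stub_assistant, TEST_SET)
print(f"score={score:.2f} pass={passes_regression(score, baseline=0.70)}")
```

The same check can run before go-live as a pipeline gate and afterwards on a schedule, so that a prompt or model change that lowers quality is caught early.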

Aligned with the AWS Well-Architected Framework.

Our choices follow the six pillars of the Well-Architected Framework and its Generative AI Lens. Applied to a GenAI use case:

Operational excellence

Automated deployments (IaC, CI/CD), runbooks, end-to-end observability.

Security

Least privilege, encryption, network isolation, traceability, guardrails.

Reliability

Managed quotas and limits, error recovery, graceful degradation of model calls.
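One way to read "error recovery and graceful degradation" is: retry throttled calls with exponential backoff, fall back to another model, and return a clear message if everything fails. The sketch below assumes a caller-supplied `invoke` function and a generic `ThrottledError`. Both are placeholders rather than a real SDK API, and the model identifiers are invented.

```python
import time


class ThrottledError(Exception):
    """Stand-in for a throttling or quota error returned by the model API."""


FALLBACK_MESSAGE = "The assistant is temporarily unavailable. Please try again shortly."


def call_with_fallback(invoke, model_ids, prompt, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Try each model in order, retrying throttled calls with exponential backoff."""
    for model_id in model_ids:
        for attempt in range(max_attempts):
            try:
                text = invoke(model_id, prompt)
                # "degraded" flags answers that did not come from the first choice.
                return {"model": model_id, "text": text, "degraded": model_id != model_ids[0]}
            except ThrottledError:
                if attempt < max_attempts - 1:
                    sleep(base_delay * 2 ** attempt)
    # Every model failed: degrade to a static message instead of raising.
    return {"model": None, "text": FALLBACK_MESSAGE, "degraded": True}


def fake_invoke(model_id, prompt):
    """Simulated backend: the primary model is always throttled."""
    if model_id == "primary-model":
        raise ThrottledError("quota exceeded")
    return f"[{model_id}] answer to: {prompt}"


result = call_with_fallback(
    fake_invoke,
    ["primary-model", "smaller-model"],
    "Summarise this ticket",
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(result)
```

Injecting `sleep` keeps the function testable; in production the default `time.sleep` applies the real delays.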

Performance efficiency

Model chosen per need, caching, targeted RAG, serverless sizing.

Cost optimization

Cost per token tracked, right-sized models, budgets and alerts (FinOps).

Sustainability

Suitable regions and models, serverless resources, no over-provisioning.

From the Generative AI Lens, we apply response evaluation, guardrails, human oversight and inference cost control.

Industrialisation: Infrastructure as Code & CI/CD.

Landing zone, networks, guardrails, MCP servers and agents: everything is versioned, tested and deployed automatically. Terraform (or AWS CDK) for infrastructure, a dedicated CI/CD pipeline for MCP tools and agents.

Terraform, illustrative excerpt
# Bedrock guardrail: mask PII (content and topic filters are added as further policy blocks)
resource "aws_bedrock_guardrail" "assistant" {
  name                      = "assistant-guardrail"
  blocked_input_messaging   = "Request blocked."
  blocked_outputs_messaging = "Response filtered."

  sensitive_information_policy_config {
    # Replace detected e-mail addresses with a placeholder
    pii_entities_config {
      type   = "EMAIL"
      action = "ANONYMIZE"
    }
  }
}

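# Least-privilege execution role for the agent / MCP server
# (assumed by ECS tasks, since it is used as the Fargate task role below)
data "aws_iam_policy_document" "assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "agent" {
  name               = "genai-agent-exec"
  assume_role_policy = data.aws_iam_policy_document.assume.json
}

resource "aws_iam_role_policy" "agent" {
  role = aws_iam_role.agent.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      { Effect = "Allow", Action = ["bedrock:InvokeModel"], Resource = [var.model_arn] },
      # bedrock:Retrieve targets the knowledge base ARN, not the vector collection
      # (var.knowledge_base_arn is an assumed input variable)
      { Effect = "Allow", Action = ["bedrock:Retrieve"], Resource = [var.knowledge_base_arn] }
    ]
  })
}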

# MCP server container on ECS Fargate, behind a private API
module "mcp_server" {
  source        = "./modules/fargate-service"
  name          = "mcp-crm"
  image         = "${aws_ecr_repository.mcp.repository_url}:${var.git_sha}"
  task_role_arn = aws_iam_role.agent.arn
  private       = true
}

CI/CD pipeline, MCP servers & agents

  1. Commit

    Agent / MCP server code, prompts, Terraform config.

  2. Lint & tests

    Unit tests, MCP tool schema validation, IaC scan.

  3. Build image

    MCP / agent container pushed to ECR, vulnerability scan.

  4. Evaluation

    Agent test sets: relevance, guardrails, regression.

  5. Deploy

    Terraform apply, ECS/Fargate or AgentCore, dev → prod.

  6. Run & observability

    Traces, costs, alerts; rollback on regression.

Promotion from dev to prod requires approval; secrets are injected via Secrets Manager and never stored in plaintext. Tools: GitLab CI / GitHub Actions / CodePipeline.
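The "MCP tool schema validation" step in the pipeline above can be sketched as a small structural check run during linting. MCP tool definitions carry a `name`, a `description` and an `inputSchema` (a JSON Schema object). Requiring a non-empty description is a team convention assumed here rather than a protocol rule, and the example tool is hypothetical.

```python
def validate_tool_definition(tool):
    """Return a list of problems found in an MCP tool definition (empty if none)."""
    errors = []

    name = tool.get("name")
    if not isinstance(name, str) or not name:
        errors.append("name must be a non-empty string")

    # Convention assumed for this sketch: the agent needs a description to pick tools.
    description = tool.get("description")
    if not isinstance(description, str) or not description.strip():
        errors.append("description must be a non-empty string")

    schema = tool.get("inputSchema")
    if not isinstance(schema, dict):
        errors.append("inputSchema must be an object")
        return errors

    if schema.get("type") != "object":
        errors.append("inputSchema.type must be 'object'")

    properties = schema.get("properties", {})
    for required_name in schema.get("required", []):
        if required_name not in properties:
            errors.append(f"required field '{required_name}' is not declared in properties")

    return errors


# Hypothetical tool exposed by the CRM MCP server.
crm_lookup = {
    "name": "crm_lookup_customer",
    "description": "Look up a customer record by its CRM identifier.",
    "inputSchema": {
        "type": "object",
        "properties": {"customer_id": {"type": "string"}},
        "required": ["customer_id"],
    },
}

problems = validate_tool_definition(crm_lookup)
print("valid" if not problems else problems)
```

A failing check stops the pipeline before the image is built, which is cheaper than discovering a malformed tool at agent runtime.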

Security, compliance & control.

Data residency (EU)

Inference and storage in Europe; your data isn't used to train the models.

End-to-end encryption

Encryption at rest with KMS-managed keys, TLS in transit everywhere, secrets in Secrets Manager.

Least privilege

Dedicated IAM roles per use case, temporary access, separated environments.

Full traceability

CloudTrail, Bedrock invocation logging, source and action tracing.

Private network

Access to Bedrock and data via PrivateLink, with no traffic over the public Internet.

Costs under control (FinOps)

Budgets, cost per use and per token, alerts on cost overruns.
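Cost per use and per token comes down to simple arithmetic over invocation records: tokens divided by 1,000, multiplied by the unit price, summed per use case and compared with a budget. The prices, model names and budgets in this sketch are invented placeholders, not actual Bedrock pricing.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1,000 tokens (NOT real pricing).
PRICES_PER_1K_TOKENS = {
    "model-a": {"input": 0.003, "output": 0.015},
    "model-b": {"input": 0.0008, "output": 0.004},
}


def invocation_cost(model_id, input_tokens, output_tokens):
    """Cost of one model call from its token counts."""
    price = PRICES_PER_1K_TOKENS[model_id]
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]


def cost_by_use_case(invocations):
    """Sum invocation costs per use case."""
    totals = defaultdict(float)
    for call in invocations:
        totals[call["use_case"]] += invocation_cost(
            call["model"], call["input_tokens"], call["output_tokens"]
        )
    return dict(totals)


def over_budget(totals, budgets):
    """Names of use cases whose cost exceeds their budget (missing budget = 0)."""
    return sorted(name for name, cost in totals.items() if cost > budgets.get(name, 0.0))


# Hypothetical records, as they might be derived from invocation logs.
INVOCATIONS = [
    {"use_case": "hr-assistant", "model": "model-a", "input_tokens": 2000, "output_tokens": 500},
    {"use_case": "hr-assistant", "model": "model-a", "input_tokens": 1000, "output_tokens": 1000},
    {"use_case": "ticket-triage", "model": "model-b", "input_tokens": 5000, "output_tokens": 200},
]
BUDGETS = {"hr-assistant": 0.03, "ticket-triage": 0.01}

totals = cost_by_use_case(INVOCATIONS)
for name, cost in sorted(totals.items()):
    print(f"{name}: ${cost:.4f}")
print("over budget:", over_budget(totals, BUDGETS))
```

In this toy data set the first use case ends up slightly above its budget, which is the kind of signal a budget alert would surface.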

From the AWS account to production.

01

Foundation

Landing zone, accounts, security and network (Control Tower + IaC).

02

Data connection

Information system sources, ingestion, vector store, access rights.

03

Service build

RAG and/or agent, MCP tools, guardrails, API.

04

Industrialisation

CI/CD, IaC, dev → prod environments, tests and evaluation.

05

Run

Observability, FinOps, continuous improvement, scale-out.

Audit your foundation or scope a GenAI architecture on AWS?

Let's discuss your existing environment, your security constraints and the simplest path to a production use case.

Talk to an architect