AWS Bedrock

AWS Bedrock foundation models for generative AI. Use when invoking foundation models, building AI applications, creating embeddings, configuring model access, or implementing RAG patterns.

Details

| Property | Value |
| --- | --- |
| Skill Directory | `.github/skills/aws-bedrock/` |
| Phase | General |
| User Invocable | ✅ Yes |
| Usage | `/aws-bedrock` followed by a model name, capability, or pattern to look up (e.g. 'Claude text generation', 'Titan embeddings', 'streaming response', 'RAG with knowledge base') |

Documentation

AWS Bedrock

Amazon Bedrock provides access to foundation models (FMs) from AI companies through a unified API. Build generative AI applications with text generation, embeddings, and image generation capabilities.

Core Concepts

Foundation Models

Pre-trained models available through Bedrock:

  • Claude (Anthropic): Text generation, analysis, coding
  • Titan (Amazon): Text, embeddings, image generation
  • Llama (Meta): Open-weight text generation
  • Mistral: Efficient text generation
  • Stable Diffusion (Stability AI): Image generation

Model Access

Models must be enabled in your account before use:

  • Request access in Bedrock console
  • Some models require acceptance of EULAs
  • Access is region-specific
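For scripting, the `modelSummaries` list returned by `list-foundation-models` can be filtered with a small pure helper (a sketch, not part of boto3). Note that the listing reflects what is available in the region, not whether your account has been granted access:

```python
def models_for_provider(summaries, provider):
    """Filter a Bedrock modelSummaries list down to one provider's model IDs.

    `summaries` is the list returned by
    boto3.client('bedrock').list_foundation_models()['modelSummaries'].
    """
    return [
        m['modelId']
        for m in summaries
        if m.get('providerName', '').lower() == provider.lower()
    ]
```

For example, `models_for_provider(summaries, 'Anthropic')` returns the Anthropic model IDs available in the queried region.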

Inference Types

| Type | Use Case | Pricing |
| --- | --- | --- |
| On-Demand | Variable workloads | Per token |
| Provisioned Throughput | Consistent high-volume | Hourly commitment |
| Batch Inference | Async large-scale | Discounted per token |

Common Patterns

Invoke Claude (Text Generation)

AWS CLI:

```bash
# AWS CLI v2 requires raw-in-base64-out to accept a raw JSON body
aws bedrock-runtime invoke-model \
  --model-id anthropic.claude-3-sonnet-20240229-v1:0 \
  --content-type application/json \
  --accept application/json \
  --cli-binary-format raw-in-base64-out \
  --body '{
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Explain AWS Lambda in 3 sentences."}
    ]
  }' \
  response.json

cat response.json | python3 -c "import sys,json; print(json.load(sys.stdin)['content'][0]['text'])"
```

boto3:

```python
import boto3
import json

bedrock = boto3.client('bedrock-runtime')

def invoke_claude(prompt, max_tokens=1024):
    response = bedrock.invoke_model(
        modelId='anthropic.claude-3-sonnet-20240229-v1:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps({
            'anthropic_version': 'bedrock-2023-05-31',
            'max_tokens': max_tokens,
            'messages': [
                {'role': 'user', 'content': prompt}
            ]
        })
    )

    result = json.loads(response['body'].read())
    return result['content'][0]['text']

# Usage
response = invoke_claude('What is Amazon S3?')
print(response)
```

Streaming Response

```python
import boto3
import json

bedrock = boto3.client('bedrock-runtime')

def stream_claude(prompt):
    response = bedrock.invoke_model_with_response_stream(
        modelId='anthropic.claude-3-sonnet-20240229-v1:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps({
            'anthropic_version': 'bedrock-2023-05-31',
            'max_tokens': 1024,
            'messages': [
                {'role': 'user', 'content': prompt}
            ]
        })
    )

    # Each event carries a JSON chunk; text arrives in content_block_delta events
    for event in response['body']:
        chunk = json.loads(event['chunk']['bytes'])
        if chunk['type'] == 'content_block_delta':
            yield chunk['delta'].get('text', '')

# Usage
for text in stream_claude('Write a haiku about cloud computing.'):
    print(text, end='', flush=True)
```

Generate Embeddings

```python
import boto3
import json

bedrock = boto3.client('bedrock-runtime')

def get_embedding(text):
    response = bedrock.invoke_model(
        modelId='amazon.titan-embed-text-v2:0',
        contentType='application/json',
        accept='application/json',
        body=json.dumps({
            'inputText': text,
            'dimensions': 1024,
            'normalize': True
        })
    )

    result = json.loads(response['body'].read())
    return result['embedding']

# Usage
embedding = get_embedding('AWS Lambda is a serverless compute service.')
print(f'Embedding dimension: {len(embedding)}')
```
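Embeddings are typically compared with cosine similarity, the basis of RAG retrieval. A minimal dependency-free sketch; since the Titan request above sets `normalize: True`, the vectors are unit-length and the dot product alone would give the same result:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```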

Multi-Turn Conversation

```python
import boto3
import json

bedrock = boto3.client('bedrock-runtime')

class Conversation:
    def __init__(self, system_prompt=None):
        self.messages = []
        self.system = system_prompt

    def chat(self, user_message):
        self.messages.append({
            'role': 'user',
            'content': user_message
        })

        # The full history is resent on every turn, so token usage
        # grows with conversation length
        body = {
            'anthropic_version': 'bedrock-2023-05-31',
            'max_tokens': 1024,
            'messages': self.messages
        }

        if self.system:
            body['system'] = self.system

        response = bedrock.invoke_model(
            modelId='anthropic.claude-3-sonnet-20240229-v1:0',
            contentType='application/json',
            accept='application/json',
            body=json.dumps(body)
        )

        result = json.loads(response['body'].read())
        assistant_message = result['content'][0]['text']

        self.messages.append({
            'role': 'assistant',
            'content': assistant_message
        })

        return assistant_message

# Usage
conv = Conversation(system_prompt='You are an AWS solutions architect.')
print(conv.chat('What database should I use for a chat application?'))
print(conv.chat('What about for time-series data?'))
```

List and Check Available Models

```bash
# List all foundation models
aws bedrock list-foundation-models \
  --query 'modelSummaries[*].[modelId,modelName,providerName]' \
  --output table

# Filter by provider
aws bedrock list-foundation-models \
  --by-provider anthropic \
  --query 'modelSummaries[*].modelId'
```

CLI Reference

Bedrock (Control Plane)

| Command | Description |
| --- | --- |
| `aws bedrock list-foundation-models` | List available models |
| `aws bedrock get-foundation-model` | Get model details |
| `aws bedrock list-custom-models` | List fine-tuned models |
| `aws bedrock create-model-customization-job` | Start fine-tuning |

Bedrock Runtime (Data Plane)

| Command | Description |
| --- | --- |
| `aws bedrock-runtime invoke-model` | Invoke model synchronously |
| `aws bedrock-runtime invoke-model-with-response-stream` | Invoke with streaming |
| `aws bedrock-runtime converse` | Multi-turn conversation API |
| `aws bedrock-runtime converse-stream` | Streaming conversation |
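The converse commands map to the boto3 Converse API, which uses one request shape across model providers. A minimal sketch (the model ID and `maxTokens` value are illustrative; a live call requires AWS credentials and model access):

```python
def user_turn(text):
    """Build a Converse-API message for a single user turn."""
    return {'role': 'user', 'content': [{'text': text}]}

def converse_once(prompt, model_id='anthropic.claude-3-sonnet-20240229-v1:0'):
    import boto3  # deferred so user_turn stays usable without boto3 installed

    client = boto3.client('bedrock-runtime')
    response = client.converse(
        modelId=model_id,
        messages=[user_turn(prompt)],
        inferenceConfig={'maxTokens': 512},
    )
    return response['output']['message']['content'][0]['text']
```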

Bedrock Agent Runtime

| Command | Description |
| --- | --- |
| `aws bedrock-agent-runtime invoke-agent` | Invoke a Bedrock agent |
| `aws bedrock-agent-runtime retrieve` | Query knowledge base |
| `aws bedrock-agent-runtime retrieve-and-generate` | RAG query |

Best Practices

Cost Optimization

  • Use appropriate models: Smaller models for simple tasks
  • Set max_tokens: Limit output length when possible
  • Cache responses: For repeated identical queries
  • Batch when possible: Use batch inference for bulk processing
  • Monitor usage: Set up CloudWatch alarms for cost
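The caching bullet can be as simple as an LRU wrapper around an invocation function. A sketch; this is only appropriate when an identical prompt should return an identical answer (e.g. deterministic sampling settings):

```python
import functools

def cached_invoke(invoke_fn, maxsize=256):
    """Wrap a model-invocation function (prompt -> text) with an in-memory LRU cache."""
    @functools.lru_cache(maxsize=maxsize)
    def wrapper(prompt):
        return invoke_fn(prompt)
    return wrapper
```

For example, `cached = cached_invoke(invoke_claude)` reuses the `invoke_claude` helper from the patterns above and skips repeat calls for prompts already seen.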

Performance

  • Use streaming: For better user experience with long outputs
  • Connection pooling: Reuse boto3 clients
  • Regional deployment: Use closest region to reduce latency
  • Provisioned throughput: For consistent high-volume workloads

Security

  • Least privilege IAM: Only grant needed model access
  • VPC endpoints: Keep traffic private
  • Guardrails: Implement content filtering
  • Audit with CloudTrail: Track model invocations

IAM Permissions

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
      ]
    }
  ]
}
```

Troubleshooting

AccessDeniedException

Causes:

  • Model access not enabled in console
  • IAM policy missing bedrock:InvokeModel
  • Wrong model ID or region

Debug:

```bash
# Check model access status
aws bedrock list-foundation-models \
  --query 'modelSummaries[?modelId==`anthropic.claude-3-sonnet-20240229-v1:0`]'
```

ThrottlingException

Causes:

  • Exceeded on-demand quota
  • Too many concurrent requests

Solutions:

  • Request quota increase
  • Implement exponential backoff
  • Consider provisioned throughput

ValidationException

Common issues:

  • Invalid model ID
  • Malformed request body
  • max_tokens exceeds model limit
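Most of these can be caught client-side before spending a network round trip. A hypothetical pre-flight check for the Claude request body used throughout this document (the default token limit is illustrative; check your model's actual maximum):

```python
def validate_claude_body(body, model_max_tokens=4096):
    """Return a list of problems with a Claude invoke-model request body."""
    errors = []
    if body.get('anthropic_version') != 'bedrock-2023-05-31':
        errors.append("anthropic_version must be 'bedrock-2023-05-31'")
    max_tokens = body.get('max_tokens')
    if not isinstance(max_tokens, int) or max_tokens < 1:
        errors.append('max_tokens must be a positive integer')
    elif max_tokens > model_max_tokens:
        errors.append(f'max_tokens exceeds model limit ({model_max_tokens})')
    if not body.get('messages'):
        errors.append('messages must be a non-empty list')
    return errors
```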

Retry with Exponential Backoff

```python
import json
import time

from botocore.exceptions import ClientError

def invoke_with_retry(bedrock, body, max_retries=3):
    for attempt in range(max_retries):
        try:
            return bedrock.invoke_model(
                modelId='anthropic.claude-3-sonnet-20240229-v1:0',
                body=json.dumps(body)
            )
        except ClientError as e:
            if e.response['Error']['Code'] in ('ThrottlingException', 'ModelNotReadyException'):
                time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
            else:
                raise
    raise Exception('Max retries exceeded')
```
