This repository contains five different strategies for automating the creation of vector embeddings in PostgreSQL using Amazon Aurora.
- Direct RDS-Bedrock Integration (found in lib/01_rds_bedrock/): uses direct database integration with Amazon Bedrock for embedding generation.
- RDS with Synchronous Lambda-Bedrock Integration (found in lib/02_rds_lambda_bedrock_sync/): uses AWS Lambda functions to generate embeddings synchronously through Bedrock.
- RDS with Asynchronous Lambda-Bedrock Integration (found in lib/03_rds_lambda_bedrock_async/): implements asynchronous embedding generation using Lambda functions and Bedrock.
- RDS with Lambda and SQS Integration (found in lib/04_rds_lambda_sqs/): combines Lambda functions and SQS queues for managed embedding generation.
- RDS with Polling Mechanism (found in lib/05_rds_polling/): implements a polling-based approach for embedding generation.
These strategies showcase various approaches to generate and store embeddings using Amazon Bedrock and pgvector. The project includes infrastructure as code using AWS CDK to deploy a fully managed Aurora PostgreSQL cluster, along with a bastion host for database access.
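One way to compare these strategies is by when the embedding is computed relative to the write. The sketch below is a simplified in-memory model, not code from this repository: embed is a hypothetical stand-in for a Bedrock embedding call, and the hypothetical Store class stands in for the documents and document_embeddings tables. It contrasts a synchronous path (the write waits for the embedding, as in the synchronous Lambda strategy), a queued asynchronous path (the write returns at once and a worker catches up, in the spirit of the asynchronous and SQS strategies), and a polling path (a periodic job embeds whatever is still missing).

```typescript
// Simplified, in-memory model of three ways to keep embeddings in sync with documents.
// Everything here is hypothetical; no AWS or database calls are made.

type Embedding = number[];

// Hypothetical stand-in for a Bedrock embedding call: a tiny deterministic
// vector derived from character codes (NOT a real semantic embedding).
function embed(text: string): Embedding {
  const v = [0, 0, 0];
  for (let i = 0; i < text.length; i++) {
    v[i % 3] += text.charCodeAt(i);
  }
  return v;
}

// Stands in for the documents and document_embeddings tables.
class Store {
  documents = new Map<number, string>();
  embeddings = new Map<number, Embedding>();
  private nextId = 1;

  insertDocument(content: string): number {
    const id = this.nextId++;
    this.documents.set(id, content);
    return id;
  }

  // IDs of documents that do not have an embedding yet.
  missingEmbeddings(): number[] {
    return Array.from(this.documents.keys()).filter((id) => !this.embeddings.has(id));
  }
}

// Synchronous path: the caller waits until the embedding is stored.
function insertSync(store: Store, content: string): number {
  const id = store.insertDocument(content);
  store.embeddings.set(id, embed(content));
  return id;
}

// Asynchronous path: the write only enqueues the document ID.
function insertAsync(store: Store, queue: number[], content: string): number {
  const id = store.insertDocument(content);
  queue.push(id);
  return id;
}

// Worker for the asynchronous path: processes queued IDs in FIFO order.
function drainQueue(store: Store, queue: number[]): void {
  while (queue.length > 0) {
    const id = queue.shift()!;
    const content = store.documents.get(id);
    if (content !== undefined) {
      store.embeddings.set(id, embed(content));
    }
  }
}

// Polling path: one scan that embeds up to batchSize documents still lacking an embedding.
function pollOnce(store: Store, batchSize: number): number {
  const pending = store.missingEmbeddings().slice(0, batchSize);
  for (const id of pending) {
    store.embeddings.set(id, embed(store.documents.get(id)!));
  }
  return pending.length;
}
```

In this model insertSync never leaves a document without an embedding, whereas insertAsync leaves a window until drainQueue or pollOnce runs; that window is the price paid for a faster write path.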
Running make deploy provisions the core infrastructure stack, which includes:
- the Aurora PostgreSQL database cluster
- networking components
- a bastion host for secure access.
Additional nested stacks are deployed to support serverless integrations with AWS Lambda functions and Amazon SQS queues, enabling automated embedding generation workflows. Each strategy is implemented as a separate stack. The infrastructure includes the required IAM roles, the VPC, and the network connectivity between services.
- Overview
- Prerequisites
  - Node.js (v22.x or later)
  - npm (v10.x or later)
  - AWS CDK
  - TypeScript
- Project Setup
- Database Connection
- Run the scenarios provided
- Security
- Contributing
- Clean Up
- License
Before you begin, ensure you have the following installed on your machine:
- Node.js: for version management, we recommend using nvm (Node Version Manager)
# Install nvm
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash
# Install Node.js 22 and make it the active version
nvm install 22
nvm use 22
- npm: comes with the Node.js installation
- AWS CDK:
npm install -g aws-cdk
- TypeScript:
npm install -g typescript
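To set up the project, clone the repository and run the Makefile targets (make help shows the available commands):
git clone <repository-url>
cd embedding-strategies-postgresql
make help
make install
make bootstrap
make deploy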
Once you have deployed the stack with make deploy, you can connect to the Aurora PostgreSQL database from the bastion host as follows:
- SSH into the EC2 bastion host (or connect to it through AWS Systems Manager).
- Run the connect.sh script located at /home/ec2-user/connect.sh, for example:
cd /home/ec2-user/
chmod +x connect.sh
./connect.sh
- (Alternative) Retrieve the database password from AWS Secrets Manager. The -r flag makes jq print the raw string instead of a JSON-quoted one, so no quote characters end up in PGPASSWORD:
export PGPASSWORD=$(aws secretsmanager get-secret-value \
  --secret-id Aurora-credentials \
  --query SecretString \
  --output text \
  --region eu-central-1 | jq -r '.password')
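- (Alternative) Connect using psql. SSL settings are libpq connection parameters, so they go in the connection string; sslrootcert points to the Amazon RDS CA bundle used to verify the server certificate:
psql "host=<your-cluster-endpoint> port=5432 user=postgresadmin dbname=postgres sslmode=verify-full sslrootcert=/usr/local/share/postgresql/global-bundle.pem"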
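The psql connection above can also be assembled programmatically as a libpq key/value connection string. This is a sketch: ConnParams, quoteValue and buildConnInfo are hypothetical names, and the code follows libpq's documented quoting rule (a value may be wrapped in single quotes, and any single quote or backslash inside it must be escaped with a backslash). The password is intentionally left out so it can come from PGPASSWORD rather than appearing on the command line.

```typescript
// Hypothetical connection parameters; the password is deliberately omitted
// so that it is supplied through the PGPASSWORD environment variable.
interface ConnParams {
  host: string;
  port: number;
  user: string;
  dbname: string;
  sslmode: "require" | "verify-ca" | "verify-full";
  sslrootcert: string; // path to the CA bundle used to verify the server certificate
}

// Quote a value per libpq key/value rules: escape backslashes first, then
// single quotes, and wrap the result in single quotes.
function quoteValue(value: string): string {
  return "'" + value.replace(/\\/g, "\\\\").replace(/'/g, "\\'") + "'";
}

// Build a "key='value' key='value' ..." string that psql accepts as its connection argument.
function buildConnInfo(p: ConnParams): string {
  const pairs: Array<[string, string]> = [
    ["host", p.host],
    ["port", String(p.port)],
    ["user", p.user],
    ["dbname", p.dbname],
    ["sslmode", p.sslmode],
    ["sslrootcert", p.sslrootcert],
  ];
  return pairs.map(([key, value]) => `${key}=${quoteValue(value)}`).join(" ");
}
```

Quoting every value is valid for libpq and avoids having to detect which values contain spaces or special characters.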
Under the lib directory, each solution folder contains SQL scripts that create the necessary database resources as described in the blog post, including:
- Tables (documents and document_embeddings) for storing text content and their vector representations
- Triggers for automated embedding generation
- Stored procedures for vector operations and embedding management
For educational purposes, we split the documents table from the document_embeddings table. This avoids updating a document's row when its embedding is generated asynchronously after the document is first inserted. In PostgreSQL, an UPDATE does not modify a row in place: it writes a new row version and marks the old one as dead, much like a delete followed by an insert, so we want to avoid generating dead rows.
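To make the dead-row argument concrete, here is a deliberately simplified model of PostgreSQL's multiversion row storage, not its actual implementation (which also involves transaction visibility, HOT updates and VACUUM): a table is an append-only list of row versions, and an update marks the live version dead and appends a new one. HeapTable, singleTable and splitTables are hypothetical names used only for this illustration.

```typescript
// Toy model of MVCC storage: an UPDATE never overwrites a row version in place.
interface RowVersion<T> {
  id: number;
  data: T;
  dead: boolean; // true once superseded by a newer version
}

class HeapTable<T> {
  private versions: RowVersion<T>[] = [];

  insert(id: number, data: T): void {
    this.versions.push({ id, data, dead: false });
  }

  // Modeled as "delete followed by insert": mark the live version dead, append a new one.
  update(id: number, change: (old: T) => T): void {
    let live: RowVersion<T> | undefined = undefined;
    for (const v of this.versions) {
      if (v.id === id && !v.dead) {
        live = v;
        break;
      }
    }
    if (live === undefined) {
      throw new Error(`no live row with id ${id}`);
    }
    live.dead = true;
    this.versions.push({ id, data: change(live.data), dead: false });
  }

  deadCount(): number {
    return this.versions.filter((v) => v.dead).length;
  }
}

interface DocRow {
  content: string;
  embedding: number[] | null;
}

// Design A: one table; the asynchronous job later UPDATEs the embedding column.
function singleTable(docs: string[]): number {
  const documents = new HeapTable<DocRow>();
  docs.forEach((content, i) => documents.insert(i, { content, embedding: null }));
  docs.forEach((_, i) => documents.update(i, (row) => ({ ...row, embedding: [0.1, 0.2] })));
  return documents.deadCount();
}

// Design B: separate tables; the asynchronous job only INSERTs into the embeddings table.
function splitTables(docs: string[]): number {
  const documents = new HeapTable<{ content: string }>();
  const embeddings = new HeapTable<number[]>();
  docs.forEach((content, i) => documents.insert(i, { content }));
  docs.forEach((_, i) => embeddings.insert(i, [0.1, 0.2]));
  return documents.deadCount() + embeddings.deadCount();
}
```

Embedding three documents through a later UPDATE leaves three dead versions in this model (singleTable(["a", "b", "c"]) returns 3), while the split design only inserts and leaves none (splitTables returns 0). In a real database, dead tuples occupy space until VACUUM reclaims them.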
You can run the script that creates the resources for all five scenarios and installs the required extensions:
/home/ec2-user/init-db.sh
Otherwise, you can run your preferred scenario by executing one of these SQL scripts to automatically generate embedding vectors using the selected strategy:
- lib/01_rds_bedrock/scripts/init.sql
- lib/02_rds_lambda_bedrock_sync/scripts/init.sql
- lib/03_rds_lambda_bedrock_async/scripts/init.sql
- lib/04_rds_lambda_sqs/scripts/init.sql
- lib/05_rds_polling/scripts/init.sql
- All database credentials are managed through AWS Secrets Manager
- SSL/TLS is enforced for database connections
- Infrastructure changes are version controlled and deployed through CDK
- Create a new branch for your feature
- Make your changes
- Submit a pull request
To avoid incurring charges, clean up your resources:
make destroy
This project is licensed under the MIT License - see the LICENSE file for details.