Fragmented Thought

Using CDK to deploy Typescript GraphQL APIs

By Lance Gliser


Updates
  • 2022-10-29 - Breaking
    • Modified the architecture to keep ApolloService fully generic.
    • Moved DNS records into ApolloService
    • Created ExampleService for specifics like permissions, metrics, and alarms.
    • Introduced S3 bucket KMS key policies into permissions.
    • Added logging details using the embedded metric format
    • Added CDK tests

Motivation

Two months ago our company made a hard pivot. The mandate came down to drop Kubernetes and go direct to AWS services. Everyone's immediate concern was "How do we develop now?" Some AWS recommendations I've found:

  • A shared Sandbox account, with resource cost limits tied back to the developer that deployed them. Handy, but complicated.
  • An entire AWS account per developer, with a simple spending limit rolled up to a parent. Handy, but complicated. Worse, a developer who hits the limit could be left with no means of working.

Either option put more money in AWS's pocket, so we started trying to find better solutions:

  • AWS Serverless Application Model (SAM) An interesting option, and we used it initially to prove out functioning lambdas.
  • LocalStack Our first real love in this space. The community version has provided everything we've needed so far: Lambdas, DynamoDB, S3, and OpenSearch.

OpenSearch is another article's worth of documentation and discovery we owe to jonofoz. Hopefully he'll be interested in publishing.

Overview

After some negotiations, we landed on this high level setup:

  • Developers can do most simple work locally thanks to LocalStack.
  • Cloud deployments can use the CDK files with some minor contextual variables to target existing services such as S3, DynamoDB, etc. that were established by our DevOps teams (see the example after this list).
  • Creation and management of our own AWS components:
    • Web Application Filter (WAF) / Access Control List (ACL)
    • API Gateway
    • Lambda with NodeJS environment
    • Compiled Typescript running an Apollo server
    • CloudWatch metrics and alarms
      • An SNS performance topic for alarm notifications
    • DNS through ARecords set as aliases
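
For example, a cloud deployment targeting an existing table might pass context like this (the ARN is an illustrative placeholder):

cdk deploy --context exampleTableARN=arn:aws:dynamodb:us-east-1:000000000000:table/examples --context subdomain=api

The full set of context keys we support appears in lib/Stack.ts below.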

Let's step through the pieces required:

A generic LocalStack repo

This shared repo will serve as our means of providing services locally. You might be interested in their enterprise version depending on your required services. Be sure to check their coverage levels for your project.

docker-compose.yml

version: "3.8" services: # Values for localstack are straight out of the suggestions at # https://github.com/localstack/localstack/blob/master/docker-compose.yml localstack: container_name: "${LOCALSTACK_DOCKER_NAME-localstack_main}" image: localstack/localstack ports: - "127.0.0.1:4566:4566" # LocalStack Gateway - "127.0.0.1:4510-4559:4510-4559" # external services port range - "127.0.0.1:53:53" # DNS config (only required for Pro) - "127.0.0.1:53:53/udp" # DNS config (only required for Pro) - "127.0.0.1:443:443" # LocalStack HTTPS Gateway (only required for Pro) environment: - DEBUG=${DEBUG-} - PERSISTENCE=${PERSISTENCE-} - LAMBDA_EXECUTOR=${LAMBDA_EXECUTOR-} - LOCALSTACK_API_KEY=${LOCALSTACK_API_KEY-} # only required for Pro - DOCKER_HOST=unix:///var/run/docker.sock volumes: - "${LOCALSTACK_VOLUME_DIR:-./volume}:/var/lib/localstack" - "/var/run/docker.sock:/var/run/docker.sock" logging: &loggingLocal driver: local

Usage

To start:

docker compose up -d

To remove:

docker compose down -v

To relaunch:

docker compose down -v && docker compose up -d

To check all AWS services deployed on your LocalStack, you can visit localhost:4566/health.
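
For example:

curl -s localhost:4566/health

Note: newer LocalStack releases may serve this at localhost:4566/_localstack/health instead.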

Service GUIs

DynamoDB

dynamodb-admin provides a GUI to inspect and modify the database.

To use it, install the package once globally:

npm install -g dynamodb-admin

You can then access the Admin by launching it:

AWS_REGION=us-east-1 DYNAMO_ENDPOINT=http://localhost:4566 dynamodb-admin

S3

You can access a particular bucket at localhost:4566/{bucketName}.
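
If you'd rather inspect a bucket from code, here's a minimal sketch using the @aws-sdk/client-s3 package this project already depends on, pointed at LocalStack. The dummy credentials and the examples bucket name simply match the defaults used later in this post:

```typescript
import { S3Client, ListObjectsV2Command } from "@aws-sdk/client-s3";

// LocalStack accepts any credentials; these match the dummy values our pipeline uses.
const s3Client = new S3Client({
  region: "us-east-1",
  endpoint: "http://localhost:4566",
  // Keeps the bucket name in the path, matching localhost:4566/{bucketName}
  forcePathStyle: true,
  credentials: { accessKeyId: "dummy", secretAccessKey: "dummy" },
});

// List whatever has landed in the bucket
const main = async () => {
  const { Contents = [] } = await s3Client.send(
    new ListObjectsV2Command({ Bucket: "examples" })
  );
  Contents.forEach(({ Key, Size }) => console.info(Key, Size));
};

main().catch(console.error);
```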

Bootstrapping

Technically that ends the generic part of LocalStack. You could use it just as it stands. For our purposes, we'll also need to set up the environment for CDK usage by bootstrapping.

For local deployment, we'll use cdklocal. To get started, install it globally:

npm install -g aws-cdk-local

A bit of warning: our team suffered some initial massive frustration due to the nature of cdklocal. It's a thin wrapper around the real cdk, and it can and does use your AWS profile's configuration by default. Unless explicitly specified, the region used inside LocalStack during any command will be your profile's region. This led to our team's code failing based on each person's setup, and again in the validation pipelines. You should explicitly state the region for all cdklocal commands.

You'll need to bootstrap your local environment each time you compose up:

cdklocal bootstrap --region us-east-1

With LocalStack's services running and our region bootstrapped, we can move to individual APIs to start deployment.

Writing CDK to deploy one or more services

Internally, our application team relies on Apollo Server (based on Express). Everything in the API's world is just a bunch of unrelated twelve-factor environment variables. This frees us to choose between running the actual lambda provided during CDK deployment, or the more responsive npm run dev systems. Personally, I choose to run the API and Jest tests through Node.
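
To make that concrete, here's a hedged sketch of what a packages/example/src/constants.ts might contain. The variable names mirror the environment block we'll pass from lib/Stack.ts and the pipeline's .env.local file later in this post; the exact shape is an assumption:

```typescript
// A hypothetical constants.ts: every external dependency arrives as a plain
// environment variable, so the same code runs in a Lambda, nodemon, or Jest.
export const PORT = parseInt(process.env.PORT || "4000", 10);
export const ROOT_URL = process.env.ROOT_URL || "/";
export const DYNAMODB_EXAMPLES_TABLE_NAME =
  process.env.DYNAMODB_EXAMPLES_TABLE_NAME || "Example";
export const S3_EXAMPLES_IMAGES_BUCKET =
  process.env.S3_EXAMPLES_IMAGES_BUCKET || "examples";
// Optional endpoint overrides let local runs target LocalStack while deployed
// Lambdas fall back to the real AWS endpoints.
export const DYNAMODB_ENDPOINT = process.env.DYNAMODB_ENDPOINT;
export const S3_ENDPOINT = process.env.S3_ENDPOINT;
// AWS_LAMBDA_FUNCTION_NAME is set by the Lambda runtime itself.
export const IS_EXECUTION_IN_LAMBDA = !!process.env.AWS_LAMBDA_FUNCTION_NAME;
```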

Let's talk through the CDK files we'd expect in any API. I'll omit any file generated that's completely standard.

bin/cdk-ts.ts

Nothing much surprising here. It's the default generated file other than hooking into package.json for values.

```typescript
#!/usr/bin/env node
import "source-map-support/register";
import { App } from "aws-cdk-lib";
import { Stack } from "../lib/Stack";
import { name, description } from "../package.json";

const app = new App();
new Stack(app, name, {
  /* If you don't specify 'env', this stack will be environment-agnostic.
   * Account/Region-dependent features and context lookups will not work,
   * but a single synthesized template can be deployed anywhere. */

  /* Uncomment the next line to specialize this stack for the AWS Account
   * and Region that are implied by the current CLI configuration. */
  // env: {
  //   account: process.env.CDK_DEFAULT_ACCOUNT,
  //   region: process.env.CDK_DEFAULT_REGION,
  // },

  /* Uncomment the next line if you know exactly what Account and Region you
   * want to deploy the stack to. */
  // env: { account: "****", region: "us-east-1" },

  /* For more information, see https://docs.aws.amazon.com/cdk/latest/guide/environments.html */
  description,
});
```

lib/Stack.ts

The Stack file is responsible for declaring AWS services we need, packaging our code into a lambda, and granting permissions.

```typescript
import * as core from "aws-cdk-lib";
import { Construct } from "constructs";
import { join } from "path";
import { Runtime } from "aws-cdk-lib/aws-lambda";
import {
  Certificate,
  DnsValidatedCertificate,
  ICertificate,
} from "aws-cdk-lib/aws-certificatemanager";
import { ARecordProps, HostedZone } from "aws-cdk-lib/aws-route53";
import {
  Table as DynamoDBTable,
  AttributeType as DynamoDBAttributeType,
} from "aws-cdk-lib/aws-dynamodb";
import { CfnWebACL } from "aws-cdk-lib/aws-wafv2";
import { aclVisibilityConfigDefaults, defaultRules } from "./WebACLRules";
import { Duration } from "aws-cdk-lib";
import * as s3 from "aws-cdk-lib/aws-s3";
import { exampleComponentId, ExampleService } from "./ExampleService";

// Allows you to create any resources you require from AWS, along with your own ApolloServer.
export class Stack extends core.Stack {
  constructor(scope: Construct, id: string, props?: core.StackProps) {
    super(scope, id, props);

    // We'll start by reading any variables we need from `cdk deploy --context key=value` pairs.
    // These are our controls for environment specific variables.
    // As a general strategy, we'll accept ids for services and objects in them such as ARNs, Zone Ids, etc.
    // If supplied, we'll use those. If not (as in the case of LocalStack), we'll create them.

    // Certificate manager certificates. A wildcard would be expected given this file's operations.
    const certificateARN: string | undefined =
      this.node.tryGetContext("certificateARN");
    // A DynamoDB table
    const exampleTableARN: string | undefined =
      this.node.tryGetContext("exampleTableARN");
    const exampleTableName: string =
      this.node.tryGetContext("exampleTableName") || "Example";
    // An S3 bucket
    const exampleBucketARN = this.node.tryGetContext("exampleBucketARN");
    const exampleBucketKMSKeyARN = this.node.tryGetContext(
      "exampleBucketKMSKeyARN"
    );
    const exampleBucketName: string =
      this.node.tryGetContext("exampleBucketName") || "examples";
    // CloudWatch
    const areAlarmsEnabled: boolean =
      this.node.tryGetContext("areAlarmsEnabled") === "true";
    // WAF (Access Control Lists)
    const isWAFEnabled: boolean =
      this.node.tryGetContext("isWAFEnabled") === "true";
    // Route 53 DNS
    const hostedZoneId: string | undefined =
      this.node.tryGetContext("hostedZoneId");
    // Examples: sandbox.torch.ai, indigo.dev-int.torch.ai
    const hostedZoneDomain: string | undefined =
      this.node.tryGetContext("hostedZoneDomain") || "torch.ai";
    // Custom controls for our deployment
    const subdomain: string = this.node.tryGetContext("subdomain") || "api";

    // Output the context during any CDK operations to help make your CI/CD pipelines more sensible.
    console.info("Context settings", {
      areAlarmsEnabled,
      certificateARN,
      exampleTableARN,
      exampleTableName,
      exampleBucketARN,
      exampleBucketKMSKeyARN,
      exampleBucketName,
      hostedZoneId,
      hostedZoneDomain,
      isWAFEnabled,
      subdomain,
    });

    // Next we'll collect pointers to various AWS resources.
    // If provided with an external identifier, we'll use that, otherwise we'll create what we can.

    // DynamoDB
    const exampleTableId = exampleTableName;
    const exampleTable = exampleTableARN
      ? DynamoDBTable.fromTableArn(this, exampleTableId, exampleTableARN)
      : new DynamoDBTable(this, exampleTableId, {
          partitionKey: {
            name: "pk",
            type: DynamoDBAttributeType.STRING,
          },
          sortKey: {
            name: "sk",
            type: DynamoDBAttributeType.STRING,
          },
          tableName: exampleTableName,
        });

    // S3 Bucket
    const exampleBucketId = exampleBucketName;
    const exampleBucket = exampleBucketARN
      ? s3.Bucket.fromBucketArn(this, exampleBucketId, exampleBucketARN)
      : new s3.Bucket(this, exampleBucketId, {
          bucketName: exampleBucketName,
        });

    // Route 53
    // Note that we do not create a hosted zone if the references are missing.
    // LocalStack has some trouble with Route53 and certificates, so we skip those portions with some iffing.
    const hostedZone =
      hostedZoneId && hostedZoneDomain
        ? HostedZone.fromHostedZoneAttributes(this, "Zone", {
            hostedZoneId,
            zoneName: hostedZoneDomain,
          })
        : undefined;

    // Certificate
    const certificateID = "Certificate";
    let certificate: ICertificate | undefined;
    if (hostedZone) {
      certificate = certificateARN
        ? Certificate.fromCertificateArn(this, certificateID, certificateARN)
        : new DnsValidatedCertificate(this, certificateID, {
            domainName: `${subdomain}.${hostedZoneDomain}`,
            hostedZone,
            subjectAlternativeNames: [`*.${subdomain}.${hostedZoneDomain}`],
          });
    }

    // Web Application Firewall
    const webACL = !isWAFEnabled
      ? undefined
      : new CfnWebACL(this, `WebACL`, {
          name: id,
          defaultAction: {
            allow: {},
          },
          visibilityConfig: {
            ...aclVisibilityConfigDefaults,
            metricName: `${id}-waf`,
          },
          scope: "REGIONAL",
          rules: defaultRules,
        });

    // API specifics
    // Our initial CDK setup was a monorepo with APIs deployed as sets of APIGateways in front of Lambdas.
    // We will use a setup where each APIGateway includes custom DNS settings coming from the `packages/*` directories.
    // You could easily take the domains of each and feed them into a Federated GraphQL Gateway, deploying all of it
    // at once through the same CDK Stack. If you choose to do that, strongly consider not using a lambda
    // for the Gateway. The performance was abysmal.

    // Apollo src
    const packagesPath = join(__dirname, "..", "packages");

    // API
    // packages/example
    const examplePath = join(packagesPath, "example");
    const exampleSubdomain = subdomain;
    const exampleDomainName = `${exampleSubdomain}.${hostedZoneDomain}`;
    // ApolloService.ts is also featured in this post.
    new ExampleService(this, exampleComponentId, {
      areAlarmsEnabled,
      restApiProps: {
        domainName:
          hostedZone && certificate
            ? {
                domainName: exampleDomainName, // Requires fully qualified DNS
                certificate: certificate,
              }
            : undefined,
      },
      lambdaProps: {
        // packages/example/package-lock.json
        depsLockFilePath: join(examplePath, "package-lock.json"),
        // packages/example/src/lambda.ts
        entry: join(examplePath, "src", "lambda.ts"),
        // The name of the export to use from the entry file
        handler: "handler",
        // It is possible to run with less memory, down to 128.
        // Higher memory also seems paired with higher CPU.
        memorySize: 512,
        projectRoot: join(examplePath),
        environment: {
          DYNAMODB_EXAMPLES_TABLE_NAME: exampleTableName,
          S3_EXAMPLES_IMAGES_BUCKET: exampleBucketName,
        },
        runtime,
      },
      webACL,
      certificate,
      hostedZone,
      domainName: exampleDomainName,
      subdomain: exampleSubdomain,
      dnsRecordDefaults,
      exampleTable,
      examplePayLoadBucket: exampleBucket,
      examplePayloadBucketKMSKeyARN: exampleBucketKMSKeyARN,
    });
  }
}

export const runtime = Runtime.NODEJS_16_X;

const dnsRecordDefaults: Partial<ARecordProps> = {
  comment: "Managed by application-services CDK stack.",
  deleteExisting: true,
  ttl: Duration.minutes(5),
};
```

lib/WebACLRules.ts

This provides us with the AWS core rule set along with a couple more dynamic rules.

```typescript
// Based on https://github.com/cdk-patterns/serverless/tree/main/the-waf-apigateway
import { CfnWebACL } from "aws-cdk-lib/aws-wafv2";

export const aclVisibilityConfigDefaults: Omit<
  CfnWebACL.VisibilityConfigProperty,
  "metricName"
> = {
  cloudWatchMetricsEnabled: true,
  sampledRequestsEnabled: true,
};

export const defaultRules: CfnWebACL.RuleProperty[] = [
  // 1 AWS Managed Rules
  {
    name: "AWS-AWSManagedRulesCommonRuleSet",
    priority: 1,
    overrideAction: { none: {} },
    statement: {
      managedRuleGroupStatement: {
        name: "AWSManagedRulesCommonRuleSet",
        vendorName: "AWS",
        excludedRules: [{ name: "SizeRestrictions_BODY" }],
      },
    },
    visibilityConfig: {
      ...aclVisibilityConfigDefaults,
      metricName: "awsCommonRules",
    },
  },
  // 2 AWS AnonIPAddress
  {
    name: "AWS-AWSManagedRulesAnonymousIpList",
    priority: 2,
    overrideAction: { none: {} },
    statement: {
      managedRuleGroupStatement: {
        name: "AWSManagedRulesAnonymousIpList",
        vendorName: "AWS",
        excludedRules: [],
      },
    },
    visibilityConfig: {
      ...aclVisibilityConfigDefaults,
      metricName: "awsAnonymous",
    },
  },
  // 3 AWS IP reputation list
  {
    name: "AWS-AWSManagedRulesAmazonIpReputationList",
    priority: 3,
    overrideAction: { none: {} },
    statement: {
      managedRuleGroupStatement: {
        name: "AWSManagedRulesAmazonIpReputationList",
        vendorName: "AWS",
        excludedRules: [],
      },
    },
    visibilityConfig: {
      ...aclVisibilityConfigDefaults,
      metricName: "awsReputation",
    },
  },
  // 4 GeoBlock based on ISO 2 country codes if required from accessing gateway
  // {
  //   name: "geoblockRule",
  //   priority: 4,
  //   action: { block: {} },
  //   statement: {
  //     geoMatchStatement: {
  //       countryCodes: [],
  //     },
  //   },
  //   visibilityConfig: {
  //     ...aclVisibilityConfigDefaults,
  //     metricName: "geoBlock",
  //   },
  // },
];
```

lib/ExampleService.ts

Here we have an example of a specific service we might launch. You can create multiple services extending ApolloService to handle all of your microservice needs.

```typescript
import { Construct } from "constructs";
import { ApolloService, ApolloServiceProps } from "./ApolloService";
import { EmailSubscription } from "aws-cdk-lib/aws-sns-subscriptions";
import { Topic } from "aws-cdk-lib/aws-sns";
import {
  Alarm,
  ComparisonOperator,
  Metric,
  Unit,
} from "aws-cdk-lib/aws-cloudwatch";
import { SnsAction } from "aws-cdk-lib/aws-cloudwatch-actions";
import { Duration } from "aws-cdk-lib";
import { metricNameExampleMetric } from "../packages/example/src/utils/cloudwatch";
import { cloudWatchMetricsNamespace } from "../packages/example/src/config/logger";
import { ITable } from "aws-cdk-lib/aws-dynamodb";
import { Effect, Policy, PolicyStatement } from "aws-cdk-lib/aws-iam";
import { IBucket } from "aws-cdk-lib/aws-s3";

export const exampleComponentId = "ExampleApi";

interface ExampleServiceProps extends ApolloServiceProps {
  readonly areAlarmsEnabled?: boolean;
  readonly exampleTable: ITable;
  readonly examplePayLoadBucket: IBucket;
  readonly examplePayloadBucketKMSKeyARN: string | undefined;
}

export class ExampleService extends ApolloService {
  constructor(
    scope: Construct,
    id: string,
    {
      areAlarmsEnabled,
      exampleTable,
      examplePayLoadBucket,
      examplePayloadBucketKMSKeyARN,
      ...props
    }: ExampleServiceProps
  ) {
    super(scope, id, props);

    const alarms: Alarm[] = [];
    // Register our metrics as alarms that hit the topic
    if (areAlarmsEnabled) {
      alarms.push(
        new Alarm(this, "Metric-ExampleMetric", {
          comparisonOperator:
            ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
          threshold: 2000,
          evaluationPeriods: 1,
          alarmName: "ExampleService - ExampleMetric average duration",
          // This code will create a metric; pushing to it will be another story.
          // See the follow up sections on logging.
          metric: new Metric({
            // You'll need to import the namespace from the package code
            namespace: cloudWatchMetricsNamespace,
            // Same for the metric name itself.
            metricName: metricNameExampleMetric,
            unit: Unit.MICROSECONDS,
            statistic: "avg",
            period: Duration.minutes(5),
          }),
        })
      );

      const performanceTopic = new Topic(
        this,
        "ExampleService_Performance_Alarms"
      );
      // If you have a simple set email, you can just add it.
      performanceTopic.addSubscription(new EmailSubscription("panic@torch.ai"));
      // It's also possible to create a parameter as part of the deployment you can manage later.
      // const emailAddress = new CfnParameter(this, `${id}-performance-email`);
      // myTopic.addSubscription(new EmailSubscription(emailAddress.valueAsString));

      alarms.forEach((alarm) => {
        alarm.addAlarmAction(new SnsAction(performanceTopic));
      });
    }

    // DynamoDB table access
    exampleTable.grantReadWriteData(this.lambda);
    // S3 bucket general access
    examplePayLoadBucket.grantReadWrite(this.lambda);
    // Encryption key access for bucket contents
    // If you attempt to create a signed url and run into:
    // "<Message>The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access."
    // This is the culprit: the bucket was created with a KMS key, and you have to give the service permission to use it.
    // I attempted to use `examplePayLoadBucket.encryptionKey`, but it is undefined when created using an ARN reference.
    if (examplePayloadBucketKMSKeyARN) {
      this.lambda.role?.attachInlinePolicy(
        new Policy(this, "example-bucket-kms-access", {
          statements: [
            new PolicyStatement({
              effect: Effect.ALLOW,
              actions: [
                "kms:RetireGrant",
                "kms:CreateGrant",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:Encrypt",
                "kms:DescribeKey",
                "kms:Decrypt",
              ],
              resources: [examplePayloadBucketKMSKeyARN],
            }),
          ],
        })
      );
    }
  }
}
```

lib/ApolloService.ts

The ApolloService file is responsible for creating a Lambda accessed by an APIGateway and DNS. We return the configured objects for use in other pieces of the stack.

```typescript
import { aws_route53_targets, Duration, Stack } from "aws-cdk-lib";
import { Construct } from "constructs";
import { CfnWebACL, CfnWebACLAssociation } from "aws-cdk-lib/aws-wafv2";
import { Tracing, Architecture } from "aws-cdk-lib/aws-lambda";
import { Cors, LambdaIntegration, RestApi } from "aws-cdk-lib/aws-apigateway";
import { NodejsFunction } from "aws-cdk-lib/aws-lambda-nodejs";
import { NodejsFunctionProps } from "aws-cdk-lib/aws-lambda-nodejs/lib/function";
import { RestApiProps } from "aws-cdk-lib/aws-apigateway/lib/restapi";
import {
  ARecord,
  ARecordProps,
  IHostedZone,
  RecordTarget,
} from "aws-cdk-lib/aws-route53";
import { ICertificate } from "aws-cdk-lib/aws-certificatemanager";

// Based on documentation from https://www.apollographql.com/docs/apollo-server/deployment/lambda/
export interface ApolloServiceProps {
  readonly lambdaProps?: Partial<NodejsFunctionProps>;
  readonly restApiProps?: Partial<RestApiProps>;
  readonly webACL?: CfnWebACL;
  readonly certificate?: ICertificate;
  readonly hostedZone?: IHostedZone;
  readonly domainName: string;
  readonly subdomain: string;
  readonly dnsRecordDefaults?: Partial<ARecordProps>;
}

// TODO https://docs.aws.amazon.com/cdk/api/v1/docs/aws-apigatewayv2-readme.html
export class ApolloService extends Construct {
  readonly webACL?: CfnWebACL;
  readonly restApi: RestApi;
  readonly lambda: NodejsFunction;

  constructor(
    scope: Construct,
    id: string,
    {
      lambdaProps,
      restApiProps,
      webACL,
      certificate,
      hostedZone,
      domainName,
      subdomain,
      dnsRecordDefaults = {},
    }: ApolloServiceProps
  ) {
    super(scope, id);

    // Create a Lambda based on NodeJS that will compile and upload our source code.
    // See: https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_lambda_nodejs-readme.html
    // It uses esbuild under the hood. See: https://esbuild.github.io/
    const lambda = new NodejsFunction(this, `Lambda`, {
      // If you can support modern architectures, be sure to declare it.
      // There's a nice performance boost option.
      architecture: Architecture.ARM_64,
      timeout: Duration.seconds(30),
      tracing: Tracing.ACTIVE,
      ...lambdaProps,
      // Bundling options here are passed through to esbuild.
      // Merged last so callers can add loaders without losing ours.
      bundling: {
        ...lambdaProps?.bundling,
        loader: {
          // We use { loadFilesSync } from "@graphql-tools/load-files" to get typeDefsArray.
          // This means the build needs to include the actual `.graphql` files to read at run time.
          // I'll follow up on this concept further along in packages/*/src/lambda.ts.
          ".graphql": "file",
          ...lambdaProps?.bundling?.loader,
        },
      },
      environment: {
        ...lambdaProps?.environment,
      },
    });

    // Create an API that will control access to our Lambda and exposes it to the internet
    const api = new RestApi(this, `RestApi`, {
      restApiName: `${id} graphql endpoint`,
      description: `This service serves ${id} subgraph data through Apollo graphQL.`,
      ...restApiProps,
      defaultCorsPreflightOptions: {
        ...restApiProps?.defaultCorsPreflightOptions,
        allowOrigins: Cors.ALL_ORIGINS,
        allowMethods: Cors.ALL_METHODS, // this is also the default
      },
      deployOptions: {
        ...restApiProps?.deployOptions,
        tracingEnabled: true,
      },
    });

    // Store the gateway ARN for use with our WAF stack
    const apiGatewayARN = `arn:aws:apigateway:${
      Stack.of(this).region
    }::/restapis/${api.restApiId}/stages/${api.deploymentStage.stageName}`;

    const graphqlPostIntegration = new LambdaIntegration(lambda);
    // Connect the pair, and set the allowed methods
    api.root.addMethod("POST", graphqlPostIntegration);
    api.root.addMethod("GET", graphqlPostIntegration);

    if (webACL) {
      new CfnWebACLAssociation(this, "WebACLAssociation", {
        webAclArn: webACL.attrArn,
        resourceArn: apiGatewayARN,
      });
    }

    if (hostedZone && certificate) {
      new ARecord(this, ["ARecord", domainName].join(":"), {
        ...dnsRecordDefaults,
        target: RecordTarget.fromAlias(new aws_route53_targets.ApiGateway(api)),
        recordName: subdomain, // The portion of the domain below the hosted zone
        zone: hostedZone,
      });
    }

    this.lambda = lambda;
    this.restApi = api;
    this.webACL = webACL;
  }
}
```

package.json

This file is minimally changed from the generated file for any Typescript CDK project. Be sure to note the esbuild packages.

{ "name": "application-services", "description": "Provides GraphQL services to support application", "version": "0.11.0", "bin": { "cdk-ts": "bin/cdk-ts.js" }, "scripts": { "cdk": "cdk", "cdk:synth": "cdk synth", "dev": "concurrently \"npm run dev:*\"", "dev:example": "cd packages/example && npm run dev:watch", "pre-commit": "lint-staged", "lint": "tsc --noEmit && eslint .", "prepare": "husky install", "test": "npm run lint && jest" }, "contributors": [ { "name": "Lance Gliser", "email": "lance.gliser@torch.ai" }, { "name": "Jon Korte", "email": "jon.korte@torch.ai" } ], "devDependencies": { "@apollo/gateway": "^2.0.5", "@aws-cdk/aws-s3-deployment": "^1.169.0", "@types/jest": "^27.5.2", "@types/node": "10.17.27", "@types/prettier": "2.6.0", "@typescript-eslint/eslint-plugin": "^5.30.6", "aws-cdk": "2.31.1", "concurrently": "^7.3.0", "esbuild": "^0.14.49", "eslint": "^8.20.0", "husky": "^8.0.1", "jest": "^27.5.1", "lint-staged": "^13.0.3", "prettier": "^2.7.1", "prettier-eslint": "^15.0.1", "source-map-support": "^0.5.21", "ts-jest": "^27.1.4", "ts-node": "^10.8.1", "typescript": "^4.4.4" }, "dependencies": { "@aws-sdk/client-s3": "^3.157.0", "@aws-sdk/s3-request-presigner": "^3.157.0", "aws-cdk-lib": "2.31.1", "constructs": "^10.0.0" }, "lint-staged": { "./**/*.{js,ts,json,md,graphql}": ["prettier --write"] } }

Tests

The CDK project is automatically wired to test with Jest. We'll provide some tests to help ensure our work is correct.

```typescript
import * as cdk from "aws-cdk-lib";
import { Match, Template } from "aws-cdk-lib/assertions";
import { runtime, Stack } from "../lib/Stack";
import { exampleComponentId } from "../lib/ExampleService";
import { Matcher } from "aws-cdk-lib/assertions/lib/matcher";
import { cloudWatchMetricsNamespace } from "../packages/example/src/config/logger";
import { metricNameExampleMetric } from "../packages/example/src/utils/cloudwatch";

const hostedZoneDomain = "sandbox.torch.ai";
const subdomain = "api.test";
const hostName = `${subdomain}.${hostedZoneDomain}`;

describe("Stack", () => {
  const app = new cdk.App({
    context: {
      areAlarmsEnabled: "true",
      certificateARN: undefined,
      exampleTableARN: undefined,
      hostedZoneId: "0000000000000",
      hostedZoneDomain,
      isWAFEnabled: "true",
      subdomain,
    },
  });
  const stack = new Stack(app, "MyTestStack", {
    env: {
      // CDK local defaults
      account: "000000000000",
      region: "us-west-2",
    },
  });
  const template = Template.fromStack(stack);

  it("should contain a WebACL", () => {
    expectWebACL(template);
  });

  it("should contain our example DynamoDB table", () => {
    expectDynamoDBTable(template, {
      KeySchema: Match.arrayWith([
        {
          AttributeName: "pk",
          KeyType: "HASH",
        },
        {
          AttributeName: "sk",
          KeyType: "RANGE",
        },
      ]),
      TableName: "Example",
    });
  });

  it("should contain our example S3 bucket", () => {
    expectS3Bucket(template, {
      BucketName: "examples",
    });
  });

  describe("ExampleService", () => {
    it("should contain an ApiGateway pointing at Lambda", () => {
      expectApolloServiceLambda(template, {
        componentId: exampleComponentId,
      });
    });

    describe("Metrics and alarms", () => {
      it("should contain an SNS performance topic", () => {
        template.hasResource("AWS::CloudWatch::Alarm", {});
        template.hasResourceProperties("AWS::SNS::Subscription", {
          Protocol: "email",
        });
      });

      it("should contain an ExampleMetric metric and alarm", () => {
        template.hasResourceProperties("AWS::CloudWatch::Alarm", {
          ComparisonOperator: "GreaterThanOrEqualToThreshold",
          AlarmActions: [
            {
              Ref: Match.stringLikeRegexp(exampleComponentId),
            },
          ],
          MetricName: Match.stringLikeRegexp(metricNameExampleMetric),
          Namespace: cloudWatchMetricsNamespace,
          Statistic: "Average",
        });
      });
    });
  });
});

const expectWebACL = (template: Template) => {
  // WAF
  template.hasResourceProperties("AWS::WAFv2::WebACL", {
    Scope: "REGIONAL",
    Rules: [
      Match.objectLike({
        Statement: {
          ManagedRuleGroupStatement: {
            Name: "AWSManagedRulesCommonRuleSet",
            VendorName: "AWS",
          },
        },
      }),
      Match.objectLike({
        Statement: {
          ManagedRuleGroupStatement: {
            Name: "AWSManagedRulesAnonymousIpList",
            VendorName: "AWS",
          },
        },
      }),
      Match.objectLike({
        Statement: {
          ManagedRuleGroupStatement: {
            Name: "AWSManagedRulesAmazonIpReputationList",
            VendorName: "AWS",
          },
        },
      }),
    ],
  });
};

const expectDynamoDBTable = (
  template: Template,
  properties: {
    TableName: string;
    KeySchema: Matcher;
  }
) => {
  template.hasResource("AWS::DynamoDB::Table", {
    Properties: {
      ...properties,
    },
  });
};

const expectS3Bucket = (
  template: Template,
  properties: { BucketName: string }
) => {
  template.hasResource("AWS::S3::Bucket", {
    Properties: {
      ...properties,
    },
  });
};

interface ApolloServiceLambdaProps {
  componentId: string;
}

const expectApolloServiceLambda = (
  template: Template,
  { componentId }: ApolloServiceLambdaProps
) => {
  template.hasResource("AWS::WAFv2::WebACLAssociation", {
    Properties: {
      ResourceArn: {
        "Fn::Join": Match.arrayWith([
          "",
          Match.arrayWith([
            Match.stringLikeRegexp("arn:aws:apigateway:"),
            Match.objectLike({
              Ref: Match.stringLikeRegexp(`RestApi`),
            }),
          ]),
        ]),
      },
      WebACLArn: {
        "Fn::GetAtt": Match.arrayWith([Match.stringLikeRegexp(`WebACL`)]),
      },
    },
  });
  // API
  template.hasResourceProperties("AWS::ApiGateway::RestApi", {
    Name: `${componentId} graphql endpoint`,
  });
  template.hasResourceProperties("AWS::ApiGateway::Method", {
    HttpMethod: "OPTIONS",
    RestApiId: {
      Ref: Match.stringLikeRegexp(`${componentId}RestApi`),
    },
  });
  template.hasResourceProperties("AWS::ApiGateway::Method", {
    HttpMethod: "POST",
    RestApiId: {
      Ref: Match.stringLikeRegexp(`${componentId}RestApi`),
    },
  });
  template.hasResource("AWS::ApiGateway::Method", {
    Properties: {
      HttpMethod: "GET",
      RestApiId: {
        Ref: Match.stringLikeRegexp(`${componentId}RestApi`),
      },
    },
  });
  // Lambda
  template.hasResource("AWS::Lambda::Function", {
    Properties: {
      Handler: "index.handler",
      Runtime: runtime.name,
      Timeout: 30,
      TracingConfig: {
        Mode: "Active",
      },
    },
    DependsOn: [
      Match.stringLikeRegexp(`${componentId}LambdaServiceRole`),
      Match.stringLikeRegexp(`${componentId}LambdaServiceRole`),
    ],
  });
  template.hasResource("AWS::Route53::RecordSet", {
    Properties: {
      Type: "A",
      AliasTarget: {
        DNSName: {
          "Fn::GetAtt": Match.arrayWith([
            Match.stringLikeRegexp(`${componentId}RestApiCustomDomain`),
          ]),
        },
      },
    },
  });
  template.hasResource("AWS::CloudFormation::CustomResource", {
    Properties: {
      DomainName: hostName,
      SubjectAlternativeNames: Match.arrayWith([
        Match.stringLikeRegexp(hostName),
      ]),
      HostedZoneId: Match.anyValue(),
    },
  });
};
```

Usage

Then you can iterate on your CDK components by deploying:

cdklocal deploy --region us-east-1

Or use cdklocal watch, a great alternative to cdklocal deploy when developing: it automatically redeploys to LocalStack on file change. This makes a nodemon-style reload of the lambda possible, but it's still painfully slow. Our team has opted to just run nodemon instead.

cdklocal watch --region us-east-1

The fastest way to run is to use Node locally, running a simple server; Jest will do the same. Simply launch the dev command from the root package:

npm run dev

Writing services for our CDK setup

The CDK setup we've used above has some specifics we have to comply with. Our primary problems are these:

  • Developers working locally will want nodemon and Jest, requiring standard src/index.ts handling.
  • CDK will expect lambda.ts handling.

With some minimal effort we can extract the shared parts into common files and keep the launch-specific logic limited. Our primary concern will be the difference between the bundled code and lambda operations. I'll comment on those specifics.

packages/*/src/index.ts

Our standard Node-based index.ts will look very familiar to most Express developers:

```typescript
import {
  logger,
  IS_PRODUCTION,
  useExpressErrorLogging,
  useCors,
  useBodyRequestParsing,
  useErrorHandling,
  use404Handler,
} from "@torch-ai-internal/express-common";
import dotenv from "dotenv";

logger.debug("Applying .env file variables");
// In specificity order as dotenv won't overwrite something already loaded
dotenv.config({ path: "./.env.local" });
const { error: environmentError } = dotenv.config({ path: "./.env" });
if (environmentError) {
  throw environmentError;
}

import {
  onProcessUncaughtException,
  onProcessUnhandledRejection,
  onProcessExit,
  onAppListening,
} from "@torch-ai-internal/express-common";
import { PORT, ROOT_URL } from "./constants";
import express from "express";
import { GraphQLSchema } from "graphql";
import { getResolvers } from "./src";
import { useGraphQL } from "@torch-ai-internal/express-graphql";
import { mergeTypeDefs } from "@graphql-tools/merge";
import { loadFilesSync } from "@graphql-tools/load-files";
import { getDefaultOptions } from "./server";

process.on("uncaughtException", onProcessUncaughtException);
process.on("unhandledRejection", onProcessUnhandledRejection);
process.on("exit", onProcessExit);

// Node can use file watchers and glob patterns. Manual maintenance is not required here.
// If a new graphql file is created, be sure to update the lambda.ts imports.
const typeDefsArray = loadFilesSync(["src/**/*.graphql"]);
const typeDefs = mergeTypeDefs(typeDefsArray);

export const getApplication = async (): Promise<express.Application> => {
  logger.debug("Configuring application");
  const app = express();
  const schema: GraphQLSchema = await getResolvers(typeDefs);
  useExpressErrorLogging(app);
  useCors(app);
  useBodyRequestParsing(app);
  await useGraphQL(
    app,
    {
      ...getDefaultOptions(),
      schema,
    },
    ROOT_URL
  );
  useErrorHandling(app);
  use404Handler(app);
  return app;
};

getApplication().then((app) => {
  const server = app.listen(process.env.NODE_ENV !== "test" ? PORT : 0, () => {
    onAppListening(app, server);
  });
});
```

packages/*/src/lambda.ts

The lambda entry is very similar, but it comes with preconfigured Express defaults, so we omit those. Here we have to pay attention to the direct file imports of each .graphql file.

```typescript
// noinspection JSUnusedGlobalSymbols
import { ApolloServer } from "apollo-server-lambda";
import { getResolvers } from "./src";
import { GraphQLSchema } from "graphql";
import { Handler } from "aws-lambda";
import { getDefaultOptions } from "./server";
import { loadFilesSync } from "@graphql-tools/load-files";
import { mergeTypeDefs } from "@graphql-tools/merge";
import componentTypes from "./src/src.graphql";
import interfacesTypes from "./src/interfaces.graphql";
import s3Types from "./src/s3/s3.graphql";
import systemTypes from "./src/system/system.graphql";
import usersTypes from "./src/users/users.graphql";

// CDK uses esbuild with the loader setting ".graphql": "file".
// In the bundled context these imports are actual file pointers in the bundle's root.
const typeDefsArray = loadFilesSync([
  componentTypes as unknown as string,
  interfacesTypes as unknown as string,
  s3Types as unknown as string,
  usersTypes as unknown as string,
  systemTypes as unknown as string,
]);
const typeDefs = mergeTypeDefs(typeDefsArray);

// This is the export named by `handler` in lib/Stack's lambdaProps
export const handler: Handler = async (event, context, callback) => {
  const schema: GraphQLSchema = await getResolvers(typeDefs);
  const server = new ApolloServer({
    ...getDefaultOptions(),
    schema: schema,
  });
  const serverHandler = server.createHandler();
  return serverHandler(event, context, callback);
};
```

Unifying GraphQL context differences between Lambda and Node

One last pitfall you'll need to manage is the difference in context. A lambda provides:

```typescript
export interface LambdaContextFunctionParams {
  event: ReturnType<typeof getCurrentInvoke>["event"];
  context: ReturnType<typeof getCurrentInvoke>["context"];
  express: ExpressContext;
}
```

To avoid rewriting all of your existing logic, I recommend a simple remapping function as you build context for each call:

```typescript
import { S3Client } from "@aws-sdk/client-s3";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { ExpressContext } from "apollo-server-express";
// Local helpers (getS3Client, getDynamoDBClient, getUserFromRequest, the User
// type, ExamplesRepo, and the LambdaContextFunctionParams interface above) are
// omitted here for brevity.

// Our base line context, prior to any request and user specific details
export interface SystemContext {
  s3Client: S3Client;
  dynamoDBClient: DynamoDBClient;
  examplesRepo: ExamplesRepo;
}

export const getSystemContext = async (): Promise<SystemContext> => {
  const s3Client = getS3Client();
  const dynamoDBClient = getDynamoDBClient();
  return {
    s3Client,
    dynamoDBClient,
    examplesRepo: new ExamplesRepo(dynamoDBClient),
  };
};

// Additions to SystemContext that are present during a request, merging to create GraphQLContext.
export interface UserContext {
  user: User;
  authorization?: string;
}

export interface GraphQLContext
  extends UserContext,
    SystemContext,
    ExpressContext,
    Partial<Pick<LambdaContextFunctionParams, "context" | "event">> {
  // The event object contains the API Gateway event (HTTP headers, HTTP method, body, path, ...).
  // The context object (not to be confused with the context function itself!) contains the current
  // Lambda Context (Function Name, Function Version, awsRequestId, time remaining, ...).
}

export type TGetGraphQLContextAdditions = (
  // What we get depends on our launching context.
  // Our local runs will not have the lambda additions.
  context: LambdaContextFunctionParams | ExpressContext
) => Promise<GraphQLContext>;

export const getGraphQLContextAdditions: TGetGraphQLContextAdditions = async (
  context
) => {
  // Pull { req, res } from the Lambda's context if present, or fall back to the standard Node shape.
  const expressContext: ExpressContext = {
    req: "express" in context ? context.express.req : context.req,
    res: "express" in context ? context.express.res : context.res,
  };
  const systemContext = await getSystemContext();
  const user = await getUserFromRequest(expressContext);
  return {
    ...expressContext,
    ...systemContext,
    ...context,
    user,
    authorization: expressContext.req.headers.authorization,
  };
};
```

We can then use getGraphQLContextAdditions in a shared server.ts file that provides configuration to lambda.ts and index.ts:

```typescript
import { Config } from "apollo-server-core/src/types";
import { getGraphQLContextAdditions } from "./src/context";
import { ApolloServerPluginLandingPageGraphQLPlayground } from "apollo-server-core";

export const getDefaultOptions = (): Config => ({
  csrfPrevention: true,
  cache: "bounded",
  context: getGraphQLContextAdditions,
  debug: true,
  plugins: [ApolloServerPluginLandingPageGraphQLPlayground()],
  introspection: true,
});
```

Creating metrics data for alarms

AWS Labs offers a specialized package, aws-embedded-metrics, to embed metrics in your standard log outputs. It's a nice way to get performance metrics without making overhead API calls to log them.

A few minor configuration steps are required:

Use a logging or bootstrap file to set the default configuration for metrics to your desired namespace. Recall that we imported this namespace into the alarm setup in ExampleService.

```typescript
import { Configuration } from "aws-embedded-metrics";
import { name } from "../../package.json"; // Assumed source of `name`; adjust to your layout

// Set the namespace for our AWS metrics
export const cloudWatchMetricsNamespace = `Prism/Application/API/${name}`;
Configuration.namespace = cloudWatchMetricsNamespace;
```

Log metrics any way you like; I'll drop in my specific implementation, but many approaches will work. One critical caveat needs to be understood though: the embedded metrics include dimensions by default. Unless removed, those dimensions will cause aggregation failures. You can learn about the potential behavior change that would remedy this at RFC - Remove LogGroupName as default dimension.

Until the behavior changes, no matter your implementation, just include lines to reset the dimensions:

```typescript
const metrics = createMetricsLogger();
metrics.setDimensions({});
```

A full example implementation I use looks like this:

```typescript
import { createMetricsLogger, Unit } from "aws-embedded-metrics";
import { IS_EXECUTION_IN_LAMBDA } from "../constants";

/**
 * Creates and updates CloudWatch metric data.
 * ⚠️ Use judiciously. Each metric incurs unique costs.
 *
 * Suggested usage:
 *   // Start a timer
 *   const getDuration = getDurationTimer();
 *   // Do your thing
 *   await someComplexFunction();
 *   // Log the duration metric
 *   await logDurationMetric("ExampleMetric", getDuration);
 **/
type LogDurationMetric = (
  metricName: string,
  getDuration: ReturnType<typeof getDurationTimer>,
  options?: {
    dimensionSets?: Record<string, string>;
    properties?: Record<string, string>;
  }
) => Promise<void>;

export const logDurationMetric: LogDurationMetric = async (
  metricName,
  getDuration,
  options
) => {
  if (!IS_EXECUTION_IN_LAMBDA) {
    return;
  }
  const duration = getDuration();
  const { dimensionSets = {}, properties = {} } = options || {};
  const metrics = createMetricsLogger();
  metrics.setDimensions(dimensionSets);
  Object.entries(properties).forEach(([key, value]) => {
    metrics.setProperty(key, value);
  });
  metrics.putMetric(metricName, duration / 1000, Unit.Microseconds); // Nano to micro
  await metrics.flush();
};

/**
 * Returns a function that can be called to get the time in nanoseconds since the timer was started.
 *
 *   const getDuration = getDurationTimer();
 *   const duration = getDuration();
 */
export const getDurationTimer = (): (() => number) => {
  const start = process.hrtime.bigint();
  return () => {
    const end = process.hrtime.bigint();
    return Number(end - start);
  };
};
```
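
Wiring the timer into a resolver is then just a couple of lines. This usage sketch assumes a hypothetical resolver, with metricNameExampleMetric exported alongside the implementation above:

```typescript
import {
  getDurationTimer,
  logDurationMetric,
  metricNameExampleMetric, // Hypothetical constant; the alarm in ExampleService watches this exact name
} from "./utils/cloudwatch";

// A hypothetical resolver body showing the intended call order.
export const exampleResolver = async (): Promise<string[]> => {
  const getDuration = getDurationTimer();
  // Do your thing: any work you want the alarm in ExampleService to watch.
  const results = await Promise.resolve(["example"]);
  // In a Lambda this writes an embedded-metric log line; locally it's a no-op.
  await logDurationMetric(metricNameExampleMetric, getDuration);
  return results;
};
```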

CI/CD

Validation

With a bit more setup, you can leverage the LocalStack setup for developers to also run your tests on pull requests! Our setup involves Azure DevOps pipelines hooked up as build requirements. I'll show the basics of that here.

We require a docker-compose.yml, but it's just a clone of the generic repo's, so I won't bother reposting the contents. I wonder if I could perhaps set that repo up as a unique upstream or a sub-repo to keep them in sync... No matter! Onward!

azure-pipelines/validation.yml

```yaml
# Node.js
# Build a general Node.js project with npm.
# Add steps that analyze code, save build artifacts, deploy, and more:
# https://docs.microsoft.com/azure/devops/pipelines/languages/javascript
pool:
  vmImage: ubuntu-latest

trigger: none

steps:
  # Ensure we're using the latest packages. The initial set on hosted machines are out of date.
  - task: CmdLine@2
    inputs:
      script: sudo npm install --location=global aws-cdk-local aws-cdk npm@latest
    displayName: npm install npm@latest aws-cdk-local aws-cdk
  # Ensure our CDK packages pass security standards
  - task: Npm@1
    displayName: npm audit
    inputs:
      command: custom
      customCommand: audit --audit-level=high --omit=dev
      workingDir: .
  # Ensure our API packages pass security standards
  - task: Npm@1
    displayName: npm audit example
    inputs:
      command: custom
      customCommand: audit --audit-level=high --omit=dev
      workingDir: ./packages/example
  # Install the CDK packages so it can operate
  - task: Npm@1
    inputs:
      command: ci
      workingDir: .
    displayName: npm ci
  # Install the API packages so it can operate
  - task: Npm@1
    inputs:
      command: ci
      workingDir: ./packages/example
    displayName: npm ci example
  # Use the tests we wrote for CDK to check deployment produces what it should
  - task: Npm@1
    displayName: root npm run test
    inputs:
      command: custom
      workingDir: .
      customCommand: run test
  # Use Docker to start a LocalStack instance
  - task: DockerCompose@0
    inputs:
      containerregistrytype: "Container Registry"
      dockerComposeFile: "./azure-pipelines/docker-compose.yml"
      action: "Run a Docker Compose command"
      dockerComposeCommand: "up -d"
    displayName: start localstack
  # Give LocalStack a little bit longer to allow all services to come online.
  # Typical failure messages indicate credential failures.
  # TODO Someday we should make this a loop that pings the stack
  - task: CmdLine@2
    inputs:
      script: sleep 5
    displayName: Delaying for localStack viability
  # Bootstrap the LocalStack
  - task: CmdLine@2
    inputs:
      script: |
        cdklocal bootstrap --region us-east-1 --verbose
    displayName: cdklocal bootstrap
  # Deploy our stack
  - task: CmdLine@2
    inputs:
      script: |
        cdklocal deploy --region us-east-1 --require-approval never --verbose
    displayName: cdklocal deploy
  # Add the required static variables used during compile for this environment
  - script: |
      echo Writing: .env.local
      {
        echo 'DYNAMODB_ENDPOINT=http://localhost:4566'
        echo 'DYNAMODB_REGION=us-east-1'
        echo 'DYNAMODB_EXAMPLES_TABLE_NAME=examples'
        echo 'AWS_ACCESS_KEY_ID=dummy'
        echo 'AWS_SECRET_ACCESS_KEY=dummy'
        echo 'AWS_SESSION_TOKEN=dummy'
        echo 'S3_EXAMPLES_IMAGES_BUCKET=examples'
        echo 'S3_ENDPOINT=http://localhost:4566'
        echo 'S3_REGION=us-east-1'
      } > .env.local
      cat .env.local
    workingDirectory: ./packages/example
    displayName: "Write example .env.local file"
  # Use Jest to fire the tests pointing at our deployed LocalStack resources.
  # You _must_ write your tests with something like a `beforeAll` that inserts minimal test data.
  - task: Npm@1
    inputs:
      command: "custom"
      workingDir: "./packages/example"
      customCommand: "run test"
    displayName: example npm run test
```

Deployment

We use a single multi-stage pipeline, and a set of templates shared between the stages:

azure-pipelines/deployment/pipeline.yml

The pipeline.yml file sets the stages and provides the required environment specific variables for each.

```yaml
trigger:
  batch: true
  branches:
    include:
      - main
  paths:
    exclude:
      - README.md

pool:
  vmImage: ubuntu-latest
  demands:
    - npm

stages:
  - stage: Prepare
    displayName: Prepare the base artifact
    jobs:
      - job: Prepare
        displayName: Prepare job
        steps:
          - publish: $(System.DefaultWorkingDirectory)
            artifact: drop
  - stage: Development
    displayName: Deploy to the Development environment
    variables:
      certificateARN: "" # Using the cert caused me 'not included in cert' grief issues. Easier to generate.
      exampleTableARN: "arn:aws:dynamodb:us-east-2:01...:table/examples"
      exampleTableName: examples
      exampleBucketARN: "arn:aws:s3:::development-example-bucket"
      exampleBucketKMSKeyARN: "arn:aws:kms:us-east-2:01...:key/85..."
      exampleBucketName: "development-example-bucket"
      hostedZoneId: Z00...
      hostedZoneDomain: indigo.dev-int.torch.ai
      regionName: us-east-1
      subdomain: api
    dependsOn: Prepare
    condition: succeeded()
    jobs:
      - deployment: Deploy
        environment: Development
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: drop
                - template: templates/npm-update.yml
                - template: templates/npm-ci.yml
                - template: templates/aws-deploy.yml
                  parameters:
                    # This is the name of a service connection created by our DevOps team
                    # via AWS Toolkit for Azure DevOps. See https://aws.amazon.com/vsts/
                    awsCredentials: "AWS_Development"
                    # This parameter accepts a string, so we're abusing yml's multiline string concatenation.
                    cdkAwsArguments: >-
                      --context areAlarmsEnabled=$(areAlarmsEnabled)
                      --context certificateARN=$(certificateARN)
                      --context exampleTableARN=$(exampleTableARN)
                      --context exampleTableName=$(exampleTableName)
                      --context exampleBucketARN=$(exampleBucketARN)
                      --context exampleBucketKMSKeyARN=$(exampleBucketKMSKeyARN)
                      --context exampleBucketName=$(exampleBucketName)
                      --context hostedZoneId=$(hostedZoneId)
                      --context hostedZoneDomain=$(hostedZoneDomain)
                      --context isWAFEnabled=true
                      --context subdomain=$(subdomain)
                    regionName: $(regionName)
  - stage: Test
    displayName: Deploy to the Test environment
    variables:
      # Same as above
    dependsOn: Development
    condition: succeeded()
    jobs:
      # Same as above
```

azure-pipelines/deployment/templates/npm-update.yml

This ensures we're working with the latest version of npm; hosted machines have an older version by default.

```yaml
steps:
  - task: CmdLine@2
    inputs:
      script: sudo npm install npm@latest --location=global
    displayName: npm install npm@latest --location=global
```

azure-pipelines/deployment/templates/npm-ci.yml

This installs all the required node modules, both for the root CDK project, and any APIs it deploys.

```yaml
parameters:
  # The location of the artifact contents downloaded to the stage
  - name: workingDirectory
    type: string
    default: $(Pipeline.Workspace)/drop

steps:
  - task: Npm@1
    inputs:
      command: ci
      workingDir: ${{ parameters.workingDirectory }}
    displayName: npm ci - root
  - task: Npm@1
    inputs:
      command: ci
      workingDir: ${{ parameters.workingDirectory }}/packages/example
    displayName: npm ci - example
```

azure-pipelines/deployment/templates/aws-deploy.yml

This performs the actual CDK deployment.

```yaml
parameters:
  - name: awsCredentials
    type: string
  - name: regionName
    type: string
  - name: cdkAwsArguments
    type: string
    default: ""
  - name: workingDirectory
    type: string
    default: $(Pipeline.Workspace)/drop

steps:
  - script: |
      echo "Installing packages"
      sudo npm install --location=global aws-cdk ts-node
    displayName: "Installing node packages for CDK"
  # The CLI task told me 'cdk' was not a valid argument
  # - task: AWSCLI@1
  #   inputs:
  #     awsCommand: 'cdk'
  #     awsSubCommand: 'deploy'
  - task: AWSShellScript@1
    inputs:
      awsCredentials: ${{ parameters.awsCredentials }}
      disableAutoCwd: true
      inlineScript: cdk deploy --ci --require-approval never ${{ parameters.cdkAwsArguments }}
      regionName: ${{ parameters.regionName }}
      scriptType: inline
      workingDirectory: ${{ parameters.workingDirectory }}
    displayName: aws cdk deploy
```

It's been quite a slog moving from nice, pretty Helm and Kubernetes based deployments, but the option to use CDK has felt almost natural after some surprises. Our DevOps team can still focus on their needs, while application teams keep full control to create assets using code. It's a fine compromise. Hopefully this helps someone out there.

Cheers! 🥂