Fault Injection Into Google Cloud Platform
This guide walks you through injecting network faults into a Cloud Run service on Google Cloud Platform (GCP). You will not need to change any code.
Prerequisites
- Install fault
  If you haven’t installed fault yet, follow the installation instructions.
Inject Latency Into a Cloud Run Service
Cloud Run is the GCP platform for running containerized workloads. The approach taken by fault is to create a new revision that adds a sidecar container to the existing Cloud Run service specification. This sidecar then becomes the entry point for network traffic: it exposes a port that receives incoming requests, and fault transparently routes all traffic from that port to the application's port. When the injection is done, the service is rolled back to the previous revision.
Traffic Before fault Is Injected
---
config:
  theme: 'default'
  themeVariables:
    'git0': '#ff00ff'
  gitGraph:
    showBranches: true
    showCommitLabel: true
    mainBranchName: 'normal'
---
gitGraph
  commit id: "LB"
  commit id: "Backend Service"
  commit id: "Cloud Run"
  commit id: "Application Container"
Traffic After fault Is Injected
---
config:
  theme: 'default'
  themeVariables:
    'git0': '#ff00ff'
    'git1': '#00ffff'
  gitGraph:
    showBranches: true
    showCommitLabel: true
    mainBranchName: 'normal'
---
gitGraph
  commit id: "LB"
  commit id: "Injected" type: HIGHLIGHT
  commit id: "Backend Service"
  branch fault
  commit id: "Cloud Run"
  commit id: "fault Container"
  commit id: "Application Container"
  checkout normal
  merge fault id: "Rolled back" type: HIGHLIGHT
- Create a basic Cloud Run service
  You may want to follow the official GCP documentation to deploy a sample service.
- Upload the fault container image to GCP Artifact Registry
  Cloud Run expects the fault image to be pulled from an Artifact Registry repository in the same region (or a global one). This means you must upload the official fault image to your own repository.
  Follow the official documentation to upload the fault image.
  Something along these lines:
  # locally download the official fault image
  docker pull ghcr.io/rebound-how/fault:<version>
  # tag it to match your new GCP Artifact Registry repository
  docker tag ghcr.io/rebound-how/fault:<version> <region>-docker.pkg.dev/<project>/<repository>/fault:<version>
  # push it to the repository
  docker push <region>-docker.pkg.dev/<project>/<repository>/fault:<version>
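The push typically only succeeds once the target repository exists and Docker has credentials for the registry host. The following is a minimal sketch, assuming the gcloud CLI is installed and authenticated. The registry_host and fault_image_ref helpers are hypothetical names introduced here for illustration; they are not part of fault or gcloud.

```shell
#!/usr/bin/env sh
# Hypothetical helpers (not part of fault or gcloud) that build the
# Artifact Registry names used when tagging and pushing the image.

# registry_host <region> prints <region>-docker.pkg.dev
registry_host() {
  printf '%s-docker.pkg.dev\n' "$1"
}

# fault_image_ref <region> <project> <repository> <version>
# prints the full image reference for the fault image.
fault_image_ref() {
  printf '%s/%s/%s/fault:%s\n' "$(registry_host "$1")" "$2" "$3" "$4"
}

# Example usage (commented out because it needs GCP credentials):
#   gcloud artifacts repositories create "<repository>" \
#     --repository-format=docker --location="<region>" --project="<project>"
#   gcloud auth configure-docker "$(registry_host "<region>")"
#   docker push "$(fault_image_ref "<region>" "<project>" "<repository>" "<version>")"
```

The value printed by fault_image_ref is also what you would later pass as the image URL when injecting the fault.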
- Inject fault into the nginx service
  The following injects 800ms of latency into the service response time.
  fault inject gcp \
    --project <project> \
    --region <region> \
    --service <service> \
    --image <image> \
    --duration 30s \
    --with-latency --latency-mean 800
  The flags are:
  - --project: the GCP project where your Cloud Run service is running
  - --region: the GCP region where your Cloud Run service is running
  - --service: the GCP Cloud Run service name
  - --image: the full URL of the fault container image
  - --duration: optional duration after which the injection is rolled back. If unset, fault waits for user input
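To see the effect from the outside, you can time a few requests before and during the injection. This is a minimal sketch, assuming curl and awk are available and the service accepts unauthenticated requests. sample_latency and mean_ms are hypothetical helper names, not part of fault.

```shell
#!/usr/bin/env sh
# sample_latency <url> [count]
# Prints the total time, in seconds, of each request (one per line).
sample_latency() {
  url=$1
  count=${2:-5}
  i=0
  while [ "$i" -lt "$count" ]; do
    curl -s -o /dev/null -w '%{time_total}\n' "$url"
    i=$((i + 1))
  done
}

# mean_ms
# Reads one duration in seconds per line and prints the mean in milliseconds.
mean_ms() {
  awk '{ sum += $1 } END { if (NR > 0) printf "%.0f\n", sum / NR * 1000 }'
}

# Example usage (commented out because it needs a deployed service):
#   url=$(gcloud run services describe "<service>" \
#     --project "<project>" --region "<region>" --format 'value(status.url)')
#   sample_latency "$url" 5 | mean_ms
```

Running the usage lines once before the injection and once while it is active should show the mean rising by roughly the injected latency.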
When you do not explicitly set the service, fault lets you pick one interactively from the CLI:
fault inject gcp \
  --project <project> \
  --region <region> \
  --image <image> \
  --with-latency --latency-mean 800
? Service:
> hello
[↑↓ to move, enter to select, type to filter]
Once started, a new revision of the service is deployed with the fault process running as a sidecar container alongside the service's main container. The sidecar exposes a port to receive traffic and routes it to the application.
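One way to confirm that the new revision carries the sidecar is to list the container images of the service while the injection is active. This is a sketch under assumptions: it relies on the image having been pushed under the name fault, as in the upload step, and has_fault_sidecar is a hypothetical helper, not a fault or gcloud command.

```shell
#!/usr/bin/env sh
# has_fault_sidecar
# Succeeds if any image reference read from stdin looks like a fault image.
# Assumption: the image was pushed as .../fault:<version>.
has_fault_sidecar() {
  grep -q '/fault:'
}

# Example usage (commented out because it needs GCP credentials):
#   gcloud run revisions list \
#     --project "<project>" --region "<region>" --service "<service>"
#   gcloud run services describe "<service>" \
#     --project "<project>" --region "<region>" \
#     --format 'value(spec.template.spec.containers[].image)' \
#     | has_fault_sidecar && echo "fault sidecar present"
```

After the rollback, the same check should no longer report the sidecar, since the service is back on the previous revision.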