Create and run chaos experiments
Harness Chaos Engineering (CE) lets you build elaborate chaos experiments that model complex, real-life failure scenarios against which you can validate your applications. The experiments are declarative, and you can construct them in the Chaos Studio user interface without writing any code.
A chaos experiment is composed of chaos faults that are arranged in a specific order to create a failure scenario. The chaos faults target various aspects of an application, including the constituent microservices and underlying infrastructure. The parameters associated with these faults can be tuned to impart the desired chaos behavior.
For more information, go to Flow of control in a chaos experiment.
Construct a chaos experiment
To add a chaos experiment:
In Harness, navigate to Chaos > Chaos Experiments.
Select + New Experiment.
Chaos Studio is displayed.
In the Experiment Overview, enter the experiment Name and optional Description and Tags.
In Select a Chaos Infrastructure, select the infrastructure where the target resources reside, and then click Next.
This takes you to the Experiment Builder tab, where you can choose how to start building your experiment.
For more information on infrastructure, go to Connect chaos infrastructures.
Select how you want to build the experiment. The options are:
- Blank Canvas - Lets you build the experiment from scratch, adding the specific faults you want.
- Templates from Chaos Hubs - Lets you preview and select an experiment from the pre-curated experiment templates available in Chaos Hubs.
- Upload YAML - Lets you upload an experiment manifest YAML file.
These options are explained below.
If you select Blank Canvas:
The Experiment Builder tab is displayed.
Select Add, then select each fault you want to add to the experiment individually.
For each fault you select, you'll tune the fault's properties next.
To tune each fault:
Specify the target application (only for pod-level Kubernetes faults): This lets the fault target the pods that belong to the application.
Tune fault parameters: Every fault has a set of common parameters, such as the chaos duration and ramp time, and a set of unique parameters that you can customize as needed.
Add chaos probes: (Optional) On the Probes tab, you can add chaos probes to automate the chaos hypothesis checks for a fault during the experiment execution. Probes are declarative checks that validate the criteria that must hold for an experiment to be declared as passed.
Tune fault weightage: Set the weight for the fault, which sets the importance of the fault relative to the other faults in the experiment. This is used to calculate the resilience score of the experiment.
To add a fault that runs in parallel to another fault, point your mouse below an existing fault, and then select Add.
In Experiment Builder, faults that are stacked vertically run in parallel, and faults or groups of parallel faults run in sequence from left to right.
If you select Templates from Chaos Hubs:
Select an experiment template from a chaos hub.
- Select Experiment Type to see available chaos hubs to select templates from.
- Select a template to see a preview of the faults included.
You can edit the template to add more faults or update the existing faults.
If you select Upload YAML:
Upload an experiment manifest YAML file to create the experiment.
You can edit the experiment to update the existing faults or add more of them.
Save the experiment.
You can now either run the experiment right away by selecting the Run button at the top, or create a recurring schedule for the experiment by selecting the Schedule tab.
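The layout rule in the Experiment Builder (vertically stacked faults run in parallel, and each stack runs in sequence from left to right) can be sketched as a small model. This is only an illustration, not how Chaos Studio is implemented: the `Fault` class, the `execution_plan` function, and the durations are hypothetical, and a real run also includes setup and ramp time that the sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    name: str
    duration_s: int  # illustrative chaos duration in seconds


def execution_plan(steps):
    """Return ([(start_time_s, fault_name), ...], total_duration_s).

    `steps` is a list of lists: each inner list is one vertical stack
    (parallel faults); the outer list is the left-to-right sequence.
    """
    plan = []
    clock = 0
    for step in steps:
        # Every fault in the same stack starts at the same time.
        for fault in step:
            plan.append((clock, fault.name))
        # The next stack starts once the longest fault in this one ends.
        clock += max((f.duration_s for f in step), default=0)
    return plan, clock


experiment = [
    [Fault("pod-delete", 30)],
    [Fault("pod-cpu-hog", 60), Fault("pod-network-latency", 45)],
]
plan, total = execution_plan(experiment)
for start, name in plan:
    print(f"t={start:>3}s  start {name}")
print(f"total chaos time: {total}s")
```

In this sketch the second and third faults start together once the first one finishes, which mirrors two faults stacked vertically to the right of a single fault.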
Advanced experiment setup options
You can select Advanced Options on the Experiment Builder tab to configure the advanced options (described below) while creating an experiment for a Kubernetes chaos infrastructure:
General options
Node Selector
Specifies the node on which the experiment pods will be scheduled. Provide the node label as a key-value pair.
Can be used with node-level faults to avoid scheduling the experiment pod on the target node(s).
Can be used to keep the experiment pods off nodes that run an unsupported OS.
Toleration
Specifies the tolerations applied to the experiment pods so that they can be scheduled on nodes with matching taints. For more information on taints and tolerations, go to the Kubernetes documentation.
Can be used with node-level faults to avoid scheduling the experiment pod on the target node(s).
Can be used to keep the experiment pods off nodes that run an unsupported OS.
Annotations
Specifies the annotations to be added to the experiment pods. Provide the annotations as key-value pairs. For more information on annotations, go to the Kubernetes documentation.
Can be used for bypassing network proxies enforced by service mesh tools like Istio.
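The general options above correspond to standard Kubernetes pod fields: `nodeSelector` and `tolerations` under the pod `spec`, and `annotations` under the pod `metadata`. The following sketch assembles such a fragment to show the shape of each value. How Harness writes these values into the experiment manifest is not covered here; the helper name `experiment_pod_overrides` and the taint `dedicated=chaos` are hypothetical, while `kubernetes.io/os` and `sidecar.istio.io/inject` are a well-known node label and a common Istio annotation.

```python
import json


def experiment_pod_overrides(node_selector, tolerations, annotations):
    """Build a pod-spec-shaped fragment from the three general options.

    Hypothetical helper: it only arranges the values under the standard
    Kubernetes field names and does not talk to any cluster or API.
    """
    return {
        "metadata": {"annotations": dict(annotations)},
        "spec": {
            "nodeSelector": dict(node_selector),
            "tolerations": [dict(t) for t in tolerations],
        },
    }


overrides = experiment_pod_overrides(
    # Node Selector: schedule experiment pods only on nodes with this label.
    node_selector={"kubernetes.io/os": "linux"},
    # Toleration: allow scheduling on nodes tainted dedicated=chaos:NoSchedule
    # (the taint itself is an assumption made for this example).
    tolerations=[
        {"key": "dedicated", "operator": "Equal",
         "value": "chaos", "effect": "NoSchedule"}
    ],
    # Annotation: ask Istio not to inject a sidecar into the experiment pod.
    annotations={"sidecar.istio.io/inject": "false"},
)
print(json.dumps(overrides, indent=2))
```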
Security options
Enable runAsUser
Specifies the user ID used to start all the processes in the experiment pod containers. By default, user ID 1000 is used.
Allows privileged access or restricted access for experiment pods.
Enable runAsGroup
Specifies the group ID to be used for starting all the processes in the experiment pod containers instead of a user ID.
Allows privileged access or restricted access for experiment pods.
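In Kubernetes terms, these two options match the `runAsUser` and `runAsGroup` fields of a `securityContext`. A minimal sketch of the resulting fragment follows, assuming the default user ID of 1000 stated above; the helper name `security_context` is hypothetical, and the exact place where Harness applies these values in the manifest is not shown here.

```python
def security_context(run_as_user=1000, run_as_group=None):
    """Return a securityContext-shaped dict for the experiment pod.

    run_as_user defaults to 1000, the default user ID mentioned above.
    run_as_group is included only when it is explicitly enabled.
    """
    ctx = {"runAsUser": run_as_user}
    if run_as_group is not None:
        ctx["runAsGroup"] = run_as_group
    return ctx


default_ctx = security_context()
# UID 0 is root, which corresponds to the privileged-access case.
root_ctx = security_context(run_as_user=0, run_as_group=0)
print(default_ctx)
print(root_ctx)
```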
Launch an experiment from a chaos hub
You can launch experiments from the default Enterprise Chaos Hub or from custom hubs.
Launching the experiment from a hub is different from running an experiment from the Chaos Experiments page. The experiments in chaos hubs are actually templates, so when you launch them from a hub you must provide some additional details. The experiments in the Chaos Experiments page execute immediately, as configured, when you run them.
To launch an experiment from a chaos hub:
In Harness, navigate to Chaos > ChaosHubs, and then select the hub you want.
Find the experiment you want to launch, and then select Launch Experiment.
Select a chaos infrastructure, and then select Next.
You can change the infrastructure type if necessary.
Chaos Studio is displayed when you select Next.
In Chaos Studio's Experiment Builder, select the faults in the experiment to configure them.
You can add more faults or delete faults from the experiment, or update the sequence of faults.
Select Run to execute the experiment.
You can also save your customized experiment as a template in a chaos hub using the Save button.
Analyze chaos experiments
You can observe the execution status of the faults in a chaos experiment during its run. The screen shows the experiment pipeline on the right-hand side, and details such as Environment, Infrastructure Name, and the runs that have passed and failed on the left-hand side.
When the experiment completes execution, it displays the Resilience Score. This score describes how resilient your application is to unplanned failures. The probe success percentage helps determine the outcome of every fault in the chaos experiment. Probes (if any) associated with the experiment are used to understand how the application fared.
If any fault fails, the Fail Step explains why it failed.
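To see how fault weights and probe success percentages could combine into a Resilience Score, the sketch below assumes the score is a weighted average of each fault's probe success percentage. The text above states only that both values feed into the score, so treat the formula as an assumption; the function name `resilience_score` and the sample numbers are hypothetical.

```python
def resilience_score(faults):
    """Weighted average of probe success percentages.

    `faults` is a list of (weight, probe_success_percentage) pairs.
    Assumed formula: sum(weight * percentage) / sum(weight).
    """
    total_weight = sum(weight for weight, _ in faults)
    if total_weight == 0:
        return 0.0  # no weighted faults, so nothing to score
    weighted = sum(weight * pct for weight, pct in faults)
    return weighted / total_weight


# Two faults: weight 10 with all probes passing, weight 5 with half passing.
score = resilience_score([(10, 100.0), (5, 50.0)])
print(f"resilience score: {score:.1f}%")
```

Under this assumption, a fault with a higher weight pulls the overall score more strongly toward its own probe success percentage.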