I am learning chaos engineering and I am following a tutorial, but my code is not running as it should.

The service I am testing.

service.py

import io
from datetime import datetime  # needed by update_file()
import time
import threading
from wsgiref.validate import validator
from wsgiref.simple_server import make_server

EXAMPLE_FILE = './example.dat'

def update_file():
    """Write the current time to the file every second."""
    print('Updating file...')
    while True:
        with open(EXAMPLE_FILE, 'w') as f:
            f.write(datetime.now().isoformat())
        time.sleep(1)

def simple_app(environ, start_response):
    """A simple WSGI application.

    This application just writes the current time to the response.
    """
    status = '200 OK'
    headers = [('Content-type', 'text/plain; charset=utf-8')]
    start_response(status, headers)
    with open(EXAMPLE_FILE, 'r') as f:
        return [f.read().encode('utf-8')]

if __name__ == '__main__':
    # Start the file update thread.
    t = threading.Thread(target=update_file)
    t.start()
    httpd = make_server('', 8000, simple_app)
    print("Serving on port 8000...")
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        print("\nKeyboard interrupt received, exiting.")
        httpd.shutdown()
        t.join(timeout=1)
        print("Exiting.")

My chaos experiment file, experiment.json:

{
  "title": "Does our service tolerate the loss of its example file?",
  "description": "Our service reads data from a file, can it work without it?",
  "tags": ["tutorial", "filesystem"],

  "steady-state-hypothesis": {
    "title": "The exchange file must exist",
    "probes": [
      {
        "type": "probe",
        "name": "service-is-unavailable",
        "tolerance": [200, 503],
        "provider": {
          "type": "http",
          "url": "http://localhost:8000"
        }
      }
    ]
  },
  "method": [
    {
      "name": "move-example-file",
      "type": "action",
      "provider": {
        "type": "python",
        "module": "os",
        "func": "rename",
        "arguments": {
          "src": "./example.dat",
          "dst": "./example.dat.old"
        }
      }
    }
  ]
}

But instead of renaming the existing file, chaos appears to create a new file with the provided name, and the experiment ends successfully, which is not what I expect.


Please help.

evanstjabadi
  • First use `print()` to see what you have in your variables and which part of the code is executed. This is called "print debugging". – furas Dec 06 '21 at 14:44
  • I don't see where you rename files. I don't see where you even load `experiment.json`. As far as I can tell, this code can't create `example.dat.old`; none of it has anything to do with renaming. – furas Dec 06 '21 at 14:46
  • I am using the chaos library. I run `service.py`, then I run the `chaos run experiment.json` command. The rename action is defined in the experiment file, and chaos uses it to create `example.dat.old`. That works fine. The problem is that it creates a new file instead of renaming the current one. – evanstjabadi Dec 06 '21 at 15:20
  • Unless `service.py` creates the file again each time it runs. – evanstjabadi Dec 06 '21 at 15:21
  • I have one idea: maybe it creates the file `.dat.old` and copies the data, but has a problem deleting the old file. Maybe the original file needs different privileges. – furas Dec 06 '21 at 16:19
  • Thanks for your suggestion @furas. Things were happening as they should, just too fast to notice. Added my answer below. – evanstjabadi Dec 06 '21 at 23:42

1 Answer

Finally!

Problem: update_file rewrites example.dat every second, and because it opens the file in 'w' mode, it simply creates the file if it does not exist. So when chaos renames example.dat to example.dat.old, update_file creates a fresh example.dat within a second, and the chaos steady-state hypothesis appears to be met every time.
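The effect can be reproduced without the service or chaos at all. The following is a minimal sketch, assuming a temporary directory stands in for the working directory; the helper name recreate_after_rename is hypothetical and only used for illustration.

```python
import os
import tempfile


def recreate_after_rename(directory):
    """Rename a file away, then reopen the original path in 'w' mode.

    Hypothetical helper: it mimics what the chaos action (os.rename) and
    update_file (open(..., 'w')) do to the same path, one after the other.
    """
    src = os.path.join(directory, "example.dat")
    dst = os.path.join(directory, "example.dat.old")

    # What update_file does on its first pass.
    with open(src, "w") as f:
        f.write("first write")

    # What the experiment's action does.
    os.rename(src, dst)
    missing_after_rename = not os.path.exists(src)

    # What update_file does on its next pass: 'w' mode creates the file.
    with open(src, "w") as f:
        f.write("second write")

    return missing_after_rename, sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    missing, files = recreate_after_rename(tmp)
    print(missing, files)
```

The rename itself works: the original path is gone right after os.rename. Both files exist at the end only because the writer brings example.dat back.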

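One solution: make update_file wait significantly longer between writes, so that the file is still missing when chaos checks the steady-state hypothesis again after the action. In my case, time.sleep(60) worked!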
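A possible variation on that fix is to make the delay a parameter instead of hard-coding it. This is only a sketch, not the tutorial's code: the path, interval and stop_event parameters are assumptions added for illustration, and the stop event is there so the thread can be shut down cleanly without waiting out the full interval.

```python
import threading
from datetime import datetime


def update_file(path, interval, stop_event):
    """Write the current time to `path`, then wait `interval` seconds.

    `interval` and `stop_event` are illustrative additions; the original
    update_file takes no arguments and sleeps for a fixed time.
    """
    while not stop_event.is_set():
        with open(path, "w") as f:
            f.write(datetime.now().isoformat())
        # Unlike time.sleep(), wait() returns early once the event is set.
        stop_event.wait(interval)


# Possible usage in service.py (not executed here):
#   stop = threading.Event()
#   t = threading.Thread(target=update_file, args=("./example.dat", 60, stop))
#   t.start()
#   ...
#   stop.set()
#   t.join()
```

In the logs below, the whole run takes about four seconds, so any interval comfortably longer than that leaves the file missing while the hypothesis is checked the second time.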

Logs from chaos run experiment.json

[2021-12-07 01:11:19 INFO] Validating the experiment's syntax
[2021-12-07 01:11:19 INFO] Experiment looks valid
[2021-12-07 01:11:19 INFO] Running experiment: Does our service tolerate the loss of its example file?     
[2021-12-07 01:11:19 INFO] Steady-state strategy: default
[2021-12-07 01:11:19 INFO] Rollbacks strategy: default
[2021-12-07 01:11:19 INFO] Steady state hypothesis: The exchange file must exist
[2021-12-07 01:11:19 INFO] Probe: service-is-unavailable
[2021-12-07 01:11:21 INFO] Steady state hypothesis is met!
[2021-12-07 01:11:21 INFO] Playing your experiment's method now...
[2021-12-07 01:11:21 INFO] Action: move-example-file
[2021-12-07 01:11:21 INFO] Steady state hypothesis: The exchange file must exist
[2021-12-07 01:11:21 INFO] Probe: service-is-unavailable
[2021-12-07 01:11:23 CRITICAL] Steady state probe 'service-is-unavailable' is not in the given tolerance so failing this experiment
[2021-12-07 01:11:23 INFO] Experiment ended with status: deviated
[2021-12-07 01:11:23 INFO] The steady-state has deviated, a weakness may have been discovered
evanstjabadi