I use a YAML file as a configuration file and read and write it from Python code through a YAML library.

However, this YAML file is read and written by multiple processes, and sometimes the file gets corrupted and an error occurs.

As shown below, an extra character is appended after the existing content.

test1: 6511.75277715
test2: false
test3: false
test4: ''
test5: 13.20523311014153
test6: 0.6054349199466555
test7: 0
test8: -1
test9: 33473012.13609034
test1: 6511.75277715
test2: false
test3: false
test4: ''
test5: 13.20523311014153
test6: 0.6054349199466555
test7: 0
test8: -1
test9: 33473012.13609034
4   <<< added letters

I am using Python and writing the YAML file as shown below. How can I avoid this?

with open(app_setting_dir, "w") as f:
    yaml.dump(data, f)

Could you please take a look at the code with fasteners applied? What I want is to prevent any .py file other than 1.py from modifying the YAML file while 1.py is running.

import fasteners
import ruamel.yaml
import yaml

app_setting_dir = "./setting/app_setting.yaml"

def read_settings(dir):
    with open(dir) as f:
        return yaml.safe_load(f)

def write_setting2(data):
    ruamel_yaml = ruamel.yaml.YAML(typ="safe")
    rw_lock = fasteners.InterProcessReaderWriterLock(app_setting_dir)

    with rw_lock.write_lock():
        with open(app_setting_dir, "w") as f:
            yaml.dump(data, f)

def main():

    while True:
        APP_SETTING_DICT = read_settings(app_setting_dir)
        if APP_SETTING_DICT is not None:
            break

    print("read data : ", APP_SETTING_DICT)

    # `result` is not defined in this snippet
    APP_SETTING_DICT["test1"] = result[0]
    APP_SETTING_DICT["test2"] = result[1]

    write_setting2(APP_SETTING_DICT)

윤태일
  • You may want to read up on the [readers-writers problem](https://en.wikipedia.org/wiki/Readers–writers_problem). What you are doing is inherently unsafe. – chepner May 18 '22 at 17:31

2 Answers

1

The dumping can take a while, and if in that time another process starts writing to the same file you get strange results depending on how much of the file buffer has been written out.
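One plausible interleaving can be reproduced in a single process with two file handles. This is only a sketch: the helper name `interleaved_writes` and the sample values are made up for illustration, and the two handles stand in for two separate processes. Each handle keeps its own buffer and file offset, so the writer that flushes last overwrites only the beginning of what the other one wrote, and the tail of the longer output is left behind.

```python
import os
import tempfile


def interleaved_writes(path, long_text, short_text):
    """Simulate two writers that both opened the same file with mode "w"."""
    # newline="" keeps "\n" untranslated so the lengths match on every OS.
    writer_a = open(path, "w", newline="")
    writer_b = open(path, "w", newline="")
    try:
        # Both writes stay in each handle's own buffer; nothing is on disk yet.
        writer_a.write(long_text)
        writer_b.write(short_text)
        # A flushes first, then B overwrites only the start of A's output.
        writer_a.flush()
        writer_b.flush()
    finally:
        writer_a.close()
        writer_b.close()
    with open(path, newline="") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    result = interleaved_writes(
        os.path.join(tmp, "demo.yaml"),
        "test9: 33473012.13609034\n",
        "test9: 0\n",
    )

# The file now holds B's short line followed by the leftover tail of A's line.
print(repr(result))
```

The leftover tail plays the same role as the stray `4` in the question: it is the end of a longer dump that a shorter dump did not cover.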

What you should consider is using locks on the file, e.g. using fasteners. That, however, requires that you think about what you do with concurrent updates. If you read the YAML, update it and then dump it while you hold the lock, you will be fine. If you only get the lock for writing, then you might overwrite a change some other process made to the file, so I recommend not doing that.
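The difference between the two locking scopes can be illustrated without any real concurrency. The sketch below makes several assumptions: it uses the standard-library `json` module as a stand-in for YAML, runs the two "workers" sequentially in one process in a fixed order, and all function names are hypothetical. It shows what a lock held only around the write cannot prevent.

```python
import json
import os
import tempfile


def load(path):
    with open(path) as f:
        return json.load(f)


def save(path, data):
    with open(path, "w") as f:
        json.dump(data, f)


def update(path, key, value):
    """Read, modify and write as one step (what the lock should cover)."""
    data = load(path)
    data[key] = value
    save(path, data)


def lock_only_around_write(path):
    """Both workers read before either writes, so the last write wins."""
    a = load(path)       # worker A reads
    b = load(path)       # worker B reads the same old state
    a["test1"] = 1.0
    save(path, a)        # A writes (imagine the lock held only here)
    b["test2"] = True
    save(path, b)        # B writes its stale copy and drops A's change
    return load(path)


def lock_around_read_and_write(path):
    """Each worker finishes its whole read-update-write before the next."""
    update(path, "test1", 1.0)   # worker A
    update(path, "test2", True)  # worker B sees A's change
    return load(path)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "settings.json")
    save(path, {"test1": 0.0, "test2": False})
    lost = lock_only_around_write(path)
    save(path, {"test1": 0.0, "test2": False})
    kept = lock_around_read_and_write(path)

print("write-only lock:", lost)
print("read+write lock:", kept)
```

In the first case the file stays well-formed, but worker A's value for `test1` is silently lost, which is why the lock has to span the read as well.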

If you don't need to preserve comments etc. in your YAML, I recommend using the typ='safe' parameter for much faster loading and dumping:

from pathlib import Path

import fasteners
import ruamel.yaml

app_settings = Path('your.yaml')

def update_yaml(fn, newvalues):
    yaml = ruamel.yaml.YAML(typ='safe')
    # dedicated lock file shared by all processes, not the YAML file itself
    rw_lock = fasteners.InterProcessReaderWriterLock('path/to/lock.file')
    # hold the write lock across the whole read-update-write cycle
    with rw_lock.write_lock():
        data = yaml.load(fn)
        data.update(newvalues)  # or however you update the data
        yaml.dump(data, fn)

nv = ...  # placeholder: do your calculations setting nv

update_yaml(app_settings, nv)
Anthon
  • Thank you for the answer. I've used fasteners before, and when I tested it in a Windows environment the locks seemed to work. I then tried to apply it in an Ubuntu environment, but gave up because I thought it could not be used on Ubuntu. (The file was opened in two places and data was written from both sides, but the lock did not work properly and the file was modified.) I'll try again as per the answer. – 윤태일 May 19 '22 at 01:26
  • I have two questions. First, is it correct to put the YAML path in `fasteners.InterProcessReaderWriterLock('path/to/lock.file')`? Second, I get the error `'str' object has no attribute 'write'` from yaml.dump(). Do I have to open the file separately (like `with open(app_setting_dir, "w") as f:`)? – 윤태일 May 19 '22 at 14:21
  • That is incorrect. You need all processes to use the same lock file, but that shouldn't be a file used for something else (like the YAML file). I use `app_settings`, which is a `Path` instance. If you have a filename in the variable `app_setting_dir` (I assumed from the name that it is a directory) you can do `yaml.dump(data, Path(app_setting_dir))`. Maybe you should look into the `pathlib` standard library. – Anthon May 19 '22 at 14:53
  • And you can do `with open(app_setting_dir, 'wb') as fp: yaml.dump(data, fp)` as well, the second parameter is the stream to write to, or a `Path` instance, that will be correctly opened for you. – Anthon May 19 '22 at 14:54
  • I am really sorry. I don't quite understand. I edited the text, could you please take a look? – 윤태일 May 23 '22 at 06:44
  • @윤태일 SO is not an interactive support platform, so only use edits to make the question clearer, not to pose new questions. If you have a different question, post it as such. My code clearly puts the data read and write operations within the lock; if you don't understand why that is so, just use it **as is** instead of breaking the code (as per your updated question). – Anthon May 23 '22 at 07:16
0

You could set the explicit_end parameter of yaml.dump to True. This will add ... to the end of the file, which might help you find the cause of the appended extra digit.
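A minimal sketch of that suggestion, assuming the `yaml` module in the question is PyYAML (the sample keys are taken from the question, the variable names are illustrative):

```python
import yaml  # PyYAML

data = {"test7": 0, "test8": -1}

# explicit_end=True makes the emitter finish the document with a "..." line.
text = yaml.dump(data, explicit_end=True)
print(text)
```

With the marker in place, anything that shows up after the `...` line was clearly written after (or outside of) a complete dump, which narrows down where the stray character comes from.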

Bouke