0

It's just a question to understand whether this approach could create problems or failures with a large file. I have more than 10 users who want to read and write the same large file, not at exactly the same time but nearly so, as a background process run by a .py script. Each user has his own line, on which huge relation information about one other user is written as a really long string. For example:

[['user1','user2','1'],['user6','user50','2'],['and so on']]
['user1','user2','this;is;the;really;long;string;..(i am 18k letters long)...']
['user6','user50','this;is;the;really;long;string;..(i am 16k letters long)...']   
...and so on

Now user 1 just wants to read his line 1, and user 6 wants to remove his own line 2. So now my questions:

  1. I can't find any problems if all the users only read the file. But if user 6 wants to delete his own line, rewrite line 0 with the new index information, and shift the remaining lines to their new positions, how would the other 10+ users read the file while user 6 is still writing the new one? They don't need nearly as long just to open the file, and if they don't wait for user 6 to finish his job, they will read the wrong information from the file.

  2. Would it be enough to write this in the .py script

     f = open(fileNameArr, "rw")
     ....
     f.close()
    

to solve that problem? Or maybe "rwb+"? Or what else would be needed for that?

  3. Should I insert some temporary timeout function in the .py script, the way I have to insert set_time_limit(300); in PHP for long calculations and outputs?

A big thanks in advance for any input that helps me understand this.

trash1
  • Since, in unix, files don't change until they're saved, I think the best you can do is (1) record the last-modified-time when opening the file, (2) periodically check that the last-modified-time hasn't changed, (3) if it has, reload the file and merge current edits with original state somehow – Green Cloak Guy Nov 27 '20 at 14:57
  • Sounds like you need to discover databases. – tripleee Nov 27 '20 at 15:06
  • @GreenCloakGuy Hi, thanks for your point of view on how the system works, and thanks for that pointer trick. I had the same idea with a single '1' or '0' in line 0 of the file, but that would probably create long waiting times and maybe problems too. Could you tell me, if you know exactly: if one script has the file open for writing and the other 10+ .sh scripts are waiting to write the same file, how does the write work so that the 10+ waiting scripts don't overlap, if time is the pointer for such a write function? – trash1 Nov 27 '20 at 15:11
  • @tripleee Nope, I have 30 MB+ files that each belong to a single user. If I had 1000 users, I would have a nearly 30 GB database and .sql file, which is not really cool. I tried it, but the system was really slow. But maybe you know something I don't know? :) – trash1 Nov 27 '20 at 15:14
  • I'm not a database person really but that sounds like you also need to discover database indices. – tripleee Nov 27 '20 at 18:37

1 Answer

0

You should look up Unix file management - Unix doesn't give you a great out-of-the-box solution to this problem.

The short version is that any number of processes can read the same file at once, but under most sets of permissions, any process can overwrite the file. Unlike on, say, Windows, where the OS prevents multiple programs from editing the same file at once, on Unix any write will overwrite all previous writes - if two users start from the same base file and make different changes, then whoever calls .write() most recently will win. Yes, this does cause concurrency issues.
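Here is a minimal sketch of that last-write-wins behavior (Python, since the question uses a .py script; the file name shared.txt and the "user A"/"user B" labels are just illustrative):

    # Create a shared file that two independent scripts will both rewrite.
    with open("shared.txt", "w") as f:
        f.write("original contents\n")

    # "user A" and "user B" both read the same base version of the file...
    with open("shared.txt") as f:
        base = f.read()

    # ...and each one rewrites the whole file with only their own change.
    with open("shared.txt", "w") as f:      # user A's rewrite
        f.write(base + "line added by user A\n")

    with open("shared.txt", "w") as f:      # user B's rewrite, unaware of A's
        f.write(base + "line added by user B\n")

    with open("shared.txt") as f:
        print(f.read())                     # only user B's edit survives; A's is gone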

The answer above mentions some countermeasures - namely, enforcing file-locking at a software level in your program, which is essentially what I suggested in a comment - but to my knowledge there's no generalized solution to this issue.
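As an illustration of that software-level locking, here is a sketch using Python's fcntl.flock (advisory locking, Unix-only; it only helps if every cooperating script takes the lock before touching the file, and the function names and line-based layout below are assumptions taken from the question):

    import fcntl

    def rewrite_user_line(path, line_index, new_line):
        """Hold an exclusive lock while reading all lines, replacing one, and writing back."""
        with open(path, "r+") as f:
            fcntl.flock(f, fcntl.LOCK_EX)     # blocks until no other script holds the lock
            lines = f.readlines()
            lines[line_index] = new_line + "\n"
            f.seek(0)
            f.truncate()
            f.writelines(lines)
            fcntl.flock(f, fcntl.LOCK_UN)     # release so the next waiting script can proceed

    def read_user_line(path, line_index):
        """Hold a shared lock so a read never sees a half-rewritten file."""
        with open(path, "r") as f:
            fcntl.flock(f, fcntl.LOCK_SH)
            line = f.readlines()[line_index]
            fcntl.flock(f, fcntl.LOCK_UN)
            return line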

Google Docs and the rest of Drive have collaborative file editing that, though the code is obviously not public, seems to use Operational Transformation as its main approach. In that model, essentially, no user can directly modify the file; instead of using typical file I/O commands, each user sends the server its desired modifications and the server sorts out the concurrency issues.


Maybe you should rethink the way you've designed this system? Why is all of this information stored within a single file, with each line dedicated to a specific user? Why not have multiple, smaller files, one for each user, which would cut down on the concurrency issues with reading/writing? Why not use a database to store this information instead, and let the database handle the concurrency issues? Most databases can handle arbitrarily large strings, and though some aren't easily scalable to the 30GB you mention in your question, others definitely are.
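For instance, a sketch of the database route using SQLite (bundled with Python as sqlite3, which serializes concurrent writers on its own; the file, table, and column names here are made up for illustration):

    import sqlite3

    def get_conn(path="relations.db"):
        # timeout=30: if another script holds the write lock, wait up to 30 s for it
        conn = sqlite3.connect(path, timeout=30)
        conn.execute("""CREATE TABLE IF NOT EXISTS relations (
                            user_a TEXT, user_b TEXT, info TEXT,
                            PRIMARY KEY (user_a, user_b))""")
        return conn

    def write_relation(conn, user_a, user_b, info):
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT OR REPLACE INTO relations VALUES (?, ?, ?)",
                         (user_a, user_b, info))

    def read_relation(conn, user_a, user_b):
        row = conn.execute("SELECT info FROM relations WHERE user_a = ? AND user_b = ?",
                           (user_a, user_b)).fetchone()
        return row[0] if row else None

    def delete_relation(conn, user_a, user_b):
        with conn:
            conn.execute("DELETE FROM relations WHERE user_a = ? AND user_b = ?",
                         (user_a, user_b))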

Green Cloak Guy
  • Thanks again, and yes, I think I have to rethink my system. By the way, thanks, you helped me a lot to understand how Unix (and probably most programming languages) handles such a problem. – trash1 Nov 27 '20 at 15:45