SUBSTANTIAL UPDATE:

Previously, I thought this must be due to an issue involving NFS. However, it has now occurred with NO NFS involvement, and on one of the very same files. So... "from the top":

I have a script under development (I always seem to have one), and leaving an edit session open is standard practice, so it's easy to undo changes and try again. I've done this literally thousands of times over the last 25 years here, on Fedora of some version or other. Last night, for the first time, I began getting:

/bin/bash: bad interpreter: Text file busy

when trying to test the script.

The error eventually clears of its own accord, but the wait can be interminably long, sometimes many minutes (enough to draw me here!). After some edits there's no delay at all and the script just runs; on other occasions the error clears in a few seconds. As I'm constantly testing, this is more than just annoying.

Is there anything I can do to force it to clear? (I'm thinking a command-at-the-ready to fire off, if that's possible.)

I did my research: this is NOT a problem that's already noted here - not that I could find anyway... If someone wants me to cite all the similar but different questions / answers here, I'll do that, but prefer not to.

At first I thought this was likely due to heavy IO on the system (and/or NFS). However, it's not. This has even happened when there's no load to speak of. And, I'm the only one logged in and NOTHING would or could touch this script but me as I'm testing it. Nor could anything be messing with the directory tree or even the disk. The ONLY external influence is a continual stream of spam to the mail server on the box. Otherwise, it's just me.

I've been wondering if this is an indication of a hardware error? The system has ECC memory - it's a genuine server system!

As I've been at this literally for decades and NEVER seen this, one wonders, "So, what's new?!"

Other than a recent reload of Fedora to 35, nothing much! (The Fedora "upgrade" procedure, which is superior to a reinstallation, failed, so we were forced to reinstall "from scratch.")

When I first posted this question, I thought NFS was involved. This morning I began the edit on the file server system itself, the one that hosts (among many other things) NFS, so NFS was removed from consideration when it happened again. As I was ready this time with an lsof command, here's what happened:

bash: /scripts/MyNewScript: /bin/bash: bad interpreter: Text file busy
[root@srvr /scripts]# lsof /scripts/MyNewScript
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
      Output information may be incomplete.

(Shortly after this, the script ran and didn't have a problem.)
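Since `lsof` only shows a single point in time, a check that can be re-run quickly may help catch whatever holds the file open. The following is a minimal sketch, not a definitive tool: it assumes Linux with `/proc` mounted and sufficient privileges (e.g. root) to read other processes' descriptors, and the function name `writers_of` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: list processes that hold FILE open for writing,
# by scanning /proc (Linux only). Prints "PID FD COMMAND" per match.
writers_of() {
    local target fd pid num flags
    target=$(readlink -f -- "$1") || return 1
    for fd in /proc/[0-9]*/fd/*; do
        # Skip descriptors that do not point at the target file.
        [[ $(readlink -- "$fd" 2>/dev/null) == "$target" ]] || continue
        pid=${fd#/proc/}; pid=${pid%%/*}
        num=${fd##*/}
        # fdinfo reports the open(2) flags in octal.
        flags=$(awk '/^flags:/ {print $2}' "/proc/$pid/fdinfo/$num" 2>/dev/null) || continue
        [[ -n $flags ]] || continue
        # Low two bits are the access mode: 1 = O_WRONLY, 2 = O_RDWR.
        if (( (8#$flags & 3) != 0 )); then
            printf '%s %s %s\n' "$pid" "$num" "$(cat "/proc/$pid/comm" 2>/dev/null)"
        fi
    done
}
```

Running `writers_of /scripts/MyNewScript` immediately after the error appears would print one line per process that has the script open for writing; no output means nothing was found at that instant.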

...I've never taken the time to learn what fuse and gvfsd are. ... Could that / they be the cause?

This is REALLY perplexing!

Richard T
  • Background that seems like the likely reason: [NFS mount: Device or resource busy](https://unix.stackexchange.com/q/348315/197080). If I read correctly, NFS removes the file that is being edited so that the original can be accessed by other clients, and doesn't update with changes until the last file handle open on the file is closed. See the answer by **roaima**. – David C. Rankin Jun 06 '22 at 04:03
  • @DavidC.Rankin I'm pretty sure that's not it because, first, nobody else is accessing the file but me - it's new! Second, this problem has never happened before and yet I do this kind of thing VERY frequently - developing scripts is part of what I do! Solo, just like this! And, perhaps most telling, it persists a LONG time. So, "what changed" is _always_ the key question. ATM I'm wondering about a network problem? Hardware? ...Other than this program, I know nothing changed because nobody was even logged in but me! – Richard T Jun 06 '22 at 04:44
  • @DavidC.Rankin ...I should have added that the script doesn't actually DO anything yet - it's just parsing arguments. So, it's not going to be leaving a child process running somewhere, etc... However, I'll DEFINITELY be looking into lsof output sometime soon. – Richard T Jun 06 '22 at 05:25
  • That was the fruits of a search for what could be going on, so it may not be the cause. The symptoms fit, but the diagnosis may not. It's been a long, long time (decade or maybe 2) since I've used NFS, so it's not something I can speak to from recent experience. – David C. Rankin Jun 06 '22 at 06:06
  • First paragraph is a bit unclear. Please clarify. Are you modifying on client or server? Are you executing on client or server? – jhnc Jun 06 '22 at 07:23
  • According to [/usr/bin/perl: bad interpreter: Text file busy](https://stackoverflow.com/q/1384398/4154375), and resources linked from it, the problem occurs because the file is open for writing when you try to execute it. Your plan to use `lsof` to investigate the problem seems like a good one. – pjh Jun 06 '22 at 13:18
  • @DavidC.Rankin Perhaps take a fresh look as I have more data - it happens WITHOUT NFS and I tried the lsof command... – Richard T Jun 06 '22 at 21:24
  • @pjh I'm pretty sure that's not it, but perhaps take a fresh look as I have more data - it happens WITHOUT NFS and I tried the lsof command... – Richard T Jun 06 '22 at 21:25
  • @RichardT, perplexing indeed. Have you tried running the program with `strace` to try to find out what is going wrong at the system call level? – pjh Jun 06 '22 at 21:34
  • Use an editor that uses the create-and-rename pattern when saving changes. Rewrite-in-place is not only bad because it causes this problem; it's _also_ bad because when interrupted it can leave a corrupt file in place of your source code! This is why tools like `rpm` update files by creating a new file under a temporary name in the same directory and then performing an atomic rename to put them in place. – Charles Duffy Jun 06 '22 at 22:37
  • As for `fuse`, you only have any reason to spend time learning about it if your file is under `/run/user/1000/gvfs`; if it's not, the warning has nothing whatsoever to do with the issue at hand. – Charles Duffy Jun 06 '22 at 22:40
  • Anyhow -- _which editor are you using when this happens?_ I'm assuming it's not one of the longstanding UNIX editors like vim or emacs -- those have been written by people who actually know the platform well enough to not make this class of mistake. – Charles Duffy Jun 06 '22 at 22:42
  • ...mind, if it's `/bin/bash` rather than your script that's open for write, that points to a larger issue, and one that your administration staff rather than yourself are arguably the right people to investigate (malware editing the binary to hide a keylogger?) – Charles Duffy Jun 06 '22 at 22:43
  • @CharlesDuffy WOW, Charles, thanks for all the attention to this! Firstly, I AM the "administration staff" - and Sr exec, too. We do Earth Science: Not much budget! (You might check out my profile.) Secondly, of course I'm using vi / vim! NO WAY I'm going to try and learn a dozen editors, and vi is ubiquitous! Thirdly, it's certainly NOT under /run! It's on a plain ole Enterprise class Gazillion Tera disk (though the disk is fairly new - a month, at most. MAYBE it's got issues?!) ... Thanks again, Charles. In all my decades at this, this is a new one on me! – Richard T Jun 06 '22 at 23:03
  • @RichardT, I'm glad Charles dropped by as you will not get more definite or experienced guidance. I have adminned 3 Linux servers for more than 20 years, and in that time I've not run into an issue like this. It was '01 - '03 the last time I employed NFS, and since then I have simply made shares available via samba/cifs and done what I needed on remote hosts via ssh or sftp. There is something you are using, either editor or NFS related, that is causing the issue. The best analogy I can give is that your edits appear to behave like writes in *"write-back"* mode, where changes are cached and written later. – David C. Rankin Jun 06 '22 at 23:04
  • When I run into this flavor of problem, the tool I typically pull out is [sysdig](https://github.com/draios/sysdig); it lets you build a full recording of everything going on on your computer that you can parse and query later, so you can say "at the time when this error message was generated, which processes had this file open for write?" -- not just a point-in-time `lsof`, but a time-traveling retrospective one with `strace` functionality built-in too. – Charles Duffy Jun 07 '22 at 02:17
  • It's worth noting, btw, that it doesn't need to be bash itself that's in the middle of being written -- any library it links in would count too, f/e. Of course, those libraries _should_ all be handled by a package manager like `rpm`, and those tools properly do the tmpfile-and-rename dance to avoid causing this problem, but if you've got something like a `LD_PRELOAD` environment variable pointing at extra libraries, that can bring them into play. (Really, _any_ location pointed to by an environment variable whose name starts with `LD` should be treated with suspicion). – Charles Duffy Jun 07 '22 at 02:20
  • BTW, any chance you might have interesting things going on in `/proc/mounts` -- unionfs filesystems or similar? (The `/run/user/1000/gvfs` filesystem is unlikely to be at fault, but if you have a _different_ userspace filesystem in play elsewhere, I might need to eat my words on the "don't need to worry about fuse" bit). – Charles Duffy Jun 07 '22 at 02:25
  • (re: @DavidC.Rankin's observation -- a writeback cache _inside the OS kernel_ shouldn't have this problem because it's at the block layer, below the VFS layer where this matters; but a writeback cache happening _inside a userspace filesystem_ could look like this, which is part of why I'm wondering if there might be a _different_ fuse mount active). – Charles Duffy Jun 07 '22 at 14:55
  • @DavidC.Rankin Thanks for the comments. I'm sure it's "something I'm using", but it's NOT "something I'm doing" such as the things you propose, at least not directly. But perhaps there's something to this new disk I've got and a different "write-back" behavior - see the following note. – Richard T Jun 07 '22 at 21:45
  • @CharlesDuffy Thanks again. A glance at sysdig suggests it's for those using "containers", but I'm not - don't need 'em. I believe in the KISS principle whenever practicable. Only rpm, yum, and dnf to install, and no LD env vars except LD_LIBRARY_PATH, needed, apparently, by Java, and it's set to the current OpenJDK stuff. Possible problem? As for mounts, only standard xfs, ext4, one vfat for /boot/efi, one swap, and all the rest is whatever magic the brains that invented Fedora Core, and then Fedora Server, came up with (e.g. tmpfs), none of which I use directly. No "userspace filesystems." – Richard T Jun 07 '22 at 21:46
  • @CharlesDuffy That said, "I don't know the first thing about a fuse mount." However, your comment prompts me to add I DO have a brand new - age measured in, oh, a couple of weeks at most - 18T Seagate Enterprise class hard drive that's directly involved here. It JUST got in service and in fact, I'm still trying to populate it with content "as we speak." I've heard about this "leaving" stuff (I think it's called) that tries to get more disk space out of thin air (in my view), but I understood those were sold as NAS drives - SURE HOPE that's NOT what they sold me as I don't want that tech. – Richard T Jun 07 '22 at 21:48
  • If it's still an ext3/ext4/xfs/btrfs/otherwise-conventional filesystem on those drives, shouldn't be a problem at all; if they're configured the normal way any implementation details for the drives are all block-layer, whereas whatever's causing this bug is VFS-layer or above. – Charles Duffy Jun 07 '22 at 22:25
  • No, sysdig isn't only for container users; they've been emphasizing that use case in their marketing, but it works perfectly well for folks who aren't using containers. – Charles Duffy Jun 07 '22 at 22:26
  • ...so, about that `LD_LIBRARY_PATH` set for the sake of your JVM -- are the directories named in it all read-only? – Charles Duffy Jun 07 '22 at 22:29
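
The create-and-rename pattern mentioned in the comments above can be sketched in shell as follows. This is only an illustration under stated assumptions: the function name `atomic_replace` is hypothetical, `chmod --reference` assumes GNU coreutils (as shipped with Fedora), and the temporary file is created in the destination's own directory so that `mv` amounts to a same-filesystem rename.

```bash
#!/bin/bash
# Hypothetical helper: replace DEST with the contents of SRC by writing a
# temporary file in DEST's directory and renaming it into place, so DEST
# itself is never opened for writing.
atomic_replace() {
    local src=$1 dest=$2 dir base tmp
    dir=$(dirname -- "$dest")
    base=$(basename -- "$dest")
    # Same directory as DEST, so the final mv is a rename, not a copy.
    tmp=$(mktemp "$dir/.$base.XXXXXX") || return 1
    if cat -- "$src" > "$tmp" &&
       { [[ ! -e $dest ]] || chmod --reference="$dest" -- "$tmp"; } &&
       mv -f -- "$tmp" "$dest"; then
        return 0
    fi
    # Something failed: clean up the temporary file and report the error.
    rm -f -- "$tmp"
    return 1
}
```

For example, `atomic_replace ./MyNewScript.draft /scripts/MyNewScript` (the draft path is made up) would leave `/scripts/MyNewScript` pointing at a fresh inode; a writer still holding the old inode open would then, at least in principle, no longer stand in the way of executing the new file.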