10

EDIT: I totally forgot about this thread. It turns out I had a bad hard disk. We had to redeploy this server for other needs so I finally got around to replacing the one bad disk and we're back in business.

For a few weeks now I haven't been able to figure out why I can't delete one particular file. As root I can, but my shell script runs as a different user. When I run ls -la, the file isn't listed. However, if I pass its name as an argument, it shows up! Sure enough, the owner is root, hence the script isn't able to delete it.

Notice, 6535 is missing ...

[root@server]# ls -la 653*
-rw-rw-r--  1 svn svn  24002 Mar 26 01:00 653
-rw-rw-r--  1 svn svn   7114 Mar 26 01:01 6530
-rw-rw-r--  1 svn svn   8653 Mar 26 01:01 6531
-rw-rw-r--  1 svn svn   6836 Mar 26 01:01 6532
-rw-rw-r--  1 svn svn   3308 Mar 26 01:01 6533
-rw-rw-r--  1 svn svn   3918 Mar 26 01:01 6534
-rw-rw-r--  1 svn svn   3237 Mar 26 01:01 6536
-rw-rw-r--  1 svn svn   3195 Mar 26 01:01 6537
-rw-rw-r--  1 svn svn  27725 Mar 26 01:01 6538
-rw-rw-r--  1 svn svn 263473 Mar 26 01:01 6539

Now it shows up if you call it directly.

[root@server]# ls -la 6535
-rw-rw-r--  1 root root 3486 Mar 26 01:01 6535

Here's something interesting. I caught this issue because my shell script would fail to delete 6535, since it is owned by root. The file only shows up after I run "rm -rf ." I tried that earlier and it failed to remove the directory, telling me the directory isn't empty. I went in and looked and, sure enough, file "6535" finally showed up. No idea why it's doing this.

strace says the following

#strace ls -la 653* 2>&1 | grep ^open

open("/etc/ld.so.cache", O_RDONLY)      = 3
open("/lib64/tls/librt.so.1", O_RDONLY) = 3
open("/lib64/libacl.so.1", O_RDONLY)    = 3
open("/lib64/libselinux.so.1", O_RDONLY) = 3
open("/lib64/tls/libc.so.6", O_RDONLY)  = 3
open("/lib64/tls/libpthread.so.0", O_RDONLY) = 3
open("/lib64/libattr.so.1", O_RDONLY)   = 3
open("/etc/selinux/config", O_RDONLY)   = 3
open("/proc/mounts", O_RDONLY)          = 3
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
open("/proc/filesystems", O_RDONLY)     = 3
open("/usr/share/locale/locale.alias", O_RDONLY) = 3
open("/usr/share/locale/en_US.UTF-8/LC_TIME/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US.utf8/LC_TIME/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US/LC_TIME/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.UTF-8/LC_TIME/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.utf8/LC_TIME/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_TIME/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/nsswitch.conf", O_RDONLY)    = 3
open("/etc/ld.so.cache", O_RDONLY)      = 3
open("/lib64/libnss_files.so.2", O_RDONLY) = 3
open("/etc/passwd", O_RDONLY)           = 3
open("/etc/group", O_RDONLY)            = 3
open("/etc/mtab", O_RDONLY)             = 3
open("/proc/meminfo", O_RDONLY)         = 3
open("/etc/localtime", O_RDONLY)        = 3
sdot257

7 Answers

7

That's a bit worrisome. I'd verify that your ls binary wasn't modified by comparing it to a known-good copy. You could use your distribution's package tools to verify the file on an isolated system.

Warner
6

Sometimes filenames get odd characters in them such as cursor movement sequences. Try this to make sure:

ls -lq

It should show question marks instead of control characters (it's probably the default, but it might not be).

This partially demonstrates the type of problem that may be present:

touch A C
touch B$(tput cuu1)$'\r'
ls -l
ls -lq
ls -l --show-control-chars    # for systems that have that option and default to -q
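
The same check can be made without going through ls at all. What follows is only a minimal sketch, assuming a Python 3 interpreter is available on the box; the helper name `suspicious_names` is made up for illustration. It reads the directory with `os.listdir` and reports every name containing a non-printable character, shown through `repr` so that escape sequences become visible.

```python
import os


def suspicious_names(directory="."):
    """Return repr() of every entry whose name contains a non-printable character."""
    flagged = []
    for name in os.listdir(directory):
        # str.isprintable() is False for control characters such as \r or ESC.
        if not name.isprintable():
            flagged.append(repr(name))
    return sorted(flagged)


if __name__ == "__main__":
    for shown in suspicious_names("."):
        print(shown)
```

An empty result suggests the names themselves are clean and the problem lies elsewhere.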

I would also try:

type -a ls
alias ls
declare -f ls
md5sum /bin/ls    # compare to a known-good identical system

to see if an alias or function is defined or to see if a binary is in an odd place or has been modified.
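
If you would rather not rely on the local md5sum binary, a checksum can also be computed from a scripting language. The sketch below is an illustration only: it uses SHA-256 from Python's `hashlib` rather than MD5, the function name `file_sha256` is hypothetical, and of course the interpreter on a compromised machine is no more trustworthy than md5sum.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys
    # Pass the paths to check as arguments, e.g. /bin/ls.
    for path in sys.argv[1:]:
        print(file_sha256(path), path)
```

Run it with /bin/ls as the argument on both the suspect machine and a known-good identical system, then compare the two digests.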

Dennis Williamson
  • +1 good insight. It's important to note that if `ls` were modified, the `md5sum` on the system could have potentially been modified too. You need a known sane environment to verify on to reach a definitive conclusion. – Warner Mar 26 '10 at 18:47
  • I've found that even if md5 has been altered to produce bogus results if you do things like 'md5 file' you can still get good results (if the md5 program works at all) by doing something like bzip2 < file | md5 and compare *that* to the same command elsewhere. – chris Mar 28 '10 at 13:02
3

You may want to fsck that volume.

Florin Andrei
2

I usually do something like this if I believe 'ls' has been modified...

python -c "import os; print(os.listdir('.'))"

Of course Python, the C Library, the kernel, or the file system could also be modified, but usually it's just the shell utils.
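
The symptom in the question (a name that resolves when asked for directly but is absent from the listing) can be tested for in the same spirit. This is a rough sketch, not a definitive tool: the function name `unlisted_but_present` is invented here, and the candidate names 6530 to 6539 are simply taken from the listing in the question.

```python
import os


def unlisted_but_present(directory, candidates):
    """Return candidate names that lstat() finds but os.listdir() does not report."""
    listed = set(os.listdir(directory))
    missing_from_listing = []
    for name in candidates:
        # "." and ".." always resolve and are never returned by os.listdir().
        if name in listed or name in (".", ".."):
            continue
        try:
            os.lstat(os.path.join(directory, name))
        except OSError:
            continue  # truly absent (or inaccessible), not the symptom we want
        missing_from_listing.append(name)
    return missing_from_listing


if __name__ == "__main__":
    # Candidate names 6530..6539 mirror the listing in the question.
    names = [str(n) for n in range(6530, 6540)]
    print(unlisted_but_present(".", names))
```

On a healthy filesystem this should print an empty list. Anything else means the directory listing and by-name lookups disagree, which points below the level of the shell utilities.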

McJeff
  • Or, you could use the shell's filename expansion to read the directory -- echo * ( and if you want everything, echo * .* ) – chris Mar 26 '10 at 19:36
  • `*.*` is only going to show you files that have character(s) followed by a dot followed by character(s). This is definitely not everything on a *nix system. I'm not sure echo will show you everything in one command; I was able to do it with `echo * && echo .*` – einstiien Mar 26 '10 at 19:47
  • If you look closely, it is * (space) .* , not *.* Punctuation is not the strong suit of this commenting system... And, echo is perfectly happy to expand as many expressions separated by the "$IFS" as you care to feed it. The boolean && is not necessary, nor does it really make much sense, because && is a boolean *and* and will always work because the echo command is always successful. – chris Mar 26 '10 at 20:12
  • @chris: my bad, really hard to see that. – einstiien Mar 26 '10 at 22:22
2

You can look into exactly what ls is doing by using strace, and that may tell you why it is avoiding showing that filename.

strace ls -la 653* 2>&1 | less

Look through that and see what's going on. To narrow it down to just the files being opened:

strace ls -la 653* 2>&1 | grep ^open

The output will look like this:

open("/etc/ld.so.cache", O_RDONLY)      = 3
open("/lib/librt.so.1", O_RDONLY)       = 3
open("/lib/libacl.so.1", O_RDONLY)      = 3
open("/lib/libselinux.so.1", O_RDONLY)  = 3
open("/lib/libc.so.6", O_RDONLY)        = 3
open("/lib/libpthread.so.0", O_RDONLY)  = 3
open("/lib/libattr.so.1", O_RDONLY)     = 3
open("/lib/libdl.so.2", O_RDONLY)       = 3
open("/lib/libsepol.so.1", O_RDONLY)    = 3
open("/etc/selinux/config", O_RDONLY|O_LARGEFILE) = 3
open("/proc/mounts", O_RDONLY|O_LARGEFILE) = 3
open("/selinux/mls", O_RDONLY|O_LARGEFILE) = 3

and if you see something like

open("/var/tmp/.../H@ckl1st", O_RDONLY) = 3

be careful, you've been 0wned...

This isn't a conclusive test, but it is a good indicator...

(if you're using solaris or other OSs, you may need to use truss, or some other similar utility instead of strace)

(if you're using a csh/tcsh derived shell, you'll likely need different redirection statements)
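
Scanning a long trace by eye gets tedious, so the filtering can be scripted. The following is only a sketch under stated assumptions: the list of "expected" prefixes is my own guess based on the sample output above, the function name `unexpected_opens` is hypothetical, and the pattern only matches `open(` lines (newer systems may log `openat(...)` instead, which it does not handle).

```python
import re

# Assumed set of "expected" locations, taken from the paths in the sample output.
EXPECTED_PREFIXES = ("/etc/", "/lib/", "/lib64/", "/usr/", "/proc/", "/selinux/")

# Captures the quoted path from a line such as: open("/etc/passwd", O_RDONLY) = 3
OPEN_LINE = re.compile(r'^open\("([^"]*)"')


def unexpected_opens(strace_lines, expected=EXPECTED_PREFIXES):
    """Return paths from open() lines that do not start with an expected prefix."""
    flagged = []
    for line in strace_lines:
        match = OPEN_LINE.match(line)
        # Relative paths are flagged too, since they match none of the prefixes.
        if match and not match.group(1).startswith(expected):
            flagged.append(match.group(1))
    return flagged


if __name__ == "__main__":
    import sys
    # Each argument is a file holding saved strace output.
    for filename in sys.argv[1:]:
        with open(filename) as trace:
            for path in unexpected_opens(trace):
                print(path)
```

Save the trace to a file first (for example with the redirection shown above) and pass that file as the argument. Like the manual check, a clean result is an indicator, not proof.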

chris
  • I like this. The `strace` utility really is a swiss army knife. You get right to the system call level and bypass a whole pile of arbitrary complication. It's one of the first things any system admin. worth a dime ought to dump on a newly installed machine. – McJeff Mar 26 '10 at 20:30
  • Yeah -- the two most valuable tools for a systems administrator are truss / strace and tcpdump. With these, you can *always* look under the covers to see wtf is going on when something is or isn't behaving the way you expect. – chris Mar 26 '10 at 20:52
2

Quick update, we had to replace the server for other reasons. It was the filesystem. All is well now!!! Thank you everyone.

sdot257
0

The hack theory is interesting, but I have an alternative theory. Unix file deletion semantics will keep the file around until all processes have closed open file handles pointing at it. Perhaps someone has paused an SVN checkout / commit, or a server thread hung up. If restarting the SVN process (or Apache) solves your problem, this is where I'd place the blame.

Perhaps you can identify the process still using this file with lsof | grep 6535?

jldugger
  • yea lsof didn't show anything. The interesting thing is, 6535 is also "missing" in the source. My script does a hotcopy of the original repo to another directory. That's when I ran into issues with not being able to delete this particular file for some reason. – sdot257 Mar 26 '10 at 20:21
  • A deleted but open file won't keep you from deleting the containing directory because once that directory entry is deleted, the directory entry won't exist so the directory *will* be empty. You won't get the space back from the file until the process that has it open is killed, but the directory that had that file can now be deleted. – chris Mar 26 '10 at 20:24
  • Your alternate theory is interesting. The removal, if successful, would instantly remove the hard link. The inode would likely still contain the data and some file handles may have it cached in memory, but I do not believe this scenario could explain the described behavior. Or, what chris said, heh. – Warner Mar 26 '10 at 21:03
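
The point made in the comments (an open-but-deleted file does not stop its directory from being removed) is easy to check on a POSIX system. Here is a small sketch, assuming Python 3 on Linux or another Unix; the function name `demo_unlink_while_open` is made up, and the file name is borrowed from the question purely for flavor.

```python
import os
import tempfile


def demo_unlink_while_open():
    """Delete a file that is still open, then remove its now-empty directory."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "6535")  # name borrowed from the question

    with open(path, "w+") as handle:
        handle.write("still here")
        handle.flush()

        os.unlink(path)                      # removes the directory entry only
        entries_after_unlink = os.listdir(directory)
        os.rmdir(directory)                  # succeeds: the directory is empty

        handle.seek(0)
        data = handle.read()                 # the open handle can still read the data

    return entries_after_unlink, os.path.exists(directory), data


if __name__ == "__main__":
    print(demo_unlink_while_open())
```

The directory listing is empty right after the unlink and the rmdir goes through, while the data stays readable through the open handle. So an open handle alone would not produce a "directory not empty" error.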