Shell Script - list files, read files and write data to new file

Question

I have a specific question about shell scripting.
Simple scripting is no problem for me, but I am new to this and want to build myself a simple database file.

So, what I want to do is:

- Search for files of a given type (e.g. .nfo) <-- should be no problem :)
- Read each file that was found and extract some strings from it
- Write the strings from each file to a new file. The information from each found file should be one row in the new file

I hope I explained my "project" well.

My problem now is understanding how to tell the script to search for files and then read each of those files, extract some information from it, and write that to a new file.

Let me explain a bit better.
I search for files, and that gives me back:

file1.nfo
file2.nfo
file3.nfo

OK, now from each of those files I need the information between two tags, e.g.
file1.nfo:

<user>test1</user>

file2.nfo:

<user>test2</user>

So the new file should then contain:

file1.nfo:test1
file2.nfo:test2

OK so:

find -name *.nfo  > /test/database.txt

prints the list of files, and

sed -n '/<user*/,/<\/user>/p' file1.nfo

gives me back the complete file and not only the information between <user> and </user>.

I am trying to proceed step by step and I am reading a lot, but it seems to be very difficult.

What am I doing wrong, and what would be the best way to list all files and write the file names and the content between two strings to a file?

EDIT-NEW:

OK, here is an update with more information. I have learned a lot by now and searched the web for my problems. I can find a lot of information, but I don't know how to put it together so that I can use it.

What works now with awk is that I get back the filename and the string.

Here is the complete information (I thought I could go on by myself with a bit of help, but I can't :( )

Here is an example of: /test/file1.nfo

<string1>STRING 1</string1>
<string2>STRING 2</string2>
<string3>STRING 3</string3>
<string4>STRING 4</string4>
<personal informations>
<hobby>Baseball</hobby>
<hobby>Basketball</hobby>
</personal informations>

Here is an example of /test/file2.nfo

<string1>STRING 1</string1>
<string2>STRING 2</string2>
<string3>STRING 3</string3>
<string4>STRING 4</string4>
<personal informations>
<hobby>Soccer</hobby>
<hobby>Traveling</hobby>
</personal informations>

The file I want to create has to look like this:

STRING 1:::/test/file1.nfo:::Date of file:::STRING 4:::STRING 3:::Baseball, Basketball:::STRING 2
STRING 1:::/test/file2.nfo:::Date of file:::STRING 4:::STRING 3:::Soccer, Traveling:::STRING 2

"Date of file" should be the creation date of the file, so that I can see how old the file is.

So that's what I need, and it does not seem easy.

Thanks a lot.

UPDATE ERROR -printf

find: unrecognized: -printf

Usage: find [PATH]... [OPTIONS] [ACTIONS]

Search for files and perform actions on them.
First failed action stops processing of current file.
Defaults: PATH is current directory, action is '-print'

    -follow         Follow symlinks
    -xdev           Don't descend directories on other filesystems
    -maxdepth N     Descend at most N levels. -maxdepth 0 applies
                    actions to command line arguments only
    -mindepth N     Don't act on first N levels
    -depth          Act on directory *after* traversing it

Actions:
    ( ACTIONS )     Group actions for -o / -a
    ! ACT           Invert ACT's success/failure
    ACT1 [-a] ACT2  If ACT1 fails, stop, else do ACT2
    ACT1 -o ACT2    If ACT1 succeeds, stop, else do ACT2
                    Note: -a has higher priority than -o
    -name PATTERN   Match file name (w/o directory name) to PATTERN
    -iname PATTERN  Case insensitive -name
    -path PATTERN   Match path to PATTERN
    -ipath PATTERN  Case insensitive -path
    -regex PATTERN  Match path to regex PATTERN
    -type X         File type is X (one of: f,d,l,b,c,...)
    -perm MASK      At least one mask bit (+MASK), all bits (-MASK),
                    or exactly MASK bits are set in file's mode
    -mtime DAYS     mtime is greater than (+N), less than (-N),
                    or exactly N days in the past
    -mmin MINS      mtime is greater than (+N), less than (-N),
                    or exactly N minutes in the past
    -newer FILE     mtime is more recent than FILE's
    -inum N         File has inode number N
    -user NAME/ID   File is owned by given user
    -group NAME/ID  File is owned by given group
    -size N[bck]    File size is N (c:bytes,k:kbytes,b:512 bytes(def.))
                    +/-N: file size is bigger/smaller than N
    -links N        Number of links is greater than (+N), less than (-N),
                    or exactly N
    -prune          If current file is directory, don't descend into it
If none of the following actions is specified, -print is assumed
    -print          Print file name
    -print0         Print file name, NUL terminated
    -exec CMD ARG ; Run CMD with all instances of {} replaced by
                    file name. Fails if CMD exits with nonzero
    -delete         Delete current file/directory. Turns on -depth option

UNIX simply does not store creation dates of files so you can only get a file's creation date/time if you have some other tool that records that info when the file is created. Is there some other date that you're interested in (e.g. last modification date)? – Ed Morton, Apr 05 '13 at 16:45
Yes, the modification date is also very good, it may even be better than the creation date. Or the last copy date or something like that. – Thomas, Apr 05 '13 at 17:32

Thor · Answer 1 · 2013-04-04T13:02:36.933

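The pat1,pat2 range notation of sed is line-based. Think of it like this: pat1 sets an enable flag for its commands and pat2 clears the flag, but sed only starts looking for pat2 on the line after the one that matched pat1. If both patterns are on the same line, the flag is set and not cleared on that line, so in your case sed prints the <user> line and everything following it (up to the next line containing </user>, or the end of the file). See grymoire's sed howto for more.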
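
To see the line-based behavior in a small sketch (the sample file and variable names below are made up for illustration): the range form keeps printing past the <user> line, while a substitution with a capture group works within a single line and prints only the text between the tags.

```shell
#!/bin/sh
# Hypothetical sample file, created only for this demonstration.
tmpdir=$(mktemp -d)
printf '%s\n' 'header' '<user>test1</user>' 'footer' > "$tmpdir/file1.nfo"

# Range form: <user> opens the range, but </user> is only looked for on
# later lines, so printing continues to the end of the file.
range_out=$(sed -n '/<user>/,/<\/user>/p' "$tmpdir/file1.nfo")

# Single-line form: capture the text between the tags and print only that.
subst_out=$(sed -n 's:.*<user>\(.*\)</user>.*:\1:p' "$tmpdir/file1.nfo")

printf 'range form:\n%s\n' "$range_out"
printf 'substitution form:\n%s\n' "$subst_out"
rm -rf "$tmpdir"
```

The substitution uses only basic regular expressions, so it should not depend on GNU extensions.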

An alternative to sed, in this case, would be to use a grep that supports look-around assertions, e.g. GNU grep:

find . -type f -name '*.nfo' | xargs grep -oP '(?<=<user>).*(?=</user>)'

If grep doesn't support -P, you can use a combination of grep and sed:

find . -type f -name '*.nfo' | xargs grep -o '<user>.*</user>' | sed 's:</\?user>::g'

Output:

./file1.nfo:test1
./file2.nfo:test2

Note, you should be aware of the issues involved with passing files on to xargs and perhaps use -exec ... instead.
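
As a sketch of the -exec variant (the sample files are invented, and one name contains a space on purpose): find hands the file names to grep directly as arguments, so nothing re-splits them on whitespace the way a plain xargs pipeline would. The -o and -H options of grep are common (GNU, BSD, BusyBox) but not required by POSIX, and if your find lacks the `{} +` form, `{} \;` is the slower fallback.

```shell
#!/bin/sh
# Hypothetical sample files; one name contains a space on purpose.
tmpdir=$(mktemp -d)
printf '%s\n' '<user>test1</user>' > "$tmpdir/file1.nfo"
printf '%s\n' '<user>test2</user>' > "$tmpdir/my file.nfo"

# find passes the names straight to grep as arguments, so the space is safe.
# -H forces the file name prefix even when only one file is found.
# </*user> matches both the opening and the closing tag in basic regex.
result=$(cd "$tmpdir" &&
    find . -type f -name '*.nfo' -exec grep -H -o '<user>.*</user>' {} + |
    sed 's:</*user>::g' | sort)

printf '%s\n' "$result"
rm -rf "$tmpdir"
```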

edited Apr 04 '13 at 13:02

answered Apr 04 '13 at 11:57

Thank you. But it seems that the -P option is not implemented in my environment here: grep: invalid option -- 'P' – Thomas Apr 04 '13 at 12:41
find . -type f -name '*.nfo' | xargs grep -o '<user>.*</user>' | sed 's:</\?user>::g' ---- This is working. I get back the user. Now I have to play around with it. Thank you very much – Thomas Apr 04 '13 at 15:08
But it only gives me back "test1" and not the filename. I also need some more parts from inside the nfo files, but I will play around and search here. I am sure I will find solutions that have already been posted. Thanks – Thomas Apr 04 '13 at 15:11
Update again. It is working with path/filename and content. Thanks again – Thomas Apr 04 '13 at 15:22

CSᵠ · Answer 2 · 2013-04-04T13:34:33.817

It so happens that grep outputs in the format you need, and it is enough for a one-liner.

By default a grep '' *.nfo will output something like:

file1.nfo:random data  
file1.nfo:<user>test1</user>  
file1.nfo:some more random data  
file2.nfo:not needed  
file2.nfo:<user>test2</user>  
file2.nfo:etc etc

By giving grep an actual pattern, here with the -P option (Perl RegEx), you can restrict the output to the matching lines only:

grep -P "<user>\w+<\/user>" *.nfo

output:

file1.nfo:<user>test1</user>  
file2.nfo:<user>test2</user>

Now the -o option (only show what matched) saves the day, but we'll need a slightly more advanced RegEx since the tags are not needed:

grep -oP "(?<=<user>)\w+(?=<\/user>)" *.nfo > /test/database.txt

output of cat /test/database.txt:

file1.nfo:test1 
file2.nfo:test2

Explained RegEx here: http://regex101.com/r/oU2wQ1

And your whole script just became a single command.

Update:

If you don't have the --perl-regexp option try:

grep -oE "<user>\w+<\/user>" *.nfo | sed -E 's#</?user>##g' > /test/database.txt

`-P` and `--perl-regexp` are synonyms. You can use them interchangeably. — rioki, Apr 04 '13 at 13:10
`\w` is an abbreviation only understood by some tools. Try the POSIX equivalent `[[:alnum:]_]` — Ed Morton, Apr 04 '13 at 21:49
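
A minimal sketch of that suggestion, assuming a grep that supports -o (GNU, BSD and BusyBox do, though POSIX does not require it); the sample file is invented. The bracket expression replaces `\w`, and `</*user>` removes both tags without needing the `?` operator in sed.

```shell
#!/bin/sh
# Hypothetical sample file, created only for this demonstration.
tmpdir=$(mktemp -d)
printf '%s\n' 'not needed' '<user>test_2</user>' 'etc etc' > "$tmpdir/file2.nfo"

# [[:alnum:]_] is the POSIX spelling of \w; </*user> matches both tags
# without relying on the ? operator.
posix_out=$(grep -oE '<user>[[:alnum:]_]+</user>' "$tmpdir/file2.nfo" |
    sed 's#</*user>##g')

printf '%s\n' "$posix_out"
rm -rf "$tmpdir"
```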

Ed Morton · Accepted Answer · 2013-04-05T18:31:18.113

All you need is:

find -name '*.nfo' | xargs awk -F'[><]' '{print FILENAME,$3}'

If you have more in your file than just what you show in your sample input then this is probably all you need:

... awk -F'[><]' '/<user>/{print FILENAME,$3}' file
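
To make the field numbering concrete, here is a small sketch (the sample line is taken from the question, the loop is only for illustration) of how -F'[><]' splits a tagged line: the empty text before the first < is $1, the tag name is $2, and the text between the tags is $3.

```shell
#!/bin/sh
# Print each field that awk sees when both < and > act as separators.
fields=$(printf '%s\n' '<user>test1</user>' |
    awk -F'[><]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }')

printf '%s\n' "$fields"
```

This is why the commands above print $3 to get the user name.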

Try this (untested):

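> outfile
# %p is the path and %Tc the last modification time (GNU find only).
# No IFS= on read here: it must split each line into the path and the rest.
find -name '*.nfo' -printf "%p %Tc\n" |
while read -r fname tstamp
do
      awk -v tstamp="$tstamp" -F'[><]' -v OFS=":::" '
          # $2 is the tag name, $3 its text; repeated tags are joined with ", "
          { a[$2] = a[$2] sep[$2] $3; sep[$2] = ", " }
          END {
              print a["string1"], FILENAME, tstamp, a["string4"], a["string3"], a["hobby"], a["string2"]
          }
      ' "$fname" >> outfile
done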

The above will only work if your file names do not contain spaces. If they can, we'd need to tweak the loop.

Alternative if your find doesn't support -printf (suggestion - seriously consider getting a modern "find"!):

> outfile
find -name '*.nfo' -print |
while IFS= read -r fname
do
      # %y is the last modification time (%x would be the last access time).
      tstamp=$(stat -c"%y" "$fname")
      awk -v tstamp="$tstamp" -F'[><]' -v OFS=":::" '
          { a[$2] = a[$2] sep[$2] $3; sep[$2] = ", " }
          END {
              print a["string1"], FILENAME, tstamp, a["string4"], a["string3"], a["hobby"], a["string2"]
          }
      ' "$fname" >> outfile
done

If you don't have "stat" then google for alternatives to get a timestamp from a file or consider parsing the output of ls -l - it's unreliable but if it's all you've got...
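
One such alternative, as a sketch: GNU and BusyBox date accept -r FILE to print a file's last modification time, with the usual + format string controlling the layout. On BSD and macOS, date -r expects a number of seconds instead, so this does not carry over there. The file name and the chosen format are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical sample file, created only for this demonstration.
tmpdir=$(mktemp -d)
fname="$tmpdir/file1.nfo"
: > "$fname"
# Give the sample file a known modification time (local time zone).
touch -t 201304051831 "$fname"

# -r FILE makes date report the file's mtime instead of the current time.
tstamp=$(date -r "$fname" '+%Y-%m-%d %H:%M')

printf '%s:::%s\n' "$fname" "$tstamp"
rm -rf "$tmpdir"
```

Inside the loop above, the same date call could stand in for the stat line.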

edited Apr 05 '13 at 18:31

answered Apr 04 '13 at 21:24

Thank you, that works well, too. But there is more inside the file. I need to select some parts and print them out to a file. But awk seems to be good, I will have a look at the documentation. – Thomas Apr 05 '13 at 11:58
I tweaked the script based on a guess at what your input files might contain - if you update the sample input in your question to be more representative of your real input, I'll show you how to do it in awk. – Ed Morton Apr 05 '13 at 12:52
OK, your edit is very good now. I think I will now put into my first post exactly what I am searching for. But I have already learned a lot about find, grep and now awk :) – Thomas Apr 05 '13 at 13:12
@Thomas - could you try again? Your question is now completely unclear and confusing. You now show a file with some tags and then other files with a bunch of colons with some of the text from between the tags in no perceivable order, and some PATH/FILENAME text that comes out of nowhere, etc... Just show a couple of sample input files and the output you want to get from them with a description of why that should be the output. – Ed Morton Apr 05 '13 at 13:57
I posted an updated solution. Note that you CANNOT get the creation time of a file from UNIX so I'm using the last modification time instead in my answer. – Ed Morton Apr 05 '13 at 17:01
Wow. I would never have come up with this solution :) It seems that -printf is not possible on my system: find: unrecognized: -printf – Thomas Apr 05 '13 at 17:40
I added the error to the first post. Maybe there is an alternative to -printf? – Thomas Apr 05 '13 at 17:44
The alternative to getting the timestamp from find is to change the find -printf to -print and then get the timestamp after the find by running your tool of choice on "$fname" within the loop before calling awk. I'll tweak my answer to show you what I mean. – Ed Morton Apr 05 '13 at 17:49
Yes, you are right, stat is not available. But I will try to find a solution here. Without the timestamp it works absolutely perfectly. Amazing! Thank you very very very much. – Thomas Apr 05 '13 at 17:58
Update. I just installed stat :-) – Thomas Apr 05 '13 at 17:59
Great, now go install find :-). If you're happy with this solution remember to click the check mark next to it so others don't waste their time trying to help you find a solution when you already have it. – Ed Morton Apr 05 '13 at 18:08
OK, I will do. "find" is already installed, but -printf is not working. But the solution with "stat" is working. Not the time format I need, but I will find a solution for this. Another question: if there are dots "." in the filename it isn't working. Message: find: unrecognized: this.is.a.test.3.nfo – Thomas Apr 05 '13 at 18:19
Try quoting the pattern so it's `'*.nfo'` instead of just `*.nfo`. I'll update my answer. – Ed Morton Apr 05 '13 at 18:30
It could be so simple. :-) Thanks, the '' is the solution. I tried "" and that was not OK. The basics are now OK, and I will mark the question. Thank you very much, now I need to search the web for other problems and maybe ask a new question. Again, thank you very much. – Thomas Apr 05 '13 at 18:36
