I have a text file with thousands of lines. The last 7 characters on each line are a mix of letters and numbers (eg AAP8945 or GGR6645). I want to save these in a separate file.

Excuse the noob question, but I can't work it out.

Gemma
  • @oguzismail, you'd need `-o` too, wouldn't you? – Jonathan Leffler Mar 10 '20 at 05:30
  • Welcome to Stack Overflow. SO is a question and answer site for professional and enthusiast programmers. Please add some code of your own to your question to show at least the research effort you made to solve this yourself. – Cyrus Mar 10 '20 at 05:33
  • If you have GNU grep then @JonathanLeffler has the right answer because it uses the tool you asked for. Do you have GNU grep or not? If not then you could use sed: `sed -e 's/^.*\(.......\)$/\1/'` – Jerry Jeremiah Mar 10 '20 at 05:35
  • Does this answer your question? [Is there a cleaner way of getting the last N characters of every line?](https://stackoverflow.com/questions/24427009/is-there-a-cleaner-way-of-getting-the-last-n-characters-of-every-line) – kvantour Mar 10 '20 at 12:26

2 Answers

With GNU grep

Assuming you have GNU grep:

    grep -o -E '.{7}$' input > output

The `-o` option means 'output only what matches' (rather than the whole line). This is the key feature that makes it possible to use grep for the job. Without support for `-o` (or an equivalent option), grep is the wrong tool for the job.

The `-E` option enables extended regular expressions, so `.{7}` matches any character exactly 7 times and `$` anchors that match at the end of the line.
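As a quick check of the behaviour described above, here is a minimal sketch run against made-up data. The temporary directory, the file names, and the sample lines are assumptions for illustration only, and it presumes a grep that supports `-o`.

```shell
# Hypothetical sample data; the directory and file names are placeholders.
tmpdir=$(mktemp -d)
printf '%s\n' 'widget AAP8945' 'gadget GGR6645' 'abc' > "$tmpdir/input"

# -o prints only the matched text; '.{7}$' is the last 7 characters of a line.
grep -o -E '.{7}$' "$tmpdir/input" > "$tmpdir/output"

# Show what ended up in the output file, then clean up.
codes=$(cat "$tmpdir/output")
printf '%s\n' "$codes"
rm -rf "$tmpdir"
```

The 3-character line `abc` cannot match `.{7}$`, so grep writes nothing for it and only the two 7-character codes reach the output file.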

Without GNU grep

If you don't have GNU grep (or a compatible grep with the `-o` option or equivalent), then you can use sed instead (GNU or any other variant):

    sed -e 's/.*\(.\{7\}\)$/\1/' input > output

This matches the leading part of the line (`.*`) and captures the last 7 characters (`\(…\)`); it replaces the whole line with the captured part and prints the result. If your variant of sed supports extended regular expressions (usually `-E`, sometimes `-r`), then:

    sed -E -e 's/.*(.{7})$/\1/' input > output

The difference is in the number of backslashes needed.

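Both of those print lines shorter than 7 characters unchanged, because the substitution does not match them and sed prints every line by default. If those lines should be omitted, suppress the default output with `-n` and print only the lines where the substitution succeeded with the `p` flag:

    sed -n -e 's/.*\(.\{7\}\)$/\1/p' input > output
    sed -n -E -e 's/.*(.{7})$/\1/p' input > output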
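To see the difference between the plain substitution and the `-n` ... `p` form, here is a small sketch on invented data. The temporary directory, file name, and sample lines are assumptions, not part of the question.

```shell
# Hypothetical sample data with one line shorter than 7 characters.
tmpdir=$(mktemp -d)
printf '%s\n' 'widget AAP8945' 'abc' 'gadget GGR6645' > "$tmpdir/input"

# Default mode: the substitution fails on 'abc', so sed prints it unchanged.
all_lines=$(sed -e 's/.*\(.\{7\}\)$/\1/' "$tmpdir/input")

# -n suppresses automatic printing; the p flag prints only substituted lines.
only_codes=$(sed -n -e 's/.*\(.\{7\}\)$/\1/p' "$tmpdir/input")

printf 'default:\n%s\n\nwith -n and p:\n%s\n' "$all_lines" "$only_codes"
rm -rf "$tmpdir"
```

In the first result the short line `abc` appears between the two codes; in the second it is gone.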
Jonathan Leffler
  • Or simple BRE - `grep -o '.......$'` (but who likes to count `'.'`) – David C. Rankin Mar 10 '20 at 05:47
  • One, two, many — or is that too many? That still requires the `-o` option which is not universal (not being part of POSIX [`grep`](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html), AFAIK). – Jonathan Leffler Mar 10 '20 at 05:52
  • @DavidC.Rankin you can say `grep -o '.\{7\}$'` with BRE. – tshiono Mar 10 '20 at 05:53
  • Wow! Thank you. Learning occurred. I didn't think BRE would allow repetition in that manner. – David C. Rankin Mar 10 '20 at 05:54
  • Again, it depends on the dialect of `grep`, @tshiono. With many versions, you can do as you suggest; with others, you can't. (Checking: POSIX [BRE](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html) does require `\(…\)` to work in BRE; it was not always thus, but POSIX 1997 required support for it.) Probably if the `grep` supports `-o`, it supports the `\{7\}` count notation. It all gets tricky when it comes to writing portable code. – Jonathan Leffler Mar 10 '20 at 05:55
  • On doing some further research (in the rationale for POSIX.2 1992), BRE in POSIX was always defined with `\(…\)` support, but with a note that some historic implementations did not include it. Unfortunately for me, I learned the BRE notation before POSIX.2, and the support varied back in the days of yore. These days, it'll be available pretty much everywhere. – Jonathan Leffler Mar 10 '20 at 06:06

    grep -Eo '.{7}$'

Or without grep:

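    rev input | cut -c -7 | rev > output

The double `rev` is necessary because `cut` cannot address character positions counted from the end of the line: the first `rev` moves the last 7 characters to the front, `cut -c -7` keeps them, and the second `rev` restores their original order.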
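The pipeline can be traced step by step on invented data. This is only a sketch: the temporary directory, file name, and sample lines are assumptions, and it relies on `rev` being installed (it is not a POSIX utility).

```shell
# Hypothetical sample data, including one line shorter than 7 characters.
tmpdir=$(mktemp -d)
printf '%s\n' 'widget AAP8945' 'abc' > "$tmpdir/input"

# Step 1: reverse each line so the trailing code moves to the front.
reversed=$(rev "$tmpdir/input")

# Steps 2 and 3: keep characters 1-7, then reverse again to restore the order.
codes=$(rev "$tmpdir/input" | cut -c -7 | rev)

printf 'reversed:\n%s\n\nresult:\n%s\n' "$reversed" "$codes"
rm -rf "$tmpdir"
```

Note that `cut -c -7` passes a shorter line through whole, so `abc` survives here, just as it does with the plain `sed` substitution.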

user1934428