
I wrote an awk command to deduplicate a .csv file. I'm running Ubuntu 20.04. This is the command:

```bash
awk -F, ' {key = $2 FS} !seen[key]++' gigs.csv > try.csv
```

I don't want to have to type it all the time, so I made an alias for it in ~/.bash_aliases as follows:

```bash
alias dedupe="awk -F, ' {key = $2 FS} !seen[key]++' gigs.csv > try.csv"
```

However, when I run `dedupe` in my terminal, it produces only one line, which is not what I get when I type out the full command. The full command produces the desired results. Did I make a mistake with the alias? Why does this happen, and how can I resolve it?

Here is a sample from the original .csv file:

```
Tue 30 Aug 08:34:17 AM,Do you use facebook? work remote from home. we are hiring!,https://atlanta.craigslist.org/atl/cpg/d/atlanta-do-you-use-facebook-work-remote/7527729597.html
Mon 29 Aug 03:51:29 PM,Cash for your opinions!,https://atlanta.craigslist.org/atl/cpg/d/atlanta-cash-for-your-opinions/7527517063.html
Mon 29 Aug 01:22:54 PM,Telecommute earn $20 per easy online product test gig w/ free products,https://montgomery.craigslist.org/cpg/d/hope-hull-telecommute-earn-20-per-easy/7527471859.html
Mon 29 Aug 01:53:58 PM,Telecommute earn $20 per easy online product test gig w/ free products,https://atlanta.craigslist.org/atl/cpg/d/smyrna-telecommute-earn-20-per-easy/7527456060.html
Mon 29 Aug 12:50:59 PM,Telecommute earn $20 per easy online product test gig w/ free products,https://bham.craigslist.org/cpg/d/adamsville-telecommute-earn-20-per-easy/7527454527.html
Wed 31 Aug 09:23:41 PM,Looking for a sales development rep,https://bham.craigslist.org/cpg/d/adamsville-looking-for-sales/7528472497.html
Wed 31 Aug 11:21:58 AM,Earn ~$30 | work from home | looking for 'ok google' users | taskverse,https://bham.craigslist.org/cpg/d/harbor-city-earn-30-work-from-home/7528233394.html
Mon 29 Aug 12:50:59 PM,Telecommute earn $20 per easy online product test gig w/ free products,https://bham.craigslist.org/cpg/d/adamsville-telecommute-earn-20-per-easy/7527454527.html
Wed 31 Aug 11:28:56 AM,Earn ~$30 | work from home | looking for 'ok google' users | taskverse,https://tuscaloosa.craigslist.org/cpg/d/harbor-city-earn-30-work-from-home/7528236901.html
Wed 31 Aug 11:27:53 AM,Earn ~$30 | work from home | looking for 'ok google' users | taskverse,https://montgomery.craigslist.org/cpg/d/harbor-city-earn-30-work-from-home/7528236389.html
```
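For readers unfamiliar with the dedupe idiom in the command, here is a minimal sketch run on three made-up rows (illustrative only, not from the sample above). `seen[key]++` evaluates to the count before incrementing, so `!seen[key]++` is true only the first time a given second field appears, and later rows with the same title are dropped.

```bash
#!/usr/bin/env bash
# Made-up rows for illustration: rows 1 and 2 share the same 2nd field.
# seen[key]++ returns the old count (0 on first sight) and then increments it,
# so !seen[key]++ is true, and the row is printed, only for the first row
# with a given key.
out=$(printf '%s\n' \
  'Mon,Cash for your opinions!,https://example.com/1' \
  'Tue,Cash for your opinions!,https://example.com/2' \
  'Wed,Looking for a sales development rep,https://example.com/3' |
  awk -F, '{key = $2 FS} !seen[key]++')
printf '%s\n' "$out"
```

Only the first and third rows are printed: one row per distinct title, keeping the first occurrence.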


Tendekai Muchenje
  • The single quote in the alias definition is just a literal character, not shell syntax. `$2` is being expanded before the `alias` command sees its argument. Don't use an alias here at all; use a function. – chepner Sep 01 '22 at 14:44
  • *I don't want to have to type it all the time* if you are not dead set at using `alias` for that then consider creating [Executable Script](https://www.gnu.org/software/gawk/manual/html_node/Executable-Scripts.html) – Daweo Sep 01 '22 at 14:59
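chepner's point can be checked directly. The sketch below builds the alias body both ways with no positional parameters set (the usual situation in an interactive shell) and prints the results. Run it as a script: pasting the double-quoted line into an interactive bash may also trigger history expansion of `!seen`.

```bash
#!/usr/bin/env bash
# Clear the positional parameters, so $2 is empty as in a typical interactive shell.
set --
# Inside double quotes the inner single quotes are literal characters,
# and $2 is expanded right now, while the string is being built.
broken="awk -F, ' {key = $2 FS} !seen[key]++' gigs.csv > try.csv"
# Inside single quotes nothing is expanded; '\'' inserts a literal quote.
fixed='awk -F, '\'' {key = $2 FS} !seen[key]++'\'' gigs.csv > try.csv'
printf '%s\n' "$broken" "$fixed"
```

The first line printed contains `key =  FS`: the `$2` is gone before `alias` ever sees it, so awk computes the same key (a comma) for every row and prints only the first one.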

1 Answer


Define the alias using single quotes rather than double quotes. Nothing is special inside single quotes, so you won't run into unexpected expansions such as `$2` being replaced by the value of the shell's 2nd positional parameter. The only catch is that to include a literal single quote, you need to break out of the quoting with `' ... '\'' ... '` or `' ... '"'"' ... '`.

```bash
alias dedupe='awk -F, '\'' {key = $2 FS} !seen[key]++'\'' gigs.csv > try.csv'
```
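A quick way to confirm that both quote-escaping idioms mentioned above produce the same string, with `$2` left intact for awk, is to build it both ways and compare. This is a minimal sketch; the variable names `a` and `b` are arbitrary.

```bash
#!/usr/bin/env bash
# Two ways to embed a single quote inside a single-quoted string:
#   '\''   close the quote, add a backslash-escaped quote, reopen the quote
#   '"'"'  close the quote, add a double-quoted quote, reopen the quote
a='awk -F, '\'' {key = $2 FS} !seen[key]++'\'' gigs.csv > try.csv'
b='awk -F, '"'"' {key = $2 FS} !seen[key]++'"'"' gigs.csv > try.csv'
# Both print the same command line, with $2 untouched.
printf '%s\n' "$a" "$b"
```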

A function may be preferable in this case:

```bash
dedupe () { awk -F, ' {key = $2 FS} !seen[key]++' gigs.csv > try.csv; }
```
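Unlike an alias, a function can use its arguments anywhere in its body, which helps if the file names change from run to run. The sketch below is a hypothetical generalisation: the argument order and the defaults `gigs.csv`/`try.csv` are assumptions, not part of the answer.

```bash
#!/usr/bin/env bash
# Hypothetical generalisation of the function above.
# Usage: dedupe [input.csv] [output.csv]   (defaults: gigs.csv, try.csv)
dedupe() {
  local src=${1:-gigs.csv} dst=${2:-try.csv}
  # Same awk program as before; the paths are quoted so names containing
  # spaces still work.
  awk -F, '{key = $2 FS} !seen[key]++' "$src" > "$dst"
}
```

Plain `dedupe` behaves like the original, while `dedupe other.csv out.csv` reads and writes different files. Avoid passing the same file as input and output, because the `>` redirection truncates it before awk reads it.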
rowboat
  • There's no reason to use an alias here at all; use the function. – chepner Sep 01 '22 at 14:44
  • This does work, thank you. However, I am still trying to understand why the alias command I wrote produces a different result from running the command in full. Yet yours seems to run just fine. What is the explanation there? – Tendekai Muchenje Sep 01 '22 at 14:49
  • @TendekaiMuchenje, your alias definition has a `$2` inside a double-quoted context. That double-quoted context makes the `$2` be expanded **immediately**, before the alias is created, so your code becomes `key = FS` with the `$2` completely removed (except in the unlikely case where your interactive shell has positional arguments defined). Whereas rowboat's is entirely in single-quoted context, with a quick switch out to unquoted context to be able to use the backslash+quote syntax before switching back to a single-quoted context. No double-quoted context means no unwanted expansion. – Charles Duffy Sep 01 '22 at 15:07
  • @rowboat, btw, putting the `>try.csv` on the outside of the function definition instead of before the `;` within the awk command strikes me as an unusual decision. – Charles Duffy Sep 01 '22 at 15:09
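To reproduce the "only one line" symptom that Charles Duffy explains, the sketch below runs both awk programs on a few rows adapted from the question's sample (dates and titles kept, URLs replaced with the placeholder `URL`, since the third field is not used). `key = FS` is what the double-quoted alias actually ran; `key = $2 FS` is the intended program. The broken version should report 1 line and the fixed one 3.

```bash
#!/usr/bin/env bash
# Rows adapted from the question's sample; URLs replaced with a placeholder.
sample=$(printf '%s\n' \
  'Mon 29 Aug 03:51:29 PM,Cash for your opinions!,URL' \
  'Mon 29 Aug 01:22:54 PM,Telecommute earn $20 per easy online product test gig w/ free products,URL' \
  'Mon 29 Aug 01:53:58 PM,Telecommute earn $20 per easy online product test gig w/ free products,URL' \
  'Wed 31 Aug 09:23:41 PM,Looking for a sales development rep,URL')

# What the double-quoted alias ran: the key is always "," so only line 1 survives.
broken=$(printf '%s\n' "$sample" | awk -F, '{key = FS} !seen[key]++')
# What the typed command (or the single-quoted alias) runs: the key depends on $2.
fixed=$(printf '%s\n' "$sample" | awk -F, '{key = $2 FS} !seen[key]++')

printf 'broken: %s line(s)\n' "$(printf '%s\n' "$broken" | wc -l)"
printf 'fixed: %s line(s)\n' "$(printf '%s\n' "$fixed" | wc -l)"
```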