Questions tagged [unix-text-processing]

Questions about manipulating or examining textual data using common UNIX/Linux utilities.

32 questions
1
vote
2 answers

Miller - Ignore valid field names when using -N

I'm using Miller to process some CSV files like so: mlr --mmap --csv --skip-comments -N cut -f 2 my.csv It works well, but some of the CSV files contain field names and some do not, which is why I'm using -N. In the files that have field names,…
T145
  • 1,415
  • 1
  • 13
  • 33
1
vote
2 answers

Removing lines based on duplicate first word, ignoring case

I have 1M word vectors in fasttext format (ignoring the first line containing vocab size and dim). Every line is a word followed by 300 numbers, all space separated, ex. Word 1.00 0.50 -2.30 WORD 0.90 0.40 -2.20 How can I keep the first line a word…
qwr
  • 9,525
  • 5
  • 58
  • 102
0
votes
1 answer

sed fails with many -e statements

I have huge (12G, 5.9G, 1.1G, 57M) files that I need to massage into submission in order to successfully run in MySQL-Shell for import. I don't have a choice in how the files were created; they were zipped up and landed on my desk. So I tried and was…
user3008410
  • 644
  • 7
  • 15
0
votes
2 answers

Remove duplicates ignoring specific columns

I want to remove all duplicates from a file while ignoring the first two columns, i.e. without comparing those columns. This is my example input: 111 06:22 apples, bananas and pears 112 06:28 bananas 113 07:07 apples, bananas and pears 114 07:23 …
0
votes
0 answers

How can I fix the tag mapping for hyperlinks in UnRTF's HTML conversion?

I have RTFs with links like this {\field{\*\fldinst{HYPERLINK "https://www.stackoverflow.com"}}{\fldrslt my link text}}. I'm using UnRTF to convert them to HTML, but when I run the conversion, all of the links look exactly like this
0
votes
2 answers

Customised redirection of stderr?

I know that I can redirect the standard error output stream by: $ ./my-C-binary 2>error.log What if I want to ignore the first two characters of anything from stderr before writing to the file? I've searched but I can't seem to find a way to do so.…
FatArmpit
  • 3
  • 2
0
votes
1 answer

Sorting output based on row value with Linux bash

I need to print the whole output sorted based on 'Resident Set Size' value. Process: wccpd Memory (bytes) Total Virtual Size 29.5 Resident Set Size 4.0 Process: writeback Memory (bytes) Total Virtual Size 0 Resident Set Size 0 Process:…
lyoben
  • 9
  • 4
0
votes
4 answers

How to create a dynamic IF condition

I have been writing a script that runs in an infinite while loop. Only if all conditions are met should the script break and execute another command. My code: while true do # Note : below field will execute some command and generate…
kiric8494
  • 195
  • 1
  • 7
0
votes
2 answers

How to check whether all of multiple keywords are present in a given text

I am a bit stuck with this: how do I check that all keywords exist in my text? If any one keyword is not present, it should return status 1, or else 0. keyword='CAP\|BALL\|BAT\|CRICKET' echo "HE AS CAP AND LOVE TO PLAY BALL BAT , ITS IS CALLED…
kiric8494
  • 195
  • 1
  • 7
0
votes
1 answer

unbalanced parenthesis regex

!pip install emot from emot.emo_unicode import EMOTICONS_EMO def convert_emoticons(text): for emot in EMOTICONS_EMO: text = re.sub(u'\('+emot+'\)', "_".join(EMOTICONS_EMO[emot].replace(",","").split()), text) return text text =…
M J
  • 379
  • 2
  • 8
0
votes
2 answers

Processing text with multiple delims in awk

I have a text which looks like this: Application.||dates:[2022-11-12]|models:[MODEL1]|count:1|ids:2320 Application.||dates:[2022-11-12]|models:[MODEL1]|count:5|ids:2320 I want the number from the count:1 column, so 1, and I wish to store these…
r4bb1t
  • 1,033
  • 2
  • 13
  • 36
0
votes
2 answers

How to extract branch name using regex and sed?

How can I extract the branch name from a string using bash? For example, I have the following command: branch=$(git branch -a --contains $sha) This may return: * branch-1.0 (the prefix is always an asterisk) branch-2.0 remotes/origin/branch-2.0…
John Doe
  • 3
  • 3
0
votes
1 answer

How can I sort a field from an Endnote Export File format where the line contains GRAZ in the address as first line?

I have an Endnote Export File, looking like this: %0 Journal Article %A Abu-Rous, M. %A Ingolic, E. %A Schuster, K. C. %D 2006 %Z Cellulose Article CODEN: CELLE %+ Christian Doppler-Laboratory of Fibre and Textile…
Walter Schrabmair
  • 1,251
  • 2
  • 13
  • 26
0
votes
1 answer

How to check if a field in one file does not contain a list of values from another file in UNIX

I have two files. One has the transactional data for a column, say the currency code, and the other file has the valid/expected currency codes. File1…