Questions about manipulating or examining textual data using common UNIX/Linux utilities.
Questions tagged [unix-text-processing]
32 questions
1 vote, 2 answers
Miller - Ignore valid field names when using -N
I'm using Miller to process some CSV files like so:
mlr --mmap --csv --skip-comments -N cut -f 2 my.csv
It works well, but some of the CSV files contain field names and some do not, which is why I'm using -N. In the files that have field names,…

T145
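With `-N`, Miller reads every line as data, so a header row that is present comes through as an ordinary record. One workaround, sketched here in Python, is to drop the first record when it looks like a header. The sketch assumes a header can be recognized because its second field equals a known name. `HEADER_NAME` and `second_column` are hypothetical names. Lines starting with `#` are skipped, which is what `--skip-comments` does by default.

```python
import csv
import io
from typing import Iterable, List

# Hypothetical: the name that column 2 carries when a file has a header row.
HEADER_NAME = "name"


def second_column(lines: Iterable[str], header_name: str = HEADER_NAME) -> List[str]:
    """Return field 2 of every record, similar to
    `mlr --csv --skip-comments -N cut -f 2`, but drop the first record
    when its field 2 equals header_name (i.e. it looks like a header)."""
    # Skip comment lines starting with '#', as --skip-comments does by default.
    data = (line for line in lines if not line.startswith("#"))
    values: List[str] = []
    for index, row in enumerate(csv.reader(data)):
        if len(row) < 2:
            continue  # record has no second field
        if index == 0 and row[1] == header_name:
            continue  # first record looks like a header: ignore it
        values.append(row[1])
    return values


if __name__ == "__main__":
    with_header = io.StringIO("# comment\nid,name\n1,alpha\n2,beta\n")
    without_header = io.StringIO("1,alpha\n2,beta\n")
    print(second_column(with_header))     # ['alpha', 'beta']
    print(second_column(without_header))  # ['alpha', 'beta']
```

This check is only as reliable as the assumption that a data value never equals the header name.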
1 vote, 2 answers
Removing lines based on duplicate first word, ignoring case
I have 1M word vectors in fastText format (ignoring the first line, which contains the vocabulary size and dimension). Every line is a word followed by 300 numbers, all space-separated, for example:
Word 1.00 0.50 -2.30
WORD 0.90 0.40 -2.20
How can I keep the first line a word…

qwr
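Here is a minimal Python sketch of the usual "keep the first occurrence" approach. It remembers each first word in case-folded form and prints a line only the first time its key appears. The sketch assumes the header line (vocab size and dimension) should be dropped rather than copied, because its word count would no longer match after deduplication. `first_per_word` is a hypothetical name. Memory grows with the number of distinct words, which should be manageable for about 1M entries.

```python
from typing import Iterable, Iterator


def first_per_word(lines: Iterable[str], skip_header: bool = True) -> Iterator[str]:
    """Yield each line whose first word, compared case-insensitively,
    has not been seen on an earlier line."""
    seen = set()
    for number, line in enumerate(lines):
        if skip_header and number == 0:
            continue  # the "<vocab size> <dim>" header line
        key = line.split(" ", 1)[0].casefold()  # first word, case folded
        if key not in seen:
            seen.add(key)
            yield line
```

For example, `sys.stdout.writelines(first_per_word(sys.stdin))` filters a file piped through the script.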
0 votes, 1 answer
sed fails with many -e statements
I have huge files (12G, 5.9G, 1.1G, 57M) that I need to massage into submission so they can be imported successfully with MySQL Shell.
I don't have a choice in how the files were created; they were zipped up and landed on my desk. So I tried and was…

user3008410
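A common way to avoid a very long chain of `-e` options is to put the expressions in a script file and run `sed -f` on it. The Python sketch below does the equivalent in one streaming pass. It reads one line at a time, so multi-gigabyte files never have to fit in memory. The excerpt does not show which substitutions were used, so the rules in `RULES` are hypothetical placeholders. The file names in the usage comment are also hypothetical.

```python
import re
from typing import Iterable, Iterator, List, Pattern, Tuple

# Hypothetical rules: (regex, replacement) pairs applied in order, the way
# successive `-e 's/regex/replacement/g'` expressions would be. Replacements
# may use group references such as \1, as in sed.
RULES: List[Tuple[str, str]] = [
    (r"\\N", "NULL"),
    (r"foo", "bar"),
]


def compile_rules(rules: List[Tuple[str, str]]) -> List[Tuple[Pattern[str], str]]:
    """Compile each pattern once instead of once per line."""
    return [(re.compile(pattern), replacement) for pattern, replacement in rules]


def rewrite(lines: Iterable[str], rules: List[Tuple[Pattern[str], str]]) -> Iterator[str]:
    """Apply every rule to each line in order, one line at a time."""
    for line in lines:
        body = line[:-1] if line.endswith("\n") else line
        newline = line[len(body):]  # keep the original line ending, if any
        for pattern, replacement in rules:
            body = pattern.sub(replacement, body)
        yield body + newline


# Usage (hypothetical file names):
# with open("dump.sql", encoding="utf-8") as src, \
#         open("dump.fixed.sql", "w", encoding="utf-8") as dst:
#     dst.writelines(rewrite(src, compile_rules(RULES)))
```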
0 votes, 2 answers
Remove duplicates ignoring specific columns
I want to remove all duplicates from a file while ignoring the first 2 columns, i.e. without comparing those columns.
This is my example input:
111 06:22 apples, bananas and pears
112 06:28 bananas
113 07:07 apples, bananas and pears
114 07:23 …

Liam McCartney
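Here is a minimal Python sketch. It builds a key from everything after the first two whitespace-separated columns and keeps only the first line with each key. This is the same idea as the familiar awk `seen[key]++` idiom. Lines that have no text beyond the first two columns are always kept. That is an assumption, since the question does not say how to treat them. `dedupe_ignoring_columns` is a hypothetical name.

```python
from typing import Iterable, Iterator


def dedupe_ignoring_columns(lines: Iterable[str], skip: int = 2) -> Iterator[str]:
    """Yield the first line for each distinct remainder after the first
    `skip` whitespace-separated columns; later duplicates are dropped."""
    seen = set()
    for line in lines:
        parts = line.split(None, skip)
        if len(parts) <= skip:
            yield line  # assumption: lines without extra text are always kept
            continue
        key = parts[skip].rstrip()  # compare only the remaining text
        if key not in seen:
            seen.add(key)
            yield line
```

With the three example lines shown, the `113 07:07 ...` line would be dropped because its text matches line `111`.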
0 votes, 0 answers
How can I fix the tag mapping for hyperlinks in UnRTF's HTML conversion?
I have RTFs with links like this {\field{\*\fldinst{HYPERLINK "https://www.stackoverflow.com"}}{\fldrslt my link text}}. I'm using UnRTF to convert them to HTML, but when I run the conversion, all of the links look exactly like this

360zen
0 votes, 2 answers
Customised redirection of stderr?
I know that I can redirect the standard error output stream by:
$ ./my-C-binary 2>error.log
What if I want to ignore the first two characters of anything from stderr before writing to the file? I've searched but I can't seem to find a way to do so.…

FatArmpit
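In bash, the usual trick is process substitution, e.g. `./my-C-binary 2> >(cut -c3- > error.log)`, where `cut -c3-` drops the first two characters of each line. Below is a Python sketch of the same idea. It assumes "the first two characters" means per line of stderr, not once for the whole stream. `run_with_trimmed_stderr` is a hypothetical name.

```python
import subprocess
from typing import List


def run_with_trimmed_stderr(cmd: List[str], log_path: str, drop: int = 2) -> int:
    """Run cmd and write each line it prints on stderr to log_path with the
    first `drop` characters removed. stdout is inherited unchanged.
    Returns the command's exit status."""
    with open(log_path, "w") as log, subprocess.Popen(
        cmd, stderr=subprocess.PIPE, text=True
    ) as proc:
        for line in proc.stderr:
            body = line.rstrip("\n")
            log.write(body[drop:] + line[len(body):])  # keep the newline
    # Leaving the Popen context waits for the process, so returncode is set.
    return proc.returncode


# Usage with the binary from the question:
# run_with_trimmed_stderr(["./my-C-binary"], "error.log")
```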
0 votes, 1 answer
Sorting output based on row value with Linux bash
I need to print the whole output sorted by the 'Resident Set Size' value.
Process: wccpd
Memory (bytes)
Total Virtual Size 29.5
Resident Set Size 4.0
Process: writeback
Memory (bytes)
Total Virtual Size 0
Resident Set Size 0
Process:…

lyoben
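The value to sort on sits a few lines below each `Process:` header, so each process has to be treated as one block. This Python sketch assumes that every block starts with a line beginning `Process:` and contains one `Resident Set Size <number>` line. Descending order is also an assumption, since the question does not say which direction. `sort_by_rss` and `rss_of` are hypothetical names.

```python
import re
from typing import List

RSS_LINE = re.compile(r"^\s*Resident Set Size\s+([0-9]+(?:\.[0-9]+)?)")


def rss_of(block: List[str]) -> float:
    """Return the Resident Set Size value in a block (0.0 if it has none)."""
    for line in block:
        match = RSS_LINE.match(line)
        if match:
            return float(match.group(1))
    return 0.0


def sort_by_rss(text: str, descending: bool = True) -> str:
    """Split the report into blocks that each start with 'Process:' and
    reorder whole blocks by their Resident Set Size value."""
    blocks: List[List[str]] = []
    for line in text.splitlines():
        if line.lstrip().startswith("Process:"):
            blocks.append([line])  # a new process block starts here
        elif blocks:
            blocks[-1].append(line)  # lines before the first block are dropped
    blocks.sort(key=rss_of, reverse=descending)  # stable for equal values
    return "\n".join(line for block in blocks for line in block)
```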
0 votes, 1 answer
Issue converting github.com/*/raw/* URLs to raw.githubusercontent.com URLs using AWK
Given the following example…

T145
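Here is a Python sketch of the rewrite. It assumes the usual layout, in which the part after `/raw/` is the branch, tag, or commit followed by the file path, so `github.com/OWNER/REPO/raw/REST` maps to `raw.githubusercontent.com/OWNER/REPO/REST`. URLs are taken to end at the next whitespace, so the pattern would need adjusting if they are followed by quotes or punctuation. `to_raw_githubusercontent` is a hypothetical name.

```python
import re

# github.com/OWNER/REPO/raw/REST -> raw.githubusercontent.com/OWNER/REPO/REST
GITHUB_RAW = re.compile(r"https?://github\.com/([^/\s]+)/([^/\s]+)/raw/(\S+)")


def to_raw_githubusercontent(text: str) -> str:
    """Rewrite every github.com/*/raw/* URL found in text."""
    return GITHUB_RAW.sub(r"https://raw.githubusercontent.com/\1/\2/\3", text)
```

URLs without a `/raw/` segment, such as `/blob/` links, are left unchanged.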
0 votes, 4 answers
How to create dynamic IF condition
I have been writing a script that runs in an infinite while loop.
Only when all conditions are met should the script break out of the loop and execute another command.
My code:
while true
do
# Note : below field will execute some command and generate…

kiric8494
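Instead of building an `if` expression dynamically, one option is to keep the conditions in a list and test all of them on each round. This Python sketch loops until every check passes and then returns, so the caller can run the follow-up command. `wait_for_all` is a hypothetical name, and the condition and command in the usage comment are hypothetical placeholders. `max_rounds` is an addition for testing. Leaving it as `None` gives the infinite loop described.

```python
import time
from typing import Callable, List, Optional


def wait_for_all(checks: List[Callable[[], bool]], interval: float = 5.0,
                 max_rounds: Optional[int] = None) -> bool:
    """Evaluate the checks once per round and return True as soon as all of
    them pass; return False if max_rounds rounds fail (None: loop forever)."""
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        # all() stops at the first failing check, so later checks are
        # skipped in that round.
        if all(check() for check in checks):
            return True
        rounds += 1
        time.sleep(interval)
    return False


# Usage sketch; the condition and follow-up command are hypothetical:
# import subprocess
# checks = [lambda: subprocess.run(["test", "-f", "/tmp/ready"]).returncode == 0]
# if wait_for_all(checks):
#     subprocess.run(["./next-step.sh"])
```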
0 votes, 2 answers
How to check whether all of several keywords are present in a given text
I am a bit stuck with this: how do I check that all keywords exist in my text?
If any one keyword is missing, it should return status 1; otherwise 0.
keyword='CAP\|BALL\|BAT\|CRICKET'
echo "HE AS CAP AND LOVE TO PLAY BALL BAT , ITS IS CALLED…

kiric8494
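An alternation such as `CAP\|BALL\|BAT\|CRICKET` matches when any one keyword is present, so it cannot express "all of them". Each keyword has to be tested separately. This Python sketch splits the question's pattern into keywords and returns a shell-style status. The matching is a plain, case-sensitive substring test, so `BAT` would also match inside a longer word such as `BATTING`. `all_keywords_present` is a hypothetical name.

```python
from typing import Iterable


def all_keywords_present(text: str, keywords: Iterable[str]) -> int:
    """Return 0 if every keyword occurs somewhere in text, else 1
    (mirroring a shell exit status)."""
    return 0 if all(keyword in text for keyword in keywords) else 1


# The grep alternation from the question, split into separate keywords.
keyword = r"CAP\|BALL\|BAT\|CRICKET"
keywords = keyword.split(r"\|")  # ['CAP', 'BALL', 'BAT', 'CRICKET']
```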
0 votes, 1 answer
unbalanced parenthesis regex
!pip install emot
import re
from emot.emo_unicode import EMOTICONS_EMO
def convert_emoticons(text):
    for emot in EMOTICONS_EMO:
        text = re.sub(u'\('+emot+'\)', "_".join(EMOTICONS_EMO[emot].replace(",","").split()), text)
    return text
text =…

M J
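Emoticons such as `:-)` contain regex metacharacters. Concatenating them into a pattern therefore leaves parentheses unbalanced, which is what the error reports. Escaping each emoticon with `re.escape` avoids this. This Python sketch assumes the intent was to match the emoticon itself, because the `\(` and `\)` in the original match literal parentheses around it. The two-entry dictionary is a hypothetical stand-in for emot's `EMOTICONS_EMO`, not its real contents.

```python
import re
from typing import Dict

# Hypothetical stand-in for emot's EMOTICONS_EMO (emoticon -> description).
EMOTICONS_EMO: Dict[str, str] = {
    ":-)": "Happy face or smiley",
    ":(": "Frown, sad",
}


def convert_emoticons(text: str, table: Dict[str, str] = EMOTICONS_EMO) -> str:
    """Replace each emoticon with its description joined by underscores."""
    for emot, description in table.items():
        replacement = "_".join(description.replace(",", "").split())
        # re.escape turns ')' and other metacharacters into literals; the
        # lambda keeps the replacement text from being parsed for escapes.
        text = re.sub(re.escape(emot), lambda _match: replacement, text)
    return text
```

Because the keys are matched literally, `str.replace` would work just as well here. The `re` version is shown because it is the smallest change to the original code.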
0 votes, 2 answers
Processing text with multiple delimiters in awk
I have text that looks like this:
Application.||dates:[2022-11-12]|models:[MODEL1]|count:1|ids:2320
Application.||dates:[2022-11-12]|models:[MODEL1]|count:5|ids:2320
I want the number from the count:1 column, i.e. 1, and I wish to store these…

r4bb1t
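The lines mix two delimiters: `|` between fields and `:` between a field's name and value. One approach is to split on `|` first and then look for the field named `count`. In awk, this corresponds to `-F'|'` plus a loop over the fields. This Python sketch is written under that reading. `count_value` and `count_values` are hypothetical names.

```python
from typing import Iterable, List, Optional


def count_value(line: str) -> Optional[int]:
    """Return the number from the 'count:N' field of a '|'-delimited line,
    or None if the line has no such field."""
    for field in line.rstrip("\n").split("|"):
        name, sep, value = field.partition(":")
        if sep and name == "count":
            return int(value)
    return None


def count_values(lines: Iterable[str]) -> List[int]:
    """Collect the count of every line that has one."""
    values = (count_value(line) for line in lines)
    return [value for value in values if value is not None]
```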
0 votes, 2 answers
How to extract branch name using regex and sed?
How can I extract the branch name from a string using bash? For example, I have the following command:
branch=$(git branch -a --contains $sha)
This may return:
* branch-1.0 (the prefix is always an asterisk)
branch-2.0 remotes/origin/branch-2.0…

John Doe
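The excerpt stops before saying which name is wanted, so this Python sketch returns every distinct name from the `git branch -a --contains` output. It strips the two-character marker column (`* ` for the current branch) and the `remotes/origin/` prefix. It also skips symbolic refs such as `remotes/origin/HEAD -> ...` and detached-HEAD lines. `branch_names` is a hypothetical name. `git branch` also accepts `--format` (for example `--format='%(refname:short)'`), which prints names without the marker column and may make parsing simpler.

```python
from typing import List


def branch_names(output: str, remote: str = "origin") -> List[str]:
    """Turn `git branch -a --contains <sha>` output into unique branch names."""
    prefix = f"remotes/{remote}/"
    names: List[str] = []
    for line in output.splitlines():
        name = line[2:].strip()  # first two columns hold the '* ' marker or spaces
        if not name or "->" in name or name.startswith("("):
            continue  # skip blanks, 'remotes/origin/HEAD -> ...', '(HEAD detached ...)'
        if name.startswith(prefix):
            name = name[len(prefix):]  # remotes/origin/branch-2.0 -> branch-2.0
        if name not in names:
            names.append(name)
    return names
```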
0 votes, 1 answer
How can I sort a field from an EndNote export file so that the line containing GRAZ in the address comes first?
I have an EndNote export file that looks like this:
%0 Journal Article
%A Abu-Rous, M.
%A Ingolic, E.
%A Schuster, K. C.
%D 2006
%Z Cellulose
Article
CODEN: CELLE
%+ Christian Doppler-Laboratory of Fibre and Textile…

Walter Schrabmair
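The question is cut off, so this Python sketch reads the title as follows: within one record, put the `%+` (address) field that mentions GRAZ before the other `%+` fields, and leave everything else in place. It assumes that lines not starting with `%` continue the previous field, as `Article` and `CODEN: CELLE` appear to do under `%Z`. It also assumes records are handled one at a time and that the match should be case-insensitive. `graz_first` is a hypothetical name.

```python
from typing import List


def graz_first(record_text: str, tag: str = "%+", needle: str = "graz") -> str:
    """Within one EndNote tagged record, move `tag` fields whose text mentions
    `needle` (case-insensitively) ahead of the other `tag` fields."""
    fields: List[List[str]] = []
    for line in record_text.splitlines():
        if line.startswith("%") or not fields:
            fields.append([line])    # a tag line starts a new field
        else:
            fields[-1].append(line)  # untagged lines continue the previous field
    slots = [i for i, field in enumerate(fields) if field[0].startswith(tag + " ")]
    # Stable sort: fields mentioning the needle (key False) come first.
    ordered = sorted((fields[i] for i in slots),
                     key=lambda field: needle not in " ".join(field).lower())
    for i, field in zip(slots, ordered):
        fields[i] = field
    return "\n".join(line for field in fields for line in field)
```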
0 votes, 1 answer
How to check if a field in one file is not in a list of values from another file in UNIX
I have two files: one has the transactional data for that column (say, a currency code), and the other has the valid/expected currency codes.
File1…
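Here is a minimal Python sketch of the usual approach. It loads the valid codes into a set and reports every row whose code is not in it. It assumes the reference file holds one code per line and that the transaction file is delimited, with the code in a known column. The file names, column index, and function names are hypothetical.

```python
from typing import Iterable, List, Set


def load_valid_codes(lines: Iterable[str]) -> Set[str]:
    """Read the reference file: one valid code per line (assumed layout)."""
    return {line.strip() for line in lines if line.strip()}


def rows_with_invalid_code(rows: Iterable[List[str]], column: int,
                           valid: Set[str]) -> List[List[str]]:
    """Return rows whose value in `column` (0-based) is not a valid code.
    Rows too short to have that column are skipped."""
    return [row for row in rows
            if len(row) > column and row[column].strip() not in valid]


# Usage sketch (file names and column index are hypothetical):
# import csv
# with open("valid_codes.txt") as ref, open("transactions.csv", newline="") as data:
#     valid = load_valid_codes(ref)
#     for row in rows_with_invalid_code(csv.reader(data), column=2, valid=valid):
#         print(",".join(row))
```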