Questions tagged [text-processing]

Mechanizing the creation or manipulation of electronic text.

Text processing includes basic processing jobs that use filtering, tokenization, or normalization methods to process text. This could be a pre-processing step for .

1959 questions
0
votes
1 answer

Cut off everything after a specific word - Powershell

I'm trying to remove everything after a word in an HTML file with PowerShell, but it's failing and I need assistance: $s = get-content "C:\Users\admin\Desktop\File\sheet003.html" $pos = $s.IndexOf(">0<") $leftPart = $s.Substring(0, $pos) #$rightPart =…
0
votes
1 answer

What is the best Python library to use for generating XML from text/PDF in Python?

I am trying to automate scientific PDF to XML using Python. I would like to know: is there any Python library that can generate XML from text or from PDF documents?
0
votes
1 answer

How do I use a list of dictionaries to update values in the correct dictionary based on data in text file?

I'm trying to process data from a CSV file that checks if specific clinic codes are present in each line at index 2 and then update the corresponding dictionary. I previously had an if-elif chain to handle this, but as we are adding more clinics to…
dissemin8or
  • 126
  • 6
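One common way to replace an if-elif chain like the one described above is to index the list of dictionaries by clinic code once, then do a single lookup per row. This is only a minimal sketch: the clinic codes, the `visits` field, and the sample rows are hypothetical, since the excerpt does not show the real data.

```python
import csv
import io

# Hypothetical clinic records; the real codes and fields are not shown in the excerpt.
clinics = [
    {"code": "CL01", "visits": 0},
    {"code": "CL02", "visits": 0},
]

# Build the index once so each row needs one dictionary lookup instead of an if-elif chain.
by_code = {clinic["code"]: clinic for clinic in clinics}

# Stand-in for the CSV file; the clinic code sits at index 2 of each row.
sample = io.StringIO("1,alice,CL01\n2,bob,CL02\n3,carol,CL01\n4,dave,XX99\n")

for row in csv.reader(sample):
    clinic = by_code.get(row[2])
    if clinic is not None:  # rows with an unknown code are skipped
        clinic["visits"] += 1
```

Adding a clinic then means adding one dictionary to the list, with no new branch in the loop.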
0
votes
1 answer

Remove function and its definition from python code

For example: old file def MyFunction(): some code inside MyFunction() def MyFunction1(): some code inside MyFunction1() if __name__ == "__main__": some code inside New file def MyFunction1(): some code inside…
akshay
  • 1
  • 1
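A possible approach to the task above is to parse the file with the standard `ast` module instead of editing text directly. The sketch below assumes Python 3.9+ (for `ast.unparse`), removes only top-level definitions and bare top-level calls, and drops comments and original formatting. The helper name `remove_function` and the sample source are hypothetical.

```python
import ast


def remove_function(source: str, name: str) -> str:
    """Return source without the top-level function `name` or bare top-level calls to it."""
    tree = ast.parse(source)
    kept = []
    for node in tree.body:
        is_definition = (
            isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
            and node.name == name
        )
        is_bare_call = (
            isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Call)
            and isinstance(node.value.func, ast.Name)
            and node.value.func.id == name
        )
        if not (is_definition or is_bare_call):
            kept.append(node)
    tree.body = kept
    return ast.unparse(tree)


# Hypothetical "old file", loosely modelled on the excerpt.
old_source = '''
def MyFunction():
    print("a")

MyFunction()

def MyFunction1():
    print("b")

MyFunction1()

if __name__ == "__main__":
    print("main")
'''

new_source = remove_function(old_source, "MyFunction")
```

Calls nested inside other functions or expressions are left untouched here; handling those would need an `ast.NodeTransformer`.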
0
votes
1 answer

Text substitution with single apostrophe

We caught a bug report for the following in our GNUmakefile. I'm still not quite clear on the reason for the bug (the report lacks some detail), but I want to ensure the substitution and assignment is valid for GNU Make. SUNCC_VERSION := $(subst…
jww
  • 97,681
  • 90
  • 411
  • 885
0
votes
0 answers

Can I use TextBlob for classification besides sentiment analysis?

I want to extract questions from a question paper. I'm labeling each question as q and other sentences as i in the dataset, e.g. Why is this sector becoming important in India,q Describe any five public facilities needed for the development of a…
R. T
  • 11
  • 1
  • 7
0
votes
1 answer

Search using substring in python

I have a txt file that has two columns as below - LocationIndex ID P-1-A100A100 X000PY66QL P-1-A100A100 X000RE0RRD P-1-A100A101 X000R39WBL P-1-A100A103 X000LJ7MX1 P-1-A100A104 X000S5QZMH P-1-A100A105 X000MUMNOR P-1-A100A105 …
Ashish K
  • 905
  • 10
  • 27
0
votes
3 answers

cmd Search for files from list of partial filenames then copy to folder

I have a text file list of approx 120,000 filenames. Many of the files on the list are in a folder or its subfolders, but with slight variations on the filenames. So I want to search using the list of partial filenames and copy the matches to…
Paul Cook
  • 27
  • 2
  • 4
0
votes
1 answer

Find all lines with keyword and extract number

I would like to find the line that starts with the word "ERRORS" and extract the number from that line. Part of file: ... [ERROR] No keywords and test cases defined in file File path: libraries_instances.robot TEST SUITES SUMMARY: ERRORS: …
pb.
  • 321
  • 3
  • 4
  • 21
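For the question above, a line-anchored regular expression is probably enough. This is a sketch rather than a definitive answer: the value after `ERRORS:` is cut off in the excerpt, so the `3` in the sample is a made-up placeholder, and `error_counts` is a hypothetical helper name.

```python
import re

# Matches lines that begin with "ERRORS" and captures the first number on that line.
ERRORS_LINE = re.compile(r"^ERRORS[^\d\n]*(\d+)", re.MULTILINE)


def error_counts(text: str) -> list[int]:
    """Return the number found on every line that starts with ERRORS."""
    return [int(number) for number in ERRORS_LINE.findall(text)]


sample = (
    "[ERROR] No keywords and test cases defined in file\n"
    "File path: libraries_instances.robot\n"
    "TEST SUITES SUMMARY:\n"
    "ERRORS: 3\n"  # placeholder value, not taken from the question
)

counts = error_counts(sample)
```

The `[ERROR]` line is ignored because it does not start with the word itself, and `[^\d\n]*` keeps the match from running onto the next line.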
0
votes
1 answer

What is the best approach to extract keywords from different strings in python?

I am looking to extract important keywords from a set of text pieces which are actually text messages received after any transaction. Below is a sample dataset: {"message": "*boi star sandesh* rs 20 has been debited to your account xx2136 from…
Rahul
  • 115
  • 10
0
votes
3 answers

Find a variable length number inside a string from a file using awk?

I have many files on UNIX and wish to fetch the number in each file associated with a specified pattern. Most of the files will have a unique pattern like the one below: some text abc some text abc some text abc (3 rows) I want to print only number 3…
Minsec
  • 35
  • 6
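The question asks for awk, but the same idea can be sketched in Python for comparison: capture the digits inside the `(N rows)` footer, whatever their length. The helper name `row_count` is hypothetical, and the exact footer wording is assumed from the excerpt.

```python
import re

# Captures a number of any length from a footer such as "(3 rows)" or "(1 row)".
ROW_FOOTER = re.compile(r"\((\d+) rows?\)")


def row_count(text):
    """Return the number from the first "(N rows)" footer, or None if there is none."""
    match = ROW_FOOTER.search(text)
    return int(match.group(1)) if match else None


sample = "some text abc\nsome text abc\nsome text abc\n(3 rows)\n"
found = row_count(sample)
```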
0
votes
1 answer

Concat two padded sentences and insert to conv1d in TensorFlow?

What dimensions are required in tf.nn.conv1d, and how do I perform max pooling afterwards?
a.kh
  • 25
  • 5
0
votes
4 answers

How to transform column with lists of strings into a new column of unique identifiers

Apologies if a similar question has been asked before; I was unable to find one, possibly because of the wording of the question. Some current sample data looks like this, where the first column is a list of identifiers (genes) and the second column is…
cteno4
  • 3
  • 1
0
votes
1 answer

json output convert to other format

I have the following output from a script {"emeter":{"get_realtime":{"current":0.501730,"voltage":240.819788,"power":70.455025,"total":1.798000,"err_code":0}}} I need to convert it to this format for prometheus exporter collector: current…
anarchist
  • 383
  • 2
  • 5
  • 18
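The target format is cut off in the excerpt above, so the following is only a sketch under an assumption: that each reading should become one `name value` line, which is the general shape of a sample in the Prometheus text exposition format. The function name `to_metric_lines` is hypothetical.

```python
import json

# Script output copied from the question.
raw = (
    '{"emeter":{"get_realtime":{"current":0.501730,"voltage":240.819788,'
    '"power":70.455025,"total":1.798000,"err_code":0}}}'
)


def to_metric_lines(payload: str) -> list[str]:
    """Flatten the nested readings into "name value" lines (assumed layout)."""
    readings = json.loads(payload)["emeter"]["get_realtime"]
    return [f"{name} {value}" for name, value in readings.items()]


lines = to_metric_lines(raw)
```

Trailing zeros are not preserved, because the values are parsed as floats before being printed again.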
0
votes
1 answer

Fix line wraps in plaintext tables with Unix command-line tools

I'm trying to process a tab-separated table in which some of the cells have line-wraps. The tables were extracted from PDF tables automatically and look like this: 1 UNITED STATES OF 3797 AMERICA 2 CANADA 3855 3 ISLAMIC REPUBLIC …
Connor Harris
  • 421
  • 5
  • 14