
I have a large text file containing filenames ending in .txt. Some of the rows have unwanted text after the filename extension. I am trying to find a way to search and replace, or trim, the whole file so that if a row contains .txt, anything after it is simply removed. Example:

C:\Test1.txt

C:\Test2.txtHelloWorld this is my problem

C:\Test3.txt_____Annoying stuff1234 .r

Desired result

C:\Test1.txt

C:\Test2.txt

C:\Test3.txt

I have tried with Notepad++ and with batch/PowerShell. I got close, but no cigar.

(Get-Content "D:\checkthese.txt") | 
Foreach-Object {$_ -replace '.txt*', ".txt"} | 
Set-Content "D:\CLEAN.txt"

My thinking here is that if I replace anything (wildcard *) after .txt, I would trim off what I don't need, but this doesn't work. I think I need to use a regular expression, but I have the syntax wrong.

JasonMArcher
AndyR

1 Answer


Simply change the * to .* (and escape the leading .), like so:

(Get-Content "D:\checkthese.txt") | 
Foreach-Object {$_ -replace '\.txt.*', ".txt"} | 
Set-Content "D:\CLEAN.txt"

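In regular expressions, * means "0 or more times" and applies to whatever precedes it. In your pattern that is the final t of .txt, so .txt* only matches .tx, .txt, .txtt, .txttt, and so on. It never reaches the text after the extension, which is why nothing was trimmed.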
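To make the difference concrete, here is a minimal sketch that runs both patterns against the three sample rows from the question. The variable names ($rows, $unchanged, $cleaned) are only illustrative, and the rows are held in memory instead of being read from D:\checkthese.txt.

```powershell
# Sample rows copied from the question.
$rows = 'C:\Test1.txt',
        'C:\Test2.txtHelloWorld this is my problem',
        'C:\Test3.txt_____Annoying stuff1234 .r'

# Original pattern: "any character, t, x, then zero or more t".
# It only consumes the extension itself, so the trailing text stays.
$unchanged = $rows | ForEach-Object { $_ -replace '.txt*', '.txt' }

# Corrected pattern: "a literal dot, txt, then everything after it".
$cleaned = $rows | ForEach-Object { $_ -replace '\.txt.*', '.txt' }

$unchanged   # identical to $rows
$cleaned     # C:\Test1.txt, C:\Test2.txt, C:\Test3.txt
```

With the original pattern every row comes back exactly as it went in, which matches the "got close" behaviour described in the question.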

The . metacharacter, however, matches any single character, so .* matches 0 or more of anything, which is what you want. For the same reason, I also escaped the . in .txt; otherwise the pattern could break on filenames like alovelytxtfile.txt, which would be trimmed to alovel.txt.
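The effect of escaping the dot can be checked directly on that filename. This is a small sketch; $name, $unescaped, $escaped, and $pattern are illustrative names, and the use of [regex]::Escape is an optional alternative, not something the answer above relies on.

```powershell
$name = 'alovelytxtfile.txt'

# Unescaped: '.' matches the 'y' before the first "txt",
# so the match starts too early and swallows the rest of the name.
$unescaped = $name -replace '.txt.*', '.txt'    # alovel.txt

# Escaped: '\.' only matches a literal dot, so the name survives intact.
$escaped = $name -replace '\.txt.*', '.txt'     # alovelytxtfile.txt

# Optional: let .NET escape the literal part instead of doing it by hand.
$pattern = [regex]::Escape('.txt') + '.*'       # \.txt.*

$unescaped
$escaped
$pattern
```

Building the pattern with [regex]::Escape is mainly useful when the extension comes from a variable and might contain other regex metacharacters.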

For more information, see:

Sebastian Paaske Tørholm
  • That works nicely and thanks for the second hint at escaping the "." I would not have noticed that, cheers! – AndyR Jul 10 '11 at 15:18
  • Let me point out that your example is wrong. If you didn't escape the `.` at the beginning of your regex, it would cause the returned example file name to be `alovel.txt`, not `alovelytxt`. A quibble to an otherwise fine answer. ;-) – Mark Jul 11 '11 at 17:49