For example: "How to Connect Serial ATA Hard HP 3D Drives"

I want to replace "ATA" with "Ata" using a regex in Ruby, but leave tokens such as "3D" untouched. The input is mostly English words.

Another example: "CD/DVD STORAGE WALLET-80 PCS Nylon" => "CD/DVD Storage Wallet-80 Pcs Nylon"

D-Link DGS-1005G 6PORT CORRECT RESOURCES => D-Link Dgs-1005G 6Port Correct Resources

HP85 C9429A OEM PUT RETURNS BETWEEN => HP85 C9429A OEM Put Returns Between

FOREXAMPLE INDENT76 469-FUNCTIONS, 10x2 LINKS => Forexample Indent76 469-Functions, 10x2 Links

Thanks!

wkang
  • This is not a bidirectional transformation. You can upcase everything, but it is hard to reverse it. I guess there is no algorithm to do this reliably. Too many edge cases. – ayckoster Sep 09 '14 at 08:00
  • How to differentiate things that include numbers? Which of them to make lowercase? For instance you're lowercasing `INDENT76` and `6PORT` but not `1005G` nor `HP85` nor `C9429A` - why? – Lucas Trzesniewski Sep 09 '14 at 08:10
  • I think those are brand names or common abbreviations, so they should not be lowercased; in any case I will add a list and skip them in code. Numbers with 3 or more chars will be lowercased. I think I need a regex that checks for words with 2 or more uppercase letters, and then I will capitalize them in code. – wkang Sep 09 '14 at 08:17

1 Answer

You have to define what punctuation you consider as a word-breaking character. For instance, I can deduce from your example that you don't want to break words on / (because of CD/DVD) but you do want to break them on - (because of WALLET-80).

Such a regex would be:

(?<=^|[-\s])\p{Lu}+(?=$|[-\s])

Demo: http://regex101.com/r/nS7xB0/1

Add your own word-breaking characters to the [-\s] brackets.
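As a rough illustration of how this pattern could be applied in Ruby, here is a minimal sketch. The constant `ALL_CAPS_WORD` and the method `capitalize_caps_words` are hypothetical names chosen for this example, and the lookbehind uses `^` so that a word at the very start of the string is also matched. Note that it does not yet skip abbreviations, so "HP" becomes "Hp".

```ruby
# Matches a run of uppercase letters delimited by a hyphen, whitespace,
# or the start/end of a line. Extend [-\s] with more word-breaking characters.
ALL_CAPS_WORD = /(?<=^|[-\s])\p{Lu}+(?=$|[-\s])/

# Hypothetical helper: capitalizes every all-uppercase word found by the pattern.
def capitalize_caps_words(text)
  text.gsub(ALL_CAPS_WORD) { |word| word.capitalize }
end

puts capitalize_caps_words("CD/DVD STORAGE WALLET-80 PCS Nylon")
# prints: CD/DVD Storage Wallet-80 Pcs Nylon
```

Because `/` is not in the delimiter class, "CD/DVD" is left alone, while "WALLET-80" is split on the hyphen and only "WALLET" is rewritten.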


EDIT: Ok, following your feedback, here's another regex for you:

\b(?=(?:\w*?\p{Lu}){3})\w+\b

This one will match any letter/digit combination containing at least 3 uppercase letters.

Demo: http://regex101.com/r/nS7xB0/2
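One possible way to combine this second pattern with the skip list mentioned in the comments is sketched below. The names `SKIP_WORDS`, `CAPS_HEAVY_WORD`, `title_case_word`, and `normalize_title` are assumptions made for this example, and the skip list contents are illustrative only. Plain `String#capitalize` would turn "6PORT" into "6port", so the sketch lowercases the word and then uppercases its first letter instead.

```ruby
# Hypothetical skip list of abbreviations to leave untouched.
SKIP_WORDS = %w[OEM DVD].freeze

# Any run of word characters containing at least 3 uppercase letters.
CAPS_HEAVY_WORD = /\b(?=(?:\w*?\p{Lu}){3})\w+\b/

# Lowercases the word, then uppercases its first letter (skipping leading digits).
def title_case_word(word)
  word.downcase.sub(/\p{Ll}/) { |letter| letter.upcase }
end

# Rewrites every matching word unless it appears in the skip list.
def normalize_title(text, skip: SKIP_WORDS)
  text.gsub(CAPS_HEAVY_WORD) do |word|
    skip.include?(word) ? word : title_case_word(word)
  end
end

puts normalize_title("D-Link DGS-1005G 6PORT CORRECT RESOURCES")
# prints: D-Link Dgs-1005G 6Port Correct Resources
puts normalize_title("HP85 C9429A OEM PUT RETURNS BETWEEN")
# prints: HP85 C9429A OEM Put Returns Between
```

Tokens such as "HP85", "C9429A", and "1005G" contain fewer than 3 uppercase letters, so the pattern never matches them and they need no skip-list entry.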

Lucas Trzesniewski
  • Actually, word-breaking isn't the main issue; sometimes there is no word break between words. I only want to find the uppercase words, except everyday uppercase abbreviations and uppercase words that include numbers. – wkang Sep 09 '14 at 07:45
  • Well, then you need to provide a list of the abbreviations you want to skip. What do you mean by no word-breaking? Please edit your question to provide more examples. – Lucas Trzesniewski Sep 09 '14 at 07:47
  • Thanks, updated. For the abbreviations I will add an array of words to skip. – wkang Sep 09 '14 at 07:54