
I have been trying to extract only the words that contain accented characters from multiple text files in a folder, in Java. I don't want to remove the accents or convert accented characters to plain ones. I only want to print the accented words found in those files, including files that mix accented and unaccented words.

**The goal is only to extract the words that contain accents.** After searching for a while, the closest solution I found is the question linked below. It uses a similar regex, but it does not work as required, because it also selects null values and unaccented characters. Regex accented Characters for special field

Another solution I found is `([a-zA-Z]|[à-ü]|[À-Ü])`, but it matches each letter separately, so it is not word-specific, and it selects both unaccented and accented letters.

RONNY
  • Please show the code you have already tried in a [mre]. Also read the [ask] page for tips on how to improve this question. Welcome to [so]. Take the [tour] (and earn a badge while at it) – blurfus Feb 17 '23 at 06:21
  • I will definitely work on how I ask questions. Thank you for pointing it out; I will read the guidance carefully and update my question – RONNY Feb 17 '23 at 09:10
  • Note that accented characters are sometimes represented as the base character plus a "combining diacritic", so à might be 'a' followed by U+0300 (combining grave accent). You can use `java.text.Normalizer` to deal with this. – greg-449 Feb 17 '23 at 09:55

1 Answer


If you want to match whole words that contain at least one accented letter, you need something like:

[a-zA-Zà-üÀ-Ü]*[à-üÀ-Ü][a-zA-Zà-üÀ-Ü]*

Explanation:

  • [a-zA-Zà-üÀ-Ü]* - this matches any run of accented and unaccented letters (so the word can contain other letters of either kind); the star * quantifier means zero or more letters. It appears both before and after the required accented letter
  • [à-üÀ-Ü] - this matches exactly one accented letter, which forces the pattern to match only words that contain an accent
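To connect the pattern to the original task (multiple text files in a folder), here is a minimal Java sketch rather than a definitive implementation. The class name `AccentWords`, the method `findAccentedWords`, the `*.txt` filter, and the UTF-8 encoding are all assumptions made for illustration. The text is first normalized to NFC with `java.text.Normalizer`, as suggested in the comments on the question, so that a base letter followed by a combining diacritic becomes a single accented character that the pattern can see. The pattern is written with Unicode escapes only to avoid source-encoding problems. Keep in mind that the ranges are plain code-point ranges: they also include × (U+00D7) and ÷ (U+00F7), and they do not cover accented letters outside them, such as ő or ž.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class AccentWords {

    // Same pattern as in the answer: \u00E0-\u00FC is à-ü, \u00C0-\u00DC is À-Ü.
    private static final Pattern ACCENTED_WORD = Pattern.compile(
            "[a-zA-Z\u00E0-\u00FC\u00C0-\u00DC]*"
          + "[\u00E0-\u00FC\u00C0-\u00DC]"
          + "[a-zA-Z\u00E0-\u00FC\u00C0-\u00DC]*");

    // Returns every word in the text that contains at least one accented letter.
    static List<String> findAccentedWords(String text) {
        // Compose "base letter + combining diacritic" into one character (NFC).
        String normalized = Normalizer.normalize(text, Normalizer.Form.NFC);
        List<String> words = new ArrayList<>();
        Matcher matcher = ACCENTED_WORD.matcher(normalized);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the folder is given as the first argument (default: current folder).
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        // Assumption: only *.txt files are scanned and they are UTF-8 encoded.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(folder, "*.txt")) {
            for (Path file : files) {
                String content = Files.readString(file, StandardCharsets.UTF_8);
                for (String word : findAccentedWords(content)) {
                    System.out.println(file.getFileName() + ": " + word);
                }
            }
        }
    }
}
```

`Files.readString` requires Java 11 or later. On older versions, reading the bytes with `Files.readAllBytes` and building a `String` with the same charset should work the same way.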
dey
  • Thank you so much for answering; it cleared up my doubt, and thanks for the good explanation – RONNY Feb 17 '23 at 09:08