In "Output of `man` does not match an apparently-identical string literal", I solved the problem by running hexdump() on the string, copying the hexdump output into my code as a string literal, and comparing against that. But what if I'd like to match every line that contains only capital letters? (I am trying to extract all the headers of the man page: not just "NAME", but also "SYNOPSIS", "DESCRIPTION", and so on.)
The following attempts do not work; neither one picks out the headings I want:
$matches = array();
$ans = preg_match("/^[A-Z]+$/u", $text, $matches);
// then keep only the lines where $ans === 1
// OR:
$matches = array();
$ans = preg_match("/\s/u", $text, $matches);
// then keep only the lines where $ans === 0, since the headers contain no whitespace
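For comparison, the first pattern should behave as expected on a plain ASCII string, which suggests the regex itself is not the problem and the bytes coming from `man` are. A minimal sketch, where `$sample` is a hypothetical hand-typed stand-in for the real output:

```php
<?php
// Hypothetical sample: hand-typed plain ASCII, NOT real `man` output.
$sample = "NAME\n       ls - list directory contents\nSYNOPSIS\n       ls [OPTION]... [FILE]...\n";

// Apply the same pattern line by line and keep the lines that match.
$headers = array();
foreach (explode("\n", $sample) as $line) {
    if (preg_match('/^[A-Z]+$/u', $line) === 1) {
        $headers[] = $line;
    }
}

print_r($headers); // the two all-caps lines: NAME and SYNOPSIS
```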
How do I do this? Should I convert each line to ASCII and/or UTF-8 first and then try to match? I tried that, and it didn't work either:
$text = iconv(mb_detect_encoding($text, mb_detect_order(), true), "ASCII", $text);
// and then use the filtering code given above
What should I do?
(Also, what encoding could these strings possibly be in? And why is the man output in such an encoding?)
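One possibility worth checking, stated here as an assumption rather than a confirmed diagnosis: when `man` output is captured instead of shown on a terminal, nroff-style formatting commonly renders bold as "character, backspace, same character" and underline as "underscore, backspace, character". In that case the text is ordinary ASCII with embedded backspace bytes (0x08), not an unusual encoding, and iconv would leave it unchanged. The sketch below uses a hypothetical `$raw` string built by hand to imitate that format; it strips the overstrike pairs and then matches all-caps lines.

```php
<?php
// Hypothetical sample imitating overstrike bold: each header letter is
// followed by a backspace (0x08) and the same letter again.
$raw = "N\x08NA\x08AM\x08ME\x08E\n       ls - list directory contents\n";

// Inspect the raw bytes; backspaces show up as "08" in the hex output.
echo bin2hex(substr($raw, 0, 12)), "\n";

// Remove every "any character + backspace" pair, leaving the final glyph.
// This handles both bold (X\bX) and underline (_\bX) sequences.
$clean = preg_replace('/.\x08/', '', $raw);

// With the m modifier, ^ and $ anchor at each line, so the whole page
// can be scanned in one call.
preg_match_all('/^[A-Z]+$/m', $clean, $m);

print_r($m[0]); // the cleaned header line(s)
```

If the hex output of the real string shows no 08 bytes, this assumption does not apply and the cause lies elsewhere (for example, terminal escape sequences). Note also that `[A-Z]+` will not match multi-word headers such as "SEE ALSO"; a pattern like `/^[A-Z][A-Z ]*$/m` would be needed for those.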