I'm trying to write a single regular expression that converts all uppercase words to lowercase but leaves uppercase Roman numerals unchanged.
The only approach I've found takes two steps. First, lowercase every uppercase word that is followed by a space, comma, period, or hyphen (the hyphen covers hyphenated words). Then convert all the Roman numerals back to uppercase.
I used this to convert to lowercase:
(\u+[ ,.-])
For the second step, I had to go back through the text and manually find and replace every suspected Roman numeral.
Is there a better way to do this? I tried negative lookahead expressions without success, but I'm not very experienced at writing them.
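To make the problem concrete, here is a rough Python translation of that first step. Python's `re` module has no `\u` uppercase class, so this sketch assumes `[A-Z]` is what the editor's `\u` matches. The function name `lower_all_caps` is just an illustrative name. Because the pattern only looks at uppercase runs and the character after them, it also lowercases the numeral in "ARTICLE IV.", which is why the manual second pass was needed.

```python
import re

# Assumption: the editor's \u means "uppercase letter", approximated here as [A-Z].
AUTHOR_PATTERN = re.compile(r"[A-Z]+[ ,.-]")

def lower_all_caps(text: str) -> str:
    # Lowercase each uppercase run together with its trailing delimiter;
    # str.lower() leaves the space, comma, period, or hyphen unchanged.
    return AUTHOR_PATTERN.sub(lambda m: m.group(0).lower(), text)

print(lower_all_caps("ARTICLE IV."))  # article iv.  (the numeral is lowercased too)
```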
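One way to get a single pattern is to put a strict Roman numeral grammar inside a negative lookahead at the start of each word. The grammar used here (standard subtractive notation for 1 to 3999) is an assumption about what counts as a numeral, and the names `ROMAN`, `UPPER_WORD_NOT_ROMAN`, and `lower_non_roman` are hypothetical. A common trap is that the numeral grammar can match the empty string. Ending it with `\b` inside the lookahead would therefore exclude every word. Following it with `(?![A-Za-z])` avoids that. The case conversion is done in a Python callback because lowercasing in a replacement string is editor-specific.

```python
import re

# Strict Roman numeral, 1-3999 (assumption: standard subtractive notation).
# Note that this grammar can match the empty string.
ROMAN = r"M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"

# An all-uppercase word, unless the whole word is a Roman numeral.
# (?![A-Za-z]) after ROMAN forces the numeral to end where the word ends.
# At the start of a word the next character is a letter, so an empty
# ROMAN match is rejected and ordinary words like ARTICLE still match.
UPPER_WORD_NOT_ROMAN = re.compile(r"\b(?!" + ROMAN + r"(?![A-Za-z]))[A-Z]+\b")

def lower_non_roman(text: str) -> str:
    return UPPER_WORD_NOT_ROMAN.sub(lambda m: m.group(0).lower(), text)

print(lower_non_roman("ARTICLE I. TWENTY-FIVE"))  # article I. twenty-five
```

Hyphenated words work without special handling, because `\b` treats the hyphen as a word boundary.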
The sample that I'm testing this on is the U.S. Constitution. Here's a sample of the input:
WE, the PEOPLE of the UNITED STATES, in order to form a more perfect union, establish justice, ensure domestic tranquility, provide for the common defence, promote the general welfare, and secure the blessings of liberty to ourselves and our posterity, do ordain and establish this Constitution for the United States of America.
ARTICLE I.
Sect. 1. ALL legislative powers, herein granted, shall be vested in a Congress of the United States, which shall consist of a Senate and House of Representatives.
Sect. 2. The House of Representatives shall be composed of Members chosen every second year by all the people of the several States, and the Electors in each State shall have the qualifications requisite for Electors of the most numerous branch of the State Legislature. No person shall be a Representative who shall not have attained to the age of twenty-five years, and been seven years a citizen of the United States, and who shall not, when elected, be an inhabitant of that State in which he shall be chosen.
ARTICLE IV.
ARTICLE V.
ARTICLE VI.
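If the numeral check doesn't have to live inside the search pattern, a callback keeps the regex simpler. It matches every all-uppercase word and lowercases it unless `re.fullmatch` says the whole word is a Roman numeral. This is a swapped-in alternative, not a lookahead solution. The numeral grammar and the names `normalize` and `lower_unless_roman` are assumptions for illustration.

```python
import re

# Strict Roman numeral grammar, 1-3999 (assumption: standard subtractive notation).
ROMAN = re.compile(r"M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})")
UPPER_WORD = re.compile(r"\b[A-Z]+\b")

def lower_unless_roman(match: re.Match) -> str:
    word = match.group(0)
    # fullmatch requires the entire word to be a numeral, so "IV" is kept
    # but "VICE" or "DID" is lowercased.
    return word if ROMAN.fullmatch(word) else word.lower()

def normalize(text: str) -> str:
    return UPPER_WORD.sub(lower_unless_roman, text)

print(normalize("ARTICLE IV. WE, the PEOPLE"))  # article IV. we, the people
```

With either version, genuine words that happen to be valid numerals, such as MIX (1009) or LIV (54), are left uppercase. Telling those apart from real numerals would need context that a regex doesn't have.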