This code is not producing the output I expect:
```python
# Explanation:
# Walk the tokens of `line`. When a token's position is in `indexes`, prefix it
# with an underscore, and if the next index is consecutive, append the following
# token (also underscored) with no space in between.
# The output for the line below should therefore be:
# '7 Waitohu Road _York_Bay Co Manager _York_Bay Asst Co Dir _Central_Lower_Hutt General Hand _Wainuiomata School Caretaker'

# A list of suburb words and their index positions in the line
uniqueList = ['York', 3, 'Bay', 4, 'York', 7, 'Bay', 8, 'Central', 12, 'Lower', 13, 'Hutt', 14, 'Wainuiomata', 17]
# Reduce uniqueList to just the indexes
indexes = uniqueList[1::2]  # [3, 4, 7, 8, 12, 13, 14, 17]

# The example line
line = '7 Waitohu Road York Bay Co Manager York Bay Asst Co Dir Central Lower Hutt General Hand Wainuiomata School Caretaker'
# Split the line into tokens so that positions can be matched against indexes
splits = line.split(' ')

# Read each index
for i in range(len(indexes)):
    check = indexes[i]
    for j in range(len(splits)):
        if j == check and (i + 1 < len(indexes)):
            # Determine whether the next index is consecutive
            next_index = indexes[i + 1]
            if next_index - check == 1:
                splits[j] = '_' + splits[j] + '_' + splits[j + 1]
        else:
            if j == check:
                splits[j] = '_' + splits[j]

# Results here
newLine = ' '.join(splits)
print(newLine)
```
Output:

```
7 Waitohu Road _York_Bay Bay Co Manager _York_Bay Bay Asst Co Dir _Central_Lower _Lower_Hutt Hutt General Hand _Wainuiomata School Caretaker
```
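The doubled `Bay` and `Hutt` appear because each merge writes the joined pair into `splits[j]` but never removes `splits[j + 1]`, and comparing only pairs of neighbouring indexes cannot see a run of three. One possible first step, sketched here under the assumption that `indexes` is sorted ascending, is to group the indexes into runs of consecutive numbers. `group_runs` is a hypothetical helper name; the grouping uses the standard `itertools.groupby`:

```python
from itertools import groupby

def group_runs(indexes):
    """Split a sorted list of ints into runs of consecutive values (hypothetical helper)."""
    runs = []
    # Within a run of consecutive numbers, value - position is constant,
    # so groupby on that difference yields exactly one group per run.
    for _, group in groupby(enumerate(indexes), key=lambda pair: pair[1] - pair[0]):
        runs.append([value for _, value in group])
    return runs

print(group_runs([3, 4, 7, 8, 12, 13, 14, 17]))
# [[3, 4], [7, 8], [12, 13, 14], [17]]
```

Each run then corresponds to exactly one output token, regardless of its length.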
How do I:
- avoid outputting the doubled-up words `Bay` and `Hutt`, and
- handle a run of three (or more) consecutive indexes, so that `Central Lower Hutt` becomes `_Central_Lower_Hutt`?
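A minimal sketch of one way to get the expected line, assuming `indexes` is sorted ascending and refers to positions in `line.split(' ')`. Rather than patching `splits` in place, it builds a new token list: at the start of each run of consecutive indexes it emits one token joining the whole run, then jumps past the run so no word is repeated. `underscore_runs` is a hypothetical name, not an existing API:

```python
from itertools import groupby

def underscore_runs(line, indexes):
    """Join each run of consecutive token indexes into one '_'-prefixed token (hypothetical helper)."""
    tokens = line.split(' ')
    # Group consecutive indexes: value - position is constant within a run.
    runs = [
        [value for _, value in group]
        for _, group in groupby(enumerate(indexes), key=lambda p: p[1] - p[0])
    ]
    # Map the first index of each run to its last index.
    run_end = {run[0]: run[-1] for run in runs}

    out = []
    j = 0
    while j < len(tokens):
        if j in run_end:
            end = run_end[j]
            # 'York', 'Bay' -> '_York_Bay'; works for runs of any length
            out.append(''.join('_' + t for t in tokens[j:end + 1]))
            j = end + 1  # skip the merged tokens so none is output twice
        else:
            out.append(tokens[j])
            j += 1
    return ' '.join(out)

line = '7 Waitohu Road York Bay Co Manager York Bay Asst Co Dir Central Lower Hutt General Hand Wainuiomata School Caretaker'
indexes = [3, 4, 7, 8, 12, 13, 14, 17]
print(underscore_runs(line, indexes))
# 7 Waitohu Road _York_Bay Co Manager _York_Bay Asst Co Dir _Central_Lower_Hutt General Hand _Wainuiomata School Caretaker
```

Advancing `j` past each run is what removes the duplicates, and joining the whole slice `tokens[j:end + 1]` is what handles `Central Lower Hutt` without any special case for three words.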