10

I would like to read through a file and capitalize the first letter of each string using Python, but some of the strings may start with numbers. Specifically, the file might look like this:

"hello world"
"11hello world"
"66645world hello"

I would like this to be:

"Hello world"
"11Hello world"
"66645World hello"

I have tried the following, but this only capitalizes if the letter is in the first position.

with open('input.txt') as input, open("output.txt", "a") as output:
    for line in input:
        output.write(line[0:1].upper()+line[1:-1].lower()+"\n")

Any suggestions? :-)

jpp
Dino

15 Answers

6

Using regular expressions:

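import re

with open('input.txt') as infile, open('output.txt', 'a') as outfile:
    for line in infile:
        m = re.search('[a-zA-Z]', line)
        if m is not None:
            index = m.start()
            line = line[:index] + line[index].upper() + line[index + 1:]
        outfile.write(line)  # lines without any letter are written unchanged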
jacob
  • Issue with using [a-z] only would be that it can match alphabet which might not be the first alphabet in the string (if the first alphabet is already in upper case). – Ankit Jaiswal Nov 01 '18 at 10:38
  • @AnkitJaiswal Yeah I wasn't being super formal, but there's a lot of error checking that can go into this answer to make it more robust. – jacob Nov 01 '18 at 10:41
3

You can write a function with a for loop:

x = "hello world"
y = "11hello world"
z = "66645world hello"

def capper(mystr):
    for idx, i in enumerate(mystr):
        if not i.isdigit():  # or if i.isalpha()
            return mystr[:idx] + mystr[idx:].capitalize()
    return mystr

print(list(map(capper, (x, y, z))))

['Hello world', '11Hello world', '66645World hello']
jpp
  • This would work if there is no case where the first alphabet comes after a space. – Ankit Jaiswal Nov 01 '18 at 10:44
  • @AnkitJaiswal, Yup, this doesn't require any splitting. So it should work provided `str.isdigit` is acceptable for OP's use case. – jpp Nov 01 '18 at 10:45
  • Wouldn't it be better to reverse the condition and use `string.isalpha()` instead? – Ankit Jaiswal Nov 01 '18 at 10:47
  • @AnkitJaiswal, Not sure what's more efficient. In *most* cases, though not all, they'll be equivalent. Can you give an example where it wouldn't work? – jpp Nov 01 '18 at 10:57
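As one possible answer to the question in the comments above: the two conditions diverge when something other than a letter follows the digits, for example a space. The sketch below compares both variants; the names `cap_not_digit` and `cap_alpha` are made up for illustration and do not come from the answer.

```python
def cap_not_digit(s):
    # Stops at the first character that is not a digit (the answer's condition).
    for idx, ch in enumerate(s):
        if not ch.isdigit():
            return s[:idx] + s[idx:].capitalize()
    return s


def cap_alpha(s):
    # Stops at the first alphabetic character instead.
    for idx, ch in enumerate(s):
        if ch.isalpha():
            return s[:idx] + s[idx:].capitalize()
    return s


print(cap_not_digit("11 hello world"))  # 11 hello world
print(cap_alpha("11 hello world"))      # 11 Hello world
```

With `not isdigit()` the loop stops at the space, and `capitalize()` on a string that starts with a space leaves the following letter in lower case.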
3

You can use a regular expression to find the position of the first letter and then use upper() on the character at that index to capitalize it. Something like this should work:

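import re

s = "66645hello world"
m = re.search(r'[a-zA-Z]', s)
if m is not None:
    index = m.start()
    # Upper-case only the character at that index
    s = s[:index] + s[index].upper() + s[index + 1:]
print(s)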
Ankit Jaiswal
2

How about this?

import re

text = "1234hello"
index = re.search("[a-zA-Z]", text).start()
text_list = list(text)
text_list[index] = text_list[index].upper()

print(''.join(text_list))

The result is: 1234Hello

Magd Kudama
1

May be worth trying ...

>>> s = '11hello World'
>>> for i, c in enumerate(s):
...     if not c.isdigit():
...         break
... 
>>> s[:i] + s[i:].capitalize()
'11Hello world'
Karn Kumar
1

You can find the first alpha character and capitalize it like this:

with open("input.txt") as in_file, open("output.txt", "w") as out_file:
    for line in in_file:
        pos = next((i for i, e in enumerate(line) if e.isalpha()), 0)
        line = line[:pos] + line[pos].upper() + line[pos + 1:]
        out_file.write(line)

Which outputs:

Hello world
11Hello world
66645World hello
RoadRunner
0

Like this, for example:

import re

re_numstart = re.compile(r'^([0-9]*)(.*)')

def capfirst(s):
    ma = re_numstart.match(s)
    return ma.group(1) + ma.group(2).capitalize()
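
A quick check against the strings from the question, as a sketch that repeats the definition so it runs on its own. Two things are worth keeping in mind: `capitalize()` also lower-cases the rest of the string, and `.` does not match a newline, so a trailing `\n` read from a file is dropped from the result and would have to be written back separately.

```python
import re

re_numstart = re.compile(r'^([0-9]*)(.*)')

def capfirst(s):
    ma = re_numstart.match(s)
    return ma.group(1) + ma.group(2).capitalize()

for s in ("hello world", "11hello world", "66645world hello"):
    print(capfirst(s))

# A trailing newline is not part of group(2), so it is lost:
print(repr(capfirst("11hello world\n")))
```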
musbur
0

Try this:

with open('input.txt') as infile, open("output.txt", "a") as outfile:
    for line in infile:
        t_line = ""
        for c in line:
            if c.isalpha():
                # Capitalize the first letter, then copy the rest of the line unchanged.
                t_line += c.capitalize()
                t_line += line[line.index(c)+1:]
                break
            else:
                t_line += c
        outfile.write(t_line)

Execution result:

Hello world
11Hello world
66645World hello
0

There is probably a one-line regex approach, but using title() on the first word should also work:

def capitalise_first_letter(s):
    spl = s.split()
    return spl[0].title() + ' ' + ' '.join(spl[1:])

s = ['123hello world',
"hello world",
"11hello world",
"66645world hello"]


for i in s:
    print(capitalise_first_letter(i))

Producing:

123Hello world
Hello world
11Hello world
66645World hello
SqrtPi
0

You can use a regular expression for that:

import re

line = "66645world hello"

regex = re.compile(r'\D')
tofind = regex.search(line)
pos = tofind.start() + 1  # index just past the first non-digit character

line = line[:pos].upper() + line[pos:].lower()

print(line)

output: 66645World hello

Sharku
0

Okay, there are already a lot of answers that should work.

I find them overly complicated, though.

Here is a simpler solution:

for s in ("hello world", "11hello world", "66645world hello"):
    first_letter = next(c for c in s if not c.isdigit())
    print(s.replace(first_letter, first_letter.upper(), 1))
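
One caveat with the loop above: `next()` without a default raises `StopIteration` when a string is empty or consists only of digits. A minimal sketch of a variant that passes a default instead, wrapped in a hypothetical helper named `cap_first_non_digit`:

```python
def cap_first_non_digit(s):
    # next() with a default avoids StopIteration on empty or all-digit input.
    first = next((c for c in s if not c.isdigit()), None)
    if first is None:
        return s
    # Everything before `first` is a digit, so its first occurrence is the
    # position where it was found.
    return s.replace(first, first.upper(), 1)


for s in ("hello world", "11hello world", "66645world hello", "12345", ""):
    print(repr(cap_first_non_digit(s)))
```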
Sebastian Loehner
0

The title() method capitalizes the first letter of each word and ignores any digits before it, so applying it to the first word of the string gives the desired result. It also works well for non-ASCII characters, unlike the regex methods using [a-zA-Z].

From the doc:

str.title()

Return a titlecased version of the string where words start with an uppercase character and the remaining characters are lowercase. [...] The algorithm uses a simple language-independent definition of a word as groups of consecutive letters. The definition works in many contexts but it means that apostrophes in contractions and possessives form word boundaries, which may not be the desired result:
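
To illustrate the behaviour described in the quote, here is a small sketch of what `title()` does on its own (the sample strings are chosen for illustration). Every word is capitalized, the remaining letters are lower-cased, and the letter after an apostrophe is treated as the start of a new word:

```python
samples = ["11hello world", "234i'm good", "HELLO wORLD"]
titled = [s.title() for s in samples]

for before, after in zip(samples, titled):
    print(f"{before!r} -> {after!r}")

# '11hello world' -> '11Hello World'
# "234i'm good" -> "234I'M Good"
# 'HELLO wORLD' -> 'Hello World'
```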

We can take advantage of it this way:

def my_capitalize(s):
    first, rest = s.split(maxsplit=1)
    split_on_quote = first.split("'", maxsplit=1)
    split_on_quote[0] = split_on_quote[0].title()
    first = "'".join(split_on_quote)

    return first + ' ' + rest

A few tests:

tests = ["hello world", "11hello world", "66645world hello", "123ça marche!", "234i'm good"]
for s in tests:
    print(my_capitalize(s))

# Hello world
# 11Hello world
# 66645World hello
# 123Ça marche!  # The non-ASCII ç was turned to uppercase
# 234I'm good    # Words containing a quote are treated properly
Thierry Lathuille
0

With re.sub and repl as a function:

If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string.

import re

def capitalize(m):
    return m.group(1) + m.group(2).upper() + m.group(3)

lines = ["hello world", "11hello world", "66645world hello"]
for line in lines:
    print(re.sub(r'(\d*)(\D)(.*)', capitalize, line))

Output:

Hello world
11Hello world
66645World hello
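
A shorter variant of the same idea, offered as a sketch rather than a replacement: match only the first letter and limit `re.sub` to one substitution with `count=1`. The helper name `cap_first_letter` and the pattern `[^\W\d_]` are choices made for this example; the pattern matches a word character that is neither a decimal digit nor an underscore, which covers accented letters such as ç.

```python
import re

# A word character that is neither a decimal digit nor "_".
FIRST_LETTER = re.compile(r'[^\W\d_]')

def cap_first_letter(line):
    # count=1 limits the substitution to the first letter found.
    return FIRST_LETTER.sub(lambda m: m.group().upper(), line, count=1)

for line in ["hello world", "11hello world", "66645world hello", "123ça marche!"]:
    print(cap_first_letter(line))
```

Unlike `(\d*)(\D)(.*)`, this skips over spaces and punctuation that may sit between the digits and the first letter, and it leaves strings without any letter unchanged.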
ndpu
0

Using isdigit() and title() for strings:

s = ['123hello world', "hello world", "11hello world", "66645world hello"]
# Strings that start with a digit are left unchanged; the others are title-cased.
print([each if each[0].isdigit() else each.title() for each in s])

# ['123hello world', 'Hello World', '11hello world', '66645world hello']
Venfah Nazir
-1

If you want to capitalize every word that starts with a letter but leave words that start with a digit unchanged, you can try this piece of code:

def solve(s):
    # capitalize() upper-cases a word's first character only if it is a letter
    return ' '.join(word.capitalize() for word in s.split(' '))

>>> solve('hello 5g')
'Hello 5g'
Idris
  • This won't work for "123adf asdf". Expected answer is "123Adf Asdf". – SRIDHARAN May 22 '21 at 09:59
  • @SRIDHARAN didn't you see what I wrote before the code? I wanted to achieve exactly what you've just mentioned. I wanted to capitalize each word THAT STARTS WITH A CHARACTER but keep others lowercase if they do not start with a CHARACTER. – Idris May 22 '21 at 13:37
  • But the question asked above didn't need that. Please see the examples posted above. – SRIDHARAN May 22 '21 at 14:03