
How do you quickly extract people's names from a text string with a python script?


A General Description

  1. For any person p, if lines L1 and L2 both contain the name of person p, then L1 and L2 are the same line.

  2. For any two different people p1 and p2, if p1's name is on line L1 and p2's name is on line L2, then L1 and L2 are different lines.

  3. People's names contain upper-case and/or lower-case letters A, B, C, ..., Z, and people's names do NOT contain the digits 0, 1, 2, ..., 9.


Example Input

1. ALICIA SANZ  92.0%
(2) ANA FIGUEROA 10.0%
[3] ARIADNA MANZANARES 10.1%
[4] BRIANA CORONIL 82.1%
[5] DRÁP THE KLINGON 71.5%
6. ELEN OF THE DAWN 98.3%
7) INMACULADA FRAGA 14.8%

Example Output

Stored in a list or other container type.

Alicia Sanz
Ana Figueroa
Ariadna Manzanares
Briana Coronil
Dráp The Klingon
Elen Of The Dawn
Inmaculada Fraga

This question mostly exists so that people who type something like "how do I extract people's names out of a file" can quickly find an answer without having to write the code themselves. Presumably, their project is more complicated than getting people's names out of a file, and we could save them 5 to 30 minutes.

Or, you could create a website like www.extractnames.com and run a bunch of advertisements on the site for money. That is optional, of course.


I posted an answer, but hopefully better answers will be posted by other people.

You could certainly make your code more human-readable than mine.

If you want a really highly up-voted answer, then perhaps consider writing a scraper that can delete superfluous words, such as "was painting a picture of a house" in the sentence "Sophia Gutierez was painting a picture of a house."

We just want her name: Sophia Gutierez.
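
One rough heuristic for that, sketched under the assumption that the name comes first in the sentence and is the only leading run of capitalized words (the function name `leading_capitalized_words` is made up for illustration):

```python
def leading_capitalized_words(sentence: str) -> str:
    """Return the run of capitalized words at the start of a sentence."""
    words = []
    for word in sentence.split():
        # Stop at the first word that does not start with an upper-case letter.
        if not word[:1].isupper():
            break
        words.append(word)
    return " ".join(words)


print(leading_capitalized_words("Sophia Gutierez was painting a picture of a house."))
```

This would not work for names in the middle of a sentence, or for sentences that open with a capitalized word that is not a name.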

It is up to you, as long as the code is useful.

May the best pony win.


Toothpick Anemone

1 Answer


The following Python script extracts people's names from a text string, provided that each line contains no words other than a single name and that the names are on separate lines:

import io # `io` stands for `input output`
import string

def extract_peoples_names(text:str):
    text = str(text)  
    
    # `set` removes duplicate names and `sorted` puts them in order.
    #
    # By default, all capital letters (A-Z) sort before all lower-case
    # letters (a-z), so names starting with "A" would end up separated
    # from names starting with "a". The `key` passed to `sorted` below
    # compares names case-insensitively to avoid that.
 
    miscapitalized_names = list(
        sorted(
            set(
                filter(
                        lambda x: len(x) > 0, (
                            name.strip() for name in "".join(
                                filter(
                                    # `str.isalpha` also keeps accented letters such as "Á"
                                    lambda ch: ch.isalpha() or ch in " \r\n",
                                    text
                                )
                            ).split("\n")
                    )
                )
            ),
            key = lambda name: name.lower()
        )
    )

    capitalized_names = type(miscapitalized_names)(
        " ".join(
            piece[0:1].upper()+piece[1:].lower() for piece in name.split()
        ) for name in miscapitalized_names
    )

    # Write the names into an in-memory stream, one name per line.
    strm = io.StringIO()
    print(*capitalized_names, sep="\n", end="", file=strm)
    result = strm.getvalue()
    strm.close()
    return result
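
Since the question asks for the names in a list or other container, here is a shorter variant as a minimal sketch. It assumes one name per line, as above, and keeps the names in input order instead of sorting them. The function name `extract_names_as_list` is only an illustration.

```python
def extract_names_as_list(text: str) -> list:
    """Return one cleaned-up name per non-empty line, without duplicates."""
    names = []
    seen = set()
    for line in text.splitlines():
        # Keep only letters (including accented ones) and whitespace.
        kept = "".join(ch for ch in line if ch.isalpha() or ch.isspace())
        # Capitalize each word; splitting also collapses repeated spaces.
        name = " ".join(word.capitalize() for word in kept.split())
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


example_input = """1. ALICIA SANZ  92.0%
(2) ANA FIGUEROA 10.0%
[5] DRÁP THE KLINGON 71.5%
6. ELEN OF THE DAWN 98.3%"""

print(extract_names_as_list(example_input))
```

Duplicates are removed after the capitalization is normalized, so "ALICIA SANZ" and "Alicia Sanz" count as the same name.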
Toothpick Anemone