I am trying to extract a set of alphanumeric strings from a text file.

Below is a sample line from the file. I want to extract each '@' together with the word that follows it.

im trying to pull @bob from a file. this is a @line in the @file @bob is a wierdo

The code below is what I have so far.

def getAllPeople(fileName):
    #give empty list
    allPeople=[]
    #open TweetsFile.txt
    with open(fileName, 'r') as f1:
        lines=f1.readlines()
        #split all words into strings
        for word in lines:
            char = word.split("@")
            print(char)
    #close the file
    f1.close()

What I am trying to get is: ['@bob', '@line', '@file', '@bob']

modesitt
  • Possible duplicate of [regex for Twitter username](https://stackoverflow.com/questions/2304632/regex-for-twitter-username) – modesitt May 01 '19 at 21:27
  • Could also just split the string by white space, then filter the resulting array of strings to only have the ones containing the @ symbol. I like the regex idea though – Andrew Meservy May 01 '19 at 21:28
  • I thought of that, but `.readlines()` puts it in a list, and you can't `.split` a list – Russ Schneider May 01 '19 at 21:33
  • just `.read()` @Russ? or `.read().replace('\n', '')` or `' '.join(f.readlines())`, etc... – modesitt May 01 '19 at 21:33

1 Answer

If you do not want to use `re`, take Andrew's suggestion:

mentions = list(filter(lambda x: x.startswith('@'), tweet.split()))

Otherwise, see the marked duplicate.
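For the `re` route, a minimal sketch might look like the following. The pattern `@\w+` and the name `find_mentions` are my own assumptions for illustration, not necessarily what the linked duplicate recommends; it treats a mention as '@' followed by letters, digits, or underscores.

```python
import re

# Assumed pattern: '@' followed by one or more word characters
# (letters, digits, underscore). Not taken from the linked duplicate.
MENTION_RE = re.compile(r'@\w+')


def find_mentions(text):
    """Return every '@word' occurrence in text, in order of appearance."""
    return MENTION_RE.findall(text)


tweet = ("im trying to pull @bob from a file. "
         "this is a @line in the @file @bob is a wierdo")
print(find_mentions(tweet))
```

Unlike the split-based version, this also picks up an '@' in the middle of a word (for example the domain part of an email address), so it may need tightening depending on the data.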


Since you apparently cannot use `filter` or `lambda`, the same filtering can be written as a list comprehension:

mentions = [w for w in tweet.split() if w.startswith('@')]
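To apply this to the file from the question rather than a single `tweet` string, something like the sketch below should work. `getAllPeople` and `fileName` come from the question; `extract_mentions` is a hypothetical helper name I introduced so the filtering can be tried without a file.

```python
def extract_mentions(lines):
    """Return every whitespace-separated word that starts with '@'."""
    mentions = []
    for line in lines:
        # str.split() with no argument splits on any run of whitespace,
        # so the trailing newline on each line is discarded.
        for word in line.split():
            if word.startswith('@'):
                mentions.append(word)
    return mentions


def getAllPeople(fileName):
    # Iterating over the open file yields one line at a time, which avoids
    # calling .split() on the list that readlines() returns. The with-block
    # closes the file, so no explicit close() is needed.
    with open(fileName, 'r') as f1:
        return extract_mentions(f1)
```

Because this only splits on whitespace, punctuation stays attached to the word: "@bob," comes back with its trailing comma.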

modesitt