
The goal is to remove everything from a text that is not a name or an ID.

Example:

Paula Abdul @PaulaAbdul Dec 25 18:13:07 +0000 (GMT) via XYZ Web Client
( ... Some junk line to remove ... )
Michael Jackson @MichaelJackson Dec 27 16:03:01 +0000 (GMT) via XYZ Web Client
( ... Other stuff to remove / e.g. an empty line)
George Michael @GeorgeMichael Dec 28 19:23:15 +0000 (GMT) via XYZ Web Client

The result should contain only the name and ID:

Paula Abdul @PaulaAbdul
Michael Jackson @MichaelJackson
George Michael @GeorgeMichael

What is the best way to go about it? My idea is:
1. Select all lines that do not contain "@...", AND
2. Select everything after "@..." up to the end of the line.

So far I know:

Match every line that does not contain <string>:  ^((?!@<string>).)*$

How can I combine both searches into one?

I assume with alternation: <pattern1>|<pattern2>
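
To make the alternation idea concrete, here is a rough Python sketch rather than an Atom search. The combined pattern, the names COMBINED and keep_name_and_id, and the use of @\w+ for the ID are all assumptions made for illustration. The first alternative matches a whole line without an ID (including its line break), the second captures the ID and matches everything after it on the same line.

```python
import re

# Hypothetical combined pattern (an assumption for illustration):
#   alternative 1: a whole line that contains no "@id", plus its line break
#   alternative 2: the "@id" (captured) followed by the rest of the line
COMBINED = re.compile(r"^(?!.*@\w).*\n?|(@\w+).*", re.MULTILINE)


def keep_name_and_id(text: str) -> str:
    # For alternative 2, group 1 holds the ID, so only the tail is dropped.
    # For alternative 1, group 1 is unset and the whole line is removed.
    return COMBINED.sub(lambda m: m.group(1) or "", text)


sample = "\n".join([
    "Paula Abdul @PaulaAbdul Dec 25 18:13:07 +0000 (GMT) via XYZ Web Client",
    "( ... Some junk line to remove ... )",
    "Michael Jackson @MichaelJackson Dec 27 16:03:01 +0000 (GMT) via XYZ Web Client",
    "",
    "George Michael @GeorgeMichael Dec 28 19:23:15 +0000 (GMT) via XYZ Web Client",
])

cleaned = keep_name_and_id(sample)
print(cleaned)
```

The name in front of the ID is never matched by either alternative, so it stays in place. Whether the same pattern behaves identically in an editor's find-and-replace depends on that editor's regex engine.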

I am using the Atom editor (https://atom.io) for my regex search.

Stxle

1 Answer


OK, I found the solution after some research and trial and error.

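First I found a pattern for identifying Twitter IDs (see "regex for Twitter username"):

@\w{1,15}

This finds any ID that starts with "@" followed by 1 to 15 word characters. "\w" stands for "word character", usually [A-Za-z0-9_]. Notice the inclusion of the underscore and digits. The quantifier has to sit outside square brackets: in [\w{1,15}]+ the braces, the digits and the comma are literal members of the character class, so no length limit is applied.

Through trial and error I found the rest of the pattern:

^[\w. ]*(@\w{1,15})\b

Starting at the beginning of a line, this matches any run of word characters, spaces and dots, as long as it is followed by "@" and at least one word character. The match is therefore the name (if any) plus the ID, and lines without an ID do not match at all.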
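
As a quick check outside the editor, the pattern can be tried in Python. This is only a sketch: the names NAME_AND_ID, source, matches and handles are made up for illustration, and the length limit is written as \w{1,15} outside the square brackets so that it acts as a quantifier. re.MULTILINE is assumed so that ^ matches at the start of every line.

```python
import re

# Name part (word characters, dots, spaces) followed by an ID of
# 1 to 15 word characters. re.MULTILINE makes ^ match at each line start.
NAME_AND_ID = re.compile(r"^[\w. ]*(@\w{1,15})\b", re.MULTILINE)

source = "\n".join([
    "@PaulaAbdul Dec 25 18:13:07 +0000 (GMT) via XYZ Web Client",
    "( ... Some junk line to remove ... )",
    "Michael M. Jackson @MichaelJackson Dec 27 16:03:01 +0000 (GMT) via XYZ Web Client",
    "abcdef.abcd 999 ( ... Some junk line to remove ... )",
])

# group(0) is the whole match (name + ID), group(1) is the ID alone.
matches = [m.group(0) for m in NAME_AND_ID.finditer(source)]
handles = [m.group(1) for m in NAME_AND_ID.finditer(source)]

for line in matches:
    print(line)
```

Lines without an "@" produce no match, so collecting the matches is enough to drop the junk lines.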

EXAMPLE SOURCE:

@PaulaAbdul Dec 25 18:13:07 +0000 (GMT) via XYZ Web Client
( ... Some junk line to remove ... )
Clint. @PaulaAbdul Dec 25 18:13:07 +0000 (GMT) via XYZ Web Client
abcdef.abcd 999 ( ... Some junk line to remove ... )
Paula Abdul @PaulaAbdul Dec 25 18:13:07 +0000 (GMT) via XYZ Web Client
Some Words ( ... Some junk line to remove ... )
Michael M. Jackson @MichaelJackson Dec 27 16:03:01 +0000 (GMT) via XYZ Web Client
( ... Other stuff to remove / e.g. an empty line)
George Michael @GeorgeMichael Dec 28 19:23:15 +0000 (GMT) via XYZ Web Client

RESULT:

@PaulaAbdul
Clint. @PaulaAbdul
Paula Abdul @PaulaAbdul
Michael M. Jackson @MichaelJackson
George Michael @GeorgeMichael