
How can emojis also be filtered out of a string using the following extension?

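import Foundation  // CharacterSet is defined in Foundation

extension String {
    func removeCharacters() -> String {
        // Unicode scalars that should be stripped from the string.
        let removeCharacters: CharacterSet = [" ", ";", ",", ":", ".", "!", "%", "$", "&", "#", "*", "^", "@", "(", ")", "/"]
        // Keep every scalar that is not in the set.
        let filteredString = self.unicodeScalars.filter { !removeCharacters.contains($0) }
        return String(String.UnicodeScalarView(filteredString))
    }
}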

I understand that an isEmojiPresentation == false check can be used; I'm just not sure how to add it to the function. To clarify: all emoji should be removed from strings passed through the extension.

pkamb
John
  • You should spend some more time reading about how `filter` works – Alexander Aug 19 '21 at 19:09
  • @Alexander I have already posted what OP needs to make filter return any type that conforms to `RangeReplaceableCollection`, in this case a String, but I'm not sure why he is asking that again https://stackoverflow.com/a/68852194/2303865 – Leo Dabus Aug 19 '21 at 19:11
  • @LeoDabus It's a pretty clear-cut case of BSODD: blind StackOverflow-driven development. – Alexander Aug 19 '21 at 19:15

3 Answers

4

I wrote this Swift 5 extension that removes emoji characters:

extension String {
    func withoutEmoji() -> String {
        filter { $0.isASCII }
    }
}

You can run this unit test to see if it works:

func test_removingEmoji_shouldReturnStringWithoutEmoji() {
        // Given
        let inputText = "Hell☺️‍‍‍o World❤️"
        
        // When
        let emojilessText = inputText.withoutEmoji()
        
        // Then
        let expectedString = "Hello World"
        XCTAssertEqual(emojilessText, expectedString, "\(String(describing: asData(emojilessText))) should be equal to \(String(describing: asData(expectedString)))")
    }
    
// Helper function that returns the string as Data. 
// Reveals any invisible characters that might have remained in the string. 
// This can help you to understand why two strings might not be considered equal, even if they look the same.
private func asData(_ text: String) -> NSData? {
    text.data(using: .utf8) as? NSData
}

Limitation: As "hotdougsoup.nl" pointed out in a comment, this removes every non-ASCII character, not just emoji, so accented letters and non-Latin scripts are lost as well.
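To make the limitation concrete, here is a minimal sketch. The sample string is made up for illustration, and the extension simply repeats the isASCII approach so that the snippet is self-contained:

```swift
extension String {
    // Same approach as above: keep only ASCII characters.
    func withoutEmoji() -> String {
        filter { $0.isASCII }
    }
}

// Hypothetical sample: "Café", an airplane emoji (U+2708 U+FE0F), then "!".
let sample = "Caf\u{E9}\u{2708}\u{FE0F}!"
let asciiOnly = sample.withoutEmoji()

print(asciiOnly) // prints "Caf!" (the "é" is dropped together with the emoji)
```

If the input may contain text in languages other than English, a filter based on emoji properties is probably a better fit than this one.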


Alternative solution

Building on Leo Dabus's great answer, I wrote this small extension, which removes the visible emoji characters. However, the unit test revealed that some invisible characters (for example, variation selectors and zero-width joiners) remained in the string. That's why I decided to go with the shorter snippet above (.isASCII).

// This does not remove all characters reliably. Use the solution above.
extension String {
    func withoutEmoji() -> String {
        unicodeScalars
            .filter { (!$0.properties.isEmojiPresentation && !$0.properties.isEmoji) || $0.properties.numericType == .decimal }
            .reduce(into: "") { $0 += String($1) }
    }
}
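The leftover invisible characters can be made visible by printing the non-ASCII scalars that survive the filter. The following is only a sketch: withoutEmojiScalars is a hypothetical name for the same filter as above, and the input is an assumed sample with one emoji followed by a variation selector (U+FE0F):

```swift
extension String {
    // Same scalar-level filter as above, under a hypothetical name.
    func withoutEmojiScalars() -> String {
        unicodeScalars
            .filter { (!$0.properties.isEmojiPresentation && !$0.properties.isEmoji) || $0.properties.numericType == .decimal }
            .reduce(into: "") { $0 += String($1) }
    }
}

// "Hell", a smiling face (U+263A), a variation selector (U+FE0F), then "o".
let input = "Hell\u{263A}\u{FE0F}o"
let output = input.withoutEmojiScalars()

// Collect the hex values of any non-ASCII scalars that were not removed.
let leftovers = output.unicodeScalars
    .filter { !$0.isASCII }
    .map { String($0.value, radix: 16) }

print(leftovers)         // ["fe0f"]
print(output == "Hello") // false
```

The variation selector is not itself an emoji scalar, so it passes the filter and makes the result compare unequal to "Hello" even though the two look the same on screen.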
ambercoded
2

You need to access the properties property of Unicode.Scalar. By the way, filter on a UnicodeScalarView already returns a UnicodeScalarView:

extension String {
    func removeCharacters() -> String {
        let removeCharacters: CharacterSet = .init(charactersIn: " ;,:.!%$&#*^@()/")
        return .init(
            unicodeScalars.filter {
                !removeCharacters.contains($0) &&
                $0.properties.isEmojiPresentation == false
            }
        )
    }
}

edit/update:

The reason a digit returns true for isEmoji is explained in the documentation for isEmoji:

The final result is true because the ASCII digits have non-default emoji presentations; some platforms render these with an alternate appearance.
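A quick way to see the difference between the two properties is to inspect a few scalars directly. The three sample scalars below are chosen for illustration only:

```swift
// Three sample scalars: an ASCII digit, the airplane (U+2708),
// and the grinning face (U+1F600).
let digit: Unicode.Scalar = "1"
let airplane: Unicode.Scalar = "\u{2708}"
let grinningFace: Unicode.Scalar = "\u{1F600}"

// The digit counts as an emoji scalar but is not shown as emoji by default.
print(digit.properties.isEmoji, digit.properties.isEmojiPresentation)               // true false
// The airplane also defaults to text presentation, so a filter on
// isEmojiPresentation alone lets it through.
print(airplane.properties.isEmoji, airplane.properties.isEmojiPresentation)         // true false
// The grinning face is an emoji under both properties.
print(grinningFace.properties.isEmoji, grinningFace.properties.isEmojiPresentation) // true true
```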

You can avoid removing digits from your string by adding an extra condition to the filter closure:

extension RangeReplaceableCollection where Self: StringProtocol {
    var removingEmoji: Self  {
        filter { !($0.unicodeScalars.first?.properties.isEmoji == true && !("0"..."9" ~= $0)) }
    }
}

let noEmoji = "abc123".removingEmoji

noEmoji  // "abc123"

Expanding on that:

extension Character {
    var isEmoji: Bool { unicodeScalars.first?.properties.isEmoji == true && !isDigit }
    var isDigit: Bool { "0"..."9" ~= self }
    var isNotEmoji: Bool { !isEmoji }
}

extension RangeReplaceableCollection where Self: StringProtocol {
    var removingEmoji: Self  { filter(\.isNotEmoji) }
    var emojis: Self  { filter(\.isEmoji) }
}

Playground testing:

let notEmoji = "abc✈️123".removingEmoji
notEmoji  // "abc123"
let emojis = "abc✈️123".emojis
emojis  // "✈️"
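
Applied to the extension from the question, the same Character-level check could be combined with the CharacterSet roughly as follows. This is only a sketch: the method name removingPunctuationAndEmoji is hypothetical, and it assumes that the listed punctuation and all emoji should be dropped while ASCII digits are kept:

```swift
import Foundation

extension String {
    // Hypothetical name; removes the listed punctuation and any emoji, keeps digits.
    func removingPunctuationAndEmoji() -> String {
        let unwanted = CharacterSet(charactersIn: " ;,:.!%$&#*^@()/")
        return filter { character in
            let scalars = character.unicodeScalars
            guard let first = scalars.first else { return false }
            // ASCII digits report isEmoji == true, so they are exempted explicitly.
            let isDigit = ("0"..."9").contains(character)
            let isEmoji = first.properties.isEmoji && !isDigit
            // Drop the character if it is an emoji or contains an unwanted scalar.
            return !isEmoji && !scalars.contains { unwanted.contains($0) }
        }
    }
}

let cleaned = "a#b\u{2708}\u{FE0F}1!".removingPunctuationAndEmoji()
print(cleaned) // prints "ab1"
```

Note that "#" and "*" also report isEmoji == true, so the emoji check on its own removes them too. That is harmless here because both are in the removal set anyway.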
Leo Dabus
  • Leo, why are airplane emoji still allowed? – John Aug 19 '21 at 19:20
  • show which emoji you are trying to remove – Leo Dabus Aug 19 '21 at 19:21
  • ✈️ still seems to appear even with your code - I think there may be more like it – John Aug 19 '21 at 19:22
  • Just use `isEmoji` property instead of `isEmojiPresentation` – Leo Dabus Aug 19 '21 at 19:23
  • Actually isEmoji is also removing numbers from appearing – John Aug 19 '21 at 19:50
  • Leo, I understand - but this property is causing a lot of confusion: it removes some but not all emojis, and your suggestion ended up removing all emojis + numbers. If I mark your answer as correct, do you mind helping solve how to remove all emojis but keep numbers? – John Aug 19 '21 at 20:13
  • Just add another condition checking if the character is a digit – Leo Dabus Aug 19 '21 at 20:16
  • Yes, it's pretty strange – John Aug 19 '21 at 20:28
  • How can I make sure that digits are allowed and emojis are not? I am currently trying, just not sure which scalar property to use - `$0.properties.isEmoji == false && ($0.properties.numericValue != nil) == true` – John Aug 19 '21 at 20:37
0

In addition to the previous answers, it's important not to overlook the ZWJ (Zero Width Joiner) that glues emoji sequences together. If it is not removed along with the emoji, stray ZWJ code points accumulate in the result, so "\u{200D}" should be filtered out as well.

For example:

extension String {
    func withoutEmoji() -> String {
        unicodeScalars
            // Drop emoji scalars and the zero-width joiner, but keep decimal digits.
            .filter { (!$0.properties.isEmojiPresentation && !$0.properties.isEmoji && $0 != "\u{200D}") || $0.properties.numericType == .decimal }
            .reduce(into: "") { $0 += String($1) }
    }
}
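
To see where the stray joiners come from, the following sketch filters a family emoji (three emoji scalars joined by two ZWJs) with and without the extra check. The sample string and variable names are made up for illustration:

```swift
// "a", then man + ZWJ + woman + ZWJ + girl (rendered as one family emoji), then "b".
let family = "a\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}b"

// Removing only the emoji scalars leaves both joiners behind.
let emojiOnlyRemoved = String(String.UnicodeScalarView(
    family.unicodeScalars.filter { !$0.properties.isEmoji }
))
print(emojiOnlyRemoved.unicodeScalars.map { String($0.value, radix: 16) })
// ["61", "200d", "200d", "62"]

// Also excluding U+200D yields the expected plain text.
let joinersRemoved = String(String.UnicodeScalarView(
    family.unicodeScalars.filter { !$0.properties.isEmoji && $0 != "\u{200D}" }
))
print(joinersRemoved) // prints "ab"
```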
Karl