
I have some tweets like the following:


df = structure(list(Date = structure(c(18946, 18946, 18946, 18946, 
18946, 18946, 18946, 18946, 18946, 18946), class = "Date"), Texts = c("Two $XVG charts I am watching tonight. The $usdt pair\nAnd the $BTC pair, where we are seen a nice backtest of this trendline!", 
"$BTC could be setting up for a huge short squeeze. \n\nPotentially $654,700,000 worth of shorts opened that are yet to close, would be a shame if we resumed the bull market <U+0001F972>", 
"IT'S GONNA HAPPEN!!! \n                       <U+26A1>\n             <U+26A1>              <U+26A1>\n     <U+26A1>                             <U+26A1>\n \n <U+26A1>         $200k  $BTC        <U+26A1>\n\n        <U+26A1>                          <U+26A1>\n              <U+26A1>              <U+26A1>\n                        <U+26A1>", 
"$BTC could be setting up for a huge short squeeze. \n\nPotentially $654,700,000 worth of shorts opened that are yet to close, would be a shame if we resumed the bull market <U+0001F972>", 
"Met several @Blockstream employees here in El Salvador for the #BitcoinWeek conference. Met @Excellion at breakfast and chatted for a few minutes. Considering conversion to $BTC maxi. <U+0001F60F>", 
"I spoke w 5-6 scammers trying to sell me the cards: they ask for personal info, bank details &amp; prefer to be paid through bitcoin. “If we have to work together we have to start by building trust,” one told me. He was among those flagged as a \"scammer\" in the warning groups.", 
"Trending on #LunarCrush:\n\n\"VanEck’s bitcoin futures ETF to list tomorrow\" via @TheBlock__\n\nTop coin mentions\n$btc\n\n #LunarShare", 
"Would you rather have: 1 $BTC, 245k $DOGE or 1B $SHIB? (All worth 1 BTC)", 
"@crypto_birb look at $BTC on the monthly<U+0001F440><U+0001F440><U+0001F525><U+0001F525><U+0001F525> about to take off", 
"Trending on #LunarCrush:\n\n\"MicroStrategy CEO predicts that Bitcoin is ‘going up forever’\" \n\nTop coin mentions\n$btc\n\n #LunarShare"
)), row.names = c(NA, 10L), class = "data.frame")

The dataset looks like this:

[screenshot of the data frame]

I would like to extract two columns from df: one containing only the emoji as it is actually displayed (as in the screenshot), and another containing its encoding.

For example, the result should look something like this:

Image                      Encoding

⚡ (the rendered emoji)      <U+26A1>
...
...

Can anyone help me?

Thanks!
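A minimal sketch of one possible approach in base R, assuming the emoji are stored in `Texts` as literal `<U+XXXX>` text (which is how they appear in the dput output above) rather than as real Unicode characters. The helper names `extract_emoji_codes()` and `code_to_emoji()` are hypothetical, and the sample `df` holds shortened excerpts of the tweets above. `gregexpr()` with `regmatches()` collects every `<U+...>` token, and `strtoi(..., base = 16L)` plus `intToUtf8()` turn the hex code point back into the character it encodes.

```r
# Sketch under an assumption: the emoji are stored as literal "<U+XXXX>"
# text (as in the dput output), not as real Unicode characters.

# Shortened excerpts of the tweets above, for illustration only
df <- data.frame(
  Texts = c(
    "would be a shame if we resumed the bull market <U+0001F972>",
    "IT'S GONNA HAPPEN!!! <U+26A1> $200k $BTC <U+26A1>",
    "Would you rather have: 1 $BTC, 245k $DOGE or 1B $SHIB?"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return every "<U+XXXX>" token found in the texts
extract_emoji_codes <- function(texts) {
  # gregexpr() locates all matches per string; regmatches() pulls them out
  matches <- regmatches(texts, gregexpr("<U\\+[0-9A-Fa-f]+>", texts))
  unlist(matches, use.names = FALSE)
}

# Hypothetical helper: convert tokens like "<U+26A1>" to the character they encode
code_to_emoji <- function(codes) {
  hex <- gsub("^<U\\+|>$", "", codes)   # drop the "<U+" prefix and ">" suffix
  cp  <- strtoi(hex, base = 16L)        # hex string -> integer code point
  vapply(cp, intToUtf8, character(1))   # code point -> UTF-8 character
}

codes <- extract_emoji_codes(df$Texts)
emoji_df <- data.frame(
  Image = code_to_emoji(codes),
  Encoding = codes,
  stringsAsFactors = FALSE
)
print(emoji_df)
```

Wrap `codes` in `unique()` if each emoji should appear only once. If your texts contain the actual emoji characters instead of the `<U+...>` text, this pattern will not match them. Depending on the locale, a Windows console may still print the emoji as `<U+...>` even though the stored strings are the real characters.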
