
I am working on a script using Twitter's API and I am trying to find matches to exact phrases.

The API, however, doesn't allow queries for exact phrases, so I've been trying to find a workaround. The problem is that I am getting results that contain words from the phrases, but not the exact phrase itself.

var search_terms = "buy now, look at this meme, how's the weather?"; 

let termSplit = search_terms.toLowerCase();    
let termArray = termSplit.split(', ');
//["buy now", "look at this meme", "how's the weather?"];
    
client.stream('statuses/filter', { track: search_terms }, function (stream) {
  console.log("Searching for tweets...");
  stream.on('data', function (tweet) {
    if (termArray.some(v => tweet.text.toLowerCase().includes(v.toLowerCase()))) {
      //if(tweet.text.indexOf(termArray) > 0 )
      console.log(tweet);
    }
  });
});

Expected results should be a tweet with any text as long as it contains the exact phrase somewhere.

Instead, I am getting tweets that contain individual words from one of the array values, but not an exact match of the whole phrase.

Example results being returned - "I don't know why now my question has a close request but I don't buy it."

Example results I am expecting - "If you like it then buy now."

What am I doing wrong?

730wavy
  • While you can make some optimizations, there doesn't appear to be anything wrong with your code according to what you've written. However, without a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example), it's difficult to tell what you are expecting if your expectations are not being met. Can you show us some example tweet JSON data that matches your track filter, along with the ones that you expect to match vs the ones that you don't? – jsejcksn Jul 11 '22 at 04:20
  • What exactly do you mean by "being returned" and "matching"? Do you mean being logged to the console in your `console.log(tweet)` statement? If yes, your question is still not clear. If not, and you mean the tweet results which match your [track stream parameter](https://developer.twitter.com/en/docs/twitter-api/v1/tweets/filter-realtime/guides/basic-stream-parameters#track) supplied to the API, then you might need to review the documentation again to adjust your expectations. – jsejcksn Jul 11 '22 at 04:32
  • From the `track` docs page I linked to above: "Exact matching of phrases (equivalent to quoted phrases in most search engines) is not supported." – jsejcksn Jul 11 '22 at 04:38
  • By being returned I am referring to the results found which I have logged to the console. I am aware of the track docs that's why in my question I said - 'The API however doesn't allow queries for exact phrases'. At this point, I'm not sure what about my question is not clear. I am looking to match exact phrases to strings found in the returned data. I understand about the api docs etc. Therefore I am looking to use an expression of some sort that will take the results from the api and filter them to match so I can then do other stuff. – 730wavy Jul 11 '22 at 18:11

2 Answers


First, toward the future:

Twitter is planning to deprecate the statuses/filter v1.1 endpoint:

These features will be retired in six months on October 29, 2022.

Additionally, beginning today, new client applications will not be able to gain access to v1.1 statuses/sample and v1.1 statuses/filter. Developers with client apps already using these endpoints will maintain access until the functionality is retired. We are not retiring v1.1 statuses/filter in 6-months, only the ability to retrieve compliance messages. We will retire the full endpoint eventually.

So, now is a great time to start using the equivalent v2 API, Filtered Stream, which supports exact phrase matching, helping you avoid this entire scenario in your application code.
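As a rough sketch of what that could look like: Filtered Stream rules accept quoted strings for exact phrases and `OR` between alternatives, and rules are added by POSTing a JSON body with an `add` array to the stream rules endpoint. The helper name `buildExactPhraseRule` and the `tag` value below are hypothetical, and stripping double quotes from phrases is a simplifying assumption. How punctuation such as `?` is tokenized, and the maximum rule length for your access level, should be checked against the v2 documentation.

```javascript
// Hypothetical helper: builds the JSON body for adding one Filtered Stream
// rule that matches any of the given phrases exactly.
function buildExactPhraseRule(phrases, tag = 'exact phrases') {
  const value = phrases
    // Assumption: drop embedded double quotes rather than escaping them
    .map((phrase) => phrase.trim().replace(/"/g, ''))
    .filter(Boolean)
    // Wrap each phrase in quotes so it is treated as an exact phrase
    .map((phrase) => `"${phrase}"`)
    .join(' OR ');
  return {add: [{value, tag}]};
}

const ruleBody = buildExactPhraseRule([
  'buy now',
  'look at this meme',
  `how's the weather?`,
]);

// This object would be sent as the request body when adding rules.
console.log(JSON.stringify(ruleBody, null, 2));
```

This only builds the request body; sending it (with authentication) and connecting to the stream are left out on purpose.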


With that out of the way, below I've included a minimal, reproducible example for you to consider which demonstrates how to match exact phrases in streamed tweets, and even extract additional useful information (like which phrase was used to match it and at what index within the tweet text). It includes inline comments explaining things line-by-line:

<script type="module">

// Transform to lowercase, split on commas, and trim whitespace
// on the ends of each phrase, removing empty phrases
function getPhrasesFromTrackText (trackText) {
  return trackText.toLowerCase().split(',')
    .map(str => str.trim())
    .filter(Boolean);
}

const trackText = `buy now, look at this meme, how's the weather?`;
const phrases = getPhrasesFromTrackText(trackText);

// The callback closure which will be invoked with each matching tweet
// from the streaming response data
const handleTweet = (tweet) => {
  // Transform the tweet text once
  const lowerCaseText = tweet.text.toLowerCase();

  // Create a variable to store the first matching phrase that is found
  let firstMatchingPhrase;
  for (const phrase of phrases) {
    // Find the index of the phrase in the tweet text
    const index = lowerCaseText.indexOf(phrase);
    // If the phrase isn't found, immediately continue
    // to the next loop iteration, skipping the rest of the code block
    if (index === -1) continue;
    // Else, set the match variable
    firstMatchingPhrase = {
      index,
      text: phrase,
    };
    // And stop iterating the other phrases by breaking out of the loop
    break;
  }

  if (firstMatchingPhrase) {
    // There was a match; do something with the tweet and/or phrase
    console.log({
      firstMatchingPhrase,
      tweet,
    });
  }
};

// The Stack Overflow code snippet runs in a browser and doesn't have access to
// the Node.js Twitter "client" in your question,
// but you'd use the function like this:

// client.stream('statuses/filter', {track: trackText}, function (stream) {
//   console.log('Searching for tweets...');
//   stream.on('data', handleTweet);
// });

// Instead, the function can be demonstrated by simulating the stream: iterating
// over sample tweets. The tweets with a ✅ are the ones which
// will be matched in the function and be logged to the console:

const sampleTweets = [
  /* ❌ */ {text: `Now available: Buy this product!`},
  /* ✅ */ {text: `This product is available. Buy now!`},
  /* ✅ */ {text: `look at this meme `},
  /* ❌ */ {text: `Look at how this meme was created`},
  /* ❌ */ {text: `how's it going everyone? good weather?`},
  /* ✅ */ {text: `Just wondering: How's the weather?`},
  // etc...
];

for (const tweet of sampleTweets) {
  handleTweet(tweet);
}

</script>
jsejcksn
  • Thank you for pointing out v1 will be retired. I will test with v2 and look into it more as well as testing your answer with v2. – 730wavy Jul 13 '22 at 05:52

You could try using regular expressions. Here's an example of a regular expression search for a phrase. It returns a non-negative number (the index of the character where the match starts) if there is a match, and -1 otherwise. I return the whole phrase if there is a match.

You can use quite sophisticated grammars for matching particular phrases of interest; I'm just using simple words in this example.

[Image: regular_expression]
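A minimal sketch of the regular expression approach, which is not necessarily the code from the original example: it escapes each phrase, joins the phrases into one case-insensitive alternation, and requires that the match is not glued to a neighbouring letter or digit, so "rebuy nowadays" does not count as "buy now". The function names `escapeRegExp`, `buildPhraseRegExp`, and `findPhrase` are hypothetical, and the lookbehind assumes a reasonably recent JavaScript engine.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one case-insensitive pattern that matches any of the phrases,
// but only when the phrase is not directly attached to a letter or digit.
function buildPhraseRegExp(phrases) {
  const alternatives = phrases.map(escapeRegExp).join('|');
  return new RegExp(
    `(?<![\\p{L}\\p{N}])(?:${alternatives})(?![\\p{L}\\p{N}])`,
    'iu',
  );
}

// Return the matched phrase and its index, or null when nothing matches.
function findPhrase(regExp, text) {
  const match = regExp.exec(text);
  return match ? {index: match.index, text: match[0]} : null;
}

const phraseRegExp = buildPhraseRegExp([
  'buy now',
  'look at this meme',
  `how's the weather?`,
]);

console.log(findPhrase(phraseRegExp, 'If you like it then buy now.'));
console.log(findPhrase(phraseRegExp, `I don't know why now, but I don't buy it.`));
```

In the stream handler from the question, `findPhrase(phraseRegExp, tweet.text)` could replace the `termArray.some(...)` check; a non-null result means the tweet contains one of the exact phrases.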

dmc-au