We have a problem with time range search/filtering in Lucene 4.0.0. We have indexed some tweets, and now we want to collect the tweets sent by a specific user within a specific time range. When we run the query with the filter we created, we get tweets outside the specified time range. In the example below, we did not expect to get the "exp tweet", since its timeStamp is less than lowerBound.

Can you suggest how to perform this task, or point out what is wrong with our code?

Regards

Related Code

// time range, format "yyyyMMddHHmmss"
String upperBoundStr = "20110126024422";
String lowerBoundStr = "20110126021422";
String tweetTimeStr = "20110126022922";

//create filter
Filter lowerFilter = new QueryWrapperFilter( TermRangeQuery.newStringRange("creationTime",lowerBoundStr,tweetTimeStr,true,false));      
Filter upperFilter = new QueryWrapperFilter( TermRangeQuery.newStringRange("creationTime",tweetTimeStr,upperBoundStr,false,true));
Filter[] filters = new Filter[2];
filters[0] = lowerFilter;
filters[1] = upperFilter;
Filter chainFilter = new ChainedFilter(filters, ChainedFilter.OR);

// search
Query luceneQuery = new TermQuery(new Term("username", "userName1"));
SimpleFSDirectory index = new SimpleFSDirectory(new File("lucene_index"));
IndexReader reader = DirectoryReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
ScoreDoc[] hits = searchFilteredQuery(luceneQuery, searcher,chainFilter,maxNumberOfNewTweets);
List<RankResult> filteredtweets = convertHitsToRankResults(hits, searcher);

Example Output (Format: date dateIn("yyyyMMddHHmmss") userName)

base tweet: Wed Jan 26 02:29:22 VET 2011 20110126022922 userName1
exp tweet: Tue Jan 25 20:05:02 VET 2011 20110125200502 userName1
user2737636

1 Answer

Do you just want the tweets whose timeStamp is between lowerBoundStr and upperBoundStr? If so, change Filter chainFilter = new ChainedFilter(filters, ChainedFilter.OR); to Filter chainFilter = new ChainedFilter(filters, ChainedFilter.AND);. With OR, a tweet that matches either filter is put in the search result, so both timestamps larger than lowerBoundStr and timestamps smaller than upperBoundStr are returned.
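
To check what each mode actually keeps, here is a minimal sketch in plain Java (no Lucene involved). It assumes the two filters behave like lexicographic string ranges over fixed-width "yyyyMMddHHmmss" values, which is how TermRangeQuery compares terms. RangeFilterSketch, inRange and filter are hypothetical names made up for this illustration, and the sample timestamps other than the ones in the question are invented. Note that the two ranges in the question share tweetTimeStr as an excluded endpoint, so in this sketch OR keeps everything between the bounds except tweetTimeStr, while AND keeps nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RangeFilterSketch {

    // Lexicographic range check, mirroring the include flags of newStringRange.
    static boolean inRange(String value, String lower, String upper,
                           boolean includeLower, boolean includeUpper) {
        int lo = value.compareTo(lower);
        int hi = value.compareTo(upper);
        boolean aboveLower = includeLower ? lo >= 0 : lo > 0;
        boolean belowUpper = includeUpper ? hi <= 0 : hi < 0;
        return aboveLower && belowUpper;
    }

    // Applies the two ranges from the question: [lower, pivot) and (pivot, upper],
    // combined with OR (useOr == true) or AND (useOr == false).
    static List<String> filter(List<String> stamps, String lower, String pivot,
                               String upper, boolean useOr) {
        List<String> kept = new ArrayList<>();
        for (String s : stamps) {
            boolean first = inRange(s, lower, pivot, true, false);
            boolean second = inRange(s, pivot, upper, false, true);
            if (useOr ? (first || second) : (first && second)) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String lower = "20110126021422";
        String pivot = "20110126022922"; // tweetTimeStr in the question
        String upper = "20110126024422";
        List<String> stamps = Arrays.asList(
                "20110125200502", // the "exp tweet", before lowerBound
                "20110126021422", // exactly lowerBound
                "20110126022000", // invented, between lowerBound and tweetTimeStr
                "20110126022922", // the base tweet itself
                "20110126023500", // invented, between tweetTimeStr and upperBound
                "20110126024422", // exactly upperBound
                "20110126030000"); // invented, after upperBound

        System.out.println("OR : " + filter(stamps, lower, pivot, upper, true));
        System.out.println("AND: " + filter(stamps, lower, pivot, upper, false));
    }
}
```

If the string ranges behave this way in the index too, the "exp tweet" should not pass the OR filter, which would point to something outside the filter (for example how results are collected or printed).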

lbear
  • I want to get tweets whose timestamps are either larger or smaller than tweetTimeStr, but not the tweet at tweetTimeStr itself. I think OR logic is correct in this case. Even if I use AND logic, it does not solve my problem: I still get tweets out of range (e.g. smaller than lowerBoundStr). – user2737636 Sep 06 '13 at 11:40
  • It seems the code works fine. The problem was in how I printed the results. Sorry for the inconvenience. – user2737636 Sep 13 '13 at 07:42