
I am trying to develop a module in ASP.NET using C#.

I store each project's title and description in a SQL database, along with its supporting files, which can be in .pdf, .doc, .xls, and similar formats.

I need to search all the entries in the tables as well as the supporting files: a file should be displayed if the search keyword appears inside it.

How can I develop a search to achieve the above functionality?

gturri
kranthiv

2 Answers


Not with pure SQL: .pdf and old Office documents (95-2003) are binary formats. To read what is inside those files, you need to parse them in C# using one library or another. Such a search would take a while.

SynerCoder
  • True, the search is taking a while to perform. I have more than 500 files. I am looking for a more efficient way to do that... – kranthiv Dec 28 '13 at 14:04
  • Use an SSD. The bottleneck is more the read speed of the hard disk than the parsing of the files; CPUs are often fast enough, but hard disks not so much. – SynerCoder Dec 28 '13 at 14:36

(Assuming you are using SQL Server)
You can do this with a SQL Server feature called Full-Text Search, but you need some extra effort to extract the text from the files before you can index them. See MSDN for more info: http://msdn.microsoft.com/en-us/library/ms142531%28v=sql.100%29.aspx and http://msdn.microsoft.com/en-us/library/ms142499%28v=sql.100%29.aspx

However, if I were doing this, I would move all the searching out to a Solr or Elasticsearch instance, as these are much easier to query after the initial setup.

You could also use Lucene in .NET (Solr and Elasticsearch are both based on Lucene). More info on this can be found in this question: Indexing .PDF, .XLS, .DOC, .PPT using Lucene.NET
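
To illustrate why an index-based engine avoids re-reading every file on each query, here is a minimal sketch of an inverted index, the core data structure behind Lucene. It is written in Python purely for illustration and is not Lucene's API. It assumes the text has already been extracted from each file; the file names, sample text, and the helpers `tokenize`, `build_index`, and `search` are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # .get avoids adding empty entries to the index for unknown tokens.
    postings = [index.get(token, set()) for token in tokens]
    return sorted(set.intersection(*postings))


# Hypothetical sample data: text assumed to be already extracted from each file.
documents = {
    "specs.pdf": "Quarterly budget report for the bridge project",
    "notes.doc": "Meeting notes about the bridge inspection",
    "costs.xls": "Budget figures and cost estimates",
}

# The index is built once, when files are uploaded, not on every search.
index = build_index(documents)
```

With this structure, a query such as `search(index, "budget bridge")` only touches the index entries for the two words and intersects them, so the cost of parsing the files is paid once at indexing time instead of on every search.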

Community
alastairtree