First of all, thank you for coming here.

I wrote a SQL script that finds words in a file, with a lot of help from some websites (http://dzapart.blogspot.fr/2012/04/full-text-search-with-pdf-in-microsoft.html).

Here is the problem:

I have "C:\TP3_compte_rendu.pdf", which is a PDF file, and 'C:\TP3.txt', which contains the text of that PDF.

So the two files contain the same text.

Then I run my code to create the table, the index, and the catalog:

  -- Create the table

CREATE TABLE dbo.DocumentFiles
(
DocumentId uniqueidentifier Primary KEY DEFAULT newsequentialid(),
Nom nvarchar(50) NOT NULL,
Extension nvarchar(10) NOT NULL,
Description nvarchar(1000) NULL,
FileStream_Id uniqueidentifier NOT NULL,
Fichier varbinary(MAX) NOT NULL DEFAULT (0x)
)  ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]


-- Create the index



EXEC sp_fulltext_service 'load_os_resources',1 
EXEC sp_fulltext_service 'verify_signature', 0
EXEC sp_fulltext_database 'enable'
GO
IF NOT EXISTS (SELECT TOP 1 1 FROM sys.fulltext_catalogs WHERE name = 'Ducuments_Catalog')
BEGIN
EXEC sp_fulltext_catalog 'Ducuments_Catalog', 'create';
END

DECLARE @indexName nvarchar(255) = (SELECT Top 1 i.Name from sys.indexes i
                                Join sys.tables t on  i.object_id = t.object_id
                                WHERE t.Name = 'DocumentFiles' AND i.type_desc = 'CLUSTERED')

                                PRINT @indexName

 -- Create the catalog

 EXEC sp_fulltext_table 'DocumentFiles', 'create', 'Ducuments_Catalog',  @indexName
 EXEC sp_fulltext_column 'DocumentFiles', 'Fichier', 'add', 0, 'Extension'
 EXEC sp_fulltext_table 'DocumentFiles', 'activate'
 EXEC sp_fulltext_catalog 'Ducuments_Catalog', 'start_full'
 EXEC sp_help_fulltext_system_components 'filter';  

 ALTER FULLTEXT INDEX ON [dbo].[DocumentFiles] ENABLE
 ALTER FULLTEXT INDEX ON [dbo].[DocumentFiles] SET CHANGE_TRACKING = AUTO

I use an IFilter to run the full-text search on PDF files:

SELECT document_type, path FROM sys.fulltext_document_types WHERE document_type = '.pdf'

It returns:

.pdf | C:\Program Files\Adobe\Adobe PDF iFilter 11 for 64-bit platforms\bin\PDFFilter.dll

Since I see ".pdf" and the right .dll, the IFilter is installed.

So I insert the two files, .txt and .pdf, with the same text, into this table:

-- Insert the document

INSERT INTO dbo.DocumentFiles(Nom, Extension,FileStream_Id,  Fichier)
SELECT

  'TP3.pdf' AS Nom
  , 'pdf' AS Extension
  ,'0E984725-C51D-4BF4-9960-E1C80E27ABA0wrong' AS FileStream_Id
  , * FROM OPENROWSET(BULK 'C:\TP3_compte_rendu.pdf', SINGLE_BLOB) 
   AS Fichier;
GO

-- Insert the document

INSERT INTO dbo.DocumentFiles(Nom, Extension,FileStream_Id,  Fichier)
SELECT

 'TP3.txt' AS Nom
 , 'txt' AS Extension
 ,'0E984725-C51D-4BF4-9960-E1C80E27ABB0wrong' AS FileStream_Id
 , * FROM OPENROWSET(BULK 'C:\TP3.txt', SINGLE_BLOB) 
 AS Fichier;
GO

Then the search:

SELECT d.* FROM dbo.DocumentFiles d
WHERE Contains(d.Fichier, '%propose%')

And it returns... only the .txt.

With the same text, only the .txt is found, even though the IFilter is installed. I really don't understand.
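
One way to check whether the PDF row was indexed at all is to look at the keywords the full-text engine extracted per document. This is only a diagnostic sketch, assuming it runs in the database that holds dbo.DocumentFiles and that the population has finished. Because the full-text key here is a uniqueidentifier, document_id is an internal DocId, so the second statement lists the DocId-to-key mapping.

```sql
-- Count the keywords extracted for each indexed document.
-- A document with few or no keywords was probably not parsed by its filter.
SELECT document_id, COUNT(*) AS keyword_count
FROM sys.dm_fts_index_keywords_by_document(DB_ID(), OBJECT_ID('dbo.DocumentFiles'))
GROUP BY document_id;

-- Map the internal DocId values back to the DocumentId key of the table.
DECLARE @table_id int = OBJECT_ID('dbo.DocumentFiles');
EXEC sp_fulltext_keymappings @table_id;
```

If the .txt row shows many keywords and the .pdf row shows almost none, the filter is the likely culprit rather than the query.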

Artjom B.
Pingu
  • Solved: IFilter 11 is buggy; you must use version 9. My code is fine, though, so I hope it will help someone! – Pingu Jun 29 '15 at 11:52

1 Answer

You can check the following:

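  • Remember that you need to add the PDF iFilter's bin folder to the system PATH environment variable. See the Adobe PDF iFilter installation guide.
  • The CONTAINS wildcard is *, not %. Only trailing wildcards (prefix terms) are accepted, and the prefix term must be enclosed in double quotes. For example, "*ckoverflow*" does not work, but "stackover*" works.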
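
To illustrate the second point, here is a minimal sketch of the search rewritten with CONTAINS syntax, reusing the table and column names from the question. The prefix "propos" is only an illustrative choice.

```sql
-- Exact word match: no wildcard is needed to find the word "propose".
SELECT d.Nom, d.Extension
FROM dbo.DocumentFiles AS d
WHERE CONTAINS(d.Fichier, 'propose');

-- Prefix match: the term sits inside double quotes with a trailing *.
-- This matches words that start with "propos" (propose, proposes, ...).
SELECT d.Nom, d.Extension
FROM dbo.DocumentFiles AS d
WHERE CONTAINS(d.Fichier, '"propos*"');
```

Without the double quotes, the asterisk is not treated as a prefix wildcard.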
Artjom B.