0

I am working for a company that has thousands of documents, mostly pdfs, stored in folders on their webserver. They are mostly user's manuals and tech documents for different products of the different brands they carry. They have a webpage that currenty displays links to all of the documents in the folders by iterating through them recursively, then generating an URL for each image based on the file path.

The manager is concerned about the fact that anytime someone changes the name of a top level folder on the server where the images are kept, it basically "breaks" the code, as those top level names are hard-coded in the app. He wants all of the URLs to be stored in the database to alleviate this issue, and has tasked me with basically replicating the current folder structure on the web server in a SQL Database, and then getting all of the URLs into that database. From the research I have done, it is no trivial task to implement a hierarchical structure like that in a relational database, and I am not a DBA - I'm a web developer.

So my question now is really how can I get the URLs to all of the thousands of images that are currently on the web server into the database? I was thinking maybe creating just a simple table called "Brands" that holds the root URLs for the brands, then another table called "Image links" or something like that, then writing a little utility to simply iterate through all of the image URLs and insert them into that table - does that sound like the way to go?

quantum kev
  • 108
  • 1
  • 9

1 Answers1

0

Seems to me like you could import all of the folder and file names pretty easily with a simple command-line program written in C# or PowerShell, which starts at the root and iterates all of the folders and files. You could have tables like these:

CREATE TABLE dbo.Folders
(
  FolderID   INT IDENTITY(1,1) PRIMARY KEY,
  ParentID   INT NULL FOREIGN KEY REFERENCES dbo.Folders(FolderID),
  FolderName NVARCHAR(255) NOT NULL
  -- , ... other columns ...
);

CREATE TABLE dbo.Documents
(
  DocumentID INT IDENTITY(1,1) PRIMARY KEY,
  FolderID   INT NOT NULL FOREIGN KEY REFERENCES dbo.Folders(FolderID),
  DocName    NVARCHAR(255) NOT NULL
  -- , ... other columns ...
);

You could populate the table with something like Directory.GetFiles() which will allow you to traverse the folders and files. You could also write a function to flatten out the path so you don't have to walk the whole hierarchy when building the path - but based on the above it should be pretty trivial to rename a folder and still always generate the correct path without updating each file (though you will have to change the folder and update the database and keep them in sync). Just an example with some fictional data:

INSERT dbo.Folders(ParentID, FolderName) SELECT NULL, N'root';
INSERT dbo.Folders(ParentID, FolderName) SELECT 1, N'sub1';
INSERT dbo.Folders(ParentID, FolderName) SELECT 1, N'sub2';
INSERT dbo.Folders(ParentID, FolderName) SELECT 2, N'subsub1';
INSERT dbo.Folders(ParentID, FolderName) SELECT 4, N'subsubsub1';

INSERT dbo.Documents(FolderID, DocName) SELECT 5, N'foo.pdf';
INSERT dbo.Documents(FolderID, DocName) SELECT 5, N'bar.pdf';

Here's a function that can flatten out the path, given a DocumentID:

CREATE FUNCTION dbo.ShowFullPath
(
    @FolderID int,
    @DocName  nvarchar(255)
)
RETURNS nvarchar(max)
AS
BEGIN
    DECLARE @FullPath nvarchar(max);

    ;WITH cte AS 
    (
      SELECT FolderID, FolderName, ParentID, rn = 1
        FROM dbo.Folders
        WHERE FolderID = @FolderID
      UNION ALL
      SELECT parent.FolderID, parent.FolderName, 
          parent.ParentID, child.rn + 1
        FROM dbo.Folders AS parent
        INNER JOIN cte AS child
        ON parent.FolderID = child.ParentID
    )
    SELECT @FullPath = STUFF((SELECT N'/' + FolderName 
       FROM cte ORDER BY rn DESC 
       FOR XML PATH(''),
       TYPE).value(N'./text()[1]', N'nvarchar(max)'), 1, 1, N'')
       + N'/' + @DocName
    FROM cte;

    RETURN(@FullPath);
END
GO

So you could always call that at runtime when you need to retrieve the path of a given document (and you could change a function or add a wrapper that takes the DocumentID instead, and provides the FolderID and DocName parameters), but it may make more sense to just use a computed column for that (unfortunately you won't be able to persist the column).

ALTER TABLE dbo.Documents ADD FullPath 
  AS CONVERT(NVARCHAR(MAX), dbo.ShowFullPath(FolderID, DocName));

SELECT DocumentID, FolderID, DocName, FullPath FROM dbo.Documents;

Results:

DocumentID FolderID DocName FullPath
1          5        foo.pdf root/sub1/subsub1/subsubsub1/foo.pdf
2          5        bar.pdf root/sub1/subsub1/subsubsub1/bar.pdf

Or you could create a view:

CREATE VIEW dbo.vDocuments
AS
  SELECT DocumentID, FolderID, DocName, dbo.ShowFullPath(FolderID, DocName)
  FROM dbo.Documents;
Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
  • Thanks for the answer - that definitely sounds like a good way to go about it. It would still be necessary to keep the database in synch with the folder names though, and I will definitely need to let my boss know that. – quantum kev May 14 '12 at 17:59
  • Well you may need to have strict control over who can modify these tables and/or rename folders in Windows at will. That said, you can always build a master app that can go in and completely rebuild the representation in the database from the root or any arbitrary folder... – Aaron Bertrand May 14 '12 at 18:00
  • That is probably the one thing you can't avoid when you separate the file itself from the pointer you store in the database. You could look into FILESTREAM (or FileTable in SQL Server 2012) but I'm not sure those will suit your needs any better. Or you may just suggest to him to invest in SharePoint - there's a reason people don't build these things manually from scratch anymore. – Aaron Bertrand May 14 '12 at 18:16
  • Wow, thanks for the detailed answer and example. I am pretty good at front end development, but not sure I ever could have come up with that SQL code. I appreciate your help and will see what my boss and his buddy think of this type of approach. And +1 on the Sharepoint idea - my thoughts exactly! Lol – quantum kev May 14 '12 at 18:22