I am building a website which will have articles, policies, laws and other text content. I am storing all the data (in some cases articles of over 8000 characters) in an MSSQL 2008 database. I read some articles saying that text data should not be stored in databases. Where should it be stored, then? In .txt files or something? I also want to search through the data. If it is stored in the DB I can use stored procedures etc. If it is stored in documents, I would need to use tools like Lucene. Am I right? Is my approach of using a DB wrong for this project? Please enlighten me.
-
[citation-needed]. Text data should be stored in databases, that's what they're for. Are you sure you didn't read "**image** data should not be stored in databases"? (and even that is controversial) – Piskvor left the building Feb 22 '12 at 11:39
-
http://trycatchfail.com/blog/post/Introduction-to-LuceneNET.aspx in this article see the section "Why not use SQL Server?" – heaVenShaker Feb 22 '12 at 11:47
-
You're misquoting. It says "do not use SQL Server to store *and search* **large amounts of text**" (emphasis mine). 8000 characters is (from a database point of view) not a large amount of text, not in 2012. You'd need all your articles to be at least a hundred times larger before the problem would start manifesting. For such a site as you're describing, MSSQL is quite sufficient. Note also that the article is on *full-text searching*, not just storing and retrieving. – Piskvor left the building Feb 22 '12 at 11:53
-
ok boss. As all the data is stored in the database (in different tables), how should I search for words in it? Can I use Lucene (or other tools) to search the database, or are good old stored procedures best? And another thing, I heard that SEO becomes hard if you store data in a database, as Google spiders can't crawl over it. Is that right? Any solution? – heaVenShaker Feb 22 '12 at 11:59
-
I WOULD need full-text searching, wouldn't I? – heaVenShaker Feb 22 '12 at 12:04
-
That's just another unsubstantiated/misquoted rumor. Perhaps try looking at some tutorials for "mssql fulltext search"? (Who knows if you need full-text search; what do the requirements for the site say? Or, if it's your personal site, it's *you* who should know if you need it or not.) – Piskvor left the building Feb 22 '12 at 12:04
-
ok thanks i'll study more and get the matter resolved. thanks for your help. :) – heaVenShaker Feb 22 '12 at 12:17
1 Answer
You will be using a DB of some description for this project no matter how you look at it, whether it be:
1) An old-fashioned flat-file database (txt documents, not recommended for large-scale projects imho)
2) A traditional text-storing database
3) A database of documents
The argument over whether to use a DB of text or a DB of documents depends on which skills/knowledge you possess or are likely to get access to (or assistance with). It sounds to me like you are more comfortable with a DB of text, and in my opinion there is nothing wrong with that. Worst case, if there ends up being a genuine need for documents rather than straight text storage in the long run, you should be able to generate the documents automatically from a text database. I suspect doing the reverse (converting a load of proprietary documents to text for storage and insertion) would be a lot more tricky. Generating a plain text file from a text database is trivial, and most vendor document formats support importing plain text documents for subsequent formatting.
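To illustrate the "text database to plain text files" direction, here is a minimal sketch. It is not tied to MSSQL: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in, and the `articles` table, its columns, the sample rows and the `export_articles` helper are all made up for the example.

```python
import sqlite3
import tempfile
from pathlib import Path


def export_articles(conn, out_dir):
    """Write each row of the (hypothetical) articles table to its own UTF-8 .txt file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    rows = conn.execute("SELECT id, title, body FROM articles ORDER BY id")
    for article_id, title, body in rows:
        path = out_dir / f"article_{article_id}.txt"
        # Title on the first line, a blank line, then the body text.
        path.write_text(f"{title}\n\n{body}\n", encoding="utf-8")
        written.append(path)
    return written


# Stand-in data: an in-memory SQLite database instead of the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO articles (id, title, body) VALUES (?, ?, ?)",
    [
        (1, "Privacy policy", "We store as little as possible."),
        (2, "Café law", "Résumé of the naïve clause."),
    ],
)

export_dir = tempfile.mkdtemp()
exported = export_articles(conn, export_dir)
print([p.name for p in exported])
```

Against a real server the query and the connection library would differ, but the shape stays the same: select the text, write one file per row. Going the other way means parsing whatever format the documents are in first, which is where the extra work comes from.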
For a large project like this you really need to spend some time considering what your documents are likely to be used for and by whom, and what methods best match them. If you are providing a database for people that heavily use MS Word and want to download your data, you probably need to consider using a document DB. If it's just the information (and web-based tools) you want to provide, consider how you want to manipulate your own data.
This is all opinion obviously, but my last piece of advice would be to make sure you use UTF-8 text from the outset if you go down the text route (bitter experience).
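A small sketch of the kind of thing that goes wrong when encodings get mixed, assuming a sample string and using `latin-1` purely as an example of a legacy single-byte codec (neither comes from the question):

```python
# Sample text with a few non-ASCII characters (made up for the example).
text = "Café policy, §12: naïve résumé"

# Encode as UTF-8 and decode as UTF-8: the text survives unchanged.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# Decode the same bytes with a single-byte codec instead: each non-ASCII
# character was stored as two bytes, so it comes back as two wrong characters.
garbled = utf8_bytes.decode("latin-1")

print(round_trip == text)
print(garbled)
```

If every layer (database column, connection, files, web pages) agrees on UTF-8 from day one, this mismatch never gets a chance to happen; cleaning it up after the data is already mixed is the painful part.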
