4

I need to get a list of all documents in a site collection, which I believe I can do with either the alldocs table or the alluserdata table (MOSS 2007 SP1) but do not see how I can get the author information for the document. I do not need the contents of the document (e.g. AllDocStreams content)

Something like this:

SELECT     tp_DirName, tp_LeafName, tp_Version, tp_Modified, tp_Created
FROM         AllUserData
WHERE     (tp_ContentType = 'Document') 
AND (tp_LeafName NOT LIKE '%.css') 
AND (tp_LeafName NOT LIKE '%.jpg') 
AND (tp_LeafName NOT LIKE '%.png') 
AND (tp_LeafName NOT LIKE '%.wmf') 
AND (tp_LeafName NOT LIKE '%.gif') 
AND (tp_DirName NOT LIKE '%Template%') 
AND (tp_IsCurrentVersion = 1) 
AND (tp_LeafName NOT LIKE '%.xsl')
ORDER BY tp_SiteId, tp_ListId, tp_DirName, tp_LeafName, tp_IsCurrentVersion DESC

Is there a better way to go about this?

PapaDaniel
  • 267
  • 2
  • 5
  • 14

8 Answers8

5

People that claim that you cannot query SharePoint databases because it is not supported are wrong. From reading the documentation, it is fine to query the database as long as you use the 'With(NoLock)' clause. It is clearly not supported to update, delete, or insert records.

The below query is supported:

Select * 
From your_content_database.dbo.AllDocs With (NoLock)

I will post a query that provides the desired result in a few minutes.

C B
  • 1,677
  • 6
  • 18
  • 20
4

Why not use the sharepoint object model rather then using the raw database approach? I know that the object model approach does have a performance penalty compared to the database, but MS could change the db schema with the next path. On the other hand the likelyhood of MS breaking their own object model is far less, and as far as I know the recommended way is to use either the object model or the web services.

Kasper
  • 1,710
  • 2
  • 17
  • 31
4

Don't ever query the SharePoint database directly. This is completely unsupported and can get you into trouble moving forward (for instance, if a service-pack or hotfix modifies schema, then you app is broken).

  • Huh? What do you mean? How can querying corrupt your database? Aren't you talking about Create, Update, Delete? I would think Read would always be okay, wouldn't it? – pabrams May 12 '15 at 19:37
  • Oh, wait, now that I re-read again days later, maybe I misinterpreted your language. When you say "you app is broken", you're talking about his custom app that contains the above code, and not SharePoint itself, right? In that case I guess your answer is valid (I had been considering the situation of one-off queries, in which case no harm is done). It won't let me undo my downvote until you edit, though, so maybe you could change the language to be more clear. – pabrams May 21 '15 at 02:33
3

The below would return the top 100 largest documents that were added in the last 24 hours to the content database.

Select Top 100 
       W.FullUrl, 
       W.Title, 
       L.tp_Title as ListTitle, 
       A.tp_DirName, 
       A.tp_LeafName, 
       A.tp_id , 
       DS.Content , 
       DS.Size, 
       D.DocLibRowID, 
       D.TimeCreated, 
       D.Size, 
       D.MetaInfoTimeLastModified, 
       D.ExtensionForFile 
From your_content_database.dbo.AllLists L With (NoLock) 
join your_content_database.dbo.AllUserData A With (NoLock) 
  On L.tp_ID=tp_ListId 
join your_content_database.dbo.AllDocs D With (NoLock) 
  On A.tp_ListID=D.ListID 
 And A.tp_SiteID=D.SiteID 
 And A.tp_DirName=D.DirName 
 And A.tp_LeafName=D.LeafName 
join your_content_database.dbo.AllDocStreams DS With (NoLock) 
  On DS.SiteID=A.tp_SiteID 
 And DS.ParentID=D.ParentID 
 And DS.ID=D.ID 
join your_content_database.dbo.Webs W With (NoLock) 
  On W.ID=D.WebID 
 And W.ID=L.Tp_WebID 
 And W.SiteID=A.tp_SiteID 
Where DS.DeleteTransactionID=0x 
  And D.DeleteTransactionID=0x 
  And D.IsCurrentVersion=1 
  And A.tp_DeleteTransactionID=0x 
  And A.tp_IsCurrentVersion=1 
  And D.HasStream=1 
  And L.tp_DeleteTransactionId=0x 
  And ExtensionForFile not in('webpart','dwp','aspx','xsn','master','rules','xoml') 
  And D.MetaInfoTimeLastModified>DateAdd(d,-1,GetDate()) 
Order by DS.Size desc
C B
  • 1,677
  • 6
  • 18
  • 20
  • Query doesn't work in SP2013. Table `AllUserData` no longer has `tp_DirName` col or `tp_LeafName` col – DeanOC May 17 '18 at 03:08
2

I recommend that you have a look at the Camelot .NET Connector which allows you to query SharePoint 2007/2010 using standard SQL queries. Its a ADO.NET driver that can also be exposed through a simple WCF service and by that available through any programming language. Lets say one would like to select from "shared documents", you would write something like:

select * from `shared documents`

or with certain columns:

select id, title, filetype, filesize, created, createdby from `shared documents`

or with where statement:

select id, title, filetype, filesize, created, createdby from `shared documents` where filetype = '.gif'
C B
  • 1,677
  • 6
  • 18
  • 20
Ulf
  • 101
  • 2
1
  • Why don't you use a Content Query web part?
  • Why don't you use a search object to query the same? This would be my preferred solution. Search has most properties already and you can add more if you need them. Search is probably a lot quicker than querying content database(s).

Whether it is supported or not, it is still bad form to query the Content Database directly and any developer who would suggest this as a solution should get a lecture ;). For instance, what happens if an admin creates a second content database to your webapp? If you query goes across site collections it will not return the desired results until you provide for this in code.

ArjanP
  • 2,172
  • 2
  • 15
  • 22
0

MOSS provides many webservices out of the box which make life a little easier. They are always worth exploring.

For this particular instance, I think the article, Getting a list of files from a MOSS document library using a SharePoint web service, will be of assistance. If this isn't your exact scenario, it will get you on the right track.

If the Document service doesn't help you, the Search service will I'm sure. Check the documentation for usage.

cciotti
  • 710
  • 1
  • 7
  • 13
  • I need to do this at the server / SQL level since IIS is set to time out at 2 minutes (120 seconds) and we have no control over that infrastructure piece as well as in order to actually deploy a code product (e.g. app or webpart) requires a release process that is at least 7 days. – PapaDaniel Oct 17 '08 at 21:10
  • This is really about doing legal discovery and just getting a list of all documents in the whole site collection and who created them, when and what are they named. – PapaDaniel Oct 17 '08 at 21:12
0

You can get some of the information from the UserInfo table by joining AllUserData.tp_Author to UserInfo.tp_ID, but messing around in these tables is not recommended and can be very fragile, and also your queries are not guaranteed to work after applying any patches or service packs to SharePoint. I would use either webservices or the SharePoint object model to access the data.

Cruiser
  • 268
  • 1
  • 5