
I've successfully migrated thousands of news items and other content from Sitefinity 5 to WordPress after hours of excruciating analysis and some lucky guessing, but a few items are still left over, specifically the pages. I know a lot of the content is stored in very obscure ways, but somebody must have done this before and can steer me in the right direction.

My research (and text searches against the DB) has turned up the page titles etc., but when I search for the content itself I get nothing. My gut tells me the content is being stored in binary form. Can anyone confirm whether this is the case?

The Sitefinity documentation is only helpful if you're a .NET developer with the site set up in Visual Studio (as far as I've seen).

Daniel

2 Answers


This is probably the most obfuscated manner of storing content that I've ever encountered. After performing text searches against the database I've finally found where the content is stored, but it's not a simple process to get it out.

The master record for pages appears to be sf_page_node, and there are several related tables:

  • sf_object_data (page_id is related to sf_page_node.content_id)
  • sf_draft_pages (page_id is related to sf_page_node.content_id)
  • sf_page_data (content_id is related to sf_page_node.content_id)
  • sf_control_properties (control_id is related to sf_object_data.id)

So you could get the info you need with a query like this:

select *
from sf_page_node
join sf_object_data on sf_page_node.content_id = sf_object_data.page_id
join sf_control_properties on sf_object_data.id = sf_control_properties.control_id

Other things to consider:

  • the parent_id field relates back to the sf_page_node table, so if you're writing a script, be sure to query this as well
  • the page may have a banner image: you will pick up a "place_holder" value of 'BannerHolder' with a caption of "Image". The image itself may be stored as a blob in sf_media_content, so handle it separately. The "nme" value of 'ImageId' will have a GUID in the "val" column; you can query sf_media_content with this value as "content_id". The actual binary data is stored in sf_chunks, and the two relate on "file_id" (see the sketch after this list)
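
For the banner images, a query along these lines should pull the raw bytes out. This is a minimal sketch assuming the relationships above hold; the column names file_name on sf_media_content and data on sf_chunks are my guesses and may differ in your schema (try_convert also requires SQL Server 2012+):

select
    sf_media_content.content_id,
    sf_media_content.file_name, -- assumption: the stored file name lives in this column
    sf_chunks.data              -- assumption: the binary chunk column is named "data"
from sf_control_properties
join sf_media_content
    on sf_media_content.content_id = try_convert(uniqueidentifier, sf_control_properties.val)
join sf_chunks
    on sf_chunks.file_id = sf_media_content.file_id
where sf_control_properties.nme = 'ImageId'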

My revised query, taking into account what I'll need for the content migration, is below:

select
    original.content_id,
    original.url_name_,
    original.title_,
    parent.id        as parent_id,
    parent.url_name_ as parent_url_name_,
    parent.title_    as parent_title_,
    place_holder,
    sf_object_data.caption_,
    sf_control_properties.nme,
    val
from sf_page_node original
join sf_object_data on original.content_id = sf_object_data.page_id
join sf_control_properties on sf_object_data.id = sf_control_properties.control_id
-- note: an inner join drops root pages that have no parent row; switch to a left join if you need them
join sf_page_node parent on original.parent_id = parent.id

I hope this helps someone!

Daniel

You don't need the version items in this case. As you already found out, Sitefinity stores the previous versions of pages in binary format.

The current live pages' data is available in the sf_control_properties and sf_object_data tables. Join these together with sf_page_data and sf_page_node and you get the full picture.
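
A minimal sketch of that four-table join, assuming the keys identified in the answer above (sf_page_data relates to sf_page_node on content_id):

select *
from sf_page_node
join sf_page_data on sf_page_node.content_id = sf_page_data.content_id
join sf_object_data on sf_page_node.content_id = sf_object_data.page_id
join sf_control_properties on sf_object_data.id = sf_control_properties.control_id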

Depending on your requirements, it may be easier to do a GET request to each page and parse the returned HTML response.
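
If you go that route, you still need the list of URLs to request, including pages that are no longer linked anywhere. A recursive query over sf_page_node can reconstruct the paths; this is a sketch assuming root pages carry a NULL parent_id, which may differ in your install:

with page_paths as (
    select id, content_id,
        cast('/' + url_name_ as nvarchar(max)) as full_path
    from sf_page_node
    where parent_id is null -- assumption: root pages have no parent row
    union all
    select child.id, child.content_id,
        p.full_path + '/' + child.url_name_
    from sf_page_node child
    join page_paths p on child.parent_id = p.id
)
select full_path from page_paths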

Veselin Vasilev
  • Thanks Veselin, how does the sf_control_properties table relate to the other tables? Also, what do you mean by "do a GET request on each page"? I can't rely on scraping the live site, as a lot of pages are no longer linked to; they do, however, still need to be migrated for historical purposes, which is why I'm working in the DB. – Daniel Feb 15 '17 at 07:02
  • Never mind, some further text searching on the DB led me to the answer: they join on sf_object_data.id = sf_control_properties.control_id. Thanks for your input, I will update my answer. – Daniel Feb 15 '17 at 07:59
  • Good you found it yourself. As for the GET: you can generate a sitemap file with all the pages in the site and then request them. – Veselin Vasilev Feb 15 '17 at 08:05