4

I am writing a Dexterity content type which contains plain text and HTML fields. I want to have a custom SearchableText() method which exposes these fields to portal_catalog and Plone full text search.

I assume that for plain text I can just join the strings with spaces. But how should I preprocess HTML content when exposing it in SearchableText()?

Mikko Ohtamaa
  • I think the meta-question is "What does Archetypes do that the Dexterity (CMF) base SearchableText() method does not yet do?" I would mimic whatever Archetypes does in your content class (use portal_transforms?). I'm guessing there is an explicit decision not to rely upon CMF tools (like portal_transforms) in plone.dexterity.content.DexterityContent and its subclasses. This seems like a good opportunity to create an add-on base class to act as a bridge until Dexterity gets its own first-class transforms story. – sdupton Aug 05 '11 at 17:04
  • For plain text, in addition to joining it with spaces, you need to make sure it is a UTF-8 encoded string, not unicode. – David Glick Apr 10 '13 at 05:28

2 Answers

9

For converting data in Plone there is a tool called portal_transforms, which can convert between many formats (depending on your OS and installation, it may also be able to convert .doc, .pdf, etc.):

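from Products.CMFCore.utils import getToolByName

# self.context is the content object; html is the HTML field value as a string
transforms = getToolByName(self.context, 'portal_transforms')
stream = transforms.convertTo('text/plain', html, mimetype='text/html')
# convertTo() may return None if no transform path is found
text = stream.getData().strip() if stream is not None else ''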
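
To tie this back to the question, here is a minimal sketch of how the plain text and HTML field values could be combined into a single string for SearchableText(). The helper names `html_to_text` and `build_searchable_text` are hypothetical and not part of Plone. The `transforms` argument is assumed to be the portal_transforms tool obtained with getToolByName as above.

```python
def html_to_text(transforms, html):
    """Convert an HTML string to plain text using a portal_transforms-like tool.

    `transforms` is assumed to offer convertTo(target, data, mimetype=...)
    returning an object with getData(), as in the snippet above.
    """
    if not html:
        return ''
    stream = transforms.convertTo('text/plain', html, mimetype='text/html')
    # Be defensive: treat a missing result as "nothing to index".
    if stream is None:
        return ''
    return stream.getData().strip()


def build_searchable_text(transforms, plain_values=(), html_values=()):
    """Join plain text values and converted HTML values with single spaces."""
    # Plain text fields only need whitespace trimming; empty values are skipped.
    parts = [value.strip() for value in plain_values if value]
    # HTML fields go through the transform first.
    parts.extend(html_to_text(transforms, html) for html in html_values)
    return ' '.join(part for part in parts if part)
```

A content class could then return `build_searchable_text(transforms, [self.title, self.description], [body_html])` from its SearchableText() method, where `body_html` stands for the raw HTML of the rich text field. As noted in the comments on the question, the result may additionally need to be UTF-8 encoded on setups that expect byte strings rather than unicode.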

For indexing fields in Dexterity I suggest using collective.dexteritytextindexer (there is no TTW support at the moment). See http://pypi.python.org/pypi/collective.dexteritytextindexer and https://github.com/collective/collective.dexteritytextindexer

cheers

jone
1

Maybe collective.dexteritytextindexer can help you to get part of what you want.

Mark van Lent
gforcada