I'm having an issue while indexing my data in a batch job. I want to index a list of Articles, with @IndexedEmbedded on the members whose data I need in the index. Article gets additional information from two other beans: Page and Articlefulltext.

The batch correctly updates the database and adds new Documents to my Lucene index thanks to the Hibernate Search annotations. But the added documents have incomplete fields. It seems that Hibernate Search doesn't see all the annotations.

When I look at the resulting Lucene index with Luke, I see fields from both the Article and Page objects, but none from Articlefulltext. The data in my database is correct, though, which means the persist() operation works...

I really need some help here, because I don't see what the difference is between my Page and Articlefulltext...

The weird thing is that if I use a MassIndexer, it correctly adds the Article + Page + Articlefulltext data to the Lucene index. But I don't want to rebuild an index of millions of documents every time I make a big update...

I set the log4j logging level to DEBUG for Hibernate Search and Lucene, but the logs don't give me much information.

Here is my bean code and my batch code.

Thanks in advance for your help,

Article.java:

@Entity
@Table(name = "article", catalog = "test")
@Indexed(index="articleText")
@Analyzer(impl = FrenchAnalyzer.class)
public class Article implements java.io.Serializable {

    @Id
    @GeneratedValue(strategy = IDENTITY)
    @Column(name = "id", unique = true, nullable = false)
    @DocumentId        
    private Integer id;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "firstpageid", nullable = false)
    @IndexedEmbedded
    private Page page;

    @Column(name = "heading", length = 300)
    @Field(name= "title", index = Index.YES, store = Store.YES)
    @Boost(2.5f)
    private String heading;

    @Column(name = "subheading", length = 300)
    private String subheading;

    @OneToOne(fetch = FetchType.LAZY, mappedBy = "article") 
    @IndexedEmbedded
    private Articlefulltext articlefulltext;
    [... bean methods etc ...]

Page.java

@Entity
@Table(name = "page", catalog = "test")
public class Page implements java.io.Serializable {

    private Integer id;
    @IndexedEmbedded
    private Issue issue;
    @ContainedIn
    private Set<Article> articles = new HashSet<Article>(0);
    [... bean method ...]

Articlefulltext.java

@Entity
@Table(name = "articlefulltext", catalog = "test")
@Analyzer(impl = FrenchAnalyzer.class)
public class Articlefulltext implements java.io.Serializable {

    @GenericGenerator(name = "generator", strategy = "foreign", parameters = @Parameter(name = "property", value = "article"))
    @Id
    @GeneratedValue(generator = "generator")
    @Column(name = "aid", unique = true, nullable = false)
    private int aid;

    @OneToOne(fetch = FetchType.LAZY)
    @PrimaryKeyJoinColumn
    @ContainedIn
    private Article article;

    @Column(name = "fulltextcontents", nullable = false)
    @Field(store=Store.YES, index=Index.YES, analyzer = @Analyzer(impl = FrenchAnalyzer.class), bridge= @FieldBridge(impl = FulltextSplitBridge.class))
    // This field is not added to the resulting Document! I put a log statement in FulltextSplitBridge, and it is never called during the batch process. But if I use a MassIndexer, I see that FulltextSplitBridge is called for each Articlefulltext...
    private String fulltextcontents;
    [... bean method ...]

And here is the code used to update both the database and the Lucene index.

Batch source code:

FullTextEntityManager em = null;

@Override
protected void executeInternal(JobExecutionContext arg0) throws JobExecutionException {
    ApplicationContext ap = null;
    EntityManagerFactory emf = null;
    EntityTransaction tx = null;


    try {
        ap = (ApplicationContext) arg0.getScheduler().getContext().get("applicationContext");
        emf = (EntityManagerFactory) ap.getBean("entityManagerFactory", EntityManagerFactory.class);
        em = Search.getFullTextEntityManager(emf.createEntityManager());
        tx = em.getTransaction();


        tx.begin();
        // [... em.persist() some things which aren't Lucene-related, so I skip them ...]
        for (File xmlFile : xmlList) {
            Reel reel = new Reel(title, reelpath);
            em.persist(reel);
            Article article = new Article();
            // [... set Article fields, so I skip them ...]
            Articlefulltext ft = new Articlefulltext();
            // [... set Articlefulltext fields, so I skip them ...]
            ft.setArticle(article);
            ft.setFulltextcontents(bufferBlock.toString());
            em.persist(ft); // I persist ft before article because of FK issues
            em.persist(article); // here the annotations update the Lucene index, but fulltextcontents is not indexed (see my first post)
            if (nbFileDone % 50 == 0) {
                // flush a batch of inserts and release memory:
                em.flush();
                em.clear();
            }
        }
        tx.commit();
    }
    catch (Exception e) {
        // tx is still null if the failure happened before getTransaction()
        if (tx != null && tx.isActive()) {
            tx.rollback();
        }
        // report the failure to Quartz instead of silently swallowing it
        throw new JobExecutionException(e);
    }
    finally {
        if (em != null) {
            em.close();
        }
    }
}

1 Answer

Hmm, you don't seem to set both sides of the relation. I can see ft.setArticle(article), but not article.setFtArticle(ft). Both sides of the relation need to be set. In your case Articlefulltext is the owner of the relationship, but that does not mean that you don't have to set both sides.

Hardy
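Why the field goes missing can be seen from the direction Hibernate Search walks when it builds the Article document: it starts at Article and follows Article's own @IndexedEmbedded reference (articlefulltext), not ft.article. If that reference is null, nothing is embedded and the field bridge is never invoked, which matches the FulltextSplitBridge log observation above. The following is a plain-Java sketch with all JPA and Hibernate Search annotations stripped; indexedFields() is a hypothetical stand-in for document building, not Hibernate Search's API, and setArticlefulltext is assumed to be the plain setter for the articlefulltext field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java stand-ins for the entities; annotations intentionally omitted.
class Articlefulltext {
    private Article article; // owning side (@OneToOne + @ContainedIn in the question)
    private String fulltextcontents;

    void setArticle(Article article) { this.article = article; }
    Article getArticle() { return article; }
    void setFulltextcontents(String s) { this.fulltextcontents = s; }
    String getFulltextcontents() { return fulltextcontents; }
}

class Article {
    private String heading;
    private Articlefulltext articlefulltext; // inverse side (mappedBy = "article", @IndexedEmbedded)

    void setHeading(String heading) { this.heading = heading; }
    // Assumed plain setter; the real one is elided in the question.
    void setArticlefulltext(Articlefulltext ft) { this.articlefulltext = ft; }

    // Hypothetical stand-in for document building: it only follows
    // Article's own reference, never ft.getArticle().
    Map<String, String> indexedFields() {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", heading);
        if (articlefulltext != null) {
            doc.put("articlefulltext.fulltextcontents", articlefulltext.getFulltextcontents());
        }
        return doc;
    }
}

class BidirectionalDemo {
    public static void main(String[] args) {
        Article article = new Article();
        article.setHeading("Some heading");
        Articlefulltext ft = new Articlefulltext();
        ft.setFulltextcontents("full text");

        ft.setArticle(article);                      // only the owning side, as in the batch
        System.out.println(article.indexedFields()); // no fulltext field yet

        article.setArticlefulltext(ft);              // the inverse side as well
        System.out.println(article.indexedFields()); // fulltext field now present
    }
}
```

In the batch, the corresponding change would be to call the Article-side setter next to ft.setArticle(article), before em.persist(article).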
  • Well, you're right, thank you so much... it was so simple. The weird thing is that for _Page_, I only set one side of the relation, but it works correctly. Is that because it's a ManyToOne relation? – user1882300 Dec 08 '12 at 18:31
  • It depends on which direction of the relation link we have to traverse to update the indexes. Likely the one for Page was filled in only in the main direction, so it was working by chance. – Sanne Dec 09 '12 at 14:10
  • "By chance"? So the best practice is to define all relations on both sides? That's good to know; I'll do that everywhere. Thanks to both of you! – user1882300 Dec 09 '12 at 16:03
  • You don't have to define all relations as bidirectional, but if you do define one as bidirectional you have to update both sides; otherwise there will be confusion about which instance provides the correct information. Search might need you to define relations as bidirectional so that there is a place to put both @IndexedEmbedded and @ContainedIn. – Sanne Dec 09 '12 at 16:18
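One common way to "update both sides" without relying on every caller to remember is a helper on one entity that sets both references at once. For Page, the asker set article.page, which is the side Hibernate Search follows (via @IndexedEmbedded) when indexing a new Article, which is why it happened to work; page.articles (@ContainedIn) only matters when a Page change must propagate to its Articles. Below is a plain-Java sketch with annotations omitted; addArticle and removeArticle are hypothetical helper names, not part of the question's code or of any library.

```java
import java.util.HashSet;
import java.util.Set;

// Plain-Java sketch; JPA / Hibernate Search annotations omitted.
class Page {
    private final Set<Article> articles = new HashSet<>(); // inverse side (@ContainedIn)

    // Hypothetical helper: keeps both sides of the link in sync.
    void addArticle(Article article) {
        Page previous = article.getPage();
        if (previous != null && previous != this) {
            previous.articles.remove(article); // detach from the old page first
        }
        articles.add(article);
        article.setPage(this);
    }

    // Hypothetical helper: clears both sides of the link.
    void removeArticle(Article article) {
        if (articles.remove(article)) {
            article.setPage(null);
        }
    }

    Set<Article> getArticles() { return articles; }
}

class Article {
    private Page page; // owning side (@ManyToOne + @IndexedEmbedded)

    void setPage(Page page) { this.page = page; }
    Page getPage() { return page; }
}
```

With callers going through addArticle, both traversal directions always see the same link, whichever one an index update happens to walk.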