I am using Hibernate to store data that I parse from HTML pages with jsoup. Here are my entities:
Sentence.hbm.xml
<class name="Sentence">
<id name="id">
<column name="SENTENCE_ID"/>
<generator class="native" />
</id>
<property name="content" type="text"/>
<many-to-one name="processedurl" class="src.model.ProcessedUrl">
<column name="PROCESSED_URL_ID" not-null="true" />
</many-to-one>
</class>
ProcessedUrl.hbm.xml
<class name="ProcessedUrl">
<id name="id">
<column name="url_id" />
<generator class="native"/>
</id>
<property name="url" type="text"/>
<property name="date" type="java.util.Date" />
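<!-- inverse="true": the many-to-one in Sentence already owns the PROCESSED_URL_ID column -->
<set name="sentences" inverse="true" cascade="all">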
<key column="PROCESSED_URL_ID"/>
<one-to-many class="src.model.Sentence" />
</set>
</class>
POJO Sentence:
public class Sentence {
private long id;
private ProcessedUrl processedurl;
private String content;
public Sentence()
{
}
public Sentence(String content)
{
this.setContent(content);
//this.setUrl(url);
}
public Sentence(String content, ProcessedUrl processed_url) {
this.setContent(content);
this.setProcessedurl(processed_url);
}
public long getId() {
return id;
}
public void setId(long id) {
this.id = id;
}
public ProcessedUrl getProcessedurl() {
return processedurl;
}
public void setProcessedurl(ProcessedUrl processed_url) {
this.processedurl = processed_url;
}
public String getContent() {
return content;
}
public void setContent(String content) {
this.content = content;
}
}
POJO ProcessedUrl:
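import java.util.Date;
import java.util.Set;
// Apache Commons Lang 2.x; with Commons Lang 3 use org.apache.commons.lang3.builder instead
import org.apache.commons.lang.builder.EqualsBuilder;
import org.apache.commons.lang.builder.HashCodeBuilder;

public class ProcessedUrl {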
private long id;
private String url;
private Date date;
private Set<Sentence> sentences;
public ProcessedUrl() {
}
public ProcessedUrl(String url, Date date) {
this.setUrl(url);
this.setDate(date);
}
public ProcessedUrl(String url, Date date, Set<Sentence> sentences) {
this.setUrl(url);
this.setDate(date);
this.setSentences(sentences);
}
public long getId() {
return id;
}
public void setId(long id) {
this.id = id;
}
public String getUrl() {
return url;
}
public void setUrl(String url) {
this.url = url;
}
public Date getDate() {
return date;
}
public void setDate(Date date) {
this.date = date;
}
public Set<Sentence> getSentences() {
return this.sentences;
}
public void setSentences(Set<Sentence> sentences) {
this.sentences = sentences;
}
@Override
public boolean equals(Object obj) {
if(this == obj) return true;
if(!(obj instanceof ProcessedUrl)) return false;
ProcessedUrl that = (ProcessedUrl) obj;
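EqualsBuilder eb = new EqualsBuilder();
// use the getter: "that" may be an uninitialized Hibernate proxy whose fields are still null
eb.append(getUrl(), that.getUrl());
return eb.isEquals();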
}
@Override
public int hashCode() {
HashCodeBuilder hcb = new HashCodeBuilder();
hcb.append(url);
return hcb.toHashCode();
}
}
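For context, a url-based equals()/hashCode() like the one above only changes how two ProcessedUrl instances compare in plain Java, for example when they are put into a HashSet. A minimal, Hibernate-free sketch of that effect, using a hypothetical stripped-down `UrlKey` class and `java.util.Objects` in place of the Commons Lang builders:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class BusinessKeyDemo {

    // Hypothetical stand-in for ProcessedUrl: equality is based on url only.
    static final class UrlKey {
        private final String url;

        UrlKey(String url) {
            this.url = url;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof UrlKey)) return false;
            return Objects.equals(url, ((UrlKey) obj).url);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(url);
        }
    }

    public static void main(String[] args) {
        UrlKey first = new UrlKey("http://example.com");
        UrlKey second = new UrlKey("http://example.com");

        Set<UrlKey> set = new HashSet<>();
        set.add(first);
        set.add(second); // not added: equal to first by url

        System.out.println(first.equals(second)); // true
        System.out.println(set.size());           // 1
    }
}
```

Nothing in this comparison touches the database; it only deduplicates objects that are compared with each other in memory.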
Indexing method:
public void indexWebPage(String url) throws IOException
{
Document doc = Jsoup.connect(url).get();
Elements elements = doc.body().select("*");
HashSet<Sentence> sentencesCollection = new HashSet<Sentence>();
ProcessedUrl processedUrl = new ProcessedUrl(url, new Date(), sentencesCollection);
for (Element element : elements)
{
if (element.ownText().trim().length() > 1)
{
for (String sentenceContent : element.ownText().split("\\. "))
{
Sentence sentence = new Sentence(sentenceContent, processedUrl);
sentencesCollection.add(sentence);
}
}
}
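Session session = HibernateUtils.getSession();
Transaction transaction = null;
try
{
transaction = session.beginTransaction();
// cascade="all" on the sentences set also persists every Sentence
session.persist(processedUrl);
transaction.commit();
}
catch (RuntimeException e)
{
if (transaction != null) transaction.rollback();
throw e;
}
finally
{
session.close();
}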
}
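The inner loop relies on `String.split("\\. ")`, whose argument is a regular expression: `\\.` matches a literal period and the following space is required, so text is cut only where a period is followed by a space. The last period of the text stays attached, a decimal such as "1.5" is not split, and an abbreviation such as "Mr. Smith" would be. A small standalone sketch of just that step (the `SplitDemo` class and sample text are hypothetical):

```java
import java.util.Arrays;
import java.util.List;

public class SplitDemo {

    // Mirrors the loop in indexWebPage: split an element's own text on ". "
    static List<String> splitSentences(String ownText) {
        return Arrays.asList(ownText.split("\\. "));
    }

    public static void main(String[] args) {
        System.out.println(splitSentences("First sentence. Second one. Last one."));
        // prints [First sentence, Second one, Last one.]
    }
}
```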
This is how the method works: it parses some page at http://..., splits the text of each element into sentences, creates a Sentence for each one and adds it to a HashSet. After parsing everything, I persist the ProcessedUrl object together with this populated HashSet, and because cascade is set on the set in the XML mapping, the Sentence table is populated too. That part works. But when I parse the same link again, both tables get duplicate rows. I overrode equals()/hashCode() using url as the business key, and I expected Hibernate to recognize that the newly parsed URL is the same as the stored one and not insert it again (and therefore not insert its sentences either). Apparently I misunderstood how this works.
Any hints or clarifications? Maybe my approach is totally dumb?
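Some background on the behavior described above: Hibernate does not consult equals()/hashCode() to decide whether to write a row. persist() on a freshly constructed ProcessedUrl (a transient instance with no id yet) schedules an INSERT, and cascade="all" then inserts its sentences as well; saveOrUpdate() and merge() also decide based on the identifier, not on equals(). Skipping an already indexed URL therefore needs an explicit lookup by the business key before persisting, in Hibernate typically an HQL query along the lines of `from ProcessedUrl where url = :url`. The sketch below shows only that control flow, assuming a hypothetical `UrlStore` class whose HashMap stands in for the ProcessedUrl table; it is not Hibernate code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UrlStore {

    // Hypothetical in-memory stand-in for the ProcessedUrl table, keyed by url.
    private final Map<String, List<String>> rowsByUrl = new HashMap<>();

    // Stores the sentences for url unless that url was already indexed.
    // Returns true if a new "row" was written, false if it was skipped.
    public boolean indexIfAbsent(String url, List<String> sentences) {
        // With Hibernate, this check would be a query by url inside the transaction.
        if (rowsByUrl.containsKey(url)) {
            return false; // already indexed: skip, so no duplicate sentences either
        }
        rowsByUrl.put(url, new ArrayList<>(sentences));
        return true;
    }

    public int size() {
        return rowsByUrl.size();
    }

    public static void main(String[] args) {
        UrlStore store = new UrlStore();
        List<String> sentences = Arrays.asList("First sentence", "Second one");
        System.out.println(store.indexIfAbsent("http://example.com", sentences)); // true
        System.out.println(store.indexIfAbsent("http://example.com", sentences)); // false
        System.out.println(store.size()); // 1
    }
}
```

A check-then-insert like this is not safe when two indexers run concurrently; a unique constraint on the url column is the usual safeguard in that case.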