I'm storing a collection of free proxies in a database. A proxy entity consists of:
- IP Address
- Port
- List of sources
A source is basically a website where I found the proxy information. Here's my schema:
proxy table:
+--------------+-------------+------+-----+---------+-------+
| Field        | Type        | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| id           | varchar(45) | NO   | PRI | NULL    |       |
| ip_address   | varchar(40) | NO   |     | NULL    |       |
| port         | smallint(6) | NO   |     | NULL    |       |
+--------------+-------------+------+-----+---------+-------+
source table:
+----------+--------------+------+-----+---------+----------------+
| Field    | Type         | Null | Key | Default | Extra          |
+----------+--------------+------+-----+---------+----------------+
| id       | int(11)      | NO   | PRI | NULL    | auto_increment |
| resource | varchar(200) | NO   |     | NULL    |                |
+----------+--------------+------+-----+---------+----------------+
proxy_sources table, which joins the first two tables:
+-----------+-------------+------+-----+---------+-------+
| Field     | Type        | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| proxy_id  | varchar(45) | NO   | MUL | NULL    |       |
| source_id | int(11)     | NO   | MUL | NULL    |       |
+-----------+-------------+------+-----+---------+-------+
My Java ORM classes:
@Entity
@Table(name = "proxy")
public class Proxy {
    @Id
    @Column(name = "id")
    private String id;

    @Column(name = "ip_address")
    private String ipAddress;

    @Column(name = "port")
    private int port;

    @OneToMany(cascade = CascadeType.MERGE, fetch = FetchType.EAGER)
    @JoinTable(
        name = "proxy_sources",
        joinColumns = @JoinColumn(name = "proxy_id"),
        inverseJoinColumns = @JoinColumn(name = "source_id")
    )
    private List<Source> sources = new ArrayList<>();
    ...
}

@Entity
@Table(name = "source")
public class Source {
    @Id
    @Column(name = "id")
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private long id;

    @Column(name = "resource")
    private String resource;
    ...
}
Whenever I save a proxy object, I want to avoid duplicating existing sources. For example:
A proxy object has 2 sources:
- with resource = "res1"
- with resource = "res2"
If the source table already contains an entry with resource = "res1", I want to populate its id property in the Java object from the database to avoid creating a duplicate.
For now I do it manually in my Repository class:
public String save(Proxy proxy) {
    populate(proxy.getSources());
    return (String) sessionFactory.getCurrentSession().save(proxy);
}
Here's the populate method:
private void populate(List<Source> sources) {
    if (sources.isEmpty()) {
        return;
    }
    // Load every already-persisted source whose resource matches one being saved.
    List<String> resources = sources.stream().map(Source::getResource).collect(toList());
    List<Source> existing = sessionFactory.getCurrentSession()
        .createQuery("FROM Source source WHERE source.resource IN (:resources)", Source.class)
        .setParameterList("resources", resources)
        .list();
    // Copy the database id onto each in-memory source that already exists.
    sources.forEach(source -> existing.stream()
        .filter(s -> s.getResource().equals(source.getResource()))
        .findAny()
        .ifPresent(s -> source.setId(s.getId())));
}
Basically, I check whether each source in the sources collection already exists in the database. If a source with the same resource value exists, I populate its id from the database. A populated id prevents a duplicate from being created.
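To make the matching step easier to reason about, here is a minimal Hibernate-free sketch of it. It assumes a simplified stand-in for Source (no JPA annotations, id of 0 meaning "not persisted yet") and a hypothetical helper named copyExistingIds; neither is part of my real code. It uses a map lookup by resource instead of the nested stream, but the effect on the ids should be the same.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SourceIdPopulation {

    // Simplified stand-in for the Source entity (hypothetical, no JPA annotations).
    static class Source {
        long id; // 0 is assumed to mean "not persisted yet"
        final String resource;

        Source(long id, String resource) {
            this.id = id;
            this.resource = resource;
        }
    }

    // Copies ids from already-persisted sources onto new sources with the same resource.
    static void copyExistingIds(List<Source> sources, List<Source> existing) {
        Map<String, Long> idByResource = new HashMap<>();
        for (Source s : existing) {
            idByResource.put(s.resource, s.id);
        }
        for (Source s : sources) {
            Long id = idByResource.get(s.resource);
            if (id != null) {
                s.id = id; // already in the database: reuse its id
            }
        }
    }

    public static void main(String[] args) {
        // Pretend the database already holds "res1" with id 7.
        List<Source> existing = new ArrayList<>();
        existing.add(new Source(7L, "res1"));

        List<Source> sources = new ArrayList<>();
        sources.add(new Source(0L, "res1"));
        sources.add(new Source(0L, "res2"));

        copyExistingIds(sources, existing);

        for (Source s : sources) {
            System.out.println(s.resource + " -> " + s.id);
        }
    }
}
```

With the sample data in main, "res1" picks up the existing id and "res2" keeps its unset id, so only "res2" would be inserted as a new row.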
This approach works, but is there a cleaner solution to this problem?