1

I'm storing a collection of free proxies in database. Proxy entity consists of:

  • IP Address
  • Port
  • List of sources

Source is basically a website where I found this proxy information. Here's my schema:

proxy table:

+--------------+-------------+------+-----+---------+-------+
| Field        | Type        | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| id           | varchar(45) | NO   | PRI | NULL    |       |
| ip_address   | varchar(40) | NO   |     | NULL    |       |
| port         | smallint(6) | NO   |     | NULL    |       |
+--------------+-------------+------+-----+---------+-------+

source:

+----------+--------------+------+-----+---------+----------------+
| Field    | Type         | Null | Key | Default | Extra          |
+----------+--------------+------+-----+---------+----------------+
| id       | int(11)      | NO   | PRI | NULL    | auto_increment |
| resource | varchar(200) | NO   |     | NULL    |                |
+----------+--------------+------+-----+---------+----------------+

proxy_sources which joins first two tables:

+-----------+-------------+------+-----+---------+-------+
| Field     | Type        | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| proxy_id  | varchar(45) | NO   | MUL | NULL    |       |
| source_id | int(11)     | NO   | MUL | NULL    |       |
+-----------+-------------+------+-----+---------+-------+

My Java ORM classes:

@Entity
@Table(name = "proxy")
public class Proxy {

    @Id
    @Column(name = "id")
    private String id;

    @Column(name = "ip_address")
    private String ipAddress;

    @Column(name = "port")
    private int port;

    @OneToMany(cascade = CascadeType.MERGE, fetch = FetchType.EAGER)
    @JoinTable(
            name = "proxy_sources",
            joinColumns = @JoinColumn(name = "proxy_id"),
            inverseJoinColumns = @JoinColumn(name = "source_id")
    )
    private List<Source> sources = new ArrayList<>();

    ...
}


@Entity
@Table(name = "source")
public class Source {

    @Id
    @Column(name = "id")
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private long id;

    @Column(name = "resource")
    private String resource;

    ...
}

Whenever I save a proxy object I want to avoid duplicating existing sources. For example:

Proxy object has 2 sources:

  1. with resource = "res1"
  2. with resource = "res2"

If source table already contains entry with source = "res1" I want to populate it's id property in java object from database to avoid creating duplicate.

For now I do it manually in my Repository class:

public String save(Proxy proxy) {
    populate(proxy.getSources());
    return (String) sessionFactory.getCurrentSession().save(proxy);
}

Here's populate method:

private void populate(List<Source> sources) {
    if (sources.isEmpty()) {
        return;
    }

    List<String> resources = sources.stream().map(Source::getResource).collect(toList());

    List<Source> existing = sessionFactory.getCurrentSession()
            .createQuery("FROM Source source WHERE source.resource IN (:resources)", Source.class)
            .setParameterList("resources", resources)
            .list();

    sources.forEach(source -> existing.stream()
            .filter(s -> s.getResource().equals(source.getResource()))
            .findAny()
            .ifPresent(s -> source.setId(s.getId())));
}

Basically what I do is checking for existence every source in sources collection. If source with same resource value already exists, I populate it's id from database. Non-empty id avoids creating duplicates.

It works, but probably there's a cleaner solution for this problem?

Dmitry Papka
  • 1,201
  • 3
  • 13
  • 24

1 Answers1

0

The first modification you can make to avoid duplication is creating a unique key in the source table, on the resource column. This way even if you make a mistake in your code the database will thrown an error if you try to save a duplicate register.

With that said, there is no easy way around to save only objects that don't exist in the database. You either make the resource column be your primary key and throw the id column out (which I don't believe is a good choice), or you have to make a select on the database.

This question has more details on the second option

If you are willing to change your application flow, one way that might fix this problem is to break the proxy saving into two steps. First you register all the sources and after you have all sources saved then you start registering proxies. This way you know that when you are saving the proxy it will already have all sources saved beforehand, making your job at that point only to link to existing sources on the Proxy entity.

Daniel Pereira
  • 2,720
  • 2
  • 28
  • 40