
I run a B2B2C business, and a few suppliers regularly update my data. Every week I need to import around 30 GB of JSON (a single file). I would like to know a faster way than reading it line by line with `BufferedReader`; the file is Zstandard (zstd) compressed. Right now it inserts about 500 MB every 4 hours. Does it make sense that it is this slow? The web app and database are deployed on a Unix server (Apache Tomcat 9.0.6).

My code:

    try {
        BufferedReader br = new BufferedReader(new FileReader("/opt/tomcat/" + fileName));

        ObjectMapper om = new ObjectMapper();

        String line = br.readLine();
        while (line != null) {
            // Parse each line once, then check and insert row by row.
            Data data = om.readValue(line, Data.class);
            if (!rootRepository.existsByAddress(data.getAddress())) {
                rootRepository.save(data);
            }

            line = br.readLine();
        }

        br.close();

    } catch (IOException e) { // readLine() and readValue() both throw IOException
        e.printStackTrace();
    }
    return "Completed";
}
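
Since the file is Zstandard-compressed, one option is to decompress it as a stream instead of reading a pre-extracted copy with `FileReader`. Below is a minimal sketch, assuming the zstd-jni library (`com.github.luben:zstd-jni`) is on the classpath; the path and class name are placeholders, not the original code:

    import com.github.luben.zstd.ZstdInputStream;

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    public class ZstdLineReader {

        public static void main(String[] args) throws IOException {
            String path = "/opt/tomcat/data.json.zst"; // placeholder path

            // Decompress on the fly; only one decompressed line is held in memory at a time.
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(
                            new ZstdInputStream(new FileInputStream(path)),
                            StandardCharsets.UTF_8))) {
                String line;
                while ((line = br.readLine()) != null) {
                    // parse and persist each JSON line here
                }
            }
        }
    }
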
  • Have you tried batching multiple lines together? – g00se Aug 09 '21 at 10:49
  • Every line is a different object; I cannot batch them. – wolfizon contact Aug 09 '21 at 12:49
  • I would guess that your bottleneck is in the two database calls (`existsByAddress` and `save`). [g00se's comment](https://stackoverflow.com/questions/68710608/read-line-by-line-and-insert-data-16-gb-json-too-slow-mysql-java-spring-b?noredirect=1#comment121430169_68710608) suggests batching the inserts like in [this tutorial](https://www.baeldung.com/jpa-hibernate-batch-insert-update). – Piotr P. Karwasz Aug 09 '21 at 13:15
  • Collect 100 objects, then batch insert all 100 in a single `INSERT`. – Rick James Aug 09 '21 at 15:01
  • OK, the batch is working well, but it is still very slow after 2 GB... – wolfizon contact Aug 12 '21 at 05:18
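
Following the batching suggestion in the comments above, here is a minimal sketch of collecting parsed rows into chunks and persisting them with `saveAll`. It assumes the question's Spring Data JPA repository (`rootRepository` with `existsByAddress`) and `Data` entity, and that JDBC batching is enabled (e.g. `spring.jpa.properties.hibernate.jdbc.batch_size`); `BATCH_SIZE` and the class name are illustrative:

    import com.fasterxml.jackson.databind.ObjectMapper;

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    // Uses the question's repository (existsByAddress / saveAll) and Data entity.
    public class BatchedImport {

        private static final int BATCH_SIZE = 1000; // illustrative; tune against hibernate.jdbc.batch_size

        public static String importFile(BufferedReader br, RootRepository rootRepository) throws IOException {
            ObjectMapper om = new ObjectMapper();
            List<Data> batch = new ArrayList<>(BATCH_SIZE);

            String line;
            while ((line = br.readLine()) != null) {
                Data row = om.readValue(line, Data.class);
                // One SELECT per line is still costly; a unique index on address
                // would let the database reject duplicates instead.
                if (!rootRepository.existsByAddress(row.getAddress())) {
                    batch.add(row);
                }
                if (batch.size() >= BATCH_SIZE) {
                    rootRepository.saveAll(batch); // one batched flush instead of one INSERT per row
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                rootRepository.saveAll(batch);
            }
            return "Completed";
        }
    }

With MySQL, adding `rewriteBatchedStatements=true` to the JDBC URL lets the driver collapse a batch into multi-row INSERT statements; the per-line `existsByAddress` check still issues one SELECT per row, which is likely the remaining bottleneck.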

0 Answers