
Most examples of RxJava I see have to do with network calls. I am new to the framework, so I am wondering whether it also makes sense for something like parallel file parsing. I have a directory of files whose data I need to parse and insert into SQL tables. Can I do this with RxJava? I would like it to be as multithreaded as possible for efficiency.

Description of Data

My data has a hierarchical structure that starts with a number of Sections. Each Section contains one or more Subsections. Each Subsection contains one or more HTML files.

SQL Tables

sqlite> SELECT * FROM sections;
_id         ordinal     title     
----------  ----------  ----------
1           1           Management
2           2           Emergency Preparedness 

-- has a foreign key that references the sections table

sqlite> SELECT * FROM subsections;
_id         ordinal     chapter_id  title     
----------  ----------  ----------  ----------
1           A           1           General   
2           B           1           Resources

-- has foreign keys that reference both the sections and subsections tables

sqlite> SELECT * FROM html;
_id         chapter_id  subsection_id  number      html_filename
----------  ----------  -------------  ----------  -------------
1           1           1              1           /1a-1.html
2           1           1              2           /1a-2.html
3           1           1              3           /1a-3.html
4           1           1              4           /1a-4.html
5           1           2              1           /1b-1.html
6           2           2              1           /2a-1.html
7           2           2              2           /2a-2.html
8           2           2              1           /2b-1.html

The _id field is an auto-incrementing primary key (it will not always match the ordinal). Each subsection row depends on receiving the primary key of its parent section. This means that once Section 1 has been inserted, Subsections 1a, 1b, 1c, etc. can be inserted (but not 2a).
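To make the ordering constraint concrete, here is a minimal sketch of loading one section. It assumes a hypothetical `FakeDb` whose `insert` returns the new auto-incremented `_id` (Android's `SQLiteDatabase.insert` likewise returns the row ID of the newly inserted row). `HierarchyLoader`, `loadSection`, the string row format and the fixed number of files per subsection are illustrative assumptions only. The point is that inside one section the inserts must run in order, because each child row needs its parent's `_id`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the real database: insert() records the row
// and returns a new auto-incremented _id, counted separately per table.
class FakeDb {
    private final Map<String, Long> lastIds = new HashMap<>();
    final List<String> log = new ArrayList<>();

    long insert(String table, String row) {
        long id = lastIds.merge(table, 1L, Long::sum);
        log.add(table + "#" + id + " " + row);
        return id;
    }
}

class HierarchyLoader {
    // Loads one section in dependency order: the section row first, then
    // each subsection (which needs the section's _id), then that
    // subsection's HTML rows (which need both parent _ids).
    // Assumption for brevity: every subsection has the same number of files.
    static void loadSection(FakeDb db, int ordinal, String title,
                            List<String> subsectionLetters, int filesPerSubsection) {
        long sectionId = db.insert("sections", "ordinal=" + ordinal + " title=" + title);
        for (String letter : subsectionLetters) {
            long subsectionId = db.insert("subsections",
                    "ordinal=" + letter + " chapter_id=" + sectionId);
            for (int n = 1; n <= filesPerSubsection; n++) {
                db.insert("html", "chapter_id=" + sectionId
                        + " subsection_id=" + subsectionId
                        + " number=" + n
                        + " html_filename=/" + ordinal + letter.toLowerCase() + "-" + n + ".html");
            }
        }
    }
}
```

Nothing in one section's chain depends on another section, which is what makes a whole section the natural unit of parallel work.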

Directory Structure

      //Section 1
/1.title
      //Subsection A
/1a.title
      //html files for 1a
/1a-1.html
/1a-2.html
      //Subsection B
/1b.title
      //html files for 1b
/1b-1.html
/1b-2.html
      //Section 2
/2.title
/2a.title
      //..etc
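Because every file name encodes its position in the hierarchy, the flat directory can be bucketed by section before any parsing starts. A minimal sketch, assuming the names follow exactly the pattern shown above (a section number, an optional lowercase subsection letter, an optional `-N` file number, and a `.title` or `.html` extension); `DirectoryIndex` and its methods are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class DirectoryIndex {
    // Matches names like "/1.title" (section), "/1a.title" (subsection)
    // and "/1a-1.html" (HTML file). Group 1 = section number,
    // group 2 = subsection letter (empty for a section title),
    // group 3 = HTML file number (null for .title files), group 4 = extension.
    static final Pattern NAME =
            Pattern.compile("/?(\\d+)([a-z]?)(?:-(\\d+))?\\.(title|html)");

    // Returns the section number encoded in a file name such as "/1b-2.html".
    static int sectionOf(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected file name: " + fileName);
        }
        return Integer.parseInt(m.group(1));
    }

    // Buckets the flat listing by section number (sorted), keeping the
    // original order of names within each bucket.
    static Map<Integer, List<String>> groupBySection(List<String> fileNames) {
        return fileNames.stream().collect(Collectors.groupingBy(
                DirectoryIndex::sectionOf, TreeMap::new, Collectors.toList()));
    }
}
```

Each resulting bucket holds everything needed to load one section, so buckets can be handed to separate workers.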

Each SQL insert can be built with a Java builder class; for /1b-2.html it would look like this:

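// section1 and subsectionB hold the _id values returned when the parent
// section and subsection rows were inserted
db.insert(HTML_TABLE, null, new HTML.Builder()
                .chapterId(section1)
                .letterId(subsectionB)
                .number(2)
                .build());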

I will end up with about 50-60 sections. Each section, together with its subsections and their HTML files, is independent of every other section, so whole sections could be inserted in parallel. Does using RxJava make sense for something like this?
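One way to get that per-section parallelism without RxJava is a plain thread pool from `java.util.concurrent`, with one task per section. This is a sketch under assumptions, not the asker's code: `ParallelSections`, `processAll` and the `processSection` callback are hypothetical, and the callback is assumed to parse one section and perform its inserts in parent-to-child order. Note that SQLite allows only one writer at a time, so the inserts themselves will still be serialized; the concurrency mainly speeds up reading and parsing the files.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

class ParallelSections {
    // Runs processSection once per section on a fixed-size pool and returns
    // the results in the same order as the input list. Work within a
    // section stays sequential inside processSection; only independent
    // sections run concurrently.
    static <S, R> List<R> processAll(List<S> sections,
                                     Function<S, R> processSection,
                                     int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every section first so they can all start running.
            List<CompletableFuture<R>> futures = sections.stream()
                    .map(s -> CompletableFuture.supplyAsync(() -> processSection.apply(s), pool))
                    .collect(Collectors.toList());
            // Then wait for each one; join() rethrows a task failure as an
            // unchecked CompletionException.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }
}
```

All futures are collected into a list before `join()` is called. Mapping straight to `join()` in the same lazy stream would wait for each section to finish before submitting the next one, which would make the work sequential again.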

ZakTaccardi

1 Answer


Rx (on any platform) is not well suited for most forms of parallel processing. It deals with streams, which are inherently serialized. It sounds like you are looking for some kind of ETL tool.

James World