From my point of view, you have three different alternatives. To be honest, I'd personally choose the third one.
1 - One [consumer-producer] thread
In this scenario, you just have one thread that is responsible for:
1-Reading from Kafka
2-Process/Store in I
3-Process/Store in II
4-Process/Store in III
5-Process/Store in IV
All that, in sequential order, as you just have one thread that both consumes and processes the messages.
kafka-->(read)-->(process 1)-->(process 2)-->(process 3)-->(process 4)
In this case, if any of steps 2 to 5 gets "damaged" and its processing speed drops at some point, your entire process will slow down. And with it, the Kafka topic's lag will increase, as long as the thread doesn't finish step 5 before a new message arrives in Kafka.
For me, this is a no-no in terms of performance and fault tolerance.
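As a rough sketch of this first approach (assuming, for illustration only, a String-valued topic called "events", a broker on localhost:9092 and placeholder storeInI..storeInIV methods for the four tables), everything happens on the single polling thread:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class SingleThreadPipeline {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "single-thread-pipeline");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Steps 2-5 run sequentially on the same thread that polls:
                    // if any of them slows down, polling slows down and lag grows.
                    storeInI(record.value());
                    storeInII(record.value());
                    storeInIII(record.value());
                    storeInIV(record.value());
                }
            }
        }
    }

    // Placeholders for the four storage steps (tables I to IV).
    private static void storeInI(String msg)   { /* write to table I */ }
    private static void storeInII(String msg)  { /* write to table II */ }
    private static void storeInIII(String msg) { /* write to table III */ }
    private static void storeInIV(String msg)  { /* write to table IV */ }
}
```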
2 - Four [consumer-producer]s
This uses the same paradigm as the first scenario: the thread that reads is also responsible for the processing.
But thanks to consumer groups, you can parallelize the whole process: create 4 different groups and assign each one to a consumer.
For simplicity, let's just create one thread per consumer group.
In this scenario, you have something like:
CONSUMER CG1
1-Reading from Kafka
2-Process/Store in I
CONSUMER CG2
1-Reading from Kafka
2-Process/Store in II
CONSUMER CG3
1-Reading from Kafka
2-Process/Store in III
CONSUMER CG4
1-Reading from Kafka
2-Process/Store in IV
     |-->consumer 1-->(process1)-->T1
kafka|-->consumer 2-->(process2)-->T2
     |-->consumer 3-->(process3)-->T3
     |-->consumer 4-->(process4)-->T4
Advantages: each thread is responsible for a limited number of tasks. This will help keep each consumer group's lag down.
Furthermore, if one of the storing tasks fails or its performance degrades, that won't affect the other three threads: they will continue reading and processing from Kafka on their own.
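A rough sketch of this second approach, under the same assumptions (hypothetical "events" topic, local broker, placeholder store methods). Each consumer runs in its own thread with its own group.id, so every group gets its own copy of the stream and its own lag:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.function.Consumer;

public class FourConsumerGroups {

    public static void main(String[] args) {
        // One thread per consumer group; each one reads the topic and writes to a single table.
        startGroup("cg-table-1", FourConsumerGroups::storeInI);
        startGroup("cg-table-2", FourConsumerGroups::storeInII);
        startGroup("cg-table-3", FourConsumerGroups::storeInIII);
        startGroup("cg-table-4", FourConsumerGroups::storeInIV);
    }

    private static void startGroup(String groupId, Consumer<String> store) {
        new Thread(() -> {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            // A different group.id per consumer means each group receives the full stream.
            props.put("group.id", groupId);
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("events"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        store.accept(record.value()); // only this group's table is touched here
                    }
                }
            }
        }, groupId).start();
    }

    // Placeholders for the four storage steps (tables I to IV).
    private static void storeInI(String msg)   { /* write to table I */ }
    private static void storeInII(String msg)  { /* write to table II */ }
    private static void storeInIII(String msg) { /* write to table III */ }
    private static void storeInIV(String msg)  { /* write to table IV */ }
}
```

Note that each KafkaConsumer instance stays confined to its own thread, since KafkaConsumer is not thread-safe.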
3 - Decouple consuming and processing
This is, by far and in my opinion, the best possible solution.
You separate the task of reading from the tasks of processing. This way, you can, for example, launch:
One consumer thread
This just reads the messages from Kafka and puts them in in-memory queues, or similar structures that are accessible from the worker threads, and that's all. It just keeps reading and placing the messages in the queues.
X worker threads (in this case, 4)
These threads are responsible for getting the messages that the consumer puts in the queue (or queues, depending on how you want to code it), and for processing/storing them in each table.
Something like:
                          |--> queue1 -----> worker 1 --> T1
kafka--->consumer--(msg)--|--> queue2 -----> worker 2 --> T2
                          |--> queue3 -----> worker 3 --> T3
                          |--> queue4 -----> worker 4 --> T4
What you get here is parallelization and the decoupling of processing from consuming. Here, Kafka's lag will be, 99% of the time, 0.
In this approach, the queues are the ones that act as buffers if some of the workers get stuck. The rest of the system (mainly Kafka) will not be affected by the processing logic.
Note that even though Kafka won't start lagging and possibly losing messages due to retention, the internal queues must be monitored, or configured properly to send stuck messages to a dead-letter queue, in order to avoid the consumer itself getting stuck.
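A rough sketch of this third approach, under the same assumptions as before (hypothetical "events" topic, local broker, placeholder store methods), plus a hypothetical "events-dlq" topic used when a queue stays full because its worker is stuck:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class DecoupledPipeline {

    public static void main(String[] args) throws InterruptedException {
        // Bounded in-memory queues: one per worker/table, acting as buffers.
        List<BlockingQueue<String>> queues = List.of(
                new ArrayBlockingQueue<String>(10_000),
                new ArrayBlockingQueue<String>(10_000),
                new ArrayBlockingQueue<String>(10_000),
                new ArrayBlockingQueue<String>(10_000));

        // Worker threads: independent of each other and of the consumer thread.
        startWorker(queues.get(0), DecoupledPipeline::storeInI);
        startWorker(queues.get(1), DecoupledPipeline::storeInII);
        startWorker(queues.get(2), DecoupledPipeline::storeInIII);
        startWorker(queues.get(3), DecoupledPipeline::storeInIV);

        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "decoupled-pipeline");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> dlqProducer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("events"));
            // Consumer thread: only polls and hands messages over to the queues.
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    for (BlockingQueue<String> queue : queues) {
                        // Bounded wait so a stuck worker can't block the consumer forever;
                        // overflow goes to a dead-letter topic instead.
                        if (!queue.offer(record.value(), 100, TimeUnit.MILLISECONDS)) {
                            dlqProducer.send(new ProducerRecord<>("events-dlq", record.value()));
                        }
                    }
                }
            }
        }
    }

    private static void startWorker(BlockingQueue<String> queue, Consumer<String> store) {
        new Thread(() -> {
            try {
                while (true) {
                    store.accept(queue.take()); // blocks until a message is available
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
    }

    // Placeholders for the four storage steps (tables I to IV).
    private static void storeInI(String msg)   { /* write to table I */ }
    private static void storeInII(String msg)  { /* write to table II */ }
    private static void storeInIII(String msg) { /* write to table III */ }
    private static void storeInIV(String msg)  { /* write to table IV */ }
}
```

Keep in mind this sketch relies on auto-commit, so offsets may be committed before the workers have actually stored the messages; if you need stronger delivery guarantees, you'd have to commit manually once the workers are done.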
The pros and cons of each paradigm are explained in more detail in the KafkaConsumer javadoc.
To sum up the advantages of the third scenario:
The consumer thread just consumes. This avoids Kafka lagging, delays in the data that must be processed (remember, this should be near real-time), and loss of messages because of retention kicking in.
The other X worker threads are responsible for the actual processing logic. If something fails in one of them, no other consumer or worker thread gets affected.