You cannot avoid the latency between threads when using an SC_CTHREAD. When writing to an sc_signal from one CTHREAD, the value change will only be visible to another CTHREAD at the next clock edge.
If you must use a CTHREAD (i.e. using high-level synthesis), then the only way to avoid the cross-thread latency is to place both functionalities within a single CTHREAD.
If you only need a behavioral model for simulation, then you could use SC_THREADs and sc_events. One thread can generate an sc_event that is being waited on by the second thread. When the second thread wakes on that event, it can observe sc_signal changes done by the first thread, and then produce an output (aligned with the clock edge if desired). Using sc_events gives the opportunity to sample and update signals "between" clock edges.