First, let's sum up what the fundamental problem of threading really is: two threads try to access the same piece of memory at the same time. When this happens, we can't guarantee that the piece of memory is in a valid state, and our program might be incorrect.
Keeping this very high level: part of the way processors work is by raising interrupts, which basically tell a thread to stop what it is doing and do something else. This is where much of the problem of threading lies. A thread can be interrupted in the middle of a task. Imagine one thread is interrupted in the middle of an operation, leaving behind some intermediate garbage value because the thread hasn't finished its task. Another thread could come along, read this value, and destroy the correctness of your program.
The OS, and the hardware underneath it, deal with this problem using atomic instructions. Without getting into the details, imagine that there were some instructions that were guaranteed to either be completed or not completed, with nothing in between. This means that if a thread checks the result of such an instruction, it won't see an intermediate result. So an atomic add would show either the value before the add or the value after the add, but never the value during the add, when there might be some intermediate state.
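To make the atomic add idea concrete, here is a minimal Java sketch. It assumes Java's java.util.concurrent.atomic.AtomicInteger as a stand-in for the atomic add described above; the class and method names (AtomicAddDemo, countWithAtomicAdd) are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicAddDemo {
    // Hypothetical helper: starts `threads` threads that each add 1 to a
    // shared counter `perThread` times, then returns the final value.
    static int countWithAtomicAdd(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    // Atomic read-modify-write: other threads see the value
                    // before or after this add, never a half-finished one.
                    counter.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // wait for every worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithAtomicAdd(4, 100_000)); // prints 400000
    }
}
```

If counter were a plain int incremented with counter++, two threads could read the same old value and one of the adds could be lost, so the total could come out lower than expected.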
Now, if you have a few atomic instructions, you might be able to imagine that you could build higher-level abstractions that deal with threads and thread safety on the back of these. Maybe the most basic example is a lock created with the test-and-set primitive. Take a look at this Wikipedia article: https://en.wikipedia.org/wiki/Test-and-set. Now, that was probably a lot, because these things get pretty complex, but I will attempt to give an example that clarifies. If you have two processes running that are trying to access some section of code, a very naive solution would be to create a lock variable
boolean isLocked = false;
Any time a process tries to acquire this lock, it could simply check isLocked and wait until isLocked == false before executing some code. For example...
while(isLocked){
//wait for isLocked == false
}
isLocked = true;
// execute the code you want to be locked
isLocked = false;
Of course, we know that something as simple as reading and then setting a boolean can be interrupted partway through and cause threading mayhem: two threads can both see isLocked == false before either one sets it to true, and then both enter the locked code. So the good folks who developed kernels, processors, and hardware created an atomic test-and-set operation, which returns the old value of a boolean and sets the new value to true. You can then implement our lock above by doing something like this:
while(testAndSet(isLocked)){
//wait until the old value returned is false, meaning the lock was unlocked
}
//do some critical stuff
//unlock after you did the critical stuff
isLocked = false;
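For a runnable version of the same idea, here is a rough Java sketch. It assumes that AtomicBoolean.getAndSet(true) can play the role of testAndSet, since it atomically stores true and returns the old value. The names SpinLockDemo, SpinLock, and countWithSpinLock are hypothetical, and Thread.onSpinWait() needs Java 9 or later. This is only meant to illustrate the principle; real code would normally use something like java.util.concurrent.locks.ReentrantLock.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLockDemo {
    // Hypothetical minimal spin lock built on an atomic test-and-set.
    static final class SpinLock {
        private final AtomicBoolean isLocked = new AtomicBoolean(false);

        void lock() {
            // getAndSet(true) returns the OLD value. If it was true, someone
            // else holds the lock, so keep spinning until we read false.
            while (isLocked.getAndSet(true)) {
                Thread.onSpinWait(); // hint to the CPU that we are busy-waiting
            }
        }

        void unlock() {
            isLocked.set(false);
        }
    }

    // Increments a plain (non-atomic) counter from several threads,
    // protecting each increment with the spin lock.
    static int countWithSpinLock(int threads, int perThread) {
        SpinLock lock = new SpinLock();
        int[] counter = {0}; // deliberately not atomic
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter[0]++; // the critical section
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(countWithSpinLock(4, 50_000)); // prints 200000
    }
}
```

The counter itself is an ordinary int, so the only thing keeping the increments from stepping on each other here is the lock built on the atomic operation.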
I only show the implementation of a basic lock above to make the point that it is possible to build higher-level abstractions on atomic instructions. Atomic instructions are about as low level as you can get conceptually, in my opinion, without delving into hardware specifics. You can imagine, though, that the hardware must somehow set a flag of some sort when memory is being accessed that precludes another thread from accessing the same memory.
Hope that helps!