For Linux on x86 platforms, the assembly equivalent of asking for memory is basically a call into the kernel using int 0x80
with some parameters for the call set into some registers. The interrupt is set at boot by the OS to be able to answer for the request. It is set in the IDT.
An IDT descriptor for 32 bits systems looks like:
struct IDTDescr {
uint16_t offset_1; // offset bits 0..15
uint16_t selector; // a code segment selector in GDT or LDT
uint8_t zero; // unused, set to 0
uint8_t type_attr; // type and attributes, see below
uint16_t offset_2; // offset bits 16..31
};
The offset is the address of the entry point of the handler for that interrupt. So interrupt 0x80 has an entry in the IDT. This entry points to an address for the handler(also called ISR). When you call malloc(), the compiler will compile this code to a system call. The system call returns in some register the address of the allocated memory. I'm pretty sure as well that this system call will actually use the sysenter x86 instruction to switch into kernel mode. This instruction is used alongside an MSR register to securely jump into kernel mode from user mode at the address specified in the MSR (Model Specific Register).
Once in kernel mode, all instructions can be executed and access to all hardware is unlocked. To provide with the request the OS doesn't "ask RAM for memory". RAM isn't aware of what memory the OS uses. RAM just blindly answers to asserted pins on it's DIMM and stores information. The OS just checks at boot using the ACPI tables that were built by the BIOS to determine how much RAM there is and what are the different devices that are connected to the computer to avoid writing to some MMIO (Memory Mapped IO). Once the OS knows how much RAM is available (and what parts are usable), it will use algorithms to determine what parts of available RAM every process should get.
When you compile C code, the compiler (and linker) will determine the address of everything right at compilation time. When you launch that executable the OS is aware of all memory the process will use. So it will set up the page tables for that process accordingly. When you ask for memory dynamically using malloc(), the OS determines what part of physical memory your process should get and changes (during runtime) the page tables accordingly.
As to paging itself, you can always read some articles. A short version is the 32 bits paging. In 32 bits paging you have a CR3 register for each CPU core. This register contains the physical address of the bottom of the Page Global Directory. The PGD contains the physical addresses of the bottom of several Page Tables which themselves contain the physical addresses of the bottom of several physical pages (https://wiki.osdev.org/Paging). A virtual address is split into 3 parts. The 12 bits to the right (LSB) are the offset in the physical page. The 10 bits in the middle are the offset in the page table and the 10 MSB are the offset in the PGD.
So when you write
char* p = (char*)malloc(4096);
p[0] = 'a';
p[1] = 'b';
you create a pointer of type char* and making a system call to ask for 4096 bytes of memory. The OS puts the first address of that chunk of memory into a certain conventional register (which depends on the system and OS). You should not forget that the C language is just a convention. It is up to the operating system to implement that convention by writing a compatible compiler. It means that the compiler knows what register and what interrupt number to use (for the system call) because it was specifically written for that OS. The compiler will thus take the address stored into this certain register and store it into this pointer of type char* during runtime. On the second line you are telling the compiler that you want to take the char at the first address and make it an 'a'. On the third line you make the second char a 'b'. In the end, you could write an equivalent:
char* p = (char*)malloc(4096);
*p = 'a';
*(p + 1) = 'b';
The p is a variable containing an address. The + operation on a pointer increments this address by the size of what is stored in that pointer. In this case, the pointer points to a char so the + operation increments the pointer by one char (one byte). If it was pointing to an int then it would be incremented of 4 bytes (32 bits). The size of the actual pointer depends on the system. If you have a 32 bits system then the pointer is 32 bits wide (because it contains an address). On a 64 bits system the pointer is 64 bits wide. A static memory equivalent of what you did is
char p[4096];
p[0] = 'a';
p[1] = 'b';
Now the compiler will know at compile time what memory this table will get. It is static memory. Even then, p represents a pointer to the first char of that array. It means you could write
char p[4096];
*p = 'a';
*(p + 1) = 'b';
It would have the same result.