First Question: Scalability
In the context of distributed systems, what exactly is meant by scalability? I'm guessing it is the ability of the system to be used by as many distributed devices as one wants without having to change the code. Is this notion correct?
Almost, but I would define it a little differently. A system is scalable if you can increase its performance and capacity by adding more resources. That means that if a system is scalable and your compute demand suddenly rises, you can scale out and meet that demand by adding more resources (typically more machines). Notice that being scalable and being performant are two very different things: a system can be very performant (handling its current load very quickly) without being scalable.
The canonical case of a scalable system is one where the compute load is split among the available machines, so that when you add more machines you can handle a proportional increase in load. This is often referred to as "linear scalability", and it is usually the most desirable type of scalability. It is often very hard to achieve, though, because when you add more resources (e.g. machines) you typically pay an overhead for communication between the machines, which makes the scaling sub-linear. If you are interested in more details about this, I would suggest that you read about Amdahl's Law (paper1, paper2).
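To make the sub-linear effect concrete, here is a minimal Python sketch of Amdahl's Law. Note that the law models a fraction of the work that cannot be parallelized at all, rather than communication overhead directly, so treat it as a related upper bound and not as a model of your system. The function name `amdahl_speedup` and the 90% parallel fraction are assumptions chosen purely for illustration.

```python
def amdahl_speedup(parallel_fraction, machines):
    """Upper bound on speedup according to Amdahl's Law.

    parallel_fraction: share of the work that can be spread over machines (0..1).
    machines: number of machines (or processors) working on the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if machines < 1:
        raise ValueError("machines must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how many machines exist;
    # only the parallel part shrinks as machines are added.
    return 1.0 / (serial_fraction + parallel_fraction / machines)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the work is parallelizable.
    for n in (1, 2, 4, 8, 16, 1024):
        print(n, "machines ->", round(amdahl_speedup(0.9, n), 2), "x speedup")
```

With a parallel fraction of 1.0 the function returns exactly the number of machines, which is the linear case. With anything less, the speedup flattens out and can never exceed 1 / (1 - parallel_fraction), however many machines you add.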
Second Question: Mutual Exclusion
I was told that ensuring mutual exclusion between processes differs depending on whether they execute on a single machine or in a distributed system. However, I don't see how. Can anyone explain what the differences are?
I would say that the end goal and the problem itself are the same for distributed and single-machine mutual exclusion, but the means of achieving it are very different. One of the main defining factors of a distributed system is that the processes do not share any memory (as they may reside on different machines), whereas on a single machine they do (although you can simulate shared memory by implementing the distributed shared memory abstraction). In a single-machine setting you typically implement mutual exclusion using shared memory and mutexes, whereas in a distributed setting you have to use message passing and deal with things such as delays, partial failures, and failure detection, which makes things a lot harder.
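The contrast can be sketched in Python. The first function uses a mutex in shared memory. The second simulates the message-passing style with a centralized coordinator that grants the lock by replying to request messages. This is only one of several possible distributed approaches, and it is picked here as an assumption for illustration. All names (`shared_memory_count`, `message_passing_count`, `coordinator`) are hypothetical, and the "nodes" are just threads talking over in-process queues, so the sketch deliberately ignores the delays and failures that make the real problem hard.

```python
import collections
import queue
import threading


def shared_memory_count(workers=4, iterations=1000):
    """Single-machine style: threads share memory, a mutex guards the counter."""
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(iterations):
            with lock:  # critical section protected by the mutex
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


def coordinator(inbox):
    """Grants the lock to one requester at a time, using messages only."""
    waiting = collections.deque()  # reply queues of requesters still waiting
    locked = False
    while True:
        kind, reply = inbox.get()
        if kind == "stop":
            return
        if kind == "acquire":
            if locked:
                waiting.append(reply)
            else:
                locked = True
                reply.put("granted")
        elif kind == "release":
            if waiting:
                waiting.popleft().put("granted")  # hand the lock over
            else:
                locked = False


def message_passing_count(workers=3, iterations=200):
    """Simulated distributed style: access is serialized by messages, not a mutex."""
    inbox = queue.Queue()
    resource = {"value": 0}  # stands in for some remote shared resource

    def client():
        reply = queue.Queue()  # this client's private "network address"
        for _ in range(iterations):
            inbox.put(("acquire", reply))
            reply.get()  # block until the coordinator grants the lock
            resource["value"] += 1
            inbox.put(("release", None))

    coord = threading.Thread(target=coordinator, args=(inbox,))
    coord.start()
    clients = [threading.Thread(target=client) for _ in range(workers)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    inbox.put(("stop", None))
    coord.join()
    return resource["value"]
```

In the simulation the coordinator never crashes and no message is lost. In a real distributed system, a client that fails while holding the lock, or a coordinator that becomes unreachable, would block everyone else, which is why failure detection and timeouts become part of the problem.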