First off, I recommend not thinking of asymptotic bounds as part of computer science; they are a mathematical tool. Do not associate "Big O" strictly with "worst case" or the like. Big O gives an asymptotic upper bound. That's it. It happens to be useful in computer science for describing the worst-case running time of an algorithm, but it is mathematics, not computer science, that describes how it works.
But that's just my opinion, mind you.
Take this definition of Big O notation. (It is the first formal definition I learned.)
From Introduction to Algorithms, 3rd edition (I'm still learning about algorithms myself from that book), page 47:
For a given function g(n), we denote by O(g(n)) [...] the set of functions

    O(g(n)) = { f(n) : there exist positive constants c and n₀
                such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n₀ }.
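This definition can be spot-checked numerically. Below is a minimal Python sketch (the helper name `is_bounded` and the cutoff `n_max` are my own, not from the book): a finite check cannot prove the condition for all n, but it can falsify a bad choice of witnesses c and n₀.

```python
def is_bounded(f, g, c, n0, n_max=10_000):
    """Spot-check the Big O condition 0 <= f(n) <= c*g(n)
    for every integer n in [n0, n_max]."""
    return all(0 <= f(n) <= c * g(n) for n in range(n0, n_max + 1))

# f(n) = n is in O(n): the witnesses c = 1, n0 = 1 work
print(is_bounded(lambda n: n, lambda n: n, c=1, n0=1))        # True
# f(n) = n^3 is NOT in O(n^2): c = 5 only covers n <= 5
print(is_bounded(lambda n: n**3, lambda n: n**2, c=5, n0=1))  # False
```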
Observe that Big O notation denotes a set, and as such it can be a subset of some superset and a superset of other subsets. Writing "n = O(n)" is just a formally incorrect way of saying that the function f(n) = n is a member of the set O(n), i.e. n ∈ O(n).
Abusing Big O notation like this (i.e., placing it in equations) enables us to write things like T(n) = n/2 + O(n), which basically means that T(n) equals n/2 plus some function for which the definition of Big O notation given above holds true.
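To make that abuse concrete, here is a small sketch with a made-up running time (my own example, not from the book): suppose T(n) = n/2 + 3n + 7. The remainder r(n) = 3n + 7 satisfies the definition with, for example, c = 4 and n₀ = 7, so we may write T(n) = n/2 + O(n).

```python
# Hypothetical running time: T(n) = n/2 + r(n) with r(n) = 3n + 7.
# r is in O(n): 3n + 7 <= 4n exactly when n >= 7, so c = 4, n0 = 7 work.
for n in range(7, 10_001):
    assert 0 <= 3 * n + 7 <= 4 * n
print("r(n) = 3n + 7 is in O(n) with c = 4, n0 = 7")
```

Note that the inequality fails for n < 7 (e.g. 3·6 + 7 = 25 > 24 = 4·6), which is exactly why the definition only demands it hold from some n₀ onward.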
So, you wonder why something like n² = O(n³) holds? Let's show it using the formal definition:

g(n) = n³
f(n) = n²
0 ≤ n² ≤ cn³ for all n ≥ n₀
Intuitively, we can easily see that there must be some c and n₀ (for instance, c = 1 and n₀ = 1) making this inequality true, because a cubic function grows asymptotically faster than a quadratic one.
Do the same for g(n) = n² and you'll see that n² = O(n²) as well (again with c = 1 and n₀ = 1).
So, n² = O(n³) and n² = O(n²)? This is possible because O(n³) and O(n²) are not disjoint; O(n²) is a subset of O(n³), because every quadratic function can be bounded from above by a cubic function.
As you can see, asymptotic bounds need not be tight; n = O(n^65536) is true. However, tight bounds are obviously preferred.