Short answer: This is a convention based solely on the width of the underlying data bus.
An n-bit program is a program optimized for an n-bit CPU. In other words, a 64-bit program is a binary compiled for a 64-bit CPU. A 64-bit CPU, in turn, is one taking advantage of a 64-bit data bus for the exchange of data between CPU and memory.
It's as simple as that, but you can read more below.
The definition actually comes down to understanding what a 32/64-bit CPU is, indirectly what a 32/64-bit operating system is, and how compilers optimize binaries for a given architecture.
Optimization here encompasses the format of the binary itself. 32-bit and 64-bit binaries for a given OS, e.g. Windows, have different formats. However, a given 64-bit OS, e.g. 64-bit Windows, is able to read and launch a 32-bit binary file written for the 32-bit version of the OS and a 32-bit-wide data bus.
32/64 bit CPU, first definition
The CPU can store/recall a certain quantity of data in memory in a single instruction. A 32-bit CPU can transfer 4 bytes (32 bits) at once and a 64-bit CPU can transfer 8 bytes (64 bits) at once. So the "32/64 bit" prefix comes from the quantity of RAM transferred in a single read/write cycle.
This quantity impacts the execution time: the fewer transfer cycles required, the less the CPU waits for the memory, and the faster the program executes. It's like carrying a large quantity of water with a small bucket versus a large one.
The size of the bucket (the number of bits used for data transfer) indicates how efficient the architecture is; hence, on the same CPU, a 32-bit application is generally less efficient than a 64-bit application.
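As a minimal sketch of the bucket analogy (assuming each element access becomes one memory transfer; real CPUs and caches blur this considerably), copying the same 1 KiB buffer takes half as many transfers with 8-byte buckets as with 4-byte buckets:

```c
#include <stddef.h>
#include <stdint.h>

#define SIZE 1024u  /* 1 KiB to copy */

/* 4-byte bucket: 1024 / 4 = 256 element transfers */
void copy32(uint32_t *dst, const uint32_t *src)
{
    for (size_t i = 0; i < SIZE / sizeof(uint32_t); i++)
        dst[i] = src[i];
}

/* 8-byte bucket: 1024 / 8 = 128 element transfers */
void copy64(uint64_t *dst, const uint64_t *src)
{
    for (size_t i = 0; i < SIZE / sizeof(uint64_t); i++)
        dst[i] = src[i];
}
```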
32/64 bit CPU, technical definition
Obviously, the RAM and the CPU must both be able to manage a 32/64-bit data transfer, which in turn determines the number of wires used to connect the CPU to the RAM (the system bus). 32/64 bit is actually the number of wires/tracks composing the data bus (usually called the bus "width").

(Wikipedia: System bus - The data bus width determines the prefix 32/64 bit for a CPU, a program, an OS, ...)
(Another bus is the address bus, which is usually wider, but the address bus width is irrelevant when naming a CPU as a 32- or 64-bit CPU. The address bus width determines the total quantity of RAM which can be reached / "addressed" by the CPU; for example, 32 address lines can address 2^32 bytes = 4 GiB. As for the control bus, it is a small bus used to synchronize everything connected to the data bus; in particular, it indicates when the data bus is stable and ready to be sampled in a data transfer operation.)
When bits are transferred between the CPU and the RAM, the voltage on the different copper tracks of the data bus must be stable prior to reading the data off the bus, or else one or more bit values will be wrong. It takes less time to stabilize 8 bits than 64 bits, so widening the data bus is not without problems to solve.
32/64 bit program: A compiler matter
Programs don't always need to transfer 4 bytes (32-bit data bus) or 8 bytes (64-bit data bus), so they use different instructions to read 1 byte, 2 bytes, 4 bytes, and 8 bytes, for performance reasons.
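As an illustration (a sketch; the exact instructions depend on the compiler, target and optimization level), each access width below gets its own load instruction; on x86-64 these typically map to variants such as movzbl, movzwl, movl and movq:

```c
#include <stdint.h>

uint8_t  read8 (const uint8_t  *p) { return *p; }  /* 1-byte load */
uint16_t read16(const uint16_t *p) { return *p; }  /* 2-byte load */
uint32_t read32(const uint32_t *p) { return *p; }  /* 4-byte load */
uint64_t read64(const uint64_t *p) { return *p; }  /* 8-byte load (a single instruction only on a 64-bit target) */
```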
Binaries (native machine-language programs) are written with either the 32-bit or the 64-bit architecture, and its associated instruction set, in mind. Hence the name 32/64-bit program.
The choice of the target architecture is a matter of the compiler/compiler options used when converting the source program into a binary. Most compilers are able to produce a 32-bit or a 64-bit binary from the same source program. That's why you'll find both versions of an application when downloading your preferred program or tool.
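For example, with GCC or Clang on an x86-64 Linux machine (assuming the 32-bit multilib packages are installed so that -m32 works), the same source can be compiled into either flavor, and the pointer size tells you which one you got; the file name wordsize.c is just for illustration:

```c
/* wordsize.c
 *
 * Build both flavors (GCC/Clang syntax; -m32 requires 32-bit libraries):
 *   gcc -m64 wordsize.c -o wordsize64
 *   gcc -m32 wordsize.c -o wordsize32
 */
#include <stdio.h>

int main(void)
{
    /* A pointer has the width of the target architecture's addresses:
       8 bytes in the 64-bit build, 4 bytes in the 32-bit build. */
    printf("This is a %zu-bit program\n", sizeof(void *) * 8);
    return 0;
}
```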
However, most programs rely on ready-made libraries written by other programmers (e.g. a video editing program may use FFmpeg library). To produce a fully 64-bit application, the compiler (actually the link editor, but let's keep it simple) needs to access a 64-bit version of any library used, which may not be possible.
This also applies to operating systems themselves, as an OS is just a suite of individual programs and libraries. However, an OS is itself a kind of big library for the user programs, acting as a gateway between the computer hardware and the user programs, for efficiency and security reasons. The way the OS is written can prevent user programs from accessing the full potential of the underlying CPU architecture.
32-bit program compatibility with 64-bit CPU
A 64-bit operating system is able to run a 32-bit binary on a 64-bit architecture, as the 64-bit CPU instruction set is backward-compatible. However, some adjustments are required.
In addition to the data bus width and the read/write instruction subset, there are many other differences between 32-bit and 64-bit CPUs (register operations, memory caches, data alignment/boundaries, timing, ...).
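As a small, Windows-specific sketch of those adjustments, a 32-bit program can ask the OS whether it is actually running through the WOW64 compatibility layer of a 64-bit Windows:

```c
/* Compile as a 32-bit Windows program, e.g. with MinGW:
 *   i686-w64-mingw32-gcc wow64check.c -o wow64check.exe
 * (the file name is just for illustration)
 */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    BOOL isWow64 = FALSE;

    /* IsWow64Process reports TRUE when a 32-bit process runs on 64-bit Windows. */
    if (IsWow64Process(GetCurrentProcess(), &isWow64))
        printf("Running under WOW64 (32-bit program on 64-bit Windows): %s\n",
               isWow64 ? "yes" : "no");
    else
        printf("IsWow64Process failed, error %lu\n", GetLastError());

    return 0;
}
```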
Running a 32-bit program on a 64-bit architecture:
- is more efficient than running it on an older 32-bit architecture (almost solely due to CPU clock speed improvements compared to older CPU generations)
- is less efficient than running the same application compiled into a 64-bit binary to take advantage of the 64-bit architecture, in particular, the ability to transfer 64 bits at once from/to memory.
When compiling a source into a 32-bit binary, the compiler will still use small buckets, instead of the larger ones available with the 64-bit data bus. This has the largest impact on execution speed compared to the same application compiled to use large buckets.
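A concrete illustration of the small-bucket effect (a sketch; the generated code depends on the compiler and options): the same 64-bit addition is typically a single instruction in a 64-bit x86 build, while a 32-bit build has to split it into two 32-bit operations, an add followed by an add-with-carry:

```c
#include <stdint.h>

/* With gcc -m64 this typically compiles to one 64-bit add.
   With gcc -m32 the compiler typically emits an addl/adcl pair,
   handling the low and high 32-bit halves separately. */
uint64_t add64(uint64_t a, uint64_t b)
{
    return a + b;
}
```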
For information, applications compiled into 16-bit Windows binaries (earlier versions of Windows running on the 80286 CPU with a 16-bit data bus) are no longer fully supported, though 32-bit editions of Windows 10 still offer the possibility to activate NTVDM.
The case of .NET, Java and other interpreted "byte-code"
While, until recent years, compilers were used to translate a source program (e.g. a C++ source) directly into a machine language program, this method is now in decline.
The main problem is that machine language for one CPU is not the same as for another (think about the differences between a smartphone using an ARM chip and a server using an Intel chip). You definitely can't use the same binary on both pieces of hardware; they do not speak the same language, and even if it were possible it would be inefficient on both machines due to the huge differences in how they work.
The current idea is to use an intermediate representation (IR) of the instructions, derived from the source. Java (Sun, sadly now Oracle) and IL (Microsoft) are such intermediate representations. The same IR file can be used on any OS supporting the IR.
Once the OS opens the file, it performs the final compilation into the "local" machine language understood by the actual CPU, taking into account the final architecture on which the program runs. For example, for Microsoft .NET, the universal version is executed by a CoreCLR virtual machine located on the final computer. There is usually no notion of data bus width in such intermediate languages, hence fewer and fewer applications will have this n-bit prefix.
However, we cannot forget the actual architecture, so there will still be 32-bit and 64-bit versions of the CoreCLR to optimize the final code, even if the application itself, at the IR level, is not optimized for a given architecture (only one IR version to download and install).