If by 'size' you mean how big the XML document would be if it were serialized as text ('to disk'),
string-length(xdmp:quote( doc('file.xml') ))
will give you the number of characters using the default encoding and serialization options.
The ratio of characters to bytes will vary from 1:1 to 1:4 in UTF-8 (1:3 for characters within the Basic Multilingual Plane), depending on the distribution of Unicode characters. The result also depends on the difference between the serialization options specified to xdmp:quote() and the analogous formatting before ingestion (or after exporting).
For Latin languages and default settings it is usually close to 1:1.
To get a more accurate figure you need to specify the exact serialization and encoding options and either save the document to the file system or convert it to binary and take the binary length. Even then it will be file system and OS dependent (block size, text encoding, etc.).
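As a rough sketch of the character-versus-byte difference, the UTF-8 byte count of the serialized text can be computed from its code points without touching the file system. The helper local:utf8-bytes is a hypothetical name introduced for illustration, and the default xdmp:quote() options are assumed, so the result only approximates what an export would produce.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: number of bytes one Unicode code point occupies in UTF-8. :)
declare function local:utf8-bytes($cp as xs:integer) as xs:integer {
  if ($cp lt 128) then 1
  else if ($cp lt 2048) then 2
  else if ($cp lt 65536) then 3
  else 4
};

(: Serialize with default options (an assumption; pass options to match your export). :)
let $text  := xdmp:quote(fn:doc("file.xml"))
let $chars := fn:string-length($text)
let $bytes := fn:sum(
  for $cp in fn:string-to-codepoints($text)
  return local:utf8-bytes($cp)
)
return
  <size chars="{$chars}" utf8-bytes="{$bytes}"/>
```

For mostly ASCII content the two numbers should be nearly equal; the gap grows with the share of non-ASCII characters.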
If by 'size' you mean how much disk / memory the document 'uses' inside MarkLogic, that can be determined statistically by taking a snapshot of the disk space used in all data directories, then inserting a large number of documents, taking another snapshot, and dividing the difference by the number of documents inserted.
It will vary, possibly greatly, depending on many factors such as indexing settings, similarity between documents, merge rates and limits, etc.
Documents are stored in a highly compressed form, typically much smaller than the text size, but indexing options add to the total size. Both depend greatly on how much similarity of terms/tokens/substrings different documents share.
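One possible way to take such a snapshot from inside the server, rather than measuring the data directories by hand, is to sum the on-disk stand sizes reported by the forest status. This is only a sketch: it assumes the forest-status output exposes stands/stand/disk-size (in megabytes) in the namespace shown, which should be checked against your MarkLogic version, and it ignores journals and anything else outside the stands.

```xquery
xquery version "1.0-ml";

(: Assumed namespace of the xdmp:forest-status() output. :)
declare namespace fs = "http://marklogic.com/xdmp/status/forest";

(: All forests of the database this query runs against. :)
let $forests := xdmp:database-forests(xdmp:database())

(: Sum of on-disk stand sizes; assumed to be reported in megabytes. :)
let $disk-mb := fn:sum(
  xdmp:forest-status($forests)/fs:stands/fs:stand/fs:disk-size
)

(: Index-based estimate of the number of documents. :)
let $docs := xdmp:estimate(fn:doc())
return
  <snapshot disk-mb="{$disk-mb}" documents="{$docs}"/>
```

Run it once before and once after the bulk insert; the per-document estimate is the change in disk-mb divided by the change in documents. Letting merges finish before the second snapshot should make the figure more stable.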
If by size you mean how much memory a document will take when accessed, that is even more variable and less easily measurable.
It can range from 0x (queries entirely resolved by index) to 10x or more for highly structured documents with little or no text content.