Uploading emojis as NVARCHAR to MSSQL eats up characters at end of data field

Question

SCENARIO:

I'm using Pandas.to_sql() with parameter dtype={'COLUMN': NVARCHAR} to upload COLUMN containing text with emojis to a MSSQL DB via FreeTDS. NVARCHAR is imported from sqlalchemy.types import NVARCHAR. The COLUMN is fed as a DataFrame from an Excel file.

PROBLEM:

The strange thing is that for each emoji I put, a character at the end of the NVARCHAR column disappears.

I know that NVARCHAR has a max length of 4000, but how could it be reaching it with a text so short as:

" DUMMY TEXT The following four letters will be cut: abcd"

After upload:

" DUMMY TEXT The following four letters will be cut: "

I noticed that there is also some extra spacing between emojis after upload.

Is this problem caused by the emojis or should we be using another dtype?

Thanks,

Doyuno

PS: The length of DUMMY TEXT doesn't seem to affect how many characters are truncated at the end of the sentence. I've tried with varying length of DUMMY TEXT and it always truncates as many letters as emojis there is.

NVARCHAR does not have a max of 4000 unless the column in the db is explicitly set to that length. If your column derived from Excel it will default to 50 unless you change it on import. According to the documentation, creating NVARCHAR without specifying length defaults to 30: https://learn.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql not sure if pandas has it's own default but the tl;dr version here is: check the column size. — Jacob H, Feb 06 '18 at 15:29
@JacobH On DBeaver client it says "Length: 1,073,741,823". I've also tried explicitly `NVARCHAR(length=4000)` in the `dtype` paramenter. Regarding the Excel, the output is fine when reading it from the Excel, it's only after uploading to the DB that it truncates the last few letters. — Doyuno Dawae, Feb 06 '18 at 15:48
Well logically, the unicode characters are double (you can do a quick test copy into notepad and they will show as 2 columns each). So it "makes sense" in a way, that the additional characters would be truncated. How to prevent that though? Not 100% sure. You might be able to do something ugly like append extra spaces to the end. I'm not familiar enough with pandas though to offer any more insight. — Jacob H, Feb 06 '18 at 15:56
@JacobH, it worked! It's not a clean fix, but I can trim the trailing space afterwards. How can I give you some points for helping me? — Doyuno Dawae, Feb 06 '18 at 16:30

Uploading emojis as NVARCHAR to MSSQL eats up characters at end of data field

0 Answers0