2
IF OBJECT_ID('TEMPDB.DBO.#language_check', 'U') IS NOT NULL
    DROP TABLE #language_check;

CREATE TABLE #language_check (
    id INT identity(1, 1) NOT NULL,
    language_Used VARCHAR(50),
    uni_code NVARCHAR(2000)
    )

INSERT INTO #language_check (
    language_Used,
    uni_code
    )
VALUES (
    'English',
    N'Hello world! this is a text input for the table'
    ),
    (
    'German',
    N'Gemäß den geltenden Datenschutzgesetzen bitten wir Sie, sich kurz Zeit zu nehmen und die wichtigsten Punkte der Datenschutzerklärung von Google durchzulesen. Dabei geht es nicht um von uns vorgenommene Änderungen, sondern darum, dass Sie sich mit den wichtigsten Punkten vertraut machen sollten. Sie können Ihre Einwilligung jederzeit mit Wirkung für die Zukunft widerrufen.'
    ),
    (
    'Spanish',
    N'Bajo Rs 30.000, tanto los teléfonos tienen sus propios méritos y deméritos. El OnePlus 3 puntuaciones claramente muy por delante del Xiaomi MI 5 y también mucho más que muchos teléfonos inteligentes insignia en el mercado.'
    ),
    (
    'Czech',
    N'To je příklad.'
    ),
    (
    'Arabic',
    N'هذا مثال على ذلك.'
    )

SELECT *
FROM #language_check

This query returns the following results.

*(screenshot of the query results)*

I then added a `varchar` column to the table; the results are as follows.

*(screenshot of the results after adding the `varchar` column)*

When I insert the data and check the result, the `uni_code` and non-Unicode columns hold the same values, except that for the Arabic row the non-Unicode column shows "?????".

  1. Can anyone explain why the `uni_code` and non-Unicode columns hold the same value?
  2. Why were the two columns not the same for Arabic?

Thanks in advance.

Smart003
  • The more correct question would be **why there are the same for the others?** – FDavidov Aug 02 '16 at 09:40
  • I think when you retrieve the data, the values will be in the correct format. Give it a try. – Abhishek Aug 02 '16 at 09:44
  • 2
    Not sure exactly *what* you're asking - point is: the European languages (German, English, Spanish, Czech) will be supported by `varchar` as well - and in this case, the results are the same in both columns (**as expected**) - but for supporting two-byte languages like Arabic, Hebrew, Cyrillic, Far-Eastern languages, you **must** use `nvarchar` to get proper results. This is the standard, default, **well documented** behavior - so what **exactly** are you asking us here!?!?!?!?! – marc_s Aug 02 '16 at 09:51
  • @marc_s, I was asking the reason why, for some languages, the `varchar` and `nvarchar` values are the same – Smart003 Aug 02 '16 at 10:15
  • 2
    Languages that "fit" into 1 byte (like the Western and Central European languages) will be the same in `varchar` (1 byte per character) and `nvarchar` (2 bytes per character) columns; languages that *require* 2 bytes per character (Arabic, Hebrew, Cyrillic, Japanese, Korean, Chinese and other Far-Eastern languages) **cannot** be handled by `varchar` – marc_s Aug 02 '16 at 10:17
  • @Abhishek I just used `SELECT * FROM #language_check`, hence I was unable to get the required result – Smart003 Aug 02 '16 at 11:34
  • @FDavidov thanks for your suggestion – Smart003 Aug 02 '16 at 11:35

2 Answers

1

`VARCHAR` and `CHAR` data types store characters from the single-byte code page of the column's collation — essentially ASCII plus an "extended ASCII" range. Arabic is not part of that set. By comparison, `NVARCHAR` and `NCHAR` store Unicode and support nearly all characters. This SO post explains the single-byte types well enough.

Which "special" characters are allowed in SQL Server varchar fields?
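The lossy conversion is easy to demonstrate directly (a minimal sketch, assuming the database's default collation uses a Latin code page such as `SQL_Latin1_General_CP1_CI_AS`):

```sql
-- Any character with no equivalent in the target code page becomes '?'.
SELECT CAST(N'Hello world!' AS VARCHAR(50)) AS english_kept, -- unchanged
       CAST(N'Gemäß'        AS VARCHAR(50)) AS german_kept,  -- ä and ß exist in CP-1252
       CAST(N'هذا مثال'     AS VARCHAR(50)) AS arabic_lost;  -- degrades to '?' characters
```

The same substitution happens implicitly on `INSERT` when Unicode text is stored into a `varchar` column.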

Vinnie
1

The `varchar` and `char` data types allow only a certain set of characters, which covers German, English, Spanish, Czech and many other languages; across those languages, all but a few characters are the same. To insert languages like Arabic, use the `nvarchar` data type. You can also force a column to use a specific collation to store and retrieve Arabic, but you may lose the ability to retrieve characters not supported by that collation. When creating your table, use `COLLATE` as shown below:

CREATE TABLE #language_check 
    (
    id            INT identity(1, 1) NOT NULL,
    language_Used VARCHAR(50),
    uni_code      NVARCHAR(2000) ,
    nonuni_code   VARCHAR (2000) COLLATE Arabic_CI_AS
    )

This allows Arabic strings in the `nonuni_code` `varchar` column and retrieves the exact data. However, it will not support a few characters such as (Ää Ëë Ḧḧ Ï ï N̈n̈ Öö T̈ẗ Üü Ẅẅ Ẍẍ Ÿÿ). When you query #language_check, your result will be as shown in the image.

*(screenshot of the query results against the collated table)*

The behaviour depends not only on the data type but also on the collation. In the table above we explicitly specified the Arabic_CI_AS collation, so Arabic and English text (plus some special characters) are accepted. Hence, in the results, the Unicode and non-Unicode columns match only for the Arabic and English rows, and the rest are not equal.
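The same effect can be seen without recreating the table, by applying the collation to the Unicode value *before* casting it to `varchar` (a sketch; it assumes a server whose default collation is not Arabic):

```sql
-- Collating the nvarchar value first makes the conversion use the Arabic
-- code page (Windows-1256), so the characters survive; without it, the
-- database default code page is used and the text degrades to '?'.
SELECT CAST(N'هذا مثال على ذلك.' COLLATE Arabic_CI_AS AS VARCHAR(100)) AS arabic_kept,
       CAST(N'هذا مثال على ذلك.' AS VARCHAR(100))                      AS arabic_lost;
```

Placing `COLLATE` inside the `CAST` matters: applied after the cast, it would merely relabel a string that has already been converted (and damaged) using the default code page.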

TharunRaja