1

Details : Oracle Database 12c, Database version : 12.1.0.2.0

Problem Description and requirement :

Part - 1 :

  1. I have JSON data stored in NCLOB column.
  2. I have requirement to search the text from JSON data.
  3. Considering performance, I have option to use data type BLOB as data is JSON and In BLOB column.. I can use JSON index Full-Text Search.

Significant performance improvement noted when searched using JSON text search index on BLOB column comparitive to simple query (LIKE) on NCLOB, Hence using BLOB column.

About JSON search Index :

A JSON search index is a type of full text index specifically for JSON data. The optimiser will only consider using the index if the database uses a character set of AL32UTF8 or WE8ISO8859P1, and only for JSON data in VARCHAR2, BLOB, or CLOB columns, not NVARCHAR2 and NCLOB columns.

Part - 2 :

We have two Database types i.e. Single byte and Multi-byte deployment

Character sets for single byte :

NLS_NCHAR_CHARACTERSET--->AL16UTF16, NLS_CHARACTERSET--------->WE8MSWIN1252

Character sets for Multi-byte :

NLS_NCHAR_CHARACTERSET--->AL16UTF16, NLS_CHARACTERSET--------->AL32UTF8

Part - 3 :

In multibyte deployment, I am able to store and search (Using JSON_TEXTCONTAINS) multibyte text like 'məharaːʂʈrə'. Reason : Because Database uses a character set of AL32UTF8 (BLOB Column) Note : JSON_TEXTCONTAINS works only with Index.

In Single byte deployment, I am unable to store same text i.e. 'məharaːʂʈrə' (Since it is multibyte, conversion results in data loss like 'm?hara???r?')

Question : How to store multibyte data i.e. 'məharaːʂʈrə' in BLOB column of single byte deployment having character set : WE8MSWIN1252.

As stated by oracle : The optimiser will only consider using the index if the database uses a character set of AL32UTF8 or WE8ISO8859P1 Please note, I am able to use JSON search Index i.e. JSON_TEXTCONTAINS in single byte environment as well (For simple search i.e. single byte).

Once I am able to store the data as it is, I will be able to search using JSON_TEXTCONTAINS.

Code for sample (Try it in both single byte and Multi-byte deployment: Please note : https://livesql.oracle.com/ is multibyte deployment.

Create table :

create table departments_json (
  department_id   integer not null primary key,
  department_data blob not null
);

Data should be strict Json (so we can traverse with dot notation):

alter table departments_json
  add constraint dept_data_json 
  check ( department_data is JSON FORMAT JSON STRICT );

Create Index :

create index deptj_ctx_ix 
  on departments_json (department_data)
    indextype is ctxsys.context     parameters ('section group  CTXSYS.JSON_SECTION_GROUP  sync (on commit)');

Insert simple json

insert into departments_json 
  values ( 140, utl_raw.cast_to_raw ('{"department_list":[{"Deptname":"DEPT-A", "value" : "məharaːʂʈrə"}]}'));

if any error(In single byte), convert it :

insert into departments_json 
  values ( 140, UTL_RAW.convert(utl_raw.cast_to_raw ('{"department_list":[{"Deptname":"DEPT-A", "value" : "məharaːʂʈrə"}]}'),'AL32UTF8','WE8MSWIN1252'));

Value stored :

SELECT json_value(department_data format json, '$.department_list.value'  ) FROM departments_json JS WHERE DEPARTMENT_ID=140;

Observation : It will result in data loss in single byte environment.

Now search :

SELECT *
FROM   departments_json
WHERE JSON_TEXTCONTAINS(department_data,  '$.department_list.value', 'məharaːʂʈrə');

Observation : multibyte will result row but single byte will result no rows.

Able to use JSON search Index i.e. JSON_TEXTCONTAINS in single byte environment as well (For simple search i.e. single byte) -- below query results a row:

SELECT *
FROM   departments_json
WHERE JSON_TEXTCONTAINS(department_data,  '$.department_list.Deptname', 'DEPT-A');

1 Answers1

0

You can save data by .NET (C#):

System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
var value = System.Text.Encoding.GetEncoding("windows-1252").GetString(System.Text.Encoding.UTF8.GetBytes("məharaːʂʈrə"));
string h = "{\"department_list\":[{\"Deptname\":\"DEPT-A\", \"value\" : \"" + value + "\"}]}";
var byteArraySource = System.Text.Encoding.Unicode.GetBytes(h);
byte[] byteArray = System.Text.Encoding.Convert(System.Text.Encoding.Unicode, 
System.Text.Encoding.UTF8, byteArraySource);
using (var orclCon = new Oracle.ManagedDataAccess.Client.OracleConnection(connectionString))
{
    orclCon.Open();
    var cmd = orclCon.CreateCommand();
    cmd.CommandType = System.Data.CommandType.Text;
    cmd.CommandText = "insert into departments_jsonvalues (140, :paramValue)";
    cmd.Parameters.Add(new Oracle.ManagedDataAccess.Client.OracleParameter("paramValue", Oracle.ManagedDataAccess.Client.OracleDbType.Blob, byteArray, System.Data.ParameterDirection.Input));
    cmd.ExecuteNonQuery();
}

For search result in .NET:

string h ="məharaːʂʈrə";    
var read_command = orclCon.CreateCommand();
read_command.CommandType = System.Data.CommandType.Text;
read_command.Parameters.Add(new Oracle.ManagedDataAccess.Client.OracleParameter("searchText", System.Text.Encoding.GetEncoding("windows-1252").GetString(System.Text.Encoding.UTF8.GetBytes(h))));
read_command.CommandText = "select * from departments_json JSON_TEXTCONTAINS(department_data,  '$.department_list.value', :searchText)";
var dr = read_command.ExecuteReader();
using (dr)
{
  if (dr.HasRows)
  {
    while (dr.Read())
    {
      Oracle.ManagedDataAccess.Types.OracleBlob BLOB = dr.GetOracleBlob(1);
      var sr = new System.IO.StreamReader(BLOB);
      var content = sr.ReadToEnd();
    }
  }
}

Insert in SQL:

insert into departments_json values (140, utl_raw.cast_to_raw ('{"department_list":[{"Deptname":"DEPT-A", "value" :'|| UTL_RAW.convert("məharaːʂʈrə",'UTF8','WE8MSWIN1252')||'}]}'));

Search result in SQL:

SELECT * FROM   departments_json WHERE JSON_TEXTCONTAINS(department_data, '$.department_list.value', CONVERT('məharaːʂʈrə', 'UTF8', 'WE8MSWIN1252'));