Details : Oracle Database 12c, Database version : 12.1.0.2.0
Problem Description and requirement :
Part - 1 :
- I have JSON data stored in NCLOB column.
- I have requirement to search the text from JSON data.
- Considering performance, I have option to use data type BLOB as data is JSON and In BLOB column.. I can use JSON index Full-Text Search.
Significant performance improvement noted when searched using JSON text search index on BLOB column comparitive to simple query (LIKE) on NCLOB, Hence using BLOB column.
About JSON search Index :
A JSON search index is a type of full text index specifically for JSON data. The optimiser will only consider using the index if the database uses a character set of AL32UTF8 or WE8ISO8859P1, and only for JSON data in VARCHAR2, BLOB, or CLOB columns, not NVARCHAR2 and NCLOB columns.
Part - 2 :
We have two Database types i.e. Single byte and Multi-byte deployment
Character sets for single byte :
NLS_NCHAR_CHARACTERSET--->AL16UTF16, NLS_CHARACTERSET--------->WE8MSWIN1252
Character sets for Multi-byte :
NLS_NCHAR_CHARACTERSET--->AL16UTF16, NLS_CHARACTERSET--------->AL32UTF8
Part - 3 :
In multibyte deployment, I am able to store and search (Using JSON_TEXTCONTAINS) multibyte text like 'məharaːʂʈrə'. Reason : Because Database uses a character set of AL32UTF8 (BLOB Column) Note : JSON_TEXTCONTAINS works only with Index.
In Single byte deployment, I am unable to store same text i.e. 'məharaːʂʈrə' (Since it is multibyte, conversion results in data loss like 'm?hara???r?')
Question : How to store multibyte data i.e. 'məharaːʂʈrə' in BLOB column of single byte deployment having character set : WE8MSWIN1252.
As stated by oracle : The optimiser will only consider using the index if the database uses a character set of AL32UTF8 or WE8ISO8859P1 Please note, I am able to use JSON search Index i.e. JSON_TEXTCONTAINS in single byte environment as well (For simple search i.e. single byte).
Once I am able to store the data as it is, I will be able to search using JSON_TEXTCONTAINS.
Code for sample (Try it in both single byte and Multi-byte deployment: Please note : https://livesql.oracle.com/ is multibyte deployment.
Create table :
create table departments_json (
department_id integer not null primary key,
department_data blob not null
);
Data should be strict Json (so we can traverse with dot notation):
alter table departments_json
add constraint dept_data_json
check ( department_data is JSON FORMAT JSON STRICT );
Create Index :
create index deptj_ctx_ix
on departments_json (department_data)
indextype is ctxsys.context parameters ('section group CTXSYS.JSON_SECTION_GROUP sync (on commit)');
Insert simple json
insert into departments_json
values ( 140, utl_raw.cast_to_raw ('{"department_list":[{"Deptname":"DEPT-A", "value" : "məharaːʂʈrə"}]}'));
if any error(In single byte), convert it :
insert into departments_json
values ( 140, UTL_RAW.convert(utl_raw.cast_to_raw ('{"department_list":[{"Deptname":"DEPT-A", "value" : "məharaːʂʈrə"}]}'),'AL32UTF8','WE8MSWIN1252'));
Value stored :
SELECT json_value(department_data format json, '$.department_list.value' ) FROM departments_json JS WHERE DEPARTMENT_ID=140;
Observation : It will result in data loss in single byte environment.
Now search :
SELECT *
FROM departments_json
WHERE JSON_TEXTCONTAINS(department_data, '$.department_list.value', 'məharaːʂʈrə');
Observation : multibyte will result row but single byte will result no rows.
Able to use JSON search Index i.e. JSON_TEXTCONTAINS in single byte environment as well (For simple search i.e. single byte) -- below query results a row:
SELECT *
FROM departments_json
WHERE JSON_TEXTCONTAINS(department_data, '$.department_list.Deptname', 'DEPT-A');