Regex/Method to comment out Japanese text

Question

I have a large text file in the format below.

{
    "glossary": {
        "title": "example glossary",
        cm="私は今プログラミングーをしています"; 
        "text2": "example glossary",
        cm="私はABあああをしています"
}

I need to comment out every line that includes Japanese characters. Each such line starts with four or more tabs, and the tab count varies from line to line. I need to change the file above to look like this:

{
    "glossary": {
        "title": "example glossary",
        */cm="私は今プログラミングーをしています";*/
        "text2": "example glossary",
        */cm="私はABあああをしています";*/
}

Environment:

★ I can run a batch file.

★ I can run a VB script.

★ I can use the Sakura Editor. (preferred)

★ I cannot use/download 3rd party software.

Things I have tried.

■ Using regex ➞ I tried replacing the Japanese text with "" using \p{Hiragana}, then \p{Katakana}, then \p{Han}, but the symbols still remained.

■ Using VBScript ➞ I tried reading the text file line by line and wrapping each matching line in marker text. I don't know why, but it replaced the whole file. The code I used is below:

Set objFSO = CreateObject("Scripting.FileSystemObject")
If objFSO.FileExists("C:\Users\s162138\Desktop\test.txt") Then
    Set objFile = objFSO.OpenTextFile("C:\Users\s162138\Desktop\test.txt", 1)

    Do Until objFile.AtEndOfStream
        strLine = objFile.ReadLine
        If strNextLine = "cm=*" Then
            strLine = "text" + strLine + "text"
        End If

        strNewText = strLine + vbCrLf
    Loop
    Set objFile = Nothing

    Set objFile = objFSO.OpenTextFile("C:\Users\s162138\Desktop\test.txt", 2)
    objFile.Write strNewText
    Set objFile = Nothing
End If

I would be grateful if anyone could help me out.

Have you read through How to Ask and each of its sub-links? This is a programming help site; if you want our help, we need you to post the code you'd like us to help you with. If you have no code, you're on the wrong site! Also, we assist with a single issue per question. Using both the batch-file and vbscript tags suggests you're looking for an answer in either scripting language. Choose one only, write some code, test it, and if it fails to work as intended, edit your question to include a minimal reproducible example of it and any supporting information we can use to recreate your reported issue. — Compo, Aug 18 '20 at 11:43
Maybe this is a stupid question, but why are you trying to replace the Japanese characters in the first place? — user692942, Aug 18 '20 at 12:09
@DipakPoudel Did you mean something like this [Demo](https://regex101.com/r/BVjrec/1/) — Hackoo, Aug 18 '20 at 12:34
@Hackoo That is what I am trying to do but, the text inside "" varies each line. — Dipak Poudel, Aug 18 '20 at 12:39
@DipakPoudel Check the updated Regex [Demo](https://regex101.com/r/BVjrec/3) — Hackoo, Aug 18 '20 at 12:50
@Hackoo Thank you very much for updating the regex. The regex on the demo website works as I needed, but unfortunately when I tried the same regex in Sakura Editor I got the error "Too short multibyte code string". — Dipak Poudel, Aug 18 '20 at 13:17
@Hackoo Thanks a lot, It worked, I just had to replace (cm=\x22\S+\x22;) to (cm=\"\S+\";) — Dipak Poudel, Aug 18 '20 at 13:23
@DipakPoudel You still haven't explained why you need this, what is the reason for removing the characters? Is it an encoding thing? Because if it is, you should fix the root cause rather than just placing a band-aid on it. — user692942, Aug 18 '20 at 13:23
@Lankymart Well, I am upgrading JP1 Job Management System from V8 to V10. V8 was multilingual, so I did not have to worry about characters: even when I installed V8 with the language set to EN, I was still able to use Japanese characters. From V10 that is not possible. I have downloaded the master file (which includes Japanese text), and unless I remove or comment out the Japanese text I have to create all the master data manually. So I was thinking that if I could upload the master file without Japanese characters, it would take less time to just add the Japanese text later, which is why I need this solution. — Dipak Poudel, Aug 18 '20 at 13:28
@DipakPoudel thanks for the explanation, voted to reopen. It might have helped to include that information in the initial question using [edit]. — user692942, Aug 18 '20 at 13:40

score 0 · Answer 1 · answered Aug 18 '20 at 20:16

Use the Japanese regex provided at https://gist.github.com/ryanmcgrath/982242 like this:

^([ \t]*)(.*?(?:[\u3000-\u303F]|[\u3040-\u309F]|[\u30A0-\u30FF]|[\uFF00-\uFFEF]|[\u4E00-\u9FAF]|[\u2605-\u2606]|[\u2190-\u2195]|\u203B).*?)([ \t]*)$

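Replace with $1/*$2*/$3, which keeps the leading and trailing whitespace in place and wraps the rest of the line in /* ... */. See proof.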

EXPLANATION

                         EXPLANATION
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [ \t]*                   any character of: ' ', '\t' (tab) (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      [\u3000-\u303F]          punctuation
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      [\u3040-\u309F]          hiragana
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      [\u30A0-\u30FF]          katakana
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      [\uFF00-\uFFEF]          Full-width roman + half-width katakana
                               
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      [\u4E00-\u9FAF]          Common and uncommon kanji
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      [\u2605-\u2606]          Stars
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      [\u2190-\u2195]          arrows
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      \u203B                   reference mark (※)
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  (                        group and capture to \3:
--------------------------------------------------------------------------------
    [ \t]*                   any character of: ' ', '\t' (tab) (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \3
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
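
To make the replacement concrete, here is a minimal sketch that applies the same pattern and replacement to the sample from the question. It is written in Python only to illustrate the regex, not because the asker's environment has it. The names JAPANESE_LINE and comment_japanese_lines are hypothetical, and the sketch assumes the text uses \n line endings and that the engine supports \uXXXX escapes and multiline anchors. Python writes the replacement as \1/*\2*/\3 rather than $1/*$2*/$3.

```python
import re

# The pattern from the answer, unchanged. re.MULTILINE makes ^ and $
# match at the start and end of every line, not just the whole string.
JAPANESE_LINE = re.compile(
    r"^([ \t]*)(.*?(?:[\u3000-\u303F]|[\u3040-\u309F]|[\u30A0-\u30FF]"
    r"|[\uFF00-\uFFEF]|[\u4E00-\u9FAF]|[\u2605-\u2606]|[\u2190-\u2195]"
    r"|\u203B).*?)([ \t]*)$",
    re.MULTILINE,
)


def comment_japanese_lines(text: str) -> str:
    """Wrap every line containing a character from the ranges above in /* ... */.

    Leading and trailing spaces/tabs (groups 1 and 3) stay outside the comment.
    Assumes "\n" line endings.
    """
    return JAPANESE_LINE.sub(r"\1/*\2*/\3", text)


# Sample based on the question's file (the first cm line has a trailing space).
sample = "\n".join([
    "{",
    '    "glossary": {',
    '        "title": "example glossary",',
    '        cm="私は今プログラミングーをしています"; ',
    '        "text2": "example glossary",',
    '        cm="私はABあああをしています"',
    "}",
])

result = comment_japanese_lines(sample)
print(result)
```

Lines without any character from the listed ranges are left untouched, because the pattern cannot match across a line break.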
