How can I transliterate Hindi text between Hindi and Latin characters using Java?

Question

How to convert a hindi meanig written in english alphabet to Hindi using Java?

eg.

Input text is: anil NE lath marke apko Ganga me hi Fenk diya.

in Hindi

Output text is: अनिल ने लात मार्के आपको गंगा में ही फेंक दिया

How to convert using Java or any Java API?

i fond a api other than google called Jitter but getting an error

Source is: inko
Input is: a2b45xdsfsdf
Output is: 
Matches is: 0
Exception in thread "main" org.jtr.transliterate.CharacterParseException: No valid delimiter found for start of expression
    at org.jtr.transliterate.Perl5Parser.parsePerlString(Perl5Parser.java:133)

source code is

/*
 * Copyright (c) 2001-2005, Nicholas Cull
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 *  * Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *
 *  * Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 *  * Neither the name "jtr" nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
 * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
 * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
 * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

package org.jtr.transliterate;

/**
 * A utility class for providing Perl 5 syntactic sugar on top of the
 * {@link CharacterParser} class. This parses Perl-style transliteration strings
 * into a form suitable for <code>CharacterParser</code>. For instance, the
 * string <code>"tr/a-zA-Z/0-9a-zA-Z/cd"</code> is parsed into two strings
 * <code>"a-zA-Z"</code>, <code>"0-9a-zA-Z"</code>, and the flags
 * <code>COMPLEMENT_MASK | DELETE_UNREPLACEABLES_MASK</code>.
 *
 * @author <a href="mailto:run2000@users.sourceforge.net">Nicholas Cull</a>
 * @version $Id: Perl5Parser.java,v 1.3 2005/03/14 06:40:18 run2000 Exp $
 * @since 1.1
 */
public final class Perl5Parser {

    /** Initial state of the sequence parser. */
    private static final short INITIAL_STATE = 0;
    /** Sequence parser encountered an escape character. */
    private static final short ESCAPE_STATE = 1;
    /** Sequence parser encountered the delimiter. */
    private static final short DELIMITER_STATE = 2;

    /** Private constructor to indicate this is a static class. */
    private Perl5Parser() {
    }

    /**
     * <p>Parses the given string in Perl syntax and returns a populated
     * {@link CharacterReplacer} object that can be used to perform the
     * specified transliteration.</p>
     *
     * <p>This is a simple factory method that calls the {@link #parsePerlString
     * parsePerlString} method below, creates a new {@link CharacterReplacer}
     * object and populates it with the results.</p>
     *
     * @param source the String to be parsed and compiled
     * @return a new CharacterReplacer ready for transliterations
     * @throws CharacterParseException something went wrong during parsing
     * @throws NullPointerException source is <code>null</code>
     */
    public static CharacterReplacer makeReplacer( String source )
            throws CharacterParseException {

        StringBuffer input = new StringBuffer();
        StringBuffer output = new StringBuffer();
        int flags = 0;
        CharacterReplacer replacer;

        flags = parsePerlString( source, input, output );
        replacer = new CharacterReplacer( input.toString(), output.toString() );
        replacer.setFlags( flags );
        return replacer;
    }

    /**
     * <p>Parses the given Perl-style transliteration string into two parts:</p>
     * <ol>
     * <li>The input character string to be transliterated
     * <li>The replacement character string
     * </ol>
     * <p>These strings can then be fed into the constructors for
     * {@link CharacterReplacer}. It also returns any flags encountered at the
     * end of the string into a form suitable for CharacterReplacer.</p>
     *
     * @param source the string to be parsed
     * @param input (out) the characters to be transliterated
     * @param output (out) the replacement characters
     * @return any flags parsed at the end of the character sequence
     * @throws CharacterParseException there was a problem parsing the source
     * String
     * @throws NullPointerException source is <code>null</code>
     */
    public static int parsePerlString( String source, StringBuffer input,
            StringBuffer output ) throws CharacterParseException {

        int length = source.length();
        int pos = 0;
        int flags = 0;
        char delimiter;

        if( length < 3 ) {
            throw new CharacterParseException(
                    "Source is too small to be parsed", pos );
        }

        if( input == null || output == null ) {
            throw new CharacterParseException(
                    "String buffers have not been initialized", pos );
        }

        if( source.startsWith( "tr" )) {
            pos = 2;
        } else if( source.startsWith( "y" )) {
            pos = 1;
        }

        delimiter = source.charAt( pos );
        if( delimiter == '-' || delimiter == '\\' ||
                Character.isLetterOrDigit( delimiter )) {
            throw new CharacterParseException(
                    "No valid delimiter found for start of expression", pos );
        }

        pos++;
        pos = parseSequence( source, pos, delimiter, input );

        if( pos == length ) {
            throw new CharacterParseException(
                    "Cannot parse replacement sequence, no character sequence found",
                    pos );
        }

        pos = parseSequence( source, pos, delimiter, output );

        // Parse any flags at the end
        while( pos < length ) {
            char flag = source.charAt( pos );
            switch( flag ) {
                case 'c':
                    flags = flags | CharacterParser.COMPLEMENT_MASK;
                    break;

                case 'd':
                    flags = flags | CharacterParser.DELETE_UNREPLACEABLES_MASK;
                    break;

                case 's':
                    flags = flags | CharacterParser.SQUASH_DUPLICATES_MASK;
                    break;

                default:
                    throw new CharacterParseException(
                            "Unknown flag passed into character parser", pos );
            }
            pos++;
        }
        return flags;
    }

    /**
     * Parse the first or second character sequence and place the parsed result
     * into the given StringBuffer. The source string is scanned from initial
     * position pos until an unescaped delimiter character is found. We use a
     * simple finite state machine to determine when we encounter an escape
     * character or a delimiter.
     *
     * @param source the source String to be scanned
     * @param pos the starting position for the scan
     * @param delimiter the delimiter character to indicate the end of the
     * sequence
     * @param buffer the character buffer to store the parsed result
     * @return the new position of the parser
     * @throws NullPointerException source or buffer are <code>null</code>
     */
    private static int parseSequence( String source, int pos, char delimiter,
                                      StringBuffer buffer ) {
        int length = source.length();
        short state = INITIAL_STATE;
        int startPos = pos;
        char curr = '\0';

        while(( pos < length ) && ( state != DELIMITER_STATE )) {
            curr = source.charAt( pos );
            switch( state ) {
                case INITIAL_STATE:
                    if( curr == '\\' ) {
                        state = ESCAPE_STATE;
                    } else if( curr == delimiter ) {
                        // Copy the current source to the buffer
                        buffer.append( source.substring( startPos, pos ));
                        state = DELIMITER_STATE;
                    }
                    break;

                case ESCAPE_STATE:
                    if( curr == delimiter ) {
                        // Previous character was to escape the delimiter.
                        // Have to add the previous characters to the buffer.
                        buffer.append( source.substring( startPos, pos - 1 ));
                        startPos = pos;
                    }
                    state = INITIAL_STATE;
                    break;
            }
            pos++;
        }

        if( state != DELIMITER_STATE ) {
            buffer.append( source.substring( startPos ));
        }
        return pos;
    }

    /**
     * A simple test case for this class.
     *
     * @param args ignored
     * @throws Exception if an exception is encountered, throw it to the caller
     */
    public static void main( String args[] ) throws Exception {
        String source = "inko";
        String input = "a2b45xdsfsdf";
        String output = "";
        int matches = 0;

        try {
            CharacterReplacer replacer = makeReplacer( source );
            output = replacer.doReplacement( input );
            matches = replacer.getMatches();
        } finally {
            System.out.println( "Source is: " + source );
            System.out.println( "Input is: " + input );
            System.out.println( "Output is: " + output );
            System.out.println( "Matches is: " + matches );
        }
    }
}

at org.jtr.transliterate.Perl5Parser.makeReplacer(Perl5Parser.java:82)
at org.jtr.transliterate.Perl5Parser.main(Perl5Parser.java:240)

Really... you want to use a better version of Google translate as a java API. — Bharat Sinha, Jul 25 '12 at 06:26
I rephrased your question because you example doesn't actually show *translation*. It only shows *transliteration*, which is quite different! — Joachim Sauer, Jul 25 '12 at 06:41
Google translator API will do it, but it would translate it like `hello aap kaise ho` to `नमस्ते आप कैसे हो`, See http://translate.google.com/#auto|hi|hello%20aap%20kaise%20ho — jmj, Jul 25 '12 at 06:50
@JigarJoshi how to do with programing any idea! google Transliterate (Deprecated) — Sonia Gupta, Jul 25 '12 at 08:29
Google provide [REST api](https://developers.google.com/translate/v2/using_rest) for it, You could consume rest from java very easily — jmj, Jul 25 '12 at 09:05
@JigarJoshi Google Translate API is paid service. i need to develop a application???? — Sonia Gupta, Jul 25 '12 at 11:47
@Jigar Joshi i need transliteration and its language detection — Sonia Gupta, Jul 27 '12 at 09:48
[_You can translate text from one language to another language by sending an HTTP GET request to its URI. The URI for a request has the following format:_](https://developers.google.com/translate/v2/using_rest#Translate) — jmj, Jul 27 '12 at 10:02

score 1 · Answer 1 · answered Jul 25 '12 at 06:39

1

If I understand you correctly, then you want to be able to transliterate to/from Hindi text. (i.e. transform from one writing system to another). That's usually a lot easier than translation (i.e. convert from one language to another).

I don't know of a library to do that, but this Wiktionary page on Hindi transliteration might get you started.

answered Jul 25 '12 at 06:39

Joachim Sauer

302,674
57
556
614

2

@SoniaGupta: I already wrote "I don't know of a library to do that". – Joachim Sauer Jul 25 '12 at 06:50

Amit Kumar Gupta · Answer 2 · 2013-04-30T15:42:20.993

I seen an android dictionary hinkhoj which does the same thing without internet access. So definitely it is possible in java.

Replacing each letter or group of letters with corresponding Hindi unicode will work. Like replacing 'ma' with '\u092E'

Sample working program;

public class EnglishToHindiTranslator {

    static Map<String, String> phoneticMap = new HashMap<String, String>();
    static Map<String, String> maatraMap = new HashMap<String, String>();

    static {
        phoneticMap.put("ma","\u092E");
        phoneticMap.put("ta","\u0924");
        phoneticMap.put("t","\u0924\u094D");
        phoneticMap.put("ra","\u0930");
        maatraMap.put("a","\u093E");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String engWord = "maatra";
        engWord = engWord.replaceAll("ma",phoneticMap.get("ma") );
        engWord = engWord.replaceAll("t",phoneticMap.get("t") );
        engWord = engWord.replaceAll("ra",phoneticMap.get("ra") );
        engWord = engWord.replaceAll("a",maatraMap.get("a") );

        System.out.println(new String(engWord.getBytes("UTF-8")));
    }
}

How can I transliterate Hindi text between Hindi and Latin characters using Java?

2 Answers2