1

Does anyone work with the ABBYY FineReader Engine SDK 10? I am curious whether it is even possible to get the success rate of data extraction after OCR processing of an image.

The scenario: we have a workflow that collects data from images, and if the recognition result is below 90%, we send the batch to visual validation/correction.

I am driving the SDK from .NET. That is probably not important to know, but I mention it just in case.

How can I obtain that number? Thanks for any advice.

  • Alright, just to make it a little clearer: what I am looking for is a summary of the character confidence for the whole scan. Is there any Engine object function for that? The RAW output file has a confidence for each character, but that is too detailed ... – Martin Staník Mar 04 '13 at 15:50
  • You should probably ask this on the ABBYY forum: http://forum.ocrsdk.com – Tomato Mar 04 '13 at 17:11

3 Answers

1

There is no "global recognition confidence" property. Developers are expected to calculate it themselves using their own confidence criteria. The simplest way is to iterate through each character and check the CharParams.IsSuspicious property. Here is a C# code sample for FREngine 11:

    //Statistics counters 

    //Count of all suspicious symbols in layout
    private int suspiciousSymbolsCount;
    //Count of all unrecognized symbols in layout
    private int unrecognizedSymbolsCount;
    //Count of all nonspace symbols in layout
    private int allSymbolsCount;
    //Count of all words in layout
    private int allWordsCount;
    //Count of all non-dictionary words in layout
    private int notDictionaryWordsCount;
    private void processImage()
    {
        // Create document
        FRDocument document = engineLoader.Engine.CreateFRDocument();

        try {
            // Add image file to document
            displayMessage( "Loading image..." );
            string imagePath = Path.Combine( FreConfig.GetSamplesFolder(), @"SampleImages\Demo.tif" );

            document.AddImageFile( imagePath, null, null );

            //Recognize document
            displayMessage( "Recognizing..." );
            document.Process( null );

            // Calculate text statistics
            displayMessage( "Calculating statistics..." );
            clearStatistics();
            for( int i = 0; i < document.Pages.Count; i++ ) {
                calculateStatisticsForLayout( document.Pages[i].Layout );
            }

            //show calculated statistics
            displayStatistics();

        } catch( Exception error ) {
            MessageBox.Show( this, error.Message, this.Text, MessageBoxButtons.OK, MessageBoxIcon.Error );
        }
        finally {
            // Close document
            document.Close();
        }
    }
    private void calculateStatisticsForLayout( Layout layout )
    {    
        LayoutBlocks blocks = layout.Blocks;
        for( int index = 0; index < blocks.Count; index++ ) {
            calculateStatisticsForBlock( blocks[index] );
        }
    }

    void calculateStatisticsForBlock( IBlock block )
    {           
        if( block.Type == BlockTypeEnum.BT_Text ) {
            calculateStatisticsForTextBlock( block.GetAsTextBlock() );
        } else if( block.Type == BlockTypeEnum.BT_Table ) {
            calculateStatisticsForTableBlock( block.GetAsTableBlock() );
        }
    }

    void calculateStatisticsForTextBlock( TextBlock textBlockProperties )
    {
        calculateStatisticsForText( textBlockProperties.Text );
    }

    void calculateStatisticsForTableBlock( TableBlock tableBlockProperties )
    {
        for( int index = 0; index < tableBlockProperties.Cells.Count; index++ ) {
            calculateStatisticsForBlock( tableBlockProperties.Cells[index].Block );
        }
    }

    void calculateStatisticsForText( Text text ) 
    {
        Paragraphs paragraphs = text.Paragraphs;
        for( int index = 0; index < paragraphs.Count; index++ ) {
            calculateStatisticsForParagraph( paragraphs[index] );
        }
    }

    void calculateStatisticsForParagraph( Paragraph paragraph )
    {
        calculateCharStatisticsForParagraph( paragraph );

        calculateWordStatisticsForParagraph( paragraph );
    }

    void calculateCharStatisticsForParagraph( Paragraph paragraph )
    {
        for( int index = 0; index < paragraph.Text.Length; index++ )
        {
            calculateStatisticsForChar( paragraph, index );
        }
    }

    void calculateStatisticsForChar( Paragraph paragraph, int charIndex )
    {
        CharParams charParams = engineLoader.Engine.CreateCharParams();
        paragraph.GetCharParams( charIndex, charParams );
        if( charParams.IsSuspicious ) 
        {
            suspiciousSymbolsCount++;
        }

        if( isUnrecognizedSymbol( paragraph.Text[charIndex] ) ) 
        {
            unrecognizedSymbolsCount++;
        }

        if( paragraph.Text[charIndex] != ' ' ) 
        {
            allSymbolsCount++;
        }
    }

    void calculateWordStatisticsForParagraph( Paragraph paragraph )
    {
        allWordsCount += paragraph.Words.Count;

        for( int index = 0; index < paragraph.Words.Count; index++ ) 
        {
            if( !paragraph.Words[index].IsWordFromDictionary ) 
            {
                notDictionaryWordsCount ++;
            }
        }
    }

    bool isUnrecognizedSymbol( char symbol )
    {
        // 0x005E ('^') is the special character FREngine uses for unrecognized symbols
        return ( symbol == 0x005E );
    }

    void displayStatistics()
    {
        labelAllSymbols.Text = "All symbols: " + allSymbolsCount.ToString();
        labelSuspiciosSymbols.Text = "Suspicious symbols: " + suspiciousSymbolsCount.ToString();
        labelUnrecognizedSymbols.Text = "Unrecognized symbols: " + unrecognizedSymbolsCount.ToString();

        labelAllWords.Text = "All words: " + allWordsCount.ToString();
        labelNotDictionaryWords.Text = "Non-dictionary words: " + notDictionaryWordsCount.ToString();
    }
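To turn these counters into the single percentage the question asks about, one option is to treat every flagged symbol as a likely error and report the remaining share. This is only a sketch: the `RecognitionStats` class and its method names are hypothetical helpers, not part of the FREngine API, and the 90% threshold comes from the question. Because the sample above counts suspicious symbols including spaces while `allSymbolsCount` excludes them, the result is clamped to the 0..100 range.

```csharp
using System;

// Hypothetical helper, not part of the FREngine API.
public static class RecognitionStats
{
    // Percentage of symbols that were not flagged, clamped to 0..100.
    // Returns 0 when there are no symbols, so an empty page is sent to review.
    public static double CleanSymbolPercent(int allSymbols, int flaggedSymbols)
    {
        if (allSymbols <= 0)
        {
            return 0.0;
        }
        double percent = 100.0 * (allSymbols - flaggedSymbols) / allSymbols;
        return Math.Max(0.0, Math.Min(100.0, percent));
    }

    // True when the batch should go to visual validation.
    public static bool NeedsManualReview(int allSymbols, int flaggedSymbols, double thresholdPercent)
    {
        return CleanSymbolPercent(allSymbols, flaggedSymbols) < thresholdPercent;
    }
}
```

With the counters from the sample, a call such as `RecognitionStats.NeedsManualReview(allSymbolsCount, suspiciousSymbolsCount, 90.0)` would decide whether the batch goes to manual correction. Whether suspicious symbols are the right proxy for errors is a judgment call that depends on your documents.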
Nadia Solovyeva
  • There is also a character-by-character 1-100 confidence score at: characterParameters.SelectedCharacterRecognitionVariant.CharConfidence – Chris Steele Apr 04 '17 at 17:18
  • It is better not to use it in this case. The 0-100 value does not indicate the absolute recognition confidence of a particular character. It is a relative value that helps you judge how good the selected character variant is compared with the other possible characters at this location. – Nadia Solovyeva Apr 05 '17 at 04:53
0

IMHO there is no such 'global confidence' value, but you could very easily obtain one by taking the confidence of each character and computing the average. However, I think you should direct your request to ABBYY's forum or support email address to see what they advise.

It is not really possible to tell you what level of confidence I would get with the engine, because it depends so much on the quality of the image, the size of the font and so on: there is no such thing as an 'average document' that the industry uses as a baseline.
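A minimal sketch of that averaging, assuming the per-character confidence values have already been collected into a collection (for instance from the RAW output mentioned in the question). The `ConfidenceAverage` class is a hypothetical helper, not an FREngine type. As the comments on the first answer point out, the engine's per-character score is relative rather than absolute, so the mean is at best a rough indicator:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the FREngine API.
public static class ConfidenceAverage
{
    // Returns the arithmetic mean of the per-character confidences,
    // or null when there are no characters to average.
    public static double? Mean(IReadOnlyCollection<int> charConfidences)
    {
        if (charConfidences == null || charConfidences.Count == 0)
        {
            return null;
        }
        return charConfidences.Average();
    }
}
```

Returning `null` for an empty input keeps "nothing was recognized" distinct from "recognized with zero confidence", so the caller can decide how to route such pages.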

Good luck!

Leafdoc
0

In the FRE SDK, the recognition result contains text only in Text or Table blocks. I'd suggest keeping a global word count variable.

  1. Run an async method to iterate through the words and count the suspicious characters in each word (IsSuspicious).
  2. Count the words on each page that contain suspicious characters.
  3. Divide (words with suspicious characters) by (total number of words) and multiply the result by 100.

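    2/4 equals 0.5, and 0.5 * 100 = 50%.

That is the share of words containing suspicious characters; the estimated accuracy is 100% minus that value (here also 50%). The code sample for checking suspicious characters and confidence is given above in another answer, from ABBYY.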
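The ratio from the steps above can be sketched as follows. The input is a hypothetical collection of per-word flags (true when the word contains at least one suspicious character), which you would fill while iterating over the words; `WordStats` and its method are illustrative names, not FREngine API:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the FREngine API.
public static class WordStats
{
    // Percentage of words that contain at least one suspicious character.
    // Returns 0 for an empty input; the caller decides how to treat empty pages.
    public static double SuspiciousWordPercent(IReadOnlyCollection<bool> wordHasSuspiciousChar)
    {
        if (wordHasSuspiciousChar == null || wordHasSuspiciousChar.Count == 0)
        {
            return 0.0;
        }
        int suspiciousWords = wordHasSuspiciousChar.Count(flag => flag);
        return 100.0 * suspiciousWords / wordHasSuspiciousChar.Count;
    }
}
```

For the 2-out-of-4 example, `WordStats.SuspiciousWordPercent(new[] { true, false, true, false })` gives 50; subtracting the result from 100 yields the figure to compare against a threshold such as 90%.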

Sakthivel