Character Error Rate in Python


Contents

  1. Char Error Rate
  2. Module Interface
  3. Functional Interface
  4. jiwer 2.5.1
  5. Metadata
  6. Classifiers
  7. Project description
  8. JiWER: Similarity measures for automatic speech recognition evaluation
  9. Installation
  10. Usage
  11. pre-processing
  12. transforms
  13. Compose
  14. ReduceToListOfListOfWords
  15. ReduceToSingleSentence
  16. RemoveSpecificWords
  17. RemoveWhiteSpace
  18. RemovePunctuation
  19. RemoveMultipleSpaces
  20. Strip
  21. RemoveEmptyStrings
  22. ExpandCommonEnglishContractions
  23. Evaluate OCR Output Quality with Character Error Rate (CER) and Word Error Rate (WER)
  24. Key concepts, examples, and Python implementation of measuring Optical Character Recognition output quality
  25. Importance of Evaluation Metrics
  26. Error Rates and Levenshtein Distance
  27. Character Error Rate (CER)
  28. (i) Equation
  29. (ii) Illustration with Example
  30. (iii) CER Normalization
  31. (iv) What is a good CER value?
  32. Word Error Rate (WER)
  33. Python Example (with TesseractOCR and fastwer)
  34. Summing it up

Char Error Rate

Module Interface

Character Error Rate (CER) is a metric of the performance of an automatic speech recognition (ASR) system.

This value indicates the percentage of characters that were incorrectly predicted. The lower the value, the better the performance of the ASR system, with a CharErrorRate of 0 being a perfect score. Character error rate can then be computed as:

CER = (S + D + I) / N = (S + D + I) / (S + D + C)

where:

  • S is the number of substitutions,
  • D is the number of deletions,
  • I is the number of insertions,
  • C is the number of correct characters,
  • N is the number of characters in the reference (N = S + D + C).
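A minimal pure-Python sketch of this computation (an illustration, not the torchmetrics implementation; the edit counts come from a standard Levenshtein dynamic program):

```python
def levenshtein(ref, hyp):
    """Minimum number of single-character edits turning ref into hyp."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # dp[j]: distance from ref[:0] to hyp[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[n]

def char_error_rate(pred, target):
    """CER = (S + D + I) / N, with N = number of reference characters."""
    return levenshtein(target, pred) / len(target)
```

For example, char_error_rate("abcx", "abcd") gives 0.25: one substitution over four reference characters.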

Compute CharErrorRate score of transcribed segments against references.

kwargs (Any) – Additional keyword arguments; see Advanced metric settings for more info.

Returns: the character error rate score.

__init__() initializes internal Module state, shared by both nn.Module and ScriptModule.

compute() calculates the character error rate and returns the score.

update() stores references/predictions for computing Character Error Rate scores:

preds (Union[str, List[str]]) – Transcription(s) to score as a string or list of strings

target (Union[str, List[str]]) – Reference(s) for each speech input as a string or list of strings

Functional Interface

Character error rate is a common metric of the performance of an automatic speech recognition system. This value indicates the percentage of characters that were incorrectly predicted. The lower the value, the better the performance of the ASR system, with a CER of 0 being a perfect score.

preds (Union[str, List[str]]) – Transcription(s) to score as a string or list of strings

target (Union[str, List[str]]) – Reference(s) for each speech input as a string or list of strings

Source

jiwer 2.5.1

pip install jiwer

Released: Sep 6, 2022

Evaluate your speech-to-text system with similarity measures such as word error rate (WER)

Metadata

License: Apache Software License (Apache-2.0)

Requires: Python >=3.7. Maintainer: nikvaessen

Classifiers

  • License
    • OSI Approved :: Apache Software License
  • Programming Language
    • Python :: 3
    • Python :: 3.7
    • Python :: 3.8
    • Python :: 3.9
    • Python :: 3.10

Project description

JiWER: Similarity measures for automatic speech recognition evaluation

This repository contains a simple python package to approximate the Word Error Rate (WER), Match Error Rate (MER), Word Information Lost (WIL) and Word Information Preserved (WIP) of a transcript. It computes the minimum-edit distance between the ground-truth sentence and the hypothesis sentence of a speech-to-text API. The minimum-edit distance is calculated using the Python C module Levenshtein.

Installation

You should be able to install this package using poetry:

Or, if you prefer old-fashioned pip and you’re using Python >= 3.7 :
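The install commands themselves were dropped during extraction; for a package published on PyPI as jiwer, they would presumably be:

```shell
# with poetry
poetry add jiwer

# or with pip (Python >= 3.7)
pip install jiwer
```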

Usage

The simplest use case is computing the edit distance between two strings:

Similarly, to get other measures:

You can also compute the WER over multiple sentences:

We also provide the character error rate:
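The code snippets for this section were lost in extraction. As a stand-in, corpus-level WER can be sketched in plain Python (this mirrors the quantity jiwer reports, but is not jiwer's code, and it skips jiwer's default text transformations):

```python
def edit_distance(ref, hyp):
    """Minimum edits (substitution/deletion/insertion) between two sequences."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1,
                        prev + (ref[i - 1] != hyp[j - 1]))
            prev = cur
    return dp[n]

def corpus_wer(references, hypotheses):
    """WER over multiple sentences: total word edits / total reference words."""
    edits = words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()
        edits += edit_distance(ref_words, hyp_words)
        words += len(ref_words)
    return edits / words
```

For example, corpus_wer(["hello world"], ["hello duck"]) gives 0.5: one substitution over two reference words.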

pre-processing

It might be necessary to apply some pre-processing steps on either the hypothesis or ground truth text. This is possible with the transformation API:

By default, the following transformation is applied to both the ground truth and the hypothesis. Note that this is simply to get the text into the right format for calculating the WER.

transforms

We provide some predefined transforms. See jiwer.transformations .

Compose

jiwer.Compose(transformations: List[Transform]) can be used to combine multiple transformations.

Note that each transformation pipeline needs to end with jiwer.ReduceToListOfListOfWords(), as the library internally computes the word error rate based on a double list of words.
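The idea behind Compose is plain function composition over a list of transforms; a minimal sketch of the pattern (not jiwer's actual implementation, and the lowercase/strip/split pipeline below is just a hypothetical example):

```python
class Compose:
    """Apply a list of single-argument transforms in order."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, text):
        for transform in self.transforms:
            text = transform(text)
        return text

# hypothetical pipeline: lowercase, strip, then split into words
pipeline = Compose([str.lower, str.strip, str.split])
```

Calling pipeline("  Hello World ") runs the three steps in order and yields ["hello", "world"].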

ReduceToListOfListOfWords

jiwer.ReduceToListOfListOfWords(word_delimiter=" ") can be used to transform one or more sentences into a list of lists of words. The sentences can be given as a string (one sentence) or a list of strings (one or more sentences). This operation should be the final step of any transformation pipeline, as the library internally computes the word error rate based on a double list of words.

ReduceToSingleSentence

jiwer.ReduceToSingleSentence(word_delimiter=" ") can be used to transform multiple sentences into a single sentence. The sentences can be given as a string (one sentence) or a list of strings (one or more sentences). This operation can be useful when the number of ground truth sentences and hypothesis sentences differ, and you want to do a minimal alignment over these lists. Note that this introduces an order dependence: wer([a, b], [a, b]) might not be equal to wer([b, a], [b, a]).

RemoveSpecificWords

jiwer.RemoveSpecificWords(words_to_remove: List[str]) can be used to filter out certain words. As removed words are replaced with a space character, make sure that jiwer.RemoveMultipleSpaces, jiwer.Strip() and jiwer.RemoveEmptyStrings are present in the composition after jiwer.RemoveSpecificWords.

RemoveWhiteSpace

jiwer.RemoveWhiteSpace(replace_by_space=False) can be used to filter out white space. The whitespace characters are " ", "\t", "\n", "\r", "\x0b" and "\x0c". Note that by default the space character (" ") is also removed, which will make it impossible to split a sentence into a list of words by using ReduceToListOfListOfWords or ReduceToSingleSentence. This can be prevented by replacing all whitespace with the space character (replace_by_space=True). If so, make sure that jiwer.RemoveMultipleSpaces, jiwer.Strip() and jiwer.RemoveEmptyStrings are present in the composition after jiwer.RemoveWhiteSpace.

RemovePunctuation

jiwer.RemovePunctuation() can be used to filter out punctuation. The punctuation characters are defined as all unicode characters whose category name starts with P. See https://www.unicode.org/reports/tr44/#General_Category_Values.

RemoveMultipleSpaces

jiwer.RemoveMultipleSpaces() can be used to filter out multiple spaces between words.

Strip

jiwer.Strip() can be used to remove all leading and trailing spaces.

RemoveEmptyStrings

jiwer.RemoveEmptyStrings() can be used to remove empty strings.

ExpandCommonEnglishContractions

jiwer.ExpandCommonEnglishContractions() can be used to replace common contractions such as let’s with let us.

Currently, this method will perform the following replacements. Note that ␣ is used to indicate a space character to get around markdown rendering constraints.

Source

Evaluate OCR Output Quality with Character Error Rate (CER) and Word Error Rate (WER)

Key concepts, examples, and Python implementation of measuring Optical Character Recognition output quality

Contents

Importance of Evaluation Metrics

Great job in successfully generating output from your OCR model! You have done the hard work of labeling and pre-processing the images, setting up and running your neural network, and applying post-processing on the output.

The final step now is to assess how well your model has performed. Even if it gave high confidence scores, we need to measure performance with objective metrics. Since you cannot improve what you do not measure, these metrics serve as a vital benchmark for the iterative improvement of your OCR model.

In this article, we will look at two metrics used to evaluate OCR output, namely Character Error Rate (CER) and Word Error Rate (WER).

Error Rates and Levenshtein Distance

The usual way of evaluating prediction output is with the accuracy metric, where we indicate a match (1) or no match (0). However, this does not provide enough granularity to assess OCR performance effectively.

We should instead use error rates to determine the extent to which the OCR transcribed text and ground truth text (i.e., reference text labeled manually) differ from each other.

A common intuition is to see how many characters were misspelled. While this is correct, the actual error rate calculation is more complex than that. This is because the OCR output can have a different length from the ground truth text.

Furthermore, there are three different types of error to consider:

  • Substitution error: Misspelled characters/words
  • Deletion error: Lost or missing characters/words
  • Insertion error: Incorrect inclusion of character/words

The question now is, how do you measure the extent of errors between two text sequences? This is where Levenshtein distance enters the picture.

Levenshtein distance is a distance metric measuring the difference between two string sequences. It is the minimum number of single-character (or word) edits (i.e., insertions, deletions, or substitutions) required to change one word (or sentence) into another.

For example, the Levenshtein distance between “ mitten” and “ fitting” is 3 since a minimum of 3 edits is needed to transform one into the other.

The more different the two text sequences are, the higher the number of edits needed, and thus the larger the Levenshtein distance.
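The mitten/fitting example can be checked mechanically; here is a compact Wagner–Fischer sketch (illustrative, not from the article):

```python
def levenshtein(a, b):
    """Minimum single-character edits (insert/delete/substitute) from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        row = [i]
        for j, cb in enumerate(b, 1):
            row.append(min(prev[j] + 1,                # deletion
                           row[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = row
    return prev[-1]
```

levenshtein("mitten", "fitting") returns 3, matching the example: two substitutions (m→f, e→i) and one insertion (g).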

Character Error Rate (CER)

(i) Equation

CER calculation is based on the concept of Levenshtein distance, where we count the minimum number of character-level operations required to transform the ground truth text (aka reference text) into the OCR output.

It is represented with this formula:

CER = (S + D + I) / N

where:

  • S = Number of Substitutions
  • D = Number of Deletions
  • I = Number of Insertions
  • N = Number of characters in reference text (aka ground truth)

Bonus Tip: The denominator N can alternatively be computed with:
N = S + D + C (where C = number of correct characters)

The output of this equation represents the percentage of characters in the reference text that were incorrectly predicted in the OCR output. The lower the CER value (with 0 being a perfect score), the better the performance of the OCR model.

(ii) Illustration with Example

Let’s look at an example:

Several errors require edits to transform OCR output into the ground truth:

  1. g instead of 9 (at reference text character 3)
  2. Missing 1 (at reference text character 7)
  3. Z instead of 2 (at reference text character 8)

With that, here are the values to input into the equation:

  • Number of Substitutions (S) = 2
  • Number of Deletions (D) = 1
  • Number of Insertions (I) = 0
  • Number of characters in reference text (N) = 9

Based on the above, we get (2 + 1 + 0) / 9 = 0.3333. When converted to a percentage value, the CER becomes 33.33%. This implies that, on average, one in every three characters in the sequence was incorrectly transcribed.

We repeat this calculation for all the pairs of transcribed output and corresponding ground truth, and take the mean of these values to obtain an overall CER percentage.
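Plugging the counts from the worked example into the formula, and averaging over pairs as described (the second tuple below is a made-up pair, purely for illustration):

```python
def cer_from_counts(s, d, i, n):
    """CER = (S + D + I) / N."""
    return (s + d + i) / n

# counts from the worked example above
example = cer_from_counts(s=2, d=1, i=0, n=9)  # 0.3333... -> 33.33%

# overall CER: mean over several (S, D, I, N) tuples (second tuple is hypothetical)
pairs = [(2, 1, 0, 9), (0, 0, 1, 10)]
overall = sum(cer_from_counts(*p) for p in pairs) / len(pairs)
```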

(iii) CER Normalization

One thing to note is that CER values can exceed 100%, especially with many insertions. For example, the CER for ground truth ‘ ABC’ and a longer OCR output ‘ ABC12345’ is 166.67%.

It felt a little strange to me that an error value can go beyond 100%, so I looked around and managed to come across an article by Rafael C. Carrasco that discussed how normalization could be applied:

Sometimes the number of mistakes is divided by the sum of the number of edit operations ( i + s + d ) and the number c of correct symbols, which is always larger than the numerator.

The normalization technique described above keeps CER values within the range of 0–100% at all times. It can be represented with this formula:

Normalized CER = (S + D + I) / (S + D + I + C)

where C = Number of correct characters
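For the ‘ABC’ vs ‘ABC12345’ example above, both variants can be computed directly (a small illustrative sketch):

```python
def cer(s, d, i, n):
    """Raw CER: can exceed 1.0 when there are many insertions."""
    return (s + d + i) / n

def cer_normalized(s, d, i, c):
    """Divide by edits plus correct characters, bounding the rate at 100%."""
    return (s + d + i) / (s + d + i + c)

# ground truth 'ABC' vs OCR output 'ABC12345': 5 insertions, 3 correct chars
raw = cer(s=0, d=0, i=5, n=3)              # 1.6667 -> 166.67%
norm = cer_normalized(s=0, d=0, i=5, c=3)  # 0.625  -> 62.5%
```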

(iv) What is a good CER value?

There is no single benchmark for defining a good CER value, as it is highly dependent on the use case. Different scenarios and complexity (e.g., printed vs. handwritten text, type of content, etc.) can result in varying OCR performances. Nonetheless, there are several sources that we can take reference from.

An article published in 2009 on the review of OCR accuracy in large-scale Australian newspaper digitization programs came up with these benchmarks (for printed text):

  • Good OCR accuracy: CER 1–2% (i.e., 98–99% accurate)
  • Average OCR accuracy: CER 2–10%
  • Poor OCR accuracy: CER >10% (i.e., below 90% accurate)

For complex cases involving handwritten text with highly heterogeneous and out-of-vocabulary content (e.g., application forms), a CER value as high as around 20% can be considered satisfactory.

Word Error Rate (WER)

If your project involves transcription of particular sequences (e.g., social security number, phone number, etc.), then the use of CER will be relevant.

On the other hand, Word Error Rate might be more applicable if it involves the transcription of paragraphs and sentences of words with meaning (e.g., pages of books, newspapers).

The formula for WER is the same as that of CER, but WER operates at the word level instead: WER = (S + D + I) / N, where S, D, and I count the word substitutions, deletions, and insertions needed to transform one sentence into another, and N is the number of words in the reference.

WER is generally well-correlated with CER (provided error rates are not excessively high), although the absolute WER value is expected to be higher than the CER value.

  • Ground Truth: ‘my name is kenneth’
  • OCR Output: ‘myy nime iz kenneth’

From the above, the CER is 16.67%, whereas the WER is 75%. The WER value of 75% is clearly understood since 3 out of 4 words in the sentence were wrongly transcribed.
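Both rates for this example can be computed with a single sequence-level edit distance, applied once to characters and once to word lists (a sketch, not the article's notebook code):

```python
def edit_distance(ref, hyp):
    """Minimum insert/delete/substitute edits between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        row = [i]
        for j, h in enumerate(hyp, 1):
            row.append(min(prev[j] + 1, row[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = row
    return prev[-1]

truth = "my name is kenneth"
ocr = "myy nime iz kenneth"

cer = edit_distance(truth, ocr) / len(truth)                          # char level
wer = edit_distance(truth.split(), ocr.split()) / len(truth.split())  # word level
```

This reproduces the article's figures: 3 character edits over 18 reference characters gives a CER of 16.67%, and 3 wrong words out of 4 gives a WER of 75%.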

Python Example (with TesseractOCR and fastwer)

We have covered enough theory, so let’s look at an actual Python code implementation.

In the demo notebook, I ran the open-source TesseractOCR model to extract output from several sample images of handwritten text. I then utilized the fastwer package to calculate CER and WER from the transcribed output and ground truth text (which I labeled manually).

Summing it up

In this article, we covered the concepts and examples of CER and WER and details on how to apply them in practice.

While CER and WER are handy, they are not bulletproof performance indicators of OCR models. This is because the quality and condition of the original documents (e.g., handwriting legibility, image DPI, etc.) play an equally (if not more) important role than the OCR model itself.

I welcome you to join me on a data science learning journey! Give this Medium page a follow to stay in the loop of more data science content, or reach out to me on LinkedIn. Have fun evaluating your OCR model!

Source

Project description

A PyPI package for fast word/character error rate (WER/CER) calculation

  • fast (cpp implementation)
  • sentence-level and corpus-level WER/CER scores

Installation

pip install fastwer

Example

import fastwer
hypo = ['This is an example .', 'This is another example .']
ref = ['This is the example :)', 'That is the example .']

# Corpus-Level WER: 40.0
fastwer.score(hypo, ref)
# Corpus-Level CER: 25.5814
fastwer.score(hypo, ref, char_level=True)

# Sentence-Level WER: 40.0
fastwer.score_sent(hypo[0], ref[0])
# Sentence-Level CER: 22.7273
fastwer.score_sent(hypo[0], ref[0], char_level=True)

Contact

Changhan Wang (wangchanghan@gmail.com)



jitsi/jiwer



jiwer.ExpandCommonEnglishContractions() can be used to replace common contractions such as let’s with let us.

Currently, this method will perform the following replacements. Note that ␣ is used to indicate a space character to get around markdown rendering constraints.

Contraction    Transformed into
won’t          ␣will not
can’t          ␣can not
let’s          ␣let us
n’t            ␣not
’re            ␣are
’s             ␣is
’d             ␣would
’ll            ␣will
’t             ␣not
’ve            ␣have
’m             ␣am
jiwer.SubstituteWords(dictionary: Mapping[str, str]) can be used to replace one word with another. Note that the whole word is matched. If the word you’re attempting to substitute is a substring of another word it will not be affected. For example, if you’re substituting foo with bar, the word foobar will NOT be substituted into barbar.

jiwer.SubstituteRegexes(dictionary: Mapping[str, str]) can be used to replace a substring matching a regex expression with another substring.

jiwer.ToLowerCase() can be used to convert every character to lowercase.

jiwer.ToUpperCase() can be used to convert every character to uppercase.

jiwer.RemoveKaldiNonWords() can be used to remove any word between [] and <> . This can be useful when working with hypotheses from the Kaldi project, which can output non-words such as [laugh] and .


Source



Your question is framed as though the most fruitful avenue for code improvement
lies in the area of modifying the style of iteration: maybe list comprehension,
maybe for-loop, maybe double list comprehension. To my eye, however, the biggest
problem with the code is its proliferation of similar variables. Whenever your
code has so many variables like this, it’s a sign that you should step back and
get the data better organized — either in the form of collections or data
objects.

As best I can tell you have 16 variables, which come in groups of four: number
of sentences, number of ok sentences, number of characters, and number of error
characters. You have four of those groups: regular, long, short, and not1. Each
of those groups behaves like a tally: as you loop through the data, you need to
compute a few values (size of y_true, correctness, and edit distance) and
then use those values to update one or more tallies. (Because you have not
given us much context, some of my terminology choices might be poor, but
hopefully you can understand the general point and modify the vocabulary
accordingly.)

One simple idea is to create a basic data object that can perform the
calculations needed for the updating process. This object is not super
important for the plan I’m suggesting, but it does help to move detail out of
the primary data-processing loop. That’s usually a good move: shift algorithmic
detail down to simple functions or classes, and leave the data-processing loop
focused on reading and looping.

import editdistance  # third-party package: pip install editdistance

class Update:

    def __init__(self, y_true, y_pred):
        self.size = len(y_true)
        self.is_correct = y_true.replace(' ', '') == y_pred.replace(' ', '')
        self.dist = editdistance.eval(y_true, y_pred)

More important is some type of data object to represent your ongoing tallies.
A Tally would have a type or kind, and it would know how to update its counts.

class Tally:

    REGULAR = 'regular'
    LONG = 'long'
    SHORT = 'short'
    NOT1 = 'not1'

    def __init__(self, kind):
        self.kind = kind
        self.nsent = 0
        self.nsent_ok = 0
        self.nchar = 0
        self.nchar_err = 0

    def update(self, u):
        self.nsent += 1
        self.nsent_ok += u.is_correct
        self.nchar += u.size
        self.nchar_err += u.dist

The updating portion of the data-processing loop would look like this:

reg = Tally(Tally.REGULAR)
long = Tally(Tally.LONG)
short = Tally(Tally.SHORT)
not1 = Tally(Tally.NOT1)

while loader.has_next():
    ...
    for y_true, y_pred in zip(batch.gt_texts, recognized):
        u = Update(y_true, y_pred)
        reg.update(u)
        if u.size != 1:
            not1.update(u)
        (short if u.size < long_threshold else long).update(u)

Notice that the deployment of simple data objects like Update
and Tally do more than improve code readability. They also
create handy vehicles for further additions to functionality.
For example, if you have Tally.nsent and Tally.nsent_ok
you can easily add a property to give you their difference
on demand. You can add this new behavior in a single place,
without having to modify multiple places in the
program where you might want to create, manage, and use
this additional attribute.

class Tally:

    @property
    def nsent_nok(self):
        return self.nsent - self.nsent_ok
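The pattern above runs end to end; here is a condensed, self-contained version (with a small pure-Python edit distance standing in for editdistance.eval, and a two-pair toy dataset in place of the data loader):

```python
def _edit_distance(a, b):
    """Stand-in for editdistance.eval: minimum character edits between strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        row = [i]
        for j, cb in enumerate(b, 1):
            row.append(min(prev[j] + 1, row[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = row
    return prev[-1]

class Update:
    def __init__(self, y_true, y_pred):
        self.size = len(y_true)
        self.is_correct = y_true.replace(' ', '') == y_pred.replace(' ', '')
        self.dist = _edit_distance(y_true, y_pred)

class Tally:
    def __init__(self, kind):
        self.kind = kind
        self.nsent = self.nsent_ok = self.nchar = self.nchar_err = 0

    def update(self, u):
        self.nsent += 1
        self.nsent_ok += u.is_correct
        self.nchar += u.size
        self.nchar_err += u.dist

    @property
    def nsent_nok(self):
        # derived attribute, computed on demand
        return self.nsent - self.nsent_ok

reg = Tally('regular')
for y_true, y_pred in [('hello', 'hello'), ('world', 'wrld')]:
    reg.update(Update(y_true, y_pred))
```

After the loop, reg.nsent is 2, reg.nsent_ok is 1, reg.nchar is 10, reg.nchar_err is 1 (the dropped 'o'), and the derived reg.nsent_nok is 1.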
