Text run is not in Unicode Normalization Form C: how to fix it

16.05.2022, 10:52. Views: 809. Replies: 4

Good day.
The validator reports an error: Text run is not in Unicode Normalization Form C
The error is on this line:

HTML5

<p><a href="https://www.youtube.com/watch?v=W4MIiV4nZDY&ab_channel=BogdanStashchuk">Полный Курс HTML Для Начинающих. Ссылка <b>на очень понятный</b> видеоурок по теме</a></p>

Here is a link to the site: http://g91697u1.beget.tech/

Or here is the full code of the page:

HTML5

<!doctype html>
<html lang="ru">
<head>
    <meta charset="UTF-8">
    <title>Белки</title>
</head>
<body>
    <h1>Белок – основа жизни на планете, ее основной элемент.</h1>
    <h2>Белки — это углеродные вещества, состоящие из цепочки аминокислот.</h2>
    <h3>Белки состоят из следующих химических элементов: углерод (50-55%), кислород (20-23%), азот (12-19%), водород (6-7%), сера (0,2-3,0%).</h3>
    <h4>Другое название белков – протеины – было выбрано неслучайно, т.к. с греческого слово «proteois» переводится как «первостепенной важности».</h4>
    <h5>Белки подразделяются на: животные (мясо, рыба, птица, молочные продукты) и растительные (орехи, соя, горох, фасоль). При этом на животные белки должно приходится около 60%.</h5>
    <h6>Взрослый человек на 60 % состоит из воды, на 19 % из белков, на 15 % из жиров, 5 % в нем минеральных веществ и 1 % углеводов.</h6>
    <p>Аминокислоты - это строительный материал для всех белков в организме, из которых образуются мышцы, сухожилия, связки, кожа, волосы. В фитнесе и бодибилдинге они необходимы для повышения эффективности тренировок и наращивания мышечной массы. Аминокислоты помогают быстро восстановиться и избавиться от болей после интенсивных занятий.</p>
    <img src="https://bud-v-forme.ru/upload/_resize/780_430/5312_900x420.jpg"/>орлр
    <p><a href="https://www.youtube.com/watch?v=W4MIiV4nZDY&ab_channel=BogdanStashchuk">Полный Курс HTML Для Начинающих. Ссылка <b>на очень понятный</b> видеоурок по теме</a></p>
</body>
</html>

What is wrong here? Please advise.
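One way to see which characters trigger the warning is to compare each character cluster with its NFC form. The sketch below is only an illustration: `find_non_nfc` is a hypothetical helper name, and the sample string (a Cyrillic и followed by a combining breve, which renders like й) is an assumed cause, not something confirmed for the page above. It inspects each base character together with its trailing combining marks, so it can miss rarer cases such as Hangul jamo sequences.

```python
import unicodedata


def find_non_nfc(text):
    """Return (line, column, character names) for each cluster that NFC would change."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        col = 0
        while col < len(line):
            # Group a base character with any combining marks that follow it.
            end = col + 1
            while end < len(line) and unicodedata.combining(line[end]):
                end += 1
            cluster = line[col:end]
            if unicodedata.normalize("NFC", cluster) != cluster:
                names = [unicodedata.name(ch, "UNKNOWN") for ch in cluster]
                hits.append((line_no, col + 1, names))
            col = end
    return hits


# Assumed sample: "и" + COMBINING BREVE instead of the precomposed "й".
sample = "<p>\u0438\u0306</p>"
for hit in find_non_nfc(sample):
    print(hit)
```

Running this over the page source shows the line and column of each decomposed sequence, which can then be retyped or normalized.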


Source

The href attribute of an <a> element contains an invalid character, that should be properly encoded as a URI percent-encoded character.

The src attribute on an <img> element contains an invalid character, that should be properly encoded as a URI percent-encoded character.

The accept attribute may be specified to provide browsers with a hint of what file types will be accepted on an <input> element. It expects a comma-separated list of allowed file types. Refer to the list of media types to check the accepted tokens. In this example, the first line is invalid while the second is valid:

<input name='file' type='file' accept='doc, docx, pdf' />

<input name='file' type='file' accept='text/doc, text/docx, application/pdf' />


Space characters are not allowed in href attributes. Instead, they should be converted to %20. In this example, the first line is invalid and the second is valid:

<a href="https://example.com#some term">invalid</a>
<a href="https://example.com#some%20term">valid</a>

The href attribute on an <a> tag contains a space, which is not allowed. Consider replacing space characters with “%20”.
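If the URLs are generated by a program, the encoding can be done there. A minimal sketch, assuming Python and the standard `urllib.parse.quote` function; the variable names are made up for illustration:

```python
from urllib.parse import quote

base = "https://example.com"
fragment = "some term"

# quote() percent-encodes the space as %20.
href = base + "#" + quote(fragment)
print(href)
```

Only the part that contains the raw text is passed to `quote`, so the scheme and host are left untouched.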


Space characters are not allowed in src attributes. Instead, they should be converted to %20. In this example, the first line is invalid and the second is valid:

<img src="https://example.com/?s=some term" alt="description" />
<img src="https://example.com/?s=some%20term" alt="description" />

The src attribute on an <img> tag is not allowed to contain space characters. You should replace them with “%20”.

An HTML tag could not be parsed, most probably because of a typo.

A character has been found in the document that is not allowed in the charset encoding being used.


Source

emf

My webpage:

https://files.nyu.edu/emf202/public/fr/limericks.html

checks OK with the W3C Validation Service as HTML5, but it triggers 8
warnings that have to do with the use of the following Greek characters:

· U+0387 Greek Ano Teleia
ς U+03C2 Greek Small Letter Final Sigma
ά U+03AC Greek Small Letter Alpha With Tonos
ί U+03AF Greek Small Letter Iota With Tonos

These are all basic characters of the Greek alphabet and
non-replaceable; they trigger, however, the warning

«Text run is not in Unicode Normalization Form C.»

Can somebody explain to me what this means?

Thanks,

emf

Jukka K. Korpela

This is explained fairly well at
http://stackoverflow.com/questions/5465170/text-run-is-not-in-unicode-normalization-form-c
As I remark in a comment that I added there now, the message used to be
an error, but it was changed to a warning after the discussion started by
http://lists.w3.org/Archives/Public/www-validator/2011May/0031.html

See also
http://stackoverflow.com/questions/8766675/normalizing-unicode-according-to-the-w3c-in-php

So it is not about conformance to HTML5 (which is a vague concept as
such, since HTML5 is mutable) but about general opinions of the W3C on
normalization.

In this case, the warning about GREEK ANO TELEIA is understandable,
since that character has a canonical decomposition to U+00B7 MIDDLE DOT,
so in normalization to Normalization Form C (NFC), GREEK ANO TELEIA is
replaced by MIDDLE DOT. This is an example of the inadequacy of NFC in many
situations: these characters have different glyphs in many fonts, and
GREEK ANO TELEIA is, not surprisingly, usually a much better choice for
the Greek punctuation mark (it usually sits around the x-height, whereas
the middle dot tends to be considerably lower).
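The replacement can be checked with a few lines of Python. This is only a sketch using the standard `unicodedata` module; the variable names are chosen for illustration:

```python
import unicodedata

ano_teleia = "\u0387"  # GREEK ANO TELEIA
normalized = unicodedata.normalize("NFC", ano_teleia)
print(unicodedata.name(ano_teleia), "->", unicodedata.name(normalized))

# The other three characters from the report are compared with their NFC forms.
others = ["\u03c2", "\u03ac", "\u03af"]
unchanged = [ch for ch in others if unicodedata.normalize("NFC", ch) == ch]
print(len(unchanged) == len(others))
```

The first line of output shows ano teleia turning into MIDDLE DOT, while the final sigma and the two letters with tonos come back unchanged.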

Regarding the other warnings, I really don’t understand. The characters
seem to be in NFC. And for example, on line 409, there are two
occurrences of GREEK SMALL LETTER ALPHA WITH TONOS, but only the latter
has been flagged. I thought it might relate to the position (at the end
of the line), but the next line contains the same character at the end
of the line, with no warning issued.

So this seems to be a bug in the validator.

The bad thing is that we cannot tell which of the warnings are real, in
the sense that the text is actually not NFC. The Greek alpha with tonos
*could* be written as decomposed, and in general it would be best to
avoid that, and especially to avoid mixing precomposed and decomposed
forms in the same document, as they *might* get rendered differently
(and the precomposed one then probably renders better).

Jukka K. Korpela

2013-04-04 9:44 said:

> Regarding the other warnings, I really don’t understand.

Now I do.

> The characters seem to be in NFC.

They are. All the warnings about something not being NFC are caused by
GREEK ANO TELEIA. The validator just misrepresents the location by
highlighting, in red, the last letter of a line, no matter where the
issue is on that line.

> So this seems to be a bug in the validator.

Well, it’s a bug in highlighting — and in counting, since it says
«Validation Output: 8 Warnings» but shows only 6 warnings (and there are
6 occurrences of GREEK ANO TELEIA on the page).

So, ignore the warnings.

Jukka K. Korpela

2013-04-04 10:07 said:

> Well, it’s a bug in highlighting — and in counting, since it says
> «Validation Output: 8 Warnings» but shows only 6 warnings (and there are
> 6 occurrences of GREEK ANO TELEIA on the page).

I have submitted a bug report:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=21577

emf

Thanks for your explanations. BTW, the Greek Ano Teleia is not
equivalent to the Middle Dot, not in reality, but somehow it was
considered so and it has been impossible to change it with the
authorities, though it has been tried.

In Greek grade school you learn to put the ano teleia at the same height
as the upper dot of the colon or the dot of the Greek question mark,
which is like the Latin semicolon. Some old fonts still place it there,
though newer ones misplace it lower, following the wrong official
guidelines. See discussion at
https://bugs.freedesktop.org/show_bug.cgi?id=31285 by a Greek university
professor.

This is not the only misadventure of ano teleia: When they decided on
the Greek computer keyboard, they forgot (!) to include it, and so it
still is not included. Eventually I found and installed a small program
that permits me to use it with a key combination; it may look like the
middle dot, but it’s better than nothing.

Unfortunately, once things get established, it’s difficult to change
them, though I imagine that at one point it happens, unless ano teleia
is deprecated in the Greek grammar after long years of limited use
because of its problematic use in computers, despite the insistence of
some like me to keep using it when appropriate.

emf

Jukka K. Korpela

> BTW, the Greek Ano Teleia is not
> equivalent to the Middle Dot, not in reality, but somehow it was
> considered so and it has been impossible to change it with the
> authorities, though it has been tried.

Yes, that’s what I meant. It’s a different character, but it was unified
(in terms of canonical equivalence). There has been a lot of criticism
on Unicode unification, and this is a particularly striking example. But
it’s too late to change that. NFC has been carved into stone. There is a
large amount of software that relies on NFC as currently defined. Or at
least that’s what the Unicode Consortium thinks.

> In Greek grade school you learn to put the ano teleia at the same height
> as the upper dot of the colon or the dot of the Greek question mark,
> which is like the Latin semicolon. Some old fonts still place it there,
> though newer ones misplace it lower, following the wrong official
> guidelines.

Well yes, the problem is that once MIDDLE DOT has been defined as a
strongly polysemic symbol, its design in fonts needs to be tolerable for
many uses, implying that it won’t be really *good* for anything. It’s
rather similar to HYPHEN-MINUS in this respect, except that instead of
HYPHEN-MINUS we can use, between consenting adults at least,
semantically much more accurate characters like HYPHEN, NON-BREAKING
HYPHEN, EN DASH, MINUS SIGN, etc.

> This is not the only misadventure of ano teleia: When they decided on
> the Greek computer keyboard, they forgot (!) to include it, and so it
> still is not included.

Tragicomically, MIDDLE DOT cannot be conveniently typed on most
keyboards either, and it is not used much. But when used, it might be
used in the original meaning (as in Catalan), or as raised decimal point
(as in British usage), or as multiplication dot (instead of the more
correct DOT OPERATOR), etc. etc.

> Eventually I found and installed a small program
> that permits me to use it with a key combination; it may look like the
> middle dot, but it’s better than nothing.

You can use GREEK ANO TELEIA in HTML. Browsers won’t punish you. It’s
just a W3C opinion that it should not be used. Even though canonical
equivalence is supposed to mean identity of rendering, the reality is
different. Canonically equivalent characters may have different glyphs.

In theory, you could use MIDDLE DOT and some CSS to suggest that it be
rendered using a suitable glyph variant. Modern browsers generally
support OpenType features and let you specify such things, though IE 9
and older don’t get such things. But the main problem is that most fonts
commonly available on people’s computers, as well as most free fonts
that you could use as downloadable fonts, have limited or no OpenType
features.

Source

While trying to validate my site (http://dvartora.com/DvarTora/), I get the following error:

Text run is not in Unicode Normalization Form C

A: What does this mean?

B: Can I fix it with Notepad++, and how?

C: If B is not possible, how can I fix it with free tools (not Dreamweaver)?

Answer 1

A. It means what it says (see dan04's explanation for a short answer and the Unicode Standard for a long one), but it merely indicates that the authors of the validator wanted to issue a warning. The HTML5 rules do not require Normalization Form C (NFC); it is rather something generally favored by the W3C.

B. There is no need to fix anything, unless you decide that using NFC would in fact be better. If so, there are various tools for automatic conversion to NFC, such as the free BabelPad editor. If you only need to deal with a single character that is not in NFC, you can use character information repositories, such as the Fileformat.info character search, to look up the canonical decomposition of the character and use that instead.

Whether you use NFC or not depends on many considerations and on the characters involved. As a rule, NFC works better, but in some cases an alternative, non-NFC representation gives a more suitable rendering or works better in some specific processing.

For example, in a duplicate question, a character reference for “Ω” was reported as triggering the message. (The validator actually checks characters entered as such references too, rather than only checking NFC at the plain-text level.) The reference means U+2126 OHM SIGN “Ω”, which is defined as canonically equivalent to U+03A9 GREEK CAPITAL LETTER OMEGA “Ω”. The Unicode Standard explicitly says that the latter is preferred. It is also better covered by fonts. But if you have a special reason to use OHM SIGN, you can do so without violating the current HTML5 rules, and you can ignore the validator's warning.
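The effect of a character reference can be reproduced outside the validator. The sketch below is an illustration only: `reference_is_nfc` is a hypothetical helper, and it assumes the reference was written in decimal form (`&#8486;` is the decimal value of U+2126). It expands the reference with the standard `html.unescape` function and then compares the result with its NFC form.

```python
import html
import unicodedata


def reference_is_nfc(reference):
    """Expand an HTML character reference and report whether the result is in NFC."""
    text = html.unescape(reference)
    return unicodedata.normalize("NFC", text) == text


print(reference_is_nfc("&#8486;"))  # U+2126 OHM SIGN
print(reference_is_nfc("&#937;"))   # U+03A9 GREEK CAPITAL LETTER OMEGA
```

The first call reports that OHM SIGN is not in NFC, while the reference to the Greek capital omega passes.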

Answer 2

What does this mean?

From the W3C:

In Unicode it is possible to produce the same text with different sequences of characters. For example, take the Hungarian word világ. The fourth letter could be stored in memory as a precomposed U+00E1 LATIN SMALL LETTER A WITH ACUTE (a single character) or as a decomposed sequence of U+0061 LATIN SMALL LETTER A followed by U+0301 COMBINING ACUTE ACCENT (two characters).

világ = világ

The Unicode Standard allows either of these alternatives, but requires that both be treated as identical. To improve efficiency, an application will usually normalize text before performing searches or comparisons. Normalization, in this case, means converting the text to use all precomposed or all decomposed characters.

There are four normalization forms specified by the Unicode Standard: NFC, NFD, NFKC and NFKD. The C stands for (pre-)composed, and the D for decomposed. The K stands for compatibility. To improve interoperability, the W3C recommends the use of NFC normalized text on the Web.

Besides "improving interoperability", precomposed text generally looks better than decomposed text.
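The difference between the two spellings of világ, and the effect of the four forms, can be sketched in Python with the standard `unicodedata` module. The variable names are chosen for illustration:

```python
import unicodedata

precomposed = "vil\u00e1g"   # á as one code point (U+00E1)
decomposed = "vila\u0301g"   # a followed by U+0301 COMBINING ACUTE ACCENT

# A plain comparison works on code points, so the two spellings differ.
print(precomposed == decomposed)

lengths = {}
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    a = unicodedata.normalize(form, precomposed)
    b = unicodedata.normalize(form, decomposed)
    lengths[form] = len(a)
    # After normalizing both to the same form, they compare equal.
    print(form, len(a), a == b)
```

The composed forms give five code points and the decomposed forms six; within any one form the two spellings become identical.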

How can I fix it with free tools

By using the equivalent of Python's text = unicodedata.normalize('NFC', text) in your favorite programming language.

(Or, if you don't plan on writing a program, your question should be moved to Super User or Webmasters.)
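For a single HTML file, that one-liner can be wrapped in a small script. This is a minimal sketch, not a definitive tool: `normalize_file_to_nfc` is a hypothetical helper name, and it assumes the file is encoded in UTF-8 unless told otherwise. It works on bytes so that line endings are left exactly as they were.

```python
import unicodedata
from pathlib import Path


def normalize_file_to_nfc(path, encoding="utf-8"):
    """Rewrite the file in NFC. Return True if the content was changed."""
    path = Path(path)
    original = path.read_bytes().decode(encoding)
    normalized = unicodedata.normalize("NFC", original)
    if normalized == original:
        return False
    path.write_bytes(normalized.encode(encoding))
    return True


# Example call (hypothetical file name):
# normalize_file_to_nfc("index.html")
```

Keep a backup before running it: NFC also replaces characters such as GREEK ANO TELEIA and OHM SIGN with their canonical equivalents, which may or may not be what you want.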

Source

It is up to you to decide, based on the purpose and nature of your application, whether to apply normalization when reading user input, when storing it in a database, when writing it out, or not at all. To summarize the long thread mentioned in the comments to the question, which is also available in the official list archive at http://validator.w3.org/feedback.html:

The warning message comes from the experimental "HTML5 check" (which is really a linter, applying subjective rules in addition to some formal tests).
The message is not based on any requirement in the HTML5 drafts, but on opinions about what might cause problems in some software.
Those opinions originally made the "HTML5 check" issue an error message; it is now a warning.

It is certainly possible, though unusual, to receive unnormalized data as user input. This does not depend on normalization performed by browsers (they do not do such things, though they might in the future) but on input methods and habits. For example, methods for entering the letter ü (u umlaut, or u with diaeresis) tend to produce the character in precomposed form, that is, normalized. People can produce it unnormalized, in decomposed form, as the letter u followed by a combining diaeresis, but they usually have no reason to do so, and most people do not even know how.

If you perform string comparisons in your software, they may or may not (depending on the comparison routines used) treat, for example, a precomposed ü as equal to its decomposed representation. Simple implementations treat them as different, since they definitely differ at the simple character level (Unicode code points).
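A minimal sketch of such a comparison in Python, assuming you have decided that canonically equivalent strings should match; `nfc_equal` is a hypothetical helper name:

```python
import unicodedata


def nfc_equal(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00fc"   # ü as a single code point
decomposed = "u\u0308"   # u followed by COMBINING DIAERESIS

print(precomposed == decomposed)           # plain code point comparison
print(nfc_equal(precomposed, decomposed))  # comparison after normalization
```

The plain comparison treats the two as different, while the normalized comparison treats them as equal. The stored data is not modified, so the choice of whether to normalize on input or on output stays open.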

One reason to normalize at some point, at the latest at the writing stage, is that precomposed characters generally render more reliably. To render a normalized ü, a program just has to pick up a glyph from a font. To render a decomposed ü, a program must either recognize it as canonically equivalent to the normalized ü, or draw the letter u with a suitable diacritic placed above it, with due attention to the graphic properties of the glyph for u, and many programs fail at this.

On the other hand, in the rare cases where unnormalized data is received as user input, the user may well have a reason for producing it. They might have the idea that the normalized ü and the unnormalized ü are different and should be treated as such.

Source
