The only sensible thing Google turned up for this error was this message:
I replaced my code
$xml = simplexml_load_string(file_get_contents($filename));
with
$xml = simplexml_load_string(file_get_contents($filename), 'SimpleXMLElement', LIBXML_COMPACT | LIBXML_PARSEHUGE);
but it doesn't help.
The whole situation is strange, because the error behaves oddly. The XML I receive is all on one line, and when I parse it, libxml_get_errors() returns "Huge input lookup, line 1, column 7894566". Here the column is a character position. I find that character (there is nothing suspicious about it), and I insert a line break (Enter) before the tag that contains it and another one after the closing tag. So I end up with a three-line file:
- the line before the tag that contains the "problematic" character,
- the line with the opening tag, the text containing the "problematic" character, and the closing tag,
- the rest of the file on a single line.
Example:
...<offer id="43081" bid="11" available="true"><priceBase>167.6</priceBase><price>358.94</price><currencyId>UAH</currencyId>
<categoryId>000015</categoryId>
<Pack1>1</Pack1><Pack2>0</Pack2><delivery>false</delivery><local_delivery_cost>0</local_delivery_cost><name>Ручк...
I save it and parse it. The "problematic" character is no longer a problem; instead, the error now points into the third line. I do the same thing to the third line as described above, and again the problem moves further along, into the part of the file that remains after the inserted line breaks.
So by inserting line breaks between …><… I keep "pushing" the problematic character toward the end of the file. Clearly the problem here is not a specific character.
However, when I reached the last product and the "problematic" character was found in it, I put line breaks between all of that product's tags and got a perfectly normal-looking element:
<offer id="412081" bid="13" available="true">
<priceBase>1167.6</priceBase>
<price>758.94</price>
<currencyId>UAH</currencyId>
<categoryId>000015</categoryId>
<Pack1>1</Pack1>
<Pack2>0</Pack2>
<Pack3>1</Pack3>
<delivery>false</delivery>
<local_delivery_cost>0</local_delivery_cost>
<name>Ручка шар/масл "GLYCER+&ampquot дисплей-бокс, 244 шт микс цветов 0,7 мм &ampquotLINC"</name>
<vendor>LINC</vendor>
<description>Ручки Linc известны во всем мире качеством и долговечностью. Шарико-масляные чернила дают насыщенное письмо даже при минимальном нажиме, а ударопрочный пластик сохраняет ручку целой при падении. Игольчатый пишущий узел делает записи более аккуратными. Linc – это мягкое и чистое письмо ручкой. Без надавливания и усилий.</description>
<barcode>8906081050540</barcode>
<country_of_origin>Iндiя</country_of_origin>
</offer>
I still get the same error. It complains about the 17th character of the <description> line, which here is the letter "к". I see nothing wrong with this markup. If I simply delete the line with the <description>…</description> tag, parsing succeeds.
So the question is: what is this error actually about, and how do I get around it?
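The manual procedure described in the question (breaking the one-line document between tags, then looking up the character at the line and column that libxml reports) can be scripted. The Python sketch below is a hypothetical helper, not part of PHP or libxml; it assumes, as the poster observed, that the reported column counts characters rather than bytes. It only helps locate the spot the parser complains about and does not fix the parse error itself.

```python
from typing import Tuple


def split_between_tags(xml: str) -> str:
    """Put every "><" boundary on its own line, as was done by hand above.

    Caveat: this also rewrites "><" inside CDATA sections or comments, and it
    adds whitespace-only text nodes between elements.
    """
    return xml.replace("><", ">\n<")


def char_at(xml: str, line: int, column: int, context: int = 10) -> Tuple[str, str]:
    """Return the character at a 1-based (line, column) position and a snippet
    around it. Assumes the column is counted in characters, not bytes."""
    text = xml.split("\n")[line - 1]
    idx = column - 1
    return text[idx], text[max(0, idx - context): idx + context + 1]


doc = "<offer><name>Pen</name><description>Ручки Linc</description></offer>"
split = split_between_tags(doc)
print(split.split("\n")[2])   # <description>Ручки Linc</description>
print(char_at(split, 3, 17))  # ('к', 'iption>Ручки Linc</de')
```

As in the question, the 17th character of the <description> line is "к", which shows why the position alone says little: the reported spot is where the parser gave up, not necessarily a bad character.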
Contents
- Huge file simplexml_load_string() throwing warning of huge content in lookup #1371
- Comments
- Service Inventory:74: parser error : internal error: Huge input lookup #12
- Comments
- Huge input lookup when parsing (SAX) #2028
- Comments
- Huge input lookup error for large XLF files #5720
- Comments
Huge file simplexml_load_string() throwing warning of huge content in lookup #1371
It can be fixed by adding the LIBXML_PARSEHUGE constant on line 457 of phpexcel/Classes/PHPExcel/Reader/Excel2007.php.
A solution has also been given; could someone please fix this issue? I tested it on my local system with 2 lakh (200,000) records in an xlsx file, and the issue occurs only on Linux, not on Windows.
error:
Warning: simplexml_load_string(): Entity: line 2: parser error : internal error: Huge input lookup in /var/www/html/Projectsnew/Anil/sms/vendor/phpoffice/phpexcel/Classes/PHPExcel/Reader/Excel2007.php on line 457
Warning: simplexml_load_string(): दाबाद, पुणे, नाशिक, जयपूर वगळता) in /var/www/html/Projectsnew/Anil/sms/vendor/phpoffice/phpexcel/Classes/PHPExcel/Reader/Excel2007.php on line 457
Warning: simplexml_load_string(): ^ in /var/www/html/Projectsnew/Anil/sms/vendor/phpoffice/phpexcel/Classes/PHPExcel/Reader/Excel2007.php on line 457
Source
Service Inventory:74: parser error : internal error: Huge input lookup #12
It seems related:
The error is fixed by:
However, with the CCP stencils you pointed to, the result is "empty" SVGs.
In fact the SVGs are not empty: libvisio/librevenge places most of the content outside the image. For example, with "Actively Processing" the content is positioned at x="3456.0000" y="-756.0000", in an image with dimensions width="38.6615" height="38.7120", which is rather strange.
vss2xhtml (the tool from libvisio, which also generates SVG, but without handling EMF/WMF blobs) gives the same weird values.
Thanks for the fix. I will look into this in more detail tomorrow. I don't know these libraries well, but I understand in general what is wrong here. Thanks for the explanation.
Well, actually these stencils are the core of my design toolset, so let's try to find where the root cause lies.
There is not much I can do except open a bug in the libvisio Bugzilla.
This project is only a glue between:
- libvisio & librevenge
- libwmf
- libemf2svg (which I also maintain)
I’ve opened a ticket there:
Otherwise, even if it's not the most practical thing to do, you can rework the generated SVG in Inkscape or by editing the SVG file directly.
I see :-), so at least we have a workaround.
Source
Huge input lookup when parsing (SAX) #2028
Apologies if this has been submitted already. I have a problem parsing a big XML file that has a 20 MB base64-encoded file attached. It seems that parsing from IO when the content uses "\r\n" as line separators causes a "Huge input lookup" error.
Reproduced "successfully" with 1.10.9 and older versions.
Environment
@juskoljo Thanks for reporting, and sorry you’re having a problem. I’ll try to take a look later today.
Hi, any updates on this? 👍
@juskoljo Hi! Apologies for the slow response, it’s been hard for me to find time to work on OSS recently.
I’m having difficulty reproducing what you’re seeing, here’s what I get using the script you provided:
This is because the generated XML has multiple XML declarations.
I took a few minutes to rewrite the script to avoid multiple decls (as well as multiple roots) and still can’t reproduce. Here’s what I did:
which prints out four "ok"s.
Can you help me reproduce this? Or help me discover what I’m doing differently from you (or what’s different about my system)?
No worries! :). You are right, the first script generated something that was not my intention. Note to myself: never edit a script online while posting ;). The following script should reproduce the issue.
The script simulates a scenario where there is a big XML node (a base64-encoded file) with CR and/or LF after every 77 chars.
My nokogiri seems to be the same as yours except:
Hi Mike, any updates on this? The snippet in my previous post should fire the exception 👍
@juskoljo Thanks for your patience, and apologies for not replying sooner; your reply on May 31 fell through the cracks of my inbox (and I'm still struggling to find time for OSS).
I’ll take a look today, I have some time blocked out.
OK, I’ve explored this a bit and found something interesting. This script:
Here’s the call stack when the error is raised:
, data2=data2@entry=93824998074320) at eval.c:1128
#13 0x00007ffff34b9e73 in parse_with (self= , sax_handler=93824999974800) at ../../../../ext/nokogiri/xml_sax_parser_context.c:126
#14 0x00007ffff7c84eaa in vm_call_cfunc_with_frame (empty_kw_splat= , cd=0x555555bd7cc0, calling= , reg_cfp=0x7ffff7017ef8, ec=0x555555758600) at vm_insnhelper.c:2514
#15 vm_call_cfunc (ec=0x555555758600, reg_cfp=0x7ffff7017ef8, calling= , cd=0x555555bd7cc0) at vm_insnhelper.c:2539
#16 0x00007ffff7c904b6 in vm_sendish (block_handler= , method_explorer= , cd= , reg_cfp= , ec= ) at vm_insnhelper.c:4023
#17 vm_exec_core (ec=0x4d2, initial=140737488329608, initial@entry=0) at insns.def:801
#18 0x00007ffff7c96f3f in rb_vm_exec (ec=0x555555758600, mjit_enable_p=mjit_enable_p@entry=1) at vm.c:1929
#19 0x00007ffff7ca2310 in rb_iseq_eval_main (iseq=iseq@entry=0x5555557811a8) at vm.c:2179
#20 0x00007ffff7acf68a in rb_ec_exec_node (ec=ec@entry=0x555555758600, n=n@entry=0x5555557811a8) at eval.c:277
#21 0x00007ffff7ad6039 in ruby_run_node (n=0x5555557811a8) at eval.c:335
#22 0x000055555555496b in main (argc= , argv= ) at ./main.c:50
I’ve narrowed this down to what I think is a libxml2 edge case in parsing elements in xmlParseCharDataComplex. Will spend some more time on it this weekend.
OK, I’ve found the problem and I think it’s a bug in libxml2. I’ll write a brief description here, but will submit a bug report upstream and will think about patching Nokogiri’s vendored library in the meantime.
In brief: two things are happening simultaneously within libxml2/parser.c:
1. the text node is big enough to exceed XML_MAX_TEXT_LENGTH (which defaults to 10,000,000 bytes)
2. this bug triggers an edge case where the optimized path in xmlParseCharData passes control to xmlParseCharDataComplex, which is limited by XML_MAX_TEXT_LENGTH
If only one or the other of these happens, nobody notices:
- if the text node is smaller than XML_MAX_TEXT_LENGTH then there’s simply a performance penalty for triggering (2), but functionally it works fine (e.g., in your script if we repeat less than 128,000 times the text node comes in under this size)
- if the bug isn’t hit, then the optimized path is followed (e.g., in your script if n is 77 or 79 then the bug isn’t triggered and so the optimized path is followed and the memory limit isn’t triggered)
The bug is related to SAX parsing: by default libxml2 reads the document in chunks of xmlIO.c's MINLEN (which is 4,000 bytes). When, after a read, the first byte is 0x0A (aka \n), the xmlParseCharData function gives up and calls xmlParseCharDataComplex.
You can actually see this in action by making sure the first line of the node’s text is:
where the importance of 3894 is 4000-106, where 106 is the number of bytes occurring in the document before the node’s text begins. Any boom node in this document that is longer than 10,000,000 characters will fail if its 3895th character is a newline. CRAZYTOWN.
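The offset arithmetic above ("3894 is 4000-106") and the requirement that both conditions hold at once can be checked with a short script. This Python sketch is illustrative only: it does not call libxml2, it assumes (as the analysis states) that every read is exactly 4,000 bytes, and it assumes ASCII text so that bytes and characters coincide. The helpers boundary_offset, build_boom_doc and hits_edge_case are hypothetical names, and the 12-byte prefix is a made-up example rather than the 106-byte prefix of the original script.

```python
# Illustrative model of the two conditions described above; it does not call libxml2.
CHUNK = 4000           # xmlIO.c MINLEN as quoted above (assumes every read is exactly this size)
MAX_TEXT = 10_000_000  # default XML_MAX_TEXT_LENGTH as quoted above


def boundary_offset(prefix_len: int, chunk: int = CHUNK) -> int:
    """0-based index inside the text node of the first byte that starts a new read."""
    return (-prefix_len) % chunk


def build_boom_doc(prefix: bytes, text_len: int, newline_at: int) -> bytes:
    """Hypothetical layout: one ASCII text node of `text_len` bytes whose only
    newline sits at index `newline_at`."""
    text = bytearray(b"x" * text_len)
    text[newline_at] = 0x0A
    return prefix + bytes(text) + b"</boom></root>"


def hits_edge_case(doc: bytes, start: int, end: int, chunk: int = CHUNK) -> bool:
    """True when both conditions hold for the text in doc[start:end]: it is longer
    than MAX_TEXT, and some read of `chunk` bytes begins with a newline."""
    if end - start <= MAX_TEXT:
        return False
    first = start + boundary_offset(start, chunk)  # first read boundary in the text
    return any(doc[pos] == 0x0A for pos in range(first, end, chunk))


prefix = b"<root><boom>"  # 12 bytes precede the text here, versus 106 in the quoted script
print(boundary_offset(len(prefix)))  # 3988
print(boundary_offset(106))          # 3894, i.e. the 3895th character
```

With doc = build_boom_doc(prefix, MAX_TEXT + 1, boundary_offset(len(prefix))), the newline lands on byte 4000, the first byte of the second read, and hits_edge_case returns True. Moving the newline one byte later, or shortening the text to MAX_TEXT bytes, makes it return False. This mirrors the point that nobody notices when only one of the two conditions holds.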
Here's the patch that fixes this problem (note that it looks like the same bug exists in xmlParseComment):
Source
Huge input lookup error for large XLF files #5720
Cannot commit any new translations when XLF file grows to about 32MB.
Tested in versions from 3.10 to 4.5.1
I already tried
I searched for the problem and found that it is a limitation of lxml, which could be bypassed by specifying huge_tree=True when creating the XML parser, but I was not successful.
To Reproduce the issue
Steps to reproduce the behavior: Use XLF file larger than 32MB.
Exception traceback
In 4.5.1 I get only log lines like:
WARNING Translation parse error: XMLSyntaxError: internal error: Huge input lookup, line 379625, column 1 (, line 379625)
WARNING Failed to parse file on commit: FileParseError: internal error: Huge input lookup, line 379625, column 1 (, line 379625)
ERROR project/xxxxx/cs: skipping commit due to error: internal error: Huge input lookup, line 379625, column 1 (, line 379625)
Server configuration and status
Weblate installation: PyPI installed on Debian Buster amd64.
- Weblate: 4.5.1
- Django: 3.1.7
- siphashc: 1.3
- translate-toolkit: 3.3.3
- lxml: 4.4.2
- Pillow: 7.0.0
- bleach: 3.1.5
- python-dateutil: 2.8.1
- social-auth-core: 4.1.0
- social-auth-app-django: 4.0.0
- django-crispy-forms: 1.9.2
- oauthlib: 3.1.0
- django-compressor: 2.4
- djangorestframework: 3.11.0
- django-filter: 2.4.0
- django-appconf: 1.0.3
- user-agents: 2.0
- filelock: 3.0.12
- setuptools: 45.1.0
- jellyfish: 0.7.2
- openpyxl: 3.0.3
- celery: 4.4.7
- kombu: 4.6.11
- translation-finder: 2.9
- weblate-language-data: 2021.3
- html2text: 2020.1.16
- pycairo: 1.18.2
- pygobject: 3.34.0
- diff-match-patch: 20200713
- requests: 2.22.0
- django-redis: 4.11.0
- hiredis: 1.0.1
- sentry_sdk: 0.14.0
- Cython: 0.29.14
- misaka: 2.1.1
- GitPython: 3.0.5
- borgbackup: 1.1.10
- pyparsing: 2.4.7
- pyahocorasick: 1.4.1
- Python: 3.7.3
- Git: 2.20.1
- psycopg2-binary: 2.8.4
- phply: 1.2.5
- chardet: 3.0.4
- ruamel.yaml: 0.16.6
- tesserocr: 2.5.0
- boto3: 1.11.6
- zeep: 3.4.0
- aeidon: 1.6.0
- git-svn: 2.20.1
- Redis server: 5.0.3
- PostgreSQL server: 11.6
- Database backends: django.db.backends.postgresql
- Cache backends: default:RedisCache, avatar:FileBasedCache
- Email setup: django.core.mail.backends.smtp.EmailBackend: localhost
- OS encoding: filesystem=utf-8, default=utf-8
- Celery: redis://localhost:6379, redis://localhost:6379, regular
- Platform: Linux 4.19.0-6-amd64 (x86_64)
Source
lxml.etree.XMLSyntaxError: internal error: Huge input lookup #12
Comments
kmacbbrewin commented Sep 21, 2017
Hello,
First, this project has been very useful right out of the box for most of my sites. Thanks!
I am having one issue with a site I work with frequently, that prevents use of shareplum:
When trying to access a site with a very large user table, I get the below error from the GetUsers function on line 197 of shareplum.py. Seems like this may need an additional parser?
lxml.etree.XMLSyntaxError: internal error: Huge input lookup, line 45149, column 647
jasonrollins commented Sep 22, 2017
That’s an interesting problem. Are you using Python 2 or 3? How many users do you have for one site? I suppose we could include some check for a large response to the User table and try to break up the request into smaller chunks using the query functionality.
kmacbbrewin commented Sep 22, 2017
I am using Python 3.6. We have 50k+ items in the Userinfo table, and the function breaks at 45149.
Not quite sure what's special about that number. I am still kind of new to Python and tried to create a parser, but couldn't quite figure out how to implement it with Shareplum. How would you use the query functionality to break it up?
larrys commented Sep 22, 2017
I’ve been meaning to work on some pull requests with some fixes I’ve made over the last year of me using this.
jasonrollins commented Sep 22, 2017
It looks like Larry has a solution.
Kenny, would it be possible for you to try it out? I don’t have access to any tables this large to try it out.
Larry, can you submit a pull request? Does the large xml parser run slower?
larrys commented Sep 25, 2017
I’ve created #13 that will make it configurable by the user if they want to use huge_tree or not. I’ve not done any profiling, so I updated my code to make it user driven, and default to existing behavior and not use it.
kmacbbrewin commented Sep 25, 2017
I tried these changes, but now the error is different. These changes still work on sites with a small user table, but still not on a large one. I am getting a KeyError on ImnName. What is this field? I looked at the schema for UserInfo, and I don’t see a similar named field.
Here’s the error now:
File "/checklist_env/lib/python3.6/site-packages/shareplum/shareplum.py", line 204, in
return <'py': ,
KeyError: 'ImnName'
larrys commented Sep 26, 2017
Are you able to run this in something that has a debugger, put a breakpoint on that line, and see what the data looks like in the data variable? My list of users has an ImnName field.
kmacbbrewin commented Sep 26, 2017
Looks like it depends on which version of SharePoint is being used: in older versions the field is simply "Name"; in newer versions, it's "ImnName".
I was able to create a connection to the problem site and create a list. Thanks for your help!
Now, to figure out the update list items and update list functionality.
Source
BUG: read_xml not support large file #45442
Comments
wangrenz commented Jan 18, 2022
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
lxml.etree.XMLSyntaxError: internal error: Huge input lookup, line 61504, column 702
Expected Behavior
read_xml needs to add huge_tree=True in pandas/io/xml.py.
Installed Versions
INSTALLED VERSIONS
commit : f00ed8f
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1127.13.1.el7.x86_64
Version : #1 SMP Tue Jun 23 15:46:38 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.0
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.2
pip : 21.1.3
setuptools : 52.0.0.post20210125
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.22.0
pandas_datareader: None
bs4 : None
bottleneck : 1.3.2
fsspec : 2021.07.0
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : 2.7.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.0
sqlalchemy : None
tables : None
tabulate : None
xarray : 0.19.0
xlrd : None
xlwt : None
numba : None
ParfaitG commented Jan 23, 2022
Thanks @wangrenz! This is a great use case. Large XML file support was in the works for both lxml and etree parsers (see #40131). We may not need an additional argument, but could have read_xml catch this exception and attempt the lxml workaround.
How large was your XML file? Can you post a reproducible example of its content (redact as needed)?
ParfaitG commented Jan 25, 2022
Using the latest Wikipedia pages-articles dump, a 2.7 GB bzip2 file that decompresses to 12.4 GB, I am unable to raise your lxml error. In fact, on my Ubuntu laptop with 8 GB of RAM the process was killed during a read_xml attempt.
However, iterparse worked great for both lxml and etree which can be an approach to use for large XML files in read_xml since the entire tree is not read at once and you can read all elements one by one and even delete after use to avoid growing the enormous tree. See below implementation.
The idea is users pass in an iterparse_items dict parameter where the key will be the repeating element in document and value will be the list of any descendant or attribute located anywhere under the repeating element. Using this argument will be in lieu of default xpath parsing and works for users who need tags or attributes in heavily nested XML documents without relation to each other but as descendants to repeating element.
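The iterparse approach described above can be sketched with the standard library alone. iter_rows and its items argument are hypothetical, modelled on the proposed iterparse_items dict (repeating element mapped to the descendants or attributes to collect); this is not pandas' actual implementation. Note that xml.etree.ElementTree parses with expat rather than libxml2, so libxml2's XML_MAX_TEXT_LENGTH does not apply to it at all.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List


def iter_rows(source, items: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Stream `source` and yield one dict per repeating element.

    `items` maps the repeating element's tag to the descendant element names
    or attribute names to collect (hypothetical, modelled on iterparse_items).
    """
    row_tag, wanted = next(iter(items.items()))
    row: Dict[str, str] = {}
    inside = False
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == row_tag:
                inside, row = True, {}
            if inside:
                # attributes are already available on "start" events
                for name in wanted:
                    if name in elem.attrib and name not in row:
                        row[name] = elem.attrib[name]
        elif inside:  # an "end" event inside a repeating element
            if elem.tag == row_tag:
                yield row
                inside = False
                # free the finished row's children and text; the emptied
                # element itself stays attached to its parent
                elem.clear()
            elif elem.tag in wanted and elem.tag not in row:
                row[elem.tag] = (elem.text or "").strip()


doc = b"""<catalog>
  <offer id="1" available="true"><name>Pen</name><vendor>LINC</vendor></offer>
  <offer id="2" available="false"><name>Pencil</name></offer>
</catalog>"""
rows = list(iter_rows(io.BytesIO(doc), {"offer": ["id", "name", "vendor"]}))
print(rows)  # [{'id': '1', 'name': 'Pen', 'vendor': 'LINC'}, {'id': '2', 'name': 'Pencil'}]
```

Because each row is yielded and then cleared, the whole tree is never held in memory at once, which is the property that makes this approach workable for multi-gigabyte files. The design trades XPath flexibility for a flat "one dict per repeating element" shape.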
Source
Huge input lookup error for large XLF files #5720
Comments
mclei-asw commented Mar 22, 2021
Source
ERROR: lxml.etree.XMLSyntaxError: internal error: Huge input lookup about codeclimate-cppcheck
Comments (1)
Hi, I’m cppcheck contributor.
It seems that the error is not in cppcheck. We don't use lxml internally; instead, we use the xml module from the standard library in addons. So I suppose this is a codeclimate issue.
Source
lxml.etree.XMLSyntaxError: internal error: Huge input lookup about shareplum
Comments (14)
kmacbbrewin commented on January 12, 2023
Looks like it depends on which version of SharePoint is being used. In older versions the field is simply «Name» in newer versions, it’s «ImnName».
I was able to create a connection to the problem site and create a list. Thanks for your help!
Now, to figure out the update list items and update list functionality.
Have either of you solved for updating a list (e.g. adding columns and specifying type)? I believe that is part of the updatelist SharePoint api, but I see that was still a Todo item.
kmacbbrewin commented on January 12, 2023
Also, a different question: what's the preference/etiquette on this project? Should I close the issue now, or should we wait until the pull request is complete?
jasonrollins commented on January 12, 2023
I don’t have any desired etiquette. Thanks for your interest. I have not had much of a chance to work on it for some time. That will hopefully change soon. You can just leave it open for now as a reminder for me until I merge in a fix. Which versions of SharePoint are you using SharePlum with?
kmacbbrewin commented on January 12, 2023
I have successfully tested it on SP 2010 and SP2013 on prem servers. I may also try to use it on an O365 instance, but I think the authentication is different.
ldacey commented on January 12, 2023
The fix from larrys worked for me as well. I manually edited the shareplum.py file.
mahesh557 commented on January 12, 2023
Thanks, Larry. It worked for me.
Replaced all envelope = etree.fromstring(response.text.encode('utf-8'))
with envelope = etree.fromstring(response.text.encode('utf-8'), parser=etree.XMLParser(huge_tree=True))
jasonrollins commented on January 12, 2023
I’m closing this for now. Please open another issue if it still isn’t working.
Source