Error end of file expected line

I am currently trying to grab text from a PDF that is already uploaded and accessed through a link by using PDFBox and Selenium.
I used this as a source: http://www.seleniumeasy.com/selenium-tutorials/how-to-extract-pdf-text-and-verify-using-selenium-webdriver-java

public String function(String pdf_url) {
    PDFTextStripper pdfStripper = null;
    PDDocument pDoc;
    COSDocument cDoc;
    String parsedText = "";
    try {
        URL url = new URL(pdf_url);
        BufferedInputStream file = new BufferedInputStream(url.openStream());
        PDFParser parser = new PDFParser(file);
        parser.parse();
        cDoc = parser.getDocument();
        pdfStripper = new PDFTextStripper();
        pdfStripper.setStartPage(1);
        pdfStripper.setEndPage(1);

        pDoc = new PDDocument(cDoc);
        parsedText = pdfStripper.getText(pDoc);

    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

    return parsedText;
}

Error: End-of-File expected line
at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1519)
at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:372)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:186)
at scripts.Script.grabPDF_Text(Script.java:94)
at scripts.Script.main(Script.java:817)

Why am I getting this error?

asked Jun 20, 2018 at 17:28

stevek

Here is the example you asked for, using a PDF URL:

String PDFURL = "https://www.adobe.com/support/products/enterprise/knowledgecenter/media/c4611_sample_explain.pdf";
function(PDFURL);

public String function(String pdf_url)
{
 //Exact same code as yours
}

To read the PDF from a local file instead, replace the URL and BufferedInputStream with:

 File file = new File(pdf_url);
 PDFParser parser = new PDFParser(new FileInputStream(file));

Hope this helps.

answered Jun 20, 2018 at 20:19

Prany

Check whether the file is 0 KB in size.
Alternatively, try loading it with:
try (final PDDocument document = PDDocument.load(file, MemoryUsageSetting.setupTempFileOnly())) {

answered Feb 25, 2020 at 11:48

Shahid Hussain Abbasi
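
The empty-file check suggested above can be done before the bytes ever reach the PDF parser. A minimal sketch, assuming the PDF arrives as an InputStream (for example from url.openStream() in the question); the class name PdfInputGuard and the method readNonEmpty are hypothetical names for illustration, not part of PDFBox:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not a PDFBox class.
final class PdfInputGuard {

    /** Reads the whole stream into memory and rejects an empty (0-byte) result. */
    static byte[] readNonEmpty(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // read() returns -1 once the end of the stream is reached
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        if (buffer.size() == 0) {
            throw new IOException("Stream is empty (0 bytes); nothing to parse");
        }
        return buffer.toByteArray();
    }
}
```

The returned array can then be handed to the parser; in PDFBox 2.x, PDDocument.load accepts a byte array. Failing early this way gives a clearer message than the parser's own end-of-file error.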

Summary of your issue

When attempting to read_pdf I get the following error:

Error: Error: End-of-File, expected line
Error:
Traceback (most recent call last):
  File "C:\Users\orang\Desktop\westpac webscrapetesting00000000.py", line 8, in <module>
    df = tabula.read_pdf(url, multiple_tables=True, pages='all', lattice=True, guess=False)
  File "C:\Users\orang\AppData\Local\Programs\Python\Python36\lib\site-packages\tabula\wrapper.py", line 108, in read_pdf
    output = subprocess.check_output(args)
  File "C:\Users\orang\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "C:\Users\orang\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-Dfile.encoding=UTF8', '-jar', 'C:\\Users\\orang\\AppData\\Local\\Programs\\Python\\Python36\\lib\\site-packages\\tabula\\tabula-1.0.2-jar-with-dependencies.jar', '--pages', 'all', '--format', 'JSON', '--lattice', '20532.pdf']' returned non-zero exit status 1.

Environment

Python 3.6.7
tabula-py 1.3.1
Windows 10

Write and check your environment. Please paste outputs of specific commands if required.

paste output of tabula.environment_info()

Python version:
3.6.7 (v3.6.7:6ec5cf24b7, Oct 20 2018, 13:35:33) [MSC v.1900 64 bit (AMD64)]
Java version:
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) Client VM (build 25.211-b12, mixed mode)
tabula-py version: 1.3.1
platform: Windows-10-10.0.17763-SP0
uname:
uname_result(system='Windows', node='LAPTOP-ANGUS', release='10', version='10.0.17763', machine='AMD64', processor='Intel64 Family 6 Model 94 Stepping 3, GenuineIntel')
linux_distribution: ('', '', '')
mac_ver: ('', ('', '', ''), '')

Providing PDF would be really helpful to resolve the issue.

  • Your PDF URL:
    https://www.suncorp.com.au/content/dam/suncorp/bank/documents/product-information/home-lending-rate-guide.pdf

Provide your information to reproduce the issue.

Code:

import tabula
import pandas as pd

# suncorp
url = 'https://www.suncorp.com.au/content/dam/suncorp/bank/documents/product-information/home-lending-rate-guide.pdf'
df = tabula.read_pdf(url, multiple_tables=True, pages='all', lattice=True, guess=False)
print(df)

Expected behavior:

Return a dataframe as it does correctly with another URL I am using: https://www.commbank.com.au/content/dam/commbank/personal/apply-online/download-printed-forms/home-loan-update-002842.pdf

Actual behavior:

Error

This is my first time using the tabula-py library, so I apologise if I missed something obvious or am doing something wrong.


java.io.IOException: Error: End-of-File, expected line


posted 10 years ago


Hi,

java.io.IOException: Error: End-of-File, expected line

Any ideas?

I am creating it from the string "crap".

Thanks.


posted 10 years ago


Just like in your other thread, there’s not enough information. You need to TellTheDetails, ideally with an SSCCE.

The error message is pretty descriptive. There’s not much more anybody here can say without some more context from you.

John Landon



posted 10 years ago


Jeff Verdegan



posted 10 years ago


John Landon wrote:

So I assume it’s the load() line, not the println() line that’s giving the error?

Just as a wild guess, whatever PDDocument is, its format is probably not just any old arbitrary text like a plaintext file would be. So the string "crap" is probably not parseable according to that file format.
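
The guess above can be checked directly: a PDF file begins with the signature "%PDF-", and the bytes of the string "crap" do not. A minimal sketch of a strict signature check; PdfHeaderCheck and hasPdfHeader are hypothetical names for illustration, and real parsers may be more lenient about where the signature appears:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any PDF library.
final class PdfHeaderCheck {
    // Every PDF file starts with this ASCII signature, followed by a version number.
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true only if the data starts with the "%PDF-" signature. */
    static boolean hasPdfHeader(byte[] data) {
        if (data == null || data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The string from the thread is too short and has the wrong bytes.
        System.out.println(hasPdfHeader("crap".getBytes(StandardCharsets.US_ASCII)));       // false
        System.out.println(hasPdfHeader("%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII))); // true
    }
}
```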



Why the error occurs

The unexpected end of file error appears when the code contains syntax errors:

<?php
if(1 > 0) {

A missing closing curly brace produces this error:

Parse error: syntax error, unexpected end of file in
D:\Programs\OpenServer\domains\test.local\index.php on line 2

How to fix the error

Most often the error comes from a mismatch between the number of opening and closing curly braces. Sometimes the brace problem is a consequence of another error: for example, the short tag <? is used somewhere in the code while short tags are disabled on the server.

There are two main ways to solve the problem.

The first is to use an advanced code editor (NetBeans, VSCode, etc.), which can find the specific line that causes the error.

The second is to search for the error manually. Remove (comment out) all the code, then restore it in small pieces, checking that the script still works after each piece.

As soon as the script stops working, the error is in the last piece of code you restored. You can try to find the error in that piece or rewrite it from scratch.

If you cannot find the error at all, you can ask on any popular PHP forum.
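
A quick first pass for the manual search is to count the braces. A rough sketch, written in Java and assuming the source is available as a string; BraceBalance and braceDelta are hypothetical names, and the count deliberately ignores braces inside strings and comments, so treat a non-zero result as a hint rather than proof:

```java
// Hypothetical helper for illustration only.
final class BraceBalance {

    /** Returns the number of opening braces minus the number of closing braces. */
    static int braceDelta(String source) {
        int delta = 0;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '{') {
                delta++;
            } else if (c == '}') {
                delta--;
            }
        }
        return delta;
    }

    public static void main(String[] args) {
        // The PHP snippet from the article: the if block is never closed.
        String snippet = "<?php\nif(1 > 0) {\n";
        System.out.println(braceDelta(snippet)); // prints 1
    }
}
```

A positive result means at least one block was never closed; a negative one means there is a stray closing brace.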

What “EOF” means

In layman’s terms, EOF (end-of-file) is simply an indicator that tells the operating system where to stop reading a data source. A data source could be a file or a stream.

Without EOF, the system may either keep reading and waiting for new data or stop immediately.
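
In Java, for instance, end-of-file shows up as a return value of -1 from InputStream.read(). A small sketch using an in-memory stream; EofDemo and countBytes are hypothetical names for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class.
final class EofDemo {

    /** Counts bytes by reading until read() signals end-of-file with -1. */
    static int countBytes(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("abc".getBytes(StandardCharsets.US_ASCII));
        System.out.println(countBytes(in)); // prints 3
    }
}
```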

A JSON reader should raise an error to show that the JSON file is malformed. The error message may look like this:

Expecting 'EOF', '}', ',', ']', got '{'

Some IDEs display it more readably; VS Code, for example, reports “End of file expected”.

Manually inspect JSON files for syntax errors

If you get “Expecting EOF” or “End of file expected”, it’s likely that the JSON file has syntax errors.

JSON is easy, but it does have a few rules. Most of the time, your JSON file violates one of these:

  • Array elements should be separated by a comma.
  • Everything must be wrapped in one single root value, which can be an Object {} or an Array []. Multiple root values are not allowed.
  • Special characters must be escaped. Pay close attention to double quotes and Unicode double quotes. Here’s a list of special, reserved characters in JSON:
    • Backspace should be escaped with \b.
    • Form feed should be escaped with \f.
    • Newline should be escaped with \n.
    • Carriage return should be escaped with \r.
    • Tab should be escaped with \t.
    • Double quote should be escaped with \".
    • Backslash should be escaped with \\.

Check the file against the list above. With practice, you will spot these errors more quickly.
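
The escaping rules in the list can be applied mechanically. A minimal sketch in Java of a function that escapes a string for use inside a JSON string literal; JsonEscape and escape are hypothetical names, and the \u fallback for other control characters is an addition beyond the list above (JSON requires those to be escaped as well):

```java
// Hypothetical helper; production code would normally rely on a JSON library.
final class JsonEscape {

    /** Escapes the reserved characters so the result is safe inside JSON double quotes. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\b': out.append("\\b"); break;   // backspace
                case '\f': out.append("\\f"); break;   // form feed
                case '\n': out.append("\\n"); break;   // newline
                case '\r': out.append("\\r"); break;   // carriage return
                case '\t': out.append("\\t"); break;   // tab
                case '"':  out.append("\\\""); break;  // double quote
                case '\\': out.append("\\\\"); break;  // backslash
                default:
                    if (c < 0x20) {
                        // Remaining control characters: use the \uXXXX form.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Input contains a double quote, a backslash, and a newline.
        System.out.println("\"" + escape("say \"hi\"\\\n") + "\"");
    }
}
```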

JSON end of file expected with Visual Studio Code on MacOS

macOS ships with an additional keyboard layout called “ABC Extended”. This layout lets you type special but common Latin characters such as carons (č), ogoneks (ą), dots (), thorns (þ) and others.

Special characters are typed with a combination of keystrokes, so a few of them may interfere with the program’s hotkey shortcuts.

In this case, macOS is the operating system, so it takes precedence and overrides the Visual Studio Code settings.

For example, Option + Shift + F always types a “~” character instead of running “Format Document” in Visual Studio Code.

If you don’t really need support for special characters, the temporary workaround is to switch back to the ABC layout. If you don’t know how to do that, follow the steps below:

  1. Go to the Apple menu and open System Preferences.
  2. Click the Language & Region icon on the first row of the System Preferences panel.
  3. Click the Keyboard Preferences button at the bottom of the window to open the keyboard preferences.
  4. Click the Input Sources tab.
  5. Click the + button to see a list of languages with keyboards. The U.S. Extended keyboard is listed under English. Click Add to ensure that the keyboard is activated.

Automatically fix poorly formatted JSON files

Sometimes your JSON files are too big to inspect manually.

Others have faced this problem and built automated tools that repair invalid JSON without anyone inspecting it by eye. Combined with a diff tool, they show clearly which part of the JSON file was invalid, so next time you know where to look.

This is a non-exhaustive list of the JSON fixers I’ve personally used:

  • JsonRepair (npm package)
  • JsonRepair online version, can be used right away with a browser
  • dirty-json (npm package)
  • Dirty JSON Online Parser (https://ryanmarcus.github.io/dirty-json/), in-browser JSON format fixer
  • Berkmann18’s json-fixer (npm package)
  • adhocore’s php-json-fixer, JSON format validator and fixer written in PHP
