I’ve been beating my head against this absolutely infuriating bug for the last 48 hours, so I thought I’d finally throw in the towel and try asking here before I throw my laptop out the window.
I’m trying to parse the response XML from a call I made to AWS SimpleDB. The response is coming back on the wire just fine; for example, it may look like:
<?xml version="1.0" encoding="utf-8"?>
<ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/">
<ListDomainsResult>
<DomainName>Audio</DomainName>
<DomainName>Course</DomainName>
<DomainName>DocumentContents</DomainName>
<DomainName>LectureSet</DomainName>
<DomainName>MetaData</DomainName>
<DomainName>Professors</DomainName>
<DomainName>Tag</DomainName>
</ListDomainsResult>
<ResponseMetadata>
<RequestId>42330b4a-e134-6aec-e62a-5869ac2b4575</RequestId>
<BoxUsage>0.0000071759</BoxUsage>
</ResponseMetadata>
</ListDomainsResponse>
I pass in this XML to a parser with
XMLEventReader eventReader = xmlInputFactory.createXMLEventReader(response.getContent());
and call eventReader.nextEvent() a bunch of times to get the data I want.
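For reference, here is a minimal, self-contained sketch of that kind of StAX loop. It pulls the text of each DomainName element from a trimmed copy of the sample response. The class and method names are hypothetical. In the trace above, the parsing actually happens inside the AWS SDK (StaxUnmarshallerContext), so this only illustrates the API and is not the SDK's code.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class DomainNameReader {

    // Collects the text of every <DomainName> element. Matching on the local
    // name means the SimpleDB namespace does not need to be spelled out.
    static List<String> readDomainNames(InputStream in) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && "DomainName".equals(event.asStartElement().getName().getLocalPart())) {
                    // Reads the element's text and consumes its end tag.
                    names.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<ListDomainsResponse xmlns=\"http://sdb.amazonaws.com/doc/2009-04-15/\">"
                + "<ListDomainsResult><DomainName>Audio</DomainName>"
                + "<DomainName>Course</DomainName></ListDomainsResult></ListDomainsResponse>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readDomainNames(in)); // [Audio, Course]
    }
}
```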
Here’s the bizarre part — it works great inside the local server. The response comes in, I parse it, everyone’s happy. The problem is that when I deploy the code to Google App Engine, the outgoing request still works, and the response XML seems 100% identical and correct to me, but the response fails to parse with the following exception:
com.amazonaws.http.HttpClient handleResponse: Unable to unmarshall response (ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.): <?xml version="1.0" encoding="utf-8"?>
<ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/"><ListDomainsResult><DomainName>Audio</DomainName><DomainName>Course</DomainName><DomainName>DocumentContents</DomainName><DomainName>LectureSet</DomainName><DomainName>MetaData</DomainName><DomainName>Professors</DomainName><DomainName>Tag</DomainName></ListDomainsResult><ResponseMetadata><RequestId>42330b4a-e134-6aec-e62a-5869ac2b4575</RequestId><BoxUsage>0.0000071759</BoxUsage></ResponseMetadata></ListDomainsResponse>
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(Unknown Source)
at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(Unknown Source)
at com.amazonaws.transform.StaxUnmarshallerContext.nextEvent(StaxUnmarshallerContext.java:153)
... (rest of lines omitted)
I have double-, triple-, and quadruple-checked this XML for 'invisible characters', non-UTF-8 characters, etc. I looked at it byte by byte in an array for byte-order marks or anything of that nature. Nothing; it passes every validation test I could throw at it. Even stranger, it also happens with a Saxon-based parser, but ONLY on GAE; it always works fine in my local environment.
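Since the error is reported at [1,1], the useful question is which bytes the parser actually receives on the deployed server, rather than what a copy of the XML looks like. Below is a minimal diagnostic sketch. It assumes you can wrap and log the response stream before it reaches the parser, and PrologDiagnostics and peekHex are hypothetical names. The sketch logs the first few bytes in hex, then rewinds the stream so parsing still sees the whole document. A response that begins with the XML declaration should start with 3C ('<'), or with EF BB BF if a UTF-8 BOM is present. Anything else means the parser is not getting the raw XML text; the gzip signature 1F 8B is one example of what could appear instead.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class PrologDiagnostics {

    // Hypothetical helper: returns up to `limit` leading bytes as space-separated hex,
    // then resets the stream so the same bytes can still be handed to the parser.
    static String peekHex(BufferedInputStream in, int limit) throws IOException {
        in.mark(limit);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < limit; i++) {
            int b = in.read();
            if (b == -1) {
                break;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        in.reset();
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // A UTF-8 BOM followed by the start of an XML declaration.
        byte[] body = "\uFEFF<?xml".getBytes(StandardCharsets.UTF_8);
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(body));
        System.out.println(peekHex(in, 8)); // EF BB BF 3C 3F 78 6D 6C
        // `in` has been rewound and can now be passed to the parser unchanged.
    }
}
```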
It makes it very hard to trace the code for problems when I can only run the debugger on an environment that works perfectly (I haven’t found any good way to remotely debug on GAE). Nevertheless, using the primitive means I have, I’ve tried a million approaches including:
- XML with and without the prolog
- With and without newlines
- With and without the "encoding=" attribute in the prolog
- Both newline styles
- With and without the chunking information present in the HTTP stream
And I’ve tried most of these in multiple combinations where it made sense they would interact — nothing! I’m at my wit’s end. Has anyone seen an issue like this before that can hopefully shed some light on it?
Thanks!
For example, in the Problems view I have:
Content is not allowed in prolog.
How do I find out why it's there (which Eclipse plugin put it there), and how do I turn it off?
asked Mar 2, 2011 at 11:00
I had this problem too, and it was because I changed and saved the file in UltraEdit. After saving, the file encoding had changed and the file contained characters Eclipse was not able to read.
You can open the file in the Windows "Editor" tool (Notepad) and delete the characters Eclipse cannot read; you will spot them immediately.
answered Mar 2, 2011 at 11:25
Markus Lausberg
This sounds like an error in an XML file. Most of the time, "Content is not allowed in prolog" means that your XML file is not well-formed, or doesn't start the way it should.
answered Mar 2, 2011 at 11:04
Chris
"Content is not allowed in prolog" is the error thrown by Xerces when there's something in an XML file or stream that precedes the <?xml?> declaration. There must be nothing before that, not even whitespace or a Byte-Order-Mark.
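A small reproduction of this rule, parsing from a character stream with the JDK's built-in DOM parser. The class name PrologCheck is hypothetical. A clean document parses. A stray character in front of the declaration makes the parse fail at 1:1, and so does a BOM that has already been decoded into the character U+FEFF (for example, by reading the file through a Reader).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class PrologCheck {

    // Parses the text as XML and returns the parser's error message,
    // or null if the document is well-formed.
    static String parseError(String xml) throws Exception {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\" encoding=\"utf-8\"?><root/>";
        System.out.println(parseError(doc));            // null
        System.out.println(parseError("!" + doc));      // stray character before the declaration
        System.out.println(parseError("\uFEFF" + doc)); // BOM already decoded into a character
    }
}
```

Without an ErrorHandler of your own, DocumentBuilder also prints a "[Fatal Error] :1:1: ..." line to stderr before the exception is thrown.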
answered Mar 2, 2011 at 11:10
skaffman
Double-click the message and it should take you to the file (and ideally location within the file) that is the source of the problem.
This specific error sounds like you’ve got a malformed XML file.
answered Mar 2, 2011 at 11:02
Joachim Sauer
I'm trying to compare an XML file to an XSLT-generated file produced from that XML file, and when I run the class as a JUnit test, I get the following:
[Fatal Error] :1:1: Content is not allowed in prolog.
org.xml.sax.SAXParseException: Content is not allowed in prolog.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at org.custommonkey.xmlunit.XMLUnit.buildDocument(XMLUnit.java:352)
at org.custommonkey.xmlunit.XMLUnit.buildDocument(XMLUnit.java:339)
at org.custommonkey.xmlunit.XMLUnit.buildControlDocument(XMLUnit.java:283)
at org.custommonkey.xmlunit.Diff.<init>(Diff.java:116)
at org.custommonkey.xmlunit.examples.MyXMLTestCase.testXSLTransformation(MyXMLTestCase.java:70)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at junit.framework.TestCase.runTest(TestCase.java:164)
at junit.framework.TestCase.runBare(TestCase.java:130)
at junit.framework.TestResult$1.protect(TestResult.java:106)
at junit.framework.TestResult.runProtected(TestResult.java:124)
at junit.framework.TestResult.run(TestResult.java:109)
at junit.framework.TestCase.run(TestCase.java:120)
at junit.framework.TestSuite.runTest(TestSuite.java:230)
at junit.framework.TestSuite.run(TestSuite.java:225)
at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
Any ideas?
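In the trace, Diff's constructor calls XMLUnit.buildControlDocument, which in XMLUnit 1.x parses its String argument as XML text, not as a file name. The test code isn't shown, so this is only an assumption. If a file path is passed where XML content is expected, the first character the parser sees is part of the path rather than '<', and parsing fails at 1:1. Here is a sketch of the difference using plain JAXP; the class name PathVersusContent is hypothetical.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class PathVersusContent {

    // Parses `xml` as document text, which is how a String argument
    // to XMLUnit 1.x's Diff constructor is treated.
    static Document parseText(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("control", ".xml");
        Files.write(file, "<?xml version=\"1.0\"?><root/>".getBytes(StandardCharsets.UTF_8));
        try {
            // Wrong: the path itself is parsed as XML, so the first character is not '<'.
            try {
                parseText(file.toString());
            } catch (SAXParseException e) {
                System.out.println("path as content: " + e.getMessage());
            }
            // Right: read the file and pass its contents.
            String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            System.out.println("root: " + parseText(content).getDocumentElement().getNodeName()); // root: root
        } finally {
            Files.delete(file);
        }
    }
}
```

With XMLUnit, the same idea applies: read the files into strings first, or pass Readers (for example, FileReaders) to the Diff(Reader, Reader) constructor.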
Table of Contents
- Sax Error Due to Invalid Text Before XML Declaration
- Byte Order Mark (BOM) At the Beginning of the XML File
- Passing a Non Existent File to Parser
- Different Encoding Formats Causing the Parser Error
- Conclusion
This article discusses the SAX Error – Content is not allowed in prolog.
SAX is an event-based XML parsing API that you can use to process XML files. However, while using a SAX parser, you may encounter the SAX error "Content is not allowed in prolog".
Sax Error Due to Invalid Text Before XML Declaration
XML files are structured using tags, and every XML file must follow the XML syntax rules.
If you place an unknown or invalid character before the XML declaration, you will get the aforementioned error when you try to parse the file with a SAX parser.
Let us see an example using the following XML file.
!<?xml version="1.0" encoding="utf-8"?>
<person>
  <name> Mohtashim Nawaz </name>
  <age> 24 </age>
  <prof> Software Engineer </prof>
</person>
The code to parse the file is given below.
package java2blog;

import java.io.IOException;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlParser {

    public static void main(String[] args) {
        SAXParserFactory f = SAXParserFactory.newInstance();
        try {
            SAXParser parser = f.newSAXParser();
            // "sample.xml" is resolved relative to the working directory. DefaultHandler
            // ignores every event, so this only checks that the file is well-formed.
            parser.parse("sample.xml", new DefaultHandler());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }
}
Output:
org.xml.sax.SAXParseException; systemId: file:///home/stark/eclipse-workspace-java/java2blog/sample.xml; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
The parser raises the error at line 1, column 1. You can fix it by removing the extra character, so that the file reads as follows.
<?xml version="1.0" encoding="utf-8"?>
<person>
  <name> Mohtashim Nawaz </name>
  <age> 24 </age>
  <prof> Software Engineer </prof>
</person>
Observe that this XML file no longer has the ! character at the beginning.
Byte Order Mark (BOM) At the Beginning of the XML File
The Byte Order Mark (BOM) is the Unicode character U+FEFF, placed at the very start of a text stream to signal its encoding and, for UTF-16 and UTF-32, its byte order. In UTF-8, it is the three bytes EF BB BF. Some text editors insert the BOM at the beginning of a file automatically.
If an XML file that starts with a BOM is parsed as a stream of characters instead of a stream of bytes, the BOM becomes an ordinary character in front of the XML declaration, and you may get the SAX parser error.
However, this is not always the case. When the file is passed as a stream of bytes, current versions of Java's built-in parser recognize and skip the BOM.
You can remove the BOM in code, or manually in a text editor; most text editors provide an option to save a file with or without a BOM.
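Below is a minimal sketch of removing a UTF-8 BOM in code before the bytes reach the parser. It assumes the file is UTF-8, and the class and method names are hypothetical. The sketch peeks at the first three bytes through a PushbackInputStream and pushes them back if they are not EF BB BF, so files without a BOM pass through untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BomStripper {

    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns a stream positioned just after a leading UTF-8 BOM,
    // or at the original first byte if the stream does not start with one.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF.
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == head.length && Arrays.equals(head, UTF8_BOM);
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM: put the bytes back
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = "\uFEFF<person/>".getBytes(StandardCharsets.UTF_8);
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(withBom));
        System.out.println((char) in.read()); // <
    }
}
```

The returned stream can be passed straight to SAXParser.parse(InputStream, DefaultHandler).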
Passing a Non Existent File to Parser
If you pass the parser a file that does not exist, parsing fails. The same happens if the file does exist but you accidentally pass the wrong path.
Let us see an example.
package java2blog;

import java.io.IOException;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlParser {

    public static void main(String[] args) {
        SAXParserFactory f = SAXParserFactory.newInstance();
        try {
            SAXParser parser = f.newSAXParser();
            // This file does not exist, so opening it fails before any XML is read.
            parser.parse("sample_unknown.xml", new DefaultHandler());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }
}
The “sample_unknown.xml” file does not exist.
Output:
java.io.FileNotFoundException: /home/stark/eclipse-workspace-java/java2blog/sample_unknown.xml (No such file or directory)
Note, however, that in this case the only error is a FileNotFoundException, not the "Content is not allowed in prolog" parser error.
Different Encoding Formats Causing the Parser Error
A mismatch between the file's actual encoding and the encoding you pass to the parser can also cause the parser error.
For instance, if your file is encoded in UTF-8 and you somehow tell the parser it is UTF-16, you will end up getting the parser error. Therefore, you should always check the file's encoding before parsing it.
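Below is a minimal sketch of such a mismatch. It assumes the encoding is forced through InputSource.setEncoding, and the class name EncodingMismatch is hypothetical. The same UTF-8 bytes parse when the parser is told UTF-8, and fail when it is told UTF-16, because every pair of bytes is then decoded as a single, unrelated character.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class EncodingMismatch {

    // Parses the bytes with an explicitly declared encoding; returns true on success.
    static boolean parsesAs(byte[] bytes, String encoding) {
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        source.setEncoding(encoding); // overrides the parser's own encoding detection
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
            return true;
        } catch (Exception e) {
            System.out.println(encoding + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "<?xml version=\"1.0\" encoding=\"utf-8\"?><person/>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(parsesAs(utf8, "UTF-8"));  // true
        System.out.println(parsesAs(utf8, "UTF-16")); // false
    }
}
```

Unless you know the encoding for certain, it is usually safer to pass the raw byte stream without an explicit encoding. The parser can then detect the encoding from the BOM or the XML declaration.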
Conclusion
That's all about the SAX error "Content is not allowed in prolog".
Hope you have enjoyed reading the article. Stay tuned for more such articles. Happy Learning!