You are most likely using Puppeteer to interact with the web in some form. Either you are taking screenshots of a website using Puppeteer, you scrape content, you render content or you crawl the web. Sooner or later you will run into the question of how to deal with timeouts and how to tell Puppeteer what you want.
Let’s dive right into understanding and solving navigational timeouts with Puppeteer.
The Problem: «TimeoutError: Navigation Timeout Exceeded: 30000ms exceeded»
Whenever your target site (either your own or some other site) fails to respond in a reasonable time, you might get this error:
error { TimeoutError: Navigation Timeout Exceeded: 30000ms exceeded
at Promise.then (.../node_modules/puppeteer/lib/NavigatorWatcher.js:74:21)
at <anonymous> name: 'TimeoutError' }
Here you see the default Puppeteer timeout of 30 seconds in action. That is, of course, only if your Function as a Service platform hasn’t already timed out by this point.
Solution 1: Set or disable the timeout for a navigation/request
Some requests take longer than others. If you need to wait for a request that regularly exceeds the Puppeteer default of 30 seconds you can increase it:
// 60k milliseconds = 60s
await page.goto(url, {waitUntil: 'load', timeout: 60000});
If you need to wait for a request to finish no matter what, you can disable the timeout:
// 0 = disabled
await page.goto(url, {waitUntil: 'load', timeout: 0});
The timeout option works for the following navigation and wait methods:
- page.goBack
- page.goForward
- page.goto
- page.reload
- page.setContent
- page.waitFor
- page.waitForFileChooser
- page.waitForFunction
- page.waitForNavigation
- page.waitForRequest
- page.waitForResponse
- page.waitForSelector
- page.waitForXPath
It’s best practice to set a very high value instead of completely disabling the timeout. The default is 30000 milliseconds. Set the value to 90k milliseconds or even higher if needed, but avoid disabling the timeout entirely.
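One way to follow this advice is to retry a slow navigation with a progressively larger timeout rather than switching the timeout off. The sketch below is only an illustration: the helper name gotoWithRetry and the timeout values are assumptions, not part of Puppeteer. It relies on the error's name being 'TimeoutError', as in the error output shown above.

```javascript
// Sketch: retry a navigation with a larger timeout instead of disabling it.
// `gotoWithRetry` and the default timeout values are hypothetical.
async function gotoWithRetry(page, url, timeouts = [30000, 60000, 90000]) {
  if (timeouts.length === 0) {
    throw new RangeError('at least one timeout value is required');
  }
  let lastError;
  for (const timeout of timeouts) {
    try {
      // On success, return whatever page.goto resolves with.
      return await page.goto(url, { waitUntil: 'load', timeout });
    } catch (err) {
      // Retry only on timeouts; rethrow every other failure immediately.
      if (err.name !== 'TimeoutError') throw err;
      lastError = err;
    }
  }
  // Every attempt timed out: surface the last TimeoutError to the caller.
  throw lastError;
}

// Usage, inside an async function with a Puppeteer page:
//   const response = await gotoWithRetry(page, 'https://example.com');
```

The upper bound stays finite, so a page that never finishes loading still fails eventually instead of blocking the script forever.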
Here is the source, in case you are curious: Ticket, Commit
Solution 2: Set or disable timeouts for a Puppeteer page/tab
If you want to increase the timeout for all navigations in a Puppeteer page/tab, you can use setDefaultNavigationTimeout:
// 120s
page.setDefaultNavigationTimeout(120000);
If you aren’t concerned about timeouts but about results, you can simply deactivate the timeouts for a page/tab in Puppeteer:
page.setDefaultNavigationTimeout(0);
This might be more a case for internal applications than for web crawlers.
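As a rough sketch, the page-level settings can be applied in one small helper right after the page is created. The helper name applyTimeouts and its default values are assumptions for illustration. page.setDefaultTimeout is the related Puppeteer method that also covers the waitFor* family; the navigation-specific setting takes priority for navigations.

```javascript
// Sketch: configure page-wide timeouts in one place.
// `applyTimeouts` and its defaults are hypothetical, not a Puppeteer API.
function applyTimeouts(page, { navigationMs = 120000, waitMs = 30000 } = {}) {
  // Used by navigation methods such as page.goto and page.reload.
  page.setDefaultNavigationTimeout(navigationMs);
  // Used by the remaining waiting methods such as page.waitForSelector.
  page.setDefaultTimeout(waitMs);
  return page;
}

// Usage, inside an async function:
//   const page = applyTimeouts(await browser.newPage(), { navigationMs: 90000 });
```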
More details are in the documentation.
Over to you.
Now it’s over to you. Think about which timeout values you can afford and which ones you need. Keep in mind that network connectivity is always a risk you can’t control.
Even internal applications aren’t always safe. If you use networkidle0 and your external fonts take ages, you’ve got a problem. Gracefully handling any delays is key.
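One possible way to handle such a delay gracefully is to treat a networkidle0 timeout as "not settled" rather than as a fatal error and carry on with whatever has loaded so far. This is a minimal sketch under that assumption; the helper name gotoSettled and the returned shape are made up for illustration.

```javascript
// Sketch: wait for the network to go idle, but do not fail hard if it never does.
// `gotoSettled` and its return value are hypothetical.
async function gotoSettled(page, url, timeout = 30000) {
  try {
    await page.goto(url, { waitUntil: 'networkidle0', timeout });
    return { settled: true };
  } catch (err) {
    // Anything other than a timeout (DNS failure, refused connection) is a real error.
    if (err.name !== 'TimeoutError') throw err;
    // The network never went idle in time, e.g. because a slow font is still loading.
    return { settled: false };
  }
}

// Usage, inside an async function:
//   const { settled } = await gotoSettled(page, 'https://example.com', 45000);
//   if (!settled) console.warn('Continuing with a page that is still loading');
```

Whether continuing is acceptable depends on the task: a screenshot may tolerate a missing font, while a scraper that needs late-loading data may not.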
Published 31 May 2022
Learn how to prevent this exception from being thrown during automation of tasks with Puppeteer in Node.js
During the automation of multiple tasks at my job and on personal projects, I decided to move to Puppeteer instead of the old-school PhantomJS. One of the most common problems with pages that contain a lot of content (ads, images, etc.) is load time: an exception (specifically a TimeoutError) is thrown when a page takes more than 30000 ms (30 seconds) to load completely.
To solve this problem, you have 2 options: either increase this timeout in the configuration or remove it altogether. Personally, I prefer to remove the limit, as I know that the pages I work with will finish loading eventually.
In this article, I’ll briefly explain 2 ways to bypass this limitation.
A. Globally on the tab
The option that I prefer, as I browse multiple pages in the same tab, is to remove the timeout limit on the tab that I use to browse. To remove the limit, add:
await page.setDefaultNavigationTimeout(0);
The setDefaultNavigationTimeout method, available on a page created by Puppeteer, lets you define the navigation timeout of the tab and expects the value in milliseconds as its first argument. A value of 0 means an unlimited amount of time. The following snippet shows how you can do it in a real example:
// Require puppeteer
const puppeteer = require('puppeteer');

(async () => {
    // Create an instance of the chrome browser
    // But disable headless mode !
    const browser = await puppeteer.launch({
        headless: false
    });

    // Create a new page
    const page = await browser.newPage();

    // Configure the navigation timeout
    await page.setDefaultNavigationTimeout(0);

    // Navigate to some website e.g Our Code World
    await page.goto('http://ourcodeworld.com');

    // Do your stuff
    // ...
})();
B. Specifically on the current page
Alternatively, for specific pages, in case you handle multiple pages in different variables, you can specify the limit as an option in the configuration object of the page.goto method:
await page.goto('https://ourcodeworld.com', {
    waitUntil: 'load',
    // Remove the timeout
    timeout: 0
});
The following snippet shows how to do it in a real example:
// Require puppeteer
const puppeteer = require('puppeteer');

(async () => {
    // Create an instance of the chrome browser
    // But disable headless mode !
    const browser = await puppeteer.launch({
        headless: false
    });

    // Create a new page
    const page = await browser.newPage();

    // Navigate to some website e.g Our Code World,
    // removing the timeout for this navigation only
    await page.goto('https://ourcodeworld.com', {
        waitUntil: 'load',
        // Remove the timeout
        timeout: 0
    });

    // Do your stuff
    // ...
})();
Happy coding !
Puppeteer is a popular Node.js module that lets you control Chromium in headless mode. It can help your Node.js and JavaScript application scrape content from the web or take screenshots of pages in a predictable and reliable way. If you are reading this, chances are you are stuck with an error in Puppeteer’s headless Chromium installation, which is all too common. The goal of this Puppeteer tutorial is to fix such errors and get Puppeteer installed cleanly.
General rule of thumb solution
Check the Node, Chromium and Puppeteer versions. Upgrade or downgrade any or all of them if that helps. Often there are bugs, and the only way to deal with them is to move to a more stable version.
There are many solutions to Puppeteer installation errors, but we are sharing only the ones we tried and tested. They are confirmed to work in our testing environment, which is Ubuntu; we can’t vouch for yours.
Having errors while running Puppeteer?
Seems like you are not facing this problem alone, others too have faced it and while there’s no elegant solution to it presently, we will walk you through a couple of solutions we tested, and it worked in our development environment.
Headless chrome(-ium) needs a lot of dependencies to work with, and puppeteer doesn’t install them all.
Puppeteer installation is quite tricky and assumes you have all the .so library files that Chromium needs. This is specifically a problem with puppeteer and not puppeteer-core, as puppeteer comes bundled with Chromium. It’s not properly documented in the Puppeteer documentation, and hence a cause of confusion for many developers working on browser automation with puppeteer.
If you see an error like Failed to launch chrome, it’s likely a problem in Chromium rather than Puppeteer.
Installing Puppeteer
Using npm:
Run npm install puppeteer for the whole setup, including Chromium, or npm install puppeteer-core for only Puppeteer’s core functionality. The latter is suitable if the Chromium installation has already been dealt with, or if the app drives a browser running remotely in another container/instance.
Troubleshoot common errors in Puppeteer
Case 1: Missing dependencies and libraries
First, get a list of the missing dependencies in the local Chromium installation using the ldd command:
ldd chrome | grep not | awk '{print $1}' | awk -F '.so' '{print $1}' | tr '\n' ' '
This command needs to be run in the directory where Puppeteer installed Chromium, which in our case is ~/ss/node_modules/puppeteer/.local-chromium/linux-848005/chrome-linux, but the ~/ss/ part and the linux-848005 part might vary. The exact path can be found by cd-ing into /node_modules/puppeteer/.local-chromium/ under your project directory. The command prints the names of the files that Chromium requires but that were not found, which hints at the missing dependencies in the installation. The names are extracted with grep, tr and awk (which are usually pre-installed in most Linux distros).
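The same extraction can be sketched in Node.js, which may be convenient if the check should run from the application itself. The function name missingLibraries is an assumption for illustration. It takes the text printed by ldd chrome and mirrors the grep and awk steps of the pipeline above.

```javascript
// Sketch: extract the names of missing shared libraries from `ldd` output.
// `missingLibraries` is a hypothetical helper mirroring the shell pipeline.
function missingLibraries(lddOutput) {
  return lddOutput
    .split('\n')
    // Keep only unresolved entries, e.g. "libnss3.so => not found".
    .filter((line) => line.includes('not found'))
    // Take the first column and cut everything from ".so" onwards.
    .map((line) => line.trim().split(/\s+/)[0].split('.so')[0]);
}

// Usage: pass the captured output of `ldd chrome` as a string.
//   console.log(missingLibraries(output).join(' '));
```

The result lists library names, not package names, so they still have to be mapped to the packages of your distribution.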
Typically, these packages are missing on Debian-based systems:
libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget libgbm-dev
You can install them with this command:
sudo apt-get install -y libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget libgbm-dev
Solution: Install all the missing dependencies.
Once that’s done, puppeteer is ready to be run, without those pesky errors!
Case 2: Running Chromium as root (SECURITY ALERT!)
You could get an error like this:
Running as root without --no-sandbox is not supported.
This check exists to prevent security issues, and it can be disabled by following the steps below.
If this error pops up in the logs, it means Puppeteer is being run as the root user, which is bad from a security perspective. It’s advisable to run it in a sandboxed environment so that, in case of a hack, an attacker can’t do much damage. It can be dangerous if Chromium is being run on a production machine that also runs the main application. If that’s the case, stop now and rethink!
Chromium, by default, refuses to run as root as a security precaution, so that vulnerabilities in Chromium, such as remote code execution bugs (which aren’t too uncommon), don’t hand an attacker root privileges. Although this is discouraged, if you still wish to take the risk and run as root in a sandboxed container, you can do so by passing the --no-sandbox flag in the args of the puppeteer.launch() method:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch(
{args: ['--no-sandbox']},
);
console.info(browser);
await browser.close();
})();
Solution: Pass the --no-sandbox flag through args while launching the Chromium instance.
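To avoid disabling the sandbox where it is not needed, the flag can be added only when the process really runs as root. The following is a sketch, not an official recommendation: the helper name buildLaunchOptions is an assumption, and process.getuid is only available on POSIX platforms, hence the guard.

```javascript
// Sketch: add --no-sandbox only when the process runs as root.
// `buildLaunchOptions` is a hypothetical helper, not part of Puppeteer.
function buildLaunchOptions(
  isRoot = typeof process.getuid === 'function' && process.getuid() === 0
) {
  const args = [];
  if (isRoot) {
    // Chromium refuses to start as root while its sandbox is enabled.
    args.push('--no-sandbox');
  }
  return { args };
}

// Usage, inside an async function:
//   const browser = await puppeteer.launch(buildLaunchOptions());
```

Running as a non-root user keeps the sandbox active, which remains the safer setup.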
Case 3: Puppeteer timeout errors
Puppeteer TimeoutError: Navigation timeout of 30000 ms exceeded is a common error that appears when the page contains large assets that consume a lot of resources and need time to load. If the page load time is long, performance can take a hit. You can wrap the code in a try-catch block to handle this error better, but in general a maximum timeout needs to be set so that Puppeteer doesn’t get stuck on a particular page for too long.
For example, the following line sets the default navigation timeout to 5,000 ms (5 s). If the page loads large resources (such as video content), it’s advisable to increase it, to 10,000 ms (10 s) at most.
await page.setDefaultNavigationTimeout(5000);
Solution: Set a custom page timeout.
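The try-catch idea mentioned above could look roughly like this when several pages are visited in a row: a page that exceeds the limit is recorded and skipped, so the run does not get stuck on it. The helper name visitAll and the returned shape are assumptions for illustration.

```javascript
// Sketch: visit a list of URLs and skip the ones that time out.
// `visitAll` and its return value are hypothetical.
async function visitAll(page, urls, timeout = 10000) {
  const loaded = [];
  const timedOut = [];
  for (const url of urls) {
    try {
      await page.goto(url, { timeout });
      loaded.push(url);
    } catch (err) {
      // Unknown failures should still stop the run.
      if (err.name !== 'TimeoutError') throw err;
      // The page was too slow: note it and move on to the next URL.
      timedOut.push(url);
    }
  }
  return { loaded, timedOut };
}

// Usage, inside an async function:
//   const { loaded, timedOut } = await visitAll(page, ['https://example.com']);
```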
That’s it…
A list of Puppeteer API methods can be found here, which is a good resource to get started with puppeteer.
Didn’t work? We’ll help you free-of-cost
Was this useful? Say thanks! If not, we would like to know what issues you are facing. Send us a mail at team [@] savebreach.com. If the issues are small, we will help you solve them and also include them in our blog. Let’s connect.
When asking for help: send us the environment you are running the software on, the specific configuration, a small stack trace with sensitive information removed, and wait for at least 3 business days for a response.
Good afternoon, I have a small scraper built on puppeteer.
It may be relevant that the pages have a .jsp extension.
Perhaps dynamic content is loaded there in some special way, unlike everywhere else?!
Steps:
1) Log in (works)
2) Go to the target page (works)
3) Click on a dynamically loaded element (the problem). When the scraper’s browser starts, the element appears after a short time. Then it disappears and appears again. (I don’t know how to deal with this.)
None of the following waiting techniques helps. I placed each variant before the selector for the subsequent click. Not one of them helped:
1) await page.waitForNavigation();
2) await page.waitForSelector
3)
const sleep = (ms) => new Promise((res) => {
setTimeout(res, ms);
});
sleep(3000)
Code:
const puppeteer = require('puppeteer');
const config = require('./config'); // login, password
const sleep = (ms) => new Promise((res) => {
setTimeout(res, ms);
});
const loginUrl = 'loginUrl';
const craftUrl = 'craftUrl';
(async () => {
const browser = await puppeteer.launch({
headless: false,
args: ['--proxy-server=socks5://127.0.0.1:9050']
});
const page = await browser.newPage();
await page.goto(loginUrl);
await page.$eval('#usernameField', (elem, login) => {
elem.value = login;
}, config.login);
await page.$eval('#passwordField', (elem, password) => {
elem.value = password;
}, config.password);
await page.click('#ButtonBox > .OraButton.left');
await page.waitForNavigation();
await page.goto(craftUrl);
sleep(2000);
await page.waitForSelector('.textdiv').then(() => {
let elements = document.getElementsByClassName('textdiv');
elements[0].click()
});
sleep(5000);
await browser.close();
})();
This error comes out:
PAGE LOG: Window.onshow:[object PageTransitionEvent] trusted:true persisted:false
PAGE LOG: AppsLoginPage.loaded=true
PAGE LOG: call AuthenticateUser
PAGE LOG: Refused to set unsafe header «Connection»
PAGE LOG: Failed to load resource: the server responded with a status of 404 (Not Found)
PAGE LOG: window.unload:
PAGE LOG: JSHandle@object
steps.step1
steps.step2
^[[D
(node:5120) UnhandledPromiseRejectionWarning: TimeoutError: waiting for selector «.textdiv» failed: timeout 30000ms exceeded
at new WaitTask (/var/www/Parsers/teset/node_modules/puppeteer/lib/DOMWorld.js:549:28)
at DOMWorld._waitForSelectorOrXPath (/var/www/Parsers/test/node_modules/puppeteer/lib/DOMWorld.js:478:22)
at DOMWorld.waitForSelector (/var/www/Parsers/test/node_modules/puppeteer/lib/DOMWorld.js:432:17)
at Frame.waitForSelector (/var/www/Parsers/test/node_modules/puppeteer/lib/FrameManager.js:627:47)
at Frame. (/var/www/Parsers/test/node_modules/puppeteer/lib/helper.js:112:23)
at Page.waitForSelector (/var/www/Parsers/test/node_modules/puppeteer/lib/Page.js:1125:29)
at /var/www/Parsers/test/index.js:69:20
at process._tickCallback (internal/process/next_tick.js:68:7)
(node:5120) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:5120) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
Can you tell me what the cause might be?
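Two things in the posted code stand out and can be sketched separately from the original. The sleep calls are not awaited, so they do not pause the script, and the callback after waitForSelector uses document, which exists only inside the page and not in Node. The helper below is a hedged sketch: the name clickWhenVisible and the default values are assumptions. It does not by itself explain why the selector never appears within 30 seconds, which would need a look at the actual page.

```javascript
// Sketch: wait, then click through the ElementHandle instead of `document`.
// `clickWhenVisible` and its defaults are hypothetical.
const sleep = (ms) => new Promise((res) => setTimeout(res, ms));

async function clickWhenVisible(page, selector, { settleMs = 2000, timeout = 30000 } = {}) {
  // `sleep` returns a promise, so it only pauses the script when awaited.
  await sleep(settleMs);
  // waitForSelector resolves to an ElementHandle; `visible: true` also waits
  // until the element is actually displayed, not merely present in the DOM.
  const handle = await page.waitForSelector(selector, { visible: true, timeout });
  // Click via the handle; the click is performed inside the browser page.
  await handle.click();
  return handle;
}

// Usage, inside the async function from the question:
//   await page.goto(craftUrl);
//   await clickWhenVisible(page, '.textdiv');
```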