Error wget spider


I’m trying to do a health check in a docker container. I found this command:

wget --quiet --tries=1 --spider http://localhost:6077 || exit 1

The issue is that while the container is running, if I run wget without --spider I get an HTTP 200, but with --spider it returns a 404.

Why could this be happening?

$ wget --tries=1  http://localhost:6077
--2019-04-22 04:20:12--  http://localhost:6077/
Resolving localhost (localhost)... 127.0.0.1, ::1
Connecting to localhost (localhost)|127.0.0.1|:6077... connected.
HTTP request sent, awaiting response... 200 OK
Length: 436 [application/xml]
Saving to: ‘index.html.1’


$ wget --tries=1 --spider  http://localhost:6077
Spider mode enabled. Check if remote file exists.
--2019-04-22 04:21:46--  http://localhost:6077/
Resolving localhost (localhost)... 127.0.0.1, ::1
Connecting to localhost (localhost)|127.0.0.1|:6077... connected.
HTTP request sent, awaiting response... 404 Not Found
Remote file does not exist -- broken link!!!

This strange behavior is breaking my health check. If I don't use --spider, I assume wget will try to download the index.html somewhere, right?

asked Apr 22, 2019 at 4:22 by Freedo

The accepted answer seems to be incorrect and actually helps you hide a bug in your docker container. Adding the --spider option to Wget will cause Wget to send a HEAD request instead of a GET, especially in this particular case, where you are not invoking Wget with --recursive.

According to RFC 7231, section 4.3.2, the HEAD method is identical to GET except that the server must not send a message body in the response. In your case, however, the server seems to return a different response to a HEAD request than to a GET request. I would call this a bug in your server. Please do not simply invoke Wget without --spider and sweep the issue under the rug. This behaviour goes against the HTTP spec and will possibly lead to other issues in the future, as clients connecting to the server see a wrong response.
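A quick way to confirm the mismatch darnir describes, independently of wget, is to compare the status codes returned for a GET and for a HEAD request. This is only a sketch and assumes curl is available in the container; the port is the one from the question:

# GET: print only the status code and discard the body
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:6077
# HEAD (-I): on a spec-compliant server this prints the same code as the GET above
curl -s -o /dev/null -I -w '%{http_code}\n' http://localhost:6077

If the two codes differ (200 vs 404 here), the server is treating HEAD differently from GET, which is exactly what --spider exposes.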

answered Apr 22, 2019 at 13:33 by darnir

It seems your wget call with --spider doesn't work as it should. It should also return an HTTP 200 when using a HEAD request. See darnir's answer.

if I don't use --spider I assume wget will try to download the
index.html somewhere right?

You can set the output document with the -O option if you need a specific filename, e.g.

wget --quiet --tries=1 -O/tmp/docker.html http://localhost:6077

Or if you don’t want any output, you can use -O - to print the result to stdout and then redirect stdout/stderr to /dev/null.

wget -O - http://localhost:6077 &>/dev/null
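For the original docker health check, a sketch of how this could look in the Dockerfile, assuming the service really does answer a plain GET on port 6077 (the interval, timeout and retry values are only illustrative):

HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD wget --quiet --tries=1 -O /dev/null http://localhost:6077 || exit 1

Writing the output to /dev/null avoids leaving an index.html copy behind on every check.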

answered Apr 22, 2019 at 4:41 by Freddy

@ghost

1 CONNECT TO WIFI

(I am connecting through a WiFi Direct connection from my mobile phone.
This is an ad-hoc network, I do not know if this could be the cause of my issue, but it is the reason I need the proxy to connect…)

2 SET PROXY

/usr/local/simple_network_setup/proxy-setup
(My proxy is http://192.168.49.1:8000, I set this for http & ftp, no user/pass)
Reboot.

3 OPEN PUPPY PACKAGE MANAGER & UPDATE

Clicked Configure package manager->Update database->Update Now
…Update downloads package lists & pops up a dialog saying it has completed.
(NOTE: This did not work until I rebooted, because my proxy settings had not been set on bootup yet…)

4 SEARCH FOR PROXYCHAINS & TRY TO INSTALL

[screenshot: capture13497]

5 SEARCH FOR DELUGE & TRY TO INSTALL

[screenshot: capture25644]

Found a little info here, but nothing else in that thread worked either. Have there been any significant changes to how puppy package manager connects to the repos? I took a quick look at a few scripts to see if I could find a ping or something that was not making it through the proxy, but I didn’t find anything…

@wdlkmpx

I can't say much about the ppm, I rarely use it. But you might want to try this enhanced proxy-setup script and see if it makes any difference.

It has nice gui improvements, more error checking, exports more environment variables so that more apps are aware of the proxy settings, etc.

proxy-setup.zip

@ghost

Appreciate the improved script, but ppm is sticking to its guns. I ran your script, rebooted, same error. I don't understand, because my web browsers work through the proxy. And if I remove all proxy settings and connect to a network that doesn't use a proxy, ppm works fine. It's a pain to look for a proper connection where I am, so I guess I can only install packages when I go out some place that has a better net connection… :/

@mavrothal

Does PPM work at all? I.e. does it work with other simple packages, like a desktop theme pet?
Which puppy do you have the problem with? Is it with any puppy (if you use more than one)?

This is strange, as both the database update (0setup script) and the package download (download_file script) use wget to download the respective files. It is interesting that one case works with the proxy and the other does not. download_file uses spider mode, so this might be the issue.

If the database update indeed works with the proxy and PPM still cannot find ANY package with the proxy, then you can tell wget to use a proxy with the -e option, i.e. in line 67 of /usr/sbin/download_file use

LANG=C wget -e use_proxy=on -e http_proxy=192.168.49.1:8000 -4 -t 2 -T 20 --waitretry=20 --spider -S "${URLSPEC}" > /tmp/download_file_spider.log1 2>&1

(if it fails, try use_proxy=yes, as older wget versions use this)
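A minimal sketch of what that line could look like if the proxy address is taken from the environment instead of being hard-coded (the fallback address is the one from this thread; URLSPEC and the log path are the ones used by download_file):

# Use the proxy from $http_proxy if set, otherwise fall back to the PDAnet address
PROXY="${http_proxy:-http://192.168.49.1:8000}"
LANG=C wget -e use_proxy=on -e http_proxy="$PROXY" -4 -t 2 -T 20 --waitretry=20 \
  --spider -S "${URLSPEC}" > /tmp/download_file_spider.log1 2>&1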

If PPM finds some packages and fails on others, then it is the database that needs updating or the remote server has issues. If you use bionic, this is happening these days.

@mavrothal

Regarding the proxy command in wget, you may also want to modify lines 219, 224 and 227 of /usr/local/petget/downloadpkgs.sh accordingly.

@ghost

Seems a few people have had this same issue with pupget a long time ago and were able to circumvent the ping test with a hacked .pup of petget, but that does me no good since it is so old.
Here is a thread of someone detailing their frustration over the same issue…

Here is the console output up until it fails at "testing whether packages exist in repository…",
before I have to open my browser and download them from the same URLs that ppm fails to…

root# ppm
EXIT=" Ok "
/usr/bin/gcc
ping: bad address 'distro.ibiblio.org'
ping: bad address 'distro.ibiblio.org'
/usr/local/petget/installpreview.sh: line 180: 17928 Terminated . /usr/lib/gtkdialog/box_splash -close never -text "$(gettext 'Please wait, processing package database files...')"
/usr/local/petget/downloadpkgs.sh: line 240: 18480 Terminated . yaf-splash -bg '#FFD600' -close never -fontsize large -text "$(gettext 'Please wait, testing that packages exist in repository...')"
/usr/local/petget/ui_Classic: line 206: 17014 Killed . /usr/lib/gtkdialog/box_splash -close never -text "$(gettext 'Loading Puppy Package Manager...')"

@mavrothal

PPM pings to check if there is a connection, but this does not stop it from working.
As you see, this is not a "common" issue with puppy, so it would be nice if you could help by providing some more info to address the issue.
Can you try to see if the suggested changes above helped and give some info about your setup?
How was the proxy set up, so we can automate its detection? What is the output of echo $http_proxy and cat /var/local/proxy_server, for example?

@ghost

Ok, that's good to hear. That thread had me thinking I had no recourse. The proxy is set up by an app called PDAnet on my phone. THIS is the setup tutorial they sent me.

I tried the changes you proposed before, but they seemed to have no effect. I tried them with "use_proxy=on" and "use_proxy=yes" per your suggestion. Tried to install FCEUX, same deal. Errors at testing whether packages exist.

root# echo $http_proxy
http://192.168.49.1:8000
root# cat /var/local/proxy_server
cat: /var/local/proxy_server: No such file or directory
^COULD THAT BE AN ISSUE?^

@ghost

So this is quite interesting! wget does not work in spider mode, but does work normally… How do I fix that???
[screenshot: capture9256]

@mavrothal

Actually, in your spider call you do not have a correct URL (distro.ibiblio.org).
But to test that, what is the output of

wget -4 -t 2 -T 20 --waitretry=20 --spider -S http://mirrors.kernel.org/ubuntu/pool/main/b/bolt/bolt_0.2-0ubuntu1_i386.deb

# and then

wget -e use_proxy=on -e http_proxy=192.168.49.1:8000 -4 -t 2 -T 20 --waitretry=20 --spider -S http://mirrors.kernel.org/ubuntu/pool/main/b/bolt/bolt_0.2-0ubuntu1_i386.deb

?
(copy/paste if possible)
Thx

@ghost

Set these values in /etc/wgetrc:
# You can set the default proxies for Wget to use for http, https, and ftp.
# They will override the value in the environment.
https_proxy = http://192.168.49.1:8000/
http_proxy = http://192.168.49.1:8000/
ftp_proxy = http://192.168.49.1:8000/
# If you do not want to use proxy at all, set this to off.
use_proxy = on

No changes detected so far… :/
Is this a bug in wget?
Any workaround?

@mavrothal

The error suggests there might be some issue with your proxy configuration, probably the port forwarding or the blocking of web crawlers (i.e. spider mode) by your proxy.
Just in case though, try dropping "-4", which enforces IPv4 only, i.e. wget -t 2 -T 20 --waitretry=20 --spider -S .....
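One way to check whether the spider request is actually going through the proxy at all (a sketch; the URL is just an example) is to look at wget's server-response output: when a proxy is in use, wget prints "Proxy request sent" instead of "HTTP request sent", and the "Connecting to" line shows the proxy address.

# Look for the proxy address and for "Proxy request sent" in the output
wget --spider -S http://distro.ibiblio.org/puppylinux/ 2>&1 | grep -E 'Connecting to|request sent'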

@ghost

Dropping the -4 argument produces the same results…
Read error (Connection reset by peer) in headers.
As the proxy for the connection is created by PDAnet, and not by myself, I have no control over its settings…

Honestly, I think it’s more likely the wget --spider command is trying to circumvent the proxy settings.

Couldn’t the check for the file be done with a curl script like this [source]?
url="http://distro.ibiblio.org/puppylinux/Packages-puppy-xenial64-official"
if curl --output /dev/null --silent --fail -r 0-0 "$url"; then
  echo "URL exists: $url"
else
  echo "URL does not exist: $url"
fi
because running this script gives me this:
URL exists: http://distro.ibiblio.org/puppylinux/Packages-puppy-xenial64-official

How feasible is this to implement in your opinion?

@mavrothal

Curl is more powerful (and bigger) than wget, but it is usually not part of puppy.
A temporary way to circumvent the problem is to comment out the exit commands (line 244 in downloadpkgs.sh and line 74 in download_file) till a proper solution is found.
The problem here is that if the file does not exist you end up with half-done installations.
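If curl is not available, a similar existence check can be sketched with wget itself, avoiding spider mode by requesting only the first byte with a Range header, so the request is an ordinary GET that the proxy handles like any other download. This is only a sketch: the function name and URL are illustrative, some servers ignore Range and send the whole file (which is then discarded), and the behaviour should be verified with the wget version puppy ships.

# Check that a URL exists using a normal ranged GET instead of --spider
check_url() {
  url="$1"
  if LANG=C wget -q -t 2 -T 20 --header='Range: bytes=0-0' -O /dev/null "$url"; then
    echo "URL exists: $url"
  else
    echo "URL does not exist: $url"
    return 1
  fi
}
check_url "http://distro.ibiblio.org/puppylinux/Packages-puppy-xenial64-official"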

@mavrothal

… a couple more options in case they can bypass your pesky proxy.
Try these 2 options alone or in combination with wget:
--span-hosts and -e robots=off
Also try to make wget identify itself as a browser with something like
--user-agent="Mozilla/5.0 (X11; Ubuntu; Linux x86; rv:52.0) Gecko/20100101 Firefox/52.0"
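For reference, all of these combined into a single invocation might look like this (a sketch that reuses the flags and test URL from earlier in the thread):

wget -e robots=off --span-hosts --user-agent="Mozilla/5.0 (X11; Ubuntu; Linux x86; rv:52.0) Gecko/20100101 Firefox/52.0" \
  -t 2 -T 20 --waitretry=20 --spider -S http://mirrors.kernel.org/ubuntu/pool/main/b/bolt/bolt_0.2-0ubuntu1_i386.deb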

@ghost

same errors with:
1.) --span-hosts
2.) -e robots=off
3.) --span-hosts -e robots=off
4.) --user-agent="Mozilla/5.0 (X11; Ubuntu; Linux x86; rv:52.0) Gecko/20100101 Firefox/52.0"

So I tried commenting out the two lines you mentioned, but I still receive an error saying it cannot determine the file sizes, though the files do seem to exist.
[screenshot: capture28745]

This is starting to feel like beating a dead horse. This has become a lot of effort for naught. I just wanted to use the package manager that came with my distro.

@ghost changed the title from "Puppy Packager Manager does not respect proxy settings." to "Puppy Package Manager does not respect proxy settings." on Apr 20, 2018

@mavrothal

OK. Thanks for your input.
You might want to contact your proxy administrator, as wget spider mode should work behind proxies.
Regarding using PPM under these conditions, you could also comment out lines 117 and 165 of download_file, but then you add more potential trouble, especially if you run with a limited-size savefile. So I think it is not a good idea.
Also, you may want to be more specific with your title to facilitate a potential fix. Something like "wget spider mode fails behind a proxy, breaking Puppy Package Manager".

@wdlkmpx

I remember editing the ppm to fix some things I found annoying. All those changes got lost (in early 2016 I think) when I hesitated to add them.

I guess this is not an issue in ./1download (build system), as it tries to download packages without checking in spider mode first, except for the arm SD skeleton image, code I didn't touch.

wget should just attempt to download the file without checking in spider mode first.

This is the kind of stuff that just should work even with proxies… they call it superiority.

And it’s a privilege to have someone who can provide all the required evidence to fix this.

He who has ears to hear, let him hear.

@mavrothal

wget should just attempt to download the file without checking in spider mode first.

Given the way puppy is built and run, checking for presence and size is important so you do not end up with broken builds, a full savefile (people still use it), or prompts to update your DBs. Of course, for an experienced linux user these are not an issue, but that is not necessarily the puppy user base.

On the other hand, PPM could check hashes against the remote site at startup to make sure the package DBs are up to date and that both the connection is alive and the packages exist, bypassing the spider check. What we need here are hashes for the ibiblio pet databases and some code in 3builddistro (or 0setup) to download and record the original DB hashes.
(One concern here is the "complaints" about connections and downloads that were not explicitly authorised 🙄 but an authorisation could be requested on every run, configurable in settings.)
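A rough sketch of that idea, assuming the mirror publishes a .sha256 file next to each package database (the hash file name and the local path are assumptions, not something ibiblio provides today):

# Compare the local package DB hash with a hash file assumed to exist on the mirror
DB=Packages-puppy-xenial64-official
wget -q -O /tmp/$DB.sha256 "http://distro.ibiblio.org/puppylinux/$DB.sha256" || exit 1
LOCAL_SUM=$(sha256sum "/root/.packages/$DB" | cut -d' ' -f1)
REMOTE_SUM=$(cut -d' ' -f1 < /tmp/$DB.sha256)
if [ "$LOCAL_SUM" = "$REMOTE_SUM" ]; then
  echo "package database is up to date"
else
  echo "package database needs updating"
fi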

I'm sure there are other solutions (including removing all the checks), but before we get to this or a similar exercise, I should point out that wget could/should work fine with a properly configured proxy.

@wdlkmpx

In 1download it's easy to tell when a package has been downloaded correctly or not downloaded at all [ -s ]… the script extracts the contents to test.

When phil had problems downloading uget and could not figure out why it failed, I could easily tell it was a corrupted package by looking at the screen and how the script kept saying the pkg was corrupted and kept downloading it from all the mirrors…

The problem is that it doesn't do anything obvious to alert you if you don't see it in time; I'll add red letters and a pause. Or it could just stop processing to let you know that something must be done, whether you want it or not…

The same behavior can be applied to the ppm. The ppm needs too many fixes… but I think a new CLI package manager (probably written in C) must be written from scratch, and a new primitive ppm should be a frontend to it.

One time the spider mode was actually making 1download slow as hell for me; I don't know why, but the spider took like 10 seconds to finish its task before the actual download happened… and I had to download 400+ packages.

Maybe I should also add an md5sum or sha256sum… what should be used now?
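A minimal sketch of how a checksum could be recorded and verified in 1download, using sha256sum (the file names are just placeholders taken from earlier in the thread):

# Record the checksum the first time a package is fetched
sha256sum bolt_0.2-0ubuntu1_i386.deb > bolt_0.2-0ubuntu1_i386.deb.sha256
# On later runs verify it; a non-zero exit status means the file is corrupt or changed
sha256sum -c bolt_0.2-0ubuntu1_i386.deb.sha256 || echo "checksum mismatch, re-download the package"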

@ghost

Ok, so to anyone reading this, just to be clear, if you are experiencing this very specific issue, commenting out lines 74, 117, & 165 of /usr/sbin/download_file and line 244 in /usr/local/petget/downloadpkgs.sh still gave all the same errors (and new ones!), but it continued to download and install the packages. It’s ugly, it’s a bad method, I know, but it works. Thank you for the help. I appreciate it immensely.

I’m glad we could narrow down the cause a bit. I was going to try to get in touch with the maintainers of wget for a possible solution, but apparently you can only report bugs through a mailing list-thing. Didn’t feel worth the effort.

As I said, the proxy is set up by the PDAnet app on my phone. I don't know why you immediately want to say it's misconfigured. Wget spider mode definitely does work through an external proxy, but I cannot say I have ever seen it work through a local one. It may be possible that spider mode uses the ICMP protocol to communicate; the proxy only supports TCP and UDP packets. This is the problem with ping.

@ghost changed the title from "Puppy Package Manager does not respect proxy settings." to "'Wget --spider' not working through local proxy, breaks Puppy Package Manager" on Apr 23, 2018

@wdlkmpx

Maybe downloadpkgs should not run wget in spider mode, as it calls download_file, which runs wget in spider mode again. I'll take a look.

@wdlkmpx

This is not related to this issue, but I ended up noticing it…

In installpkg.sh, there is post-processing that only applies in frugal mode, but as far as I know it also affects full installs: when you install a package containing DISTRO_ARCHDIR, it overwrites the symlink no matter the pupmode. There's a workaround to avoid that situation a few lines above it.

'120102 install may have overwritten a symlink-to-dir…' This can also happen in a full install; quite convoluted code here. The script shouldn't attempt to fix every possible situation with incompatible or unsuitable packages…

There's a huge code block I deleted in 2createpackages, in a pull request: broken code regarding fixing the desktop file icon and category. The same code I see in installpkg.sh… I tested with 30-40 packages containing broken icon paths or something like that, and it failed all my tests.

To fix this, we should add a fairly complete hicolor icon theme. I added more IconPaths to jwm_config, allowing it to find many more icons (this fixed quite a few of the missing icons in my build)… There are also broken icon paths in some old pet packages, and that requires fixing those packages.

In short, there's code to fix and code to delete…

The following works:

ps aux | cut -c1-$(stty size </dev/tty | cut -d' ' -f2)

This also works:

v=$(stty size | cut -d' ' -f2) ; ps aux | cut -c1-$v

The problem seems to be that stty needs a tty on standard input in order to work. The two approaches above work around this.

There is one more option. While stty's stdin and stdout are redirected in the commands above, stderr is not: it still points to the terminal. Oddly enough, stty will also work if it is given stderr as its input:

ps aux | cut -c1-$(stty size <&2 | cut -d' ' -f2)
