Wget for Windows command: your trusty helper for Windows

What does WGET Do?

Once installed, the WGET command allows you to download files over the TCP/IP protocols: FTP, HTTP and HTTPS.

If you’re a Linux or Mac user, WGET is either already included in the package you’re running or it’s a trivial case of installing from whatever repository you prefer with a single command.

Unfortunately, it’s not quite that simple in Windows (although it’s still very easy!).

To run WGET you need to download, unzip and install manually.

Install WGET in Windows 10

Download the classic 32-bit version 1.14 here, or go to the Windows binaries collection at Eternally Bored here for later versions and the faster 64-bit builds.

Here is the downloadable zip file for version 1.2 64 bit.

If you want to be able to run WGET from any directory inside the command terminal, you’ll need to learn about path variables in Windows to work out where to copy your new executable. If you follow these steps, you’ll be able to make WGET a command you can run from any directory in Command Prompt.

Run WGET from anywhere

Firstly, we need to determine where to copy WGET.exe.

After you’ve downloaded wget.exe (or unpacked the associated distribution zip files), open a command terminal by typing “cmd” in the Windows 10 search bar.

We’re going to move wget.exe into a Windows directory that will allow WGET to be run from anywhere.

First, we need to find out which directory that should be. Type:

path

The output lists the directories, separated by semicolons, that Windows searches for executables.

Because c:\Windows\System32 appears in the “Path” environment variable, we know that copying wget.exe to that folder makes it runnable from anywhere.

Go ahead and copy WGET.exe to the System32 directory and restart your Command Prompt.

Restart command terminal and test WGET

If you want to test WGET is working properly, restart your terminal and type:

wget -h

If you’ve copied the file to the right place, you’ll see the help text appear with all of the available options.
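If you would rather script this check, here is a minimal Python sketch of the same idea. It only assumes that Python 3 is installed; the function names find_wget and wget_version_line are made up for this illustration and are not part of WGET.

```python
import shutil
import subprocess


def find_wget():
    """Return the full path of the wget executable found on PATH, or None."""
    return shutil.which("wget")


def wget_version_line(executable):
    """Run `<executable> --version` and return the first line it prints."""
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    path = find_wget()
    if path is None:
        print("wget was not found in any PATH directory")
    else:
        print(path)
        print(wget_version_line(path))
```

If the first message is printed, the executable is not in any directory listed by the path command.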

Now it’s time to get started.

Get started with WGET

Seeing that we’ll be working in Command Prompt, let’s create a download directory just for WGET downloads.

To create a directory, we’ll use the command md (“make directory”).

Change to the c:\ prompt and type:

md wgetdown

Then, change to your new directory and type “dir” to see the (blank) contents.

Now, you’re ready to do some downloading.

Example commands

Once you’ve got WGET installed and you’ve created a new directory, all you have to do is learn some of the finer points of WGET arguments to make sure you get what you need.

The Gnu.org WGET manual is a particularly useful resource for those inclined to really learn the details.

If you want some quick commands though, read on. I’ve listed a set of instructions to WGET to recursively mirror your site, download all the images, CSS and JavaScript, localise all of the URLs (so the site works on your local machine), and save all the pages as a .html file.

To mirror your site execute this command:

wget -r https://www.yoursite.com

To mirror the site and localise all of the urls:

wget --convert-links -r https://www.yoursite.com

To make a full offline mirror of a site:

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://www.yoursite.com
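If you run the full mirror often, it can help to assemble the command in a script. The sketch below is only an illustration in Python: build_mirror_command and its wait_seconds parameter are hypothetical names, and the options are exactly the ones from the command above plus WGET's --wait option.

```python
import shlex

# The options used for a full offline mirror in the command above.
MIRROR_OPTIONS = [
    "--mirror",
    "--convert-links",
    "--adjust-extension",
    "--page-requisites",
    "--no-parent",
]


def build_mirror_command(url, wait_seconds=None):
    """Return the wget argument list for a full offline mirror of `url`."""
    command = ["wget", *MIRROR_OPTIONS]
    if wait_seconds is not None:
        # Optional politeness delay between requests.
        command.append(f"--wait={wait_seconds}")
    command.append(url)
    return command


if __name__ == "__main__":
    # shlex.join quotes for POSIX shells; it is used here only for display.
    print(shlex.join(build_mirror_command("https://www.yoursite.com", wait_seconds=1)))
```

The returned list can be passed to subprocess.run as it is, which avoids quoting problems in the shell.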

To mirror the site and save the files as .html (newer versions of WGET call this option --adjust-extension; --html-extension is the older name):

wget --html-extension -r https://www.yoursite.com

To download all jpg images from a site:

wget -A "*.jpg" -r https://www.yoursite.com

For more filetype-specific operations, check out this useful thread on Stack.

Set a different user agent:

Some web servers are set up to deny WGET’s default user agent – for obvious, bandwidth saving reasons. You could try changing your user agent to get round this. For example, by pretending to be Googlebot:

wget --user-agent="Googlebot/2.1 (+http://www.google.com/bot.html)" -r https://www.yoursite.com

Wget “spider” mode:

Wget can fetch pages without saving them which can be a useful feature in case you’re looking for broken links on a website. Remember to enable recursive mode, which allows wget to scan through the document and look for links to traverse.

wget --spider -r https://www.yoursite.com

You can also save this to a log file by adding this option:

wget --spider -r https://www.yoursite.com -o wget.log

Enjoy using this powerful tool, and I hope you’ve enjoyed my tutorial. Comments welcome!

Source

Linux wget: your command-line downloader

Wget is a freely distributed utility for downloading files from the internet.
It supports HTTP, HTTPS and FTP, as well as authentication and many other options.

If you’re a Linux or Mac user, WGET is either already included in the package you’re running or it’s a trivial case of installing it from whichever repository you prefer with a single command.

How to install the wget command on Linux
Use the apt / apt-get command if you are running Ubuntu / Debian / Mint Linux:
$ sudo apt install wget

Fedora Linux users should use the dnf command:
$ sudo dnf install wget

RHEL / CentOS / Oracle Linux users should use the yum command:
$ sudo yum install wget

SUSE / OpenSUSE Linux users should use the zypper command:
$ sudo zypper install wget

Arch Linux users should use the pacman command:
$ sudo pacman -S wget

Unfortunately, it’s not quite that simple in Windows (although it’s not that hard either!).

To run WGET you need to download, unzip and install the utility manually.

Install WGET in Windows 10

Download the classic 32-bit version 1.14 here, or go to the Windows binaries collection at Eternally Bored here to get later versions and the faster 64-bit builds.

Here is the downloadable zip file for the 64-bit version 1.2.

If you want to be able to run WGET from any directory in the terminal, you’ll need to learn about path variables in Windows to decide where to copy the new executable. Once you do, you can make WGET a command that runs from any directory in Command Prompt; configuring Windows this way is a topic of its own.

Run WGET from anywhere

First, we need to determine where to copy WGET.exe.

We’re going to move wget.exe into a Windows directory that allows WGET to be run from anywhere.

After you’ve downloaded wget.exe (or unpacked the associated distribution zip files), type “cmd” in the search menu and launch Command Prompt.

Next, we need to find out which directory that should be. In Command Prompt, type:

path

The output lists the directories that Windows searches for executables.

Because c:\Windows\System32 appears in the “Path” environment variable, we know that we can copy wget.exe to that folder.

Copy WGET.exe to the System32 directory and restart Command Prompt.

If you want to check that WGET is working properly, restart your terminal and type:

wget -h

If you’ve copied the file to the right place, you’ll see the help text with all of the available options.

Get started with WGET
We’ll be working in Command Prompt, so let’s create a download directory just for WGET downloads.

To create a directory, we’ll use the command md (“make directory”).

Change to the root directory c:\ and type:

md wgetdown

Then change to the new directory and type “dir” to see its (empty) contents.

Once you’ve installed WGET and created a new directory, all you have to do is learn some of the finer points of WGET arguments to make sure you get what you need.

The Gnu.org WGET manual is a particularly useful resource for those who really want to learn the details.

Here are a few tips on getting the most out of it:

Linux wget command examples
Syntax:
wget url
wget [options] url

Let’s look at some common Linux wget command examples, syntax and usage.

WGET can be used to:

Download a single file with wget
$ wget https://cyberciti.biz/here/lsst.tar.gz

Download multiple files with wget
$ wget https://cyberciti.biz/download/lsst.tar.gz ftp://ftp.freebsd.org/pub/sys.tar.gz ftp://ftp.redhat.com/pub/xyz-1rc-i386.rpm

Read URLs from a file
You can put all the URLs in a text file and use the -i option so that wget downloads all of them. First, create a text file:
$ xed /tmp/download.txt

Add the list of URLs:
https://cyberciti.biz/download/lsst.tar.gz
ftp://ftp.freebsd.org/pub/sys.tar.gz
ftp://ftp.redhat.com/pub/xyz-1rc-i386.rpm
Run the wget command as follows:
$ wget -i /tmp/download.txt
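When the list of URLs comes from a program rather than from a text editor, the file can be generated. The following Python sketch is one possible way to do it; write_url_list is a hypothetical helper, and it writes to a temporary directory so that nothing on your system is touched.

```python
import tempfile
from pathlib import Path


def write_url_list(urls, list_path):
    """Write one URL per line and return the matching `wget -i` command."""
    cleaned = [url.strip() for url in urls if url.strip()]
    Path(list_path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return ["wget", "-i", str(list_path)]


if __name__ == "__main__":
    urls = [
        "https://cyberciti.biz/download/lsst.tar.gz",
        "ftp://ftp.freebsd.org/pub/sys.tar.gz",
        "ftp://ftp.redhat.com/pub/xyz-1rc-i386.rpm",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        command = write_url_list(urls, Path(tmp) / "download.txt")
        print(command)
```

Blank entries are dropped so that the file contains only lines wget can use.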

Limit the download speed
$ wget -c -o /tmp/susedvd.log --limit-rate=50k ftp://ftp.novell.com/pub/suse/dvd1.iso

Use wget with password-protected sites
You can supply the HTTP username and password for the server as follows:
$ wget --http-user=vivek --http-password=Secrete http://cyberciti.biz/vivek/csits.tar.gz
Another way to specify the username and password is in the URL itself.
$ wget 'http://username:password@cyberciti.biz/file.tar.gz'

Download all mp3 or pdf files from a remote FTP server
$ wget ftp://somedom-url/pub/downloads/*.mp3
$ wget ftp://somedom-url/pub/downloads/*.pdf

Download an entire site
$ wget -r -k -l 7 -p -E -nc https://site.com/

Let’s look at the options used:

-r: follow links on the site recursively in order to download pages.
-k: convert all links in the downloaded files so that they can be followed on the local computer (offline).
-p: download all files needed to display the pages (images, CSS and so on).
-l: set the maximum depth of pages that wget should download (the default is 5; the example uses 7). Many sites have deeply nested pages, and wget can simply “dig itself in” downloading new ones. The -l option prevents that.
-E: add the .html extension to downloaded HTML files that lack it.
-nc: do not overwrite existing files. This is handy when you need to continue a site download that was interrupted earlier.
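Short options are quick to type but hard to read later. As a rough illustration, the Python sketch below rewrites the short options from the command above into their long forms. The mapping covers only these six options, and expand_options is a hypothetical helper, not a wget feature.

```python
# Long forms of the short options explained above.
SHORT_TO_LONG = {
    "-r": "--recursive",
    "-k": "--convert-links",
    "-p": "--page-requisites",
    "-l": "--level",
    "-E": "--adjust-extension",
    "-nc": "--no-clobber",
}

# Options from the table that are followed by a value.
TAKES_VALUE = {"-l"}


def expand_options(args):
    """Replace known short options with long ones, leaving the rest alone."""
    expanded = []
    index = 0
    while index < len(args):
        arg = args[index]
        long_name = SHORT_TO_LONG.get(arg)
        if long_name is None:
            expanded.append(arg)
        elif arg in TAKES_VALUE and index + 1 < len(args):
            expanded.append(f"{long_name}={args[index + 1]}")
            index += 1  # skip the value that was just consumed
        else:
            expanded.append(long_name)
        index += 1
    return expanded


if __name__ == "__main__":
    command = "wget -r -k -l 7 -p -E -nc https://site.com/".split()
    print(" ".join(expand_options(command)))
```

The long form is easier to review in scripts that other people will maintain.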

By default, wget downloads a file and saves it in the current directory under its original name from the URL.

Here I’ve listed a set of instructions for WGET to recursively mirror your site, download all the images, CSS and JavaScript, localise all of the URLs (so the site works on your local machine), and save all the pages as .html files.

To mirror your site, execute this command:

wget -r https://www.yoursite.com

To mirror the site and localise all of the URLs:

wget --convert-links -r https://www.yoursite.com

To make a full offline mirror of a site:

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://www.yoursite.com

To mirror the site and save the files as .html:

wget --html-extension -r https://www.yoursite.com

To download all jpg images from a site:

wget -A "*.jpg" -r https://www.yoursite.com

For more filetype-specific operations, check out this useful thread on Stack.

Set a different user agent:

Some web servers are set up to deny WGET’s default user agent, for obvious bandwidth-saving reasons. You could try changing your user agent to get round this. For example, by pretending to be Googlebot:

wget --user-agent="Googlebot/2.1 (+http://www.google.com/bot.html)" -r https://www.yoursite.com

Wget “spider” mode:

Wget can fetch pages without saving them, which can be a useful feature if you’re looking for broken links on a website. Remember to enable recursive mode, which allows wget to scan through the document and look for links to traverse.

wget --spider -r https://www.yoursite.com

You can also save this to a log file by adding this option:

wget --spider -r https://www.yoursite.com -o wget.log

wget -m -l 10 -e robots=off -p -k -E --reject-regex "wp" --no-check-certificate -U "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36" site-addr.com

How to find broken links on your site

wget --spider -r -nd -nv -H -l 2 -w 2 -o run1.log https://site.by

Enjoy using this powerful tool, and I hope you’ve enjoyed my tutorial.

Source

Download Files from the Windows Command Line with Wget

(Image credit: Tom’s Hardware)

Most users will download files onto their PC using their web browser. There’s a problem with this method, however—it’s not particularly efficient. If you need to pause your download, or if you’ve lost your connection, you’ll probably need to start your download again from scratch. You may also be working with Python or other code at the command line and want to download directly from the command prompt.

That’s where tools like Wget come in. This command line tool has a number of useful features, with support for recursive downloads and download resumption that allows you to download single files (or entire websites) in one go.

Wget is popular on Linux and other Unix-based operating systems, but it’s also available for Windows users. Below, we’ll explain how to install and use Wget to download any content you want online from your Windows command line.

Installing GNU Wget on Windows

Wget (in name, at least) is available on Windows 10 and 11 via the PowerShell terminal. However, this version of Wget isn’t the same as the GNU Wget tool that you’d use on a Linux PC. Instead, this version is simply an alias for a PowerShell command called Invoke-WebRequest.

Invoke-WebRequest is Wget-like in what it does, but it’s a completely different tool that’s much more difficult to use and understand. Instead, you’ll be better served by installing Wget for Windows, a compiled version of the same tool available for Linux users, using the steps below.

1. Download the Wget for Windows setup file from the Wget website. You’ll need to do this using your web browser.

2. Run the Wget for Windows installer file. Once the Wget setup file has finished downloading, run the setup file and follow the on-screen instructions to complete the installation.

3. Update the Wget.exe file (optional). The Wget installer is packaged with a fairly old version of the Wget binary. If you run into difficulties downloading files because of SSL certificate errors, you should download the latest wget.exe for your architecture from this website and save it to your Wget installation directory (typically C:\Program Files (x86)\GnuWin32\bin). This step is optional, but highly recommended.

4. Open the Start menu, search for environment variables, and click Open. Once the installation is finished, use the search tool in the Start menu to search for environment variables, then click Open. You’ll need to do this to allow you to use the ‘wget’ command from the command line without referencing its location every time you wish to run it.

5. Click Environment Variables in the System Properties window.

6. Select Path and click Edit under System or User variables.

7. Click the New button and type in the directory for the Wget for Windows binary (.exe) file. By default, this should be C:\Program Files (x86)\GnuWin32\bin.

8. Save your changes. When you’re finished, click OK in each menu and exit System Properties.

9. Open the Start menu, type cmd, and press Open. This will launch a new command prompt window. You can also use the newer Terminal app, as long as you switch to using a command prompt shell.

10. Type wget --version and press Enter. If Wget was installed correctly, you should see the GNU Wget version returned in the command prompt window.

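If you want to run Wget from a PowerShell terminal instead, you’ll need to call the executable explicitly so that the built-in alias is bypassed: either type wget.exe rather than wget, or run the file from its installation directory (e.g. & "C:\Program Files (x86)\GnuWin32\bin\wget.exe").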

Downloading Files with Wget

Once you’ve installed GNU Wget and you’ve configured the environment variables to be able to launch it correctly, you’ll be able to use it to start downloading files and webpages.

We’ve used an example domain and file path in our examples below. You’ll need to replace this with the correct path to the file (or files) that you want to download.

Type wget -h to see a full list of options. This will print every option that you can use with Wget.
wget -h

Download a single file using wget <url>. Replace <url> with the path to a file on an HTTP, HTTPS, or FTP server. You can also refer to a website domain name or web page directly to download that specific page (without any of its other content).
wget example.com

Save with a different filename using -O. Using the -O option, you’ll be able to save the file with a different filename. For example, wget -O <filename> <url>, where <filename> is the filename you’ve chosen.
wget -O example.html example.com

Save to a different directory using -P. If you want to save to another directory than the one you’re currently in, use the -P option. For example, wget -P <path> <url>.
wget -P C:\folder example.com

Use --continue or -c to resume files. If you want to resume a partial download, use the -c option to resume it, as long as you’re in the same directory. For example, wget -c <url>.
wget -c example.com

Download multiple files in sequence. If you want to download several files, add each URL to your Wget command. For example, wget <url1> <url2> etc.
wget example.com tomshardware.com

Download multiple files using a text file with -i. Using the -i option, you can refer to a text file that contains a list of URLs to download a large number of files. Assuming that each URL is on a new line, Wget will download the content from each URL in sequence. For example, wget -i <file.txt>.
wget -i urls.txt

Limit download speeds using --limit-rate. If you want to limit your bandwidth usage, you can cap the download speeds using the --limit-rate option. For example, wget --limit-rate=1M <url> would limit it to 1 megabyte per second download speeds, while wget --limit-rate=10K <url> would limit it to 10 kilobytes per second.
wget --limit-rate=10K example.com
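To get a feel for what a given cap means in practice, the small Python sketch below converts a rate value into bytes per second and estimates the shortest possible download time. It assumes the K and M suffixes are multiples of 1024, and the function names are hypothetical; Wget’s real throttling averages over time, so treat the numbers as estimates.

```python
def parse_rate(value):
    """Convert a rate such as '10K' or '1M' into bytes per second.

    Assumption for this sketch: K means 1024 bytes and M means 1024 * 1024.
    """
    multipliers = {"k": 1024, "m": 1024 ** 2}
    text = value.strip().lower()
    if text and text[-1] in multipliers:
        return int(float(text[:-1]) * multipliers[text[-1]])
    return int(float(text))


def minimum_seconds(file_size_bytes, rate):
    """Shortest possible download time for a file at the given rate cap."""
    return file_size_bytes / parse_rate(rate)


if __name__ == "__main__":
    one_mebibyte = 1024 ** 2
    print(parse_rate("10K"))
    print(minimum_seconds(one_mebibyte, "10K"))
```

A cap that looks harmless for a single page can add hours to a recursive download, so it is worth checking before you start.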

Use -w or --wait to set a pause period after each download. If you’re downloading multiple files, using -w can help to spread the requests you make and help to limit any chance that your downloads are blocked. For example, wget -w 10 <url1> <url2> for a 10 second wait.

wget -w 10 example.com tomshardware.com

Set a retry limit using -t or --tries. If a download fails, Wget uses the -t value to determine how many times it will try before it stops. The default value is 20 tries. If the file is missing, or if the connection is refused, this value is ignored and Wget terminates immediately.
wget -t 5 example.com
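The retry behaviour described above can be pictured with a short Python sketch. This is an illustration of the idea only, not Wget’s actual code: the exception classes and the download_with_retries function are hypothetical, and the fetch callable stands in for a real download.

```python
class FatalDownloadError(Exception):
    """An error that retrying cannot fix, such as a missing file."""


class TransientDownloadError(Exception):
    """An error worth retrying, such as a timeout."""


def download_with_retries(fetch, tries=20):
    """Call fetch(attempt) until it succeeds or `tries` attempts are used.

    Transient errors are retried. Fatal errors are not caught here, so they
    end the loop immediately, whatever the value of `tries`.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    last_error = None
    for attempt in range(1, tries + 1):
        try:
            return fetch(attempt)
        except TransientDownloadError as error:
            last_error = error
    raise last_error


if __name__ == "__main__":
    def flaky(attempt):
        # Simulated download that times out twice before succeeding.
        if attempt < 3:
            raise TransientDownloadError(f"timeout on attempt {attempt}")
        return f"downloaded on attempt {attempt}"

    print(download_with_retries(flaky, tries=5))
```

With tries=5 the simulated download succeeds on its third attempt; a fatal error would have stopped it on the first.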

Save a log using -o or -a. You can save your log data to a text file using -o (to always create a new log file) or -a (to append to an existing file). For example, wget -o <file.txt> <url>.

Bypass SSL errors using --no-check-certificate. If you’re having trouble downloading from a web server with an SSL certificate and you’ve already updated your Wget installation, bypass the SSL certificate check completely using --no-check-certificate to allow the download (in most cases). You should only do this for downloads from locations that you completely trust. For example, wget --no-check-certificate example.com.

wget --no-check-certificate https://example.com

Make sure to use the wget -h or wget --help command to view the full list of options that are available to you. If you run into trouble with Wget, make sure to limit the number of retries you make and set a wait limit for each download you attempt.

Using Wget for Recursive Downloads

One of Wget’s most useful features is the ability to download recursively. Instead of only downloading a single file, it’ll instead try to download an entire directory of related files.

For instance, if you specify a web page, it’ll download the content attached to that page (such as images). Depending on the recursive depth you choose, it can also download any pages that are linked to it, as well as the content on those pages, any pages that are linked on those pages, and so on.

Theoretically, Wget can run with an infinite depth level, meaning it’ll never stop trying to go further and deeper with the content it downloads. However, from a practical point of view, you may find that most web servers will block this level of scraping, so you’ll need to tread carefully.

Type wget -r or wget --recursive to download recursively. By default, the depth level is five. For example, wget -r <url>.
wget -r tomshardware.com

Use -l or --level to set a custom depth level. For example, wget -r -l 10 <url>. Use wget -r -l inf <url> for an infinite depth level.
wget -r -l 10 tomshardware.com
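To see what a depth level means, here is a minimal Python sketch of a depth-limited, breadth-first crawl over a made-up site map. It is an illustration only: the SITE dictionary and the crawl function are hypothetical, and no network requests are made.

```python
from collections import deque

# Hypothetical site map: each page lists the pages it links to.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/a/1": ["/a/1/x"],
    "/b": ["/"],
}


def crawl(links, start, max_depth=5):
    """Return each reachable page with the depth at which it was found.

    Links on a page are followed only while that page's depth is below
    max_depth. Use float("inf") for an unlimited crawl.
    """
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth_of[page] >= max_depth:
            continue
        for target in links.get(page, []):
            if target not in depth_of:  # never fetch the same page twice
                depth_of[target] = depth_of[page] + 1
                queue.append(target)
    return depth_of


if __name__ == "__main__":
    print(crawl(SITE, "/", max_depth=1))
    print(crawl(SITE, "/", max_depth=float("inf")))
```

Each extra level can multiply the number of pages fetched, which is why a modest depth is a sensible starting point.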

Use -k to convert links to local file URLs. If you’re scraping a website, the -k option makes Wget convert any links in the downloaded HTML so that they point to the offline copy instead. For example, wget -r -k <url>.
wget -r -k tomshardware.com

Use -p or --page-requisites to download all page content. If you want a website to fully download so that all of the images, CSS, and other page content is available offline, use the -p or --page-requisites options. For example, wget -r -p <url>.
wget -r -p tomshardware.com

For a full list of options, make sure to use the wget -h command. You should also take care to respect any website that you’re actively downloading from and do your best to limit server loads using wait, retry, and depth limits.

If you run into difficulties with downloads because of SSL certificate errors, don’t forget to update your Wget binary file (wget.exe) with the latest version.

Source

Sometimes, you need to download a file directly from the Command Prompt (CMD). It’s simple, quick, and doesn’t require opening a browser. Here’s how to do it step-by-step. 😉

CURL Command on CMD

For Windows 10 and later, there is a built-in tool called curl that can be used to download files from the command line. It’s pretty simple: open Command Prompt by pressing Win + R on your keyboard, typing cmd in the Run box that appears, and then hitting Enter.
Then type the command below, replacing the URL with the download URL of your file:

curl -O https://example.com/file.zip

This will download the file and save it with the same name as on the website.
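The saved name is simply the last part of the URL path. The Python sketch below shows one way that name could be derived. It is a simplification that ignores query strings; default_filename is a hypothetical helper, and the index.html fallback mirrors what wget does for a URL with no file name, whereas curl -O reports an error in that case.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def default_filename(url, fallback="index.html"):
    """Return the last path segment of `url`, or `fallback` if there is none."""
    path = urlsplit(url).path
    name = posixpath.basename(unquote(path))
    return name or fallback


if __name__ == "__main__":
    print(default_filename("https://example.com/file.zip"))
    print(default_filename("https://example.com/"))
```

The first call prints file.zip and the second falls back to index.html.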

Want to rename the file as you download it? Use this:
curl -o MyFile.zip https://example.com/file.zip

Invoke-WebRequest Command on PowerShell

Invoke-WebRequest functions similarly to the curl command, but in PowerShell. To download a file using this command, replace the URL with your file’s URL as shown in the command below:

Invoke-WebRequest https://example.com/file.zip -OutFile file.zip

This will save the file as file.zip in your current directory.

BITSADMIN Command for Older Versions of Windows

Older versions of Windows, such as XP, Vista, 7, and 8/8.1, don’t ship with curl, and Invoke-WebRequest requires PowerShell 3.0 or later. In this case, you can use bitsadmin. To use it, open CMD and type the command below, replacing the URL with the download URL of your file:

bitsadmin /transfer myDownloadJob /download /priority normal https://example.com/file.zip C:\Users\yourusername\Downloads\file.zip

The file will be saved to the location given as the last argument, here C:\Users\yourusername\Downloads\file.zip.

WGET Command for Linux Lovers on Windows

The WGET command is available on Windows but requires manual installation. It comes preinstalled on many Linux distributions, so Linux users may find it convenient to use the same command on Windows as well.

To use WGET, download wget from the internet (Google “wget for Windows”) and then add it to the system path. To do this, follow these steps:

Open your browser and search for “wget for Windows.”
Go to a trusted source like Eternallybored.org or another reliable website.

Download the wget.exe file. Choose the version that matches your system (32-bit or 64-bit).
Once downloaded, locate the wget.exe file (usually in your Downloads folder).
Move it to the location:

C:\Windows\System32\

Now you’re all set to use wget on your Windows system for downloading files. Open CMD and type the sample command below, replacing the file URL with your own:

wget https://example.com/file.zip

With this guide, you’ve covered all the methods to download a file from the CMD (command line) like a pro! 😃👏

Source

WGET is a free tool to crawl websites and download files via the command line.

In this wget tutorial, we will learn how to install and how to use wget commands with examples.

How to Use WGET command (with Examples)

What is Wget?

Wget is a free command-line tool created by the GNU Project that is used to download files from the internet.

It lets you download files from the internet via FTP, HTTP or HTTPS (web pages, pdf, xml sitemaps, etc.).
It provides recursive downloads, which means that Wget downloads the requested document, then the documents linked from that document, and then the next, etc.
It follows the links and directory structure.
It lets you overwrite the links with the correct domain, helping you create mirrors of websites.

What Is the Wget Command?

The wget command is a tool developed by the GNU Project to download files from the web. Wget allows you to retrieve content and files from web servers using a command-line interface. The name “wget” comes from “World Wide Web” and “get”. Wget supports downloads via the HTTP, HTTPS, and FTP protocols.

Wget is used by developers to automate file downloads.

WGet Command

Install Wget

To install wget on Windows, download the executable file from eternallybored.org. To install wget on Mac, use the brew install wget command. First make sure that it is not already installed by running the wget -V command in the command line interface. For more details on how to install Wget, read one of the following tutorials.

Install Wget on Mac
Install Wget on Windows
Install Wget on Linux

Downloading Files From the Command Line (Wget Basics)

Let’s look at the wget syntax, view the basic commands structure and understand the most important options.

Wget Syntax

Wget has two arguments: [OPTION] and [URL] .

wget [OPTION]... [URL]...

[OPTION] tells what to do with the [URL] argument provided after. It has a short and a long-form (ex: -V and --version are doing the same thing).
[URL] is the file or the directory you wish to download.
You can call many OPTIONS or URLs at once.

View WGET Arguments

To view the available wget arguments, use the wget help command:

$ wget -h

The output will show you an exhaustive list of all the wget command parameters.

Here are the 14 best things that you can do with Wget:

Download a single file
Download a file to a specific directory
Rename a downloaded file
Define User Agent
Extract as Googlebot
Extract Robots.txt when it changes
Convert links on a page
Mirror a single page
Extract Multiple URLs from a list
Limit Speed
Number of attempts
Use Proxies
Continue Interrupted Downloads
Extract Entire Website

Download a single file with Wget

$ wget https://example.com/robots.txt

Download a File to a Specific Output Directory

Here, replace <YOUR-PATH> with the output directory where you want to save the file.

$ wget -P <YOUR-PATH> https://example.com/sitemap.xml

Rename Downloaded File when Retrieving with Wget

To output the file with a different name:

$ wget -O <YOUR-FILENAME.html> https://example.com/file.html

Define User Agent in WGET

Identify yourself. Define your user-agent.

$ wget --user-agent=Chrome https://example.com/file.html

Extract as Googlebot with Wget Command

$ wget --user-agent="Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" https://example.com/path

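Extract Robots.txt Only When It Changes

Let’s extract robots.txt only if the latest version on the server is more recent than the local copy.

The first time that you extract, use -S to print the server response headers, including Last-Modified. Wget stamps the saved file with the server’s modification time by default.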

$ wget -S https://example.com/robots.txt

Later, use -N to check whether the robots.txt file has changed and download it only if it has:

$ wget -N https://example.com/robots.txt
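The decision behind this kind of timestamp check can be sketched in a few lines of Python. This is a simplified illustration rather than Wget’s real logic: should_download is a hypothetical function that compares only the Last-Modified header with the local file time, and leaves out other checks such as file size.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def should_download(last_modified_header, local_mtime):
    """Decide whether the remote copy is newer than the local one.

    last_modified_header: value of the HTTP Last-Modified header.
    local_mtime: local file modification time as a Unix timestamp,
                 or None if there is no local copy yet.
    """
    if local_mtime is None:
        return True
    remote_time = parsedate_to_datetime(last_modified_header)
    local_time = datetime.fromtimestamp(local_mtime, tz=timezone.utc)
    return remote_time > local_time


if __name__ == "__main__":
    header = "Wed, 21 Oct 2015 07:28:00 GMT"
    saved_at = datetime(2015, 10, 20, 12, 0, tzinfo=timezone.utc).timestamp()
    print(should_download(header, saved_at))
```

Here the local copy is a day older than the server copy, so the sketch reports that a new download is needed.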

Wget command to Convert Links on a Page

Convert the links in the HTML so they still work in your local version. (ex: example.com/path to localhost:8000/path)

$ wget --convert-links https://example.com/path

Mirror a Single Webpage in Wget

To mirror a single web page so that it works on your local machine:

$ wget -E -H -k -K -p https://example.com/path

Extract Multiple URLs from a List

Add all URLs to a urls.txt file, one per line.

https://example.com/1
https://example.com/2
https://example.com/3

Then pass the file to wget with the -i option.

$ wget -i urls.txt

Limit Speed

To be a good citizen of the web, it is important not to crawl too fast by using --wait and --limit-rate.

--wait=1: Wait 1 second between extractions.
--limit-rate=10K: Limit the download speed to 10 kilobytes per second.

Define Number of Retry Attempts in Wget

Sometimes the internet connection fails, sometimes the attempt is blocked, sometimes the server does not respond. Define a number of attempts with the --tries option.

$ wget --tries=10 https://example.com

How to Use Proxies With Wget?

To set a proxy with Wget, we need to update the user configuration file ~/.wgetrc (the system-wide equivalent is /etc/wgetrc).

You can modify the ~/.wgetrc in your favourite text editor.

$ vi ~/.wgetrc # VI
$ code ~/.wgetrc # VSCode

And add these lines to the file:

use_proxy = on
http_proxy = http://username:password@proxy.server.address:port/
https_proxy = http://username:password@proxy.server.address:port/

Then, by running any wget command, you’ll be using proxies.

Alternatively, you can use the -e option to run wget with proxies without editing the configuration file.

wget -e use_proxy=yes -e http_proxy=http://proxy.server.address:port/ https://example.com

How to remove the Wget proxies?

When you don’t want to use the proxies anymore, update the ~/.wgetrc to remove the lines that you added, or simply pass the --no-proxy option to override them:

$ wget --no-proxy https://example.com

Continue Interrupted Downloads with Wget

When your retrieval process is interrupted, continue the download without restarting the whole extraction by using the -c option.

$ wget -c https://example.com

Recursive mode downloads a page, then follows the links on that page and downloads those pages as well.

This downloads your entire site and can put extra load on your server. Be sure that you know what you are doing, or involve the developers.

$ wget --recursive --page-requisites --adjust-extension --span-hosts --wait=1 --limit-rate=10K --convert-links --restrict-file-names=windows --no-clobber --domains example.com --no-parent example.com

Command	What it does
--recursive	Follow links in the document. The default maximum depth is 5.
--page-requisites	Get all assets needed to display the page (CSS/JS/images).
--adjust-extension	Append .html to downloaded HTML files that lack the extension.
--span-hosts	Include necessary assets from other hosts as well.
--wait=1	Wait 1 second between requests.
--limit-rate=10K	Limit the download speed to 10 kilobytes per second.
--convert-links	Convert the links in the HTML so they still work in your local version.
--restrict-file-names=windows	Modify filenames to work in Windows.
--no-clobber	Do not overwrite existing files.
--domains example.com	Do not follow links outside this domain.
--no-parent	Never ascend to the parent directory when retrieving recursively.
--level	Specify the depth of crawling. `inf` is used for infinite.
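
The recursive command is long enough to be worth wrapping in a function. This is a minimal sketch, not the author's method. mirror_site and DRY_RUN are hypothetical names, the default depth of 2 is an arbitrary choice, and --no-clobber and --span-hosts are left out to keep the example short. With DRY_RUN=1 the function prints the command instead of running it, so you can review it first.

```shell
# mirror_site DOMAIN [DEPTH]: hypothetical helper around a recursive download.
# Set DRY_RUN=1 to print the command instead of executing it.
mirror_site() {
  domain=$1
  depth=${2:-2}   # crawl depth passed to --level; 2 is an arbitrary default
  # Build the command as positional parameters so every argument stays intact.
  set -- wget --recursive --level="$depth" --page-requisites --adjust-extension \
    --wait=1 --limit-rate=10K --convert-links --restrict-file-names=windows \
    --domains "$domain" --no-parent "https://$domain/"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Preview the command for a depth-3 crawl without downloading anything.
DRY_RUN=1 mirror_site example.com 3
```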

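To crawl a site recursively without saving any pages, for example to look for broken links, use --spider and write the results to a log file with -o:

$ wget --spider -r https://example.com -o wget.log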

Wget VS Curl

Wget’s strength compared to curl is its ability to download recursively. This means that it will download a document, then follow the links and then download those documents as well.

Use Wget With Python

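Wget is strictly a command-line tool, but there is a Python package, also called wget, that mimics it.

# Requires the third-party wget package: pip install wget
import wget

url = 'http://www.jcchouinard.com/robots.txt'
# download() saves the file in the current directory and returns its filename
filename = wget.download(url)
print(filename)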

Debugging: What to Do When Wget is Not Working

Wget Command Not Found

If you get the -bash: wget: command not found error on Mac or Linux, it means that GNU Wget is either not installed or cannot be found on your PATH.

Go back and make sure that you installed wget properly.
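
Before reinstalling, it can help to check whether the shell can find wget at all. This is a minimal sketch for Mac and Linux shells. check_wget is a hypothetical helper name, and command -v is the standard shell builtin for looking up a command on PATH.

```shell
# check_wget: hypothetical helper that reports whether wget is on PATH.
check_wget() {
  if command -v wget >/dev/null 2>&1; then
    echo "wget found at $(command -v wget)"
  else
    echo "wget not found on PATH: install it or add its directory to PATH"
    return 1
  fi
}

# Run the check; "|| true" keeps a script going even if wget is missing.
check_wget || true
```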

Wget is not recognized as an internal or external command

If you get the following error

'wget' is not recognized as an internal or external command, operable program or batch file

It is more than likely that the wget package was not installed on Windows. Fix the error by installing wget first, and then run the command again.

Otherwise, it may mean that the wget command is not found in your system’s PATH.

Adding Wget to the System’s Path (Windows)

Adding the wget command to the system’s path will allow you to run wget from anywhere.

To add wget to the Windows system's path, you need to copy the wget.exe file to the right directory.

Download the wget file for Windows
Press Windows + E to open File Explorer.
Find where you downloaded wget.exe (e.g. Downloads folder)
Copy the wget.exe file
Paste into the System Directory (System32 is already in your system’s path)
- Go to C:\Windows\System32.
- Paste your wget.exe file into your System32 folder

wget: missing URL

The “wget: missing URL” error message occurs when you run the wget command without providing a URL to download.

One case where I have seen this happen is when users type a flag with the wrong casing.

$ wget -v
# wget: missing URL

In the example above, the v flag should be uppercase (-V, which prints the version), not lowercase (-v, which turns on verbose output and still expects a URL).

Or use the verbose way of calling it with the double-dash and full name.

$ wget --version
# No error

Alternatives to Wget on Mac and Windows

You can use cURL as an alternative to the Wget command-line tool. It ships with macOS and with recent versions of Windows 10, and it is available from the package manager on most Linux distributions.

Wget for Web Scraping

By allowing you to download files from the Internet, the wget command-line tool is incredibly useful in web scraping. It has a set of useful features that make web scraping easy:

Batch Downloading: wget allows you to download multiple files or web pages in a single command.
Recursive Downloading: the --recursive flag in wget allows you to follow links and download an entire website.
Retries: wget is designed to handle unstable network connections and interruptions, and to retry failed downloads.
Command-line options: Options are available to improve scraping capabilities (download speed, User-Agent headers, cookies for authentication, etc.).
Header and User-Agent Spoofing: To avoid being blocked by websites when web scraping, wget allows you to change the User-Agent header so that your requests look more like those of regular users.
Limiting Server Load: By using the --wait and --limit-rate options, you can control the speed at which wget fetches data.

About Wget

Wget was developed by	Hrvoje Nikšić
Wget is Maintained by	Tim Rühsen et al.
Wget Supported Protocols	HTTP(S), FTP(S)
Wget was Created In	January 1996
Installing Wget	brew install wget (macOS with Homebrew)
Wget Command	wget [`option`]…[`URL`]…

Detail table about WGET

Wget FAQs

What is Wget Used For?

Wget is used to download files from the Internet without the use of a browser. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.

How Does Wget Work?

Wget is non-interactive and lets you download files from the internet in the background, without a browser or user interface. It can follow links to create local versions of remote websites while respecting robots.txt.

What is the Difference Between Wget and cURL?

Both Wget and cURL are command-line utilities for transferring files over the internet. Although cURL generally offers more features than Wget, Wget provides features such as recursive downloads that cURL lacks.

Can you Use Wget With Python?

Yes, you can use wget in Python by installing the wget library with $ pip install wget

Does Wget Respect Robots.txt?

Yes, Wget respects the Robot Exclusion Standard (/robots.txt) when downloading recursively.

Is Wget Free?

Yes, GNU Wget is free software that everyone can use, redistribute and/or modify under the terms of the GNU General Public License

What is recursive download?

Recursive download, or recursive retrieval, is the ability to download a document, follow the links within it, and download the linked documents in turn, until all linked documents are downloaded or the specified maximum depth is reached.

How to specify download location in Wget?

Use the -P or --directory-prefix=PREFIX option. Example: $ wget -P /path <url>

Conclusion

This is it.

You now know how to install and use Wget from the command line.

SEO Strategist at Tripadvisor, ex- Seek (Melbourne, Australia). Specialized in technical SEO. Writer in Python, Information Retrieval, SEO and machine learning. Guest author at SearchEngineJournal, SearchEngineLand and OnCrawl.
