Wget Through a Proxy on Windows

The global average internet speed of 110 Mbps may look amazing, but 64.70 Mbps average download speed might not help with big downloads. 

With files getting larger, demand has grown for tools that help users download large files over slower connections. Wget is one such tool.

Wget is a non-interactive tool that lets you download files. It also lets you download more safely with the help of proxy servers. 

In this article, you will learn how to use a Wget command with proxy IPs. Read on to find out.

🔑 Key Takeaways

  • Wget is a powerful command-line tool for downloading files. It is compatible with Mac, Windows, and Linux.
  • Proxies enhance security and anonymity with Wget, enabling bypassing IP blocks and geo-restrictions.
  • Install Wget on your device by following the provided instructions for Mac, Windows, and Linux.
  • Familiarize yourself with basic Wget commands for downloading files, multiple URLs, entire websites, and saving to specific directories.

Using Wget With Proxies

As open-source GNU software, Wget lets users download files on the web even with minimal or controlled bandwidth. 

It is commonly pre-installed on Linux, and it is also available for macOS and Windows.

Wget commands can be used in scripts, terminals, and cron jobs. 

It can also let you perform recursive downloads and full mirroring of websites by storing the HTML file for local access.

Wget commands have a simple syntax that goes like this:

wget [options] [URL]

The options are the specific Wget command functions. If Wget is installed, you can check the list of options by typing wget -h.

Wget is often used for downloading large files. It is non-intrusive and can silently work in the background, even when a different user is logged in. 

It can also resume interrupted downloads. For added security and anonymity, you can use Wget with proxy IPs. It can also be used for testing proxies.

What You Need

First, you have to check if Wget is installed on your device. Open your terminal or command prompt and type in wget -V. The command name is lowercase; Linux and macOS shells are case-sensitive.

If Wget is installed, it should print the version and license information. However, if it is not present, you can install Wget by following these steps: 

A. Installing Wget on macOS

You can install Wget through the Homebrew package manager. Type in this command:

brew install wget

Run the wget -V command again to check if it was installed successfully.

B. Wget Installation on Windows

Here is the easiest way to install Wget in Windows:

  1. Download the wget.exe file for your desired version.
  2. Open your file manager and go to the directory Windows\System32.
  3. Copy and paste the wget.exe file into the System32 folder.
  4. Run wget -V in your command prompt to check if it was installed successfully.

C. Installing Wget on Linux

Wget should be pre-installed on Linux. In case it’s not present, you can install Wget with these commands:

For Ubuntu:

sudo apt-get install wget

For CentOS:

sudo yum install wget

Getting Your Proxies

After ensuring Wget is installed, you should also have your proxies ready. You can get proxy IPs from paid and free sources.

Keep in mind that it is generally safer to use a paid proxy service. Free proxies are unsafe and often abused by many users.

👍 Helpful Article:
Getting the proxy that suits your needs is important to enjoy a safe browsing experience. Check out the leading proxy server services that you can choose from.

Steps To Use Wget With A Proxy

Before we get into the use of proxies, let’s take a look at the basic Wget commands that you can perform. 

1. Know the Basic Wget Commands

For MacOS and Linux, you can use the built-in terminal to run Wget commands.

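Windows users can choose between the Windows command prompt and PowerShell. 

PowerShell is a more advanced command-line shell built by Microsoft that is also a scripting language in its own right. In Windows PowerShell 5.1, wget is an alias for Invoke-WebRequest, so type wget.exe there to run the actual Wget program.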

To download a file, simply run this command:

wget https://www.example.com/example.pdf

To download files from multiple URLs, create a .txt file containing the download links, one per line. Then, run:

wget -i manydownloads.txt
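
As a rough illustration of the list file, the sketch below writes two placeholder links into manydownloads.txt and counts them. The second URL is invented for the example, and the actual download command is left as a comment so that nothing is fetched.

```shell
# Placeholder file name and URLs, used only for illustration.
list_file="manydownloads.txt"

# One URL per line is the format that wget -i expects.
cat > "$list_file" <<'EOF'
https://www.example.com/example.pdf
https://www.example.com/report.pdf
EOF

# Count the lines that look like HTTP(S) links.
url_count=$(grep -c -E '^https?://' "$list_file")
echo "$url_count URLs listed in $list_file"

# To start the batch download, run:
#   wget -i "$list_file"
```

Keeping the links in a file also makes it easy to rerun the same batch later.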

If you want to download the whole website in HTML, use this command:

wget -m https://www.example.com/

To save a downloaded file to a specific directory, you can use:

wget -P \Documents\Examples https://www.example.com/example.pdf

To view all “options,” you can run:

wget -h

These commands will be useful when using proxies in Wget, so make sure to take note of each one. 

2. Applying Proxies To Wget Commands

Wget primarily supports HTTP, HTTPS, and FTP protocols. 

If you require SOCKS proxies, you can use a Wget alternative like cURL. It can support a lot of protocols.

To apply a proxy to your download request, use the execute (-e) option to activate proxy use and apply the actual proxy.

Here’s an example:

wget -e use_proxy=yes -e https_proxy=[proxy IP]:[port] https://example.com/examplefile.zip

Wget picks the proxy setting from the URL scheme: https_proxy for https:// downloads and http_proxy for http:// downloads.

For proxies that need authentication, you can use Wget with your proxy username and password. It should look like this:

wget -e use_proxy=yes -e https_proxy=[proxy IP]:[port] --proxy-user=[username] --proxy-password=[password] https://example.com/example.pdf

Another option is to configure the proxy in the operating system. This will make all connections go through the proxy server. 

With this, you will not have to type in the long proxy command all the time.

Below are steps on how you can do that across different operating systems. 

a. Setting up a Proxy on Windows

To set up a proxy on Windows:

  1. Click on the Windows icon.
  2. Click on Settings (or the gear icon).
  3. Select Network and Internet.
  4. Select Proxy.
  5. Click on Set up under Manual proxy setup.
  6. Toggle on Use a proxy server and input the proxy IP and port number.
  7. Click Save.

b. Setting Up a Proxy on macOS

Here’s how to set up a proxy on macOS:

1. Click on the Apple logo in the top-left corner.

2. Select System Preferences.

3. Click on Network.

4. Select your active network connection. It’s either Ethernet or WiFi.

5. Click on Advanced.

6. Select the protocol type of your proxy or the automatic proxy options.

7. Enter the proxy IP and port number.

8. If the proxy requires authentication, tick the checkbox beside Proxy server requires authentication. Enter the username and password.

9. Click OK.

c. Setting Up a Proxy on Linux

For Linux, there is a way to configure the proxy settings for Wget without affecting the rest of the OS. All you have to do is edit the wgetrc file.

1. Type in this command to open the wgetrc file.

sudo nano /etc/wgetrc

2. The editor shows your Wget configuration. Find these lines.

#https_proxy = http://proxy.yoyodyne.com:18023/
#http_proxy = http://proxy.yoyodyne.com:18023/
#ftp_proxy = http://proxy.yoyodyne.com:18023/

3. Remove the comment symbol (number sign) of the proxy protocol that you want to configure. It should look like this:

#https_proxy = http://proxy.yoyodyne.com:18023/
http_proxy = http://proxy.yoyodyne.com:18023/
#ftp_proxy = http://proxy.yoyodyne.com:18023/

4. Change it to your proxy IP and port number.

http_proxy = http://[proxy IP]:[port]/

5. Exit and save the changes.

Now, you can send all your Wget download requests through a proxy.
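
Steps 3 and 4 can also be done without opening an editor. The sketch below is only an illustration: it works on a scratch copy holding the three sample lines rather than on /etc/wgetrc itself, and proxy.example.com:8080 is a placeholder address.

```shell
# Scratch file with the three commented sample lines from the wgetrc file.
sample_rc=$(mktemp)
cat > "$sample_rc" <<'EOF'
#https_proxy = http://proxy.yoyodyne.com:18023/
#http_proxy = http://proxy.yoyodyne.com:18023/
#ftp_proxy = http://proxy.yoyodyne.com:18023/
EOF

# Uncomment only the http_proxy line and point it at the placeholder proxy.
# The result goes to a new file, so the original stays unchanged.
sed 's|^#http_proxy = .*|http_proxy = http://proxy.example.com:8080/|' \
  "$sample_rc" > "$sample_rc.new"

cat "$sample_rc.new"
```

After checking the output, the same sed expression could be applied to a backup copy of the real file before replacing it.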

Conclusion

Wget is a highly useful tool for downloading large files and mirroring websites, even with a limited internet connection speed. 

It is even more useful with the use of proxies, which let you bypass IP blocks and geo-restrictions while staying anonymous.

The best thing about Wget is that it is open-source. You can use it absolutely free of charge. It is also continuously being updated, so it will just keep getting better.

FAQs

  1. What is the config file for Wget?

    The config file for Wget is wgetrc. In Linux, you can open the system-wide file using the command sudo nano /etc/wgetrc.

  2. What does proxy URL mean?

    The proxy URL is the address that directs your connection to the proxy server. Here’s an example of a proxy URL: http://203.0.113.10:8080.

Timeline Of The Article

-> Published on: 22-07-2023

By

Harsha Kiran is the founder and innovator of Techjury.net. He started it as a personal passion project in 2019 to share expertise in internet marketing and experiences with gadgets and it soon turned into a full-scale tech blog with specialization in security, privacy, web dev, and cloud computing.

In this article, you’re going to dive deep into the world of Wget and Linux, exploring how you can easily integrate proxies into your workflow.

Using Wget with a Proxy

GNU Wget is a versatile command line utility that has become indispensable for many Linux users who rely on it for effortlessly fetching files from the internet. It is feature-rich, is easy to use, and supports common network protocols, including HTTP, HTTPS, and FTP. In addition, Wget has built-in support to download entire websites or subsets of pages, making it an ideal tool for web scraping, mirroring, and archiving.

One of Wget’s most impressive features is its ability to work seamlessly with proxies. As a Linux user, you’ll often find yourself needing a proxy to mask your identity, bypass pesky regional restrictions, or enhance performance through load balancing. With Wget in your toolkit, integrating proxies is easy.

What Is Wget?

Wget, short for “World Wide Web” and “get”, is a free and open source program for interacting with files on the internet. It’s part of the GNU Project, a free software mass collaboration effort.

Wget comes equipped with handy features for technical users working with files online. These include batch downloads, resuming of interrupted downloads, recursive downloading, proxy support, download scheduling, bandwidth throttling, customizable user agent, and SSL/TLS support. It’s also non-interactive, making it perfect for scripts and cron jobs that run in the background.

Wget is popular among Linux and Unix users and also has versions available for Windows and macOS. Wget’s impressive range of features and cross-platform compatibility makes it the go-to tool for various web-based tasks, such as downloading large files, automating downloads, and creating website mirrors.

Using Proxies with Wget in Linux

A wide variety of proxies exist, including datacenter and residential IPs, each with its own advantages and use cases. Using proxies with Wget offers benefits, such as bypassing geo and network/ISP restrictions. It also allows you to maintain anonymity and privacy while browsing the web or downloading files.

When using the right proxy provider, proxies can also be used to cache frequently accessed resources, boosting performance. In addition, proxy providers can influence your network speeds, help you bypass rate limits by providing a wide pool of IP addresses, and even provide solutions that help you bypass captchas. This makes selecting the right proxy provider essential to ensuring that your proxy performs effectively, especially when using it with Wget.

Configuring Wget with a Proxy

To use Wget with a proxy, you need to configure the appropriate settings in your environment or within the Wget command itself. There are several ways of doing this, and this tutorial covers four of them: configuring proxy settings using environment variables, setting proxies for all users by updating the /etc/wgetrc file, setting proxies for the current user by updating the ~/.wgetrc file, and setting the proxy for a single command using the -e flag. You’ll also learn how to use Wget with both authenticated and unauthenticated proxies.

You can find all the configurations discussed in this article in this GitHub gist.

Before you start using Wget with a proxy, you need the following:

  • A Debian-based Linux system
  • Proxy server details: To use Wget with a proxy, you need the proxy server’s details. This includes the server’s IP address or hostname, port number, and if required, authentication information (i.e., username and password). You can obtain this information from your proxy provider or network administrator.

1. Configure Proxy Settings Using Environment Variables

The easiest way to set proxy configurations for Wget is by defining them using environment variables. This allows multiple programs to read the value and use it, so you only need to change it once. Once the variables are exported from your shell configuration file, Wget uses the proxy for every request made from shells that load that file.

To set up a proxy for Wget, add the following lines to your shell configuration file (e.g., .bashrc or .bash_profile) and replace the placeholders with your proxy server’s address and port (e.g., http://proxy.example.com:8080):

export http_proxy=http://proxy_address:proxy_port
export https_proxy=http://proxy_address:proxy_port

The https_proxy variable names the proxy to use for HTTPS downloads; the proxy URL itself normally still starts with http://.

If the proxy requires authentication, add the username and password to the URL so that it looks like this:

export http_proxy=http://username:password@proxy_address:proxy_port
export https_proxy=http://username:password@proxy_address:proxy_port
export ftp_proxy=http://username:password@proxy_address:proxy_port

Remember to replace username, password, proxy_address, and proxy_port with the appropriate values (e.g., http_proxy=http://username:password@proxy.example.com:8080).

After adding these lines, restart your shell or run source .bashrc or source .bash_profile, depending on which file you used to apply the changes.
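
To try the variables in the current shell before editing any configuration file, something like the following sketch can be used. The proxy address is a placeholder, and the no_proxy line is an optional extra not covered above: Wget reads it as a comma-separated list of hosts that should bypass the proxy.

```shell
# Placeholder proxy address; replace it with your own host and port.
proxy_url="http://proxy.example.com:8080"

# Exported variables are inherited by programs started from this shell.
export http_proxy="$proxy_url"
export https_proxy="$proxy_url"

# Optional: hosts that should be reached directly, without the proxy.
export no_proxy="localhost,127.0.0.1"

# Show the proxy-related variables that child processes such as wget will see.
env | grep -i '_proxy=' | sort
```

The settings last only until the shell is closed, which makes this a convenient way to test a proxy before making it permanent.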

2. Set Proxies for All Users by Updating the /etc/wgetrc File

If you don’t need a proxy for the entire system, for example, if you want to use a proxy only when downloading files with Wget to protect your identity, Wget provides an easy way to do this. It can be done either for an individual system user or for all system users.

Setting the proxy for all system users is helpful in cases where there’s a shared company machine with different users relying on the same proxy to carry out the work. Wget allows you to configure the proxy once so that all users can access it.

To set a proxy for all users, you need to modify the config file located in /etc/wgetrc. The wgetrc file is an initialization file that stores default settings and options for Wget. This file allows you to customize its behavior without always specifying command line arguments.

To set the proxy, open the wgetrc file using your favorite text editor and add the following lines:

https_proxy = http://proxy.example.com:8080
http_proxy = http://proxy.example.com:8080
ftp_proxy = http://proxy.example.com:8080

For authenticated proxies, use the following syntax:

https_proxy = http://username:password@proxy.example.com:8080
http_proxy = http://username:password@proxy.example.com:8080
ftp_proxy = http://username:password@proxy.example.com:8080

Replace proxy.example.com:8080 and username:password with the address, port, and authorization credentials of your proxy server. Save the file and close the editor. From now on, all Wget requests made by any user on your system will use the specified proxy server.

3. Set Proxies for the Current User by Updating the ~/.wgetrc File

Wget also allows you to change the proxy configurations for just the current user. This can be helpful if you’re in a situation where the proxy details need authentication information specific to each user.

To set the proxy, you need to create or modify the ~/.wgetrc file. This is a user-specific wgetrc file located in your home directory (~/). It stores settings that affect only the current user. The ~/.wgetrc file may not exist by default, especially on new Linux installations or user accounts, because it is typically created only when a user needs to customize Wget settings for their account. If the file doesn’t exist, you can create it.

Once you have the ~/.wgetrc file, open it in your favorite text editor and add the following lines:

https_proxy = http://proxy.example.com:8080
http_proxy = http://proxy.example.com:8080
ftp_proxy = http://proxy.example.com:8080

Again, remember to replace proxy.example.com:8080 with your specific details. This method only affects Wget requests made by the current user. If you switch to a different user on the same machine, these settings do not apply.
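
The same lines can be written from the shell instead of a text editor. The sketch below is an illustration under stated assumptions: add_wgetrc_proxy is a hypothetical helper name, the proxy address is a placeholder, and the demonstration writes to a scratch file so that an existing ~/.wgetrc is left untouched.

```shell
# add_wgetrc_proxy FILE PROXY_URL
# Hypothetical helper: appends proxy settings to a wgetrc-style file.
add_wgetrc_proxy() {
  cat >> "$1" <<EOF
use_proxy = on
http_proxy = $2
https_proxy = $2
ftp_proxy = $2
EOF
}

# Demonstrate on a scratch file rather than the real ~/.wgetrc.
demo_rc=$(mktemp)
add_wgetrc_proxy "$demo_rc" "http://proxy.example.com:8080"
cat "$demo_rc"

# For real use, the target would be "$HOME/.wgetrc". A scratch file can be
# tried out first with: wget --config="$demo_rc" URL
```

Because the helper appends, running it twice on the same file would duplicate the entries, so it is best applied once or to a fresh file.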

4. Set Proxy for the Current Terminal Instance Using the -e Flag

If you don’t want to set the proxy at a system or Wget level, you can configure it directly when running the Wget command. This method allows you to use different proxy settings for individual Wget commands, giving you greater flexibility.

To specify the proxy configuration for a single Wget request, use the following syntax:

# http proxy
wget -e use_proxy=yes -e http_proxy=http://proxy_address:proxy_port URL
# https proxy
wget -e use_proxy=yes -e https_proxy=http://proxy_address:proxy_port URL

In this code, URL is the URL you want Wget to fetch (e.g., www.google.com).

For authenticated proxies, you can use the http://username:password@proxy_address:proxy_port syntax for specifying the proxy, as discussed in the previous section.

This method allows you to set a proxy for a single request without affecting other requests, terminal sessions, or users.
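
To avoid retyping the -e options, the per-request syntax can be wrapped in a small shell function. This is only a sketch: the name wget_via and the proxy address are hypothetical, and the function sets both http_proxy and https_proxy so that either kind of URL goes through the same proxy.

```shell
# wget_via PROXY_URL [wget arguments...]
# Hypothetical helper: runs a single wget command through the given proxy.
wget_via() {
  proxy_url=$1
  shift
  # Each -e passes one wgetrc-style setting for this invocation only.
  wget -e use_proxy=yes \
       -e "http_proxy=$proxy_url" \
       -e "https_proxy=$proxy_url" \
       "$@"
}

# Example call (placeholder proxy address, not executed here):
#   wget_via http://proxy.example.com:8080 https://example.com/file.pdf
```

Everything after the proxy URL is handed to wget unchanged, so other options such as -O or -i can still be added to the call.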

Using Wget

Thankfully, using Wget is simple. The general syntax for Wget is wget [options] [url], where you first specify optional arguments (i.e., [options]), such as the -e use_proxy=yes that you learned about earlier, followed by the [url] that you want to fetch. This could be a file, such as a document or media file, or even a web page.

Because the first part is optional, you can fetch a web resource by specifying wget [url]. For instance, calling wget http://example.com/file.pdf fetches the file and downloads it to your local machine.

You can also download a file while specifying the name to use when saving it to disk. To do this, use the --output-document argument:

wget --output-document=image.jpg https://httpbin.org/image/jpeg

Wget also allows you to batch your downloads and download from multiple URLs in one command. To do this, you need to create a file and paste your URLs, each on its own line. After doing this, you can run the following command:

wget --input-file=list-of-file-urls.txt

Please note: If you’re using method 4 for specifying your proxy, you must add -e use_proxy=yes -e http_proxy=http://proxy_address:proxy_port (or -e https_proxy=http://proxy_address:proxy_port for https:// URLs) to the previous commands to get them to work with a proxy. Methods 1, 2, and 3 will work as is since the proxy is configured already.

Conclusion

In this article, you learned four different methods for using Wget with a proxy and the advantages of doing so. You can select the most appropriate method depending on your preferences to apply proxies system-wide, to all users, to a specific user, or to a single Wget request.

While using Wget with a proxy offers numerous benefits, choosing the right proxy services is essential for maximizing performance and reliability. Bright Data is a web data platform that assists companies in gathering massive amounts of structured data from the web. By using their proxy solutions, you can improve your Wget experience and reduce the number of failed requests, ensuring better information retrieval from the internet.

Whether you’re a start-up or an enterprise, Bright Data helps you scrape websites and collect data using their proxy solutions and custom tools, such as Web Scraper API and Web Unlocker. Explore Bright Data’s datacenter proxies, proxy server options, and pricing to find the perfect fit for your proxy network requirements.

Wget is a popular command-line utility that can download files from the web. It’s part of the GNU Project and, as a result, commonly bundled with numerous Linux distributions.

This article will walk you through the step-by-step process of installing and downloading files using Wget with or without proxies, covering multiple scenarios and showcasing practical examples.

What is Wget

Wget is a free software package that can retrieve files via HTTP(S) and FTP(S) internet protocols. The utility is part of the GNU Project. Thus, the full name is GNU Wget. The capitalization is optional (Wget or wget).

How to install Wget

Wget can be downloaded from the official GNU channel and installed manually. However, we recommend using package managers. Package managers facilitate the installation and make future upgrades more convenient. Also, most Linux distributions are bundled with Wget.

To install Wget on Ubuntu/Debian, open the terminal and run the following command:

sudo apt-get install wget

To install Wget on CentOS/RHEL, open the terminal and run the following command:

sudo yum install wget

If you’re using macOS, we highly recommend using the Homebrew package manager. Open the terminal and run the following command:

brew install wget

If you’re using Windows, Chocolatey package manager is a good choice. When using Chocolatey, run the following command from the command line or PowerShell:

choco install wget

Lastly, to verify the installation of Wget, run the following command:

wget --version

This will print the installed version of Wget along with other related information.

Running Wget

The Wget command can be run from any command-line interface. In this tutorial, we’ll be using the terminal. To run the Wget command, open the terminal and enter the following:

wget -h

This will list all the options that can be used with the Wget command grouped in categories, such as Startup, Logging, Download, etc.

Downloading a single file

To download a single file, run Wget and type in the complete URL of the file. For example, a Wget2 source archive is located at https://ftp.gnu.org/gnu/wget/wget2-2.0.0.tar.lz. To download this file, enter the following in the terminal:

wget https://ftp.gnu.org/gnu/wget/wget2-2.0.0.tar.lz

Wget shows detailed information about the file being downloaded: the download completion bar, progress of each step, total file size and its mime type, etc.

Changing the User-Agent

Every program, including web browsers, sends certain headers when connecting to a web service. In this case, the User-Agent header is the most important as it contains a string that identifies the program.

To see how User-Agent varies across various applications, open this URL in different browsers that you have installed.

To identify the User-Agent used by Wget, request this URL:

wget https://httpbin.org/user-agent

This command will download a file named user-agent without any extension. To view the contents of this file, use the cat command on macOS and Linux. On Windows, you can use the type command.

~$ cat user-agent
{
  "user-agent": "wget/1.21.2"
}

The default User-Agent can be modified using the --header option. The syntax is as follows:

wget --header "user-agent: DESIRED USER AGENT" URL-OF-FILE

The following example should clarify it further:

~$ wget --header "user-agent: Mozilla/5.0 (Macintosh)" https://httpbin.org/user-agent
~$ cat user-agent
{
  "user-agent": "Mozilla/5.0 (Macintosh)"
}

As is evident here, the User-Agent has changed. If you wish to send any other header, you can add more --header options, each followed by a header in "HeaderName: HeaderValue" format.

Downloading multiple files

There are two methods for downloading multiple files using Wget. The first method is to send all the URLs to Wget separated with a space. For example, the following command will download files from all three URLs:

~$ wget http://example.com/file1.zip http://example.com/file2.zip http://example.com/file3.zip

If you wish to try a real example, use the following command:

~$ wget https://ftp.gnu.org/gnu/wget/wget2-2.0.0.tar.lz https://ftp.gnu.org/gnu/wget/wget2-1.99.2.tar.lz

The command will download both files one at a time.

This method works well when the number of files is limited. It can become difficult to manage as the number of files grows, making the second method more useful.

The second method is to write all the URLs in a file and use the -i or --input-file option. For example, to read the URLs from the urls.txt file, run any of the following commands:

~$ wget --input-file=urls.txt
~$ wget -i urls.txt

The best part of this option is that if any of the URLs don’t work, Wget will continue and download the rest of the functional URLs.

Extracting links from a webpage

The --input-file option of the Wget command can be expanded to extract links from a webpage.

In its simplest form, you can supply a URL that contains the links to the files. For example, this page contains links to downloadable content of Wget. To download all files from this URL, run the following:

~$ wget --input-file=https://ftp.gnu.org/gnu/wget

However, this command won’t be particularly useful without any further customization. There are multiple reasons for that.

By default, Wget does not overwrite existing files. If a download results in overwriting a file, it’ll create a new file by appending a numerical suffix. It means that for every instance of a compressed.gif file, it’ll create new files with names such as compressed.gif, compressed.gif.1, compressed.gif.2, and so on.

This behavior can be modified by specifying the --no-clobber switch to skip duplicate files.

Next, you may want to download the files recursively by specifying the --recursive switch.

Finally, you may want to skip downloading certain files by specifying the extensions as a comma-separated list to the --reject switch.

Similarly, you may want to download certain files while ignoring everything else by using the --accept switch. This also takes a list of extensions separated by a comma.

Some other useful switches are --no-directories and --no-parent. These two ensure that no directories are created, and the Wget command doesn’t traverse to a parent directory.

For example, to download all files with the .sig extension, use the following command:

~$ wget --recursive --no-parent --no-directories --no-clobber --accept=sig --input-file=https://ftp.gnu.org/gnu/wget

Using proxies with Wget

There are two methods for Wget proxy server integration. The first method uses command line switches to specify the proxy server and authentication details.

The easiest way to verify that a proxy is in use is to check your IP address before and after specifying the proxy server. To check your current IP address, run the following commands:

~$ wget https://ip.oxylabs.io/location
#output of wget here
~$ cat location
11.22.33.44 #prints actual IP address

The first command simply downloads a file named location (after the last part of the URL) containing the IP address. The cat command (or type command for Windows) prints the file contents.

The same result can be achieved by running Wget in quiet mode and redirecting the output to the terminal instead of downloading the file:

~$ wget --quiet --output-document=- https://ip.oxylabs.io/location

The shorter version of the same command is as follows:

~$ wget -q -O - https://ip.oxylabs.io/location

To utilize a proxy that doesn’t require authentication, use two -e or two --execute switches. The first will enable the proxy, and the second will specify the proxy server’s URL. Because the target URL in these examples uses HTTPS, the relevant setting is https_proxy; http_proxy applies only to http:// URLs.

The following command enables the proxy and specifies the proxy server’s IP 12.13.14.15 and port 1234:

~$ wget -q -O- -e use_proxy=yes -e https_proxy=12.13.14.15:1234 https://ip.oxylabs.io/location
12.13.14.15

In the example above, the proxy doesn’t require authentication. If the proxy server requires user authentication, set the proxy username by using the --proxy-user switch. Similarly, set the proxy password using the --proxy-password switch:

~$ wget -q -O- -e use_proxy=yes -e https_proxy=12.13.14.15:1234 --proxy-user=your_username --proxy-password=your_password https://ip.oxylabs.io/location

As evident here, the command is quite long. However, it’s useful when you don’t want to use a proxy all the time.

The second method is to use the .wgetrc configuration file. This file can store proxy configuration, which Wget then reads.

The configuration file is located in the user’s home directory and is named .wgetrc. Alternatively, you can use any file as the configuration file by using the --config switch.

In the ~/.wgetrc file, enter the following lines:

use_proxy = on
http_proxy = http://12.13.14.15:1234

If you also need to set user authentication for the proxy, modify the file as follows:

use_proxy = on
http_proxy = http://your_username:your_password@12.13.14.15:1234

From now on, every time Wget runs, it’ll use the specified proxy.

$ wget -q -O- http://httpbin.org/ip
# Prints IP of the proxy server

The proxies can also be set with environment variables like http_proxy. However, these variables aren’t specific to Wget: once exported, they are picked up by every program in that shell session that honors them, which may be broader than the task at hand requires.
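
Environment variables do not have to be exported for the whole session, though. A standard shell feature, shown in the sketch below, is to prefix a single command with the assignment so that only that command sees it. The demonstration uses a harmless sh -c call in place of a real download, and the proxy address is the placeholder used earlier.

```shell
# Start from a clean state so the demonstration is unambiguous.
unset http_proxy https_proxy

# The assignment prefix applies only to the single command that follows it.
inside=$(https_proxy="http://12.13.14.15:1234" sh -c 'printf %s "$https_proxy"')

# Back in the calling shell, the variable is still unset.
after=${https_proxy:-unset}

echo "inside the command: $inside"
echo "afterwards: $after"

# The same prefix works for a real request, for example:
#   https_proxy="http://12.13.14.15:1234" wget -q -O- https://ip.oxylabs.io/location
```

This gives the convenience of environment variables while keeping the proxy limited to one Wget call.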

cURL vs Wget

cURL or Curl is another open-source command-line tool for downloading files and is available for free.

cURL and Wget share many similarities, but there are important distinctions differentiating the tools for specific individual purposes.

First, let’s take a quick look at the similarities. Both options:

  • Are open-source, command-line tools for downloading content from HTTP(S) and FTP(S)

  • Can send HTTP GET and POST requests

  • Support cookies

  • Are designed to run in the background

The following features are only available in cURL:

  • Available as a library

  • Support for more protocols beyond HTTP and FTP

  • Better SSL support

  • More HTTP authentication methods

  • Support for SOCKS proxies

  • Better support for HTTP POST 

Nonetheless, Wget has its advantages as well:

  • Supports recursive downloads. This is the most prominent advantage, allowing you to download files recursively using the --mirror switch and create a local copy of a website. 

  • Can resume interrupted downloads

More information about what is curl is in this article. If you want to read about the differences in detail, see the cURL comparison table.

The differences listed above should help you figure out the more suitable tool for a particular scenario. For example, if you want recursive downloads, choose Wget. If you require SOCKS proxy support, pick cURL.

Neither tool is decisively better than the other. Select the one that is suitable for your specific scenario at a given moment. 

Conclusion

This article detailed how to configure Wget, from installation and downloading single or multiple files to the methods of using proxy servers. Lastly, the comparison between cURL and Wget overviewed their differences according to the functionality and individual use cases.

If you want to find out more about how proxies and advanced public data acquisition tools work or about specific web scraping use cases, such as web scraping job postings or building a Python web scraper, check out our blog. And if, after reading this article, you’re thinking of using proxies with Wget, check out our Residential Proxies and Datacenter Proxies.

About the author

Augustas Pelakauskas

Senior Technical Copywriter

Augustas Pelakauskas was a Senior Technical Copywriter at Oxylabs. Coming from an artistic background, he is deeply invested in various creative ventures — the most recent being writing. After testing his abilities in freelance journalism, he transitioned to tech content creation. When at ease, he enjoys the sunny outdoors and active recreation. As it turns out, his bicycle is his fourth-best friend.

All information on Oxylabs Blog is provided on an «as is» basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website’s terms of service or receive a scraping license.

Can you «download the internet»? Surely, that’s impossible, you simply won’t have enough storage. However, developers of free and open-source software always have a keen sense of humor. A simple example is the wget utility. Its name is the abbreviation of «www get,» where WWW stands for World Wide Web. Thus, the term can be understood as «download the Internet.»

In this material, however, we will focus not on the utility itself but on the ways to make it work through a proxy. Usually, this is required for organizing multi-threaded connections and parsing operations.

Earlier, we talked about a similar utility, cURL (it also works well with proxies once you know the right options). Therefore, we will additionally compare both utilities and talk about their differences below.

What Is Wget and How to Use It

Wget is a command-line utility that ships with practically all popular Linux distributions; it is designed for fast downloading of files and other content via various internet protocols.

If needed, the utility can be installed and used on other platforms, as the program has open-source code that can be compiled for different execution environments.

Wget boasts a very simple syntax and is therefore ideal for everyday use, including for beginners. The fact that wget is included in the basic environment of Linux distributions allows downloading other programs and packages quite quickly and easily. Tasks can be included in the cron scheduler as well (scripts and commands are executed on a schedule). Plus, wget can be incorporated into any other scripts and console commands.

For example, wget can be used to fully download a target website, if the options for traversing its URLs (with recursion) are set correctly.

Wget supports working with HTTP, HTTPS, FTP and FTPS protocols (+ some other, less popular ones).

A more correct name is GNU Wget (official website and documentation).

Note that there is a parallel implementation of wget — wget2. It has a number of small innovations and features.

An example of using wget to download an archive:

  • wget https://your.site/directory/archive.zip

Bulk files can be downloaded here by simply specifying all their names (links) separated by spaces:

  • wget https://your.site/directory/archive1.zip https://your.site/directory/archive2.zip https://your.site/directory/archive3.zip

The utility will download files sequentially with progress displayed directly in the console.

The names of target files (list of URLs) can be saved in a separate document and «fed» to wget like this:

  • wget --input-file=~/urls.txt

The same works with the short form of the option:

  • wget -i ~/urls.txt

If access is protected by a login and password, wget can handle it as well (you need to replace user and password with actual ones):

  • wget ftp://user:password@host/path

This is how you can create a local version of a specific website (it will be downloaded as HTML pages with all related content):

  • wget --mirror -p --convert-links -P /home/user/site111 source-site.com

You can download only files of a certain type from a website:

  • wget -r -A "*.png" domain.zone

Note! Wget cannot execute JavaScript, meaning it will only load and save the HTML code returned by the server. All dynamically loaded elements will be ignored.

There are plenty of possible wget applications.

A complete list of all options and keys for the utility can be found in the program documentation as well as on the official website. In particular, you can:

  • Limit download speed and set other quotas;
  • Change the user-agent to your own value (for example, you can pretend to be a Chrome browser to the website);
  • Resume download;
  • Set offset when reading a file;
  • Analyze creation/modification time, MIME type;
  • Use constant and random delays between requests;
  • Recursively traverse specified directories and subdirectories;
  • Use compression at the proxy server level;
  • Switch to the background mode;
  • Employ proxies.

Naturally, we are mostly interested in the last point.
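Before moving on to proxies, here is how several of the other options from the list look in practice. This is a minimal sketch: the helper name `polite_download`, the 500k rate cap, the 2-second delay and the user-agent string are illustrative assumptions, not recommended values.

```shell
#!/usr/bin/env bash
# polite_download URL [DIR]
# Hypothetical helper: downloads URL into DIR (default: current directory)
# with a speed cap, resume support, a custom user agent and randomized delays.
polite_download() {
  local url="$1"
  local dir="${2:-.}"
  # --limit-rate caps bandwidth, --continue resumes a partial file,
  # --wait/--random-wait space out requests, --directory-prefix sets the target dir
  wget \
    --limit-rate=500k \
    --continue \
    --user-agent="Mozilla/5.0 (X11; Linux x86_64)" \
    --wait=2 --random-wait \
    --directory-prefix="$dir" \
    "$url"
}
```

A call such as `polite_download https://your.site/directory/archive.zip ./downloads` would then apply all of these settings at once.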

When parsing, wget can help with saving HTML content, which can later be dissected and analyzed by other tools and scripts. For more details, see materials on Python web scraping libraries and Golang Scraper.

Why Use a Proxy with Wget

A proxy is an intermediary server. Its main task is to organize an alternative route for exchanging requests between a client and a server.

Proxies can use different connection schemes and technologies. For example, proxies can be anonymous or not, based on different types of devices (server-based, mobile, residential), paid or free, backconnect or direct, with static or dynamic addresses etc.

No matter what they are, their tasks remain roughly the same: redirection, location change, content modification (compression, cleaning etc.).

When parsing, running wget through a proxy is also needed to hide the client's real IP address and to organize multiple parallel connections, for example, to speed up data collection (scraping, not to be confused with web crawling).

How to Install Wget

In many Linux distributions, wget is a pre-installed utility. If the wget command returns an error, wget can be easily installed using the native package manager.
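A quick way to check for the utility from a script is sketched below. The helper name `have_cmd` is an assumption made for illustration; `command -v` is the standard shell built-in for locating a command.

```shell
#!/usr/bin/env bash
# have_cmd NAME
# Hypothetical helper: succeeds if NAME can be run from the current shell.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report whether wget is available before trying to use it.
if have_cmd wget; then
  echo "wget is installed"
else
  echo "wget is missing, install it with your package manager"
fi
```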

Debian-based distributions, including Ubuntu:

  • sudo apt-get install wget

Fedora, CentOS and RHEL:

  • yum install wget

ArchLinux and equivalents:

  • pacman -Sy wget

In MacOS, wget can be installed either from the source (with “make” and “make install” commands) or using the Homebrew package manager. For beginners, the latter option will be the most convenient (note that cURL utility is used, which is pre-installed in MacOS by default):

  • /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  • brew install wget

In the latest versions of Windows (10 and 11), wget can be installed in the Linux subsystem (WSL), from precompiled binaries (for example, they can be found here) or using third-party package managers like Chocolatey. Installation command for Chocolatey:

  • choco install wget

If you install wget on Windows as a standalone binary, add the directory containing wget.exe to the PATH variable so that the command can be invoked from any location in the command line. Otherwise, you will have to refer to the file directly each time as «.\directory\wget.exe», followed by the list of options and parameters.

Running Wget

Once the utility is installed, it can be launched either from the command line or accessed within shell scripts.

Typical launch:

  • wget https://site.zone/directory/file.zip

Immediately after you press Enter, the utility will start downloading the file to the current working directory (use the -P option to choose a different one).

In the console, wget displays the current speed and overall download progress.

You can change the filename during download:

  • wget -O new-name.zip https://site.zone/directory/source-file.zip

If you need to call up help for the set of options, type:

  • wget -h

Setting Up Wget to Work Through Proxy

The simplest way to specify a wget proxy is through special options in the command line. Wget applies http_proxy to http:// URLs and https_proxy to https:// URLs, so set the one that matches the target address (or both):

  • If the proxy does not require authentication:

wget -e use_proxy=on -e http_proxy=proxy.address.or.IP.address:port -e https_proxy=proxy.address.or.IP.address:port https://target.site/directory/file.zip

  • If authentication with a username and password is required:

wget -e use_proxy=on -e http_proxy=132.217.171.127:1234 -e https_proxy=132.217.171.127:1234 --proxy-user=USERNAME --proxy-password=PASSWORD https://target.site/directory/file.zip

The use_proxy setting is a boolean, so «use_proxy=yes» works just as well as «use_proxy=on».
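Wget also reads the proxy address from the `http_proxy` and `https_proxy` environment variables, which avoids repeating the `-e` options. The sketch below wraps this in a hypothetical helper, `wget_via_proxy`; the helper name and the proxy URL in the usage line are assumptions.

```shell
#!/usr/bin/env bash
# wget_via_proxy PROXY URL...
# Hypothetical helper: runs wget with the proxy applied to both plain HTTP
# and HTTPS targets, for this one invocation only.
wget_via_proxy() {
  local proxy="$1"
  shift
  # Variables set as a command prefix are exported only to that command.
  http_proxy="$proxy" https_proxy="$proxy" wget "$@"
}
```

Example call: `wget_via_proxy http://USERNAME:PASSWORD@132.217.171.127:1234 https://target.site/directory/file.zip`. Keep in mind that credentials passed this way may end up in the shell history.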

If it is inconvenient to specify these options in the console every time, you can set the proxy in wget's configuration file. This can be either the system-wide config (/etc/wgetrc) or the per-user config (~/.wgetrc; if the file does not exist, create it manually). Set the following options (if the user config is created from scratch, just add them to the empty file):

use_proxy=on

http_proxy=155.217.170.121:12345

https_proxy=155.217.170.121:12345

Naturally, instead of 155.217.170.121:12345, you should specify the actual IP address and port number.

If authentication with a username and password is required, you can use the following construction:

use_proxy = on

http_proxy = http://USERNAME:PASSWORD@155.217.170.121:12345

Now you can run wget without additional options; the utility will keep working through the proxy.
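If you would rather not touch the system-wide or per-user config, wget can also be pointed at a separate config file through the `WGETRC` environment variable. The sketch below generates such a file; the helper name `write_proxy_wgetrc` and the file name in the usage line are assumptions.

```shell
#!/usr/bin/env bash
# write_proxy_wgetrc FILE PROXY_URL
# Hypothetical helper: writes a minimal wgetrc that routes HTTP and HTTPS
# requests through PROXY_URL.
write_proxy_wgetrc() {
  local file="$1"
  local proxy="$2"
  cat > "$file" <<EOF
use_proxy = on
http_proxy = $proxy
https_proxy = $proxy
EOF
  # The file may contain a password, so restrict it to the owner.
  chmod 600 "$file"
}
```

Usage might look like this: `write_proxy_wgetrc ./proxy.wgetrc http://USERNAME:PASSWORD@155.217.170.121:12345`, followed by `WGETRC=./proxy.wgetrc wget https://target.site/directory/file.zip`.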

Rotating Proxy for Wget

Wget does not have built-in tools for proxy rotation. Therefore, if you want each new wget run to go through a different proxy, you need to write a bash script or pass the proxy with the «-e» option on every call.

Example:

wget -e use_proxy=on -e http_proxy=104.254.41.36:1234 -e https_proxy=104.254.41.36:1234 --proxy-user=USERNAME-one --proxy-password=PASSWORD-one https://site-one.zone/directory/file-one.zip

wget -e use_proxy=on -e http_proxy=26.104.52.225:2234 -e https_proxy=26.104.52.225:2234 --proxy-user=USERNAME-two --proxy-password=PASSWORD-two https://site-two.zone/directory/file-two.zip

wget -e use_proxy=on -e http_proxy=70.174.89.3:44444 -e https_proxy=70.174.89.3:44444 --proxy-user=USERNAME-three --proxy-password=PASSWORD-three https://site-three.zone/directory/file-three.zip

And here’s how a bash script variant of forced proxy rotation might look, randomly selected from a list stored in the file proxies.txt (let’s assume there are 10 lines):

for i in {1..10}
do
  # pick one random line (host:port) from the proxy list
  proxy=$(shuf -n 1 proxies.txt)
  wget -e use_proxy=on -e http_proxy="$proxy" -e https_proxy="$proxy" --proxy-user=USERNAME --proxy-password=PASSWORD https://target-site.zone/subdirectory/some-file
done
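Random selection can pick the same proxy several times in a row. A round-robin variant, which walks through the list in order and pairs each proxy with the next URL, could be sketched as follows. The helper name `rotate_download` and the two input files are assumptions; add `--proxy-user` and `--proxy-password` to the wget call if your proxies require authentication.

```shell
#!/usr/bin/env bash
# rotate_download PROXY_FILE URL_FILE
# Hypothetical helper: downloads each URL from URL_FILE through the next
# proxy from PROXY_FILE, wrapping around when the proxy list runs out.
rotate_download() {
  local proxy_file="$1"
  local url_file="$2"
  local total i=0 proxy url
  total=$(grep -c . "$proxy_file")   # number of non-empty lines
  [ "$total" -gt 0 ] || { echo "no proxies in $proxy_file" >&2; return 1; }
  while IFS= read -r url; do
    [ -n "$url" ] || continue        # skip empty lines in the URL list
    # take non-empty line number (i mod total) + 1 from the proxy list
    proxy=$(grep . "$proxy_file" | sed -n "$(( i % total + 1 ))p")
    wget -e use_proxy=on -e http_proxy="$proxy" -e https_proxy="$proxy" "$url" \
      || echo "failed: $url via $proxy" >&2
    i=$(( i + 1 ))
  done < "$url_file"
}
```

Called as `rotate_download proxies.txt urls.txt`, it reports failed downloads on stderr and moves on to the next URL instead of stopping.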

If you’re not familiar with scripting, there’s another elegant solution – using backconnect proxies. Let’s take Froxy proxy as an example:

  1. A port is configured in the personal account (location and conditions for rotating outgoing IP addresses are defined with each new request, for example);
  2. Proxy port data is copied (this will be a regular proxy for wget).
  3. The requests are then executed similar to using regular individual proxies (wget -e use_proxy=on -e http_proxy=255.89.155.178:1234 -e https_proxy=255.89.155.178:1234 --proxy-user=USERNAME --proxy-password=PASSWORD https://target.site/directory/file.zip).
  4. IP address rotation is carried out on the proxy provider’s side. The input port remains the same (there is no need to add or update anything in wget).

cURL vs Wget

Both cURL and wget are open-source utilities used for downloading files and other content via HTTP and FTP protocols. They both handle HTTP POST and GET requests, cookies, can work with secure versions of websites (HTTPS) and can be incorporated into bash scripts.

However, they also have distinctions.

Let’s start with cURL.

  • This is not just a utility but also a software library that can be used at the code level;
  • Unlike wget, cURL supports a vast number of additional protocols (here is a detailed comparison table).
  • cURL can work through SOCKS proxies (wget supports only HTTP proxies);
  • It offers more capabilities for site authentication and SSL connection support;
  • In addition to POST and GET, it also supports some other methods (e.g., PUT).

On the other hand, wget also has something to offer:

  • Recursive downloading of directory contents is possible;
  • Creating copies of websites is available;
  • Interrupted downloads can be resumed (no need to re-download large files);
  • It has a smaller set of options, making wget easier to manage and configure.

Take your time to find out how to integrate cURL with proxies.
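The proxy-related difference can be turned into a simple rule of thumb in a script: hand SOCKS proxies to cURL and everything else to wget. The sketch below assumes both tools are installed; the helper name `fetch_via_proxy` and the addresses in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# fetch_via_proxy PROXY_URL TARGET_URL
# Hypothetical helper: uses curl for SOCKS proxies and wget for HTTP proxies.
fetch_via_proxy() {
  local proxy="$1"
  local url="$2"
  case "$proxy" in
    socks4://*|socks4a://*|socks5://*|socks5h://*)
      # curl accepts SOCKS schemes in --proxy; --remote-name keeps the remote file name
      curl --proxy "$proxy" --location --remote-name "$url"
      ;;
    *)
      wget -e use_proxy=on -e http_proxy="$proxy" -e https_proxy="$proxy" "$url"
      ;;
  esac
}
```

For example, `fetch_via_proxy socks5h://127.0.0.1:1080 https://target.site/directory/file.zip` would go through cURL, while the same call with an `http://` proxy address would go through wget.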

Conclusion and Recommendations

Wget is a simple yet powerful utility for downloading files and HTML pages. It can be adapted for parsing tasks and can be accessed in the console or through bash scripts. Its downsides include the inability to use it as a library and the lack of built-in proxy rotation.

You can find quality residential and mobile proxies with automatic rotation in our service. Froxy offers over 8 million IP addresses, a convenient interface and targeting up to the city level (with solid coverage in all countries worldwide). Price depends on traffic only. There’s a special trial package available for testing the utility features.
