How to Download Files From URLs With Python

When it comes to file retrieval, Python offers a robust set of tools and packages that are useful in a variety of applications, from web scraping and automation scripts to data analysis. Downloading files from a URL programmatically is a useful skill for many programming and data workflows.

In this tutorial, you’ll learn how to:

  • Download files from the Web using the standard library as well as third-party libraries in Python
  • Stream data to download large files in manageable chunks
  • Implement parallel downloads using a pool of threads
  • Perform asynchronous downloads to fetch multiple files in bulk

In this tutorial, you’ll be downloading a range of economic data from the World Bank Open Data platform. To get started on this example project, you can grab the sample code that accompanies the full tutorial.

Facilitating File Downloads With Python

While it’s possible to download files from URLs using traditional command-line tools, Python provides several libraries that facilitate file retrieval. Using Python to download files offers several advantages.

One advantage is flexibility, as Python has a rich ecosystem of libraries, including ones that offer efficient ways to handle different file formats, protocols, and authentication methods. You can choose the most suitable Python tools to accomplish the task at hand and fulfill your specific requirements, whether you’re downloading from a plain-text CSV file or a complex binary file.

Another reason is portability. You may encounter situations where you’re working on cross-platform applications. In such cases, using Python is a good choice because it’s a cross-platform programming language. This means that Python code can run consistently across different operating systems, such as Windows, Linux, and macOS.

Using Python also offers the possibility of automating your processes, saving you time and effort. Some examples include automating retries if a download fails, retrieving and saving multiple files from URLs, and processing and storing your data in designated locations.

These are just a few reasons why downloading files with Python can be a better fit than traditional command-line tools. Depending on your project requirements, you can choose the approach and library that best suits your needs. In this tutorial, you’ll learn approaches to some common scenarios that require file retrieval.

Downloading a File From a URL in Python

In this section, you’ll learn the basics of downloading a ZIP file containing gross domestic product (GDP) data from the World Bank Open Data platform. You’ll use two common tools in Python, urllib and requests, to download GDP by country.

While the urllib package comes with Python in its standard library, it has some limitations. So, you’ll also learn to use a popular third-party library, requests, that offers more features for making HTTP requests. Later in the tutorial, you’ll see additional functionalities and use cases.
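As a quick preview, a minimal sketch of fetching a file with requests could look like the following. It assumes that you’ve installed requests with pip, and the URL is only a placeholder:

>>> import requests
>>> response = requests.get("https://example.com/data.zip")
>>> response.raise_for_status()
>>> with open("data.zip", mode="wb") as file:
...     file.write(response.content)
...

Streaming and other more advanced requests features come up later in the tutorial.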

Using urllib From the Standard Library

Python ships with a package called urllib, which provides a convenient way to interact with web resources. It has a straightforward and user-friendly interface, making it suitable for quick prototyping and smaller projects. With urllib, you can perform different tasks dealing with network communication, such as parsing URLs, sending HTTP requests, downloading files, and handling errors related to network operations.
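For example, a minimal sketch of the URL-parsing piece uses urlparse() from urllib.parse, which breaks a URL into its components. The URL here is only a placeholder:

>>> from urllib.parse import urlparse
>>> urlparse("https://example.com/path/to/file?format=csv")
ParseResult(scheme='https', netloc='example.com', path='/path/to/file', params='', query='format=csv', fragment='')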

As a standard library package, urllib has no external dependencies and doesn’t require installing additional packages, making it a convenient choice. For the same reason, it’s readily accessible for development and deployment. It’s also cross-platform compatible, meaning you can write and run code seamlessly using the urllib package across different operating systems without additional dependencies or configuration.

The urllib package is also very versatile. It integrates well with other modules in the Python standard library, such as re for building and manipulating regular expressions, as well as json for working with JSON data. The latter is particularly handy when you need to consume JSON APIs.
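As an illustration, here’s a minimal sketch of consuming a JSON API by combining urlopen() with the json module. The URL is only a hypothetical placeholder:

>>> import json
>>> from urllib.request import urlopen
>>> with urlopen("https://api.example.com/data.json") as response:
...     data = json.load(response)
...

Because urlopen() returns a file-like response object, you can pass it straight to json.load(), which parses the payload into Python data structures.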

In addition, you can extend the urllib package and use it with other third-party libraries, like requests, BeautifulSoup, and Scrapy. This offers the possibility for more advanced operations in web scraping and interacting with web APIs.

To download a file from a URL using the urllib package, you can call urlretrieve() from the urllib.request module. This function fetches a web resource from the specified URL and then saves the response to a local file. To start, import urlretrieve() from urllib.request:

>>> from urllib.request import urlretrieve

Next, define the URL that you want to retrieve data from. If you don’t specify a path to a local file where you want to save the data, then the function will create a temporary file for you. Since you know that you’ll be downloading a ZIP file from that URL, go ahead and provide an optional path to the target file:

>>> url = (
...     "https://api.worldbank.org/v2/en/indicator/"
...     "NY.GDP.MKTP.CD?downloadformat=csv"
... )
>>> filename = "gdp_by_country.zip"

Because your URL is quite long, you rely on Python’s implicit concatenation by splitting the string literal over multiple lines inside parentheses. The Python interpreter will automatically join the separate strings on different lines into a single string. You also define the location where you wish to save the file. When you only provide a filename without a path, Python will save the resulting file in your current working directory.

Then, you can download and save the file by calling urlretrieve() and passing in the URL and optionally your filename:
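>>> urlretrieve(url, filename)
('gdp_by_country.zip', <http.client.HTTPMessage object at 0x...>)

This sketch shows what the call might look like in the REPL: urlretrieve() returns a tuple containing the path to the saved file and an http.client.HTTPMessage instance with the response headers. The exact object representation that you see will differ.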

Read the full article at https://realpython.com/python-download-file-from-url/ »

