How to Download Files From URLs With Python
When it comes to file retrieval, Python offers a robust set of tools and packages that are useful in a variety of applications, from web scraping to automation scripts and analysis of retrieved data. Downloading files from a URL programmatically is a useful skill to learn for many programming projects and data workflows.
In this tutorial, you’ll learn how to:
- Download files from the Web using the standard library as well as third-party libraries in Python
- Stream data to download large files in manageable chunks
- Implement parallel downloads using a pool of threads
- Perform asynchronous downloads to fetch multiple files in bulk
In this tutorial, you’ll be downloading a range of economic data from the World Bank Open Data platform. To get started on this example project, go ahead and grab the sample code below:
Free Bonus: Click here to download your sample code for downloading files from the Web with Python.
Facilitating File Downloads With Python
While it’s possible to download files from URLs using traditional command-line tools, Python provides several libraries that facilitate file retrieval. Using Python to download files offers several advantages.
One advantage is flexibility: Python has a rich ecosystem of libraries, including ones that offer efficient ways to handle different file formats, protocols, and authentication methods. You can choose the most suitable Python tools to accomplish the task at hand and fulfill your specific requirements, whether you’re downloading a plain-text CSV file or a complex binary file.
Another reason is portability. You may encounter situations where you’re working on cross-platform applications. In such cases, using Python is a good choice because it’s a cross-platform programming language. This means that Python code can run consistently across different operating systems, such as Windows, Linux, and macOS.
Using Python also offers the possibility of automating your processes, saving you time and effort. Some examples include automating retries if a download fails, retrieving and saving multiple files from URLs, and processing and storing your data in designated locations.
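For example, a retry helper only takes a few lines to sketch out. The function below is illustrative rather than part of any library, and it leans on urlretrieve() and URLError from the standard library, which you’ll meet shortly:

>>> import time
>>> from urllib.error import URLError
>>> from urllib.request import urlretrieve

>>> def download_with_retries(url, filename, attempts=3, delay=2):
...     """Try a download a few times before giving up."""
...     for attempt in range(1, attempts + 1):
...         try:
...             # Fetch the URL and save the response to a local file.
...             return urlretrieve(url, filename)
...         except URLError:
...             if attempt == attempts:
...                 raise  # Out of attempts, so let the error propagate.
...             time.sleep(delay)  # Pause briefly before retrying.
...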
These are just a few of the reasons why downloading files with Python can beat traditional command-line tools. Depending on your project requirements, you can choose the approach and library that best suits your needs. In this tutorial, you’ll learn approaches to some common scenarios that require file retrieval.
Downloading a File From a URL in Python
In this section, you’ll learn the basics of downloading a ZIP file containing gross domestic product (GDP) data from the World Bank Open Data platform. You’ll use two common tools in Python, urllib and requests, to download GDP data by country.
While the urllib package comes with Python in its standard library, it has some limitations. So, you’ll also learn to use a popular third-party library, requests, that offers more features for making HTTP requests. Later in the tutorial, you’ll see additional functionalities and use cases.
Using urllib From the Standard Library
Python ships with a package called urllib, which provides a convenient way to interact with web resources. It has a straightforward and user-friendly interface, making it suitable for quick prototyping and smaller projects. With urllib, you can perform different tasks dealing with network communication, such as parsing URLs, sending HTTP requests, downloading files, and handling errors related to network operations.
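For example, here’s a small taste of two of those tasks, parsing a URL into its components and sending a basic HTTP request. The World Bank address serves as a stand-in for any URL, and the status code you get back will depend on the server:

>>> from urllib.parse import urlparse
>>> from urllib.request import urlopen

>>> parsed = urlparse("https://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.CD")
>>> parsed.netloc
'api.worldbank.org'
>>> parsed.path
'/v2/en/indicator/NY.GDP.MKTP.CD'

>>> with urlopen("https://api.worldbank.org/") as response:
...     response.status
...
200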
As a standard library package, urllib has no external dependencies and doesn’t require you to install anything extra, making it a convenient choice that’s readily accessible for development and deployment. It’s also cross-platform compatible, meaning you can write and run urllib code seamlessly across different operating systems without additional configuration.
The urllib package is also very versatile. It integrates well with other modules in the Python standard library, such as re for building and manipulating regular expressions, as well as json for working with JSON data. The latter is particularly handy when you need to consume JSON APIs.
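As a quick illustration, the sketch below fetches a World Bank endpoint that returns JSON and decodes it with json.load(). The specific endpoint is just an example; any URL that serves JSON would work the same way:

>>> import json
>>> from urllib.request import urlopen

>>> url = "https://api.worldbank.org/v2/country/US?format=json"
>>> with urlopen(url) as response:
...     data = json.load(response)  # Decode the JSON body into Python objects.
...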
In addition, you can extend the urllib package and use it with other third-party libraries, like requests, BeautifulSoup, and Scrapy. This offers the possibility for more advanced operations in web scraping and interacting with web APIs.
To download a file from a URL using the urllib package, you can call urlretrieve() from the urllib.request module. This function fetches a web resource from the specified URL and then saves the response to a local file. To start, import urlretrieve() from urllib.request:
>>> from urllib.request import urlretrieve
Next, define the URL that you want to retrieve data from. If you don’t specify a path to a local file where you want to save the data, then the function will create a temporary file for you. Since you know that you’ll be downloading a ZIP file from that URL, go ahead and provide an optional path to the target file:
>>> url = (
... "https://api.worldbank.org/v2/en/indicator/"
... "NY.GDP.MKTP.CD?downloadformat=csv"
... )
>>> filename = "gdp_by_country.zip"
Because your URL is quite long, you rely on Python’s implicit concatenation by splitting the string literal over multiple lines inside parentheses. The Python interpreter will automatically join the separate strings on different lines into a single string. You also define the location where you wish to save the file. When you only provide a filename without a path, Python will save the resulting file in your current working directory.
Then, you can download and save the file by calling urlretrieve() and passing in the URL and, optionally, your filename:
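>>> urlretrieve(url, filename)
('gdp_by_country.zip', <http.client.HTTPMessage object at 0x...>)

The function returns a tuple containing the path to your local file and an HTTPMessage object holding the HTTP headers of the response. The memory address in the output above is a placeholder, so it’ll look different on your machine.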