Project address: /jrainlau/wallpaper-downloader
Preamble
It's been a while since I've written an article, as I've been adapting to my new position and using my free time to learn Python. This article is a summary of a recent phase of that learning: developing a crawler to batch-download HD wallpapers from a wallpaper website.
Note: the project this article describes is for Python learning only, and is strictly forbidden from being used for any other purpose!
Initializing the Project
The project uses virtualenv to create a virtual environment and avoid polluting the global Python installation. virtualenv itself can be installed directly with pip3:

pip3 install virtualenv
Then create a new wallpaper-downloader directory, and use virtualenv to create a virtual environment named venv:

virtualenv venv
. venv/bin/activate
Next, create the dependency file (one package per line, so printf is used here rather than a plain echo):

printf 'bs4\nlxml\nrequests\n' > requirements.txt

Finally, download and install the dependencies:

pip3 install -r requirements.txt
Analyzing the Crawler's Workflow
For simplicity's sake, let's go directly to the wallpaper listing page for the "aero" category: /aer.....
As you can see, there are 10 wallpapers available for download on this page. But these are only thumbnails, far too low-resolution to use as wallpapers, so we need to enter each wallpaper's detail page and find the HD download link there. Clicking into the first wallpaper, we see a new page:
Since my machine has a Retina screen, I'm going to simply download the largest one for maximum clarity (size shown in the red circle).
Once you understand the steps, it's just a matter of using the developer tools to find the corresponding DOM nodes and extract the relevant URLs. I won't expand on that process here; readers can try it for themselves. Let's move on to the coding.
Accessing Pages
Create a new file and import two libraries:
from bs4 import BeautifulSoup
import requests
Next, write a function dedicated to visiting a URL and returning the parsed page HTML:
def visit_page(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36'
    }
    r = requests.get(url, headers=headers)
    r.encoding = 'utf-8'
    soup = BeautifulSoup(r.text, 'lxml')
    return soup
To avoid being caught by the website's anti-crawling mechanism, we disguise the crawler as a normal browser by adding a UA to the headers, then specify utf-8 encoding, and finally return the parsed HTML (as a BeautifulSoup object rather than a plain string, so we can query it later).
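As a quick sanity check, you can call visit_page directly; the URL below is just a placeholder for whatever listing page you are targeting:

# Quick sanity check; the URL is a placeholder for a real listing page
soup = visit_page('https://example.com/aero-desktop-wallpapers/page/1')
print(soup.title.get_text())  # should print the page title if the request succeeded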
Extracting Links
After getting the page's HTML, we need to extract the URLs of the wallpapers listed on it:
def get_paper_link(page):
    links = page.select('#content > div > ul > li > div > div a')
    collect = []
    for link in links:
        collect.append(link.get('href'))
    return collect
This function fetches the URLs of all the wallpaper detail pages on the list page.
Downloading Wallpapers
Once we have the address of the detail page, we can go in and pick the right size. After analyzing the DOM structure of the page, we can see that each size corresponds to a link:
So the first step is to extract the links corresponding to these sizes:
wallpaper_source = visit_page(link)
wallpaper_size_links = wallpaper_source.select('#wallpaper-resolutions > a')
size_list = []
for link in wallpaper_size_links:
    href = link.get('href')
    size_list.append({
        'size': eval(link.get_text().replace('x', '*')),
        'name': href.replace('/download/', ''),
        'url': href
    })
size_list is the collection of these links. To make it easier to pick the highest-definition (largest) wallpaper next, I used eval for the size: it simply evaluates a string like 5120x3200 (rewritten as 5120*3200) and uses the result as the size value.
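One caveat worth noting: eval executes whatever text the page happens to contain, so if you want to be safer you can parse the resolution string manually instead. A minimal alternative sketch, inside the same loop:

# Safer alternative to eval: parse the "5120x3200" text manually
width, height = link.get_text().strip().split('x')
size = int(width) * int(height)  # e.g. 5120 * 3200 -> 16384000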
Once you've got the whole collection, picking the clearest (largest) one is just a max() call:
biggest_one = max(size_list, key=lambda item: item['size'])
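To see what the key argument is doing, here is a tiny standalone illustration with made-up entries:

# Made-up entries just to illustrate max() with a key function
demo_list = [
    {'size': 1920 * 1080, 'name': 'a.jpg', 'url': '/download/a'},
    {'size': 5120 * 3200, 'name': 'b.jpg', 'url': '/download/b'},
]
print(max(demo_list, key=lambda item: item['size'])['name'])  # -> b.jpg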
The url in this biggest_one is the download link for the corresponding size; then you just need to download the linked resource via the requests library:
result = requests.get(PAGE_DOMAIN + biggest_one['url'])
if result.status_code == 200:
    open('wallpapers/' + biggest_one['name'], 'wb').write(result.content)
Note that you first need to create a wallpapers directory in the project root, otherwise the script will raise an error at runtime.
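If you would rather have the script create the directory itself, a one-time os.makedirs call before any download does the job (a small addition of mine, not part of the original script):

import os

# Create the output directory up front instead of relying on it existing
os.makedirs('wallpapers', exist_ok=True)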
The organized, complete download_wallpaper function looks like this:
def download_wallpaper(link):
    wallpaper_source = visit_page(PAGE_DOMAIN + link)
    wallpaper_size_links = wallpaper_source.select('#wallpaper-resolutions > a')
    size_list = []
    for link in wallpaper_size_links:
        href = link.get('href')
        size_list.append({
            'size': eval(link.get_text().replace('x', '*')),
            'name': href.replace('/download/', ''),
            'url': href
        })
    biggest_one = max(size_list, key=lambda item: item['size'])
    result = requests.get(PAGE_DOMAIN + biggest_one['url'])
    if result.status_code == 200:
        open('wallpapers/' + biggest_one['name'], 'wb').write(result.content)
Batch Running
The steps above can only download the first wallpaper of the first wallpaper list page. If we want to download all the wallpapers across multiple list pages, we'll need to call these methods in a loop. First we define a few constants:
import sys

if len(sys.argv) != 4:
    print('3 arguments were required but only ' + str(len(sys.argv) - 1) + ' were found!')
    exit()

category = sys.argv[1]

try:
    page_start = [int(sys.argv[2])]
    page_end = int(sys.argv[3])
except ValueError:
    print('The second and third arguments must be numbers, not strings!')
    exit()
Here three constants are taken from the command-line arguments: category, page_start and page_end, corresponding to the wallpaper category, the start page number, and the end page number, respectively.
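You may notice that page_start is wrapped in a one-element list while page_end is a plain int. Presumably this is so that the start() function defined below can increment it without a global declaration: assigning to page_start[0] mutates the list rather than rebinding the name. A minimal illustration of the trick:

counter = [0]  # a one-element list can be mutated from inner scopes

def bump():
    counter[0] += 1  # mutates the list in place; no 'global' statement needed

bump()
bump()
print(counter[0])  # -> 2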
For convenience, define two more URL-related constants:
PAGE_DOMAIN = ''
PAGE_URL = '/' + category + '-desktop-wallpapers/page/'
The next step is to happily run the batch operations. Before that, let's define a start() function:
def start():
    if page_start[0] <= page_end:
        print('Preparing to download the ' + str(page_start[0]) + ' page of all the "' + category + '" wallpapers...')
        PAGE_SOURCE = visit_page(PAGE_URL + str(page_start[0]))
        WALLPAPER_LINKS = get_paper_link(PAGE_SOURCE)
        page_start[0] = page_start[0] + 1
        for index, link in enumerate(WALLPAPER_LINKS):
            download_wallpaper(link, index, len(WALLPAPER_LINKS), start)
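Note how pagination advances here: start() passes itself as a callback to download_wallpaper (as we'll see in the rewritten version below), which invokes it again once the last wallpaper of the page is done, and the page_start[0] <= page_end guard terminates the recursion. If you prefer a plain loop, an equivalent sketch of mine (same helpers assumed) would be:

def start_iterative():
    # Loop-based equivalent of the recursive start(); avoids deep call stacks
    for page in range(page_start[0], page_end + 1):
        print('Preparing to download the ' + str(page) + ' page of all the "' + category + '" wallpapers...')
        links = get_paper_link(visit_page(PAGE_URL + str(page)))
        for index, link in enumerate(links):
            download_wallpaper(link, index, len(links), lambda: None)  # no-op callback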
Then rewrite the earlier download_wallpaper function a bit:
def download_wallpaper(link, index, total, callback):
    wallpaper_source = visit_page(PAGE_DOMAIN + link)
    wallpaper_size_links = wallpaper_source.select('#wallpaper-resolutions > a')
    size_list = []
    for link in wallpaper_size_links:
        href = link.get('href')
        size_list.append({
            'size': eval(link.get_text().replace('x', '*')),
            'name': href.replace('/download/', ''),
            'url': href
        })
    biggest_one = max(size_list, key=lambda item: item['size'])
    print('Downloading the ' + str(index + 1) + '/' + str(total) + ' wallpaper: ' + biggest_one['name'])
    result = requests.get(PAGE_DOMAIN + biggest_one['url'])
    if result.status_code == 200:
        open('wallpapers/' + biggest_one['name'], 'wb').write(result.content)
    if index + 1 == total:
        print('Download completed!\n\n')
        callback()
Finally, specify the startup rules:
if __name__ == '__main__':
    start()
Running the Project
Start a test run by entering the following at the command line:
python3 aero 1 2
The following output can then be seen:
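Based on the script's print statements, the output looks roughly like this (the filenames are placeholders):

Preparing to download the 1 page of all the "aero" wallpapers...
Downloading the 1/10 wallpaper: <wallpaper-name>.jpg
Downloading the 2/10 wallpaper: <wallpaper-name>.jpg
...
Download completed!

Preparing to download the 2 page of all the "aero" wallpapers...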
Fire up Charles to capture the traffic, and you can see that the script is running smoothly:
At this point, the download script is complete, and you finally don't have to worry about a wallpaper drought!
That's everything I've put together for you. If you have any questions, feel free to discuss them in the comments below. Thank you for your support.