This guide shows how to use Python and Selenium to build an automated image engine that takes snapshots of web pages based on specified parameters and stores the generated images in the cloud. The engine receives its task instructions through a message queue, which makes it well suited to scenarios where web page screenshots need to be processed in batches.
1. Prepare the environment
Make sure you have Python and the necessary libraries installed:
pip install selenium oss2 kafka-python-ng
2. Create a configuration file
Create a simple INI-style configuration file to store your OSS, Kafka, and engine settings:
[oss]
access_key_id = YOUR_OSS_ACCESS_KEY_ID
access_key_secret = YOUR_OSS_ACCESS_KEY_SECRET
bucket_name = YOUR_BUCKET_NAME
endpoint =

[kafka]
bootstrap_servers = localhost:9092
topic = your_topic_name
notify_topic = your_notify_topic
consumer_group = your_consumer_group

[engine]
driver_path = path/to/chromedriver
image_path = path/to/screenshots
param_path = path/to/params
site_base_path =
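Because several values in this file start out empty, it can help to check the configuration before the engine starts. The following is a minimal sketch, not part of the original engine: `find_missing_settings` and `REQUIRED_KEYS` are hypothetical names, and the list of required keys is an assumption based on the settings the later steps read.

```python
import configparser

# Keys the later steps read, grouped by section (assumed list).
REQUIRED_KEYS = {
    "oss": ["access_key_id", "access_key_secret", "bucket_name", "endpoint"],
    "kafka": ["bootstrap_servers", "topic", "notify_topic", "consumer_group"],
    "engine": ["driver_path", "image_path", "site_base_path"],
}


def find_missing_settings(config):
    """Return 'section.key' names that are absent or empty (hypothetical helper)."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        for key in keys:
            # fallback="" covers both a missing section and a missing key
            value = config.get(section, key, fallback="")
            if not value.strip():
                missing.append(f"{section}.{key}")
    return missing


# Example with an in-memory configuration instead of a file on disk
sample_text = """
[oss]
access_key_id = id
access_key_secret = secret
bucket_name = bucket
endpoint =

[kafka]
bootstrap_servers = localhost:9092
topic = tasks
notify_topic = done
consumer_group = engine
"""
sample_config = configparser.ConfigParser()
sample_config.read_string(sample_text)
print(find_missing_settings(sample_config))
# prints ['oss.endpoint', 'engine.driver_path', 'engine.image_path', 'engine.site_base_path']
```

Calling such a check right after reading the file lets the program stop with a clear message instead of failing later inside Selenium or the OSS client.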
3. Set up logging
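Add basic logging to the program to make debugging easier:
import logging
from logging.handlers import TimedRotatingFileHandler
import os

logger = logging.getLogger('image_engine')
# The log level was lost in the source; INFO is an assumption
logger.setLevel(logging.INFO)

# Write to logs/image_engine.log, rotate at midnight, keep 7 old files
log_file = 'logs/image_engine.log'
os.makedirs('logs', exist_ok=True)
handler = TimedRotatingFileHandler(log_file, when='midnight', backupCount=7, encoding='utf-8')
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)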
4. Initialize Selenium WebDriver
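Initialize the Chrome WebDriver and maximize the browser window:
import configparser

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Read the configuration file.
# The file name is missing in the source; set this to the path of your configuration file.
CONFIG_FILE = ''
config = configparser.ConfigParser()
config.read(CONFIG_FILE)

# Start Chrome using the driver path from the [engine] section
service = Service(config.get('engine', 'driver_path'))
driver = webdriver.Chrome(service=service)
driver.maximize_window()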
5. Image processing logic
Write a function that handles each Kafka message by opening the specified web page, waiting for it to load, and saving a screenshot, which is then uploaded to OSS and announced on the notification topic:
from kafka import KafkaConsumer, KafkaProducer
import json
import os
import time
from datetime import datetime
import oss2

# The source never creates the producer used by notify_completion;
# this definition is an assumed minimal setup.
producer = KafkaProducer(
    bootstrap_servers=config.get('kafka', 'bootstrap_servers').split(','),
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

def process_task(msg):
    task_params = json.loads(msg.value)
    item_id = task_params['itemId']
    param_value = task_params['paramValue']
    logger.info(f"Start processing item [{item_id}] with parameter [{param_value}]")
    # Build the request URL
    url = f"{config.get('engine', 'site_base_path')}/view?param={param_value}&id={item_id}"
    try:
        driver.get(url)
        # Simply wait for the page to load; adjust or replace with WebDriverWait as needed
        time.sleep(3)
        # Generate the screenshot file name
        today = datetime.now().strftime('%Y-%m-%d')
        screenshot_dir = os.path.join(config.get('engine', 'image_path'), 'images', today)
        os.makedirs(screenshot_dir, exist_ok=True)
        fname = os.path.join(screenshot_dir, f"{item_id}_{param_value}.png")
        driver.save_screenshot(fname)
        logger.info(f"Saved the screenshot to {fname}")
        # Upload to OSS (simplified; extend it according to your actual situation)
        upload_to_oss(fname)
        # Send a completion notification
        notify_completion(item_id, param_value, fname)
        logger.info(f"Finished processing item [{item_id}] with parameter [{param_value}]")
    except Exception as e:
        logger.error(f"Error while processing item [{item_id}] with parameter [{param_value}]: {e}")

def upload_to_oss(file_path):
    """Upload a file to Alibaba Cloud OSS."""
    auth = oss2.Auth(config.get('oss', 'access_key_id'), config.get('oss', 'access_key_secret'))
    bucket = oss2.Bucket(auth, config.get('oss', 'endpoint'), config.get('oss', 'bucket_name'))
    # Use the path relative to image_path as the object key, with forward slashes
    remote_path = os.path.relpath(file_path, config.get('engine', 'image_path')).replace(os.sep, '/')
    bucket.put_object_from_file(remote_path, file_path)

def notify_completion(item_id, param_value, image_path):
    """Send a completion notification."""
    producer.send(config.get('kafka', 'notify_topic'), {
        'itemId': item_id,
        'paramValue': param_value,
        'imagePath': image_path
    })
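The fixed three-second sleep can be replaced with an explicit wait, as the comment in the code suggests. The following is a minimal sketch of one way to do that with Selenium's `WebDriverWait`. The helper names `page_is_ready` and `wait_for_page` and the 10-second timeout are assumptions for illustration, and checking `document.readyState` is only one possible readiness condition.

```python
def page_is_ready(driver):
    """Return True once the browser reports the document as fully loaded."""
    return driver.execute_script("return document.readyState") == "complete"


def wait_for_page(driver, timeout=10):
    """Block until page_is_ready(driver) is true.

    WebDriverWait raises selenium.common.exceptions.TimeoutException
    if the condition is still false after `timeout` seconds.
    """
    # Imported lazily so page_is_ready can be exercised without Selenium installed.
    from selenium.webdriver.support.ui import WebDriverWait

    WebDriverWait(driver, timeout).until(page_is_ready)
```

In `process_task`, `wait_for_page(driver)` would then take the place of the fixed sleep. Pages that render their content with JavaScript after the load event may still need a condition tied to a specific element.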
6. Start the Kafka consumer
Start the Kafka consumer, listen for messages, and call the processing function for each one:
if __name__ == "__main__":
    consumer = KafkaConsumer(
        config.get('kafka', 'topic'),
        bootstrap_servers=config.get('kafka', 'bootstrap_servers').split(','),
        group_id=config.get('kafka', 'consumer_group'),
        auto_offset_reset='latest',
        enable_auto_commit=True,
        # Decode the raw bytes; process_task parses the resulting JSON string
        value_deserializer=lambda m: m.decode('utf-8')
    )
    for msg in consumer:
        try:
            process_task(msg)
        except Exception as ex:
            logger.error(f"Error while consuming message: {ex}")
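Since any producer can write to the topic, it may be worth validating each message before it reaches the browser. The following is a minimal sketch, assuming the payload is a UTF-8 JSON object with the `itemId` and `paramValue` fields used above. `parse_task` is a hypothetical helper, not part of the original engine.

```python
import json


def parse_task(raw):
    """Decode a raw Kafka message value and return (item_id, param_value).

    Raises ValueError when the payload is not a JSON object or a field is missing.
    """
    try:
        payload = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError(f"message is not valid UTF-8 JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("message must be a JSON object")
    missing = [key for key in ("itemId", "paramValue") if key not in payload]
    if missing:
        raise ValueError(f"message is missing fields: {', '.join(missing)}")
    return payload["itemId"], payload["paramValue"]


print(parse_task(b'{"itemId": 42, "paramValue": "blue"}'))  # prints (42, 'blue')
```

A malformed message then fails with a descriptive `ValueError` that the consumer loop can log, rather than with a `KeyError` partway through processing.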
Summary
With the simplified steps above, you can quickly build an image engine based on Python and Selenium. The engine receives task instructions from Kafka, visits the specified pages, takes page snapshots, and uploads the screenshots to Alibaba Cloud OSS. This version leaves out unnecessary complexity and focuses on implementing the core functions.