SoFunction
Updated on 2024-10-30

Detailed process for rapid deployment of Scrapy project scrapyd


Install scrapyd on the server

pip install scrapyd  -i /simple

Run

scrapyd


Modify configuration items for remote access

Press Ctrl+C to stop the scrapyd process started above.

In the directory where you will run the scrapyd command, create a new file named scrapyd.conf (scrapyd reads this file from the current working directory).

Enter the following

[scrapyd]
# IP address on which the web and JSON services listen; default is 127.0.0.1 (change it to 0.0.0.0 so that other machines can access the server once scrapyd is running)
bind_address = 0.0.0.0
# The port to listen on, default is 6800
http_port   = 6800
# Whether to turn on debug mode, default is off
debug = off
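Beyond these three options, the default scrapyd.conf supports several other commonly used settings. A sketch of an extended configuration (the values shown are the scrapyd defaults; keep only what you need):

```ini
[scrapyd]
bind_address = 0.0.0.0
http_port    = 6800
debug        = off
# Directory for uploaded project eggs (default: eggs)
eggs_dir     = eggs
# Directory for crawler logs (default: logs)
logs_dir     = logs
# Max concurrent scrapy processes; 0 means max_proc_per_cpu * number of CPUs
max_proc     = 0
max_proc_per_cpu = 4
# Number of finished jobs (logs and items) to keep per spider
jobs_to_keep = 5
```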


Install scrapyd-client

1. Install scrapyd-client with the following command

pip install scrapyd-client  -i /simple

Configuring a Scrapy Project

1. Modify the scrapy.cfg file in the project root directory

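A minimal sketch of the relevant deploy section in scrapy.cfg, using the target name ubuntu-1 and the server address from this article's examples (adjust both to your environment):

```ini
[settings]
default = douban.settings

# "ubuntu-1" is the <target> name passed to scrapyd-deploy below
[deploy:ubuntu-1]
url = http://10.211.55.5:6800/
project = douban
```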

2. Check the configuration

scrapyd-deploy -l

3. Publish the scrapy project to the server where scrapyd is running (the crawler does not start at this point)

# scrapyd-deploy <target> -p <project> --version <version>
# target: the name after "deploy:" in the configuration file above, e.g. ubuntu-1
# project: optional; recommended to use the same name as the scrapy crawler project
# version: custom version number; defaults to the current timestamp if omitted, and is usually omitted
scrapyd-deploy ubuntu-1 -p douban

Note

Do not put unrelated .py files in the crawler directory; they will cause publishing to fail. When the crawler is published successfully, build files are generated in the current directory, and these can be deleted.

4. Send the command to run the crawler

curl http://10.211.55.5:6800/schedule.json -d project=douban -d spider=top250
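The same request can be sent from Python using only the standard library. A sketch equivalent to the curl command, with the host, project, and spider names taken from this article's examples (replace them with your own):

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def schedule_spider(host, project, spider, port=6800):
    """Build a POST request for scrapyd's schedule.json endpoint."""
    url = f"http://{host}:{port}/schedule.json"
    data = urlencode({"project": project, "spider": spider}).encode()
    return Request(url, data=data)  # attaching data makes this a POST


req = schedule_spider("10.211.55.5", "douban", "top250")
print(req.full_url)      # http://10.211.55.5:6800/schedule.json
print(req.get_method())  # POST
# To actually send it (scrapyd must be running and reachable):
# print(urlopen(req).read())
```

On success, scrapyd responds with JSON containing a jobid, which is the value used later to stop the crawler.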

5. Stop the crawler

curl http://ip:6800/cancel.json -d project=<project name> -d job=<job id>


curl http://10.211.55.5:6800/cancel.json -d project=douban -d job=121cc034388a11ebb1a7001c42d0a249


Note

  1. If the scrapy project code is modified, just republish it to the scrapyd server.
  2. If the scrapy project's crawl is stopped, it can be scheduled again with the same curl command and will resume crawling from the breakpoint.

Scrapy Project Deployment: Graphical Management with Gerapy

I. Description

Gerapy is a crawler management tool developed in China (with a Chinese interface). It is a visual tool for managing crawler projects: everything from project deployment to run management becomes interactive, enabling batch deployment, easier control and management, and real-time viewing of results.

The relationship between Gerapy and scrapyd: after configuring scrapyd in Gerapy, we can start crawlers directly through the GUI without using commands.

II. Installation

Command (run on the machine from which the crawler code is uploaded)

pip install gerapy  -i /simple

Test the installation


III. Use

Create a gerapy working directory

gerapy init

Generate the folder as follows


Create an SQLite database to store the versions of the scrapy projects to be deployed.

gerapy migrate

After successful creation, use the tree command to view the current list of files

Create an admin username and password (Gerapy provides the gerapy createsuperuser command for this)


Start the service

gerapy runserver

Then visit http://127.0.0.1:8000 in a browser (use gerapy runserver 0.0.0.0:8000 to allow remote access).

This concludes the article on rapid deployment of Scrapy projects with scrapyd. For more content related to scrapyd and Scrapy project deployment, please search my previous articles or continue browsing the related articles below. I hope you will support me in the future!