Rapid deployment of Scrapy projects with scrapyd
Install scrapyd on the server:
pip install scrapyd
Run:
scrapyd
Modify the configuration to allow remote access
Use Ctrl+C to stop the scrapyd process started earlier.
In the directory where you run the scrapyd command, create a new file named scrapyd.conf and enter the following:
[scrapyd]
# IP address on which the web and JSON services listen; default is 127.0.0.1 (change it to 0.0.0.0 so that other machines can access the server once scrapyd is running)
bind_address = 0.0.0.0
# The port to listen on, default is 6800
http_port = 6800
# Whether to turn on debug mode, default is off
debug = off
Install scrapyd-client
1. Install scrapyd-client with the following command:
pip install scrapyd-client
Configuring a Scrapy Project
Modify the scrapy.cfg file in the project root
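For reference, the deploy section of scrapy.cfg might look like the following sketch. The target name ubuntu-1 and project name douban match the examples in this article; the url is an assumption based on the scrapyd server address used later, so adjust it to your own server:

```ini
# scrapy.cfg in the project root (sketch; url is an assumed example)
[settings]
default = douban.settings

[deploy:ubuntu-1]
url = http://10.211.55.5:6800/
project = douban
```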
1. Check the configuration:
scrapyd-deploy -l
Publish the scrapy project to the server where scrapyd resides (the crawler is not running at this point):
# scrapyd-deploy <target> -p <project> --version <version>
# target:  the name after "deploy:" in the configuration file above, e.g. ubuntu-1
# project: recommended to be the same as the scrapy project name
# version: a custom version number; defaults to the current timestamp if omitted (usually omitted)
scrapyd-deploy ubuntu-1 -p douban
Note:
Do not put unrelated .py files in the spider project directory; they will cause publishing to fail. After a successful publish, some build files are generated in the current directory; these can be deleted.
4. Send the command to run the crawler
curl http://10.211.55.5:6800/schedule.json -d project=douban -d spider=top250
5. Stop the crawler
curl http://ip:6800/cancel.json -d project=<project name> -d job=<job id>
curl http://10.211.55.5:6800/cancel.json -d project=douban -d job=121cc034388a11ebb1a7001c42d0a249
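The same scrapyd API calls can be issued from Python instead of curl. A minimal sketch, assuming the server address 10.211.55.5:6800 and the douban/top250 names from the examples above; it only builds the request URL and form body, which is the part that can be checked without a running server:

```python
# Sketch of building scrapyd HTTP API requests (server address is an assumed example).
from urllib.parse import urlencode

SCRAPYD = "http://10.211.55.5:6800"

def schedule_request(project, spider, **settings):
    """Build the POST target and form body for schedule.json (starts a spider)."""
    return f"{SCRAPYD}/schedule.json", urlencode({"project": project, "spider": spider, **settings})

def cancel_request(project, job):
    """Build the POST target and form body for cancel.json (stops a running job)."""
    return f"{SCRAPYD}/cancel.json", urlencode({"project": project, "job": job})

url, body = schedule_request("douban", "top250")
print(url)   # http://10.211.55.5:6800/schedule.json
print(body)  # project=douban&spider=top250
```

The bodies built here can then be POSTed with any HTTP client (e.g. urllib.request or the requests library) to get the same effect as the curl commands above.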
Note:
- If the scrapy project code is modified, just republish it to the scrapyd server.
- If a crawl has been stopped, it can be started again by sending the curl command once more, which allows it to resume from where it left off.
Scrapy Project Deployment - Managing Crawlers Graphically with Gerapy
I. Description
Gerapy is a crawler management tool developed in China (with a Chinese interface). It is a visual tool for managing crawler projects: deployment, operation, and management all become interactive, making batch deployment, control, management, and real-time viewing of results much more convenient.
The relationship between Gerapy and scrapyd: once scrapyd is configured in Gerapy, we can start crawlers directly through the GUI without typing commands.
II. Installation
Run the following command on the machine from which the crawler code is uploaded:
pip install gerapy
Then verify that the installation succeeded.
III. Use
Create a gerapy working directory
gerapy init
This generates a gerapy folder in the current directory.
Create a SQLite database to store the versions of the scrapy projects to be deployed:
gerapy migrate
After successful creation, use the tree command to view the current list of files
Create a user name and password (with Gerapy's createsuperuser command):
gerapy createsuperuser
Starting services
gerapy runserver
This concludes this article on rapid deployment of Scrapy projects with scrapyd. For more on deploying Scrapy projects with scrapyd, please search my previous articles or continue browsing the related articles below. I hope you will continue to support me!