Core code
Download the HTML page
Parse the HTML content
from requests import get from bs4 import BeautifulSoup as bs from datetime import datetime as dt def Today(style=1): date = () if style!=1: return f'{}moon{}date' return f'{}-{:02}-{:02}' def SinaNews(style=1): url1 = 'http://news.***./' if style==1: url1 += 'world' elif style==2: url1 += 'china' else: url1='/' text = get(url1) ='uft-8' soup = bs(,'') aTags = soup.find_all("a") return [(,t['href']) for t in aTags if Today() in str(t)]
Crawl the titles
for i, news in enumerate(SinaNews(1)):
    print(f'No{i+1}:', news[0])

No1: foreign news media:*****
No2: Japanese news media:******
......
The titles have been masked!
This is my first crawler, so to keep things simple I picked a news site whose pages need no anti-scraping workarounds: the content can be read directly from the downloaded HTML. Three of its pages, international, domestic and military news, serve as content sources. Download each page and parse the resulting HTML; every <a href=...> tag that contains today's date is exactly what we need.
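The filtering idea (keep only links that carry today's date) can be tried offline. This sketch swaps BeautifulSoup for the standard library's html.parser so it runs without third-party packages; the class and function names and the example.com URLs are hypothetical. Unlike the original, which searches the whole tag string, it only checks the href attribute:

```python
from html.parser import HTMLParser


class DatedLinkParser(HTMLParser):
    """Collect (text, href) for every <a> whose href contains date_str."""

    def __init__(self, date_str: str):
        super().__init__()
        self.date_str = date_str
        self.links = []      # collected (text, href) pairs
        self._href = None    # href of the currently open matching <a>, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href and self.date_str in href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((''.join(self._text).strip(), self._href))
            self._href = None


def dated_links(html: str, date_str: str):
    parser = DatedLinkParser(date_str)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == '__main__':
    sample = '''
    <ul>
      <li><a href="https://news.example.com/w/2021-09-29/doc-1.shtml">Story A</a></li>
      <li><a href="https://news.example.com/w/2021-09-28/doc-2.shtml">Old story</a></li>
      <li><a href="/about">About</a></li>
    </ul>
    '''
    for i, (title, url) in enumerate(dated_links(sample, '2021-09-29')):
        print(f'No{i+1}:', title, url)
```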
Crawl the article text
Next, download each article page from its URL. Inspecting the page shows that the body sits in the <div> with id='article'; .get_text() is the key call that extracts its text, followed by a little formatting:
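def NewsDownload(url):
    html = get(url)
    html.encoding = 'utf-8'
    soup = bs(html.text, 'html.parser')
    # the article body is the <div id="article"> element
    text = soup.find('div', id='article').get_text().strip()
    text = text.replace('Click to go to topic:', 'Related topics:')
    # start each paragraph on a new line; the search string was lost in the source,
    # two full-width (U+3000) indent spaces are assumed here
    text = text.replace('\u3000\u3000', '\n\u3000\u3000')
    # collapse runs of blank lines to a single blank line
    while '\n\n\n' in text:
        text = text.replace('\n\n\n', '\n\n')
    return text

>>> url = 'https://******/w/2021-09-29/'
>>> NewsDownload(url)
'Original title: ******************************************************'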
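To show what find('div', id='article').get_text() actually collects, here is a rough standard-library equivalent (html.parser swapped in for BeautifulSoup; the names are hypothetical). It tracks nested <div>s so text inside child tags is kept, and replaces the blank-line while-loop with a single regex that has the same effect:

```python
import re
from html.parser import HTMLParser


class ArticleTextParser(HTMLParser):
    """Gather all text inside <div id="article">, including nested tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # > 0 while inside the article div; counts nested <div>s
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        if self.depth:
            self.depth += 1                    # a div nested inside the article
        elif dict(attrs).get('id') == 'article':
            self.depth = 1                     # entering the article div

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def article_text(html: str) -> str:
    parser = ArticleTextParser()
    parser.feed(html)
    parser.close()
    text = ''.join(parser.parts).strip()
    # shrink every run of 3+ newlines to one blank line, like the while-loop
    return re.sub(r'\n{3,}', '\n\n', text)
```

Navigation and footer <div>s outside the article are skipped because depth is 0 when they open.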
Interface code
The interface uses tkinter, Python's built-in GUI library, with four kinds of widgets: Text, Listbox, Scrollbar and Button. Set their basic properties and positions, bind the commands, then debug until the program is finished!
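A Listbox and its Scrollbar are linked by pointing each at the other: the Listbox's yscrollcommand calls the Scrollbar's set, and the Scrollbar's command calls the Listbox's yview. A minimal standalone sketch of that wiring, plus a double-click handler that writes into a read-only Text pane; run_demo, list_label and the sample items are hypothetical, and tkinter is imported inside run_demo so the label helper works without a display:

```python
def list_label(idx: int, title: str) -> str:
    """Zero-padded listbox label in the original's style: (0, 'Foo') gives '001 Foo'."""
    return f'{idx + 1:03} {title}'


def run_demo(items):
    """Open a window listing (title, url) pairs; double-click shows one in the Text pane."""
    import tkinter as tk  # imported lazily so list_label() needs no display

    win = tk.Tk()
    win.geometry('480x400')

    scroll = tk.Scrollbar(win, orient=tk.VERTICAL)
    lst = tk.Listbox(win, yscrollcommand=scroll.set)  # listbox scrolls, scrollbar follows
    scroll.config(command=lst.yview)                  # scrollbar dragged, listbox follows
    for idx, (title, _url) in enumerate(items):
        lst.insert(tk.END, list_label(idx, title))

    text = tk.Text(win)
    text.config(state=tk.DISABLED)                    # read-only until we write into it

    def show(event):
        sel = lst.curselection()
        if not sel:
            return
        title, url = items[sel[0]]
        text.config(state=tk.NORMAL)                  # must be NORMAL to edit
        text.delete('1.0', tk.END)
        text.insert(tk.END, f'{title}\n\n{url}')
        text.config(state=tk.DISABLED)

    lst.bind('<Double-Button-1>', show)

    text.place(x=15, y=10, width=435, height=160)
    lst.place(x=15, y=180, width=435, height=200)
    scroll.place(x=450, y=180, height=200)
    win.mainloop()
```

Calling run_demo([('Story A', 'https://news.example.com/a'), ('Story B', 'https://news.example.com/b')]) opens the window.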
Full source code (the names of the websites involved have been masked):
from requests import get from bs4 import BeautifulSoup as bs from datetime import datetime as dt from os import path import tkinter as tk def Today(style=1): date = () if style!=1: return f'{}moon{}date' return f'{}-{:02}-{:02}' def SinaNews(style=1): url1 = 'http://news.****./' if style==1: url1 += 'world' elif style==2: url1 += 'china' else: url1='https://mil.****./' text = get(url1) ='uft-8' soup = bs(,'') aTags = soup.find_all("a") return [(,t['href']) for t in aTags if Today() in str(t)] def NewsList(i): global news news = SinaNews(i) (0,) for idx,item in enumerate(news): (,f'{idx+1:03} {item[0]}') (state=) (0.0,) (state=) NewsShow(0) def NewsList1(): NewsList(1) def NewsList2(): NewsList(2) def NewsList3(): NewsList(3) def NewsShow(idx): if idx!=0: idx = ()[0] title,url = news[idx][0],news[idx][1] html = get(url) ='uft-8' soup = bs(,'') text = ('div',id='article').get_text().strip() text = ('Click to go to topic:',' Related topics:') text = ('','\n') while '\n\n\n' in text: text = ('\n\n\n','\n\n') (state=) (0.0,) (, title+'\n\n'+text) (state=) def InitWindow(self,W,H): Y = self.winfo_screenheight() winPosition = str(W)+'x'+str(H)+'+8+'+str(Y-H-100) (winPosition) icoFile = '' f = (icoFile) if f: (icoFile) (False,False) self.wm_attributes('-topmost',True) (bTitle[0]) SetControl() () () def SetControl(): global tList,tText tScroll = (win, orient=) (x=450,y=320,height=300) tList = (win,selectmode=,yscrollcommand=) (command=) for idx,item in enumerate(news): (,f'{idx+1:03} {item[0]}') (x=15,y=320,width=435,height=300) tList.select_set(0) () bW,bH = 70,35 The width and height of the #button bX,bY = 95,270 # Coordinates of the button tBtn1 = (win,text=bTitle[1],command=NewsList1) (x=bX,y=bY,width=bW,height=bH) tBtn2=(win,text=bTitle[2],command=NewsList2) (x=bX+100,y=bY,width=bW,height=bH) tBtn3 = (win,text=bTitle[3],command=NewsList3) (x=bX+200,y=bY,width=bW,height=bH) tScroll2 = (win, orient=) (x=450,y=10,height=240) tText = (win,yscrollcommand=) (command=) 
(x=15,y=10,width=435,height=240) (state=,bg='azure',font=('Song Style', '14')) NewsShow(0) ("<Double-Button-1>",NewsShow) if __name__=='__main__': win = () bTitle = ('News of the Day','International News','Domestic news','Military News') news = SinaNews() InitWindow(win,480,640)
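InitWindow places the window with a Tk geometry string of the form 'WxH+X+Y': 8 px from the left edge of the screen and 100 px above its bottom. The arithmetic, pulled out into a hypothetical helper so it can be checked without opening a window:

```python
def bottom_left_geometry(width: int, height: int, screen_height: int,
                         left: int = 8, bottom_gap: int = 100) -> str:
    """Tk geometry 'WxH+X+Y' putting the window `left` px from the left edge
    and `bottom_gap` px above the bottom of a screen `screen_height` px tall."""
    y = screen_height - height - bottom_gap
    return f'{width}x{height}+{left}+{y}'
```

On a screen 1080 px high, the 480x640 window gets '480x640+8+340'. Inside InitWindow the helper would be used as self.geometry(bottom_left_geometry(W, H, self.winfo_screenheight())).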
I won't walk through the code in detail; leave a comment if you'd like to discuss it. On my setup (Windows 7 + Python 3.8.8) it runs without errors. The names of the websites in this article have been masked; if you can't guess the site, feel free to ask me privately.
Software compilation
Use PyInstaller to compile the program into a single executable file. Note that the source file should have the .pyw extension, otherwise a black cmd console window will appear. One more small tip: a website's logo icon file (its favicon) can usually be downloaded from the site's root directory, i.e.:
http(s)://(.cn)/
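Browsers request /favicon.ico at the root of a host by default, which is why the icon can usually be fetched from there. A small sketch that derives that URL from any article URL and downloads it with urllib; the function names and the default file name are hypothetical, and a site that declares its icon elsewhere in a <link rel="icon"> tag may return 404 here:

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlretrieve


def favicon_url(page_url: str) -> str:
    """Conventional favicon location: '/favicon.ico' at the root of the page's host."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, '/favicon.ico', '', ''))


def download_favicon(page_url: str, dest: str = 'favicon.ico') -> str:
    # Network call: raises urllib.error.HTTPError if the site has no icon at the root
    urlretrieve(favicon_url(page_url), dest)
    return dest
```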
The compilation command is as follows:
D:\>pyinstaller --onefile --nowindowed --icon="D:\"
When compilation finishes, the executable is generated in the dist folder; at about 15 MB, its size is acceptable.
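PyInstaller can also be driven from Python through PyInstaller.__main__.run, which takes the same arguments as the command line. A sketch assuming PyInstaller is installed, with hypothetical file names (news.pyw, favicon.ico). It passes --windowed (alias --noconsole), PyInstaller's switch for suppressing the console window; in PyInstaller's own help, the --nowindowed flag used in the command above is an alias for --console:

```python
from typing import List, Optional


def pyinstaller_args(script: str, icon: Optional[str] = None) -> List[str]:
    """Argument list for a single-file build without a console window."""
    args = [script, '--onefile', '--windowed']   # --windowed: no console on Windows/macOS
    if icon:
        args.append(f'--icon={icon}')
    return args


def build(script: str, icon: Optional[str] = None) -> None:
    # Imported lazily: requires `pip install pyinstaller`
    import PyInstaller.__main__
    PyInstaller.__main__.run(pyinstaller_args(script, icon))
```

For example, build('news.pyw', 'favicon.ico') would leave the executable in the dist folder, just like the command-line build.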
Just take it and use it.
That's all for this Python applet that crawls today's news and is ready to use. For more about Python applets, please see my other related articles!