SoFunction
Updated on 2024-12-19

A small Python program that crawls today's news, ready to take and use

Core code

Download the HTML page
Parse the HTML content

from requests import get
from bs4 import BeautifulSoup as bs
from datetime import datetime as dt
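def Today(style=1):
    date = dt.today()
    # style != 1: Chinese-style date such as 9月29日; otherwise YYYY-MM-DD
    if style!=1: return f'{date.month}月{date.day}日'
    return f'{date.year}-{date.month:02}-{date.day:02}'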
def SinaNews(style=1):
    url1 = 'http://news.***./'
    if style==1: url1 += 'world'
    elif style==2: url1 += 'china'
    else: url1='/'
    text = get(url1)
    text.encoding='utf-8'
    soup = bs(text.text,'html.parser')
    aTags = soup.find_all("a")
    # keep only the links whose tag contains today's date
    return [(t.text,t['href']) for t in aTags if Today() in str(t)]
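
The date stamp that Today() builds is what later picks out today's links, so its exact format matters. As a minimal sketch, the same zero-padded string can be produced with strftime. The name today_stamp and its optional argument are hypothetical, added only so the format can be checked against a fixed date.

```python
from datetime import date

def today_stamp(day=None):
    """Return a date as 'YYYY-MM-DD', zero-padded (hypothetical helper)."""
    # Fall back to the current date when no date is supplied.
    if day is None:
        day = date.today()
    return day.strftime('%Y-%m-%d')

print(today_stamp(date(2021, 9, 29)))  # prints 2021-09-29
```

Passing the date in as an argument keeps the function easy to check on any day.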

Crawl the titles

for i,news in enumerate(SinaNews(1)):
    print(f'No{i+1}:',news[0])

    
No1: foreign news media:*****
No2: Japanese news media:******
......

The output has been masked.

This was my first crawler, so to make it easy to get started I chose a news site whose pages need no cracking: the content is available directly from the downloaded page. Three of its pages (international, domestic, and military news) serve as the content sources. Download a page, parse the resulting HTML, and every <a href=...> tag that carries today's date is exactly what is needed.
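
The date filter described here can be tried offline. The sketch below is an illustration rather than the article's code: it swaps BeautifulSoup for the standard library's html.parser so it runs without third-party packages, it checks only the href attribute (the article checks the whole tag string), and the class name, function name, and example.com URLs are all made up.

```python
from html.parser import HTMLParser

class DatedLinkParser(HTMLParser):
    """Collect (title, href) pairs for <a> tags whose href contains a date stamp."""

    def __init__(self, stamp):
        super().__init__()
        self.stamp = stamp
        self.links = []      # finished (title, href) pairs
        self._href = None    # href of the matching <a> currently open
        self._parts = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href') or ''
            if self.stamp in href:
                self._href = href
                self._parts = []

    def handle_data(self, data):
        # Only record text while inside a matching <a>.
        if self._href is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((''.join(self._parts).strip(), self._href))
            self._href = None

def dated_links(html, stamp):
    """Return the (title, href) pairs whose link contains `stamp`."""
    parser = DatedLinkParser(stamp)
    parser.feed(html)
    return parser.links

# Made-up sample page: only the first link carries the wanted date.
sample = (
    '<a href="https://example.com/w/2021-09-29/doc-1.shtml">Headline one</a>'
    '<a href="https://example.com/about.html">About</a>'
    '<a href="https://example.com/c/2021-09-28/doc-2.shtml">Old story</a>'
)
print(dated_links(sample, '2021-09-29'))
```

Only the first link survives the filter, because the navigation link has no date and the third link carries a different one.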

Crawl the body text

Next, download the article page from each URL. Inspecting the page shows that the <div> with id='article' holds the body; .get_text() is the key function for extracting the text, followed by a little formatting:

>>> def NewsDownload(url):
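    html = get(url)
    html.encoding='utf-8'
    soup = bs(html.text,'html.parser')
    text = soup.find('div',id='article').get_text().strip()
    text = text.replace('Click to go to topic:',' Related topics:')
    # the first argument was lost in the source; two spaces are assumed here
    text = text.replace('  ','\n')
    while '\n\n\n' in text:
        text = text.replace('\n\n\n','\n\n')
    return text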
>>> url = 'https://******/w/2021-09-29/'
>>> NewsDownload(url)
'Original title: ******************************************************'
>>> 
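
The blank-line cleanup at the end of NewsDownload can be checked separately from the download. The following is a minimal sketch, assuming a regular expression is an acceptable substitute for the while loop; tidy_body is a hypothetical name, not part of the original program.

```python
import re

def tidy_body(raw):
    """Normalize whitespace in extracted article text (hypothetical helper)."""
    text = raw.strip()
    # Collapse any run of three or more newlines into a single blank line,
    # which is the same end state the while loop reaches.
    return re.sub(r'\n{3,}', '\n\n', text)

print(repr(tidy_body('  Title\n\n\n\nFirst paragraph.\n\n\nSecond.\n')))
# prints 'Title\n\nFirst paragraph.\n\nSecond.'
```

A single substitution handles a run of any length in one pass, whereas the loop shortens long runs one newline at a time.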

Interface code

The interface uses the built-in GUI library tkinter with four kinds of widgets: Text, Listbox, Scrollbar, and Button. Set their basic properties, place them, bind commands, and then debug until the program is complete.
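
A Listbox and its Scrollbar have to be wired in both directions, which is easy to get wrong. The sketch below shows that wiring on its own, under some assumptions: the function names and coordinates are illustrative, and tkinter is imported inside the builder so that the label helper can be used on a machine without a display.

```python
def format_entry(idx, title):
    """Label for a list row: zero-padded 1-based number, then the title."""
    return f'{idx + 1:03} {title}'

def build_list_panel(root, titles):
    """Create a Listbox with a vertical Scrollbar inside `root` (sketch)."""
    # Imported here so format_entry can be used without a display.
    import tkinter as tk

    scroll = tk.Scrollbar(root, orient=tk.VERTICAL)
    # The list reports its scroll position to the bar ...
    box = tk.Listbox(root, yscrollcommand=scroll.set)
    # ... and dragging the bar scrolls the list.
    scroll.config(command=box.yview)
    for idx, title in enumerate(titles):
        box.insert(tk.END, format_entry(idx, title))
    # Absolute placement with place(), the layout style used in this article.
    box.place(x=15, y=10, width=435, height=300)
    scroll.place(x=450, y=10, height=300)
    return box

# Usage (needs a desktop session):
#   import tkinter as tk
#   root = tk.Tk(); root.geometry('480x330')
#   build_list_panel(root, ['First headline', 'Second headline'])
#   root.mainloop()
```

The same two-way pattern applies to a Text widget paired with its own Scrollbar.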

Source code (the names of the websites involved have been masked):

from requests import get
from bs4 import BeautifulSoup as bs
from datetime import datetime as dt
from os import path
import tkinter as tk 
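def Today(style=1):
    date = dt.today()
    # style != 1: Chinese-style date such as 9月29日; otherwise YYYY-MM-DD
    if style!=1: return f'{date.month}月{date.day}日'
    return f'{date.year}-{date.month:02}-{date.day:02}'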
def SinaNews(style=1):
    url1 = 'http://news.****./'
    if style==1: url1 += 'world'
    elif style==2: url1 += 'china'
    else: url1='https://mil.****./'
    text = get(url1)
    text.encoding='utf-8'
    soup = bs(text.text,'html.parser')
    aTags = soup.find_all("a")
    # keep only the links whose tag contains today's date
    return [(t.text,t['href']) for t in aTags if Today() in str(t)]
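def NewsList(i):
    # reload the headline list for channel i and show its first article
    global news
    news = SinaNews(i)
    tList.delete(0,tk.END)
    for idx,item in enumerate(news):
        tList.insert(tk.END,f'{idx+1:03} {item[0]}')
    tText.config(state=tk.NORMAL)
    tText.delete(0.0,tk.END)
    tText.config(state=tk.DISABLED)
    NewsShow(0)
def NewsList1(): NewsList(1)
def NewsList2(): NewsList(2)
def NewsList3(): NewsList(3)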
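def NewsShow(idx):
    # idx is 0 for the first item, or a tkinter event from a double-click
    if idx!=0:
        idx = tList.curselection()[0]
    title,url = news[idx][0],news[idx][1]
    html = get(url)
    html.encoding='utf-8'
    soup = bs(html.text,'html.parser')
    text = soup.find('div',id='article').get_text().strip()
    text = text.replace('Click to go to topic:',' Related topics:')
    # the first argument was lost in the source; two spaces are assumed here
    text = text.replace('  ','\n')
    while '\n\n\n' in text:
        text = text.replace('\n\n\n','\n\n')
    # the Text widget is read-only, so enable it only while updating
    tText.config(state=tk.NORMAL)
    tText.delete(0.0,tk.END)
    tText.insert(tk.END, title+'\n\n'+text)
    tText.config(state=tk.DISABLED)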
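def InitWindow(self,W,H):
    Y = self.winfo_screenheight()
    # place the window near the bottom-left corner of the screen
    winPosition = str(W)+'x'+str(H)+'+8+'+str(Y-H-100)
    self.geometry(winPosition)
    icoFile = ''    # icon file name elided in the source
    f = path.exists(icoFile)
    if f: self.iconbitmap(icoFile)
    self.resizable(False,False)
    self.wm_attributes('-topmost',True)
    self.title(bTitle[0])
    SetControl()
    self.update()
    self.mainloop()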
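def SetControl():
    global tList,tText
    tScroll = tk.Scrollbar(win, orient=tk.VERTICAL)
    tScroll.place(x=450,y=320,height=300)
    # selectmode was lost in the source; BROWSE (the default) is assumed
    tList = tk.Listbox(win,selectmode=tk.BROWSE,yscrollcommand=tScroll.set)
    tScroll.config(command=tList.yview)
    for idx,item in enumerate(news):
        tList.insert(tk.END,f'{idx+1:03} {item[0]}')
    tList.place(x=15,y=320,width=435,height=300)
    tList.select_set(0)
    tList.focus()
    bW,bH = 70,35    # width and height of the buttons
    bX,bY = 95,270    # coordinates of the first button
    tBtn1 = tk.Button(win,text=bTitle[1],command=NewsList1)
    tBtn1.place(x=bX,y=bY,width=bW,height=bH)
    tBtn2 = tk.Button(win,text=bTitle[2],command=NewsList2)
    tBtn2.place(x=bX+100,y=bY,width=bW,height=bH)
    tBtn3 = tk.Button(win,text=bTitle[3],command=NewsList3)
    tBtn3.place(x=bX+200,y=bY,width=bW,height=bH)
    tScroll2 = tk.Scrollbar(win, orient=tk.VERTICAL)
    tScroll2.place(x=450,y=10,height=240)
    tText = tk.Text(win,yscrollcommand=tScroll2.set)
    tScroll2.config(command=tText.yview)
    tText.place(x=15,y=10,width=435,height=240)
    tText.config(state=tk.DISABLED,bg='azure',font=('SimSun', '14'))
    NewsShow(0)
    # double-clicking a headline loads that article
    tList.bind("<Double-Button-1>",NewsShow)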
if __name__=='__main__':
    win = tk.Tk()
    bTitle = ('News of the Day','International News','Domestic news','Military News')
    news = SinaNews()
    InitWindow(win,480,640)
 

I won't analyze the code in detail here; leave a comment if you'd like to discuss it. It runs without errors in my environment (Win7 + Python 3.8.8). The names of the websites involved in this article have been masked; if you can't guess them, you can ask me privately.

Software compilation

Use PyInstaller to compile the program into a single executable file. Note that the source file should have the .pyw extension; otherwise a black cmd window appears when it runs. One more small tip: a website's logo icon file can generally be downloaded from the site's root directory, i.e.:
http(s)://(.cn)/
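
To make the root-directory tip concrete, here is a minimal sketch that derives the icon address from any page URL. It assumes the site follows the common convention of serving its icon as favicon.ico at the root, which not every site does; favicon_url and the example.com address are illustrative only.

```python
from urllib.parse import urlsplit, urlunsplit

def favicon_url(page_url):
    """Build the conventional root-level icon URL for a page (assumption:
    the site serves /favicon.ico, which is common but not guaranteed)."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; drop the path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, '/favicon.ico', '', ''))

print(favicon_url('https://www.example.com/news/2021/index.html?id=1'))
# prints https://www.example.com/favicon.ico
```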

The compilation command is as follows:

D:\>pyinstaller --onefile --nowindowed --icon="D:\"

After compilation finishes, the executable file is generated in the dist folder; at about 15 MB, its size is acceptable.

It is ready to use as soon as you take it.
