SoFunction
Updated on 2024-10-29

How to parse HTTP request content with a Python socket

Parsing the contents of an HTTP request with socket

Approach

1. Parsing the header of an HTTP request

The HTTP request header ends with a blank line, "\r\n". Read the header line by line; when a line consists of nothing but "\r\n", the header has ended and whatever follows is the request body.
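
As a minimal sketch of this step, the helper below reads lines until it meets the blank line. The name read_header_lines is hypothetical, and io.BytesIO stands in for a file object that wraps the socket, so this is an illustration under those assumptions rather than the article's own code.

```python
import io


def read_header_lines(stream):
    """Return the request line and header lines (CRLF stripped) up to the blank line."""
    lines = []
    while True:
        raw = stream.readline()
        if raw in (b"\r\n", b""):  # blank line ends the header; b"" means EOF
            return lines
        lines.append(raw.decode("utf-8").rstrip("\r\n"))


request = b"POST /submit HTTP/1.1\r\nHost: example.com\r\nContent-Length: 5\r\n\r\nhello"
stream = io.BytesIO(request)
print(read_header_lines(stream))
# ['POST /submit HTTP/1.1', 'Host: example.com', 'Content-Length: 5']
print(stream.read())  # b'hello' (the body is still unread in the stream)
```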

2. The request header contains the Content-Length parameter.

If the request header contains Content-Length, the size of the request body is known in advance: take the value of Content-Length, then read exactly that many bytes after the header.
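
A short sketch of this case, assuming a hypothetical helper named read_sized_body and an io.BytesIO object in place of the socket file. Header names are compared case-insensitively, which HTTP allows.

```python
import io


def read_sized_body(stream):
    """Read the header, then exactly Content-Length bytes of body."""
    content_length = 0
    while True:
        line = stream.readline()
        if line in (b"\r\n", b""):  # end of header (or EOF)
            break
        name, _, value = line.decode("utf-8").partition(":")
        if name.strip().lower() == "content-length":
            content_length = int(value.strip())
    return stream.read(content_length)


request = b"POST /submit HTTP/1.1\r\nContent-Length: 5\r\n\r\nhelloEXTRA"
print(read_sized_body(io.BytesIO(request)))  # b'hello'
```

Only the five declared bytes are consumed; anything after them stays in the stream.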

3. The request header contains the Transfer-Encoding: chunked parameter.

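If the request header contains Transfer-Encoding: chunked, the size of the request body is not known in advance. The body arrives as a series of chunks: each chunk starts with a line giving its size in hexadecimal, followed by that many bytes of data and a "\r\n". The body ends with a zero-size chunk, "0\r\n", followed by "\r\n" (optional trailer fields may appear between the two), so without trailers the terminator is "0\r\n\r\n". Read a size line, then that many bytes, and repeat; once the size line is "0\r\n" and the next line is "\r\n", the content has been read completely.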
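
The chunked case can be sketched as follows. The helper name read_chunked_body is hypothetical, and the stream is assumed to be positioned just after the blank line that ends the header.

```python
import io


def read_chunked_body(stream):
    """Decode a chunked body from a binary stream and return the joined data."""
    body = b""
    while True:
        size_line = stream.readline().decode("ascii")
        # The size is hexadecimal; anything after ";" is a chunk extension
        size = int(size_line.split(";")[0].strip() or "0", 16)
        if size == 0:
            # Last chunk: skip optional trailer fields up to the blank line
            while stream.readline() not in (b"\r\n", b""):
                pass
            return body
        body += stream.read(size)
        stream.readline()  # CRLF that follows the chunk data


chunked = b"5\r\nhello\r\n7\r\n, world\r\n0\r\n\r\n"
print(read_chunked_body(io.BytesIO(chunked)))  # b'hello, world'
```

Reading by declared size rather than purely by line matters because chunk data may itself contain a line that reads "0\r\n".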

Code implementation

In the code, self._file stands for a binary file-like object that wraps the socket, such as the one returned by socket.makefile("rb").

def get_http_content(self):
    """Read the header from self._file, then store the body in self._content."""
    content_length = 0
    chunked = False

    # Read the request line and header fields until the blank line
    while True:
        req_line = str(self._file.readline(), "utf-8")
        if req_line == "":
            # The connection closed before the header ended
            self._content = False
            return None
        # Encounter http header terminator
        if req_line == "\r\n":
            break
        name, _, value = req_line.partition(":")
        name = name.strip().lower()  # header names are case-insensitive
        if name == "content-length":
            content_length = int(value.strip())
        elif name == "transfer-encoding" and "chunked" in value.lower():
            chunked = True

    # Read http content
    if chunked:
        body = b""
        while True:
            # Each chunk starts with its size in hexadecimal
            size_line = str(self._file.readline(), "utf-8")
            size = int(size_line.split(";")[0].strip() or "0", 16)
            if size == 0:
                # Last chunk: skip optional trailers up to the blank line
                while self._file.readline() not in (b"\r\n", b""):
                    pass
                break
            body += self._file.read(size)
            self._file.readline()  # "\r\n" that follows the chunk data
        self._content = str(body, "utf-8")
    else:
        # Body of known size; empty when Content-Length is absent or 0
        self._content = str(self._file.read(content_length), "utf-8")
    return None
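
To show where an object like self._file could come from, here is a small sketch that assumes it is created with socket.makefile("rb"). socket.socketpair() provides two connected sockets that stand in for the client and the server; the variable names are illustrative only.

```python
import socket

# Two connected sockets, standing in for a client and a server
server_sock, client_sock = socket.socketpair()
client_sock.sendall(b"POST / HTTP/1.1\r\nContent-Length: 2\r\n\r\nhi")
client_sock.close()

# A binary file object over the socket, offering readline() and read(n)
reader = server_sock.makefile("rb")
request_line = reader.readline()   # b'POST / HTTP/1.1\r\n'
header_line = reader.readline()    # b'Content-Length: 2\r\n'
blank_line = reader.readline()     # b'\r\n', the end of the header
body = reader.read(2)              # b'hi'
print(request_line, body)

reader.close()
server_sock.close()
```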

Simulating an HTTP request with socket

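# coding: utf-8
import socket
from urllib.parse import urlparse


def get_url(url):
    url = urlparse(url)
    host = url.netloc
    path = url.path
    if path == "":
        path = "/"
    # Establish a socket connection
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect((host, 80))
    request = "GET {} HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n".format(path, host)
    client.sendall(request.encode("utf-8"))
    # Receive until the server closes the connection
    data = b""
    while True:
        d = client.recv(1024)
        if d:
            data += d
        else:
            break
    data = data.decode("utf-8")
    # Split only at the first blank line so the body stays intact
    html_data = data.split("\r\n\r\n", 1)[1]
    print(html_data)
    client.close()

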
if __name__ == '__main__':
    get_url("")
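
As a complement to the client above, the following sketch separates a raw response into status line, header fields, and body while it is still bytes. The helper name split_response and the sample response are made up for illustration.

```python
def split_response(raw):
    """Split raw response bytes into (status_line, headers, body)."""
    # partition() cuts at the first blank line only, so the body may
    # contain "\r\n\r\n" without being truncated
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<p>a\r\n\r\nb</p>"
status_line, headers, body = split_response(raw)
print(status_line)  # HTTP/1.1 200 OK
print(headers)      # {'content-type': 'text/html'}
print(body)         # b'<p>a\r\n\r\nb</p>'
```

Keeping the body as bytes leaves the choice of decoding to the caller, which helps when the response is not UTF-8 text.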

The above is based on personal experience. I hope it serves as a useful reference, and I appreciate your continued support.