Parsing the contents of an HTTP request with a socket
Approach
1. Parsing the header of an HTTP request
The HTTP request header ends with a line that contains only "\r\n". Read the request header line by line; when a line equal to "\r\n" is read, the header has ended.
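The header-reading step can be sketched as follows. This is a minimal illustration, not the article's own implementation: read_header_lines is a hypothetical helper, and io.BytesIO stands in for the file object that wraps the socket.

```python
import io


def read_header_lines(stream):
    # Collect header lines from a binary stream until the blank CRLF line.
    lines = []
    while True:
        raw = stream.readline()
        # b"" means the stream ended before the terminator was seen
        if raw in (b"\r\n", b""):
            break
        lines.append(raw.decode("utf-8").rstrip("\r\n"))
    return lines


# Assumed sample request; io.BytesIO replaces the real socket file object
request = io.BytesIO(
    b"POST /submit HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
header_lines = read_header_lines(request)
print(header_lines)
```

After the call, the stream is positioned at the first byte of the content, so the body can be read next.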
2. The request header contains the Content-Length parameter.
If the request header contains a Content-Length field, the size of the request content is known in advance: read the value of Content-Length, then read exactly that many bytes of content.
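A small sketch of this case, assuming the header lines have already been collected as strings. read_fixed_body is a hypothetical helper name, and io.BytesIO again stands in for the socket file object.

```python
import io


def read_fixed_body(stream, header_lines):
    # Look for Content-Length (header names are case-insensitive)
    for line in header_lines:
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            # Read exactly the announced number of bytes
            return stream.read(int(value.strip()))
    return None  # no Content-Length field was present


# Assumed sample: 8 bytes of content followed by unrelated bytes
body_stream = io.BytesIO(b'{"a": 1}trailing bytes')
body = read_fixed_body(body_stream, ["Host: example.com", "Content-Length: 8"])
print(body)
```

Only the announced 8 bytes are consumed; whatever follows stays in the stream.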
3. The request header contains the Transfer-Encoding: chunked parameter.
If the request header contains Transfer-Encoding: chunked, the size of the request content is not known in advance. The content arrives in chunks, each preceded by a line giving its size in hexadecimal, and it ends with the terminator "0\r\n\r\n". Reading "0\r\n" where a chunk size is expected, followed by "\r\n", therefore means that the content has been read completely.
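The chunked case can be sketched like this. It is an illustration under assumptions: read_chunked_body is a hypothetical helper, io.BytesIO stands in for the socket file object, and malformed input (for example a non-hexadecimal size line) simply raises ValueError.

```python
import io


def read_chunked_body(stream):
    body = b""
    while True:
        size_line = stream.readline().decode("ascii")
        # The size is hexadecimal; anything after ";" is a chunk extension
        chunk_size = int(size_line.split(";")[0].strip(), 16)
        if chunk_size == 0:
            # Zero-size chunk: skip optional trailer lines up to the closing CRLF
            while stream.readline() not in (b"\r\n", b""):
                pass
            return body
        body += stream.read(chunk_size)
        stream.readline()  # consume the CRLF that follows the chunk data


# Assumed sample: two chunks of 5 and 7 bytes, then the terminator
chunked = io.BytesIO(b"5\r\nhello\r\n7\r\n, world\r\n0\r\n\r\n")
decoded = read_chunked_body(chunked)
print(decoded)
```

Reading each chunk by its announced size, rather than purely by line, keeps the size lines out of the result and also handles chunk data that itself contains line breaks.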
Code implementation
In the code, self._file stands for (), a file-like object opened in binary mode on the socket, so readline() and read() return bytes.
def get_http_content(self):
    content_length = 0
    transfer_encoding = False
    while True:
        req_line = str(self._file.readline(), "utf-8")
        # The connection closed before the header terminator arrived
        if req_line == "":
            self._content = False
            return None
        # Header terminator reached: read the HTTP content
        if req_line == "\r\n":
            if content_length != 0:
                content = self._file.read(content_length)
                self._content = str(content, "utf-8")
                return None
            if transfer_encoding:
                content = b""
                while True:
                    # Each chunk starts with a line holding its size in hexadecimal
                    size_line = str(self._file.readline(), "utf-8")
                    if size_line.strip() == "":
                        # The stream ended or was malformed before the final chunk
                        self._content = False
                        return None
                    chunk_size = int(size_line.split(";")[0].strip(), 16)
                    if chunk_size == 0:
                        # "0\r\n" read: skip any trailer lines up to the closing "\r\n"
                        while self._file.readline() not in (b"\r\n", b""):
                            pass
                        self._content = str(content, "utf-8")
                        return None
                    content += self._file.read(chunk_size)
                    self._file.readline()  # "\r\n" that follows the chunk data
            # No field describing the content size was found
            self._content = False
            return None
        # Still inside the header: look for a field describing the content size
        name, _, value = req_line.partition(":")
        name = name.strip().lower()  # header names are case-insensitive
        if name == "content-length":
            content_length = int(value.strip())
        elif name == "transfer-encoding" and "chunked" in value.lower():
            transfer_encoding = True
Simulating an HTTP request with a socket
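# coding: utf-8
import socket
from urllib.parse import urlparse


def get_url(url):
    url = urlparse(url)
    host = url.netloc
    path = url.path
    if path == "":
        path = "/"
    # Establish a socket connection
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect((host, 80))
    # sendall keeps sending until the whole request has been written
    client.sendall("GET {} HTTP/1.1\r\nHost:{}\r\nConnection:close\r\n\r\n".format(path, host).encode("utf-8"))
    # Connection:close makes the server close the socket after the response,
    # so recv() returning b"" marks the end of the data
    data = b""
    while True:
        d = client.recv(1024)
        if d:
            data += d
        else:
            break
    client.close()
    data = data.decode("utf-8")
    # Everything after the first blank line is the response content
    html_data = data.split("\r\n\r\n", 1)[1]
    print(html_data)


if __name__ == '__main__':
    get_url("")  # pass the URL of the page to fetch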
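The script above keeps only the text after the first blank line. As a rough sketch of how the rest of the raw response could be kept as well, the hypothetical helper split_response below separates the status line, the header fields, and the content. The sample bytes are made up for illustration and no network access is involved.

```python
def split_response(raw):
    # Split once at the first blank line so blank lines in the content survive
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Lower-case the names because header names are case-insensitive
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Assumed sample response; the content itself contains a blank line
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b"<html>a\r\n\r\nb</html>"
)
status_line, headers, body = split_response(sample)
print(status_line)
print(headers)
print(body)
```

Keeping the content as bytes until the Content-Type is known avoids decoding errors on pages that are not UTF-8.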