SoFunction
Updated on 2025-03-01

How to use goquery, a powerful web page parsing tool for Go

Preface

This article introduces goquery, a web page parsing tool for Go, and shows how to use it. It is shared here for reference and study.

Jsoup in Java and Cheerio in Node.js both make parsing web pages very convenient. Go has a comparable tool, goquery, which is quite useful. Its selector syntax is the same as jQuery's.

Install

go get github.com/PuerkitoBio/goquery

Usage

This is essentially the demo from the project itself.

package main

import (
	"fmt"
	"log"

	"github.com/PuerkitoBio/goquery"
)

func ExampleScrape() {
	// Load the page to scrape (the URL is blank in the source; fill in your own)
	doc, err := goquery.NewDocument("")
	if err != nil {
		log.Fatal(err)
	}

	// Find the review items
	doc.Find(".sidebar-reviews article .content-block").Each(func(i int, s *goquery.Selection) {
		// For each item found, get the band and title
		band := s.Find("a").Text()
		title := s.Find("i").Text()
		fmt.Printf("Review %d: %s - %s\n", i, band, title)
	})
}

func main() {
	ExampleScrape()
}

Garbled text problem

Chinese web pages can come out garbled, because goquery treats its input as UTF-8 by default. Pages served in another encoding, such as GB2312, need to be transcoded to UTF-8 first.
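
The cause can be seen with the standard library alone: the bytes of GB2312-encoded Chinese text are usually not valid UTF-8. The sketch below assumes nothing about goquery. The looksNonUTF8 helper is a hypothetical name, and the sample bytes are the GB2312 encoding of "中文".

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// looksNonUTF8 (hypothetical helper) reports whether body contains byte
// sequences that are not valid UTF-8, a hint that it needs transcoding.
func looksNonUTF8(body []byte) bool {
	return !utf8.Valid(body)
}

func main() {
	gb := []byte{0xD6, 0xD0, 0xCE, 0xC4} // "中文" encoded as GB2312
	utf := []byte("中文")                  // the same text encoded as UTF-8

	fmt.Println(looksNonUTF8(gb))  // true
	fmt.Println(looksNonUTF8(utf)) // false
}
```

This is only a hint, not a detector: a page that happens to contain only ASCII characters is valid UTF-8 whatever charset it declares.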

Install iconv-go

go get github.com/djimenez/iconv-go

How to use

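// Requires the imports "fmt", "net/http", "github.com/PuerkitoBio/goquery"
// and iconv "github.com/djimenez/iconv-go". baseUrl holds the page URL.
func ExampleScrape() {
	res, err := http.Get(baseUrl)
	if err != nil {
		fmt.Println(err.Error())
		return
	}
	defer res.Body.Close()

	// Convert the GB2312 body to UTF-8 before handing it to goquery
	utfBody, err := iconv.NewReader(res.Body, "gb2312", "utf-8")
	if err != nil {
		fmt.Println(err.Error())
		return
	}

	doc, err := goquery.NewDocumentFromReader(utfBody)
	if err != nil {
		fmt.Println(err.Error())
		return
	}

	// You can now use doc to extract structured data from the page, for example:
	doc.Find("li").Each(func(i int, s *goquery.Selection) {
		fmt.Println(i, s.Text())
	})
}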
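
The example above hard-codes "gb2312" as the source encoding. When the server declares the charset in its Content-Type header, it could be read from there instead. This is a minimal sketch using only the standard library, and charsetFromContentType is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// charsetFromContentType (hypothetical helper) returns the lowercased
// charset parameter of a Content-Type header value, or "" when no charset
// is declared or the value cannot be parsed.
func charsetFromContentType(contentType string) string {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return ""
	}
	return strings.ToLower(params["charset"])
}

func main() {
	fmt.Println(charsetFromContentType("text/html; charset=GB2312")) // gb2312
	fmt.Println(charsetFromContentType("text/html") == "")           // true
}
```

In the scraping function, the input would be res.Header.Get("Content-Type"). Many pages declare their charset only in a <meta> tag, which this sketch does not cover, so a fallback value is still needed.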

Advanced

Some websites check cookies, the Referer header, and similar information. You can set the request headers before sending the HTTP request.

This is not part of goquery. For details, see the net/http package in the Go standard library.

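baseUrl := ""
client := &http.Client{}
req, err := http.NewRequest("GET", baseUrl, nil)
if err != nil {
	log.Fatal(err)
}
req.Header.Add("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36")
req.Header.Add("Referer", baseUrl)
req.Header.Add("Cookie", "your cookie") // You can also set cookies through req.AddCookie()
res, err := client.Do(req)
if err != nil {
	log.Fatal(err)
}
defer res.Body.Close()
// In the end, you can pass res directly to goquery to parse the web page
doc, err := goquery.NewDocumentFromResponse(res)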
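
To confirm that the headers really reach the server, a local test server can echo them back. The sketch below uses only the standard library. The helper names fetchWithHeaders, newEchoServer and runDemo are made up, as are the User-Agent, Referer and cookie values.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
)

// fetchWithHeaders sends a GET request carrying a custom User-Agent,
// a Referer and one cookie, and returns the response body as a string.
func fetchWithHeaders(url, userAgent, referer string, cookie *http.Cookie) (string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("User-Agent", userAgent)
	req.Header.Set("Referer", referer)
	req.AddCookie(cookie) // alternative to setting the Cookie header by hand

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer res.Body.Close()

	body, err := io.ReadAll(res.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// newEchoServer starts a local server that writes back the headers it received.
func newEchoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "ua=%s referer=%s cookie=%s", r.UserAgent(), r.Referer(), r.Header.Get("Cookie"))
	}))
}

// runDemo wires the two helpers together with illustrative values.
func runDemo() (string, error) {
	srv := newEchoServer()
	defer srv.Close()
	return fetchWithHeaders(srv.URL, "my-scraper/1.0", "http://example.com/",
		&http.Cookie{Name: "session", Value: "abc123"})
}

func main() {
	out, err := runDemo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out) // ua=my-scraper/1.0 referer=http://example.com/ cookie=session=abc123
}
```

Against a real site, the response body would be handed to goquery instead of being read into a string.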

Summary

That is all for this article. I hope it is a useful reference for your study or work. If you have any questions, feel free to leave a comment.

References

  • https://github.com/PuerkitoBio/goquery
  • https://github.com/PuerkitoBio/goquery/issues/185
  • https://github.com/PuerkitoBio/goquery/wiki/Tips-and-tricks#handle-non-utf8-html-pages