Background
After ChatGPT took off, many Large Language Models (LLMs) were released; fully open-source ones include ChatGLM, BLOOM, LLaMA, etc. But the knowledge these models learn lags behind (e.g., ChatGPT's knowledge cutoff is 2021), and it covers only general domains.
In real-world application scenarios, aside from idle-chat bots, most bots are designed to accomplish specific tasks. For example, digital-human virtual anchors and a company's intelligent customer service must answer questions centered on a specific business. How to integrate business-specific knowledge into a large language model is therefore an important issue to consider when deploying Q&A bots in production.
I. Introduction to langchain-ChatGLM
langchain-ChatGLM is a local-knowledge-based Q&A bot: users are free to configure their own local knowledge, and answers to user questions are generated from that knowledge. GitHub link: GitHub - imClumsyPanda/langchain-ChatGLM: langchain-ChatGLM, local knowledge based ChatGLM with langchain | local knowledge-based ChatGLM question and answer.
II. Testing the Q&A effect with Taobao clothing as an example
Build local knowledge from a Taobao clothing listing's attributes to test the Q&A effect. The "Baby Details" and "Size Recommendations" sections of the Taobao item page (/?abbucket=6&id=656544342321&ns=1&spm=a230r.1.14.48.b3f84f64A9YLJ0) are organized into "local_knowledge_clothing_describe.txt", whose content is as follows:
Height: 160-170cm, weight: 90-115 jin, suggested size: M.
Height: 165-175cm, weight: 115-135 jin, suggested size: L.
Height: 170-178cm, weight: 130-150 jin, suggested size: XL.
Height: 175-182cm, weight: 145-165 jin, suggested size: 2XL.
Height: 178-185cm, weight: 160-180 jin, suggested size: 3XL.
Height: 180-190cm, weight: 180-210 jin, suggested size: 4XL.
Height: 180-195cm, weight: 210-230 jin, suggested size: 5XL.
Fabric category: polyester fiber. Pattern: geometric. Collar: hooded. Placket: zipper. Colors: iron 302 spring/autumn (slant pockets), dark gray 302 spring/autumn (slant pockets), blue 302 spring/autumn (slant pockets), beige 302 spring/autumn (slant pockets), iron 303 spring/autumn (workwear pockets), dark gray 303 spring/autumn (workwear pockets), blue 303 spring/autumn (workwear pockets), iron 303 (winter cotton jacket, thickened), dark gray 303 (winter cotton jacket, thickened), blue 303 (winter cotton jacket, thickened), iron 302 (winter cotton jacket, thickened), beige 302 (winter cotton jacket, thickened), iron 303 spring/autumn + iron 302 spring/autumn, dark gray 303 spring/autumn + iron 302 spring/autumn. Sleeve style: cuffed. Applicable seasons: spring/autumn. Sleeve length: long. Thickness: regular. Applicable scenarios: other casual. Garment length: regular. Fit: loose. Style detail: faux two-piece. Craft: protective layer. Applicable group: young adults. Fabric feature: windproof. Wearing style: outerwear. Material composition: 100% polyester fiber.
Load "local_knowledge_clothing_describe.txt" as local knowledge; the Q&A effect is shown below:
User: I am 173 cm tall and weigh 144 jin; what size should I choose?
Answer: If you are 173 cm tall and weigh 144 jin, the recommended size is XL. This size suits people 170-178 cm tall weighing 130-150 jin. Choosing size XL ensures the garment covers your body appropriately, being neither too small nor too large, for a comfortable fit.
User: What are the clothes made of?
Answer: The material composition of the garment is 100% polyester fiber.
Judging from these Q&A results, the basic requirements are met; more real-world tests are needed to assess the effect fully.
III. Principles of langchain-ChatGLM
To start with, the architecture of langchain-ChatGLM is shown below:
Fig. 1, Architecture of langchain-ChatGLM
3.1. Get the context related to the user query from local knowledge
A Faiss search retrieves the Top-K Documents whose embeddings are closest to the query embedding; the contents of these Documents are joined with line breaks to form the context for the query. An example is shown in the figure below:
Figure 2: Getting query-related context through Faiss search
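The retrieval step can be sketched as follows. This is a minimal illustration that uses brute-force NumPy nearest-neighbor search in place of a real Faiss index; the documents and embedding vectors are made-up examples, and in langchain-ChatGLM the embeddings would come from a text-vectorization model.

```python
import numpy as np

def top_k_documents(query_emb, doc_embs, documents, k=2):
    """Return the k documents whose embeddings are closest (L2 distance) to the query."""
    dists = np.linalg.norm(doc_embs - query_emb, axis=1)  # L2 distance per document
    idx = np.argsort(dists)[:k]                           # indices of the k nearest
    return [documents[i] for i in idx]

# Toy example: 4 documents with 3-dimensional embeddings (illustrative only).
documents = ["Size chart for heights 160-170cm", "Fabric: 100% polyester",
             "Shipping policy", "Suggested size XL for 170-178cm"]
doc_embs = np.array([[0.9, 0.1, 0.0], [0.0, 0.9, 0.1],
                     [0.1, 0.1, 0.9], [0.8, 0.2, 0.1]])
query_emb = np.array([0.85, 0.1, 0.05])  # embedding of e.g. "what size for 173cm?"

# Join the Top-K Document contents with line breaks to form the context.
context = "\n".join(top_k_documents(query_emb, doc_embs, documents, k=2))
print(context)
```

A real Faiss index (e.g. `IndexFlatL2`) performs the same Top-K search, just far more efficiently at scale.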
3.2. Use query and context to fill the template to get prompt
A sample template is shown below; it can be modified according to the Q&A effect:
prompt_template = """Based on the following known information, answer the user's question concisely and professionally. Do not fabricate content in the answer.

Known information:
{context}

Question:
{query}"""
Fill the context and query into the template to get the prompt, then feed it to ChatGLM-6B to generate the response.
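Filling the template is plain string formatting. The sketch below repeats the template so it runs standalone; the context and query are made-up examples, and the final model call is shown only as a comment since it requires the ChatGLM-6B weights.

```python
# Template repeated here so this snippet is self-contained.
prompt_template = """Based on the following known information, answer the user's question concisely and professionally. Do not fabricate content in the answer.

Known information:
{context}

Question:
{query}"""

context = "Height: 170-178cm, weight: 130-150 jin, suggested size: XL."
query = "I am 173cm tall and weigh 144 jin; what size should I choose?"
prompt = prompt_template.format(context=context, query=query)

# The filled prompt is then passed to ChatGLM-6B, e.g.:
# response, history = model.chat(tokenizer, prompt, history=[])
print(prompt)
```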
IV. langchain-ChatGLM in use
V. Q&A needs to be optimized
Personally, I think langchain-ChatGLM is a framework for Q&A over local knowledge; its actual Q&A effectiveness depends on the following two questions:
1. How to obtain context highly relevant to the query, i.e., recall as many query-related Documents as possible;
2. How to make the LLM generate a high-quality response given the query and context.
5.1. Recall as many query-related Documents as possible
When slicing local knowledge into Documents, we need to balance Document length, the quality of the Document embeddings, and the number of recalled Documents. Given that text-slicing algorithms are not very intelligent, local knowledge works best when it is well structured and the semantic association between paragraphs is weak. When Documents are shorter, the resulting Document embeddings tend to be of higher quality, and the Documents retrieved via Faiss will be more relevant to the query.
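A simple slicing strategy along these lines can be sketched as follows. This is a hand-rolled illustration (langchain ships its own text splitters); the `max_len` value and the sample text are assumptions for the example.

```python
def split_into_documents(text, max_len=200):
    """Split local knowledge into short Documents on blank lines,
    further chunking any paragraph longer than max_len characters."""
    docs = []
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Keep Documents short so each embedding stays focused on one topic.
        for start in range(0, len(para), max_len):
            docs.append(para[start:start + max_len])
    return docs

knowledge = "Height: 160-170cm, suggested size M.\n\nFabric: 100% polyester."
print(split_into_documents(knowledge, max_len=200))
```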
The prerequisite for searching with Faiss is a high-quality text-vectorization tool, so it is desirable to fine-tune the text-vectorization model on local knowledge. Alternatively, consider combining Elasticsearch (ES) search results with Faiss results.
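One simple way to combine the two result lists, sketched below, is reciprocal rank fusion; the document IDs and the constant `k=60` are illustrative assumptions, not part of langchain-ChatGLM.

```python
def reciprocal_rank_fusion(es_ranking, faiss_ranking, k=60, top_n=3):
    """Merge two ranked lists of document ids with reciprocal rank fusion.
    A document scores 1/(k + rank) in each list it appears in."""
    scores = {}
    for ranking in (es_ranking, faiss_ranking):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical result lists from an ES keyword search and a Faiss vector search.
es_hits = ["doc3", "doc1", "doc7"]
faiss_hits = ["doc1", "doc4", "doc3"]
print(reciprocal_rank_fusion(es_hits, faiss_hits))
```

Documents that rank well in both the keyword and the vector search float to the top, which helps when either retriever alone misses a relevant Document.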
5.2. Make the LLM generate a high-quality response from the query and context
Given query-related context, how to make the LLM generate high-quality responses is also a very important issue. There are two optimization points: (1) try multiple prompt templates and pick a suitable one, though this can be somewhat hit-or-miss; (2) fine-tune the LLM on a corpus related to local-knowledge queries.
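Trying multiple templates can be as simple as rendering the same context and query through each candidate and comparing the model's answers by hand. The templates below are hypothetical examples; the model call is only sketched as a comment since it needs the weights.

```python
# Candidate templates to A/B test; wording differences can noticeably change answers.
templates = [
    "Based on the known information:\n{context}\nAnswer concisely: {query}",
    "You are a customer-service assistant. Context:\n{context}\nQuestion: {query}",
]

def build_prompts(context, query):
    """Render every candidate template so each can be tried against the LLM."""
    return [t.format(context=context, query=query) for t in templates]

prompts = build_prompts("Suggested size XL for 170-178cm.", "What size for 173cm?")
# Each prompt would then be sent to the model and the answers compared manually:
# responses = [model.chat(tokenizer, p, history=[])[0] for p in prompts]
print(len(prompts))
```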
Addendum: ChatGLM models have high hardware requirements, needing more than 32 GB of RAM, while the half-precision ChatGLM2-6B model requires roughly 13 GB.
Check the dedicated GPU memory: ChatGLM2 needs 6 GB or more of GPU memory to meet the performance requirements.
The table below shows ChatGLM2-6B's minimum GPU-memory requirements by quantization level.
| Quantization level | Minimum GPU memory (inference) | Minimum GPU memory (efficient parameter fine-tuning) |
| --- | --- | --- |
| FP16 (no quantization) | 13 GB | 14 GB |
| INT8 | 8 GB | 9 GB |
| INT4 | 6 GB | 7 GB |