SoFunction
Updated on 2025-03-02

Regular expression tutorial: a detailed explanation of backreferences

This article describes backreferences in regular expressions. It is shared here for your reference, as follows:

In all examples, the text matched by the regular expression is enclosed between 【 and 】 in the source text. Some examples are implemented in Java; where something is specific to Java's own regular expression support, it is explained at that point. All Java examples were tested under JDK 1.6.0_13.

1. Problem introduction

Consider the problem of matching heading tags (H1-H6) in an HTML page:

text:

<body>
<h1>Welcome to my page</H1>
Content is divided into two sections:<br>
<h2>Introduction</h2>
Information about me.
<H2>Hobby</H2>
Information about my hobby.
<h2>This is invalid HTML</h3>
</body>

Regular expression: <[hH][1-6]>.*?</[hH][1-6]>

result:

<body>
【<h1>Welcome to my page</H1>】
Content is divided into two sections:<br>
【<h2>Introduction</h2>】
Information about me.
【<H2>Hobby</H2>】
Information about my hobby.
【<h2>This is invalid HTML</h3>】
</body>

Analysis: The pattern <[hH][1-6]> matches the start tag of a heading of any level, in either upper or lower case; in this example it matches <h1>, <h2>, and <H2>. Likewise, </[hH][1-6]> matches the end tags </H1>, </h2>, </H2>, and </h3>. The lazy quantifier *? is used to match the text between the tags; with a greedy .* the match would run from a start tag to the last end tag the dot can reach (the last one on the same line, or in the whole text if the dot is allowed to match line breaks). However, the results show that an invalid pair is matched as well: <h2> closed by </h3>, two tags that do not belong together. Nothing in the pattern ties the level in the end tag to the level in the start tag. Solving this problem requires a backreference.
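The behavior described above can be reproduced with a short Java sketch. The class name HeadingMatchNaive and the method findHeadings are hypothetical names chosen for illustration; only java.util.regex from the standard library is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class HeadingMatchNaive {
    // The pattern from the example: the level in the start tag and the
    // level in the end tag are matched independently of each other.
    static final Pattern NAIVE = Pattern.compile("<[hH][1-6]>.*?</[hH][1-6]>");

    // Collects every substring of html that the naive pattern matches.
    static List<String> findHeadings(String html) {
        List<String> found = new ArrayList<String>();
        Matcher m = NAIVE.matcher(html);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<body>\n"
                + "<h1>Welcome to my page</H1>\n"
                + "Content is divided into two sections:<br>\n"
                + "<h2>Introduction</h2>\n"
                + "Information about me.\n"
                + "<H2>Hobby</H2>\n"
                + "Information about my hobby.\n"
                + "<h2>This is invalid HTML</h3>\n"
                + "</body>";
        // Print each match wrapped in the same markers used in this article.
        for (String heading : findHeadings(html)) {
            System.out.println("【" + heading + "】");
        }
    }
}
```

Run against the sample text, this sketch reports four matches, including the mismatched <h2>...</h3> pair.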

2. Matching with backreferences

A backreference refers back to a subexpression defined earlier in the pattern, and matches the same text that the subexpression matched. How to define, group, and refer to subexpressions was introduced earlier in this tutorial. Now let's solve the previous example:

text:

<body>
<h1>Welcome to my page</H1>
Content is divided into two sections:<br>
<h2>Introduction</h2>
Information about me.
<H2>Hobby</H2>
Information about my hobby.
<h2>This is invalid HTML</h3>
</body>

Regular expression: <[hH]([1-6])>.*?</[hH]\1>

result:

<body>
【<h1>Welcome to my page</H1>】
Content is divided into two sections:<br>
【<h2>Introduction</h2>】
Information about me.
【<H2>Hobby</H2>】
Information about my hobby.
<h2>This is invalid HTML</h3>
</body>

Analysis: The start tag is matched by <[hH]([1-6])>, where the parentheses turn [1-6] into a subexpression. The end tag is matched by </[hH]\1>, where \1 refers to the first subexpression, that is, ([1-6]). \1 matches only the text that the subexpression actually matched: if ([1-6]) matched 1, then \1 matches only 1; if it matched 2, then \1 matches only 2. The last, invalid heading is therefore not matched. Note that only the digit is tied to the start tag; [hH] is still matched independently, so <h1>...</H1> is still accepted.
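As a Java-specific note, the backslash must itself be escaped inside a string literal, so the backreference \1 is written "\\1" in source code. The following is a minimal sketch of the backreference version; the class name HeadingMatchBackref and the method findHeadings are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class HeadingMatchBackref {
    // Group 1 captures the heading level; \1 (written "\\1" in a Java
    // string literal) requires the same digit in the end tag.
    static final Pattern PAIRED = Pattern.compile("<[hH]([1-6])>.*?</[hH]\\1>");

    // Collects every substring of html that forms a level-consistent heading.
    static List<String> findHeadings(String html) {
        List<String> found = new ArrayList<String>();
        Matcher m = PAIRED.matcher(html);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<body>\n"
                + "<h1>Welcome to my page</H1>\n"
                + "Content is divided into two sections:<br>\n"
                + "<h2>Introduction</h2>\n"
                + "Information about me.\n"
                + "<H2>Hobby</H2>\n"
                + "Information about my hobby.\n"
                + "<h2>This is invalid HTML</h3>\n"
                + "</body>";
        Matcher m = PAIRED.matcher(html);
        while (m.find()) {
            // m.group(1) is the digit that \1 had to repeat in the end tag.
            System.out.println("level " + m.group(1) + ": 【" + m.group() + "】");
        }
    }
}
```

Run against the sample text, this sketch reports three matches; the <h2>...</h3> line is skipped because no </h2> or </H2> follows on that line, and by default the dot does not match line breaks.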
