Regular Expression Learning Q&A

Let’s start with a simple example. Unix directory names such as usr and dev were handed down from the early days, and many people now criticize them: usr is not spelled user, dev is not spelled device, and both are hard to learn and remember. After years of rapid development, many of those old problems have been wrapped up nicely. Today's users are probably more used to clicking icons such as "user directory" and "drive", and no longer have to worry about those irregular short names. Unfortunately, the syntax of regular expressions has not changed much, and even the features added later follow the original syntactic style. Now that programming languages are becoming more and more human-friendly, regular expression syntax naturally looks hard to understand. Today's developers may be more comfortable with a spelled-out, function-call-like form built around ('a', 'z') than with [a-z]; and when they meet a construct like (?![a-z]), they are even more confused, unless it is rewritten in an equally spelled-out form that wraps ('a', 'z').

From another perspective, however, the two are really the same thing in different forms of expression: one reads like terse shorthand, the other like plain language. If we can build a mental translation from the shorthand to plain language, regular expressions become much simpler, and writing one can even be described as assembling modules. For example, suppose an Alipay transaction serial number is 18 or 26 digits long. A regular expression that matches it is ^([0-9]{18}|[0-9]{26})$, or equivalently ^[0-9]{18}([0-9]{8})?$. The logic is simple: ^ anchors the start, $ anchors the end, and [0-9] matches one digit character. ([0-9]{18}|[0-9]{26}) expresses two parallel alternatives, a digit string of length 18 or of length 26, while [0-9]{18}([0-9]{8})? expresses an 18-digit string optionally followed by 8 more digits (for a total of 26). Most everyday uses of regular expressions are that simple.
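The two patterns above can be tried out directly. The following is a minimal sketch in Python, which the original text does not use; the helper name is_valid and the sample strings are made up for illustration.

```python
import re

# Two equivalent ways of saying "exactly 18 or exactly 26 digits".
ALTERNATION = re.compile(r"^([0-9]{18}|[0-9]{26})$")
OPTIONAL_TAIL = re.compile(r"^[0-9]{18}([0-9]{8})?$")


def is_valid(number, pattern=ALTERNATION):
    """Return True if the pattern accepts the string (hypothetical helper)."""
    return pattern.match(number) is not None


# Made-up samples: 18 digits, 26 digits, 20 digits, 17 digits plus a letter.
samples = ["1" * 18, "2" * 26, "3" * 20, "4" * 17 + "x"]
for s in samples:
    print(len(s), is_valid(s, ALTERNATION), is_valid(s, OPTIONAL_TAIL))
```

Both patterns should accept the first two samples and reject the last two. One Python-specific caveat: $ also matches just before a trailing newline, so re.fullmatch may be the safer choice when the input could end with one.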

If you accept what I have said so far, then the only remaining problem in learning regular expressions is choosing the right method. When we learn a programming language, we stress that reading books is not enough; you have to write programs by hand. The best way is to type in the examples from the book yourself and run them; only then do you truly learn. But many people do not regard regular expressions as a programming language, so they study them only superficially, or are even content to copy ready-made expressions from the Internet. Hence one of the most common questions is "Is there a shortcut to learning them?" Unfortunately, the answer is no. Just as you cannot learn to program by copying other people's code, you cannot learn regular expressions by copying ready-made expressions and skimming a few documents. The good news is that truly learning regular expressions does not take very long.

In my experience, what you really need to do when learning regular expressions is to understand the common features in depth: character classes, alternation, match modes, and lookaround. Once you understand these, you can solve about 80% of regular expression problems. But understanding them takes dedicated study: what problem do character classes solve, and how are they used? What problem does alternation solve, and how is it used? Spend some time studying and thinking about each; once you understand them all, go on to study how expressions that solve complex problems are put together. If you can set aside 1-2 hours a day, you will see clear results within two weeks and can reach a respectable level in about a month. Moreover, when I learn a new programming language, I not only type in and run every example in the book, but also modify the sample code to see what happens, and then think about why. If you do the same when learning regular expressions, you will get twice the result with half the effort.
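As a starting point for that kind of hands-on experimenting, here is a small sketch with one example per feature. It uses Python's re module purely for illustration; the sample strings and variable names are invented, and other languages spell some of these features differently.

```python
import re

# Character class: match one character out of a set (any lowercase ASCII letter).
class_hits = re.findall(r"[a-z]", "aB3c")

# Alternation: match one of several whole alternatives.
alt_dog = re.fullmatch(r"cat|dog", "dog") is not None
alt_cow = re.fullmatch(r"cat|dog", "cow") is not None

# Match mode: a flag that changes how the same pattern is applied.
strict = re.fullmatch(r"[a-z]+", "Regex") is not None
relaxed = re.fullmatch(r"[a-z]+", "Regex", re.IGNORECASE) is not None

# Lookaround: check what follows without consuming it.
# (?![a-z]) succeeds only where the next character is NOT a lowercase letter.
lookahead_hits = re.findall(r"[0-9](?![a-z])", "1a 2B 3")

print(class_hits, alt_dog, alt_cow, strict, relaxed, lookahead_hits)
```

Changing one piece at a time (for example, turning (?![a-z]) into (?=[a-z])) and predicting the result before running it is a quick way to check your understanding.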

If you truly understand these common features and have a clear idea of their value and use, another problem solves itself: regular expressions differ between languages, so what should you do? Although the rules differ from language to language, the ideas behind them are the same; the differences lie only in notation, or in how a concept is implemented. A programming language's documentation usually will not explain in detail what a character class or alternation is, but it will tell you exactly how character classes and alternation are written in that language (if you don’t believe it, search those documents for "character class" or "alternation"). So if the concepts are clear in your mind, then even when you are unsure how to write the final expression, checking the documentation is enough. For example, the character class \s, which matches a whitespace character, must be written as \\s in a Java string: historically \s was not a legal escape sequence in a Java string literal (and since Java 15 it denotes a single space), so the backslash itself must be escaped with another backslash. In PHP you can write \s directly, because PHP leaves unrecognized escape sequences in a string intact. In some Unix tools it must be written [[:space:]], the POSIX bracket-expression counterpart of the Perl-style \s. This looks a little troublesome, but that is all there is to it, because we know that what we need here is "a character class that matches whitespace characters".
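The same point can be illustrated in one more language. The sketch below uses Python, which the text above does not discuss, so treat it as an added illustration rather than part of the original argument: the concept "character class matching whitespace" stays fixed, while the way it is typed varies.

```python
import re

# Three spellings of the same idea in Python source code.
RAW = r"\s"                   # raw string: the backslash is kept as-is
ESCAPED = "\\s"               # ordinary string: backslash escaped, as in a Java literal
EXPLICIT = "[ \t\n\r\f\v]"    # ASCII whitespace written out as a character class

# Made-up sample text containing a space, a tab, and a newline.
text = "a b\tc\nd"

same_string = RAW == ESCAPED
split_raw = re.split(RAW, text)
split_explicit = re.split(EXPLICIT, text)

print(same_string, split_raw, split_explicit)
```

Note that for Python str patterns \s also matches non-ASCII Unicode whitespace by default, so the written-out class is only the ASCII subset; that is exactly the kind of detail the language's documentation settles once you know which concept to look up.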

After all of the above, some people may still say: regular expressions are not my specialty, so there is no need to spend so much energy on them. Perhaps this view is the root of the habit of "not studying regular expressions seriously". Fortunately, the answer is easy to see, because the same holds for many other things. Take writing: we do not expect everyone to be a writer, yet anyone may need to write a few decent, serious pieces when the occasion arises. "I am not a writer" is no excuse for "being unable to write a serious piece when it is needed". To be able to write well when needed, you have to take the time to learn and practice writing. Learning regular expressions is just the same.