Chaos, marks: Q&A on learning regular expressions

Recently, I was fortunate to take part as a guest in special Q&A sessions on regular expressions on two websites, Open Source China and 51CTO. During those sessions I collected some common questions about learning regular expressions, and I am taking a little time here to answer them.

Regular expressions are hard to learn; there is no doubt about that. But I think the difficulty lies only in the syntax. Regular expressions have been around for a long time: the syntax was born in the 1970s. What was that era like? A simple example: Unix names such as usr and dev were handed down from that time. Many people now criticize them: usr is not "user" and dev is not "device", which makes them hard to learn and remember. After years of rapid development, many of the rough edges of that era have been wrapped up neatly. Today's users are probably more used to clicking icons such as "user directory" and "drive", and no longer need to worry about those irregular abbreviations. Unfortunately, the syntax of regular expressions has changed little, and even features added later have followed the old style. Now that programming languages are becoming more and more human-friendly, regex syntax naturally looks hard to understand. Today's developers may be more comfortable with a function-call style such as ('a', 'z') than with [a-z]; a construct like (?![a-z]) is even more baffling unless it is converted into something like (('a', 'z')).
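To make the contrast between the two notations concrete, here is a minimal Python sketch. The helper names char_range and not_followed_by are hypothetical, invented only for this illustration; they are not taken from the article or from any library. Each one simply spells out, in function-call form, a piece of regex syntax mentioned above.

```python
import re


def char_range(start, end):
    """Hypothetical helper: build a character class such as [a-z]."""
    return "[" + re.escape(start) + "-" + re.escape(end) + "]"


def not_followed_by(fragment):
    """Hypothetical helper: wrap a fragment in a negative lookahead (?!...)."""
    return "(?!" + fragment + ")"


lower = char_range("a", "z")             # the same text as [a-z]
no_lower_next = not_followed_by(lower)   # the same text as (?![a-z])

# Match "1" only where the next character is not a lowercase letter.
pattern = re.compile("1" + no_lower_next)
print(lower, no_lower_next)
print(pattern.findall("1a 1B 1"))
```

The helpers do nothing more than string concatenation, which is the point: the terse syntax and the verbose one describe the same structure.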

However, from another perspective, the two are really the same thing in different forms: one is like terse shorthand, the other like plain language. If we can build a translation from the shorthand to plain language in our minds, regular expressions become much simpler; writing one is almost like snapping modules together. For example, an Alipay transaction serial number is 18 or 26 digits long, and the regular expression that matches it is ^([0-9]{18}|[0-9]{26})$ or ^[0-9]{18}([0-9]{8})?$ . The logic is simple: ^ anchors the start, $ anchors the end, and [0-9] matches one digit character. ([0-9]{18}|[0-9]{26}) expresses two parallel alternatives, a digit string of length 18 or of length 26, while [0-9]{18}([0-9]{8})? says that an 18-digit string must appear, optionally followed by an 8-digit string (for a total length of 26). Everyday use of regular expressions is as simple as that.
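The two serial-number patterns can be tried out directly. The following is a small Python sketch, not an official Alipay validator; the function name is_valid_serial and the sample strings are assumptions made for the example.

```python
import re

# The two equivalent patterns from the text.
ALTERNATION = re.compile(r"^([0-9]{18}|[0-9]{26})$")
OPTIONAL_TAIL = re.compile(r"^[0-9]{18}([0-9]{8})?$")


def is_valid_serial(text):
    """Return True if text consists of exactly 18 or 26 ASCII digits."""
    # fullmatch is used because Python's $ also matches just before
    # a trailing newline, which a strict check should reject.
    return ALTERNATION.fullmatch(text) is not None


samples = ["1" * 18, "1" * 26, "1" * 20, "1" * 17 + "x"]
for s in samples:
    # Both patterns should agree on every sample.
    assert bool(ALTERNATION.match(s)) == bool(OPTIONAL_TAIL.match(s))
    print(len(s), is_valid_serial(s))
```

Changing the sample strings and watching which ones pass is a quick way to confirm that each piece of the pattern does what the plain-language reading says.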

If you agree with the above, then the only remaining problem in learning regular expressions is choosing the right approach. When we learn a programming language, we stress that reading books is not enough: you have to write programs by hand. The best way is to type in the examples from the book yourself and run them; only then do you truly learn. But in many people's eyes regular expressions do not count as a programming language, so they learn them only superficially, or even settle for copying ready-made expressions from the Internet. So one of the most common questions is "Is there a shortcut to learning this?" Unfortunately, the answer is no: just as you cannot learn to program by copying other people's code, you cannot learn regular expressions by copying ready-made expressions and skimming a few documents. The good news is that truly learning regular expressions does not take very long.

In my experience, what you really need to do when learning regular expressions is to understand the common features deeply: character classes, alternation, match modes, and lookaround. Once you understand these, about 80% of regex problems can be solved. But understanding them takes deliberate study: what problem does a character class solve, and how is it used? What problem does alternation solve, and how is it used? Spend some time studying and thinking about each one; once you understand them all, go on to study how expressions that solve complex problems are put together. If you can set aside 1-2 hours a day, you will see clear results within two weeks and can reach a respectable level within a month. Moreover, when I learn a new programming language, I not only type in and run all the examples in the book myself, I also modify the sample code to see what happens and then think about why. If you do the same when learning regular expressions, you will get twice the result with half the effort.
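As a starting point for that kind of hands-on experimenting, here is a short Python sketch with one tiny example for each of the four features. The sample strings and variable names are made up for illustration; the intent is that you change them and observe what happens.

```python
import re

# Character class: match one character out of a set.
vowels = re.findall(r"[aeiou]", "regex")

# Alternation: match one of several whole alternatives.
animals = re.findall(r"cat|dog", "cat, dog, cow")

# Match mode: here, case-insensitive matching.
javas = re.findall(r"java", "Java and JAVA", re.IGNORECASE)

# Lookaround: require context without making it part of the match.
# Digits followed by "px"; the "px" itself is not returned.
pixels = re.findall(r"[0-9]+(?=px)", "12px 30em 7px")

print(vowels, animals, javas, pixels)
```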

If you truly understand these common features and have a clear idea of what they are for, another problem solves itself: regular expressions differ from language to language, so how do you cope? Although the regex rules of different languages differ, the ideas behind them are the same; the differences lie only in notation, or in how a concept is implemented. A language's documentation will not explain in detail what a character class is or what alternation is, but it will tell you precisely how each is written in that language (if you don't believe it, search these documents for "character class" or "alternation"). So if your thinking is clear enough, then even when you are not sure how to write the final expression, checking the documentation is all it takes. For example, the character class that matches whitespace, \s , must be written \\s in a Java string, because in Java versions before 15 \s is not a legal escape sequence in string literals (and from Java 15 on it denotes a plain space), so one \ is needed to escape the other \ . In PHP you can write \s directly, because PHP leaves unrecognized escape sequences intact when processing strings. In some Unix tools it must be written [[:space:]] , which is the POSIX notation for the Perl-style \s . It looks like a bit of a hassle, but it is nothing more than that, because we know that what we need here is "a character class that matches whitespace".
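The same "one concept, several notations" idea can be seen inside a single language. The sketch below uses Python rather than Java or PHP, so it is only an analogy to the cases described above. Python's re module does not, as far as I know, accept the POSIX [[:space:]] form, so a spelled-out class stands in for it here.

```python
import re

# In an ordinary string literal the backslash is doubled to get a literal
# backslash, much like the Java case; a raw string avoids the doubling.
escaped = "\\s"
raw = r"\s"
print(escaped == raw, len(raw))

# A spelled-out ASCII whitespace class. With the re.ASCII flag, \s matches
# exactly these six characters, so the two notations are interchangeable.
explicit = "[ \\t\\n\\r\\f\\v]"

text = "a b\tc\nd"
by_shorthand = re.split(raw, text, flags=re.ASCII)
by_explicit = re.split(explicit, text)
print(by_shorthand, by_explicit)
```

Both calls split on the same whitespace characters; only the spelling of "a character class that matches whitespace" differs.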

This argument convinces some people, but others remain unconvinced. Meanwhile, from what I have observed, the unconvinced do not seem to spend much more energy on their other "main business"; instead, they are troubled by regular expressions again and again. By contrast, a programmer with real professionalism, as The Productive Programmer puts it, is willing to spend 2 hours writing a regular expression that saves endless time later. Of course, all of the above presupposes the right attitude toward learning regular expressions: treating them as a valuable skill worth learning. Anyone who builds software has probably read Brooks' famous essay "No Silver Bullet", so let me borrow his words: there is no silver bullet for learning regular expressions either.

This is an original article by Yurii. Please credit the source when reprinting: Chaos, marks