I. Getting to Know Regular Expressions
A regular expression is a special sequence of characters that describes a pattern. With it we can check whether a string matches the pattern, search text quickly, and replace text.
JSON (like XML) is a lightweight data exchange format for the web.
import re

a = 'C|C++|Java|C#||Python|Javascript'
r = re.findall('Python', a)  # list of every non-overlapping match
print(r)
if len(r) > 0:
    print('String contains Python')
else:
    print('No')
Results:
['Python']
String contains Python
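For a fixed word like this, Python's built-in `in` operator gives the same answer, so a regular expression is not strictly needed. The small sketch below compares the two; the variable names `found_with_in` and `found_with_re` are only illustrative.

```python
import re

a = 'C|C++|Java|C#||Python|Javascript'

# Built-in substring test: enough when the text to find is a fixed word.
found_with_in = 'Python' in a

# The same check with the re module; findall returns a list of matches.
found_with_re = len(re.findall('Python', a)) > 0

print(found_with_in, found_with_re)  # True True
```

Regular expressions become worthwhile once the thing to find is a pattern rather than one fixed string.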
II. Metacharacters and ordinary characters
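import re

a = 'C0C++7Java8C#9Python6Javascript'

# With a regular expression: \d matches any single digit
r = re.findall(r'\d', a)
print(r)

# Without a regular expression: test each character by hand
b = ''
for x in a:
    try:
        int(x)
        b += x + ','
    except ValueError:
        pass
print(b)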
Results:
['0', '7', '8', '9', '6']
0,7,8,9,6,
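'Python' is made of ordinary characters, which match themselves; '\d' is a metacharacter, which stands for a class of characters (here, any digit).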
III. Character sets
import re

# Find the words whose middle character is not c or f
s = 'abc, acc, adc, aec, afc, ahc'
r = re.findall('a[^cf]c', s)  # [^cf] excludes c and f; compare [cf] and the range [a-z]
print(r)
Results:
['abc', 'adc', 'aec', 'ahc']
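The comment in the example mentions `[cf]` and ranges such as `[a-z]` without running them. Here is a small sketch on the same string that shows a plain set, a range, and a negated range side by side; the variable names are made up for illustration.

```python
import re

s = 'abc, acc, adc, aec, afc, ahc'

# [cf]: the middle character must be c or f
only_c_or_f = re.findall('a[cf]c', s)

# [c-f]: the middle character must be in the range c to f
c_through_f = re.findall('a[c-f]c', s)

# [^c-f]: the middle character must NOT be in the range c to f
not_c_through_f = re.findall('a[^c-f]c', s)

print(only_c_or_f)      # ['acc', 'afc']
print(c_through_f)      # ['acc', 'adc', 'aec', 'afc']
print(not_c_through_f)  # ['abc', 'ahc']
```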
IV. Generalized character sets
import re

# \d digits                  \D non-digits
# \w word characters, [a-zA-Z0-9_] for ASCII text   \W non-word characters
# \s whitespace characters   \S non-whitespace characters
a = 'python 11\t11java&678p\nh\rp'
r = re.findall(r'\s', a)
print(r)
Results:
[' ', '\t', '\n', '\r']
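Only `\s` is run above, so here is a short sketch that tries `\w`, `\W` and `\d` on the same string. Note that `\D` and `\W` are simply the opposites of `\d` and `\w`. The variable names are illustrative.

```python
import re

a = 'python 11\t11java&678p\nh\rp'

word_chars = re.findall(r'\w', a)      # letters, digits and underscore
non_word_chars = re.findall(r'\W', a)  # everything that \w does not match
digits = re.findall(r'\d', a)          # digits only

print(''.join(word_chars))  # python1111java678php
print(non_word_chars)       # [' ', '\t', '&', '\n', '\r']
print(digits)               # ['1', '1', '1', '1', '6', '7', '8']
```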
V. Quantifiers
import re

a = 'python 1111java&678php'
r = re.findall('[a-z]{3,6}', a)  # {3,6}: repeat the preceding set 3 to 6 times
print(r)
Results:
['python', 'java', 'php']
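Besides the `{min,max}` form used above, a quantifier can also give an exact count or only a lower bound. A minimal sketch on the same string, with illustrative variable names:

```python
import re

a = 'python 1111java&678php'

# {3}: exactly three digits in a row
exactly_three_digits = re.findall(r'\d{3}', a)

# {2,}: two or more digits in a row, no upper limit
two_or_more_digits = re.findall(r'\d{2,}', a)

print(exactly_three_digits)  # ['111', '678']
print(two_or_more_digits)    # ['1111', '678']
```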
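VI. Greedy and non-greedy matching
import re

a = 'python 1111java&678php'
# Quantifiers are greedy by default and match as much as they can.
# A trailing ? makes them non-greedy, so {3,6}? stops after 3 characters.
r = re.findall('[a-z]{3,6}?', a)
print(r)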
Results:
['pyt', 'hon', 'jav', 'php']
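A common place where the difference matters is pulling tags out of markup. The sketch below uses a made-up `html` string, which is an assumption for illustration and not part of the original example.

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Greedy: .* runs to the last > in the string, so there is one long match.
greedy = re.findall(r'<.*>', html)

# Non-greedy: .*? stops at the first > it can, so each tag is matched alone.
lazy = re.findall(r'<.*?>', html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```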
VII. Matching zero times, once, or an unlimited number of times
import re

# * match the preceding character 0 or more times
# + match the preceding character 1 or more times
# ? match the preceding character 0 or 1 times
a = 'pytho0python1pythonn2pythonw'
r = re.findall('python*', a)
print(r)
Results:
['pytho', 'python', 'pythonn', 'python']
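The example above only runs `*`. For comparison, here is the same string with `+` and `?` applied to the final `n`; the variable names are illustrative.

```python
import re

a = 'pytho0python1pythonn2pythonw'

# n+ needs at least one n, so 'pytho' no longer matches.
one_or_more = re.findall('python+', a)

# n? takes at most one n, so 'pythonn' is cut down to 'python'.
zero_or_one = re.findall('python?', a)

print(one_or_more)  # ['python', 'pythonn', 'python']
print(zero_or_one)  # ['pytho', 'python', 'python', 'python']
```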
VIII. Boundary Matchers
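import re

# A valid number here has 4 to 8 digits
# ^ anchors the start of the string and $ anchors the end: ^pattern$
qq = '12345678'
r = re.findall(r'^\d{4,8}$', qq)
print(r)

a = '123456789'  # 9 digits, so the whole string does not match
e = re.findall(r'^\d{4,8}$', a)
print(e)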
Results:
['12345678']
[]
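To see why the anchors are needed, it helps to run the same pattern without them. In this sketch the 9-digit string still produces a match, because the pattern is free to match just part of it.

```python
import re

a = '123456789'  # 9 digits, one too many

# Without anchors the pattern happily matches the first 8 digits.
without_anchors = re.findall(r'\d{4,8}', a)

# With ^ and $ the whole string has to fit the pattern.
with_anchors = re.findall(r'^\d{4,8}$', a)

print(without_anchors)  # ['12345678']
print(with_anchors)     # []
```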
IX. Groups
import re

# () groups characters so that a quantifier applies to the whole group
a = 'pythonpythonpythonpythonpython'
r = re.findall('(python){3}', a)
print(r)
Results:
['python']
The single entry means that one run of three consecutive 'python' (pythonpythonpython) was found. findall returns the text captured by the group, not the whole match.
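If you want `findall` to return the whole matched run instead of the group, one option is a non-capturing group, written `(?:...)`. This is a small sketch of that alternative, not something the original example uses.

```python
import re

a = 'pythonpythonpythonpythonpython'

# Capturing group: findall returns what the group captured last.
captured = re.findall('(python){3}', a)

# Non-capturing group: findall returns the whole match.
whole_match = re.findall('(?:python){3}', a)

print(captured)     # ['python']
print(whole_match)  # ['pythonpythonpython']
```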
X. Matching mode parameters
import re

# re.I ignores case; re.S lets . match every character, including \n
# Combine several flags with |
language = 'PythonC#\nJavaPHP'
r = re.findall('c#.{1}', language, re.I | re.S)
print(r)
Results:
['C#\n']
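To see what each flag contributes, the sketch below runs the same pattern with no flags, with `re.I` only, and with both flags. Only the last call finds anything, because the character after `C#` is a newline.

```python
import re

language = 'PythonC#\nJavaPHP'

# No flags: lowercase c does not match uppercase C.
no_flags = re.findall('c#.{1}', language)

# re.I only: C# matches, but . still refuses to match the newline.
ignore_case_only = re.findall('c#.{1}', language, re.I)

# re.I | re.S: case is ignored and . also matches the newline.
both_flags = re.findall('c#.{1}', language, re.I | re.S)

print(no_flags)          # []
print(ignore_case_only)  # []
print(both_flags)        # ['C#\n']
```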
XI. Regular substitution
Search and replace
import re

def convert(value):
    # value is a match object, e.g.
    # <_sre.SRE_Match object; span=(6, 8), match='C#'>
    matched = value.group()
    return '!!' + matched + '!!'

language = 'PythonC#JavaC#PHPC#'
# r = re.sub('C#', 'GO', language, 1)  returns PythonGOJavaC#PHPC#
# s = language.replace('C#', 'GO')
r = re.sub('C#', convert, language)  # pass a function as the replacement
print(r)
Results:
Python!!C#!!Java!!C#!!PHP!!C#!!
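The two commented-out lines in the example hint at simpler forms of replacement. Here they are as a runnable sketch: replacing every match, replacing only the first match with `count`, and the plain string method `str.replace`.

```python
import re

language = 'PythonC#JavaC#PHPC#'

# Replace every match (the default).
replace_all = re.sub('C#', 'GO', language)

# Replace only the first match.
replace_first = re.sub('C#', 'GO', language, count=1)

# For a fixed string, str.replace does the same job without re.
with_str_replace = language.replace('C#', 'GO')

print(replace_all)       # PythonGOJavaGOPHPGO
print(replace_first)     # PythonGOJavaC#PHPC#
print(with_str_replace)  # PythonGOJavaGOPHPGO
```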
XII. Passing functions as parameters
import re

def convert(value):
    matched = value.group()  # get the matched text from the match object
    if int(matched) >= 6:
        return '9'
    else:
        return '0'

language = 'A8C3721D86'
r = re.sub(r'\d', convert, language)
print(r)  # A9C0900D99
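The replacement function does not have to work one digit at a time. As a sketch of a variation, assuming we want to treat each run of digits as a whole number and double it, a `lambda` with the pattern `\d+` is enough:

```python
import re

language = 'A8C3721D86'

# \d+ matches a whole run of digits; the lambda receives each match object
# and must return the replacement as a string.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), language)

print(doubled)  # A16C7442D172
```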
XIII. The search and match functions
import re

s = 'A8C3721D86'

# match: tries to match only at the beginning of the string and
# returns None if that fails. It matches at most once.
r = re.match(r'\d', s)
print(r)  # None

# search: scans the string and returns as soon as the first match
# is found. It matches at most once.
r1 = re.search(r'\d', s)
print(r1)          # <_sre.SRE_Match object; span=(1, 2), match='8'>
print(r1.group())  # 8
print(r1.span())   # (1, 2)

# findall: returns every match
r2 = re.findall(r'\d', s)
print(r2)  # ['8', '3', '7', '2', '1', '8', '6']
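Because `match` and `search` return `None` when nothing is found, calling `.group()` on the result without a check raises an `AttributeError`. A minimal sketch of a safe wrapper follows; `first_digit` is a hypothetical helper name.

```python
import re

def first_digit(text):
    """Return the first digit in text, or None when there is no digit."""
    m = re.search(r'\d', text)
    # Only call group() when search actually found something.
    return m.group() if m else None

print(first_digit('A8C3721D86'))  # 8
print(first_digit('no digits'))   # None
```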
XIV. Grouping
import re

# Extract the text between life and python
s = 'life is short,i use python'

r = re.search('life.*python', s)
print(r.group())   # life is short,i use python

# group(n) takes a group index. group(0) is a special case:
# it returns the match for the whole regular expression.
r = re.search('life(.*)python', s)
print(r.group(0))  # life is short,i use python
print(r.group(1))  #  is short,i use

r = re.findall('life(.*)python', s)
print(r)           # [' is short,i use ']

s = 'life is short,i use python, i love python'
r = re.search('life(.*)python(.*)python', s)
print(r.group(0))        # life is short,i use python, i love python
print(r.group(1))        #  is short,i use
print(r.group(2))        # , i love
print(r.group(0, 1, 2))  # ('life is short,i use python, i love python', ' is short,i use ', ', i love ')
print(r.groups())        # (' is short,i use ', ', i love ')
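When a pattern has several groups, index numbers get hard to follow. Python also allows named groups with `(?P<name>...)`. Below is a sketch of the last example rewritten that way; the group names `first` and `second` are made up for illustration.

```python
import re

s = 'life is short,i use python, i love python'

# Same pattern as above, but each group is given a name.
m = re.search(r'life(?P<first>.*)python(?P<second>.*)python', s)

print(m.group('first'))   # ' is short,i use '
print(m.group('second'))  # ', i love '
print(m.groupdict())      # both groups as a dict keyed by name
```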
XV. Some suggestions for learning regular expressions
# \d digits   \D non-digits
# \w word characters = [a-zA-Z0-9_]   \W non-word characters
# \s whitespace characters   \S non-whitespace characters
# .  matches any character except the newline \n
# *  match 0 or more times
# +  match 1 or more times
# ?  match 0 or 1 times
# () group
# re.I | re.S  ignore case | let . match all characters
Typical uses in Python: web crawlers and data processing.
XVI. Understanding JSON
JSON is a lightweight data exchange format.
A string is how JSON is represented.
A string that conforms to the JSON format is called a JSON string.
{"name":"qiyue"}
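One quick way to check whether a string conforms to the JSON format is to try parsing it. The sketch below does that with the standard `json` module; `is_json_string` is a hypothetical helper name and it assumes the input is a Python `str`.

```python
import json

def is_json_string(text):
    """Return True when text can be parsed as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

print(is_json_string('{"name":"qiyue"}'))  # True: JSON strings use double quotes
print(is_json_string("{'name':'qiyue'}"))  # False: single quotes are not valid JSON
```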
JSON vs. XML
Advantages of JSON:
Exchanges data across languages
Easy to read
Easy to parse
Efficient to transmit over the network
XVII. Deserialization
import json

# A JSON object
json_str = '{"name":"qiyue","age":18}'
# Deserialization: loads() converts JSON data types into the
# data types of our own language
s = json.loads(json_str)
print(type(s))    # <class 'dict'>
print(s)          # {'name': 'qiyue', 'age': 18}
print(s['name'])  # qiyue

# A JSON array
json_str = '[{"name":"qiyue","age":18},{"name":"qiyue","age":18}]'
s = json.loads(json_str)
print(type(s))    # <class 'list'>
print(s)          # [{'name': 'qiyue', 'age': 18}, {'name': 'qiyue', 'age': 18}]

JSON      Python
object    dict
array     list
string    str
number    int
number    float
true      True
false     False
null      None
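The type mapping can be checked directly by loading one JSON string that contains each kind of value. The keys in `json_str` below are made up for illustration.

```python
import json

json_str = '{"text": "hi", "count": 3, "ratio": 1.5, "ok": true, "missing": null, "items": []}'
data = json.loads(json_str)

# Print each key next to the Python type its value was converted to.
for key, value in data.items():
    print(key, type(value).__name__)
```

Whole numbers come back as `int` and numbers with a decimal point as `float`, which is why `number` appears twice in the mapping.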
XVIII. Serialization
import json

# Serialize a Python object to a JSON string
student = [
    {"name": "qiyue", "age": 18, 'flag': False},
    {"name": "python", "age": 18}
]
json_str = json.dumps(student)
print(type(json_str))  # <class 'str'>
print(json_str)        # [{"name": "qiyue", "age": 18, "flag": false}, {"name": "python", "age": 18}]
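Serialization and deserialization are opposites, so a round trip should give back the original data. The sketch below checks that, and also shows the `indent` and `sort_keys` options of `json.dumps`, which are handy when a person has to read the output.

```python
import json

student = [
    {"name": "qiyue", "age": 18, "flag": False},
    {"name": "python", "age": 18},
]

# Round trip: Python object -> JSON string -> Python object
json_str = json.dumps(student)
restored = json.loads(json_str)
print(restored == student)  # True

# Human-readable output: indented, with keys in alphabetical order
pretty = json.dumps(student, indent=2, sort_keys=True)
print(pretty)
```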
XIX. A short note on JSON, JSON objects, and JSON strings
JSON is a lightweight data exchange format.
A JSON object is a concept tied to a particular language.
A JSON string is a string that conforms to the JSON format.
JSON has its own data types
Although they are somewhat similar to JavaScript's data types, JSON and JavaScript are not the same thing.
ECMAScript is a standard; JavaScript, ActionScript, and JSON can each be seen as one scheme that implements that standard.
JSON is commonly used as the data format of REST services.
Summary
The above is a short introduction to regular expressions and the JSON data exchange format in Python. I hope it helps you.