Systems have to comply with regulations, and so on and so forth. As a result, desensitizing customers' sensitive information in logs has been put on the agenda.
1. Drawing on the techniques used in past projects and the approaches most often seen online, there are roughly 4 solutions:
- A standalone desensitization utility class, with each log statement desensitized individually
- Desensitization utility class + unified desensitization in the logging framework's aspect-style converter
- Desensitization framework: annotation mode
- Desensitization framework: utility class + configuration mode
2. Why log desensitization is difficult:
- Parameter names and database field names are not standardized, or not consistent with each other
- A log statement usually outputs more than a single field
- It is hard to tell which kinds of sensitive data a given log statement contains
- There is a very large amount of logging code, and it is hard to review every statement one by one.
3. Pros and cons of each solution
1. Solution 1: A standalone desensitization utility class, with each log statement desensitized individually
Advantages: Highest runtime efficiency
Disadvantages: Most invasive, with the largest workload
Summary: It can solve the problem for almost any project, but it is too invasive and too tightly coupled, and is almost never used anymore.
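To make the cost of Solution 1 concrete, here is a minimal sketch. The class `MaskUtil` and its method `maskMiddle` are hypothetical names chosen for illustration, not code from the project discussed here. The point is that every log statement has to call the utility by hand.

```java
// Hypothetical utility for Solution 1; names are illustrative only.
final class MaskUtil {
    private MaskUtil() { }

    /** Keeps the first 3 and last 4 characters and replaces the rest with '*'. */
    static String maskMiddle(String value) {
        if (value == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(value.length());
        if (value.length() <= 7) {
            // Too short to keep a visible prefix and suffix: mask everything.
            for (int i = 0; i < value.length(); i++) {
                sb.append('*');
            }
            return sb.toString();
        }
        sb.append(value, 0, 3);
        for (int i = 3; i < value.length() - 4; i++) {
            sb.append('*');
        }
        sb.append(value, value.length() - 4, value.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        String mobile = "13812345678";
        // Every call site must remember to wrap the value, which is where the workload comes from.
        System.out.println("user login, mobile=" + MaskUtil.maskMiddle(mobile));
    }
}
```

The masking itself is cheap, but a single forgotten call leaks the raw value, which is why this approach scales poorly across a large code base.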
2. Solution 2: Desensitization utility class + unified desensitization in the logging framework's aspect-style converter
Advantages: Least invasive, smallest workload
Disadvantages: Lowest efficiency; every possible case has to be checked, and the risk of false positives (masking data that is not sensitive) is the highest
Summary: It can solve the problems of almost all projects and has the highest adoption rate. Overly long log messages are usually truncated before processing. However, if the log volume is huge, it may be better used only as a supplementary measure.
3. Solution 3: Desensitization framework, annotation mode (for example: sensitive)
Advantages: Simple, easy to use, efficient, and able to cover most scenarios
Disadvantages: Highly invasive; every field that may need desensitization has to be annotated
Summary: Worth considering for new projects, or for refactored projects whose data is highly standardized. It is unfriendly to custom log output, request logs, and the like, because a field is desensitized only after the annotation has been added.
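The annotation mode can be sketched roughly as follows. The `@Masked` annotation, the `Customer` class, and `AnnotationMasker` are all hypothetical stand-ins; real frameworks define their own annotations and usually hook into serialization rather than hand-rolled reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical marker annotation; not the API of any particular framework.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Masked { }

// Hypothetical model class: only the annotated field is desensitized.
final class Customer {
    String name = "Alice";
    @Masked String mobile = "13812345678";
}

final class AnnotationMasker {
    private AnnotationMasker() { }

    /** Renders an object as field=value pairs, replacing every @Masked field with "***". */
    static String toLogString(Object target) {
        StringBuilder sb = new StringBuilder(target.getClass().getSimpleName()).append('{');
        Field[] fields = target.getClass().getDeclaredFields();
        for (int i = 0; i < fields.length; i++) {
            Field field = fields[i];
            field.setAccessible(true);
            Object value;
            try {
                value = field.get(target);
            } catch (IllegalAccessException e) {
                value = "?";
            }
            if (field.isAnnotationPresent(Masked.class) && value != null) {
                value = "***";
            }
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(field.getName()).append('=').append(value);
        }
        return sb.append('}').toString();
    }
}
```

Anything logged without going through such a renderer, for example a raw request body or a hand-concatenated string, is untouched, which is the limitation the summary above describes.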
4. Solution 4: Desensitization framework, utility class + configuration mode (for example: desensitize)
Advantages: Low invasiveness and small workload
Disadvantages: Easy to miss fields; all log output has to follow the specific format the framework expects
Summary: It can solve the problems of almost all projects. Because it requires a standardized project, it suits new or refactored projects, and it rates higher than Solution 3. However, it is not very friendly to fields with many formats (such as an ID number field, where different ID types have different number formats) or to ambiguous field names (such as number: number, which may be an ID number but may also be a quantity).
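A rough sketch of the configuration mode, assuming the sensitive key names come from configuration and that logs are written as key=value or key: value. The class `KeyConfigMasker` is hypothetical and does not reproduce any specific framework.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical key-driven masker: the list of sensitive keys stands in for external configuration.
final class KeyConfigMasker {
    private final Pattern pattern;

    KeyConfigMasker(List<String> sensitiveKeys) {
        // Builds \b(key1|key2)(\s*[:=]\s*)(value), where the value runs to the next comma, space or brace.
        StringBuilder alternation = new StringBuilder();
        for (String key : sensitiveKeys) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(key));
        }
        this.pattern = Pattern.compile("\\b(" + alternation + ")(\\s*[:=]\\s*)([^,\\s}]+)");
    }

    String mask(String message) {
        Matcher m = pattern.matcher(message);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Keep the key and separator, replace only the value.
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1) + m.group(2) + "***"));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        KeyConfigMasker masker = new KeyConfigMasker(Arrays.asList("mobile", "number"));
        System.out.println(masker.mask("mobile=13812345678, number: 3, name=Bob"));
    }
}
```

Note that `number: 3` is masked even though it is only a quantity here. A key name alone cannot tell an ID number from a count, which is the ambiguity mentioned in the summary.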
4. Code demonstration of Solution 2
(Solution 1 is essentially never used, and Solutions 3 and 4 are already explained in detail elsewhere, so I won't go into them.
Here we focus on Solution 2.)
1. No hype, no bashing
By the time most projects on the market have to deal with compliance, they are already largely built. Larger projects in particular are usually split across several big teams; where their business areas overlap, the teams often name the same fields differently, and every developer has their own logging habits. With Solution 3 or Solution 4 it is therefore hard to achieve full coverage, low coupling, and low workload at the same time, so the drawbacks of Solution 2 may well be the most acceptable.
2. Sample background
- The data is standardized, but only to a relatively low degree
- The project is fairly large and largely complete.
- The overall project structure is front-end portal + back-end portal + multiple back-end services + multiple third-party services
- Log output takes many forms: {object name: object}, {object}, string concatenation with objects, third-party request bodies, XML, SQL, and so on.
- The logging framework is logback throughout
3. Sample Requirements
- All types of data required for compliance must be desensitized regardless of field names.
- Minimum code intrusion
- Does not affect the normal function of the service
- Desensitization of pure numeric values must be reversible, i.e. decodable (a special requirement of this project; compliance would rather not allow it)
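The reversible masking of digits required above can be sketched as a simple substitution. The alphabet `oiZeAsGTbQ` is the `KEY` constant from the constants class in this article; the class `ReversibleDigitMask` and its `decode` method are illustrative additions, since the article only shows the encoding direction.

```java
// Illustrative sketch of a reversible digit mask; decode() is an assumption, not code from the article.
final class ReversibleDigitMask {
    // Substitution alphabet: the character at index d replaces digit d.
    static final String KEY = "oiZeAsGTbQ";

    private ReversibleDigitMask() { }

    /** Replaces every ASCII digit with its KEY character; other characters pass through. */
    static String encode(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            sb.append(c >= '0' && c <= '9' ? KEY.charAt(c - '0') : c);
        }
        return sb.toString();
    }

    /** Maps KEY characters back to digits; only safe on spans known to have been pure digits. */
    static String decode(String masked) {
        StringBuilder sb = new StringBuilder(masked.length());
        for (char c : masked.toCharArray()) {
            int digit = KEY.indexOf(c);
            sb.append(digit >= 0 ? (char) ('0' + digit) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String masked = encode("13812345678");
        System.out.println(masked + " -> " + decode(masked));
    }
}
```

Because the alphabet consists of ordinary letters, decoding is unambiguous only when the masked span originally contained nothing but digits. A substitution like this is obfuscation rather than encryption, which may be why compliance is reluctant about it.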
4. Sample solution design
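Configuration (in the logback configuration file; converterClass must be set to the fully qualified class name of the converter)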
<conversionRule conversionWord="msg" converterClass=""> </conversionRule>
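Converter (SensitiveConverter) code

    import ch.qos.logback.classic.pattern.MessageConverter;
    import ch.qos.logback.classic.spi.ILoggingEvent;

    public class SensitiveConverter extends MessageConverter {

        @Override
        public String convert(ILoggingEvent event) {
            // Get the original log message
            String requestLogMsg = event.getFormattedMessage();
            // isLogMaskEnabled is the global public configuration switch that turns
            // desensitization on or off; the class that owns it is not shown here
            return isLogMaskEnabled() ? LogSensitiveUtils.filterSensitive(requestLogMsg) : requestLogMsg;
        }
    }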
The utility class that does the actual processing (LogSensitiveUtils); it can be extracted into a shared package and managed as a dependency

    import java.util.List;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class LogSensitiveUtils {

        /**
         * Email: the part in front of the @ is hidden. Phone example: 138******1234
         * Field-prefixed numbers and pure numbers use the reversible mask.
         * For everything else, the middle part is masked, sized according to the length of the value.
         *
         * @param content the log message
         * @return the desensitized log message
         */
        public static String filterSensitive(String content) {
            try {
                if (content != null && !content.trim().isEmpty()) {
                    for (Map.Entry<String, List<Pattern>> entry : LogSensitiveConstants.SENSITIVE_SEQUENCE.entrySet()) {
                        content = filter(content, entry.getKey(), entry.getValue());
                    }
                }
                return content;
            } catch (Exception e) {
                // Desensitization must never break logging: fall back to the current content
                return content;
            }
        }

        /**
         * Applies all patterns of one desensitization type to the content.
         *
         * @param content  string to be desensitized
         * @param type     desensitization type
         * @param patterns the regular expressions to match for this type
         * @return the content with every match replaced by its masked form
         * @author hh
         * @date October 18, 2021
         */
        public static String filter(String content, String type, List<Pattern> patterns) {
            for (Pattern pattern : patterns) {
                Matcher matcher = pattern.matcher(content);
                StringBuffer sb = new StringBuffer();
                while (matcher.find()) {
                    // quoteReplacement keeps '$' and '\' in the masked text from being read as group references
                    matcher.appendReplacement(sb, Matcher.quoteReplacement(basesensitive(matcher.group(), type)));
                }
                matcher.appendTail(sb);
                content = sb.toString();
            }
            return content;
        }

        /**
         * Basic desensitization: a number of leading and trailing characters stay visible.
         * For the numeric types, the remaining digits are replaced with characters from KEY (reversible).
         * For non-numeric types, the remaining characters are replaced with '*'.
         * For email, the part in front of the @ is masked.
         *
         * @param str  string to be desensitized
         * @param type desensitization type
         * @return the masked string
         * @author hh
         * @date October 18, 2021
         */
        private static String basesensitive(String str, String type) {
            int startLength, endLength = 0;
            if (str == null || str.trim().isEmpty()) {
                return "";
            }
            // By default, masking starts at the 4th character
            startLength = 3;
            // getEndLength(int) is called here but its definition is not shown in this article
            endLength = getEndLength(str.length());
            if (LogSensitiveConstants.EMAIL.equals(type)) {
                // Keep everything from the '@' onward
                endLength = str.length() - str.indexOf('@');
            }
            if (LogSensitiveConstants.FIELD_NUM.equals(type)) {
                // Locate the first and last digit so that the field-name prefix is left alone
                Matcher matcher = LogSensitiveConstants.NUMBER_PATTERN.matcher(str);
                int start = -1;
                int end = -1;
                String ss = "";
                if (matcher.find()) {
                    ss = matcher.group();
                }
                start = str.indexOf(ss);
                while (matcher.find()) {
                    ss = matcher.group();
                }
                end = str.lastIndexOf(ss);
                int length = end - start;
                endLength = getEndLength(length);
                startLength = start + startLength;
            }
            String replacement = str.substring(startLength, str.length() - endLength);
            StringBuilder sb = new StringBuilder();
            if (LogSensitiveConstants.NUM.equals(type) || LogSensitiveConstants.FIELD_NUM.equals(type)) {
                // Reversible mask: digit d becomes KEY.charAt(d)
                for (int i = 0; i < replacement.length(); i++) {
                    char ch;
                    if (replacement.charAt(i) >= '0' && replacement.charAt(i) <= '9') {
                        ch = LogSensitiveConstants.KEY.charAt((int) (replacement.charAt(i) - '0'));
                    } else {
                        ch = replacement.charAt(i);
                    }
                    sb.append(ch);
                }
            } else {
                for (int i = 0; i < replacement.length(); i++) {
                    sb.append("*");
                }
            }
            // visible prefix + masked middle + visible suffix
            return str.substring(0, startLength) + sb + str.substring(str.length() - endLength);
        }
    }
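The scan-and-replace loop in `filter` relies on the `find` / `appendReplacement` / `appendTail` idiom of `java.util.regex`. The sketch below isolates that idiom and also shows a caveat worth checking: a pattern anchored with `^` and `$`, as `TEL_REGEX` is written in the constants class, only matches when the whole message is the phone number. `ScanAndMaskDemo` and the `EMBEDDED` pattern are hypothetical, and the fixed `138****5678` style of masking is a simplification of `basesensitive`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only java.util.regex calls from the JDK are used.
final class ScanAndMaskDemo {
    // Anchored form, as TEL_REGEX appears in the constants class.
    static final Pattern ANCHORED = Pattern.compile("^1[23456789]\\d{9}$");
    // Assumed alternative: digit lookarounds instead of ^ and $, so matches can sit inside a longer line.
    static final Pattern EMBEDDED = Pattern.compile("(?<!\\d)1[23456789]\\d{9}(?!\\d)");

    private ScanAndMaskDemo() { }

    /** Replaces every match with its first 3 digits, four stars, and its last 4 digits. */
    static String mask(String content, Pattern pattern) {
        Matcher matcher = pattern.matcher(content);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            String hit = matcher.group();
            String masked = hit.substring(0, 3) + "****" + hit.substring(7);
            // quoteReplacement prevents '$' or '\' in the masked text from being treated as group references.
            matcher.appendReplacement(sb, Matcher.quoteReplacement(masked));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = "mobile=13812345678 ok";
        System.out.println(mask(line, ANCHORED)); // the anchored pattern finds nothing inside the line
        System.out.println(mask(line, EMBEDDED));
    }
}
```

If the real log lines embed phone or ID numbers in longer messages, the anchored patterns would probably need to be relaxed along these lines, at the cost of more false positives.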
Regular expression constant class (LogSensitiveConstants); it can also be extracted into a shared package and managed as a dependency.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;
    import java.util.regex.Pattern;

    public class LogSensitiveConstants {

        /** Mask characters for the reversible digit mask: digit d maps to KEY.charAt(d) */
        public static final String KEY = "oiZeAsGTbQ";
        public static final Pattern NUMBER_PATTERN = Pattern.compile("\\d");

        /** Desensitization type identifiers */
        public static final String EMAIL = "email";
        public static final String FIELD_NUM = "field_num";
        public static final String FIELD = "field";
        public static final String NOT_NUM = "not_num";
        public static final String NUM = "num";

        /**
         * Filter order: email --> field-prefixed reversible number mask --> other non-numeric masks
         * --> field-prefixed non-numeric masks --> pure-number reversible mask
         * Reasons for the order:
         * 1. An email address could otherwise be masked by another rule first, but email has its own
         *    masking requirement, so it is handled first.
         * 2. The patterns without a field prefix (pure numbers and non-pure numbers) go last, because the
         *    range covered by pure numbers overlaps with the others. Putting the broadest patterns last
         *    is the best way to reduce false positives.
         * 3. The two field-prefixed cases have no required order between them.
         *    Likewise, the two non-prefixed cases have no required order between them.
         */
        public static final Map<String, List<Pattern>> SENSITIVE_SEQUENCE = new TreeMap<>();

        /** Numbers: mobile phone number, ID number */
        public static final List<Pattern> SENSITIVE_NUM_KEY = new ArrayList<>(4);

        /**
         * Filter order: ID number --> mobile phone number --> landline number --> QQ --> business license
         * --> tax registration number, passes, household register, ID number (pure number)
         * --> home return permit, Hong Kong and Macao pass to mainland China
         */
        public static final List<Pattern> SENSITIVE_NOT_NUM_KEY = new ArrayList<>(7);

        /** Field filtering (non-pure numbers) */
        public static final List<Pattern> SENSITIVE_FIELD_KEY = new ArrayList<>(6);

        /** Field filtering (pure numbers) */
        public static final List<Pattern> SENSITIVE_FIELD_NUM = new ArrayList<>(3);

        /** Email filtering */
        public static final List<Pattern> SENSITIVE_EMAIL_KEY = new ArrayList<>(1);

        /** Mobile phone number */
        public static final String TEL_REGEX = "^1[23456789]\\d{9}$";

        /** Landline phone number */
        public static final String PHONE_REGEX = "^0\\d{2,3}-\\d{7,8}$";

        /** ID number (15-digit and 18-digit forms) */
        public static final String IDENTIFY_REGEX = "(^[1-9]\\d{7}((0\\d)|(1[0-2]))(([0|1|2]\\d)|3[0-1])\\d{3}$)|(^[1-9]\\d{5}[1-9]\\d{3}((0\\d)|(1[0-2]))(([0|1|2]\\d)|3[0-1])\\d{3}([0-9]|X)$)";

        /** Email */
        public static final String EMAIL_REGEX = "([^a-zA-Z0-9._%-]|^)([a-zA-Z0-9_\\.-]+)@([\\da-zA-Z\\.-]+)\\.([a-zA-Z\\.]{2,6})"
                + "([^a-zA-Z\\d]|$)|([^a-zA-Z\\d]|^)[a-zA-Z\\d]+(\\.[a-z\\d]+)*@([\\da-zA-Z](-[\\da-zA-Z])?)+(\\.{1,2}[a-zA-Z]+)+$/([^a-zA-Z]|$)";

        /**
         * Passport number.
         * The format varies by passport type; some formats are already covered by the mobile phone
         * number pattern and LICENSE_NO_REGEX.
         */
        private static final String PASSPORT_REGEX = "(\\D|^)[P|pS|s]\\d{7}(\\b)";

        /** Unified social credit code (18 characters), or the older 15-digit registration number */
        private static final String USCC_REGEX = "([^0-9A-HJ-NPQRTUWXY]|^)([0-9A-HJ-NPQRTUWXY]{2}\\d{6}[0-9A-HJ-NPQRTUWXY]{10}|[1-9]\\d{14})(\\b)";

        /** Organization code certificate: [a-zA-Z0-9]{8}-[a-zA-Z0-9] */
        private static final String OCC_REGEX = "([^a-zA-Z0-9]|^)([a-zA-Z0-9]{8})-[a-zA-Z0-9](\\b)";

        /** Police officer's certificate */
        private static final String POLICE_REGEX = "([^a-zA-Z0-9_\\.\\-]|^)(([a-zA-Z0-9_\\.\\-])+\\@(([a-zA-Z0-9\\-])+\\.)+([a-zA-Z0-9]{2,4}))(\\b)";

        /** Military / armed police ID: one Chinese character, 字第, 4 to 8 alphanumerics, optional 号 */
        private static final String SOLDIER_REGEX = "([^\\u4E00-\\u9FA5]|^)([\\u4E00-\\u9FA5](字第)([0-9a-zA-Z]{4,8})(号?))";

        /** MAC address */
        private static final String MAC_REGEX = "[A-F0-9]{2}([-:][A-F0-9]{2})([-:.][A-F0-9]{2})([-:][A-F0-9]{2})([-:.][A-F0-9]{2})([-:][A-F0-9]{2})(\\b)";

        /**
         * License plate.
         * Ordinary vehicles: province character + letter + 4 alphanumerics + 1 alphanumeric or one of 挂学警港澳
         * New energy vehicles: province character + letter + (5 digits + D/F, or D/F + 1 alphanumeric + 4 digits)
         */
        private static final String LICENSEE_CAR_REGEX = "[京津沪渝冀豫云辽黑湘皖鲁新苏浙赣鄂桂甘晋蒙陕吉闽贵粤青藏川宁琼使领A-Z]{1}[A-Z]{1}(([0-9]{5}[DF])|([DF][A-HJ-NP-Z0-9][0-9]{4})|([A-HJ-NP-Z0-9]{4}[A-HJ-NP-Z0-9挂学警港澳]{1}))";

        /**
         * IP address
         * IPv4: ((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
         * IPv6: ([0-9a-fA-F]{1,4}::?){1,7}([0-9a-fA-F]{1,4})
         */
        private static final String IP_REGEX = "(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(([0-9a-fA-F]{1,4}::?){1,7}([0-9a-fA-F]{1,4}))";

        /**
         * Name (user / customer name / creator / reporter): [\u4E00-\u9FA5]{2,12}
         * Fields: cust_name / repr_client_name / USER_NAME / ACCT_NAME / CREATER_NAME
         */
        private static final String USER_REGEX = "(\"?)((cust(_?)name)|(repr(_?)client(_?)name)|(USER(_?)NAME)|(ACCT(_?)NAME)|(CREATER(_?)NAME))(\"?)(:|=)(\"?)[\\u4E00-\\u9FA5]{2,12}(\"?)";

        /** WeChat ID */
        private static final String WECHART_REGEX = "(\"?)wechat(\"?)(:|=)(\"?)[a-zA-Z]([-_a-zA-Z0-9]{5,19})(\"?)";

        /** Birthday: only valid dates in the range 1900-01-01 to 2099-12-31 are masked */
        private static final String BIRTH_DATE_REGEX = "(\"?)birth(_?)date(\"?)(:|=)(\"?)(19|20)\\d{2}([.|_|年]?)(1[0-2]|0?[1-9])([.|_|月]?(0?[1-9]|[1-2][0-9]|3[0-1])(\"?))";

        /** Graduation institution */
        private static final String GRADUATE_INSTITUTIONS_REGEX = "(\"?)gradate(_?)institutions(\"?)(:|=)(\"?)[\\u4E00-\\u9FA5]{4,18}(\"?)";

        /** Hometown */
        private static final String NATIVE_PLACE_REGEX = "(\"?)native(_?)place(\"?)(:|=)(\"?)[\\u4E00-\\u9FA5]{2,18}(\"?)";

        /**
         * Address. Fields: ADDR / ADDRESS / ADDRDETAILS / addressLines
         * The suffix alternatives stand for the Chinese address words (province, city, district, ...);
         * they are shown here in English.
         */
        private static final String ADDR_REGEX = "[\\u4E00-\\u9FA5][#()（）A-Z0-9\\u4E00-\\u9FA5]{1,20}(Province|City|District|County|Town|Village|Tun|Road|Street|Group|Community|Office|Unit)"
                + "|[#()（）A-Z0-9\\u4E00-\\u9FA5]{1,20}(Province|City|District|County|Town|Village|Tun|Road|Street|Group|Community|Office|Unit)[#()（）0-9a-z\\u4E00-\\u9FA5]{0,20}";

        /** Bank account number: ([1-9]{1})(\d{14,18}). Fields: acctno / accountno */
        private static final String ACCTNO_REGEX = "((\"?)((ACCT(_?)NO)|(ACCOUNT(_?)NO))(\"?)(:|=)(\"?)([1-9](\\d{14,18}))(\"?))";

        /** QQ number: [1-9][0-9]{4,12}. Field: qq */
        private static final String QQ_REGEX = "(\"?)qq(\"?)(:|=)(\"?)[1-9][0-9]{4,12}(\"?)";

        /**
         * Business license number and tax registration certificate: ([A-Z0-9]{15}|[A-Z0-9]{18}|[A-Z0-9]{20})
         * Fields: licence_no / tax_no
         */
        private static final String LICENSE_NO_REGEX = "(\"?)((licence(_?)no)|(tax(_?)no))(\"?)(:|=)(\"?)([A-Z0-9]{15}|[A-Z0-9]{18}|[A-Z0-9]{20})(\"?)";

        static {
            SENSITIVE_NUM_KEY.add(Pattern.compile(TEL_REGEX));
            SENSITIVE_NUM_KEY.add(Pattern.compile(PHONE_REGEX));
            SENSITIVE_NUM_KEY.add(Pattern.compile(IDENTIFY_REGEX));
            SENSITIVE_NUM_KEY.add(Pattern.compile(PASSPORT_REGEX));
        }

        static {
            SENSITIVE_NOT_NUM_KEY.add(Pattern.compile(USCC_REGEX));
            SENSITIVE_NOT_NUM_KEY.add(Pattern.compile(OCC_REGEX));
            SENSITIVE_NOT_NUM_KEY.add(Pattern.compile(POLICE_REGEX));
            SENSITIVE_NOT_NUM_KEY.add(Pattern.compile(SOLDIER_REGEX));
            SENSITIVE_NOT_NUM_KEY.add(Pattern.compile(MAC_REGEX));
            SENSITIVE_NOT_NUM_KEY.add(Pattern.compile(LICENSEE_CAR_REGEX));
            SENSITIVE_NOT_NUM_KEY.add(Pattern.compile(IP_REGEX));
        }

        static {
            SENSITIVE_FIELD_KEY.add(Pattern.compile(USER_REGEX, Pattern.CASE_INSENSITIVE));
            SENSITIVE_FIELD_KEY.add(Pattern.compile(WECHART_REGEX, Pattern.CASE_INSENSITIVE));
            SENSITIVE_FIELD_KEY.add(Pattern.compile(BIRTH_DATE_REGEX, Pattern.CASE_INSENSITIVE));
            SENSITIVE_FIELD_KEY.add(Pattern.compile(GRADUATE_INSTITUTIONS_REGEX, Pattern.CASE_INSENSITIVE));
            SENSITIVE_FIELD_KEY.add(Pattern.compile(NATIVE_PLACE_REGEX, Pattern.CASE_INSENSITIVE));
            SENSITIVE_FIELD_KEY.add(Pattern.compile(ADDR_REGEX, Pattern.CASE_INSENSITIVE));
        }

        static {
            SENSITIVE_FIELD_NUM.add(Pattern.compile(QQ_REGEX, Pattern.CASE_INSENSITIVE));
            SENSITIVE_FIELD_NUM.add(Pattern.compile(LICENSE_NO_REGEX, Pattern.CASE_INSENSITIVE));
            SENSITIVE_FIELD_NUM.add(Pattern.compile(ACCTNO_REGEX, Pattern.CASE_INSENSITIVE));
        }

        static {
            SENSITIVE_EMAIL_KEY.add(Pattern.compile(EMAIL_REGEX));
        }

        static {
            SENSITIVE_SEQUENCE.put(EMAIL, SENSITIVE_EMAIL_KEY);
            SENSITIVE_SEQUENCE.put(FIELD_NUM, SENSITIVE_FIELD_NUM);
            SENSITIVE_SEQUENCE.put(FIELD, SENSITIVE_FIELD_KEY);
            SENSITIVE_SEQUENCE.put(NOT_NUM, SENSITIVE_NOT_NUM_KEY);
            SENSITIVE_SEQUENCE.put(NUM, SENSITIVE_NUM_KEY);
        }

        private LogSensitiveConstants() {
        }
    }
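One detail of `SENSITIVE_SEQUENCE` is easy to overlook: a `TreeMap` iterates in the natural order of its keys, not in the order of the `put` calls. The sketch below uses the same five key strings to show both orders. `SequenceOrderDemo` is a hypothetical class, and the `LinkedHashMap` variant is only a suggestion for making the intended order explicit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo of iteration order for the type keys used in SENSITIVE_SEQUENCE.
final class SequenceOrderDemo {
    // Same order as the put() calls in the static initializer of the constants class.
    private static final String[] PUT_ORDER = {"email", "field_num", "field", "not_num", "num"};

    private SequenceOrderDemo() { }

    private static List<String> keysAfterPuts(Map<String, Integer> map) {
        for (int i = 0; i < PUT_ORDER.length; i++) {
            map.put(PUT_ORDER[i], i);
        }
        return new ArrayList<>(map.keySet());
    }

    /** Keys as a TreeMap iterates them: sorted by String.compareTo. */
    static List<String> treeOrder() {
        return keysAfterPuts(new TreeMap<String, Integer>());
    }

    /** Keys as a LinkedHashMap iterates them: insertion order. */
    static List<String> linkedOrder() {
        return keysAfterPuts(new LinkedHashMap<String, Integer>());
    }

    public static void main(String[] args) {
        System.out.println("TreeMap:       " + treeOrder());
        System.out.println("LinkedHashMap: " + linkedOrder());
    }
}
```

With these particular names the sorted order still puts email first and the pure-number type last, so the documented filter order holds, but only because of how the keys happen to be spelled. Renaming a key could silently change the order in which the rules run.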
Finish!
If the log output is fairly standardized, and most of the fields that need desensitization are logged as key:value or key = value, Solution 4 is the most recommended.