goal
A computer interest group was recently formed in the lab, as an initiative to document and share more of our problem-solving experience, much like blogging on CSDN. We are just starting out, but more and more of this kind of experience will be recorded over time, so it is important to get the template design right at the beginning (as shown below). That makes it easier to build an electronic database later, so that others can quickly search the relevant records.
As the saying goes, "Life is short, I use Python." So we decided to use Python to extract the header information from each docx document and update it into an xls spreadsheet like the one below (only the result is posted here). Clicking a file path in the spreadsheet opens the corresponding file directly (via hyperlink).
code implementation
1. Capture header information from docx files
# -*- coding:utf-8 -*-
# This program scans a docx file in log and returns its basic header information
from docx import Document

test_d = '../log/sublime building an IDE for python.docx'

def docxInfo(addr):
    document = Document(addr)
    info = {'title': [],
            'keywords': [],
            'author': [],
            'date': [],
            'question': []}
    # Collect the text of every paragraph
    lines = [0 for i in range(len(document.paragraphs))]
    k = 0
    for paragraph in document.paragraphs:
        lines[k] = paragraph.text
        k = k + 1
    # Locate the template headers; these strings must match the docx template
    index = [0 for i in range(5)]
    k = 0
    for line in lines:
        if line.startswith('Title'):
            index[0] = k
        if line.startswith('Keywords'):
            index[1] = k
        if line.startswith('Author'):
            index[2] = k
        if line.startswith('Date'):
            index[3] = k
        if line.startswith('Problem description'):
            index[4] = k
        k = k + 1
    # Each field value is the paragraph right after its header
    info['title'] = lines[index[0] + 1]
    keywords = []
    for line in lines[index[1] + 1:index[2]]:
        keywords.append(line)
    info['keywords'] = keywords
    info['author'] = lines[index[2] + 1]
    info['date'] = lines[index[3] + 1]
    info['question'] = lines[index[4] + 1]
    return info

if __name__ == '__main__':
    print(docxInfo(test_d))
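To make the expected input concrete, here is a hypothetical call. The script above is assumed to be saved as readfile.py, which is how step 2 imports it; the field values in the comments are made up for illustration.

# Hypothetical usage of docxInfo(); the field values shown are made up.
from readfile import docxInfo

info = docxInfo('../log/sublime building an IDE for python.docx')
# The parser assumes the template puts each header on its own paragraph,
# with the value in the paragraph(s) right after it, e.g.:
#   Title
#   sublime building an IDE for python
#   Keywords
#   sublime
#   python
#   Author
#   ...
print(info['title'])     # -> 'sublime building an IDE for python'
print(info['keywords'])  # -> e.g. ['sublime', 'python']; everything between
                         #    'Keywords' and 'Author' is collected as keywords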
2. Traversing the log folder for information updates
# -*- coding:utf-8 -*-
# This program batch scans the files in log; whenever it comes across a docx
# document newer than the last record, it calls docxInfo() to extract the
# document information and saves it to "digger log list.xls" for quick
# retrieval at a later stage.
import os
import time
import xlrd
import xlwt
from xlutils.copy import copy
from readfile import docxInfo

# Open the log list and read the update date of the most recent record
memo_d = '../log/digger log list.xls'
memo = xlrd.open_workbook(memo_d)       # read the existing workbook
sheet0 = memo.sheet_by_index(0)         # the first sheet
memo_date = sheet0.col_values(5)        # column 5 holds the update timestamps
memo_n = len(memo_date)                 # number of rows, including the title row
latest_date = 0
if memo_n > 1:
    latest_date = sheet0.cell_value(memo_n - 1, 5)  # timestamp of the last record

# Make a writable copy of the workbook
memo_new = copy(memo)
sheet1 = memo_new.get_sheet(0)

# Rebuild the hyperlinks: xlrd only reads the display text of a cell, so the
# links would otherwise be lost when the copy is saved.
hyperlinks = sheet0.col_values(6)
n_hyperlink = len(hyperlinks)
for k in range(1, n_hyperlink):         # skip the title row
    link = 'HYPERLINK("%s";"%s")' % (hyperlinks[k], hyperlinks[k])
    sheet1.write(k, 6, xlwt.Formula(link))

# Check whether a file name ends with one of the given suffixes
def endWith(s, *endstring):
    return True in map(s.endswith, endstring)

# Traverse the log folder and append every docx newer than the last record
log_d = '../log'
logFiles = os.listdir(log_d)
for file in logFiles:
    if endWith(file, '.docx'):
        timestamp = os.path.getmtime(log_d + '/' + file)
        if timestamp > latest_date:
            info = docxInfo(log_d + '/' + file)
            sheet1.write(memo_n, 0, info['title'])
            keywords_text = ','.join(info['keywords'])
            sheet1.write(memo_n, 1, keywords_text)
            sheet1.write(memo_n, 2, info['author'])
            sheet1.write(memo_n, 3, info['date'])
            sheet1.write(memo_n, 4, info['question'])
            # Current time as a floating-point timestamp
            time_now = time.time()
            sheet1.write(memo_n, 5, time_now)
            link = 'HYPERLINK("%s";"%s")' % (file, file)
            sheet1.write(memo_n, 6, xlwt.Formula(link))
            memo_n = memo_n + 1

os.remove(memo_d)          # remove the old list before saving the new copy
memo_new.save(memo_d)
print('memo was updated!')
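One pitfall worth noting separately is the hyperlink column: xlrd only gives back the display text of a cell, so after copying the workbook with xlutils, every link has to be rewritten as a HYPERLINK formula, as done above. A minimal standalone sketch of that pattern (the file and output names here are made up):

# Minimal sketch: writing a clickable file link with xlwt.
# Note that xlwt's formula parser separates HYPERLINK arguments with ';'.
import xlwt

book = xlwt.Workbook()
sheet = book.add_sheet('links')
path = 'sublime building an IDE for python.docx'  # hypothetical target file
sheet.write(0, 0, xlwt.Formula('HYPERLINK("%s";"%s")' % (path, path)))
book.save('links_demo.xls')                       # hypothetical output file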
In fact, there are better modules for working with spreadsheets, such as pandas, xlsxwriter, openpyxl, and so on. Still, the code above is basically enough to do the job, and a grad student does not have that much time to write and debug code, so I will update it when I have time!
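For what it's worth, openpyxl would avoid the hyperlink round-trip entirely, since it stores real hyperlink objects that survive re-saving. A rough sketch, assuming the list were first converted to .xlsx (the paths and values here are assumptions, not part of the script above):

# A minimal sketch with openpyxl, assuming an .xlsx version of the log list.
# Unlike the xlrd/xlwt round trip, hyperlinks are kept when the file is saved.
from openpyxl import load_workbook

wb = load_workbook('../log/digger log list.xlsx')  # assumed .xlsx copy
ws = wb.active
row = ws.max_row + 1                               # append after the last record
file = 'some log.docx'                             # hypothetical file name
ws.cell(row=row, column=1, value='some title')     # hypothetical field value
cell = ws.cell(row=row, column=7, value=file)      # column 7 = file path
cell.hyperlink = file                              # clickable link, kept on save
wb.save('../log/digger log list.xlsx')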
acknowledgements
This borrowed heavily from the experience shared by the experts on the CSDN forums along the way!