Extracting the text content of a PPT directly makes it easier to process or analyze further, and the text can also be reused when compiling other documents. With a Python program, we can quickly batch-extract the text content of PPT files for efficient information collection or data analysis. This article introduces how to extract text content from PowerPoint presentations using a Python program, including slide body text, slide notes text, and slide comment text.
The methods used in this article require for Python, which can be downloaded from the official website or installed from PyPI: pip install
Extract PPT Slide Text with Python
In PPT slides, text content is placed in various shapes, such as text boxes and graphics. To extract the text of a slide, we first get the shapes on the slide and then extract the text from each of them. The steps are as follows:
- Create a Presentation object and load the PPT file with it.
- Iterate over the slides in the PPT, and then over the shapes in each slide.
- Check whether the shape is an IAutoShape instance. If so, get the paragraphs in it, and then get the text of each paragraph.
- Write the text to a text file.
Code Example:
Python
from import *
from import *

# Create an object of the Presentation class
pres = Presentation()

# Load a PowerPoint presentation
("Example.pptx")

text = []

# Loop through each slide
for slide in :
    # Loop over each shape
    for shape in :
        # Check if the shape is an instance of IAutoShape
        if isinstance(shape, IAutoShape):
            # Extract text from the paragraphs of the shape
            for paragraph in :
                ()

# Write text to a text file
f = open("output/slide text.txt", "w", encoding='utf-8')
for s in text:
    (s + "\n")
()

()
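As a cross-check that does not depend on the library used in this article, the same idea (walk every slide, then collect the text of each paragraph) can be sketched with only the Python standard library. This is a swapped-in technique, not the article's API: it assumes the input is a .pptx file, which is a ZIP archive whose slides are stored as ppt/slides/slideN.xml with text held in DrawingML <a:t> elements. The function name extract_slide_text is hypothetical, and the sketch orders slides by file number, which usually but not always matches the order shown in PowerPoint.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace: paragraphs are <a:p>, text runs hold their text in <a:t>
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def extract_slide_text(pptx_source):
    """Return {slide_file_number: [paragraph_text, ...]}.

    Hypothetical helper: pptx_source is a path or a binary file-like object.
    """
    result = {}
    with zipfile.ZipFile(pptx_source) as archive:
        for name in archive.namelist():
            match = re.fullmatch(r"ppt/slides/slide(\d+)\.xml", name)
            if not match:
                continue  # skip relationship files, layouts, media, etc.
            root = ET.fromstring(archive.read(name))
            paragraphs = []
            for para in root.iter(A_NS + "p"):
                # Join all runs of one paragraph into a single string
                text = "".join(t.text or "" for t in para.iter(A_NS + "t"))
                if text:
                    paragraphs.append(text)
            result[int(match.group(1))] = paragraphs
    # Sort by the number in the file name (assumed, not guaranteed, slide order)
    return dict(sorted(result.items()))
```

Calling extract_slide_text("Example.pptx") would return a dictionary that can be written to a text file in the same way as the list in the example above.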
Extract PPT Notes Text with Python
Notes are additional information attached to a slide to guide or prompt the speaker; they are not shown to the audience. A slide's notes are stored in a NotesSlide object, which can be accessed through a property of the slide. Once you have that object, you can read the notes text through one of its properties. The steps are as follows:
- Create a Presentation object and load the PPT file with it.
- Iterate through the slides in the PPT, get the NotesSlide object of each slide, and then extract the notes text from it.
- Write the text to a text file.
Code Example:
Python
from import *
from import *

# Create an object of the Presentation class
pres = Presentation()

# Load a PowerPoint presentation
("Example.pptx")

notes_list = []

# Loop through each slide
for slide in :
    # Get the notes slide
    notes_slide =
    # Get the contents of the notes
    notes = notes_slide.
    notes_list.append(notes)

# Write notes to a text file
f = open("output/remarks text.txt", "w", encoding="utf-8")
for note in notes_list:
    (note)
    ("\n")
()

()
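For comparison, here is a minimal standard-library sketch of the same notes extraction. It is not the API used in this article. It assumes a .pptx file in which each slide's relationship file (ppt/slides/_rels/slideN.xml.rels) points to its notes part, and in which the notes text sits in the shape whose placeholder type is "body". The names extract_notes and _notes_part_for are hypothetical.

```python
import posixpath
import re
import zipfile
import xml.etree.ElementTree as ET

A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
P_NS = "{http://schemas.openxmlformats.org/presentationml/2006/main}"
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def _notes_part_for(archive, slide_name):
    """Follow a slide's relationship file to its notes part, or return None."""
    folder, filename = posixpath.split(slide_name)
    rels_name = f"{folder}/_rels/{filename}.rels"
    if rels_name not in archive.namelist():
        return None
    rels = ET.fromstring(archive.read(rels_name))
    for rel in rels.iter(REL_NS + "Relationship"):
        if rel.get("Type", "").endswith("/notesSlide"):
            target = posixpath.join(folder, rel.get("Target", ""))
            return posixpath.normpath(target).lstrip("/")
    return None


def extract_notes(pptx_source):
    """Return {slide_file_number: notes_text} for slides that have notes."""
    notes = {}
    with zipfile.ZipFile(pptx_source) as archive:
        for name in archive.namelist():
            match = re.fullmatch(r"ppt/slides/slide(\d+)\.xml", name)
            if not match:
                continue
            notes_name = _notes_part_for(archive, name)
            if notes_name is None:
                continue  # this slide has no notes part
            root = ET.fromstring(archive.read(notes_name))
            lines = []
            for shape in root.iter(P_NS + "sp"):
                ph = shape.find(f"{P_NS}nvSpPr/{P_NS}nvPr/{P_NS}ph")
                # Keep only the notes body; skip slide-number and other placeholders
                if ph is None or ph.get("type") != "body":
                    continue
                for para in shape.iter(A_NS + "p"):
                    lines.append("".join(t.text or "" for t in para.iter(A_NS + "t")))
            notes[int(match.group(1))] = "\n".join(lines)
    return dict(sorted(notes.items()))
```

Following the relationship file rather than matching file numbers matters because notesSlide1.xml does not necessarily belong to slide1.xml.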
Extract PPT Comment Text with Python
We can also get the comments on a PPT slide through a property of the slide, and then get the text of each comment through a property of the comment. The steps are as follows:
- Create a Presentation object and load the PPT file with it.
- Iterate through the slides and get the collection of comments on each slide.
- Iterate through the comments and extract the text of each comment.
- Write the text to a text file.
Code Example:
Python
from import *
from import *

# Create an object of the Presentation class
pres = Presentation()

# Load a PowerPoint presentation
("Example.pptx")

comments_list = []

# Iterate over all slides
for slide in :
    # Get all the comments in the slide
    comments =
    # Iterate over the comments
    for comment in comments:
        # Get comment text
        comment_text =
        comments_list.append(comment_text)

# Write comments to a text file
f = open("output/annotated text.txt", "w", encoding="utf-8")
for comment in comments_list:
    (comment + "\n")
()

()
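The comment extraction can likewise be sketched with only the standard library, again as a swapped-in technique rather than the article's API. The sketch assumes a .pptx file that stores classic comments in parts named ppt/comments/commentN.xml, where each <p:cm> element carries its text in a <p:text> child. Files saved by newer PowerPoint versions may store comments in a different part and format, which this sketch does not read. The function name extract_legacy_comments is hypothetical.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

P_NS = "{http://schemas.openxmlformats.org/presentationml/2006/main}"
COMMENT_PART = re.compile(r"ppt/comments/comment(\d+)\.xml")


def extract_legacy_comments(pptx_source):
    """Return a flat list of comment texts found in classic comment parts."""
    comments = []
    with zipfile.ZipFile(pptx_source) as archive:
        parts = [n for n in archive.namelist() if COMMENT_PART.fullmatch(n)]
        # Sort numerically so that comment10.xml comes after comment2.xml
        parts.sort(key=lambda n: int(COMMENT_PART.fullmatch(n).group(1)))
        for name in parts:
            root = ET.fromstring(archive.read(name))
            for cm in root.iter(P_NS + "cm"):
                text = cm.findtext(P_NS + "text")
                if text:
                    comments.append(text)
    return comments
```

The returned list plays the same role as comments_list in the example above and can be written to a text file line by line.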
Summary
This article describes how to use Python to extract text content from PPT slides, including slide text, notes text, and comment text.
The API used in this article also supports many other PPT processing operations. See the for Python tutorial to learn more about PPT operations.