SoFunction
Updated on 2024-12-19

Code example of using Python to quickly extract text content from PPTs

Extracting the text content of a PPT directly makes it easier to process or analyze further, and the text can also be reused when compiling other documents. With a Python program, we can batch-extract the text in PPT files quickly, which makes information collection and data analysis more efficient. This article shows how to extract text from PowerPoint presentations with a Python program, including slide body text, slide note text, and slide comment text.
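Batch extraction amounts to a loop over a folder of presentations. The following is a minimal sketch using only the standard library. The `extract` parameter is a hypothetical callable, not part of any library: any function that takes a .pptx path and returns a list of strings will do.

```python
from pathlib import Path


def batch_extract(input_dir, output_dir, extract):
    """Run extract(path) on every .pptx in input_dir and write one .txt per file.

    `extract` is a hypothetical callable supplied by the caller; it must
    return a list of strings for the given presentation path.
    """
    out = Path(output_dir)
    # Create the output folder up front so writing does not fail on a missing directory
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pptx in sorted(Path(input_dir).glob("*.pptx")):
        lines = extract(pptx)
        target = out / (pptx.stem + ".txt")
        target.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(target)
    return written
```

Calling `batch_extract("decks", "output", my_extractor)` would then produce one text file per presentation, where `decks` and `my_extractor` are placeholder names.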

The methods used in this article require a third-party library for Python, which can be downloaded from the official website or installed from PyPI with pip install.

Apply for Free License

Extract PPT Slide Text with Python

In PPT slides, text is held in shapes such as text boxes and graphics. To extract a slide's text, we first get the shapes on the slide and then read the text inside each one. The steps are as follows:

  • Create a Presentation object and call its load method to load the PPT.
  • Iterate over the slides in the PPT, and then over the shapes on each slide.
  • Check whether the shape is an IAutoShape instance. If so, get the paragraphs in it through the shape's property, and then get the text of each paragraph through the paragraph's property.
  • Write the text to a text file.

Code Example:

Python

from  import *
from  import *

# Create an object of the Presentation class
pres = Presentation()

# Load a PowerPoint presentation
("Example.pptx")

text = []
# Loop through each slide
for slide in :
    # Loop over each shape
    for shape in :
        # Check if the shape is an instance of IAutoShape
        if isinstance(shape, IAutoShape):
            # Extract text from the shape's paragraphs
            for paragraph in :
                ()

# Write text to a text file
f = open("output/slide text.txt", "w", encoding='utf-8')
for s in text:
    f.write(s + "\n")
f.close()
()

Extraction results:

Extracting PPT slide text with Python
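Slide text can also be reached without a third-party library, because a .pptx file is a ZIP archive of XML parts. The sketch below is a different technique from the library approach used in this article. It assumes the conventional part names `ppt/slides/slideN.xml` and reads the DrawingML text runs (`<a:t>`) inside each paragraph (`<a:p>`). The function name `extract_slide_text` is made up for illustration.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace; paragraphs are <a:p>, text runs are <a:t>
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def extract_slide_text(pptx_file):
    """Return one list of non-empty paragraph strings per slide part.

    Slides are ordered by part number, which usually but not always
    matches the order shown in the presentation.
    """
    slides = []
    with zipfile.ZipFile(pptx_file) as zf:
        names = [n for n in zf.namelist()
                 if re.fullmatch(r"ppt/slides/slide\d+\.xml", n)]
        # Sort numerically so slide10 comes after slide2
        names.sort(key=lambda n: int(re.search(r"\d+", n.rsplit("/", 1)[1]).group()))
        for name in names:
            root = ET.fromstring(zf.read(name))
            paragraphs = []
            for para in root.iter(A_NS + "p"):
                # A paragraph may be split into several runs; join them back together
                line = "".join(t.text or "" for t in para.iter(A_NS + "t"))
                if line:
                    paragraphs.append(line)
            slides.append(paragraphs)
    return slides
```

This collects text from every shape that carries paragraphs, including table cells, so the result may contain more than the auto-shape text extracted above.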

Extract PPT Note Text with Python

Notes are additional information attached to a slide that guide or prompt the speaker and are not shown to the audience. A slide's notes are stored in a NotesSlide object, which can be accessed through a property of the slide. Once you have that object, you can use its property to extract the text in it. The steps are as follows:

  • Create a Presentation object and call its load method to load the PPT.
  • Iterate over the slides in the PPT, get each slide's NotesSlide object through the slide's property, and then extract the note text through the NotesSlide object's property.
  • Write the text to a text file.

Code Example:

Python

from  import *
from  import *

# Create an object of the Presentation class
pres = Presentation()

# Load a PowerPoint presentation
("Example.pptx")

notes_list = []
# Loop through each slide
for slide in :
    # Get the notes slide
    notes_slide = 
    # Get the content of the notes
    notes = notes_slide.
    notes_list.append(notes)

# Write notes to a text file
f = open("output/remarks text.txt", "w", encoding="utf-8")
for note in notes_list:
    f.write(note)
    f.write("\n")
f.close()
()

Extraction results:

Extracting PPT note text with Python
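Speaker notes can be read in a similar library-free way. This is a sketch under assumptions, not the library API used in this article: it assumes notes live in parts named `ppt/notesSlides/notesSlideN.xml` and that the speaker text sits in the placeholder of type `body`, while other placeholders hold the slide image or slide number. The function name `extract_notes_text` is made up for illustration.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
}


def extract_notes_text(pptx_file):
    """Map each notes part name to the text of its body placeholder."""
    notes = {}
    with zipfile.ZipFile(pptx_file) as zf:
        for name in sorted(zf.namelist()):
            if not re.fullmatch(r"ppt/notesSlides/notesSlide\d+\.xml", name):
                continue
            root = ET.fromstring(zf.read(name))
            lines = []
            for shape in root.iterfind(".//p:sp", NS):
                ph = shape.find("p:nvSpPr/p:nvPr/p:ph", NS)
                # Skip shapes that are not the notes body (slide image, slide number, ...)
                if ph is None or ph.get("type") != "body":
                    continue
                for para in shape.iterfind(".//a:p", NS):
                    lines.append("".join(t.text or "" for t in para.iterfind(".//a:t", NS)))
            notes[name] = "\n".join(lines)
    return notes
```

The keys are notes part names rather than slide numbers; mapping a notes part back to its slide would require reading the relationship files, which this sketch leaves out.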

Extract PPT Comment Text with Python

We can also get the comments on a PPT slide through a property of the slide, and get the text of each comment through a property of the comment. The steps are as follows:

  • Create a Presentation object and call its load method to load the PPT.
  • Iterate over the slides and get the collection of comments on each slide through the slide's property.
  • Iterate over the comments and extract the text of each comment through the comment's property.
  • Write the text to a text file.

Code Example:

Python

from  import *
from  import *

# Create an object of the Presentation class
pres = Presentation()

# Load a PowerPoint presentation
("Example.pptx")

comments_list = []
# Iterate over all slides
for slide in :
    # Get all the comments on the slide
    comments = 
    # Iterate over the comments
    for comment in comments:
        # Get comment text
        comment_text = 
        comments_list.append(comment_text)

# Write comments to a text file
f = open("output/annotated text.txt", "w", encoding="utf-8")
for comment in comments_list:
    f.write(comment + "\n")
f.close()
()

Extraction results:

Extracting PPT comment text with Python
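Comments can be sketched in the same library-free style, again as an alternative to the library approach shown in this article. The assumption here is the classic comment format, where each part named `ppt/comments/commentN.xml` holds `<p:cm>` elements with the comment body in `<p:text>`. Newer PowerPoint versions store threaded comments in a different part format, which this sketch does not handle. The function name `extract_comment_text` is made up for illustration.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# PresentationML namespace; classic comment text is stored in <p:text>
P_NS = "{http://schemas.openxmlformats.org/presentationml/2006/main}"


def extract_comment_text(pptx_file):
    """Return the text of every classic-format comment in the file."""
    comments = []
    with zipfile.ZipFile(pptx_file) as zf:
        for name in sorted(zf.namelist()):
            # Only classic comment parts are matched; other parts are skipped
            if not re.fullmatch(r"ppt/comments/comment\d+\.xml", name):
                continue
            root = ET.fromstring(zf.read(name))
            for text in root.iter(P_NS + "text"):
                comments.append(text.text or "")
    return comments
```

An empty result therefore means either that the file has no comments or that they are stored in the newer format.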

Summary

This article describes how to use Python to extract text content from PPT slides, including slide text, note text, and comment text.
The API used in this article also supports many other PPT processing operations; see the library's Python tutorial to learn more about PPT operations.
