Simple API to extract all text. - Community / GemBox.Document Feature Request - GemBox Support Center

Extracting text currently requires knowledge of many structures within GemBox for each kind of Microsoft Office document (PowerPoint presentations, Word documents, and Excel spreadsheets). A single common API call that extracts all text from every document type, without formatting, would be easier to use. This would benefit full-text indexing.

Comment (1)

Mario - GemBox
Hi,

First, please note that each component has a different "main" class, so it's not possible to have an identical API.

Nevertheless, you can save Word files to TXT with the DocumentModel.Save method, and you can save Excel files to CSV or tab-delimited text with the ExcelFile.Save method.
So, essentially, you can use similar Save method calls on each "main" class.
This way you can save documents and spreadsheets to a Stream in plain text format and then convert the result to a String using the same Encoding that was used for saving.
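
To illustrate the last step (save to a stream, then decode with the same encoding), here is a minimal, language-neutral sketch in Python. It does not use the GemBox API: save_as_text is a hypothetical stand-in for a library Save call that writes plain text to a stream, and read_all_text is likewise an illustrative helper. The point is only that the bytes in the stream must be decoded with the encoding that was used to write them.

```python
import io


def save_as_text(lines, stream, encoding):
    """Hypothetical stand-in for a library Save call (not GemBox API):
    writes each line as plain text to a binary stream in the given encoding."""
    for line in lines:
        stream.write((line + "\n").encode(encoding))


def read_all_text(stream, encoding):
    """Rewind the stream and decode its bytes with the encoding used for saving."""
    stream.seek(0)
    return stream.read().decode(encoding)


# Assumed encoding for this sketch; any encoding works as long as
# saving and decoding agree on it.
encoding = "utf-16-le"

buffer = io.BytesIO()
save_as_text(["Überschrift", "Zweiter Absatz"], buffer, encoding)

# Decoding with the same encoding recovers the original text.
text = read_all_text(buffer, encoding)

# Decoding with a different encoding does not raise here, but yields garbled text.
garbled = buffer.getvalue().decode("latin-1")
```

In .NET the same pattern would typically use a MemoryStream and the matching Encoding object; the sketch above only shows why the two encodings have to agree.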

Unfortunately, the PresentationDocument.Save method currently doesn't have that capability (saving to TXT).
Would you be interested in something like that?

Regards,

Mario
GemBox d.o.o.