Wednesday, 25 January 2017

Converting Word, HTML, PowerPoint and PDF documents to text

The F# Journal just published an article:

"The first challenge in Natural Language Processing (NLP) is usually converting available documents into text ready for processing. This article looks at functions that convert Word, HTML, PowerPoint and PDF documents into text using the Microsoft.Office.Interop.Word, HtmlAgilityPack, Spire.Presentation and iTextSharp NuGet packages, respectively..."

