Monday, 19 September 2016

Relating PDFs

The F# Journal just published an article about processing documents:
"This article tackles the challenge of computing the relationships between a set of PDF files using the commonality of words within them. The iTextSharp library is used to extract the text in PDF documents and the StemmersNet library is then used to convert the words into word stems. A simple numerical method is used to compute the commonality between the word frequencies in different documents and the resulting relationships are visualized using GraphViz..."
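
The excerpt does not say which "simple numerical method" the article uses, so the following is only a minimal sketch of the last two stages under stated assumptions: cosine similarity stands in for the commonality measure, a naive lower-case split (the hypothetical `wordCounts`) stands in for the iTextSharp extraction and StemmersNet stemming steps, and `toDot` and its `threshold` parameter are illustrative names rather than anything from the article.

```fsharp
open System

// Hypothetical stand-in for text extraction and stemming: lower-case the text
// and split on whitespace and basic punctuation. The article uses iTextSharp
// and StemmersNet for these steps instead.
let wordCounts (text: string) =
    text.ToLowerInvariant().Split(
        [| ' '; '\n'; '\t'; '.'; ','; ';'; ':' |],
        StringSplitOptions.RemoveEmptyEntries)
    |> Seq.countBy id
    |> Map.ofSeq

// Assumed commonality measure: cosine similarity between two word-frequency maps.
let cosine (a: Map<string, int>) (b: Map<string, int>) =
    let dot =
        a
        |> Map.toSeq
        |> Seq.sumBy (fun (w, n) ->
            match Map.tryFind w b with
            | Some m -> float n * float m
            | None -> 0.0)
    let norm (m: Map<string, int>) =
        m |> Map.toSeq |> Seq.sumBy (fun (_, n) -> float n * float n) |> sqrt
    let d = norm a * norm b
    if d = 0.0 then 0.0 else dot / d

// Emit an undirected GraphViz graph with one labelled edge for each pair of
// documents whose similarity reaches the threshold.
let toDot (threshold: float) (docs: (string * string) list) =
    let counts = docs |> List.map (fun (name, text) -> name, wordCounts text)
    let edges =
        [ for i in 0 .. counts.Length - 1 do
            for j in i + 1 .. counts.Length - 1 do
                let (n1, c1), (n2, c2) = counts.[i], counts.[j]
                let s = cosine c1 c2
                if s >= threshold then
                    yield sprintf "  \"%s\" -- \"%s\" [label=\"%.2f\"];" n1 n2 s ]
    String.concat "\n" [ yield "graph pdfs {"; yield! edges; yield "}" ]
```

Passing a list of (name, text) pairs to `toDot` gives DOT source that can be rendered with, for example, `dot -Tpng`. Raising the threshold thins the graph out so that only strongly related documents stay connected.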

To read this article and more, subscribe to The F# Journal today!
