During a large project, I was asked to develop a tool that could compare data from text files with data from another system (though formatted differently) and with data in PDF files!

I just needed to get the text of the individual PDF pages; then I could easily do the comparison with regular expression magic. To get the text out of the PDF files, I could have written my own PDF parser. Sigh! That was not an option, as the project was on a tight schedule.

Luckily for me, there is an open source project called iTextSharp, a PDF library for .NET that can be used from C#. Using it, I was able to read in the PDF and quickly extract the text of the individual pages with the following code (after adding references to the appropriate assemblies, of course):

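using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

private string PdfAsText(string file)
{
    var result = new StringBuilder();

    var reader = new PdfReader(file);
    try
    {
        // PDF page numbers are 1-based.
        var numPages = reader.NumberOfPages;

        for (var page = 1; page <= numPages; page++)
        {
            // Extract the text of a single page and append it to the result.
            var text = PdfTextExtractor.GetTextFromPage(reader, page);
            result.AppendLine(text);
        }
    }
    finally
    {
        // Release the underlying file once extraction is done.
        reader.Close();
    }

    return result.ToString();
}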
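
With the text in hand, the comparison itself is mostly a matter of pulling the interesting values out with a regular expression. The following is only a rough sketch of that idea, not the code from the project: the record format ("ID: ... Amount: ...") and the names PageTextComparer, ExtractRecords and FindMismatches are made up for illustration, so the pattern would have to be adapted to the real data.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PageTextComparer
{
    // Hypothetical record format, e.g. "ID: 12345 Amount: 99.50".
    // Adjust this pattern to whatever the real files contain.
    private static readonly Regex RecordPattern =
        new Regex(@"ID:\s*(?<id>\d+)\s+Amount:\s*(?<amount>\d+\.\d{2})");

    // Collects id -> amount pairs from a block of text.
    // If an id occurs more than once, the last occurrence wins.
    public static Dictionary<string, string> ExtractRecords(string text)
    {
        var records = new Dictionary<string, string>();
        foreach (Match match in RecordPattern.Matches(text))
        {
            records[match.Groups["id"].Value] = match.Groups["amount"].Value;
        }
        return records;
    }

    // Returns the ids from the expected text that are either missing
    // from the PDF text or carry a different amount there.
    public static List<string> FindMismatches(string expectedText, string pdfText)
    {
        var expected = ExtractRecords(expectedText);
        var actual = ExtractRecords(pdfText);
        var mismatches = new List<string>();

        foreach (var pair in expected)
        {
            string amount;
            if (!actual.TryGetValue(pair.Key, out amount) || amount != pair.Value)
            {
                mismatches.Add(pair.Key);
            }
        }
        return mismatches;
    }
}
```

The idea would be to pass the contents of the text file as the first argument and the string returned by PdfAsText as the second. Keying the records by id means that differences in layout or whitespace between the two sources do not matter, only the extracted values do.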