ITextSharp and the conversion to TXT
Message
From
11/11/2011 19:57:32
To
All
General information
Forum:
ASP.NET
Category:
Third-party products
Title:
ITextSharp and the conversion to TXT
Environment versions
Environment:
VB 9.0
OS:
Windows 7
Network:
Windows 2003 Server
Database:
MS SQL Server
Application:
Web
Miscellaneous
Thread ID:
01528722
Message ID:
01528722
Views:
81
I use the following with iTextSharp to convert the PDF to TXT:
        ' Get the TXT from a PDF
        Public Function GetTXT() As Boolean
            Dim lcValue As String = ""
            Dim llSuccess As Boolean = False
            Dim lnCounter As Integer = 0
            Dim lnType As Integer = -1
            Dim loByte() As Byte = Nothing
            Dim loPRTokeniser As iTextSharp.text.pdf.PRTokeniser = Nothing
            Dim loStringBuilder As System.Text.StringBuilder = New System.Text.StringBuilder

            ' Reset the value
            cMessage = ""
            cPDF = ""

            Try

                ' For each page
                For lnCounter = 1 To nPage
                    loByte = oPdfReader.GetPageContent(lnCounter)

                    ' If we have something
                    If Not IsNothing(loByte) Then
                        loPRTokeniser = New iTextSharp.text.pdf.PRTokeniser(loByte)

                        ' For as long as we have something
                        While loPRTokeniser.NextToken()
                            lnType = loPRTokeniser.TokenType()
                            lcValue = loPRTokeniser.StringValue

                            ' If this is a string
                            If lnType = 2 Then
                                loStringBuilder.Append(lcValue)

                            ' If this is a numeric and -600
                            ElseIf lnType = 1 AndAlso lcValue = "-600" Then
                                loStringBuilder.Append(" ")

                            ' If this is other and TJ
                            ElseIf lnType = 10 AndAlso lcValue = "TJ" Then
                                loStringBuilder.Append(" ")
                            End If


                        End While

                    End If

                Next

                ' Make it available
                cPDF = loStringBuilder.ToString()

                llSuccess = True
            Catch loError As Exception
                cMessage = loError.Message
            End Try

            Return llSuccess
        End Function
This works, and I do not have the problem with French accented characters in the values that I had with PDF2TXT. However, this one converts the PDF into a TXT in one big chunk. So, in order to extract the values, we have to find a delimiter, which is always the next field.

However, in some places, the client has entered the data on several lines. One specific entry in the form is the address, which contains everything: the address is on the first line, the city on the next, and so on. As this is one big chunk of text, I have no way of determining where the address ends and the city begins, because there is no marker between them; the two are simply concatenated. Another problem is with long memo fields, where the client has intentionally entered the data on several lines and we need to recognize the carriage returns. This is no longer possible here.

So, basically, I am worse off with this approach than with PDF2TXT. Does anyone know if there is a better approach to convert the PDF to TXT with iTextSharp, in a format similar to PDF2TXT, where we get a readable report in a TXT file laid out as it is in the PDF?
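
One possible direction, sketched here under assumptions rather than tested against real forms: PDF content streams move to a new line with text-positioning operators such as T* and Td/TD (with a non-zero vertical offset), so the token loop could emit a line break when it sees them. The class name LineAwareText and its members are hypothetical and not part of iTextSharp; the token type numbers (1, 2, 10) are the ones used in GetTXT() above. Documents that position each line with Tm, or that draw text out of reading order, would need more than this.

```vb
Imports System
Imports System.Globalization
Imports System.Text

' Hypothetical helper (not part of iTextSharp): collects content-stream tokens
' and inserts a line break when a text-positioning operator starts a new line.
Public Class LineAwareText
    Private ReadOnly loStringBuilder As New StringBuilder()
    Private lnLastNumber As Double = 0

    ' lnType: 1 = number, 2 = string, 10 = other (operator), as in GetTXT()
    Public Sub Add(ByVal lnType As Integer, ByVal lcValue As String)
        Select Case lnType
            Case 2
                ' Text shown on the page
                loStringBuilder.Append(lcValue)
            Case 1
                ' Remember the most recent number; for Td/TD it is the vertical offset
                Double.TryParse(lcValue, NumberStyles.Float, CultureInfo.InvariantCulture, lnLastNumber)
                If lcValue = "-600" Then loStringBuilder.Append(" ")
            Case 10
                If lcValue = "TJ" Then
                    loStringBuilder.Append(" ")
                ElseIf lcValue = "T*" Then
                    ' T* always moves to the start of the next line
                    NewLine()
                ElseIf (lcValue = "Td" OrElse lcValue = "TD") AndAlso lnLastNumber <> 0 Then
                    ' Td/TD with a non-zero vertical offset: assume a new line
                    NewLine()
                End If
        End Select
    End Sub

    Private Sub NewLine()
        ' Avoid a leading blank line at the start of the text
        If loStringBuilder.Length > 0 Then loStringBuilder.Append(Environment.NewLine)
    End Sub

    Public Overrides Function ToString() As String
        Return loStringBuilder.ToString()
    End Function
End Class
```

Inside the While loop, the string, number and operator checks would become a single call such as loText.Add(lnType, lcValue), with cPDF = loText.ToString() after the loop. If the installed iTextSharp is a 5.x release, it may also be worth checking iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(oPdfReader, lnCounter), which is designed to return the page text with line breaks.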
Michel Fournier
Level Extreme Inc.
Designer, architect, owner of the Level Extreme Platform