PDF conversion - Level Extreme

PDF conversion

Message

From

05/10/2015 13:53:36

Daniel Gadenne
France

05/10/2015 13:17:48

Colin Northway
Colin Northway Associates
London, United Kingdom

General information

Forum: Visual FoxPro
Category: Other
Title: Re: PDF conversion

Environment versions

Visual FoxPro: VFP 9 SP2
OS: Windows 10
Network: Windows 2008 Server
Database: MS SQL Server
Application: Web

Miscellaneous

Thread ID: 01625477
Message ID: 01625481

Hi Colin,

I have no idea what the best solution is. What I do know is that, if I were in your situation, I'd certainly try to extract the text content with my current copy of quickpdf. This sizeable API offers a lot in the field of "extraction", as it calls it: from GetPageText, with its many parameters, to more specific text extraction functions.

May I quote the documentation on their GetPageText:
Description - This function provides two different methods

"Using the standard text extraction algorithm:
0 = Extract text in human readable format
1 = Deprecated
2 = Return a CSV string including font, color, size and position of each piece of text on the page
Using the more accurate but slower text extraction algorithm:
3 = Return a CSV string for each piece of text on the page with the following format:
Font Name, Text Color, Text Size, X1, Y1, X2, Y2, X3, Y3, X4, Y4, Text
The co-ordinates are the four points bounding the text, measured using the units set with the SetMeasurementUnits function and the origin set with the SetOrigin function. Co-ordinate order is anti-clockwise with the bottom left corner first.
4 = Similar to option 3, but individual words are returned, making searching for words easier
5 = Similar to option 3 but character widths are output after each block of text
6 = Similar to option 4 but character widths are output after each line of text
7 = Extract text in human readable format with improved accuracy compared to option 0
8 = Similar output format as option 0 but using the more accurate algorithm. Returns unformatted lines."
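
To give an idea of what you could do with the option 3 output once you have it, here is a rough sketch in Python (not VFP, and not part of quickpdf) that splits such a CSV string into records following the column order quoted above. The sample string, the parse_page_text_csv function and the TextPiece class are all made up for illustration, and I am assuming standard CSV quoting of the text field, so check it against what your copy of the library really returns.

```python
import csv
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextPiece:
    """One row of the option 3 CSV, in the column order quoted above."""
    font: str
    color: str
    size: float
    corners: List[Tuple[float, float]]  # four (x, y) points, bottom left first
    text: str


def parse_page_text_csv(raw: str) -> List[TextPiece]:
    """Parse a CSV string laid out as:
    Font Name, Text Color, Text Size, X1, Y1, X2, Y2, X3, Y3, X4, Y4, Text
    Assumes standard CSV quoting (an assumption, not confirmed by the docs).
    """
    pieces = []
    for row in csv.reader(raw.splitlines(), skipinitialspace=True):
        if len(row) < 12:
            continue  # skip blank or incomplete lines
        coords = [float(v) for v in row[3:11]]
        corners = list(zip(coords[0::2], coords[1::2]))
        # Re-join trailing fields in case the text held unquoted commas.
        text = ",".join(row[11:])
        pieces.append(TextPiece(row[0], row[1], float(row[2]), corners, text))
    return pieces


# Hypothetical sample data, invented to match the quoted column layout.
sample = (
    '"Helvetica",#000000,12,72,700,130,700,130,712,72,712,"Invoice 42"\n'
    '"Helvetica",#000000,10,72,680,200,680,200,690,72,690,"Total, net"\n'
)

pieces = parse_page_text_csv(sample)
for piece in pieces:
    print(piece.corners[0], piece.text)
```

With the bounding points in hand, you can sort the pieces by Y then X to rebuild reading order, or pick out the text that falls inside a known region of the page.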

The issue? PDF is not exactly a nice format to extract data from :(

My two cents, as a satisfied user with no vested interest in this dev shop in Australia :)

Daniel
