3

I don't like it, but I'm forced to ask a tech question because my friend Google has no idea how to solve this problem...
So, I have a PDF with a bunch of points, each with a number inside. I have to produce a list of those numbers with the X and Y coordinates of each point.
What I have tried: converting the PDF to HTML and extracting the positions of the divs. That failed completely because a lot of points were distorted, mixed up, contained extra numbers, etc. It's just not precise enough after conversion.

Comments
  • 3
    Have you taken a look at the PDF with a hex editor?

    PDF can contain literally anything.

    If the coordinate system is e.g. a chain of commands inside the XREF, it boils down to matrix conversion I think.
  • 6
    This is absurd. Why don't you have the original data? Whatever you do won't be robust
  • 2
    If you’re lucky the points are rendered as SVGs and you can read their absolute positioning in their viewports.
  • 0
    @IntrusionCM I'll give it a try
  • 0
    @electrineer There is no original data. It was a Visio document and the points were assigned by hand. Basically everything is done manually here and I'm the person who wants to bring some automation...
  • 0
    @platypus This looks promising
  • 1
    @blindXfish to give a bit better explanation...

    If it uses e.g. a bbox, it has coordinates within the PDF (the rectangular region within the PDF).

    The not so fun part is to extract the objects for the coordinate points inside the bbox.

    Since you know the rectangular region of the bbox, and every object has a position inside the bbox (one possibility, there are more), you can calculate its position in the coordinate system.

    https://github.com/jsvine/...

    There are libraries (a fuckton of them) that do exactly this in an easier manner; see the example above.
  • 2
    OpenCV maybe?
  • 2
    Should be fairly simple

    Screenshot/render to image -> circle Hough transform -> the transform will give you the circle coordinates

    If you know that the circles are fixed in size you can use a pattern detection instead, like a convolution with the image of one circle. But the Hough transform is a pretty robust method.

    OpenCV should have this built in I think.
  • 2
    Thank you guys. Finally I found a way. I could change the Visio file from mm to point size and run a built-in report function, which gave me the point locations. 🤗
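
For anyone landing here with only the PDF: the bounding-box suggestion in the comments comes down to a bit of arithmetic once some library has handed you a box per number. A rough sketch, assuming each label arrives as a dict with hypothetical keys `text`, `x0`, `x1`, `top`, `bottom`, measured in PDF points from the top-left corner of the page (that layout is an assumption, nothing in the thread specifies it). The point's position is taken to be the centre of its box, and `label_centres` is just an illustrative name.

```python
PT_TO_MM = 25.4 / 72  # one PDF point is 1/72 inch


def label_centres(words, page_height_pt, flip_y=True):
    """Return (text, x_mm, y_mm) for each label's bounding box.

    words: list of dicts with the hypothetical keys described above.
    flip_y: if True, report y measured from the bottom of the page
            instead of from the top.
    """
    rows = []
    for w in words:
        cx = (w["x0"] + w["x1"]) / 2        # horizontal centre of the box
        cy = (w["top"] + w["bottom"]) / 2   # vertical centre, measured from the top
        if flip_y:
            cy = page_height_pt - cy        # switch to a bottom-left origin
        rows.append((w["text"], round(cx * PT_TO_MM, 2), round(cy * PT_TO_MM, 2)))
    return rows


# Made-up boxes on a 792 pt tall page, only to show the call.
sample = [
    {"text": "17", "x0": 66.0, "x1": 78.0, "top": 30.0, "bottom": 42.0},
    {"text": "4", "x0": 140.0, "x1": 148.0, "top": 700.0, "bottom": 712.0},
]
for row in label_centres(sample, page_height_pt=792.0):
    print(row)
```

Whether you want `flip_y` depends on where the target system puts its origin, so check one known point by hand before trusting the whole list.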
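
The render-to-image route from the comments can be sketched too. On a real render you would most likely reach for OpenCV's `cv2.HoughCircles`, which also copes with an unknown radius. The toy version below has no dependencies and only illustrates the voting idea for a fixed, known radius: every edge pixel votes for all the centres it could belong to, and the peaks are the circle centres. The names `circle_offsets` and `hough_circle_centres`, the `min_fraction` threshold and the synthetic test image are all assumptions made for illustration.

```python
import math


def circle_offsets(r):
    """Integer (dx, dy) pixel offsets approximating a circle outline of radius r."""
    return {
        (round(r * math.cos(math.radians(a))), round(r * math.sin(math.radians(a))))
        for a in range(360)
    }


def hough_circle_centres(edges, r, min_fraction=0.6):
    """Fixed-radius circle Hough transform on a 2D list of 0/1 edge pixels.

    Returns the (x, y) centres whose vote count reaches min_fraction of a
    full outline, keeping only the strongest candidate within distance r.
    """
    h, w = len(edges), len(edges[0])
    offsets = circle_offsets(r)
    votes = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not edges[y][x]:
                continue
            # An edge pixel at (x, y) could belong to any centre r away from it.
            for dx, dy in offsets:
                cx, cy = x - dx, y - dy
                if 0 <= cx < w and 0 <= cy < h:
                    votes[cy][cx] += 1
    threshold = min_fraction * len(offsets)
    candidates = sorted(
        ((votes[y][x], x, y) for y in range(h) for x in range(w)
         if votes[y][x] >= threshold),
        reverse=True,
    )
    centres = []
    for _, x, y in candidates:  # strongest first, drop anything near an accepted peak
        if all(math.hypot(x - cx, y - cy) > r for cx, cy in centres):
            centres.append((x, y))
    return sorted(centres)


# Synthetic 40x30 "render" with two circle outlines of radius 6.
edges = [[0] * 40 for _ in range(30)]
for cx, cy in [(10, 12), (28, 15)]:
    for dx, dy in circle_offsets(6):
        edges[cy + dy][cx + dx] = 1
print(hough_circle_centres(edges, r=6))
```

The centres come back in pixels, so they still need scaling by the render resolution to get page units, and the number inside each circle has to be read separately (OCR, or matching against the PDF text).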