4th January, 2009 - Posted by MsD
*without making yourself crazy!
Has this ever happened to you? You’ve tossed your source Word document, and now you need the content from the PDF file you made from it for use in another Word document or some other file. No problem. You just open the PDF file and choose File > Export > RTF.
Flash forward a few minutes, and your confidence in the task takes a nosedive. Why? Each and every piece of your PDF file is safely ensconced in its own Word frame.
Acrobat offers an export setting that gives you at least some predictability, although you’re not likely to get perfect output unless you started with a perfect source file before building the original PDF: a page of plain text with no styles, no image wraps, and no headers or footers. Essentially, anything you could write in Notepad survives a multi-step conversion well. But I digress.
Whether you get frames or no frames depends on a single setting. When exporting the file, choose File > Export > RTF, then click Settings to open the Settings dialog box, where you’ll see two radio buttons.
The left option, Retain Flowing Text, exports the content as one continuous stream of text from top to bottom, keeping font and style information intact where possible. The right radio button, Retain Page Layout, exports the content as you see it laid out on the page. To keep each item in the same position relative to the others, Acrobat encloses every placed item in a frame.
Look at the composite image next. At the center is the PDF page. The left pullout shows the impact of retaining the page layout, while the right pullout shows the effect of retaining the text flow.
So what does this mean to you? If you want a few discrete items, go for Retain Page Layout; for a longer run of text, leave the default, Retain Flowing Text. You’re not going to get perfect results either way. For that, you’d need your original document.