Embedded fonts in PDF: copy and paste problems
When I try to copy text from a PDF document that has sets of fonts embedded and paste it into an MS Word document, the result is illegible.
Several symbols are changed or disappear.
Using Adobe Acrobat I can check which specific fonts are embedded.
- Would installing such fonts in Microsoft Word work out?
- If so, can I extract or create subsets of the fonts I need?
- If not, how can I solve this problem?
You should check the PDF document's fonts first, with the help of the pdffonts utility. It is part of the Xpdf package for Windows, and it can be used without installing, from a DOS box.
In order to extract text (or copy'n'paste it) from a PDF, the font should either use a standard encoding (not a custom one), or it should have a /ToUnicode table associated with it inside the PDF.
pdffonts returns a few basic information items about the fonts used by a PDF.
Example output:
    $ pdffonts -f 3 -l 5 sample.pdf
    name                      type          encoding     emb sub uni object ID
    ------------------------- ------------- ------------ --- --- --- ---------
    IADKRB+Arial-BoldMT       CID TrueType  Identity-H   yes yes yes     10  0
    SSKFGJ+ArialMT            CID TrueType  Custom       yes yes no      11  0
The command above asked for the fonts used in the page range from 3 (first page to check) to 5 (last page to check).
In the above case, both fonts used are embedded subsets (indicated by the XYZABC+ style prefixes to their names, and by the yes in the emb and sub columns).
The font SSKFGJ+ArialMT uses a custom encoding, and the PDF has no /ToUnicode table for this font, as indicated by the no entry in the column headed uni.
Hence it is not easy to extract the text shown with this font (extraction would require manual reverse engineering, or something that can "read" the PDF pages).
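The check described above can be scripted. What follows is only a minimal Python sketch, assuming the column layout shown in the example output (other pdffonts builds may print different columns). The names FontRow, parse_pdffonts, hard_to_extract and run_pdffonts are made up for illustration and are not part of Xpdf.

```python
import subprocess
from typing import NamedTuple


class FontRow(NamedTuple):
    name: str
    font_type: str
    encoding: str
    emb: bool
    sub: bool
    uni: bool


def parse_pdffonts(output: str) -> list[FontRow]:
    """Parse a pdffonts table laid out like the example output (assumed layout)."""
    lines = output.splitlines()
    # Data rows start right after the dashed separator line.
    sep = next((i for i, ln in enumerate(lines) if ln.startswith("---")), None)
    if sep is None:
        return []
    rows = []
    for line in lines[sep + 1:]:
        tokens = line.split()
        # Expect at least: name, type, encoding, emb, sub, uni, object number, generation.
        if len(tokens) < 8:
            continue
        # Split from the right: the type column may contain spaces
        # ("CID TrueType"), the last six tokens do not.
        encoding, emb, sub, uni = tokens[-6:-2]
        rows.append(FontRow(tokens[0], " ".join(tokens[1:-6]), encoding,
                            emb == "yes", sub == "yes", uni == "yes"))
    return rows


def hard_to_extract(rows: list[FontRow]) -> list[str]:
    """Names of fonts with a custom encoding and no /ToUnicode table."""
    return [r.name for r in rows if r.encoding.lower() == "custom" and not r.uni]


def run_pdffonts(pdf_path: str, first: int, last: int) -> str:
    """Run pdffonts on a page range; assumes the pdffonts binary is on PATH."""
    result = subprocess.run(
        ["pdffonts", "-f", str(first), "-l", str(last), pdf_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# The example output from above, used here instead of a real PDF.
SAMPLE = """\
name                      type          encoding     emb sub uni object ID
------------------------- ------------- ------------ --- --- --- ---------
IADKRB+Arial-BoldMT       CID TrueType  Identity-H   yes yes yes     10  0
SSKFGJ+ArialMT            CID TrueType  Custom       yes yes no      11  0
"""

if __name__ == "__main__":
    print(hard_to_extract(parse_pdffonts(SAMPLE)))
```

With a real file you would feed run_pdffonts("sample.pdf", 3, 5) into parse_pdffonts instead of SAMPLE. A font that this sketch does not flag may still copy badly; it only applies the rule of thumb stated above.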
You should check first whether copy'n'pasting of the text works if you use a simple text file as the target (not an MS Word document). If that doesn't work, you can forget about MS Word...
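If you would rather not judge the pasted text by eye, a crude check can help. The sketch below is a heuristic of my own choosing, not something pdffonts or Acrobat provides. It assumes that badly mapped glyphs tend to show up as replacement characters, private-use code points or control characters, which is common but not guaranteed. The function name suspicious_ratio is hypothetical.

```python
import unicodedata


def suspicious_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that look like failed glyph mappings."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    # Assumption: U+FFFD, private-use (Co), control (Cc) and unassigned (Cn)
    # code points are signs of a font without a usable Unicode mapping.
    bad = sum(
        1 for c in chars
        if c == "\ufffd" or unicodedata.category(c) in {"Co", "Cc", "Cn"}
    )
    return bad / len(chars)


if __name__ == "__main__":
    # Paste the text copied from the PDF into a plain text file first,
    # then pass its contents to suspicious_ratio().
    print(suspicious_ratio("Plain readable text"))
```

A ratio near zero only says the characters are ordinary ones. Text can still be wrong if the font maps glyphs to the wrong (but valid) letters, so a quick visual comparison with the PDF page is still needed.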
- Would installing such fonts in Microsoft Word work out?
- Very likely: no. (I cannot give a definite answer without having access to the PDF in question myself.)
- If so, can I extract or create subsets of the fonts I need?
- You can extract the subsetted fonts from the PDF itself. (Funnily, my most popular Stack Overflow answer deals with that question -- dunno why people seem so crazy about extracting fonts from PDF files for other than debugging purposes...)
- If not, how can I solve this problem?
- There is no solution other than doing it manually.
Update
You can, unfortunately, not get the same info about the fonts used by a PDF via Acrobat or Adobe Reader. What you can get, via menu -> File -> Properties..., is:
- the font names,
- the subset info (but not prefixes used subset font names),
- the encoding and
- the font type.
But you get no info about the presence of a /ToUnicode table.