PHP-GD: Dealing with Unicode characters

Posted by: admin, July 12, 2020

Questions:

I am developing a web service that renders characters using the PHP GD extension, using a user-selected TTF font.

This works fine in ASCII-land, but there are a few problems:

  1. The string to be rendered comes in as UTF-8. I would like to limit the list of user-selectable fonts to those that can render the string properly, as some fonts only have glyphs for ASCII, ISO 8859-1 (Latin-1), etc.

  2. In the case where some decorative characters are included, it would be fine to render the majority of characters in the selected font and render the decorative characters in Arial (or whatever font contains the extended glyphs).

It does not seem like PHP-GD has support for querying the font metadata sufficiently to figure out if a character can be rendered in a given font. What is a good way to get font metrics into PHP? Is there a command-line utility that can dump in XML or other parsable format?

Answer:

PHP-Cairo built against Pango and fontconfig should have enough brains to do font substitution when appropriate.
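
If staying with GD, a rough version of that substitution can be done by hand: split the text into runs, tag each run with the first font that covers it, and draw the runs one after another. The sketch below assumes each font's coverage is already available as a set of code points (how that set is obtained is left open), and the function names and font paths are hypothetical. Kerning across run boundaries is lost, and the bounding box is only an approximation of the advance width.

```php
<?php
// Sketch only: needs the mbstring extension; draw_runs() also needs GD with FreeType.

/**
 * Split UTF-8 text into runs, each tagged with the first font that covers it.
 * $fonts maps a font file path to a coverage set of the form [codepoint => true],
 * listed in order of preference.
 */
function split_into_font_runs(string $text, array $fonts): array
{
    $runs = [];
    foreach (mb_str_split($text, 1, 'UTF-8') as $ch) {
        $cp = mb_ord($ch, 'UTF-8');
        $chosen = null;
        foreach ($fonts as $path => $covered) {
            if (isset($covered[$cp])) {
                $chosen = $path;
                break;
            }
        }
        // No font covers this character: fall back to the preferred font.
        $chosen ??= array_key_first($fonts);

        $last = count($runs) - 1;
        if ($last >= 0 && $runs[$last]['font'] === $chosen) {
            $runs[$last]['text'] .= $ch;          // extend the current run
        } else {
            $runs[] = ['font' => $chosen, 'text' => $ch];
        }
    }
    return $runs;
}

/** Draw the runs left to right, starting each run where the previous one ended. */
function draw_runs($image, array $runs, float $size, int $x, int $y, int $color): void
{
    foreach ($runs as $run) {
        $box = imagettftext($image, $size, 0, $x, $y, $color, $run['font'], $run['text']);
        if ($box === false) {
            throw new RuntimeException('Could not render with ' . $run['font']);
        }
        $x = $box[2];   // x coordinate of the lower right corner of the run just drawn
    }
}
```

For example, with a decorative font that only covers a, b and c and a fallback that covers U+2605, the string "ab★c" splits into three runs: "ab" in the decorative font, "★" in the fallback, and "c" in the decorative font again.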

Answer:

You can try pdf_info_font() from the PDFlib extension. There is a good example here: http://www.pdflib.com/pdflib-cookbook/fonts/font-metrics-info/php-font-metrics-info/

Answer:

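If you don’t have a Unicode font, you’ll need to try something like this:

<?php
$trans = new Latin1UTF8();

$mixed = "MIXED TEXT INPUT";

print "Original: " . $mixed . "\n";
print "Latin1:   " . $trans->mixed_to_latin1($mixed) . "\n";
print "UTF-8:    " . $trans->mixed_to_utf8($mixed) . "\n";

// Normalizes text that may mix Latin-1 bytes and UTF-8 sequences.
class Latin1UTF8 {

    private $latin1_to_utf8 = [];
    private $utf8_to_latin1 = [];

    public function __construct() {
        // Build lookup tables for every printable Latin-1 character.
        for ($i = 32; $i <= 255; $i++) {
            // utf8_encode() is deprecated as of PHP 8.2; mb_convert_encoding()
            // (mbstring extension) performs the same Latin-1 to UTF-8 conversion.
            $utf8 = mb_convert_encoding(chr($i), 'UTF-8', 'ISO-8859-1');
            $this->latin1_to_utf8[chr($i)] = $utf8;
            $this->utf8_to_latin1[$utf8] = chr($i);
        }
    }

    // Replace each UTF-8 sequence that encodes a Latin-1 character with its single byte.
    public function mixed_to_latin1($text) {
        foreach ($this->utf8_to_latin1 as $key => $val) {
            // Digit keys come back from the array as integers, so cast them.
            $text = str_replace((string) $key, $val, $text);
        }
        return $text;
    }

    public function mixed_to_utf8($text) {
        return mb_convert_encoding($this->mixed_to_latin1($text), 'UTF-8', 'ISO-8859-1');
    }
}
?>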

Taken from http://php.net/manual/en/function.utf8-decode.php

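If the text survives the round trip, meaning mixed_to_utf8() returns the same string as the original UTF-8 input, then every character fits in Latin-1 and a Latin-1-only font can render it. If not, it can’t.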
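
When the input is known to be UTF-8 already, the same question can probably be asked more directly by testing whether every character is at or below U+00FF. This is a minimal sketch of that idea, not part of the code from the manual page; the function name is hypothetical.

```php
<?php
// Sketch: true when every character of a valid UTF-8 string is at or below U+00FF,
// the range covered by ISO 8859-1 (Latin-1).
function fits_latin1(string $utf8): bool
{
    // The /u modifier makes PCRE treat the subject as UTF-8. On invalid UTF-8,
    // preg_match() returns false, so the comparison below is false as well.
    return preg_match('/\A[\x{00}-\x{FF}]*\z/u', $utf8) === 1;
}

var_dump(fits_latin1("caf\u{e9}"));      // é is U+00E9, inside Latin-1
var_dump(fits_latin1("snow \u{2603}"));  // U+2603 is outside Latin-1
```

A font that only has Latin-1 glyphs would then be offered only when fits_latin1() returns true for the requested string.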

Answer:

I ended up using the TTX utility (part of the fontTools package) to dump font metrics. I could then inspect the resulting .ttx files and look at the character-to-glyph map. I did this manually, but the XML files could also be parsed automatically.

An example GNU Makefile which generates the .ttx files from a set of TrueType fonts in the same directory:

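# Recipe lines must begin with a tab character, not spaces.
.PHONY: all fontmetrics clean

all: fontmetrics

# One .ttx dump per .ttf file in this directory.
fontmetrics: $(patsubst %.ttf,%.ttx,$(wildcard *.ttf))

clean:
	rm -f *.ttx

# Dump only the cmap (character-to-glyph) table.
%.ttx: %.ttf
	ttx -t cmap $<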
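
The automatic parsing mentioned above might look roughly like the following in PHP. This is a sketch under some assumptions: the dump has the layout ttx normally writes for the cmap table (map elements with a hexadecimal code attribute inside per-subtable elements), only Unicode subtables are read, and the function names are made up for illustration.

```php
<?php
// Sketch only: needs the SimpleXML and mbstring extensions.

/**
 * Collect the Unicode code points listed in a cmap dump made by `ttx -t cmap`.
 * Returns a set of the form [codepoint => true].
 */
function ttx_codepoints(string $ttxXml): array
{
    $doc = simplexml_load_string($ttxXml);
    if ($doc === false) {
        throw new InvalidArgumentException('Input is not well-formed XML');
    }
    // Read only Unicode subtables: platform 0, or Windows (3) with encoding 1 or 10.
    $query = '/ttFont/cmap/*[@platformID="0" or '
           . '(@platformID="3" and (@platEncID="1" or @platEncID="10"))]/map';
    $covered = [];
    foreach ($doc->xpath($query) as $map) {
        if (isset($map['code'])) {
            // Codes are written in hexadecimal, for example code="0x41".
            $covered[intval((string) $map['code'], 16)] = true;
        }
    }
    return $covered;
}

/** Return the distinct characters of a UTF-8 string that the coverage set lacks. */
function missing_chars(array $covered, string $text): array
{
    $missing = [];
    foreach (mb_str_split($text, 1, 'UTF-8') as $ch) {
        if (!isset($covered[mb_ord($ch, 'UTF-8')])) {
            $missing[] = $ch;
        }
    }
    return array_values(array_unique($missing));
}

// A tiny hand-written dump standing in for a real .ttx file.
$sample = <<<'XML'
<ttFont>
  <cmap>
    <cmap_format_4 platformID="3" platEncID="1" language="0">
      <map code="0x41" name="A"/>
      <map code="0xe9" name="eacute"/>
    </cmap_format_4>
  </cmap>
</ttFont>
XML;

$covered = ttx_codepoints($sample);
var_dump(missing_chars($covered, "A\u{e9}"));       // nothing missing
var_dump(missing_chars($covered, "A\u{2605}"));     // the star is missing
```

In a real setup the XML would come from something like file_get_contents() on one of the generated .ttx files, and a font would be offered to the user only when missing_chars() returns an empty array for the requested string.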