perl - Could File::Find::Rule be patched to automatically handle filename character encoding/decoding?

Suppose I have a file named æ (Unicode: U+00E6, UTF-8: 0xC3 0xA6) in the current directory.

Then I would like to use File::Find::Rule to locate it:

    use feature qw(say);
    use open qw( :std :utf8 );
    use strict;
    use utf8;
    use warnings;

    use File::Find::Rule;

    my $fn = 'æ';
    my @files = File::Find::Rule->new->name($fn)->in('.');
    say $_ for @files;

The output is empty, so apparently this did not work.

If I try to encode the filename first:

    use Encode;

    my $fn = 'æ';
    my $fn_utf8 = Encode::encode('UTF-8', $fn, Encode::FB_CROAK | Encode::LEAVE_SRC);
    my @files = File::Find::Rule->new->name($fn_utf8)->in('.');
    say $_ for @files;

The output is:

Ã¦

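So it found the file, but the returned filename is not decoded into a Perl character string, and printing it through the :utf8 output layer encodes it a second time. To fix this, I can decode the result, replacing the last line with: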

    say Encode::decode('UTF-8', $_, Encode::FB_CROAK) for @files;

The question is whether both the encoding and the decoding could (or should) have been done automatically by File::Find::Rule, so that I could have used my original program and not have had to worry about encoding and decoding at all?

(For example, could File::Find::Rule have used I18N::Langinfo to determine that the current locale's codeset is UTF-8?)
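To make that idea concrete, here is a minimal sketch of a wrapper that does the encoding and decoding around File::Find::Rule. The helper name find_by_name and its optional third argument are hypothetical and not part of File::Find::Rule. The sketch assumes the locale's codeset (as reported by I18N::Langinfo) matches how the file names on disk are actually encoded, which is not guaranteed.

```perl
use strict;
use warnings;

use Encode qw(encode decode);
use I18N::Langinfo qw(langinfo CODESET);
use File::Find::Rule;

# Hypothetical helper (not part of File::Find::Rule): takes a decoded
# name pattern and returns decoded paths.
sub find_by_name {
    my ($name, $dir, $codeset) = @_;

    # Assumption: file names are encoded in the current locale's codeset,
    # e.g. "UTF-8". A caller may override this guess explicitly.
    $codeset //= langinfo(CODESET);

    # Encode the character string to bytes before matching on disk.
    my $name_bytes = encode($codeset, $name,
        Encode::FB_CROAK | Encode::LEAVE_SRC);

    # Decode every returned path back into a Perl character string.
    # FB_CROAK makes a badly-encoded file name die rather than pass silently.
    return map { decode($codeset, $_, Encode::FB_CROAK | Encode::LEAVE_SRC) }
        File::Find::Rule->new->name($name_bytes)->in($dir);
}

# Example (with "use utf8" and "use open qw(:std :utf8)" in effect):
#   print "$_\n" for find_by_name('æ', '.');
```

Because it croaks on any name that is not valid in the assumed codeset, this wrapper only works cleanly when every file name in the tree is well-formed.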

Yeah, I wish. If there's one major Perl project I'd work on, this would be it.

The issue is that there are badly-encoded file names, including file names encoded using a different encoding than the one expected. That means the first thing needed is a way of round-tripping badly-encoded file names through the decode-encode process. I think Python uses lone surrogate code points (its surrogateescape error handler) to represent the bad bytes.
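As an illustration of what such round-tripping could look like in Perl, here is a minimal sketch. It is not Python's scheme and not an existing Perl API: the function names and the ESCAPE_BASE constant are made up for this example. It assumes UTF-8 file names and maps each undecodable byte 0xNN to the code point 0x110000 + 0xNN. Those code points lie above the Unicode range, so strict UTF-8 decoding never produces them, but Perl strings can still hold them.

```perl
use strict;
use warnings;
no warnings 'non_unicode';   # we deliberately use code points above U+10FFFF

use Encode qw(encode decode);

# Hypothetical escape range: bad byte 0xNN becomes code point 0x110000 + 0xNN.
use constant ESCAPE_BASE => 0x110000;

# Bytes -> characters, never failing: invalid bytes are escaped, not dropped.
sub decode_lossless {
    my ($bytes) = @_;
    my $out = '';
    while (length $bytes) {
        # FB_QUIET decodes the longest valid prefix and leaves the
        # unprocessed remainder in $bytes.
        $out .= decode('UTF-8', $bytes, Encode::FB_QUIET);
        # Anything left starts with a byte that is not valid UTF-8 here.
        $out .= chr(ESCAPE_BASE + ord(substr($bytes, 0, 1, '')))
            if length $bytes;
    }
    return $out;
}

# Characters -> bytes: escaped code points turn back into the original bytes.
sub encode_lossless {
    my ($str) = @_;
    my $out = '';
    for my $i (0 .. length($str) - 1) {
        my $ch = substr($str, $i, 1);
        my $cp = ord $ch;
        if ($cp >= ESCAPE_BASE && $cp <= ESCAPE_BASE + 0xFF) {
            $out .= pack('C', $cp - ESCAPE_BASE);   # restore the raw byte
        }
        else {
            $out .= encode('UTF-8', $ch, Encode::FB_CROAK | Encode::LEAVE_SRC);
        }
    }
    return $out;
}
```

With this, encode_lossless(decode_lossless($bytes)) returns the original bytes for any input, so a badly-encoded name found on disk can still be passed back to open or unlink. The escaped characters are not valid Unicode, so printing them through a UTF-8 layer would warn or fail; a real design would have to decide how to display such names.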

You would also need a pragma to ensure backwards compatibility.
