Reading Chinese Menus: Concepts: grep
Aug. 16th, 2010 10:30 am
Just a quick post today, to mention one of the most useful computer tools I've found so far for helping me access and organise my vocab lists and transcribed menus: grep.
grep is a command-line tool that should be available on all Unixes (Linux, Solaris, OS X, etc.), and on all those I have access to, it deals just fine with Chinese characters. This means that I can easily check through all my text files to find, for example, dishes with prawns in: grep 蝦 *.txt
This is pretty powerful on its own, really, but the one thing it can't do is take account of simplified vs. traditional characters — and some of my lists/menus are copy-pasted from sources that use simplified characters, while the ones I've written/transcribed myself are in traditional characters.
So I wrote some Perl to make this easier, and you can find it on CPAN. It includes a command-line utility called dets (desensitise traditional-simplified), which builds a regexp from a string and can be used like so: grep `dets 蝦` *.txt. Here dets 蝦 returns [虾蝦], so the shell ends up running grep [虾蝦] *.txt, which matches either form of the character.
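For anyone curious about the idea behind building that regexp, here is a rough sketch in Python. It is not the Perl code from the CPAN module. The function names (desensitise, matching_lines) and the tiny four-character mapping table are made up for illustration, and a real tool would need a much fuller table, not least because some simplified characters correspond to more than one traditional character.

```python
import re

# Hypothetical, hand-picked sample of traditional -> simplified pairs.
# This is an assumption for illustration, not the module's real data.
TRAD_TO_SIMP = {
    "蝦": "虾",  # prawn
    "雞": "鸡",  # chicken
    "湯": "汤",  # soup
    "飯": "饭",  # cooked rice
}
SIMP_TO_TRAD = {simp: trad for trad, simp in TRAD_TO_SIMP.items()}


def desensitise(text):
    """Return regexp source that matches text in either script."""
    parts = []
    for ch in text:
        if ch in TRAD_TO_SIMP:
            simp, trad = TRAD_TO_SIMP[ch], ch
        elif ch in SIMP_TO_TRAD:
            simp, trad = ch, SIMP_TO_TRAD[ch]
        else:
            # No known variant: match the character literally.
            parts.append(re.escape(ch))
            continue
        # Character class listing the simplified form, then the traditional.
        parts.append("[" + simp + trad + "]")
    return "".join(parts)


def matching_lines(pattern, lines):
    """Tiny grep stand-in: keep the lines in which the pattern is found."""
    compiled = re.compile(pattern)
    return [line for line in lines if compiled.search(line)]


if __name__ == "__main__":
    menu = ["鮮蝦腸粉", "鲜虾肠粉", "叉燒包"]
    print(desensitise("蝦"))
    print(matching_lines(desensitise("蝦"), menu))
```

Each character with a known variant becomes a two-character class, and everything else is passed through as a literal, so a multi-character search string turns into a sequence of classes and literals.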
I realise I don't usually write about geek stuff on here, so eyes may be glazing over at this point — but if the owners of the remaining eyes have any comments, patches, or bug reports, I would love to hear them.