Kake ([personal profile] kake) wrote, 2010-08-16 10:30 am

Reading Chinese Menus: Concepts: grep

Just a quick post today, to mention one of the most useful computer tools I've found so far for helping me access and organise my vocab lists and transcribed menus — grep.

grep is a commandline tool that should be available on all Unixes (Linux, Solaris, OS X, etc.), and on all those I have access to, it deals just fine with Chinese characters. This means that I can easily search all my text files to find, for example, dishes with prawns in: grep 蝦 *.txt
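As a rough sketch of that kind of search, the following creates two made-up menu files (the filenames and dishes are invented for illustration) and looks for 蝦 in both. The -n flag asks grep to print line numbers, and when more than one file is searched each match is also prefixed with its filename.

```shell
# Scratch directory holding two invented sample menus.
dir=$(mktemp -d)
printf '%s\n' '椒鹽蝦' '清炒豆苗' > "$dir/menu-a.txt"
printf '%s\n' '蝦餃' '叉燒包' > "$dir/menu-b.txt"

# Search every .txt file for the traditional character for prawn.
matches=$(cd "$dir" && grep -n 蝦 *.txt)
printf '%s\n' "$matches"

# Tidy up the scratch directory.
rm -r "$dir"
```

Each output line has the form filename:line-number:matching-text, which makes it easy to jump back to the right menu.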

This is pretty powerful on its own, really, but the one thing it can't do is take account of simplified vs. traditional characters — and some of my lists/menus are copy-pasted from sources that use simplified characters, while the ones I've written/transcribed myself are in traditional characters.

So I wrote some Perl to make this easier, and you can find it on CPAN. It includes a commandline utility called dets (desensitise traditional-simplified) which builds a regexp from a string and can be used like so: grep `dets 蝦` *.txt (dets 蝦 returns [虾蝦]).
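To make the idea concrete, here is a toy stand-in, not the actual dets code: a hypothetical shell function, dets_stub, that knows only the 蝦/虾 pair and prints the same bracket expression quoted above. The sample file and its contents are invented, and the example assumes a UTF-8 locale.

```shell
# Hypothetical stand-in for dets: it knows one character pair only.
# Anything else is passed through unchanged.
dets_stub() {
  case "$1" in
    蝦|虾) printf '%s\n' '[虾蝦]' ;;
    *)     printf '%s\n' "$1" ;;
  esac
}

# Invented sample: one traditional line, one simplified, one without prawns.
menu=$(mktemp)
printf '%s\n' '蝦餃' '虾仁炒蛋' 'steamed rice' > "$menu"

# Quote the pattern so the shell never treats [虾蝦] as a filename glob.
pattern=$(dets_stub 蝦)
found=$(grep "$pattern" "$menu")
printf '%s\n' "$found"

# Tidy up the scratch file.
rm "$menu"
```

The quoting matters because [虾蝦] is also a valid shell glob: an unquoted substitution could be expanded to a filename if a file called 虾 or 蝦 happened to exist in the current directory.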

I realise I don't usually write about geek stuff on here, so eyes may be glazing over at this point — but if the owners of the remaining eyes have any comments, patches, or bug reports, I would love to hear them.

If you have any questions or corrections, please leave a comment (here's how) and let me know (or email me at kake@earth.li). See my introductory post to the Chinese menu project for what these posts are all about.

[personal profile] emperor 2010-08-16 12:42 pm (UTC)
I think $() is better style than ``, but YMMV :)

[personal profile] emperor 2010-08-16 02:39 pm (UTC)
The quoting for $() is a bit saner, too. I think you can nest backticks by escaping the inner ones, as in `foo \` bar \` `, but ...
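
A small sketch of the nesting difference, using plain echo so that nothing here depends on dets:

```shell
# $( ) nests without any escaping.
modern=$(echo "outer $(echo inner)")

# Backticks need the inner pair escaped with backslashes.
legacy=`echo outer \`echo inner\``

# Both variables hold the same string.
printf '%s\n' "$modern" "$legacy"
```

Both forms produce "outer inner"; the difference is only in how much escaping the nested command needs.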