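I learned via cwn that UTF8.substring had been added to extlib, so I thought I might finally be able to say goodbye to my own homegrown implementation and gave it a try. But
assert_equal (UTF8.substring "hoge" 0 5) "hoge"
unfortunately just ends with an "out of range" exception...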
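Unlike the substring you find in scripting languages (LL, "lightweight languages"), extlib's substring does not seem to accept arguments that go past the bounds of the string.
But that means the caller is always obliged to check the bounds first, which I think makes it awkward to use from application code.
After all, even if you catch the exception, "out of range" alone does not tell you in which way the range was exceeded.
substring "hoge" 0 100 overflows, and substring "hoge" 100 1 overflows too, and both raise the same exception.
With a scripting-language-style substring, though, I would want the former to return "hoge" and the latter to return the empty string.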
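As a rough illustration of the difference, here is a byte-based analogy that uses only the standard library. It is not extlib's UTF8.substring: String.sub stands in for the strict behavior (it raises Invalid_argument for both kinds of overflow), and lenient_sub is a hypothetical helper name made up for this sketch. It only handles bytes, so it is correct for ASCII strings like "hoge" but not for multibyte text.

```ocaml
(* Returns true if calling [f ()] raises Invalid_argument. *)
let raises_invalid_arg f =
  try ignore (f ()); false with Invalid_argument _ -> true

(* Hypothetical helper: clamp the requested byte range to the string
   instead of raising. Byte-based, so only suitable for ASCII input. *)
let lenient_sub s start len =
  let n = String.length s in
  let start = max 0 (min start n) in
  let len = max 0 (min len (n - start)) in
  String.sub s start len

let () =
  (* The strict version fails the same way in both cases. *)
  assert (raises_invalid_arg (fun () -> String.sub "hoge" 0 100));
  assert (raises_invalid_arg (fun () -> String.sub "hoge" 100 1));
  (* The lenient version gives the two results described above. *)
  assert (lenient_sub "hoge" 0 100 = "hoge");
  assert (lenient_sub "hoge" 100 1 = "")
```

The point of the sketch is only that clamping lets the two overflow cases produce different, useful results without the caller doing any bounds checking.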
So, here is the implementation in jingoo (excerpted from jingoo/jg_utils.ml).
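open ExtLib

(* substring base count str: up to [count] characters of [str], starting at
   character index [base]. Out-of-range arguments are clamped, not rejected. *)
let rec substring base count str =
  let len = UTF8.length str in
  if base >= len || count = 0 then
    ""  (* start is past the end, or nothing was requested *)
  else if base = 0 && count >= len then
    str  (* the whole string *)
  else if base < 0 then
    substring (len + base) count str  (* a negative base counts from the end *)
  else if base + count >= len then
    (* the range runs past the end: cut just after the last character *)
    let lp = UTF8.nth str base in
    let rp = UTF8.next str (UTF8.last str) in
    String.sub str lp (rp - lp)
  else
    (* ordinary case: turn character positions into byte offsets *)
    let lp = UTF8.nth str base in
    let rp = UTF8.nth str (base + count) in
    String.sub str lp (rp - lp)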
It behaves like this.
assert_equal (substring 0 0 "hoge") "";
assert_equal (substring 0 1 "hoge") "h";
assert_equal (substring 0 4 "hoge") "hoge";
assert_equal (substring 0 5 "hoge") "hoge";
assert_equal (substring 1 1 "hoge") "o";
assert_equal (substring 2 1 "hoge") "g";
assert_equal (substring 3 1 "hoge") "e";
assert_equal (substring 4 1 "hoge") "";
assert_equal (substring 5 0 "hoge") "";
assert_equal (substring 5 1 "hoge") "";
(** negative base *)
assert_equal (substring (-1) 1 "hoge") "e";
assert_equal (substring (-2) 1 "hoge") "g";
assert_equal (substring (-3) 1 "hoge") "o";
assert_equal (substring (-4) 1 "hoge") "h";
assert_equal (substring (-4) 2 "hoge") "ho";
assert_equal (substring (-4) 3 "hoge") "hog";
assert_equal (substring (-4) 4 "hoge") "hoge";
assert_equal (substring (-4) 5 "hoge") "hoge";
assert_equal (substring (-5) 1 "hoge") "e"
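The examples above only use ASCII, so here is a small sketch of the same clamping behavior on multibyte text. It is not the jingoo code: it avoids extlib and assumes OCaml 4.14 or later for String.get_utf_8_uchar and Uchar.utf_decode_length. The names utf8_length, byte_offset and substring_sketch are made up for this example, and unlike the excerpt it returns "" for a negative count and for an empty string.

```ocaml
(* Number of UTF-8 code points in [s] (invalid bytes count as one each). *)
let utf8_length s =
  let n = String.length s in
  let rec go pos k =
    if pos >= n then k
    else go (pos + Uchar.utf_decode_length (String.get_utf_8_uchar s pos)) (k + 1)
  in
  go 0 0

(* Byte offset of the [i]-th code point (i >= 0), or String.length s
   when [i] is past the end. *)
let byte_offset s i =
  let n = String.length s in
  let rec go pos k =
    if k = 0 || pos >= n then pos
    else go (pos + Uchar.utf_decode_length (String.get_utf_8_uchar s pos)) (k - 1)
  in
  go 0 i

(* Same argument order as the substring above: base, count, string. *)
let substring_sketch base count s =
  let len = utf8_length s in
  if len = 0 || count <= 0 then ""
  else
    (* A negative base wraps around from the end, e.g. -5 on length 4 is 3. *)
    let base = if base < 0 then ((base mod len) + len) mod len else base in
    if base >= len then ""
    else
      let lp = byte_offset s base in
      let rp = byte_offset s (base + count) in  (* clamped to the end *)
      String.sub s lp (rp - lp)

let () =
  assert (substring_sketch 0 5 "hoge" = "hoge");
  assert (substring_sketch (-5) 1 "hoge" = "e");
  assert (substring_sketch 1 1 "日本語" = "本");
  assert (substring_sketch (-1) 5 "日本語" = "語");
  assert (substring_sketch 0 100 "日本語" = "日本語")
```

Working in code points and converting to byte offsets only at the end is the same idea as the UTF8.nth calls in the excerpt; the modulo is just a non-recursive way to get the wrap-around seen in the (-5) case.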