Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

hfsplus: fix decomposition of Hangul characters

Files created under macOS cannot be opened under Linux if their names
contain Korean characters, and vice versa.

The Korean alphabet is special because its normalization is computed
arithmetically, without a table. The module handles this correctly when
composing, but forgets about it when decomposing.
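
As an illustration of the table-free arithmetic, here is a minimal
userspace sketch of Hangul composition, following the formula given in the
Unicode Standard. The function name hangul_compose and the constant names
are hypothetical; this is not the module's own composition code.

```c
#include <stdint.h>

/* Constants from the Unicode Standard's Hangul syllable arithmetic. */
enum {
	SBASE = 0xAC00, LBASE = 0x1100, VBASE = 0x1161, TBASE = 0x11A7,
	LCOUNT = 19, VCOUNT = 21, TCOUNT = 28,
};

/*
 * Compose a leading consonant, a vowel and an optional trailing consonant
 * (pass 0 for none) into one precomposed syllable. Returns 0 if the inputs
 * are not conjoining jamo in the expected ranges.
 */
static uint16_t hangul_compose(uint16_t l, uint16_t v, uint16_t t)
{
	int li = l - LBASE;
	int vi = v - VBASE;
	int ti = t ? t - TBASE : 0;

	if (li < 0 || li >= LCOUNT || vi < 0 || vi >= VCOUNT)
		return 0;
	/* A trailing consonant, when present, must map to index 1..27. */
	if (t && (ti <= 0 || ti >= TCOUNT))
		return 0;
	return SBASE + (li * VCOUNT + vi) * TCOUNT + ti;
}
```

For example, hangul_compose(0x1112, 0x1161, 0x11AB) yields 0xD55C, and
hangul_compose(0x1100, 0x1161, 0) yields 0xAC00, the first syllable of the
block.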

Fix this using the Hangul decomposition function provided in the Unicode
Standard. The code fits a bit awkwardly because it requires a buffer,
while all the other normalizations are returned as pointers to the
decomposition table. Returning table pointers directly is actually also a
bug, because reordering may still be needed, but leave that as it is for
now.
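
To illustrate why the buffer is needed, here is a minimal userspace sketch
of the two return paths. Table-backed results can point into static
storage, while the computed Hangul result needs caller-provided storage.
The names decompose, decompose_from_table and demo_table are hypothetical,
and the one-entry table is only a stand-in for the module's real
decomposition table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in table: U+00E9 decomposes to U+0065 U+0301. */
static const uint16_t demo_table[] = { 0x0065, 0x0301 };

/* Table-backed lookup: the result points into static storage. */
static const uint16_t *decompose_from_table(uint32_t uc, int *size)
{
	if (uc == 0x00E9) {
		*size = 2;
		return demo_table;
	}
	*size = 0;
	return NULL;
}

/* Computed decomposition: writes 2 or 3 units to @out, or returns 0. */
static int try_decompose_hangul(uint32_t uc, uint16_t *out)
{
	const int sbase = 0xAC00, lbase = 0x1100, vbase = 0x1161;
	const int tbase = 0x11A7, tcount = 28;
	const int ncount = 21 * tcount, scount = 19 * ncount;
	int index = (int)uc - sbase;

	if (index < 0 || index >= scount)
		return 0;
	out[0] = lbase + index / ncount;
	out[1] = vbase + (index % ncount) / tcount;
	if (index % tcount == 0)
		return 2;
	out[2] = tbase + index % tcount;
	return 3;
}

/* Returns the caller's buffer for Hangul, otherwise a table pointer. */
static const uint16_t *decompose(uint32_t uc, int *size, uint16_t buf[3])
{
	*size = try_decompose_hangul(uc, buf);
	if (*size)
		return buf;
	return decompose_from_table(uc, size);
}
```

Each caller must keep its own three-element buffer alive for as long as it
reads the returned pointer, which is why a comparison of two strings needs
two separate buffers.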

The patch will cause trouble for Hangul filenames already created by the
module in the past. This shouldn't really be a concern because the
module's main purpose was always sharing with macOS. If a user actually
needs to access such a file, the nodecompose mount option should be enough.

Link: http://lkml.kernel.org/r/20180717220951.p6qqrgautc4pxvzu@eaf
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Reported-by: Ting-Chang Hou <tchou@synology.com>
Tested-by: Ting-Chang Hou <tchou@synology.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Ernesto A. Fernández and committed by Linus Torvalds
afd6c9e1 31651c60

+56 -6
fs/hfsplus/unicode.c
···
 	return size;
 }
 
-/* Decomposes a single unicode character. */
-static inline u16 *decompose_unichar(wchar_t uc, int *size)
+/* Decomposes a non-Hangul unicode character. */
+static u16 *hfsplus_decompose_nonhangul(wchar_t uc, int *size)
 {
 	int off;
 
···
 	return hfsplus_decompose_table + (off / 4);
 }
 
+/*
+ * Try to decompose a unicode character as Hangul. Return 0 if @uc is not
+ * precomposed Hangul, otherwise return the length of the decomposition.
+ *
+ * This function was adapted from sample code from the Unicode Standard
+ * Annex #15: Unicode Normalization Forms, version 3.2.0.
+ *
+ * Copyright (C) 1991-2018 Unicode, Inc.  All rights reserved.  Distributed
+ * under the Terms of Use in http://www.unicode.org/copyright.html.
+ */
+static int hfsplus_try_decompose_hangul(wchar_t uc, u16 *result)
+{
+	int index;
+	int l, v, t;
+
+	index = uc - Hangul_SBase;
+	if (index < 0 || index >= Hangul_SCount)
+		return 0;
+
+	l = Hangul_LBase + index / Hangul_NCount;
+	v = Hangul_VBase + (index % Hangul_NCount) / Hangul_TCount;
+	t = Hangul_TBase + index % Hangul_TCount;
+
+	result[0] = l;
+	result[1] = v;
+	if (t != Hangul_TBase) {
+		result[2] = t;
+		return 3;
+	}
+	return 2;
+}
+
+/* Decomposes a single unicode character. */
+static u16 *decompose_unichar(wchar_t uc, int *size, u16 *hangul_buffer)
+{
+	u16 *result;
+
+	/* Hangul is handled separately */
+	result = hangul_buffer;
+	*size = hfsplus_try_decompose_hangul(uc, result);
+	if (*size == 0)
+		result = hfsplus_decompose_nonhangul(uc, size);
+	return result;
+}
+
 int hfsplus_asc2uni(struct super_block *sb,
 		    struct hfsplus_unistr *ustr, int max_unistr_len,
 		    const char *astr, int len)
···
 	int size, dsize, decompose;
 	u16 *dstr, outlen = 0;
 	wchar_t c;
+	u16 dhangul[3];
 
 	decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
 	while (outlen < max_unistr_len && len > 0) {
 		size = asc2unichar(sb, astr, len, &c);
 
 		if (decompose)
-			dstr = decompose_unichar(c, &dsize);
+			dstr = decompose_unichar(c, &dsize, dhangul);
 		else
 			dstr = NULL;
 		if (dstr) {
···
 	unsigned long hash;
 	wchar_t c;
 	u16 c2;
+	u16 dhangul[3];
 
 	casefold = test_bit(HFSPLUS_SB_CASEFOLD, &HFSPLUS_SB(sb)->flags);
 	decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
···
 		len -= size;
 
 		if (decompose)
-			dstr = decompose_unichar(c, &dsize);
+			dstr = decompose_unichar(c, &dsize, dhangul);
 		else
 			dstr = NULL;
 		if (dstr) {
···
 	const char *astr1, *astr2;
 	u16 c1, c2;
 	wchar_t c;
+	u16 dhangul_1[3], dhangul_2[3];
 
 	casefold = test_bit(HFSPLUS_SB_CASEFOLD, &HFSPLUS_SB(sb)->flags);
 	decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
···
 			len1 -= size;
 
 			if (decompose)
-				dstr1 = decompose_unichar(c, &dsize1);
+				dstr1 = decompose_unichar(c, &dsize1,
+							  dhangul_1);
 			if (!decompose || !dstr1) {
 				c1 = c;
 				dstr1 = &c1;
···
 			len2 -= size;
 
 			if (decompose)
-				dstr2 = decompose_unichar(c, &dsize2);
+				dstr2 = decompose_unichar(c, &dsize2,
+							  dhangul_2);
 			if (!decompose || !dstr2) {
 				c2 = c;
 				dstr2 = &c2;