Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

udf: Fix leak of UTF-16 surrogates into encoded strings

The OSTA UDF specification does not say whether the CS0 charset, when
encoded with two bytes per character, should be treated as UTF-16 or
UCS-2. The sample code in the standard does not treat UTF-16 surrogates
in any special way, but on systems such as Windows, which work in UTF-16
internally, filenames are effectively treated as UTF-16.
In Linux it is more difficult to handle characters outside the Basic
Multilingual Plane (beyond 0xffff) because the NLS framework works with
2-byte characters only. For now, just make sure we don't leak UTF-16
surrogates into the resulting string when loading names from the
filesystem.

CC: stable@vger.kernel.org # >= v4.6
Reported-by: Mingye Wang <arthur200126@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>

Jan Kara 44f06ba8 06856938

+6
fs/udf/unicode.c
···
 
 #include "udf_sb.h"
 
+#define SURROGATE_MASK 0xfffff800
+#define SURROGATE_PAIR 0x0000d800
+
 static int udf_uni2char_utf8(wchar_t uni,
 			     unsigned char *out,
 			     int boundlen)
···
 
 	if (boundlen <= 0)
 		return -ENAMETOOLONG;
+
+	if ((uni & SURROGATE_MASK) == SURROGATE_PAIR)
+		return -EINVAL;
 
 	if (uni < 0x80) {
 		out[u_len++] = (unsigned char)uni;
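
The mask test added in this patch can be checked in isolation. The following is a minimal userspace sketch, not kernel code: the two macros are copied from the diff, while is_surrogate() and count_surrogates() are hypothetical helpers introduced only for illustration, and uint32_t stands in for the kernel's wchar_t argument. Clearing the low 11 bits and comparing against 0xd800 should match exactly the 2048 values 0xd800..0xdfff, which are the UTF-16 high and low surrogates.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the patch. */
#define SURROGATE_MASK 0xfffff800
#define SURROGATE_PAIR 0x0000d800

/*
 * Hypothetical helper (not in the patch): nonzero when uni is a
 * UTF-16 surrogate code unit, i.e. lies in 0xd800..0xdfff.
 */
static int is_surrogate(uint32_t uni)
{
	return (uni & SURROGATE_MASK) == SURROGATE_PAIR;
}

/*
 * Hypothetical helper (not in the patch): count how many values in
 * the Unicode range 0..0x10ffff the mask test matches.
 */
static unsigned int count_surrogates(void)
{
	uint32_t uni;
	unsigned int n = 0;

	for (uni = 0; uni <= 0x10ffff; uni++)
		if (is_surrogate(uni))
			n++;
	return n;
}
```

Because the mask keeps all bits above bit 10, values such as 0x1d800 that merely share the low 16 bits with a surrogate are not matched; only the 0xd800..0xdfff block is rejected.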