BOM character
1. The BOM (byte order mark, U+FEFF) is a Unicode character used to identify the endianness (byte order) of a text file or stream.
2. The UTF-8 representation of the BOM is the byte sequence 0xEF, 0xBB, 0xBF.
3. A text editor using ISO-8859-1 as its character encoding will display the BOM as the three characters ï»¿, one for each byte.
4. Because UTF-8 code units are single bytes, there is no byte order to signal, so the BOM has no meaning in UTF-8 apart from signalling that the byte stream that follows is encoded in UTF-8.
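Items 1 to 3 can be checked with a minimal Java sketch, assuming Java 7 or later for StandardCharsets. The class name BomBytes and its helpers hex and bomBytes are hypothetical, chosen only for illustration. The sketch encodes U+FEFF in a few charsets and then reads the UTF-8 bytes back as ISO-8859-1:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
public class BomBytes {
    // The byte order mark is the single code point U+FEFF.
    static final String BOM = "\uFEFF";

    // Hypothetical helper: formats bytes as upper-case hex pairs, e.g. "EF BB BF".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Hypothetical helper: encodes the BOM with the given charset and returns its hex form.
    static String bomBytes(Charset cs) {
        return hex(BOM.getBytes(cs));
    }

    public static void main(String[] args) {
        // In UTF-16 the order of the two bytes reveals the endianness.
        System.out.println("UTF-16BE: " + bomBytes(StandardCharsets.UTF_16BE)); // FE FF
        System.out.println("UTF-16LE: " + bomBytes(StandardCharsets.UTF_16LE)); // FF FE
        // In UTF-8 the same code point is always the same three bytes.
        System.out.println("UTF-8:    " + bomBytes(StandardCharsets.UTF_8));    // EF BB BF

        // Reading those three UTF-8 bytes as ISO-8859-1 yields three separate characters.
        byte[] utf8Bom = BOM.getBytes(StandardCharsets.UTF_8);
        String seenAsLatin1 = new String(utf8Bom, StandardCharsets.ISO_8859_1);
        System.out.println("As ISO-8859-1: " + seenAsLatin1 + " (" + seenAsLatin1.length() + " chars)");
    }
}
```

UTF_16BE and UTF_16LE are used here rather than UTF_16 because Java's UTF_16 encoder writes its own big-endian BOM in front of the data, which would double the mark.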
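import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * File: BOM.java
 *
 * Converts a string that starts with the BOM from the ISO-8859-1
 * encoding to UTF-8 and back.
 */
public class BOM {
    public static void main(String[] args) {
        System.out.println("Default Encoding: " + Charset.defaultCharset());

        // A simple string with the BOM prepended, as an ISO-8859-1 editor shows it:
        // the characters U+00EF U+00BB U+00BF stand for the bytes 0xEF 0xBB 0xBF.
        // Unicode escapes keep the literal independent of javac's source encoding.
        String bomString = "\u00EF\u00BB\u00BFHello World";
        System.out.println(bomString + " Length: " + bomString.length());

        // Reinterpret the ISO-8859-1 bytes as UTF-8: the three BOM characters
        // collapse into the single character U+FEFF.
        byte[] byteArrayISO = bomString.getBytes(StandardCharsets.ISO_8859_1);
        String utfString = new String(byteArrayISO, StandardCharsets.UTF_8);
        System.out.println(utfString + " Length: " + utfString.length());

        // Convert the UTF-8 string back to ISO-8859-1 characters.
        byte[] byteArrayUTF = utfString.getBytes(StandardCharsets.UTF_8);
        String isoString = new String(byteArrayUTF, StandardCharsets.ISO_8859_1);
        System.out.println(isoString + " Length: " + isoString.length());
    }
}

Output recorded on a UTF-8 supported console:

$ java BOM
Default Encoding: windows-1252
Hello World Length: 17
Hello World Length: 14
Hello World Length: 17

The lengths in this output (17, 14, 17) differ from the 14, 12 and 14 that the escaped literal above produces. They are consistent with ï»¿ having been typed directly into a source file saved as UTF-8 and compiled under the windows-1252 default: javac then reads each of those three characters as two (6 + 11 = 17), the UTF-8 decode recovers ï»¿ (3 + 11 = 14) rather than U+FEFF, and re-encoding gives 17 again.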
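Given item 4, a reader usually wants to skip a leading BOM rather than keep it, but Java's built-in UTF-8 decoder does not do this: it decodes the three bytes to the character U+FEFF. The following is a minimal sketch, assuming the input is already in a byte array; StripBom, hasUtf8Bom and decodeUtf8 are hypothetical names used only for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of the JDK.
public class StripBom {
    // The UTF-8 encoding of U+FEFF.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Hypothetical helper: returns true if the byte array starts with the UTF-8 BOM.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= UTF8_BOM.length
                && Arrays.equals(Arrays.copyOf(data, UTF8_BOM.length), UTF8_BOM);
    }

    // Hypothetical helper: decodes UTF-8 bytes, skipping a leading BOM if present,
    // because the standard decoder would otherwise keep it as U+FEFF.
    static String decodeUtf8(byte[] data) {
        int offset = hasUtf8Bom(data) ? UTF8_BOM.length : 0;
        return new String(data, offset, data.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'H', 'i'};

        // The plain decode keeps U+FEFF in front of "Hi".
        String plain = new String(withBom, StandardCharsets.UTF_8);
        System.out.println("Plain decode length: " + plain.length());
        // The stripping decode returns just "Hi".
        System.out.println("Stripped decode length: " + decodeUtf8(withBom).length());
    }
}
```

Checking the bytes before decoding keeps the rule simple: only the exact sequence 0xEF 0xBB 0xBF at offset 0 is treated as a BOM, and any other input is decoded unchanged.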