BOM character
1. The BOM (byte order mark) is a Unicode character, U+FEFF, used to identify the endianness of a text file or stream.
2. The UTF-8 representation of the BOM is the byte sequence 0xEF, 0xBB, 0xBF.
3. A text editor using ISO-8859-1 as its character encoding will display the characters ï»¿ for the BOM.
4. The BOM has no meaning in UTF-8 apart from signalling that the byte stream that follows is encoded in UTF-8.
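The points above can be checked with a few lines of Java. This is only a minimal sketch: the class name BomBytes and the helper hex are made up for illustration, and the UTF-16 lines are included just to show how the byte order of the same character signals endianness.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original post.
final class BomBytes {
    /** Formats bytes as space-separated upper-case hex, e.g. "EF BB BF". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bom = "\uFEFF"; // the BOM code point

        // The same character encoded three ways; the UTF-16 byte order differs.
        System.out.println("UTF-8:    " + hex(bom.getBytes(StandardCharsets.UTF_8)));
        System.out.println("UTF-16BE: " + hex(bom.getBytes(StandardCharsets.UTF_16BE)));
        System.out.println("UTF-16LE: " + hex(bom.getBytes(StandardCharsets.UTF_16LE)));

        // Reading the UTF-8 bytes as ISO-8859-1 yields three separate characters.
        String misread = new String(bom.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);
        System.out.println("Characters when read as ISO-8859-1: " + misread.length());
    }
}
```

The last line counts three characters because ISO-8859-1 maps every byte to exactly one character, which is why an editor using that encoding shows the BOM as visible text.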
import java.nio.charset.Charset;

/**
 * File: BOM.java
 *
 * Converts a string that starts with the BOM characters
 * from ISO-8859-1 to UTF-8 and back.
 */
public class BOM
{
    public static void main(String[] args) throws Exception
    {
        System.out.println("Default Encoding: " + Charset.defaultCharset());

        //
        // A simple string with the BOM prepended, written as the three
        // characters ï»¿ (the ISO-8859-1 view of 0xEF, 0xBB, 0xBF).
        // Assumption: the lengths 17, 14 and 17 reported in the output
        // correspond to this file being saved as UTF-8 but compiled with
        // the windows-1252 default, so each of the three characters is
        // read as two.
        //
        String bomString = "ï»¿Hello World";
        System.out.println(bomString + " Length: " + bomString.length());

        //
        // Encode the string as ISO-8859-1 bytes and decode them as UTF-8.
        //
        byte[] byteArrayISO = bomString.getBytes("ISO-8859-1");
        String utfString = new String(byteArrayISO, "UTF-8");
        System.out.println(utfString + " Length: " + utfString.length());

        //
        // Convert the UTF-8 string back to ISO-8859-1.
        //
        byte[] byteArrayUTF = utfString.getBytes("UTF-8");
        String winString = new String(byteArrayUTF, "ISO-8859-1");
        System.out.println(winString + " Length: " + winString.length());
    }
}
Output of the above program when run on a console that supports UTF-8:
$ java BOM
Default Encoding: windows-1252
Hello World Length: 17
Hello World Length: 14
Hello World Length: 17
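The lengths 17, 14 and 17 in the output above can be reproduced without depending on how the source file is saved. The sketch below rests on an assumption about the original run: the string the program started with was the UTF-8 bytes of ï»¿Hello World read as windows-1252. ISO-8859-1 stands in for windows-1252 here, since the two agree on the bytes involved. The class name BomRoundTrip and the helper reinterpret are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical class, not part of the original post.
final class BomRoundTrip {
    // The characters ï»¿ written as escapes, so the file encoding does not matter.
    static final String BOM_AS_LATIN1 = "\u00EF\u00BB\u00BF";

    /** Encodes s with one charset and decodes the resulting bytes with another. */
    static String reinterpret(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // Assumed starting point: UTF-8 bytes misread one byte per character.
        String start = reinterpret(BOM_AS_LATIN1 + "Hello World",
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        // Same two conversions as in BOM.java.
        String utf = reinterpret(start,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        String back = reinterpret(utf,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println("Length: " + start.length());
        System.out.println("Length: " + utf.length());
        System.out.println("Length: " + back.length());
    }
}
```

Each of the three characters ï, » and ¿ takes two bytes in UTF-8, so the misread string has 6 + 11 = 17 characters, and decoding those bytes properly brings it back to 3 + 11 = 14.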