Bug Database
Bug ID: 4949631
Votes 82
Synopsis String.getBytes() does not work on some strings larger than 16MB
Category java:char_encodings
Reported Against 1.4.2
Release Fixed mustang(b14)
State 10-Fix Delivered, bug
Priority: 3-Medium
Related Bugs 6192102
Submit Date 05-NOV-2003
Description


FULL PRODUCT VERSION :
java version "1.4.2"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-b28)
Java HotSpot(TM) Client VM (build 1.4.2-b28, mixed mode)



FULL OS VERSION :
Linux wks001 2.4.20-19.9 #1 Wed Jul 23 19:06:26 EDT 2003 i686 i686 i386 GNU/Linux
SunOS drip 5.8 Generic_108528-22 sun4u sparc SUNW,UltraAX-i2

A DESCRIPTION OF THE PROBLEM :
When a string exceeds a certain length (16777216 characters), calling getBytes() on it triggers a java.nio.BufferOverflowException for certain string lengths. Adding one character at a time shows that 1 in every 4 lengths fails.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Create a file of at least 16777216 characters (this is the boundary at which the bug starts to occur). e.g.:

  dd if=/dev/zero of=/tmp/inputfile bs=1024 count=16384

Create a test program to read in this file to a string. Then repeatedly add a character to the string and call getBytes() on it. Each 4th character added will cause a java.nio.BufferOverflowException. See source code example for this.



EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
total length = 16777216
now at total length = 16777217
now at total length = 16777218
now at total length = 16777219
now at total length = 16777220
now at total length = 16777221
now at total length = 16777222
now at total length = 16777223
now at total length = 16777224
now at total length = 16777225
now at total length = 16777226
now at total length = 16777227
now at total length = 16777228
now at total length = 16777229
now at total length = 16777230
...etc...
ACTUAL -
total length = 16777216
now at total length = 16777217
Error at total length = 16777217
java.nio.BufferOverflowException
now at total length = 16777218
now at total length = 16777219
now at total length = 16777220
now at total length = 16777221
Error at total length = 16777221
java.nio.BufferOverflowException
now at total length = 16777222
now at total length = 16777223
now at total length = 16777224
now at total length = 16777225
Error at total length = 16777225
java.nio.BufferOverflowException
now at total length = 16777226
now at total length = 16777227
now at total length = 16777228
now at total length = 16777229
Error at total length = 16777229
java.nio.BufferOverflowException
now at total length = 16777230
...etc...

ERROR MESSAGES/STACK TRACES THAT OCCUR :
Exception in thread "main" java.nio.BufferOverflowException
        at java.nio.charset.CoderResult.throwException(CoderResult.java:259)
        at java.lang.StringCoding$CharsetSE.encode(StringCoding.java:340)
        at java.lang.StringCoding.encode(StringCoding.java:374)
        at java.lang.StringCoding.encode(StringCoding.java:380)
        at java.lang.String.getBytes(String.java:590)
        at TestBuffer2.main(TestBuffer2.java:21)


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.io.*;

class TestBuffer2 {

	public static void main(String[] args) throws IOException {

		StringBuffer output = new StringBuffer();

		byte[] buf = new byte[102400];
		FileInputStream fis = new FileInputStream("/tmp/inputfile");
		long totalLength=0;
		int bytes = 0;
		while((bytes = fis.read(buf))>0) {
			output.append(new String(buf,0,bytes));
			totalLength+=bytes;
		}
		System.out.println("total length = "+totalLength);

		for (int i = 0; i < 10000; i++) {
			try {
				byte bufferoverflow2[] = output.toString().getBytes();
			} catch (Exception e) {
				System.out.println("Error at total length = "+totalLength);
				System.out.println(e);
			}
			output.append("a");
			totalLength += 1;
			System.out.println("now at total length = "+totalLength);
		}
		System.out.println("Done!\n\n");
	}
}
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
If you are not concerned with the exact format of your output string (e.g. when using it for HTML or XML purposes), you can hack around the problem like this:

if (output.length() > 16777217 && output.length() % 4 == 1) {
    output.append("\n");
}
(Incident Review ID: 223082) 
======================================================================
 xxxxx@xxxxx  2004-11-09 00:34:44 GMT
Work Around
N/A
Evaluation
True.  --  xxxxx@xxxxx  2003/11/15

Here's a more concise test case:

----------------------------------------------------------
class Bug {
    static void test(int size) {
	try {new String(new char[size]).getBytes();}
	catch (Throwable t) {
	    System.out.println("Failed with size="+size);
	    t.printStackTrace();
	}}

    public static void main(String[] args) throws Exception {
	for (int i = 0; i < 10; i++)
	    test(16777216+i);
    }
}
----------------------------------------------------------

which fails in the same manner.

 xxxxx@xxxxx  2004-09-02

Ah yes, 16MB is 24 bits, which is the range of accuracy of a float,
and floats are used for maxBytesPerChar and friends.
We need to be more careful with losing bits near
Integer.MAX_VALUE.

 xxxxx@xxxxx  2004-09-05

Analysis reveals that both encoders and decoders have the same bug.
See this program:

class Bug4 {
    public static void main(String[] args) throws Exception {
	try {new String(new char[16777217]).getBytes("ASCII");}
	catch (Throwable t) {t.printStackTrace();}

	try {new String(new byte[16777217],"ASCII");}
	catch (Throwable t) {t.printStackTrace();}
    }
}

 xxxxx@xxxxx  2004-09-05
Comments
  

Submitted On 07-NOV-2003
bluppie4
Ok, we've tracked down a requirement for this problem to occur:

If the LANG environment variable is set to en_US the problem
occurs. If set to en_US.UTF-8 the problem doesn't occur.

Cheers!
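
One possible explanation for the locale dependence, offered as an assumption rather than something confirmed in this report: under LANG=en_US the default charset is typically a single-byte one such as ISO-8859-1, whose encoder reports a maxBytesPerChar of 1.0 and so leaves no slack in the output buffer, while the UTF-8 encoder reports a larger factor. The sketch below only prints those factors; the class name MaxBytesPerChar and the helper maxBytes are hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the JDK or of this report.
class MaxBytesPerChar {
    // Returns the worst-case number of bytes per char that the
    // charset's encoder advertises.
    static float maxBytes(String charsetName) {
        return Charset.forName(charsetName).newEncoder().maxBytesPerChar();
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + Charset.defaultCharset());
        for (String name : new String[] {"ISO-8859-1", "US-ASCII", "UTF-8"}) {
            System.out.println(name + " maxBytesPerChar = " + maxBytes(name));
        }
    }
}
```

If the buffer is sized as length times this factor, a factor above 1.0 would leave ample room for ASCII-only content even after a small rounding loss, which would be consistent with the observation above.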


Submitted On 01-DEC-2003
mpultz
Just wanted to let you all know that the customer submitted
workaround worked for me on SUSE 8.2 (JDK 1.4.2) and Windows
NT (JDK 1.4.2). Both platforms experienced the problem with
getBytes().


Submitted On 23-DEC-2003
solidmat
I am getting this bug as well on Windows XP running 
with JRE 1.4.2-b28, after running the exploit code given 
here.


Submitted On 23-DEC-2003
solidmat
SOURCE CODE TRACE

We took a look at the source code of the JVM. The 
problem stems from the fact that float values are used 
to indicate the maximum value of bytes per characters 
in java.nio.charset.CharsetEncoder.maxBytesPerChar.

The issue is that floats cannot accurately represent 
every integer above 2^24, which equals 16,777,216. 
Once the length passes that value, the encoding operation in 
the character set classes can round the required buffer 
size down. The correct solution would be to use doubles 
instead, or to account for the round-off problem by 
increasing the buffer size.
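
The rounding described above can be reproduced in isolation. The following is a minimal sketch, not the JDK's actual code: the class FloatSizing and both methods are hypothetical, and the factor 1.0f stands in for a single-byte charset's maxBytesPerChar.

```java
// Hypothetical illustration of the precision loss; names are not from the JDK.
class FloatSizing {
    // Buffer size computed in float arithmetic: the int is converted to
    // float first, which keeps only 24 significant bits.
    static int sizeViaFloat(int chars, float maxBytesPerChar) {
        return (int) (chars * maxBytesPerChar);
    }

    // Same computation in double, which represents every int exactly.
    static int sizeViaDouble(int chars, float maxBytesPerChar) {
        return (int) (chars * (double) maxBytesPerChar);
    }

    public static void main(String[] args) {
        for (int n = 16777216; n <= 16777224; n++) {
            int viaFloat = sizeViaFloat(n, 1.0f);
            System.out.println(n + " -> float: " + viaFloat
                    + ", double: " + sizeViaDouble(n, 1.0f)
                    + (viaFloat < n ? "  (too small)" : ""));
        }
    }
}
```

In this range the float result falls below the true length for 16777217 and 16777221, which matches the one-in-four pattern in the report; the other odd lengths round up and therefore still fit.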

SUGGESTED WORKAROUND

The workaround that we are using is to call 
getBytes() on substrings that are smaller than 16MB 
and combine the results using either a 
ByteArrayOutputStream or a ByteBuffer.

NOTE: If you are planning on using character sets 
that need more than one byte per character, you have 
to make sure that your buffer is sized accordingly.
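
A minimal sketch of the chunked approach suggested above, written against a current JDK API rather than 1.4.2. The class ChunkedEncoder, the method encode, and the chunk-size parameter are hypothetical names chosen for illustration. The surrogate check is an added precaution so that a supplementary character is never split across two chunks.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not the commenter's actual code.
class ChunkedEncoder {
    // Encodes s piece by piece so that no single getBytes() call sees
    // more than maxChunk chars, then concatenates the pieces.
    static byte[] encode(String s, Charset cs, int maxChunk) {
        if (maxChunk < 2) {
            throw new IllegalArgumentException("maxChunk must be at least 2");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int start = 0;
        while (start < s.length()) {
            int end = Math.min(start + maxChunk, s.length());
            // Do not cut between the two halves of a surrogate pair.
            if (end < s.length() && Character.isHighSurrogate(s.charAt(end - 1))) {
                end--;
            }
            byte[] part = s.substring(start, end).getBytes(cs);
            out.write(part, 0, part.length);
            start = end;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String big = new String(new char[16777217]);
        // 1 << 20 chars per chunk keeps each call well under the 16MB boundary.
        byte[] bytes = encode(big, StandardCharsets.ISO_8859_1, 1 << 20);
        System.out.println("encoded " + bytes.length + " bytes");
    }
}
```

For stateless charsets the concatenated result should equal a single getBytes() call. For stateful encodings such as ISO-2022-JP, each chunk is encoded independently, so the bytes may contain extra escape sequences compared with encoding the whole string at once.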


Submitted On 13-APR-2004
ridesmet
Can someone explain to me why FileOutputStream.open()
calls into String.getBytes()? I also encounter this bug,
but with the following stack trace:

java.nio.BufferOverflowException
        at java.nio.charset.CoderResult.throwException(CoderResult.java:259)
        at java.lang.StringCoding$CharsetSE.encode(StringCoding.java:338)
        at java.lang.StringCoding.encode(StringCoding.java:372)
        at java.lang.StringCoding.encode(StringCoding.java:378)
        at java.lang.String.getBytes(String.java:608)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:176)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:131)


Submitted On 26-MAY-2004
Yasushi.Umezaki.Kana
I found that the following code reproduces the same error.

---------- BEGIN SOURCE ----------

import javax.mail.*;
import javax.mail.internet.*;

public class MimeTest
{

    public static void main(String[] args)
    {

	try{
		System.out.println("The string 'އ ' can be encoded successfully...");
		System.out.println(MimeUtility.encodeText("އ ", "ISO-2022-JP", "B"));

		System.out.println("But, the string 'އ' can not...");
		System.out.println(MimeUtility.encodeText("އ", "ISO-2022-JP", "B"));
	   }
	   catch(Exception e)
	   {
		e.printStackTrace();
	   }
    }

}

---------- END SOURCE ----------

======= Results ======================

The string 'އ ' can be encoded successfully...
=?ISO-2022-JP?B?GyRCO2cbKEIg?=
But, the string 'އ' can not...
java.nio.BufferOverflowException
        at java.nio.charset.CoderResult.throwException(CoderResult.java:259)
        at java.lang.StringCoding$CharsetSE.encode(StringCoding.java:343)
        at java.lang.StringCoding.encode(StringCoding.java:374)
        at java.lang.String.getBytes(String.java:573)
        at javax.mail.internet.MimeUtility.doEncode(MimeUtility.java:635)
        at javax.mail.internet.MimeUtility.encodeWord(MimeUtility.java:617)
        at javax.mail.internet.MimeUtility.encodeText(MimeUtility.java:418)
        at MimeTest.main(MimeTest.java:15)



Submitted On 26-MAY-2004
Yasushi.Umezaki.Kana
Sorry... the particular characters appear as garbage in my previous comments.  Originally I wrote the particular Japanese character, 0x8e87.


Submitted On 02-JUL-2004
swisstom
I had the same problem with String.getBytes() throwing a java.nio.BufferOverflowException.

THANKS for the workaround! It works for me too!

Cheers!


Submitted On 02-JUL-2004
swisstom
PS: BTW, for the workaround... does it work for the border case of length 16777217?  (I would guess NO)

shouldn't it be
    if (output.length() > 16777216 && output.length() % 4 == 1)
instead of
    if (output.length() > 16777217 && output.length() % 4 == 1)
??
(or >= instead of >)
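
The boundary question can be checked without allocating a large string. The sketch below assumes, as the source trace in an earlier comment suggests, that a length is affected exactly when converting it to float and back yields a smaller value. The class WorkaroundGuard and its methods are hypothetical names.

```java
// Hypothetical check of the two guard conditions; not part of the original workaround.
class WorkaroundGuard {
    // Assumption: a length is affected when the float round trip shrinks it.
    static boolean roundsDown(int length) {
        return (int) (float) length < length;
    }

    // Guard as written in the customer submitted workaround.
    static boolean originalGuard(int length) {
        return length > 16777217 && length % 4 == 1;
    }

    // Guard with the boundary moved down by one, as proposed above.
    static boolean correctedGuard(int length) {
        return length > 16777216 && length % 4 == 1;
    }

    public static void main(String[] args) {
        for (int n = 16777216; n <= 16777222; n++) {
            System.out.println(n + ": roundsDown=" + roundsDown(n)
                    + " original=" + originalGuard(n)
                    + " corrected=" + correctedGuard(n));
        }
    }
}
```

Under that assumption, 16777217 is affected but is skipped by the original guard and caught by the corrected one. The same assumption also implies that from 2^25 upward more than one low bit is lost (33554434, for example, rounds down although it is not congruent to 1 modulo 4), so a modulo-4 test alone would not cover every affected length there.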


Submitted On 13-JUL-2004
jacklty
It is a horrible bug.... took me few weeks to track it down..... Next time, I will check the bug database before digging into the code.........


Submitted On 30-JUL-2004
yoda22281
The same thing happens with StringBuffer.append when the size of the string buffer hits the aforementioned limit (16777217).


Submitted On 20-AUG-2004
jarouch
Simple new String(new byte[16777217]) causes exception too.. I tried it with last snapshot build (b60) and bug is still there..


Submitted On 09-OCT-2005
oberserk
Thank you.


Submitted On 14-MAR-2006
moizd
Is there a plan to bring this fix to a Java 5 update release?


Submitted On 14-MAR-2006
tflora
It really sucks that this bug has not been fixed in 1.4.2. How can Sun justify leaving a bug like this in a critical release?

Is there any known charset that does not exhibit this problem?

Thanks,

Todd


Submitted On 25-APR-2006
    /**
     * String's getBytes() method instantiates the byte array
     * with a size equal to the integer equivalent of the 
     * floating point equivalent of the length of the string.
     * Similar to doing the following:
     * <code>byte[] b = new byte[(int)((float) foo.length())]</code> 
     * 
     * Primitive floats only keep track of the 24 most significant bits.
     * In order to avoid round off problems which could create
     * the BufferOverflowException with extremely large strings,
     * additional characters can be added so that the lost least significant
     * bits are all 0's.
     * 
     * This method takes a string, and returns the same string with as 
     * few as 0, and no more than 128 copies of c appended to the end,
     * thus converting it into a getBytes() compatible string.
     * 
     * @param foo A string, usually with length greater than 16777216
     * @param c The char to add at the end of the string if required.
     * @return A new string which won't cause a BufferOverflowException
     * when getBytes() is called, with up to 128 copies of c appended.
     * @throws Exception When the string is too long to allow any more
     * characters to be added, and thus cannot be made getBytes() compatible.
     */
   private String bug4949631(String foo, char c) throws Exception {
	   if (foo.length() <= (int) Math.pow(2, 24)) return foo;
	   if (foo.length() > (int) (Math.pow(2, 31) - 129)) 
		   throw new Exception("The string is too long to make getBytes() compatible");
	   
	   // determine how many bits are being chopped off
	   // on conversion to float
	   int numberLSBLost = 7;  // assume worst case
	   int msbMask = (int) Math.pow(2, 30);
	   int i = foo.length();
	   while ((i & msbMask) == 0) {
		   numberLSBLost--;
		   i = i << 1;
	   }
	   // we want to add just enough chars to avoid rounding
	   int lostBitsMask = (int) Math.pow(2, numberLSBLost) - 1;
	   int lostBitsValue = foo.length() & lostBitsMask;
	   // mask the result so that an already aligned length gets 0 extra chars
	   int numCharsToAdd = ((lostBitsMask + 1) - lostBitsValue) & lostBitsMask;
	   char[] bar = new char[numCharsToAdd];
	   // format
	   for (int j = 0; j < numCharsToAdd; j++) {
		   bar[j] = c;
	   }
	   return foo + new String(bar);
   }



PLEASE NOTE: JDK6 is formerly known as Project Mustang