Java Solaris Communities Sun Store Join SDN My Profile Why Join?
 
Bug Database
Bug Detail
Quick Lists
Top 25 Bugs
Top 25 RFE's
Recently Closed Bugs
Printable Page Printable Page


Bug Database
Bug ID: 4843136
Votes 0
Synopsis (process) pipe file descriptor from Runtime.exec not being closed
Category java:classes_lang
Reported Against 1.4.1_02 , mantis-rc , mantis-beta
Release Fixed 1.4.1_05
State 10-Fix Delivered, bug
Priority: 3-Medium
Related Bugs 4413680 , 4763384 , 4850368 , 4894690
Submit Date 04-APR-2003
Description
Reading from the stream obtained from
Process.getInputStream() would sometimes not get
an EOF after the process had closed the write
end of that pipe.I'm running:

datsun$ /opt/jdk1.4/bin/java -version
java version "1.4.2-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-beta-b16)
Java HotSpot(TM) Client VM (build 1.4.2-beta-b16, mixed mode)

and the following problem occurs.  Here's what's happening...

I have a server process that runs forever.  It sometimes uses Runtime.exec
to start subprocesses.

datsun$ ps ax | grep java
   4100 ?        S  3:42 /opt/jdk1.4/bin/java -Djava.ext.dirs=/home/shannon/ext

This is my server process.  It's been running since Mar 17.

datsun$ pfiles 4100
4100:   /opt/jdk1.4/bin/java -Djava.ext.dirs=/home/shannon/ext -jar /home/shan
   Current rlimit: 1024 file descriptors
    0: S_IFCHR mode:0666 dev:136,0 ino:60568 uid:0 gid:3 rdev:13,2
       O_RDONLY|O_LARGEFILE
    1: S_IFREG mode:0600 dev:32,16 ino:745340 uid:1011 gid:10 size:6037573
       O_WRONLY|O_LARGEFILE
    2: S_IFIFO mode:0000 dev:231,0 ino:131335 uid:0 gid:25 size:0
       O_RDWR
    3: S_IFCHR mode:0666 dev:136,0 ino:60570 uid:0 gid:3 rdev:13,12
       O_RDWR
    4: S_IFDOOR mode:0444 dev:236,0 ino:61825 uid:0 gid:0 size:0
       O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to nscd[223]
    5: S_IFIFO mode:0664 dev:32,16 ino:745581 uid:1011 gid:10 size:780
       O_RDONLY|O_LARGEFILE
    6: S_IFREG mode:0600 dev:32,16 ino:746048 uid:1011 gid:10 size:7607
       O_RDONLY|O_LARGEFILE
    7: S_IFIFO mode:0664 dev:32,16 ino:745581 uid:1011 gid:10 size:780
       O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE
    8: S_IFREG mode:0600 dev:32,16 ino:745922 uid:1011 gid:10 size:1034402
       O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
   10: S_IFREG mode:0600 dev:32,16 ino:745922 uid:1011 gid:10 size:1034402
       O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE

Note that fd 9 and fd 11 are the last two unused fds.  If I were to create
a pipe, the read end would be fd 9 and the write end would be fd 11.

datsun$ ps alx | grep 4100
  8  1011  4100  4097  0  29 3011865664992 0002d934 S ?         3:42 /opt/jdk1.4/
  8  1011 17934  4100  0  49 23 1096  744 300037f4c40 S ?         0:00 /bin/sh /hom
datsun$ ps alx | grep 17934
  8  1011 17934  4100  0  49 23 1096  744 300037f4c40 S ?         0:00 /bin/sh /hom
  8  1011 17936 17934  0  49 23  976  704 300043d8a80 S ?         0:00 cat

17934 is the shell started by 4100.

datsun$ pfiles 17934
17934:  /bin/sh /home/shannon/.mail/filtermsg metadata-jsr +In/metadata
   Current rlimit: 1024 file descriptors
    0: S_IFIFO mode:0000 dev:231,0 ino:178470 uid:1011 gid:10 size:0
       O_RDWR
    1: S_IFREG mode:0600 dev:0,2 ino:1624593 uid:1011 gid:10 size:7
       O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE
    2: S_IFREG mode:0600 dev:0,2 ino:1624593 uid:1011 gid:10 size:7
       O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE
   11: S_IFIFO mode:0000 dev:231,0 ino:178470 uid:1011 gid:10 size:0
       O_RDWR
   19: S_IFREG mode:0775 dev:32,16 ino:745551 uid:1011 gid:10 size:933
       O_RDONLY|O_LARGEFILE FD_CLOEXEC

Note that both fd 0 and fd 11 are the same pipe.

fd 1 and fd 2 were redirected to a debug output file by the shell script.

17936 is a "cat" command capturing the data from the pipe.
It should terminate when the pipe is closed.

datsun$ pfiles 17936
17936:  cat
   Current rlimit: 1024 file descriptors
    0: S_IFIFO mode:0000 dev:231,0 ino:178470 uid:1011 gid:10 size:0
       O_RDWR
    1: S_IFREG mode:0600 dev:0,2 ino:4450889 uid:1011 gid:10 size:7649
       O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE
    2: S_IFREG mode:0600 dev:0,2 ino:1624593 uid:1011 gid:10 size:7
       O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE
   11: S_IFIFO mode:0000 dev:231,0 ino:178470 uid:1011 gid:10 size:0
       O_RDWR

Note that it also has the pipe open on fd 11.

Because so many processes have the pipe open, the fact that the JVM
end closed the pipe makes no difference.  The pipe doesn't get an EOF
and so the cat process hangs forever.

It looks like there might be some sort of bug in the JVM where it
doesn't always close the write end of the pipe before starting the
subprocess that should only get the read end of the pipe.  The parent
should keep the write end open and close the read end, and the child
should keep the read end open and close the write end.

And the weird thing is this doesn't happen every time.  It only seems
to fail after running for a long time, and even then it only fails
sometimes.

This also fails on 1.4.1.
Work Around
N/A
Evaluation
We will attempt to resolve this for Tiger.

--  xxxxx@xxxxx  2003-04-07

This appears to be a Solaris bug in the implementation of /proc/<pid>/fd.

The bug is hard to reproduce, and I have not been able to
reproduce it consistently.  However, the following program

-------------------------------------------------------------------
public class PipeHang {
    public static void main (String[] args) throws Exception {
	Process cat = Runtime.getRuntime().exec(new String[] {"cat"});

	// Attack of the file descriptor snatchers
	for (int i = 0; i < 50; ++i)
	    Runtime.getRuntime().exec(new String[] {"/bin/sleep", "999"});

	cat.getOutputStream().write(new byte[] {'M','e','o','w','\n'});
	cat.getOutputStream().close();

	byte [] buffer = new byte [100];
	int len;
	while ((len = cat.getInputStream().read(buffer)) > 0)
	    System.out.print (new String (buffer,0,len));
    }
}
-------------------------------------------------------------------

hangs more than 50% of the time when run on a 
Solaris 7 or Solaris 8 machine.  You might have to fiddle with the
number of sleep processes to reproduce the hang.

The traditional way to close all file descriptors is a loop like this:

int max_fds = SYSTEM_DEPENDENT_VALUE():
for (int i = 3; i <= max_fds; ++i)
  close (i);

There are two problems with this:
- max_fds can be a large number, making this quite inefficient.
- The system might have no limit whatsoever on the number of file descriptors,
  e.g. via (as root)
  ulimit -Hn unlimited; ulimit -Sn unlimited
  In this case, the loop is impractical - inefficient and a stress test
  for the OS, which we should avoid.
  See bug 4413680.

This is why we want to use the approach of reading the directory entries for
/proc/getpid()/fd/

Unfortunately, this method is buggy on Solaris.

(This was already known. See this comment from UNIXProcess_md.c.solaris:
	/* Close pipe fds here since they don't always show up
	 * in /proc/<pid>/fd
	 */
)

Another approach is to use the close-on-exec flag on file descriptors
to explicitly specify that they should not be open in the child.
See fcntl(2).
The difficulty with that approach is that we would need to audit *all*
operations in the "java" process that could create file descriptors.
This cannot be a complete solution because file descriptors can be
created by user-defined native code.

Here is an interesting comment from
4413680 Runtime exec hangs if ulimit nofiles is unlimited.

For future reference to those who may encounter bugs like this: libc
(starting in Solaris 9) implements closefrom(3C) which efficiently
closes all FD's greater than a given number.


There is no 100% guaranteed reliable solution.

We should:
- set the close-on-exec flag on the parent end of pipe file descriptors
  created by Runtime.exec()
  This will probably make the specific reported bug just go away.
- close all file descriptors using the /proc/<pid>/fd method.
- Because we don't trust the previous step, we supplement it with
  additional bug workaround code.  For example, we could use the
  /proc/<pid>/fd method twice, or remember the maximum fd listed in
  /proc/<pid>/fd, and brute-force close all file descriptors
  3 <= fd <= max_fd_found + 256
  Experimentation is required to find a bug workaround that is effective
  in practice.
- perhaps consider using closefrom() in Solaris 9 via libdl.

 xxxxx@xxxxx  2003-04-18

The above analysis is wrong.  The implementation of /proc on
Solaris is NOT buggy.  Instead, it was the function closeDescriptors()
in UNIXProcess_md.c.solaris and UNIXProcess_md.c.linux that was buggy.

Here's what was happening:

After we fork(), the child process wants to close all of the file descriptors
greater than 3.  To do this, it enumerates the entries in the directory
/proc/PID/fd.  Of course, it should call getpid() to get its pid
(actually, even better is to use /proc/self/fd, but we can leave that
improvement for when
4790606 (process) Native forkAndExec code should be unified (sol, lnx)
is addressed.)
but instead it calls getppid (get PARENT pid).  Many engineer eyeballs have
read that code and their brains had filtered out that extra 'p', since it
doesn't make sense to enumerate the PARENT's fds.  

Of course, at the exact moment of the fork(), the parent and child have
identical sets of file descriptors, and so /proc/getpid()/fd and
/proc/getppid()/fd have identical contents.  But both processes then
execute independently.  In particular, the parent dutifully closes file
descriptors it doesn't need, and these closes cause FAILURES to close
in the child.  Of course, all of these are highly dependent on timing
and the availability of CPUs to run the two processes concurrently.

See bug 
4850368 (process) getInputStream() attaches to forked background processes (Linux)
for an example of a good way to tickle this bug on Uniprocessor Linux.

The test case reported in this bug is better for reproducing this bug
on Solaris.

 xxxxx@xxxxx  2003-04-22
Comments
  
  Include a link with my name & email   

Submitted On 11-AUG-2003
roferreira
I had this experience into Linux  RedHat 8.0 when running a 
jsp file:
org.apache.jasper.JasperException: Unable to compile class 
for JSP An error occurred at line: 7 in the jsp 
file: /NotasPorQuartoAnoForm.jsp Generated servlet error: 
[javac] Since fork is true, ignoring compiler setting. [javac] 
Compiling 1 source file [javac] Since fork is true, ignoring 
compiler setting. 
[javac] /home/tomcat/work/Standalone/localhost/homolog/Not
asPorQuartoAnoForm_jsp.java:54: cannot resolve symbol 
[javac] symbol : class Status [javac] location: class 
org.apache.jsp.NotasPorQuartoAnoForm_jsp [javac] Status 
status = new Status(); [javac] ^ An error occurred at line: 7 
in the jsp file: /NotasPorQuartoAnoForm.jsp Generated servlet 
error: 
[javac] /home/tomcat/work/Standalone/localhost/homolog/Not
asPorQuartoAnoForm_jsp.java:54: cannot resolve symbol 
[javac] symbol : class Status [javac] location: class 
org.apache.jsp.NotasPorQuartoAnoForm_jsp [javac] Status 
status = new Status(); [javac] ^ [javac] 2 errors 






PLEASE NOTE: JDK6 is formerly known as Project Mustang