Bug Database
Bug ID: 4165985
Votes: 0
Synopsis: javadoc tool: Use BreakIterator to determine end of first sentence
Category: doclet:tbd
Reported Against: 1.2.2, 1.2beta4
Release Fixed: 1.4(merlin-beta2)
State: 10-Fix Delivered, Verified, request for enhancement
Priority: 5-Very Low
Related Bugs: 4172961, 4261388, 4890489, 4959985
Submit Date: 12-AUG-1998
Description
This RFE has been implemented.  Javadoc now has two modes for computing the end of the first English sentence.  The default is the old behavior but it generates a new warning when the new behavior would be different.  The new behavior uses BreakIterator and is enabled by a new command-line flag called -breakiterator.

The main differences are that (1) we would now accept a sentence ending in a question mark (which some people find useful in the synopsis of a boolean-returning method), and (2) we would NOT accept a period followed by white space and then a lowercase letter as ending a sentence, which would allow you to use abbreviations in the first sentence.

It's not major, but people have complained. Fixing this is ultimately a code simplification because English would be treated the same as other languages.

We would hope to make the new mode the default in the next (Tiger) version of javadoc.

 xxxxx@xxxxx  2001-07-17

Just prior to 1.2 Beta4 we tried using BreakIterator for the first
sentence breaks, and it had too many serious bugs for us to use,
so javadoc now special cases English and uses our old 1.1 algorithm 
that looks for a period (.) followed by white space.

Once the BreakIterator bugs are fixed, we should consider returning
to using BreakIterator for English.

The BreakIterator bugs are described in:

4140384 design bug: ambiguous "first sentence" rule
4158381 sentence BreakIterator stops too soon (submitted by Bill Shannon)
4113835 Some of BreakIterator's rules are not correct in JDK1.1.6G.
Work Around
N/A
Evaluation
The JLS first edition, section 18.3 says:

  The first sentence of each documentation comment should be 
  a summary sentence.  This sentence ends at the first period 
  that is followed by a blank, tab, or line terminator.

It doesn't care if the next letter is upper or lowercase.

(We infer that this rule was intended to apply only to languages 
 for which period is a sentence terminator.)

This is demonstrated by the following doc comment for the
processMouseEvent method in javax.swing.MenuElement, where the above
rule would interpret the first sentence to be "Process a mouse event":

   Process a mouse event. event is a MouseEvent with source being the 
   receiving element's component. path is the path of the receiving 
   element in the menu hierarchy including the receiving element itself. 
   manager is the MenuSelectionManager for the menu hierarchy. 

(In my opinion, starting the second and third sentences with lowercase
words is poorly-constructed (but understandable) English.  
They should be rewritten so as not to begin with lowercase letters.  
But that aside...)
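
To make the JLS rule quoted above concrete, here is a minimal sketch of it in Java. It is not the javadoc source; the class and method names are hypothetical, and treating a period at the very end of the text as a terminator is an assumption the quoted rule does not spell out.

```java
// Hypothetical illustration of the JLS 18.3 rule; not taken from javadoc itself.
class LegacyFirstSentence {

    /**
     * Returns the text up to and including the first period that is
     * followed by a blank, tab, or line terminator. If no such period
     * exists, the whole comment is returned.
     */
    static String firstSentence(String comment) {
        for (int i = 0; i < comment.length(); i++) {
            if (comment.charAt(i) != '.') {
                continue;
            }
            // Assumption: a period that ends the text also ends the sentence.
            boolean atEnd = (i + 1 == comment.length());
            if (atEnd || isBlankTabOrLineTerminator(comment.charAt(i + 1))) {
                return comment.substring(0, i + 1);
            }
        }
        return comment;
    }

    private static boolean isBlankTabOrLineTerminator(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    public static void main(String[] args) {
        String doc = "Process a mouse event. event is a MouseEvent with source "
                + "being the receiving element's component.";
        // The case of the letter after the period plays no part in the rule.
        System.out.println(firstSentence(doc));
    }
}
```

Under this rule an abbreviation such as "e.g." followed by a space also ends the sentence, and a question mark does not, which is the behavior this RFE set out to change.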

The engineer for BreakIterator is Rich Gillam ( xxxxx@xxxxx ).
Atul and I tested Rich Gillam's fixes for the bugs in 4158381, and
they are fixed.  However, we discovered that BreakIterator follows
this rule (which differs from the above rule):

   If a period is followed by white space and then a lowercase letter
   (or digit), it is not considered the end of a sentence.

See "Comments" for the exact rules that BreakIterator uses.

Under this rule, the entire processMouseEvent paragraph shown above
would be treated as one sentence.
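
For comparison, a first sentence can be extracted with java.text.BreakIterator roughly as sketched below. The class and method names are hypothetical and this is not the javadoc implementation. The exact boundaries depend on the sentence rules shipped with the JDK in use, so the sketch prints its results rather than asserting what they are.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical illustration; the real javadoc code may differ.
class BreakIteratorFirstSentence {

    /** Returns the text before the first sentence boundary, trimmed. */
    static String firstSentence(String comment, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(comment);
        int start = it.first();
        int end = it.next();
        if (end == BreakIterator.DONE) {
            // Empty text: there is no boundary after the start.
            return comment.trim();
        }
        // The boundary normally falls after the trailing white space.
        return comment.substring(start, end).trim();
    }

    public static void main(String[] args) {
        String doc = "Process a mouse event. event is a MouseEvent with source "
                + "being the receiving element's component. path is the path of "
                + "the receiving element in the menu hierarchy.";
        // Per the evaluation above, a period followed by a lowercase
        // letter is not expected to end the sentence here.
        System.out.println(firstSentence(doc, Locale.US));

        // A question mark is accepted as a sentence terminator.
        System.out.println(firstSentence("Is it empty? Returns true if so.", Locale.US));
    }
}
```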

For this reason, we are not using BreakIterator for English,
while we are for all other languages.  

Does it make sense to keep it this way?  Is there an upcoming change
to BreakIterator to allow it to work with Javadoc in English?

 xxxxx@xxxxx  1998-08-12

Neal, I'm just passing this bug on to you, for you to be aware of.
You can close it out if you feel nothing should be done, or
we could talk to the java.text people if we want to do more
research into it.

 xxxxx@xxxxx  2001-03-01

This should, indeed, be fixed. To help people migrate their doc comments
to the new definition, I would have javadoc emit a warning when the new
interpretation of the first sentence differs from the old interpretation
of the first sentence.

 xxxxx@xxxxx  2001-03-06
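
The migration warning proposed above could be sketched as follows: compute the first sentence both ways and report only when the two results differ. This is an illustrative sketch, not the code that shipped. All names are hypothetical, and the message text is modeled on the warning quoted in the Comments section.

```java
import java.text.BreakIterator;
import java.util.Locale;
import java.util.Optional;

// Hypothetical illustration of the "warn when the two modes differ" idea.
class FirstSentenceWarning {

    /** Old rule: first period followed by blank, tab, or line terminator. */
    static String legacy(String comment) {
        for (int i = 0; i < comment.length(); i++) {
            if (comment.charAt(i) != '.') {
                continue;
            }
            // Assumption: a period at the very end also terminates.
            if (i + 1 == comment.length()
                    || " \t\n\r".indexOf(comment.charAt(i + 1)) >= 0) {
                return comment.substring(0, i + 1).trim();
            }
        }
        return comment.trim();
    }

    /** New rule: first boundary reported by a sentence BreakIterator. */
    static String withBreakIterator(String comment, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(comment);
        int start = it.first();
        int end = it.next();
        return (end == BreakIterator.DONE
                ? comment
                : comment.substring(start, end)).trim();
    }

    /** Returns a warning message only when the two interpretations differ. */
    static Optional<String> check(String comment, Locale locale) {
        String oldSentence = legacy(comment);
        String newSentence = withBreakIterator(comment, locale);
        if (oldSentence.equals(newSentence)) {
            return Optional.empty();
        }
        return Optional.of(
                "warning - The first sentence is interpreted to be:\n"
                + "\"" + oldSentence + "\"\n"
                + "This sentence is different from what would be generated\n"
                + "using -breakiterator:\n"
                + "\"" + newSentence + "\"");
    }

    public static void main(String[] args) {
        check("Is it empty? Returns true if so. More detail follows.", Locale.US)
                .ifPresent(System.out::println);
    }
}
```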

This RFE has been implemented.  Location of implementation:

src/share/javac/com/sun/tools/javadoc/DocEnv.java
src/share/javac/com/sun/tools/javadoc/DocLocale.java
src/share/javac/com/sun/tools/javadoc/JavadocTool.java
src/share/javac/com/sun/tools/javadoc/Start.java
src/share/javac/com/sun/tools/javadoc/resources/javadoc.properties

 xxxxx@xxxxx  2001-07-17
Comments
  
Submitted On 16-JAN-2002
sc0302
When I execute javadoc without the -breakiterator option I
get the following warning:

./tmpPcpSrc/com/aepona/pcp/parlayservices/fw/integrity/heartbeat/reqevents/HeartBeatSessionReqEvent.java:40:
warning - The first sentence is interpreted to be:
"<B>For internal use only.</B>"
This sentence is different from what would be generated
using -breakiterator:
"<B>For internal use only."

I got a lot of warnings when I omitted this option. 

I then switched to using the -breakiterator option.  This
cleared up all of the warnings, however, the text describing
what each method does was all in bold text.  When writing
the Javadoc comments, we had only specified that we wanted
the text reading 'For internal use only' to be in bold text.


Submitted On 19-MAR-2002
dougfelt
The intent is good, but this implementation is clearly 
going too far.  It is quite reasonable to insert html tags, 
such as P or BR, after the end of a sentence and before the 
next sentence.  The new code fails either to ignore these 
tags or to recognize them as signaling the end of the 
sentence (don't know if BR qualifies or not, P certainly 
should).  Probably it should use a combination of the two.  
BreakIterator certainly wasn't designed to work on marked-
up text like this, so you can't rely on it exclusively 
without additional processing (I haven't looked at the 
code, though).

Additionally, two more points.  The warning output is in 
ambiguous order when run on large batches of files: 
sometimes the text is emitted before the diagnostic line, 
and sometimes an unrelated diagnostic line gets in between 
the original and the breakiterator versions of the 
sentence.  Second, the output swamps 'real' errors.  If the 
intent is to keep the current behavior for the time being, 
at least provide a mechanism to suppress these warnings.  
They're a pain to wade through while looking for real 
errors.
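
One way to read the first suggestion in the comment above is to treat a block-level tag such as P or BR as a hard end of the summary, and to hand only the text before it to BreakIterator. The sketch below illustrates that idea. It is hypothetical rather than anything javadoc is known to do, the choice of tags is an assumption, and inline tags such as B are left untouched.

```java
import java.text.BreakIterator;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing step; not part of javadoc.
class TagAwareFirstSentence {

    // Assumption: only <p> and <br> (any case, with or without
    // attributes) are treated as ending the first sentence.
    private static final Pattern BLOCK_TAG =
            Pattern.compile("<\\s*(?:p|br)\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    static String firstSentence(String comment, Locale locale) {
        // Cut the text at the first block-level tag, if there is one.
        Matcher m = BLOCK_TAG.matcher(comment);
        String candidate = m.find() ? comment.substring(0, m.start()) : comment;

        // Let BreakIterator find the sentence end in what remains.
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(candidate);
        int start = it.first();
        int end = it.next();
        return (end == BreakIterator.DONE
                ? candidate
                : candidate.substring(start, end)).trim();
    }

    public static void main(String[] args) {
        System.out.println(firstSentence(
                "Returns the name.<P>The name is never null.", Locale.US));
    }
}
```

A comment that begins with such a tag yields an empty summary in this sketch; a real tool would need to decide how to handle that case.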


