Bug ID: 4165985
Votes: 0
Synopsis: javadoc tool: Use BreakIterator to determine end of first sentence
Category: doclet:tbd
Reported Against: 1.2.2, 1.2beta4
Release Fixed: 1.4(merlin-beta2)
State: 10-Fix Delivered, Verified, request for enhancement
Priority: 5-Very Low
Related Bugs: 4172961, 4261388, 4890489, 4959985
Submit Date: 12-AUG-1998
Description
|
This RFE has been implemented. Javadoc now has two modes for computing the end of the first English sentence. The default is the old behavior but it generates a new warning when the new behavior would be different. The new behavior uses BreakIterator and is enabled by a new command-line flag called -breakiterator.
The main differences are (1) we would now accept a sentence ending in a question mark (which some people find useful in the synopsis of a boolean-returning method), and (2) we would NOT accept a period followed by white space and then a lowercase letter as ending a sentence, which would allow you to use abbreviations in the first sentence.
It's not major, but people have complained. Fixing this is ultimately a code simplification because English would be treated the same as other languages.
We would hope to make the new mode the default in the next (Tiger) version of javadoc.
xxxxx@xxxxx 2001-07-17
Just prior to 1.2 Beta4 we tried using BreakIterator for the first
sentence breaks, and it had too many serious bugs for us to use,
so javadoc now special cases English and uses our old 1.1 algorithm
that looks for a period (.) followed by white space.
Once the BreakIterator bugs are fixed, we should consider returning
to using BreakIterator for English.
The BreakIterator bugs are described in:
4140384 design bug: ambiguous "first sentence" rule
4158381 sentence BreakIterator stops too soon (submitted by Bill Shannon)
4113835 Some of BreakIterator's rules are not correct in JDK1.1.6G.
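The 1.1 rule mentioned above (a period followed by white space) can be sketched roughly as follows. This is only an illustration of the rule as stated in this report, not the javadoc source; the class and method names are hypothetical.

```java
// Illustrative sketch of the "period followed by white space" rule.
// OldFirstSentenceRule and firstSentence are hypothetical names, not javadoc APIs.
final class OldFirstSentenceRule {

    /** Returns the text up to and including the first period that is followed by white space. */
    static String firstSentence(String comment) {
        for (int i = 0; i + 1 < comment.length(); i++) {
            if (comment.charAt(i) == '.' && isBreakingSpace(comment.charAt(i + 1))) {
                return comment.substring(0, i + 1);
            }
        }
        // Assumption: with no such period, the whole comment is the summary.
        return comment;
    }

    private static boolean isBreakingSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    public static void main(String[] args) {
        // An abbreviation ends the sentence early under this rule.
        System.out.println(firstSentence(
                "Returns the size, e.g. the number of elements. Never negative."));
        // prints "Returns the size, e.g."
    }
}
```

The early cut after "e.g." is the abbreviation problem that motivates the request.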
|
|
Work Around
|
N/A
|
|
Evaluation
|
The JLS first edition, section 18.3 says:
    The first sentence of each documentation comment should be
    a summary sentence. This sentence ends at the first period
    that is followed by a blank, tab, or line terminator.
It doesn't care if the next letter is upper or lowercase.
(We infer that this rule was intended to apply only to languages
for which period is a sentence terminator.)
This is demonstrated in the following processMouseEvent method in
javax.swing.MenuElement, where the above rule would interpret the
first sentence to be "Process a mouse event":
    Process a mouse event. event is a MouseEvent with source being the
    receiving element's component. path is the path of the receiving
    element in the menu hierarchy including the receiving element itself.
    manager is the MenuSelectionManager for the menu hierarchy.
(In my opinion, starting the second and third sentences with lowercase
words is poorly-constructed (but understandable) English.
They should be rewritten so as not to begin with lowercase letters.
But that aside...)
The engineer for BreakIterator is Rich Gillam ( xxxxx@xxxxx ).
Atul and I tested Rich Gillam's fixes to bugs in 4158381, and
they are fixed. However, we discovered the BreakIterator follows
this rule (which differs from the above rule):
    If a period is followed by white space and then a lowercase letter
    (or digit), it is not considered the end of a sentence.
See "Comments" for the exact rules that BreakIterator uses.
Under this rule, the entire processMouseEvent paragraph shown above
would be treated as one sentence.
For this reason, we are not using BreakIterator for English,
while we are for all other languages.
Does it make sense to keep it this way? Is there an upcoming change
to BreakIterator to allow it to work with Javadoc in English?
xxxxx@xxxxx 1998-08-12
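For comparison, here is a minimal sketch of taking the first sentence from java.text.BreakIterator, run on the beginning of the processMouseEvent comment quoted above. The class and method names are hypothetical, and where the break falls depends on the BreakIterator rules of the JDK in use, so no output is asserted for that paragraph.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Illustrative sketch only; BreakIteratorFirstSentence is a hypothetical name.
final class BreakIteratorFirstSentence {

    /** Returns the text between the first two sentence boundaries, trimmed. */
    static String firstSentence(String comment, Locale locale) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(locale);
        sentences.setText(comment);
        int start = sentences.first();
        int end = sentences.next();
        if (end == BreakIterator.DONE) {
            return comment.trim(); // no boundary found: use the whole comment
        }
        return comment.substring(start, end).trim();
    }

    public static void main(String[] args) {
        String comment = "Process a mouse event. event is a MouseEvent with source "
                + "being the receiving element's component.";
        // The result depends on how the JDK's rules treat a period that is
        // followed by white space and a lowercase letter.
        System.out.println(firstSentence(comment, Locale.ENGLISH));
    }
}
```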
Neal, I'm just passing this bug on to you, for you to be aware of.
You can close it out if you feel nothing should be done, or
we could talk to the java.text people if we want to do more
research into it.
xxxxx@xxxxx 2001-03-01
This should, indeed, be fixed. To help people migrate their doc comments
to the new definition, I would have javadoc emit a warning when the new
interpretation of the first sentence differs from the old interpretation
of the first sentence.
xxxxx@xxxxx 2001-03-06
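The migration warning proposed above could look something like the sketch below: compare the two interpretations and report only when they differ. The message wording is modeled on the warning quoted in the Comments section of this report; the class and method names are hypothetical and this is not the javadoc implementation.

```java
import java.util.Optional;

// Illustrative sketch only; FirstSentenceWarning is a hypothetical name.
final class FirstSentenceWarning {

    /** Returns a warning message when the two interpretations differ, otherwise empty. */
    static Optional<String> warningFor(String oldSentence, String newSentence) {
        if (oldSentence.equals(newSentence)) {
            return Optional.empty();
        }
        return Optional.of(
                "warning - The first sentence is interpreted to be:\n"
                + "\"" + oldSentence + "\"\n"
                + "This sentence is different from what would be generated using -breakiterator:\n"
                + "\"" + newSentence + "\"");
    }

    public static void main(String[] args) {
        warningFor("Returns the size, e.g.",
                   "Returns the size, e.g. the number of elements.")
                .ifPresent(System.err::println);
    }
}
```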
This RFE has been implemented. Location of implementation:
src/share/javac/com/sun/tools/javadoc/DocEnv.java
src/share/javac/com/sun/tools/javadoc/DocLocale.java
src/share/javac/com/sun/tools/javadoc/JavadocTool.java
src/share/javac/com/sun/tools/javadoc/Start.java
src/share/javac/com/sun/tools/javadoc/resources/javadoc.properties
xxxxx@xxxxx 2001-07-17
|
|
Comments
|
Submitted On 16-JAN-2002
sc0302
When I execute javadoc without the -breakiterator option I
get the following warning:
./tmpPcpSrc/com/aepona/pcp/parlayservices/fw/integrity/heartbeat/reqevents/HeartBeatSessionReqEvent.java:40:
warning - The first sentence is interpreted to be:
"<B>For internal use only.</B>"
This sentence is different from what would be generated
using -breakiterator:
"<B>For internal use only."
I got a lot of warnings when I omitted this option.
I then switched to using the -breakiterator option. This
cleared up all of the warnings, however, the text describing
what each method does was all in bold text. When writing
the Javadoc comments, we had only specified that we wanted
the text reading 'For internal use only' to be in bold text.
Submitted On 19-MAR-2002
dougfelt
The intent is good, but this implementation is clearly
going too far. It is quite reasonable to insert html tags,
such as P or BR, after the end of a sentence and before the
next sentence. The new code fails either to ignore these
tags or to recognize them as signaling the end of the
sentence (I don't know whether BR qualifies, but P certainly
should). Probably it should use a combination of the two.
BreakIterator certainly wasn't designed to work on marked-
up text like this, so you can't rely on it exclusively
without additional processing (I haven't looked at the
code, though).
Additionally, two more points. First, the warning output is in
ambiguous order when run on large batches of files:
sometimes the text is emitted before the diagnostic line,
and sometimes an unrelated diagnostic line gets in between
the original and the breakiterator versions of the
sentence. Second, the output swamps 'real' errors. If the
intent is to keep the current behavior for the time being,
at least provide a mechanism to suppress these warnings.
They're a pain to wade through while looking for real
errors.
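One way to read the suggestion about P and BR tags is to cut the comment at the first such tag before looking for a sentence boundary. The sketch below shows only that preprocessing step. It is a hypothetical illustration, not what javadoc does, and the class name, method name, and choice of tags are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; BlockTagBoundary is a hypothetical name.
final class BlockTagBoundary {

    // Assumption: only <P> and <BR> (any case, with optional attributes) end a sentence.
    private static final Pattern BLOCK_TAG =
            Pattern.compile("<\\s*(p|br)\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /** Returns the text before the first P or BR tag, or the whole comment if none is present. */
    static String textBeforeFirstBlockTag(String comment) {
        Matcher m = BLOCK_TAG.matcher(comment);
        return m.find() ? comment.substring(0, m.start()) : comment;
    }

    public static void main(String[] args) {
        System.out.println(textBeforeFirstBlockTag("Draws the glyph<P>The glyph is cached."));
        // prints "Draws the glyph"
    }
}
```

Inline tags such as B are left alone here, so the unbalanced bold markup reported in the earlier comment would still need separate handling.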