25832 – Hebrew spellchecker (non-MySpell based)

Summary: Hebrew spellchecker (non-MySpell based)

Status:	CLOSED OBSOLETE

Alias:	None

Product:	Internationalization
Classification:	Code
Component:	code
Version:	OOo 1.1 RC5
Hardware:	All All

Importance:	P3 Trivial with 1 vote
Target Milestone:	---
Assignee:	AOO issues mailing list
QA Contact:

URL:
Keywords:

Depends on:	99796
Blocks:

Reported:	2004-02-24 12:45 UTC by alan
Modified:	2019-10-06 16:57 UTC
CC List:	16 users

See Also:
Issue Type:	TASK
Latest Confirmation in:	---
Developer Difficulty:	---

Attachments
Code and dictionary files for Hspell Hebrew spellchecker (170.70 KB, application/x-compressed) 2004-02-24 12:46 UTC, alan
Changes to existing OOo files for making hspell part of the OOo build (1.27 KB, text/plain) 2004-02-24 12:47 UTC, alan
New breakiterator class for Hebrew, new file breakiterator/data/dict_word_he_IL.h file (1.86 KB, application/x-gzip) 2004-03-25 10:10 UTC, alan
hspell files, without code licensed to Mike Martinec (100.53 KB, application/octet-stream) 2004-05-06 13:49 UTC, alan
differences between first version of hspell and most recent one (1.58 KB, patch) 2004-05-06 13:51 UTC, alan
files for dictionaries/he_IL (113.51 KB, application/x-compressed) 2004-05-23 12:34 UTC, alan
Patch for scp and config_office to enable/disable Hebrew spellchecker (2.46 KB, patch) 2004-05-23 12:36 UTC, alan
Changes to hspell files (endian code, ulingu) (2.11 KB, patch) 2004-05-23 12:37 UTC, alan
Hebrew spellchecker DLL for Windows (45.02 KB, application/x-compressed) 2004-06-08 11:28 UTC, alan
Source, DLL, and pdb for Daniel Boelzle (159.42 KB, application/x-compressed) 2004-06-11 10:20 UTC, alan
Hspell install package for non-Hebrew OOo on Linux (196.94 KB, application/x-gzip) 2004-07-12 14:16 UTC, alan
script and patches to build hspell/OOo interface (85.46 KB, application/x-gzip) 2004-07-12 14:18 UTC, alan
Build package for hspell, using new set_dictionary_path hspell function (89.74 KB, application/x-gzip) 2004-07-14 10:15 UTC, alan
Hspell binary install package with patched libhspell.a (197.10 KB, application/x-gzip) 2004-07-15 14:26 UTC, alan
scp and config_office patches to 1.1.2 for hebrew spellchecker (2.10 KB, patch) 2004-08-18 14:19 UTC, alan
Screenshot showing some random Hebrew news and hspell extension in use (30.94 KB, image/png) 2008-01-25 08:21 UTC, hatapitk

Description alan 2004-02-24 12:45:18 UTC

Attached are files containing code for the hspell Hebrew spellchecker, as well
as dictionary files, and changes in OOo build files which incorporate hspell.

Comment 1 alan 2004-02-24 12:46:08 UTC

Created attachment 13395 [details]
Code and dictionary files for Hspell Hebrew spellchecker

Comment 2 alan 2004-02-24 12:47:40 UTC

Created attachment 13396 [details]
Changes to existing OOo files for making hspell part of the OOo build

Comment 3 alan 2004-03-25 10:10:36 UTC

Created attachment 14060 [details]
New breakiterator class for Hebrew, new file breakiterator/data/dict_word_he_IL.h file

Comment 4 khendricks 2004-03-26 20:21:48 UTC

Hi,

Adding myself as CC to this issue since I am the lingucomponent lead where most of these changes go.

We exchanged emails about integrating this a while back.  I never heard back whether you were willing to join
the lingucomponent project to support the code you are contributing.  The same applies to
handling QA and related issues.

You should probably split out the breakiterator patches from the lingucomponent patches and 
dictionaries patches so that the breakiterator patches get looked at and evaluated more quickly.

Kevin

Comment 5 alan 2004-03-29 15:10:48 UTC

OK, I've joined the lingucomponent project, and I'll handle support and qa for
hspell. I've also split off the breakiterator code, and submitted it in issue
#27076.

Alan

Comment 6 khendricks 2004-04-04 20:02:00 UTC

Hi, 
 
Okay, I am going to integrate all of your patches into the next cws 680 tree that comes after 
cws_src680_ooo20040329 which just closed. 
 
You have signed the JCA, right?
 
So this will all be part of OOo 2.0. 
 
Please ask mh@openoffice.org (Martin) if your changes are okay to go into the 1.1.X series 
(perhaps 1.1.3?). 
 
I will let you know which CVS tag to look at and verify once a new 680 cws opens. 
 
Thanks! 
 
Kevin

Comment 7 Martin Hollmichel 2004-04-05 15:58:24 UTC

set target to 1.1.2

Comment 8 khendricks 2004-04-05 16:36:04 UTC

Hi,

Since this is now targeted to OOo 1.1.2, I took a look at the code itself.

1. Your endian detection code will not work for all platforms. Perhaps including one of the endian
headers might be better than simply identifying platforms here.

For example: PPC Linux is Big Endian but does not define MACOSX
and I believe Solaris x86 is Little Endian but does define SOLARIS

Perhaps you should look at the endian detection code used in the old thesaurus
code instead of this approach.

// Check for bigendian representation of two-byte chars
#if defined(SOLARIS) || defined(MACOSX)
low_byte_position = 1;
#else
low_byte_position = 0;
#endif

2. You named sprophelp.cxx the same as the myspell file, which means its object files will collide in the
obj directory of the output tree with those from myspell. I will rename these to hprophelp* to prevent
conflicts where the object files are stored.

3. Your makefile.mk links with ulingu but does not seem to need it in any way. That looks like a holdover
from the myspell makefile.mk. Can I remove that?

4. I do not see a patch to config_office to enable or disable the hebrew environment variable used
for the test (we should probably disable the complete component build based on that value and not
just the dictionary itself).

Please look in config_office at how the other dictionaries are enabled and disabled as examples and
please post a patch that uses that value in lingucomponent in hspell, in dictionaries, and in scp
to disable the component if not enabled.

Please set it up to enable things by default and only disable if not specifically listed in the
config_options (just like the other languages).

5. Have you tried building the hspell pieces on a Windows platform to make sure it will build? If not, at
least try doing the build with -ansi -pedantic -Wall to make sure it has no obvious problems.
Building it with debug_malloc might also be useful to detect any memory leakage.

Thanks,

Kevin
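
A minimal sketch of the alternative to platform macros raised in point 1: probe the byte order of the running machine instead of listing platforms. The variable name low_byte_position comes from the quoted code; the helper names detect_low_byte_position and low_byte_of are hypothetical and do not come from the hspell or OOo sources.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the index (0 or 1) of the low-order byte
 * inside a two-byte character as it is laid out in memory on the machine
 * the code is running on. */
static int detect_low_byte_position(void)
{
    const uint16_t probe = 0x0102;        /* high byte 0x01, low byte 0x02 */
    unsigned char bytes[sizeof probe];

    memcpy(bytes, &probe, sizeof probe);  /* inspect the in-memory layout */
    assert(bytes[0] == 0x02 || bytes[1] == 0x02);
    return bytes[0] == 0x02 ? 0 : 1;      /* 0: little endian, 1: big endian */
}

/* Hypothetical helper: extracts the low byte of a two-byte character by
 * value, which sidesteps byte order entirely because masks operate on
 * values rather than on memory layout. */
static unsigned char low_byte_of(uint16_t ch)
{
    return (unsigned char)(ch & 0xFF);
}
```

With this, PPC Linux and Solaris x86 are classified by what the hardware does rather than by which macro happens to be defined. Where the code only needs the low byte of a character, the mask-based variant avoids the question altogether.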

Comment 9 ooo 2004-04-06 12:48:26 UTC

I took a look at the i18npool breakiterator patches, looks good so far, just
that I wouldn't name them "he_IL" but "he" instead, since they're not region
dependent (I suppose). This should be corrected before applied. Of course I
can't say anything about the quality of the breakiterator rule itself, simply
because I can't judge it as I don't speak Hebrew and just know how to write my
name ;-)

@Ayaniger: please note that this implementation is only valid for OOo1.1.x, for
OOo2.0 we took a different approach of precompiled rules and you'd need to adapt
to it, please see the current SRC680 code base. For OOo2.0 please create a
separate issue and attach your patch there.

Btw: to the issue 27076 you mentioned there aren't any files attached.

As for the endian headers Kevin mentioned, please use osl/endian.h

Thanks
Eike

Comment 10 ooo 2004-04-21 11:26:39 UTC

Citing the latest comment Alan gave in issue 27076:

"My real name is Alan Yaniger, I work for Tk Open Systems, and we've signed a
corporate JCA several times."

So we're good to go with his contribution.

I'll take the breakiterator attached to this issue and submit it to the
ooo112fix1 CWS, so further development of the spellchecker will have the right
basis.

As OOo1.1.2 is close to closing, we might need to retarget this issue to OOo1.1.3

Comment 11 khendricks 2004-04-21 16:27:49 UTC

Hi,

As mentioned before, I have committed most of this to cws ooo112fix1.  I took it upon myself to fix the
endian issues and the build issues I pointed out earlier.

But I still need the author to properly create config_office patches to enable and disable the build/
dictionaries (just like the other languages have)  and scp patches to enable/disable the inclusion of the 
library.

Without these changes the dictionary will never get built or delivered to the solver.

So Alan, please post new patches for scp and config_office that use switches to enable and disable the
build and the delivery of hspell to the solver.

Also, please checkout and build ooo112fix1 to verify correct operation.

Kevin

Comment 12 khendricks 2004-04-26 16:51:16 UTC

Hi, 
 
Still waiting for scp, and config office patches from author to conditionally enable hspell 
component and dictionaries. 
 
So retargeting to OOo 1.1.3. 
 
Kevin

Comment 13 Martin Hollmichel 2004-04-28 13:11:22 UTC

I found the following copyrights in the hspell sources:

Copyright (C) 2003 Nadav Har'El and Dan Kenigsberg

Mark Martinec <mark.martinec@ijs.si>, April 1999.
Copyright 1999, Mark Martinec. All rights reserved

ayaninger: can we please add Nadav Har'El and Dan Kenigsberg email address to
the copyright notice so that I'm able to verify that they're covered by your
signed JCA. 
if their copyright is not covered by your JCA (if looking at
http://www.ivrix.org.il/projects/spell-checker/download.html I have exactly this
 impression) we have to remove this code.

I don't see how the copyright of Mark Martinec is covered by your JCA, can you
please explain.

mh->khendricks: I fear we have to revert these changes for 1.1.2 until the
copyright and license status is clarified.

Comment 14 khendricks 2004-04-28 13:26:12 UTC

Hi Martin,

Yes it looks like pure GPL with no special exclusions and things.

The author of this issue has not once responded to any of the postings I made to this issue so far.

Perhaps someone should e-mail him directly?

Do you want me to cvs remove those pieces now?  Both lingucomponent/source/spellcheck/hspell/
and dictionaries/he_IL would need to be removed since the license explicitly covers both hspell and its 
dictionaries (based on the webpage you cited).

Please let me know what you want.

Kevin

Comment 15 alan 2004-05-06 13:46:22 UTC

Kevin and Martin, 
	I would like to check out ooo112fix1, but I'm having trouble doing it.  (This
is my first time checking out OOo code from CVS.) Kevin, could you give me
instructions about how to get it? (I tried 
cvs -d :pserver:anoncvs@anoncvs.services.openoffice.org:/cvs checkout
lingucomponent
but this didn't contain hspell code. Which version is this?)
	Regarding licensing issues: I've changed the hspell files so that code licensed
to Mark Martinec is no longer used. I've attached a zip file with all files in
the hspell directory, as well as a patch file which shows the differences
between the new hspell files and the old ones.
	As Dina of TKOS has informed you, she has been in contact with the authors of
hspell about hspell's GPL license.
	I'll deal with the scp and config_office changes presently. 
	Regarding the other issues Kevin has raised (sprophelp, endian checking,
ulingu), once I check out the ooo112fix1 code, I'll see what hasn't already been
done.
	Sorry for the delay. 
Alan

Comment 16 alan 2004-05-06 13:49:17 UTC

Created attachment 15019 [details]
hspell files, without code licensed to Mike Martinec

Comment 17 alan 2004-05-06 13:51:43 UTC

Created attachment 15020 [details]
differences between first version of hspell and most recent one

Comment 18 alan 2004-05-06 13:59:40 UTC

Kevin,
	We have built hspell under Windows, though we didn't build it with
debug_malloc.
Alan

Comment 19 khendricks 2004-05-06 14:02:56 UTC

Hi,

Don't bother checking out anything from CVS.  All of hspell that was committed to 
cws_srx645_ooo112fix1 was cvs removed from the repository pending clarification of the license 
issues.  We can't have full GPL pieces in the tree.

So there is nothing there for you to check out.

Once a new tree opens and Martin gives the go ahead we can recommit things.

Once that happens you would use the following sequence of commands:

export CVSROOT=:pserver:anoncvs@anoncvs.services.openoffice.org:/cvs
cvs login   # password is anoncvs

cvs co -r THETAG lingucomponent
cvs co -r THETAG dictionaries
cvs co -r THETAG config_office
cvs co -r THETAG scp

where you replace THETAG with the specific tag used for the child work space hspell gets committed to.

For example, up until it was removed, THETAG would have been cws_srx645_ooo112fix1

Hope this explains things ...

Kevin

Comment 20 rene 2004-05-07 12:57:42 UTC

Hi,

I have two questions :-)

a) why 0.6 and not 0.7? Why not the "normal" way with hspell.tar.gz or so in
download/apply patches etc and then link with it as we do for the other
external stuff? This would probably make a patch for using system hspell
(static...) easier since it then is easier to distinguish...
Or probably I don't grok the diff between the stuff you attached and Debian's
hspell which is hspell 0.7....

b) why use uncompressed dictionary files? If this is in OOo I would like to
use the files from system hspell (why duplicate them) but they are in 0.6 and
0.7 gzip-compressed (*.wgz*)

Regards,

René

Comment 21 alan 2004-05-09 10:55:47 UTC

Hi Rene,

        a) When we integrated hspell into OOo, 0.6 was the latest
version, and 0.7 did not yet exist. Since hspell development is still in
progress, and the authors have begun work on an API to hspell but have
not yet finished it, we don't think it worth the time to update the OOo
integration with each new version of hspell that comes out.
        I've discussed with Kevin Hendricks whether to locate hspell under
lingucomponent. To me it seemed the natural place to put it since it's
based on the code in lingucomponent/source/spellcheck/spell. Kevin, in
emails to dev-lingucomponent on Oct. 9 and Oct. 15, wrote that if there
are no licensing problems, it could be integrated there. We're working on
the licensing issues right now.
        b) I originally decided to use uncompressed files in order to use
hspell in the Windows build. The original hspell code executed a shell
which ran gzip in order to uncompress them. This could not be assumed to
work under Windows, unless the user happened to have cygwin installed, and
a path pointing to it. Being a newbie to OOo at the time, I didn't look
into the compression library that exists in OOo. In the future, we plan to
use zlib to open compressed dictionary files.

Best,
Alan

Comment 22 alan 2004-05-23 12:32:52 UTC

Still waiting for an answer from the authors of hspell about licensing. In the
meantime, I'm posting the changes to be made if the licensing issue works out.
Attached are :

1. the contents of the new directory "dictionaries/he_IL"
2. a patch for scp and config_office (diffed with ooo12fix),  for enabling and
disabling the Hebrew spellchecker
3. a patch for changing hspell's endian code and removing ulingu from the
makefile

Alan

Comment 23 alan 2004-05-23 12:34:39 UTC

Created attachment 15401 [details]
files for dictionaries/he_IL

Comment 24 alan 2004-05-23 12:36:22 UTC

Created attachment 15402 [details]
Patch for scp and config_office to enable/disable Hebrew spellchecker

Comment 25 alan 2004-05-23 12:37:39 UTC

Created attachment 15403 [details]
Changes to hspell files (endian code, ulingu)

Comment 26 alan 2004-06-08 11:25:31 UTC

hspell DLL for Windows is attached, for Daniel Boelzle to check out

Comment 27 alan 2004-06-08 11:28:13 UTC

Created attachment 15732 [details]
Hebrew spellchecker DLL for Windows

Comment 28 alan 2004-06-11 10:20:16 UTC

Created attachment 15832 [details]
Source, DLL, and pdb for Daniel Boelzle

Comment 29 Daniel Boelzle [:dbo] 2004-06-11 13:25:10 UTC

@Alan: Your tools import library on Windows seems to be broken or does not match
the actual tl645mi.dll. The symbols resolved (ordinals) do not match the ones
the compiler expected when linking against tl645mi.dll.  Because the tools lib is
not an SDK library, please link against the exact version of your OOo, e.g. m40 etc.
Please fix this first, then go on to the Tools/Languages problem.

Comment 30 nyh 2004-06-16 21:01:01 UTC

Hi. I'm Nadav Har'El, one of the Hspell authors, and I logged into this bug
tracking system so you can ask me questions directly :)

Hspell 0.8, which is due out next week, will use Zlib, rather than pipes and
gzip,  so hopefully if you use it you should have no problems reading the
compressed dictionaries, even on Windows (assuming you're already using the zlib
library - it has a BSD license). I wholeheartedly recommend using the compressed
dictionaries, because they are 10 times smaller on disk. Release 0.8 also
continues to improve Hspell's Hebrew vocabulary.

Unfortunately, there is still no solution to the license incompatibility. Hspell
is, and has always been, GPL, and we have not yet changed our mind about that. I
suggested to Dina a few ways to circumvent this "problem", which basically comes
down to OpenOffice using a separately installed Hspell (if available). I also
suggested referring to what Aspell did to integrate the Hspell dictionary into
their upcoming 0.6 release, and to what "Mits Petel" did with his "Hspell
Service" for the Mac. If anybody wants to discuss this issue with me further,
you can do so here or to my personal email.

Comment 31 rene 2004-06-23 16:09:02 UTC

Hi Nadav,

this is good news since Debian's hspell (of course(?)) uses uncompressed
dictionaries and using the system hspell dictionaries from hspell integrated in
OOo or the package providing the hspell extension package to OOo is then much
easier -- if Alan decides to upgrade to hspell 0.8.

In this way it probably would make sense to make the OpenOffice.org additions to
hspell a bit more seeable (i.e. not modifying the original hspell files or use
#ifdef's or extract the hspell tar.gz and apply a clear patch), diffing the
hspell versions and the OOo hspell is a mess to grok (especially if you have no
clue how hspell works). This would enable me to easily use libhspell.a included
in Debian's hspell package (no need to build hspell extra when we already have it
- and Debian unstable already has hspell 0.8 which is now released it seems ;) ).

Regards,

René

Comment 32 rene 2004-07-03 17:05:36 UTC

Hi,

better late than never, but....
ermm. I of course meant compressed....

Regards,

René

Comment 33 rene 2004-07-03 18:10:46 UTC

Hi,

*sigh*

... and this of course if the license situation becomes clear....

I think using the installed hspell somehow if installed would be a good
way if - as it seems now - direct hspell support over the library could not
be integrated into OOo..

Regards,

René

Comment 34 alan 2004-07-04 09:53:22 UTC

There is now a separately installable hspell package for OOo, available at
http://www.openoffice.org.il. This package uses the code posted to this issue,
which is based on hspell 0.6.

Alan

Comment 35 alan 2004-07-04 10:22:41 UTC

Here's a precise URL for downloading the hspell package:

http://www.openoffice.org.il/hspell.tar.gz

Alan

Comment 36 rene 2004-07-04 13:04:56 UTC

Hi,

in my opinion this is deeply suboptimal.

We in Debian already have a wish to integrate hspell
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=255451) which is then marked
forwarded to this issue :-).

I would do it if I could but I would have to patch the hspell stuff into our
package build (or use the SDK and the appropriate headers which isn't packaged
for Debian yet and therefore not allowed to use) and then patch the hspell
package maintainer scripts to generate the uncompressed dictionaries for
OpenOffice.org on its installation/upgrade.

Why can't you just use hspell 0.8 which solves the compressed/uncompressed
issue? This - assuming you do a sane split - would also help to just use the
already available libhspell.a. I don't see why using an outdated version
internal is a good way.... Oh, and I wonder whether that would be possible
anyhow because of the license incompatibility....

And why can't you try to go for hspell "inclusion" in official OOo
via just using it if installed, as Nadav pointed out? That would be better
in some ways since a) it is then included in the official OOo, and Hebrew people
would install/have installed hspell anyhow, wouldn't they? And b) this removes
the need to add some crude hack to the package build and generate an extra
package for the hspell stuff...

Or is this impossible?

BTW, are the hspell 0.8 and 0.6 dictionaries compatible? I would just gunzip
them from their hspell location and save into the OOo location as a crude hack.

I would help, but I don't understand hspell internals (yet), so....

Regards,

René

Comment 37 nyh 2004-07-04 20:16:06 UTC

Actually, yes - you can use the most recent Hspell 0.8 dictionaries instead of
the outdated ones in the package! We just tried it, and even created an RPM for
the entire package, and it is working beautifully, and we'll release it to the
public momentarily (you'll see the announcements on the linux-il and
ivrix-discuss mailing lists). I was really excited seeing today, for the first
time, beautiful Hebrew+English spellchecking working on my own OpenOffice (with
English interface, of course) at home. Beautiful! (Dan told me I'm
over-reacting, but I was really excited; Thanks to Alan and Dan for their work!).

Moreover, once Alan releases the sources of his code (because his new package is
GPL), I can very easily help him add the code to read the compressed dictionary,
using zlib. This will not only allow the package's installation to be much
smaller - it will also allow the package not to contain the dictionaries at all,
and use the dictionaries from a system-installed copy of Hspell (maybe this will
even let you return the code back to being LGPL and not GPL, but I am not a
lawyer). Unfortunately this will spell trouble (no pun intended) if we change
the format of the dictionaries, so this part is not essential.

Comment 38 alan 2004-07-08 17:23:17 UTC

Rene,

I've written a working hspell module for non-Hebrew OOo that uses the hspell 0.8
library, and the new dictionaries, compressed. We'll be posting the module and
the sources in the coming days.

Regarding including hspell in the official OOo: licensing issues are preventing
that.

Alan

Comment 39 alan 2004-07-11 16:08:06 UTC

The updated hspell module for Linux, using libhspell.a of hspell version 0.8 ,
is now available.

The installation package is available at:
http://www.openoffice.org.il/hspell.tar.gz

The sources are available at:
http://www.openoffice.org.il/hspell_src.tar.gz

Alan

Comment 40 rene 2004-07-11 16:18:43 UTC

there is a permission problem. I get 403 Forbidden.

Comment 41 alan 2004-07-12 14:14:36 UTC

Hi Rene,

I'll post the code here so we won't have to wait until the sysadmin problem gets
worked out. 
You mentioned that if you already have a Debian hspell package installed that
you would not want to duplicate the dictionaries when installing the
hspell-interface-to-OOo package. On the other hand, I don't want to leave these
dictionaries out of the install package on our OOo Hebrew website, because some
people who download it may not have hspell installed on their system.  They
can't rely on dictionaries that they don't have. What you might do for a Debian
hspell/OOo package is to change my install script. Rather than copy the
dictionaries to OOoDir/share/dict/ooo as I do, you would create links there that
point to the dictionaries already installed.  This would avoid duplicating the
dictionaries.

Alan

Comment 42 alan 2004-07-12 14:16:41 UTC

Created attachment 16414 [details]
Hspell install package for non-Hebrew OOo on Linux

Comment 43 alan 2004-07-12 14:18:56 UTC

Created attachment 16415 [details]
script and patches to build hspell/OOo interface

Comment 44 rene 2004-07-12 14:54:40 UTC

Hi Alan,

yes. this was intended from the beginning. I never said *you* have to strip your
stuff away but I said that I want to use already installed dictionaries...

and as in the source (which I have to use since we have to build from source and
moreover support many other architectures besides i386) is no install script I
do this anyway. Even if there were I would change it...

Regards,

René

Comment 45 nyh 2004-07-12 15:25:09 UTC

Thanks Alan.

I just have one question. What is the "libhspell.a" that is in the source
package? Is this a straight compilation of Hspell 0.8, or did you modify that
distribution? If you did, can you please publish these changes? I may make some
of these changes to the general distributions, so that you won't have to repeat
them on Hspell 0.9.

Comment 46 rene 2004-07-12 15:31:31 UTC

Hi Alan,

your sspellimp.cxx patch does not apply here on 1.1.2 (nor does the makefile.mk
one, but that is because our makefile.mk has already been changed by the diff for
use with system myspell).

OK, hunk 1 probably can be ignored (deleted), but what is with hunk 5?

Log:

tar xfvz debian/hspell_src.tar.gz
INSTALL
lingucomponent/source/spellcheck/hspell/
lingucomponent/source/spellcheck/hspell/build_hspell
lingucomponent/source/spellcheck/hspell/hspell.h
lingucomponent/source/spellcheck/hspell/makefile.mk.diff
lingucomponent/source/spellcheck/hspell/sreg.cxx.diff
lingucomponent/source/spellcheck/hspell/sspellimp.cxx.diff
lingucomponent/source/spellcheck/hspell/sspellimp.hxx.diff
lingucomponent/source/spellcheck/hspell/gzbuffered.h
lingucomponent/source/spellcheck/hspell/libhspell.a
lingucomponent/source/spellcheck/hspell/libz.a
# remove libz.a, libhspell.a and hspell.h since we want to build
# against the system ones
rm -f lingucomponent/source/spellcheck/hspell/libhspell.a
rm -f lingucomponent/source/spellcheck/hspell/libz.a
rm -f lingucomponent/source/spellcheck/hspell/hspell.h
cd lingucomponent/source/spellcheck/hspell && sh ./build_hspell
patching file makefile.mk
Hunk #2 FAILED at 63.
Hunk #3 FAILED at 108.
Hunk #4 FAILED at 140.
Hunk #5 succeeded at 169 (offset 8 lines).
3 out of 5 hunks FAILED -- saving rejects to file makefile.mk.rej
patching file sspellimp.cxx
Hunk #1 FAILED at 1.
Hunk #5 FAILED at 137.
Hunk #6 succeeded at 445 (offset 5 lines).
Hunk #7 succeeded at 528 (offset 5 lines).
Hunk #8 succeeded at 536 (offset 5 lines).
Hunk #9 succeeded at 636 (offset 5 lines).
Hunk #10 succeeded at 651 (offset 5 lines).
Hunk #11 succeeded at 712 (offset 5 lines).
Hunk #12 succeeded at 726 (offset 5 lines).
Hunk #13 succeeded at 735 (offset 5 lines).
Hunk #14 succeeded at 746 (offset 5 lines).
Hunk #15 succeeded at 771 (offset 5 lines).
Hunk #16 succeeded at 791 (offset 5 lines).
2 out of 16 hunks FAILED -- saving rejects to file sspellimp.cxx.rej
patching file sspellimp.hxx
patching file sreg.cxx
dmake:  makefile.mk:  line 190:  Error -- Unmatched .END[IF]
---* RULES.MK *---

Regards,

René

Comment 47 danken 2004-07-12 16:06:05 UTC

Alan,

It seems to me that the current binary looks for the dictionary files in 
/usr/local/share/hspell/ and not where the installer puts them
(/usr/lib/openoffice/share/dict/ooo). It seems that this wrong path is
hard-coded into the binary (which is a known issue of hspell
http://ivrix.org.il/bugzilla/show_bug.cgi?id=9 ).

I believe that it should be dealt with rather thoroughly, as at compile time you
cannot tell where exactly OOo will be installed.

Comment 48 rene 2004-07-12 17:02:31 UTC

Hi,

since I wasn't yet able to build it I did not catch this. Yeah, bad.
And distribution packages should *not* rely on *anything* on /usr/local so just
placing the links/files there is not possible...

Regards,

René

Comment 49 rene 2004-07-13 15:11:16 UTC

Hmm. When I think of it.. it isn't hardcoded in Alan's sources so it must be
hardcoded in libhspell.a in which case using Debian's hspell.h should be OK
and the dictionaries are searched in /usr/share/hspell....

Comment 50 nyh 2004-07-13 15:16:18 UTC

Yes, but libhspell.a is a static library, already compiled into Alan's binary,
so unless you find a way to recompile it, you can't make it use Debian's
libhspell.a.

Hopefully tonight or tomorrow night I'll add to Hspell the possibility to set
the dictionary directory on run-time (bug 9 in Ivrix.org.il/bugzilla), and Alan
will then be able to use that feature.

Comment 51 rene 2004-07-13 15:23:40 UTC

> Yes, but libhspell.a is a static library, already compiled into Alan's binary,
> so unless you find a way to recompile it, you can't make it use Debian's
> libhspell.a.

Sure I can. I kill the .a's and hspell.h off the *source* and build it
(which I have to do anyway since I have to build from source, binaries are not
allowed and we support architectures !i386, too)

Comment 52 nyh 2004-07-13 17:56:26 UTC

Rene, you have a good point there. If you intend to compile this OpenOffice
add-on from source, then indeed the solution you describe is probably the best one.

The solution which involves getting from me (in a few hours I'll have it ready)
a patch from Hspell that lets you define the dictionary path, is problematic in
the sense that this will force you to also have to compile a patched version of
Hspell. It will be convenient for Alan for creating his binary package, but
perhaps not for you for the Debian package.

Comment 53 nyh 2004-07-13 19:45:38 UTC

Strangely, I couldn't figure out how to add an attachment in this bug tracker
(is this a permissions thing?), so I put the patch in an attachment in Hspell's
bug tracker:

http://ivrix.org.il/bugzilla/show_bug.cgi?id=9

or directly
http://ivrix.org.il/bugzilla/attachment.cgi?id=7&action=view

This is a small patch that adds a new function,
hspell_set_dictionary_path("/path/name/hebrew.wgz"), to be called before
hspell_init() to set the path that hspell_init() tries. The patch also
patches the hspell(3) manual, so you can see the explanation of this function.

The patch is for Hspell 0.8. It also contains the one-character fix to
http://ivrix.org.il/bugzilla/show_bug.cgi?id=31

Alan, I hope this is what you wanted (if not, please tell me). Rene will
most-likely not need this at all, as she would use Debian's libhspell.a.
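
A rough sketch of the pattern the patch describes: a default dictionary path that a caller may override before initialisation. The functions demo_set_dictionary_path and demo_init are hypothetical stand-ins written for illustration, not Hspell's actual code or signatures, and the default path is only assumed from the directory and file name mentioned in this issue.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins only: these are NOT the real Hspell functions.
 * The default below is an assumption pieced together from this issue. */
static const char *demo_dictionary_path = "/usr/local/share/hspell/hebrew.wgz";

/* Overrides the path used by the next demo_init() call.
 * The caller must keep the string alive for as long as it is in use. */
static void demo_set_dictionary_path(const char *path)
{
    assert(path != NULL);
    demo_dictionary_path = path;
}

/* Pretend initialiser: instead of opening a dictionary, it reports which
 * file it would open. Returns 0 on success, -1 if the buffer is too small. */
static int demo_init(char *opened, size_t opened_size)
{
    size_t need = strlen(demo_dictionary_path) + 1;

    if (opened_size < need)
        return -1;
    memcpy(opened, demo_dictionary_path, need);
    return 0;
}
```

The point of the ordering is that the override has to happen before initialisation, since the initialiser reads the path only once. A binary built this way no longer depends on a location fixed at compile time.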

Comment 54 rene 2004-07-13 19:57:45 UTC

Hi,

he :-)

Regards,

René

Comment 55 nyh 2004-07-13 20:19:12 UTC

Oops, sorry...

Comment 56 alan 2004-07-14 10:13:37 UTC

Rene,

Here's an updated build package. It contains a call to Nadav's new function for
setting the dictionary path, so that hspell will not look for the dictionaries
in /usr/share/hspell. 

Alan

Comment 57 alan 2004-07-14 10:15:04 UTC

Created attachment 16451 [details]
Build package for hspell, using new set_dictionary_path hspell function

Comment 58 rene 2004-07-14 11:14:03 UTC

Hi Alan,

I think you did not understand - *I* have no problem with /usr/share/myspell.
Saves me some symlinks from /usr/share/myspell/dicts (which is where in Debian
the myspell dictionaries go; <ooo-dir>/share/dict/ooo is a symlink to there).

Besides I want to use the system hspell, I don't see the sense shipping it in
the source package (besides it is impossible since I need libhspell.a for
powerpc, s390 and sparc, too and in Debian you *have* to build from source).
And since this version does not have that patch (does not need to yet?) I have
no problem (besides the fact that the patch does not apply to 1.1.2, but I fixed
that now, currently testing).

Other persons may have use for that function, me not...

Regards,

René

Comment 59 danken 2004-07-14 14:35:40 UTC

Alan, would you please attach/send the binary library of your latest version, so
that I could check it out?

Comment 60 alan 2004-07-15 14:20:38 UTC

OK, Rene.

Comment 61 alan 2004-07-15 14:26:26 UTC

Created attachment 16483 [details]
Hspell binary install package with patched libhspell.a

Comment 62 danken 2004-07-15 15:16:30 UTC

Alan, Mooffie spotted a bug that seems to stem from the conversion of strings
between OO and hspell (see http://ivrix.org.il/bugzilla/show_bug.cgi?id=37).

If you start 
  LC_ALL=en_US.UTF-8 oowriter
and type a misspelled Hebrew word, you get a wrong list of suggestions.
For נסיון, it gives a list of "וסיון", an empty word, and "ציון".

Comment 63 nyh 2004-07-15 20:55:09 UTC

Alan, I think I unfortunately found a serious bug in the new package, one that
didn't exist in the old one (based on the Hspell 0.6 code).

Whenever you have a word ending in a dot (like in the end of a sentence), the
dot gets joined into the word, and the whole word is considered wrong. To add
insult to injury, hspell spews to standard error "Hspell: unknown letter ...."
(I should fix that - the hspell API should not print anything!). Strangely, this
only appears to be happening with a dot - words that are followed by a comma
don't exhibit this problem.

This problem did not exist in the previous version (which used Hspell 0.6 code).
I reproduced it in Fedora (with Dan's new ooo-hspell-0.8-2.i386.rpm), and with
LC_CTYPE=C (if that matters).
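
Until the word-boundary problem itself is fixed, a caller could guard against it by dropping a trailing dot before handing the word to the checker. The following is only a sketch of that idea: the helper name len_without_trailing_dots is hypothetical, and nothing in this issue says that this is how the bug was eventually fixed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical guard (not part of hspell or OOo): returns the length of
 * `word` once any trailing '.' bytes are ignored. The caller would then
 * pass only that many bytes to the spellchecker. */
size_t len_without_trailing_dots(const char *word)
{
    size_t len = strlen(word);

    while (len > 0 && word[len - 1] == '.')
        len--;
    return len;
}
```

Only trailing dots are removed; a dot inside the word is left alone, since deciding what such a dot means is the job of the word-boundary code.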

Comment 64 alan 2004-07-21 08:34:32 UTC

We've fixed the bugs that Dan and Nadav reported here in their comments of July
15. We'll be publishing the code soon.

Alan

Comment 65 nyh 2004-07-22 00:57:25 UTC

Alan, I'm curious how you solved the bug Dan (or Mooffie) reported, because
strange as it may seem, Hspell's hspell_trycorrect routine actually has a bug,
in that it uses snprintf, which on glibc is locale-specific.
http://ivrix.org.il/bugzilla/show_bug.cgi?id=37
continues our study of this bug. 

Kobi Zamir discovered an effective (but quite ugly) workaround for this bug - by
doing
+       {
+               char *lc_ctype;
+               
+               lc_ctype = g_strdup (setlocale (LC_CTYPE, NULL));
+               setlocale (LC_CTYPE, "he_IL.iso88598");
+               
+               hspell_trycorrect (hspell_common_dict, iso_word, &cl);
+               
+               setlocale (LC_CTYPE, lc_ctype);
+               g_free (lc_ctype);
+       }

instead of just calling hspell_trycorrect. Is this what you also did?

Anyway, after http://ivrix.org.il/bugzilla/show_bug.cgi?id=37 is solved, the
next version's hspell_trycorrect will no longer have this problem, and it will
work (using iso-8859-8) regardless of the (irrelevant) locale setting.
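
The workaround quoted above needs glib only for g_strdup and g_free. A glib-free variant might look like the sketch below. The names with_lc_ctype and record_ctype are hypothetical; in real use the callback would wrap the hspell_trycorrect call and the locale name would be "he_IL.iso88598" as in the snippet above. Note that setlocale changes the locale for the whole process, so this is no safer in threaded code than the original.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: run fn(arg) with LC_CTYPE temporarily set to `name`.
 * Returns 0 on success, -1 if the locale could not be saved or set. */
int with_lc_ctype(const char *name, void (*fn)(void *), void *arg)
{
    const char *cur = setlocale(LC_CTYPE, NULL);
    char *saved;

    if (cur == NULL)
        return -1;
    /* Copy the name: the string returned by setlocale() may be
     * overwritten by a later setlocale() call. */
    saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, cur);

    if (setlocale(LC_CTYPE, name) == NULL) {
        free(saved);
        return -1;               /* requested locale is not available */
    }

    fn(arg);

    setlocale(LC_CTYPE, saved);  /* restore the caller's locale */
    free(saved);
    return 0;
}

/* Example callback: copy the active LC_CTYPE name into a 64-byte buffer. */
void record_ctype(void *arg)
{
    char *out = arg;
    const char *now = setlocale(LC_CTYPE, NULL);

    strncpy(out, now ? now : "", 63);
    out[63] = '\0';
}
```

Copying the locale name with malloc and strcpy avoids both the glib dependency and any reliance on strdup.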

Comment 66 alan 2004-07-22 07:08:11 UTC

Hi Nadav,
 
   I overrode snprintf, using the version at
http://www.ijs.si/software/snprintf/ .
 
Alan

Comment 67 ilan_shaviv 2004-07-22 09:07:45 UTC

Hi to you all,

I've just joined this issue. I am a novice to OpenOffice and I've got a Mac...

I wanted to have hspell compiled for the 1.1.2 installation. I managed to create the binary
and properly install the version with hspell 0.6, but I ran into a few problems when
attempting to compile the 0.8 version.
Apparently I ran into the same problems that Rene encountered (described in her e-mail from
Jul 12 07:31:31 -0700 2004): specifically, the patch to file sspellimp.cxx failed at hunks #1
and #4. Here is the log:
patching file sspellimp.cxx
Hunk #1 FAILED at 1.
Hunk #4 FAILED at 72.
Hunk #5 succeeded at 353 (offset 5 lines).
Hunk #6 succeeded at 436 (offset 5 lines).
Hunk #7 succeeded at 444 (offset 5 lines).
Hunk #8 succeeded at 544 (offset 5 lines).
Hunk #9 succeeded at 559 (offset 5 lines).
Hunk #10 succeeded at 620 (offset 5 lines).
Hunk #11 succeeded at 634 (offset 5 lines).
Hunk #12 succeeded at 643 (offset 5 lines).
Hunk #13 succeeded at 654 (offset 5 lines).
Hunk #14 succeeded at 679 (offset 5 lines).
Hunk #15 succeeded at 699 (offset 5 lines).
2 out of 15 hunks FAILED -- saving rejects to file sspellimp.cxx.rej

From what I could decipher from the .diff file, hunk #4 has THE code change. 

Is there a possibility to have the latest version, without patches, posted? That is, only the OOo interface 
since I already have the hspell 0.8 library with the new set_dict_dir function patched and compiled.

Ilan Shaviv.

P.S. As noted, I've got the binary version for OS X with hspell 0.6 packed and ready to be applied. 
Where should I post it so that it will be available for other Mac users?

Comment 68 rene 2004-07-22 10:59:47 UTC

Hi Ilan,

http://people.debian.org/~rene/openoffice.org/hspell-sspellimp-patch-new.diff

is what I use for building with 1.1.2. (source from Jul 12 ported to 1.1.2).

It does not yet contain the fix for the <word>. problem, though; I would have
to merge that once the official fix is there :-)

Wrt hspell: just remove libhspell.a and hspell.h (and possibly libz.a too, if
you want to use the system's zlib) and change libhspell.a to -lhspell (and analogously
for libz.a).

That's how I change the source after extracting:

# remove libz.a, libhspell.a and hspell.h since we want to build
# against the system ones
rm -f ./lingucomponent/source/spellcheck/hspell/libhspell.a
rm -f ./lingucomponent/source/spellcheck/hspell/libz.a
rm -f ./lingucomponent/source/spellcheck/hspell/hspell.h
# exchange the makefile.mk patch with one which copes with the
# changes for system myspell and use -lz and -lhspell
cp debian/hspell-makefile-patch-new.diff \
./lingucomponent/source/spellcheck/hspell/makefile.mk.diff
# port over to 1.1.2
cp debian/hspell-sspellimp-patch-new.diff \
./lingucomponent/source/spellcheck/hspell/sspellimp.cxx.diff
# change build_hspell.sh to reference the right ENVFILE
perl -pi -e \
's/LinuxIntelEnv.Set.sh/LinuxIntelEnv.Set.sh/' \
    ./lingucomponent/source/spellcheck/hspell/build_hspell
cd ./lingucomponent/source/spellcheck/hspell && \
        sh ./build_hspell
patching file makefile.mk
patching file sspellimp.cxx
patching file sspellimp.hxx
patching file sreg.cxx
[...]

Regards,

René

Comment 69 rene 2004-07-22 12:09:37 UTC

Hi Nadav,

err.. isn't g_ glib? I think it is suboptimal to use it as it introduces a new
unneeded dependency.

And use of the "normal" strdup is problematic for != GNU/Linux too, because
strdup is not really portable. That's why myspell uses a mystrdup...

Anyway, maybe the overridden snprintf may work but I just wanted to give
my 2¢ :)

Regards,

René

Comment 70 nyh 2004-07-22 13:12:21 UTC

Hi Rene, I am not sure which code you're referring to - mine has neither g_*
functions nor strdup(). My patch to hspell to fix this problem involved
replacing calls to snprintf with the "%.*s" format (which unfortunately uses the
user's locale) by a simple new function I wrote called "splice" (see
http://ivrix.org.il/bugzilla/attachment.cgi?id=14&action=view).

Alan's patch simply involves replacing snprintf by a version which doesn't use
locale information. Kobi Zamir described in
http://ivrix.org.il/bugzilla/show_bug.cgi?id=37 another workaround, of
surrounding the hspell_trycorrect calls by setlocale (ugly!, but works).
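
To illustrate why a plain byte-copying helper sidesteps the locale question, here is a sketch of what a "splice"-style function could look like. This is not the code from the attachment linked above: the name splice_sketch, its parameters, and its return convention are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a "splice"-style helper (not hspell's code):
 * writes the first `n` bytes of `head`, then the byte `mid`, then all of
 * `tail` into `buf`. Bytes are copied as-is with memcpy, so the current
 * locale is never consulted. The caller must ensure n <= strlen(head).
 * Returns 0 on success, -1 if the result does not fit in `size` bytes. */
int splice_sketch(char *buf, size_t size,
                  const char *head, size_t n, char mid, const char *tail)
{
    size_t tail_len = strlen(tail);

    if (n + 1 + tail_len + 1 > size)   /* head + mid + tail + NUL */
        return -1;
    memcpy(buf, head, n);
    buf[n] = mid;
    memcpy(buf + n + 1, tail, tail_len + 1);  /* copies the NUL too */
    return 0;
}
```

Because nothing here interprets the bytes as characters, ISO-8859-8 input is handled the same way whether the process runs under a UTF-8 locale or the C locale.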

Comment 71 rene 2004-07-22 13:33:53 UTC

Hi,

that workaround you posted uses g_* :-)
(well, I added this in my working copy now without the g_* ;) )

Regards,

René

Comment 72 pavel 2004-08-17 11:25:21 UTC

What is the status? This issue is targeted to 1.1.3, but I do not think we can
make it...

Comment 73 alan 2004-08-18 14:16:16 UTC

Pavel,
Because of licensing issues (see earlier comments to this issue), we will not be submitting the
spellchecker code for integration in the main source tree at this time. (The code is available at
www.openoffice.org.il/hspell_src.tar.gz.) We are, however, submitting changes to scp and
config_office for installing and building a Hebrew spellchecker. The necessary patches to 1.1.2 will
be attached to this issue presently.

Comment 74 alan 2004-08-18 14:19:43 UTC

Created attachment 17211 [details]
scp and config_office patches to 1.1.2 for hebrew spellchecker

Comment 75 pavel 2004-08-18 17:09:30 UTC

retarget to 1.1.4.

Comment 76 Martin Hollmichel 2004-10-15 14:12:09 UTC

retarget.

Comment 77 Martin Hollmichel 2005-06-06 15:50:55 UTC

move target to OOo PleaseHelp since there is no short-term solution in sight.

Comment 78 rene 2007-08-03 13:44:42 UTC

I had some thoughts about this:

It's true that we neither can link nor dlopen() libhspell directly inside OOo,
but who says we need to?

Unless I am mistaken or overlooking something, there are two possibilities to add
hspell support:

- We just call the hspell binary. As it then doesn't include hspell's code
  it's ok (see the libpaper example)

- The hspell support is not integrated into OOo at all but distributed as
  a standalone extension (like voikko -  http://voikko.sourceforge.net/).
  That would need an overhaul of what you currently do, since you currently
  need internal headers and you then can use the public API only, but if it's
  possible for voikko it probably should be for hspell, too?
  If it's a separate extension, it afaics even *is* allowed to link against
  libhspell, as the extension itself is then GPL but links with OOo's LGPLed
  code (and not vice versa), which is OK.

I've unfortunately neither the time nor the skills to do either, but...

Comment 79 rene 2007-08-29 11:49:36 UTC

reassign away from khendricks, assign to ayaniger who did the initial hspell things.

ayaniger: what do you think about my last comment (especially the second part)?

Comment 80 rene 2008-01-04 22:37:39 UTC

ayaniger: any status on this? any comment to my proposal? A "native" (hspell)
spellchecker in OOo or as an extension would be quite nice....

Comment 81 yba 2008-01-06 12:24:23 UTC

Hi Rene,
Hebrew OOo is as of this date, January 6, 2008, without budget, so we do not
have the resources to reply to your request. We are waiting for additional
funding from the Israel Ministry of Finance.
Regards,

 - yba

Comment 82 alan 2008-01-07 05:38:31 UTC

Rene: 

I think that calling the hspell binary is not a good option, since we want
something that will work for the Windows version, and for Windows we can safely
assume that there won't be a standalone binary. The authors of hspell have
written that they have no intention to port to Windows.

I haven't yet created a standalone extension to OOo, so there would be a
learning curve involved and it wouldn't be trivial for me to do. At present, we
don't have the resources for that.

Comment 83 hatapitk 2008-01-24 14:11:06 UTC

I noticed this issue and decided to try the extension approach of hspell 
integration by modifying the Finnish spellchecking extension to use hspell 
instead of voikko. I managed to get a working proof-of-concept done in just a 
few hours.

On Linux (at least Debian) this can be tried by installing subversion, 
openoffice.org-sdk and hspell and running the following commands:

$ svn co https://voikko.svn.sourceforge.net/svnroot/voikko/branches/ooovoikko/hspell
$ cd hspell
$ /usr/lib/openoffice/sdk/setsdkenv_unix
$ make oxt
$ unopkg add build/hspell.oxt

There are still many references to Voikko, and the documentation has not been 
touched at all. While I do not know any Hebrew, pasting some Hebrew text from 
the net into OOo seems to lead to most words being accepted, and some that 
are flagged as errors have suggested corrections in Hebrew. So it must be 
working at least somehow.

I will not work on this much more, since without knowing the language it is 
obviously quite difficult. But if you find this code useful, I can clean it up 
and answer any questions you might have about it. The original extension can 
be built on Linux, Windows and OS X and supports building both fully 
standalone versions (with all the required files in the extension) and system 
integrated versions (where spellchecking library has already been installed on 
the system, as in the Linux distributions).

Comment 84 danken 2008-01-25 07:56:38 UTC

hatapitk, thank you for your work!
I would like to check the Hebrew side of it, but I am afraid that my knowledge
of ooo is even less than yours of Hebrew: I have ooo from my distribution, and
that's that. If you could tell me how I could test your port on my system (I
don't even have setsdkenv_unix), or just show me some screenshot, it would be great.

Comment 85 hatapitk 2008-01-25 08:21:46 UTC

Created attachment 51149 [details]
Screenshot showing some random Hebrew news and hspell extension in use

Comment 86 rene 2008-01-25 08:26:53 UTC

reopen, there's ongoing development/discussion (thanks hatapitk), so RESOLVED -
LATER doesn't fit that much anymore.

Comment 87 danken 2008-01-25 09:04:00 UTC

how typical, a passage about war :-(
Hebrew spelling is fine. And the correction list is smarter than that of just
guessing.

Frankly, I hoped for a bigger improvement over "my" wordlist approach. The thing
is, that modern Hebrew uses quotation mark before the last letter to signify an
acronym, such as צה"ל (IDF). The current implementation considers quotation mark
as a delimiter, which makes acronyms unidentifiable. To make it complicated,
quotation marks are also used for the same task as in English, where they ARE
delimiters. 
I suspect the situation with single quote, that may also be part of words, is
not much different. For example ג'ינג'י (red-headed).

Can you do something about this?
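
The acronym rule described above can be written down as a small predicate. This is only a sketch of the rule as stated in this comment, not code from OOo or hspell: the function names are hypothetical, the input is assumed to be ISO-8859-8 (where the Hebrew letters occupy bytes 0xE0 to 0xFA), and a real word-boundary implementation would have to deal with more cases.

```c
#include <stddef.h>

/* Hebrew letters occupy 0xE0..0xFA in ISO-8859-8. */
static int is_hebrew_letter(unsigned char c)
{
    return c >= 0xE0 && c <= 0xFA;
}

/* Hypothetical rule: the '"' at index i of the len-byte buffer s is treated
 * as part of the word (an acronym mark) when it sits between two Hebrew
 * letters and the letter after it is the last letter of the word.
 * Returns 1 for an acronym mark, 0 for an ordinary delimiter. */
int is_acronym_quote(const unsigned char *s, size_t len, size_t i)
{
    if (i == 0 || i + 1 >= len || s[i] != '"')
        return 0;
    if (!is_hebrew_letter(s[i - 1]) || !is_hebrew_letter(s[i + 1]))
        return 0;
    /* The letter after the quote must end the word. */
    return i + 2 == len || !is_hebrew_letter(s[i + 2]);
}
```

With this rule the quotation mark in צה"ל is kept inside the word, while quotation marks that open or close a quoted phrase are still treated as delimiters. The single quote in words such as ג'ינג'י would need a separate rule, since it can appear in the middle of a word.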

Comment 88 hatapitk 2008-01-25 14:36:20 UTC

Unfortunately this cannot be fixed in the extension, because the code that 
identifies word boundaries is internal to OOo. It is possible to make language-
specific changes to these rules, as we did for Finnish (see issue #58513). But 
I do not know what side effects adding " as a middle letter in Hebrew would 
have on other components (such as Hunspell Hebrew spellchecking). But if this 
is fixed inside OOo and hspell supports this, it should work without any 
changes to the extension.

Comment 89 kaplanlior 2010-08-21 18:07:08 UTC

Please make this issue depend on #99796. Thanks.

Comment 90 rene 2010-08-22 00:18:03 UTC

kaplan: this issue is not progressing right now at all anyways afaics, but ok :)

Comment 91 Pedro Giffuni 2011-12-01 16:17:24 UTC

We don't do GPL here; we recommend the Apache License 2, which
is GPL3 compatible and in general more distribution friendly.

Dictionaries were removed as part of Apache IP Clearance process.

Comment 92 Marcus 2017-05-20 11:13:21 UTC

Reset assignee to the default "issues@openoffice.apache.org".

Comment 93 oooforum (fr) 2019-10-06 13:46:13 UTC

(In reply to Pedro Giffuni from comment #91)
> Dictionaries were removed as part of Apache IP Clearance process.
Dictionaries are provided as extensions (OXT)