Apache OpenOffice (AOO) Bugzilla – Issue 25832
Hebrew spellchecker (non-MySpell based)
Last modified: 2019-10-06 16:57:28 UTC
Attached are files containing code for the hspell Hebrew spellchecker, as well as dictionary files, and changes in OOo build files which incorporate hspell.
Created attachment 13395 [details] Code and dictionary files for Hspell Hebrew spellchecker
Created attachment 13396 [details] Changes to existing OOo files for making hspell part of the OOo build
Created attachment 14060 [details] New breakiterator class for Hebrew, new file breakiterator/data/dict_word_he_IL.h file
Hi, Adding myself as CC to this issue since I am the lingucomponent lead where most of these changes go. We exchanged emails about integrating this a while back. I never heard back whether you were willing to join the lingucomponent project to support the code you are contributing. The same applies to handling QA and related issues. You should probably split out the breakiterator patches from the lingucomponent patches and dictionaries patches so that the breakiterator patches get looked at and evaluated more quickly. Kevin
OK, I've joined the lingucomponent project, and I'll handle support and qa for hspell. I've also split off the breakiterator code, and submitted it in issue #27076. Alan
Hi, Okay, I am going to integrate all of your patches into the next cws 680 tree that comes after cws_src680_ooo20040329 which just closed. You have signed the JCA, right? So this will all be part of OOo 2.0. Please ask mh@openoffice.org (Martin) if your changes are okay to go into the 1.1.X series (perhaps 1.1.3?). I will let you know which CVS tag to look at and verify once a new 680 cws opens. Thanks! Kevin
set target to 1.1.2
Hi,

Since this is now targeted to OOo 1.1.2, I took a look at the code itself.

1. Your endian detection code will not work for all platforms. Perhaps including one of the endian headers might be better than simply identifying platforms here. For example, PPC Linux is big-endian but does not define MACOSX, and I believe Solaris x86 is little-endian but does define SOLARIS. Perhaps you should look at the endian detection code used in the old thesaurus code instead of this approach:

    // Check for bigendian representation of two-byte chars
    #if defined(SOLARIS) || defined(MACOSX)
    low_byte_position = 1;
    #else
    low_byte_position = 0;
    #endif

2. You named sprophelp.cxx the same as the myspell files, which means the object files will collide in the obj directory of the output tree with those from myspell. I will change these to hprophelp* to prevent conflicts where the object files are stored.

3. Your makefile.mk links with ulingu but does not need to in any way? That seemed to be a holdover from the myspell makefile.mk? Can I remove that?

4. I do not see a patch to config_office to enable or disable the hebrew environment variable used for the test (we should probably disable the complete component build based on that value and not just the dictionary itself). Please look in config_office at how the other dictionaries are enabled and disabled as examples and please post a patch that uses that value in lingucomponent in hspell, in dictionaries, and in scp to disable the component if not enabled. Please set it up to enable things by default and only disable if not specifically listed in the config_options (just like the other languages).

5. Have you tried building the hspell pieces on a Windows platform to make sure it will build? If not, at least try doing the build with -ansi -pedantic -Wall to make sure it has no obvious problems. Building it with debug_malloc might also be useful to detect any memory leakage.

Thanks, Kevin
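The endian concern in point 1 does not strictly need platform macros or headers: the byte order can be probed at run time. The following is only a minimal sketch of that idea in plain C, not the code from the patch or from osl/endian.h. The function name `low_byte_position` is borrowed from the variable in the quoted snippet, and `low_byte_of` is a hypothetical helper added for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Returns the index (0 or 1) of the low-order byte of a two-byte
 * character as it is laid out in memory on the running machine.
 * Little-endian machines store the low byte first (index 0),
 * big-endian machines store it second (index 1). */
static int low_byte_position(void)
{
    const uint16_t probe = 0x0001;
    unsigned char bytes[sizeof probe];

    memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x01 ? 0 : 1;
}

/* Hypothetical helper: fetches the low-order byte of a two-byte
 * character by indexing its in-memory representation. */
static unsigned char low_byte_of(uint16_t unit)
{
    unsigned char bytes[sizeof unit];

    memcpy(bytes, &unit, sizeof unit);
    return bytes[low_byte_position()];
}
```

Where the value is available as an integer rather than as raw memory, masking with `unit & 0xFF` gives the same byte without any endian check at all.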
I took a look at the i18npool breakiterator patches, looks good so far, just that I wouldn't name them "he_IL" but "he" instead, since they're not region dependent (I suppose). This should be corrected before applied. Of course I can't say anything about the quality of the breakiterator rule itself, simply because I can't judge it as I don't speak Hebrew and just know how to write my name ;-) @Ayaniger: please note that this implementation is only valid for OOo1.1.x, for OOo2.0 we took a different approach of precompiled rules and you'd need to adapt to it, please see the current SRC680 code base. For OOo2.0 please create a separate issue and attach your patch there. Btw: to the issue 27076 you mentioned there aren't any files attached. As for the endian headers Kevin mentioned, please use osl/endian.h Thanks Eike
Citing the latest comment Alan gave in issue 27076: "My real name is Alan Yaniger, I work for Tk Open Systems, and we've signed a corporate JCA several times." So we're good to go with his contribution. I'll take the breakiterator attached to this issue and submit it to the ooo112fix1 CWS, so further development of the spellchecker will have the right basis. As OOo1.1.2 is close to closing, we might need to retarget this issue to OOo1.1.3
Hi, As mentioned before, I have committed most of this to cws ooo112fix1. I took it upon myself to fix the endian issues and the build issues I pointed out earlier. But I still need the author to properly create config_office patches to enable and disable the build/dictionaries (just like the other languages have) and scp patches to enable/disable the inclusion of the library. Without these changes the dictionary will never get built or delivered to the solver. So Alan, please post new patches for scp and config_office that use switches to enable and disable the build and the delivery of hspell to the solver. Also, please check out and build ooo112fix1 to verify correct operation. Kevin
Hi, Still waiting for scp, and config office patches from author to conditionally enable hspell component and dictionaries. So retargeting to OOo 1.1.3. Kevin
I found the following copyrights in the hspell sources:

    Copyright (C) 2003 Nadav Har'El and Dan Kenigsberg
    Mark Martinec <mark.martinec@ijs.si>, April 1999.
    Copyright 1999, Mark Martinec. All rights reserved

ayaniger: can we please add the email addresses of Nadav Har'El and Dan Kenigsberg to the copyright notice so that I'm able to verify that they're covered by your signed JCA. If their copyright is not covered by your JCA (looking at http://www.ivrix.org.il/projects/spell-checker/download.html I have exactly this impression) we have to remove this code. I don't see how the copyright of Mark Martinec is covered by your JCA, can you please explain.

mh->khendricks: I fear we have to revert these changes for 1.1.2 until the copyright and license status is clarified.
Hi Martin, Yes it looks like pure GPL with no special exclusions and things. The author of this issue has not once responded to any of the postings I made to this issue so far. Perhaps someone should e-mail him directly? Do you want me to cvs remove those pieces now? Both lingucomponent/source/spellcheck/hspell/ and dictionaries/he_IL would need to be removed since the license explicitly covers both hspell and its dictionaries (based on the webpage you cited). Please let me know what you want. Kevin
Kevin and Martin, I would like to check out ooo112fix1, but I'm having trouble doing it. (This is my first time checking out OOo code from CVS.) Kevin, could you give me instructions about how to get it? (I tried cvs -d :pserver:anoncvs@anoncvs.services.openoffice.org:/cvs checkout lingucomponent but this didn't contain hspell code. Which version is this?) Regarding licensing issues: I've changed the hspell files so that code licensed to Mark Martinec is no longer used. I've attached a zip file with all files in the hspell directory, as well as a patch file which shows the differences between the new hspell files and the old ones. As Dina of TKOS has informed you, she has been in contact with the authors of hspell about hspell's GPL license. I'll deal with the scp and config_office changes presently. Regarding the other issues Kevin has raised (sprophelp, endian checking, ulingu), once I check out the ooo112fix1 code, I'll see what hasn't already been done. Sorry for the delay. Alan
Created attachment 15019 [details] hspell files, without code licensed to Mike Martinec
Created attachment 15020 [details] differences between first version of hspell and most recent one
Kevin, We have built hspell under Windows, though we didn't build it with debug_malloc. Alan
Hi,

Don't bother checking out anything from CVS. All of hspell that was committed to cws_srx645_ooo112fix1 was cvs removed from the repository pending clarification of the license issues. We can't have full GPL pieces in the tree. So there is nothing there for you to check out. Once a new tree opens and Martin gives the go ahead we can recommit things. Once that happens you would use the following sequence of commands:

    export CVSROOT=:pserver:anoncvs@anoncvs.services.openoffice.org:/cvs
    cvs login
    # password is anoncvs
    cvs co -r THETAG lingucomponent
    cvs co -r THETAG dictionaries
    cvs co -r THETAG config_office
    cvs co -r THETAG scp

where you replace THETAG with the specific tag used for the child work space hspell gets committed to. For example, up until it was removed, THETAG would have been cws_srx645_ooo112fix1

Hope this explains things ... Kevin
Hi, I have two questions :-) a) why 0.6 and not 0.7? Why not the "normal" way with hspell.tar.gz or so in download/apply patches etc and link then with it as we do for the other external stuff? This would probably make a patch for using system hspell (static...) easier since it then is easier to distinguish... Or probably I don't grok the diff between the stuff you attached and Debian's hspell which is hspell 0.7.... b) why use uncompressed dictionary files? If this is in OOo I would like to use the files from system hspell (why duplicate them) but they are in 0.6 and 0.7 gzip-compressed (*.wgz*) Regards, René
Hi Rene, a) When we integrated hspell into OOo, 0.6 was the latest version, and 0.7 did not yet exist. Since hspell development is still in progress, and the authors have begun work on an API to hspell but have not yet finished it, we don't think it worth the time to update the OOo integration with each new version of hspell that comes out. I've discussed with Kevin Hendricks whether to locate hspell under lingucomponent. To me it seemed the natural place to put it since it's based on the code in lingucomponent/source/spellcheck/spell. Kevin, in emails to dev-lingucomponent on Oct. 9 and Oct. 15, wrote that if there are no licensing problems, it could be integrated there. We're working on the licensing issues right now. b) I originally decided to use uncompressed files in order to use hspell in the Windows build. The original hspell code executed a shell which ran gzip in order to uncompress them. This could not be assumed to work under Windows, unless the user happened to have cygwin installed, and a path pointing to it. Being a newbie to OOo at the time, I didn't look into the compression library that exists in OOo. In the future, we plan to use zlib to open compressed dictionary files. Best, Alan
Still waiting for an answer from the authors of hspell about licensing. In the meantime, I'm posting the changes to be made if the licensing issue works out. Attached are : 1. the contents of the new directory "dictionaries/he_IL" 2. a patch for scp and config_office (diffed with ooo12fix), for enabling and disabling the Hebrew spellchecker 3. a patch for changing hspell's endian code and removing ulingu from the makefile Alan
Created attachment 15401 [details] files for dictionaries/he_IL
Created attachment 15402 [details] Patch for scp and config_office to enable/disable Hebrew spellchecker
Created attachment 15403 [details] Changes to hspell files (endian code, ulingu)
hspell DLL for Windows is attached, for Daniel Boelzle to check out
Created attachment 15732 [details] Hebrew spellchecker DLL for Windows
Created attachment 15832 [details] Source, DLL, and pdb for Daniel Boelzle
@Alan: Your tools import library on Windows seems to be broken/not matching with the actual tl645mi.dll. The symbols resolved (ordinals) do not match the ones the compiler expected when linking against tl645mi.dll. Because tools lib is no SDK library, please link against the exact version of your OOo, e.g. m40 etc. Please fix this first, then go on to the Tools/Languages problem.
Hi. I'm Nadav Har'El, one of the Hspell authors, and I logged into this bug tracking system so you can ask me questions directly :) Hspell 0.8, which is due out next week, will use Zlib, rather than pipes and gzip, so hopefully if you use it you should have no problems reading the compressed dictionaries, even on Windows (assuming you're already using the zlib library - it has a BSD license). I wholeheartedly recommend using the compressed dictionaries, because they are 10 times smaller on disk. Release 0.8 also continues to improve Hspell's Hebrew vocabulary. Unfortunately, there is still no solution to the license incompatibility. Hspell is, and has always been, GPL, and we have not yet changed our mind about that. I suggested to Dina a few ways to circumvent this "problem", which basically comes down to OpenOffice using a separately installed Hspell (if available). I also suggested referring to what Aspell did to integrate the Hspell dictionary into their upcoming 0.6 release, and to what "Mits Petel" did with his "Hspell Service" for the Mac. If anybody wants to discuss this issue with me further, you can do so here or to my personal email.
Hi Nadav, this is good news since Debians hspell (of course(?)) uses uncompressed dictionaries and using the system hspell dictionaries from hspell integrated in OOo or the package providing the hspell extension package to OOo is then much easier -- if Alan decides to upgrade to hspell 0.8. In this way it probably would make sense to make the OpenOffice.org additions to hspell a bit more seeable (i.e. not modifying the original hspell files or use #ifdef's or extract the hspell tar.gz and apply a clear patch), diffing the hspell versions and the OOo hspell is a mess to grok (especially if you have no clue how hspell works). This would enable me to easily use libhspell.a included in Debians hspell package (no need to build hspell extra when we already have it - and Debian unstable already has hspell 0.8 which is now released it seems ;) ). Regards, René
Hi, better late than never, but.... ermm. I of course meant compressed.... Regards, René
Hi, *sigh* ... and this of course if the license situation becomes clear.... I think using the installed hspell somehow if installed would be a good way if - as it seems now - direct hspell support over the library could not be integrated into OOo.. Regards, René
There is now a separately installable hspell package for OOo, available at http://www.openoffice.org.il. This package uses the code posted to this issue, which is based on hspell 0.6. Alan
Here's a precise URL for downloading the hspell package: http://www.openoffice.org.il/hspell.tar.gz Alan
Hi, in my opinion this is deeply suboptimal. We in Debian already have a wish to integrate hspell (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=255451) which is then marked forwarded to this issue :-). I would do it if I could but I would have to patch the hspell stuff into our package build (or use the SDK and the appropriate headers which isn't packaged for Debian yet and therefore not allowed to use) and then patch the hspell package maintainer scripts to generate the uncompressed dictionaries for OpenOffice.org on its installation/upgrade. Why can't you just use hspell 0.8 which solves the compressed/uncompressed issue? This - assuming you do a sane split - would also help to just use the already available libhspell.a. I don't see why using an outdated version internally is a good way.... Oh, and I wonder whether that would be possible anyhow because of the license incompatibility.... And why can't you try to go for hspell "inclusion" in official OOo via just using it if installed as Nadav pointed out. Which would be better in some ways since it then a) is included in the official OOo and hebrew people would install/have installed hspell anyhow, wouldn't they? And b) this removes the need to add some crude hack to the package build and generate an extra package for the hspell stuff... Or is this impossible? BTW, are the hspell 0.8 and 0.6 dictionaries compatible? I would just gunzip them from their hspell location and save into the OOo location as a crude hack. I would help, but I don't understand hspell internals (yet), so.... Regards, René
Actually, yes - you can use the most recent Hspell 0.8 dictionaries instead of the outdated ones in the package! We just tried it, and even created an RPM for the entire package, and it is working beautifully, and we'll release it to the public momentarily (you'll see the announcements on the linux-il and ivrix-discuss mailing lists). I was really excited seeing today, for the first time, beautiful Hebrew+English spellchecking working on my own OpenOffice (with English interface, of course) at home. Beautiful! (Dan told me I'm over-reacting, but I was really excited; Thanks to Alan and Dan for their work!). Moreover, once Alan releases the sources of his code (because his new package is GPL), I can very easily help him add the code to read the compressed dictionary, using zlib. This will not only allow the package's installation to be much smaller - it will also allow the package not to contain the dictionaries at all, and use the dictionaries from a system-installed copy of Hspell (maybe this will even let you return the code back to being LGPL and not GPL, but I am not a lawyer). Unfortunately this will spell trouble (no pun intended) if we change the format of the dictionaries, so this part is not essential.
Rene, I've written a working hspell module for non-Hebrew OOo that uses the hspell 0.8 library, and the new dictionaries, compressed. We'll be posting the module and the sources in the coming days. Regarding including hspell in the official OOo: licensing issues are preventing that. Alan
The updated hspell module for Linux, using libhspell.a of hspell version 0.8 , is now available. The installation package is available at: http://www.openoffice.org.il/hspell.tar.gz The sources are available at: http://www.openoffice.org.il/hspell_src.tar.gz Alan
there is a permission problem. I get 403 Forbidden.
Hi Rene, I'll post the code here so we won't have to wait until the sysadmin problem gets worked out. You mentioned that if you already have a Debian hspell package installed that you would not want to duplicate the dictionaries when installing the hspell-interface-to-OOo package. On the other hand, I don't want to leave these dictionaries out of the install package on our OOo Hebrew website, because some people who download it may not have hspell installed on their system. They can't rely on dictionaries that they don't have. What you might do for a Debian hspell/OOo package is to change my install script. Rather than copy the dictionaries to OOoDir/share/dict/ooo as I do, you would create links there that point to the dictionaries already installed. This would avoid duplicating the dictionaries. Alan
Created attachment 16414 [details] Hspell install package for non-Hebrew OOo on Linux
Created attachment 16415 [details] script and patches to build hspell/OOo interface
Hi Alan, yes. this was intended from the beginning. I never said *you* have to strip your stuff away but I said that I want to use already installed dictionaries... and as there is no install script in the source (which I have to use since we have to build from source and moreover support many other architectures besides i386) I do this anyway. Even if there were I would change it... Regards, René
Thanks Alan. I just have one question. What is the "libhspell.a" that is in the source package? Is this a straight compilation of Hspell 0.8, or did you modify that distribution? If you did, can you please publish these changes? I may make some of these changes to the general distributions, so that you won't have to repeat them on Hspell 0.9.
Hi Alan,

your sspellimp.cxx patch does not apply here on 1.1.2 (the makefile.mk one, too but that is because the makefile.mk is changed from use for use with system myspell.diff). OK, hunk 1 probably can be ignored (deleted), but what is with hunk 5?

Log:

    tar xfvz debian/hspell_src.tar.gz
    INSTALL
    lingucomponent/source/spellcheck/hspell/
    lingucomponent/source/spellcheck/hspell/build_hspell
    lingucomponent/source/spellcheck/hspell/hspell.h
    lingucomponent/source/spellcheck/hspell/makefile.mk.diff
    lingucomponent/source/spellcheck/hspell/sreg.cxx.diff
    lingucomponent/source/spellcheck/hspell/sspellimp.cxx.diff
    lingucomponent/source/spellcheck/hspell/sspellimp.hxx.diff
    lingucomponent/source/spellcheck/hspell/gzbuffered.h
    lingucomponent/source/spellcheck/hspell/libhspell.a
    lingucomponent/source/spellcheck/hspell/libz.a
    # remove libz.a, libhspell.a and hspell.h since we want to build
    # against the system ones
    rm -f lingucomponent/source/spellcheck/hspell/libhspell.a
    rm -f lingucomponent/source/spellcheck/hspell/libz.a
    rm -f lingucomponent/source/spellcheck/hspell/hspell.h
    cd lingucomponent/source/spellcheck/hspell && sh ./build_hspell
    patching file makefile.mk
    Hunk #2 FAILED at 63.
    Hunk #3 FAILED at 108.
    Hunk #4 FAILED at 140.
    Hunk #5 succeeded at 169 (offset 8 lines).
    3 out of 5 hunks FAILED -- saving rejects to file makefile.mk.rej
    patching file sspellimp.cxx
    Hunk #1 FAILED at 1.
    Hunk #5 FAILED at 137.
    Hunk #6 succeeded at 445 (offset 5 lines).
    Hunk #7 succeeded at 528 (offset 5 lines).
    Hunk #8 succeeded at 536 (offset 5 lines).
    Hunk #9 succeeded at 636 (offset 5 lines).
    Hunk #10 succeeded at 651 (offset 5 lines).
    Hunk #11 succeeded at 712 (offset 5 lines).
    Hunk #12 succeeded at 726 (offset 5 lines).
    Hunk #13 succeeded at 735 (offset 5 lines).
    Hunk #14 succeeded at 746 (offset 5 lines).
    Hunk #15 succeeded at 771 (offset 5 lines).
    Hunk #16 succeeded at 791 (offset 5 lines).
    2 out of 16 hunks FAILED -- saving rejects to file sspellimp.cxx.rej
    patching file sspellimp.hxx
    patching file sreg.cxx
    dmake: makefile.mk: line 190: Error -- Unmatched .END[IF]
    ---* RULES.MK *---

Regards, René
Alan, It seems to me that the current binary looks for the dictionary files in /usr/local/share/hspell/ and not where the installer puts them (/usr/lib/openoffice/share/dict/ooo). It seems that this wrong path is hard-coded into the binary (which is a known issue of hspell http://ivrix.org.il/bugzilla/show_bug.cgi?id=9 ). I believe that it should be dealt with rather thoroughly, as at compile time you cannot tell where exactly ooo will be installed.
Hi, since I wasn't yet able to build it I did not catch this. Yeah, bad. And distribution packages should *not* rely on *anything* in /usr/local so just placing the links/files there is not possible... Regards, René
Hmm. When I think of it.. it isn't hardcoded in Alan's sources so it must be hardcoded in libhspell.a in which case using Debian's hspell.h should be OK and the dictionaries are searched in /usr/share/hspell....
Yes, but libhspell.a is a static library, already compiled into Alan's binary, so unless you find a way to recompile it, you can't make it use Debian's libhspell.a. Hopefully tonight or tomorrow night I'll add to Hspell the possibility to set the dictionary directory on run-time (bug 9 in Ivrix.org.il/bugzilla), and Alan will then be able to use that feature.
> Yes, but libhspell.a is a static library, already compiled into Alan's binary,
> so unless you find a way to recompile it, you can't make it use Debian's
> libhspell.a.

Sure I can. I kill the .a's and hspell.h off the *source* and build it (which I have to do anyway since I have to build from source, binaries are not allowed and we support architectures !i386, too)
Rene, you have a good point there. If you intend to compile this OpenOffice add-on from source, then indeed the solution you describe is probably the best one. The solution which involves getting from me (in a few hours I'll have it ready) a patch from Hspell that lets you define the dictionary path, is problematic in the sense that this will force you to also have to compile a patched version of Hspell. It will be convenient for Alan for creating his binary package, but perhaps not for you for the Debian package.
Strangely, I couldn't figure out how to add an attachment in this bug tracker (is this a permissions thing?), so I put the patch in an attachment in Hspell's bug tracker: http://ivrix.org.il/bugzilla/show_bug.cgi?id=9 or directly http://ivrix.org.il/bugzilla/attachment.cgi?id=7&action=view This is a small patch that adds a new function, hspell_set_dictionary_path("/path/name/hebrew.wgz"), to be called before hspell_init() for setting the path that hspell_init() tries. The patch also patches the hspell(3) manual, so you can see the explanation of this function. The patch is for Hspell 0.8. It also contains the one-character fix to http://ivrix.org.il/bugzilla/show_bug.cgi?id=31 Alan, I hope this is what you wanted (if not, please tell me). Rene will most likely not need this at all, as she would use Debian's libhspell.a.
Hi, he :-) Regards, René
Oops, sorry...
Rene, Here's a updated build package. It contains a call to Nadav's new function for setting the dictionary path, so that hspell will not look for the dictionaries in /usr/share/hspell. Alan
Created attachment 16451 [details] Build package for hspell, using new set_dictionary_path hspell function
Hi Alan, I think you did not understand - *I* have no problem with /usr/share/myspell. Saves me some symlinks from /usr/share/myspell/dicts (which is where in Debian the myspell dictionaries go; <ooo-dir>/share/dict/ooo is a symlink to there). Besides I want to use the system hspell, I don't see the sense shipping it in the source package (besides it is impossible since I need libhspell.a for powerpc, s390 and sparc, too and in Debian you *have* to build from source). And since this version does not have that patch (does not need to yet?) I have no problem (besides the fact that the patch does not apply to 1.1.2, but I fixed that now, currently testing). Other persons may have use for that function, me not... Regards, René
Alan, would you please attach/send the binary library of your latest version, so that I could check it out?
OK, Rene.
Created attachment 16483 [details] Hspell binary install package with patched libhspell.a
Alan, Mooffie spotted a bug that seems to stem from the conversion of strings between OO and hspell (see http://ivrix.org.il/bugzilla/show_bug.cgi?id=37). If you start LC_ALL=en_US.UTF-8 oowriter and write a wrong Hebrew word, you receive a wrong list of suggestions. For נסיון, it gives a list of "וסיון", an empty word, and "ציון".
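For readers unfamiliar with the string conversion mentioned above: OOo strings are made of two-byte (UTF-16) characters, while hspell works on single-byte ISO-8859-8 text. The sketch below shows one way such a conversion could look, assuming only ASCII and the Hebrew letters alef through tav (U+05D0..U+05EA, which ISO-8859-8 places at 0xE0..0xFA) need to pass through. The function name `utf16_to_iso8859_8` is hypothetical; this is not the conversion code from the hspell/OOo module.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts n UTF-16 code units to a NUL-terminated ISO-8859-8 string.
 * Handles ASCII and the Hebrew letters U+05D0..U+05EA only.
 * Returns 0 on success, -1 if the buffer is too small or a character
 * has no mapping in this simplified table. */
static int utf16_to_iso8859_8(const uint16_t *in, size_t n,
                              unsigned char *out, size_t out_size)
{
    size_t i;

    if (out_size < n + 1)
        return -1;

    for (i = 0; i < n; i++) {
        uint16_t c = in[i];

        if (c < 0x80)
            out[i] = (unsigned char)c;                    /* ASCII as-is */
        else if (c >= 0x05D0 && c <= 0x05EA)
            out[i] = (unsigned char)(c - 0x05D0 + 0xE0);  /* alef..tav */
        else
            return -1;                                    /* unmapped */
    }
    out[n] = '\0';
    return 0;
}
```

Because the mapping is a fixed table, a conversion of this kind does not depend on the process locale.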
Alan, I think I unfortunately found a serious bug in the new package, one that didn't exist in the old one (based on the Hspell 0.6 code). Whenever you have a word ending in a dot (like at the end of a sentence), the dot gets joined into the word, and the whole word is considered wrong. To add insult to injury, hspell spews to standard error "Hspell: unknown letter ...." (I should fix that - the hspell api should not print anything!). Strangely, this only appears to be happening with a dot - words that are followed by a comma don't exhibit this problem. I reproduced it in Fedora, with Dan's new ooo-hspell-0.8-2.i386.rpm, and with LC_CTYPE=C (if that matters).
We've fixed the bugs that Dan and Nadav reported here in their comments of July 15. We'll be publishing the code soon. Alan
Alan, I'm curious how you solved the bug Dan (or Mooffie) reported, because as strange as it may seem, Hspell's hspell_trycorrect routine actually has a bug, in that it uses snprintf which on glibc is locale-specific. http://ivrix.org.il/bugzilla/show_bug.cgi?id=37 continues our evolution of the study of this bug. Kobi Zamir discovered an effective (but quite ugly) workaround for this bug - by doing

    + {
    +     char *lc_ctype;
    +
    +     lc_ctype = g_strdup (setlocale (LC_CTYPE, NULL));
    +     setlocale (LC_CTYPE, "he_IL.iso88598");
    +
    +     hspell_trycorrect (hspell_common_dict, iso_word, &cl);
    +
    +     setlocale (LC_CTYPE, lc_ctype);
    +     g_free (lc_ctype);
    + }

instead of just calling hspell_trycorrect. Is this what you also did? Anyway, after http://ivrix.org.il/bugzilla/show_bug.cgi?id=37 is solved, the next version's hspell_trycorrect will no longer have this problem, and it will work (using iso-8859-8) regardless of the (irrelevant) locale setting.
Hi Nadav, I overrode snprintf, using the version at http://www.ijs.si/software/snprintf/ . Alan
Hi to you all,

I've just joined this issue. I am a novice to OpenOffice and I've got a Mac... I wanted to have the hspell compiled for the 1.1.2 installation. I managed to create the binary and properly install the version with hspell 0.6 but I ran into a few problems when attempting to compile the 0.8 version. Apparently I ran into the same problems that Rene encountered (described in his e-mail from Jul 12 07:31:31 -0700 2004), specifically the patch to file sspellimp.cxx failed at Hunk #1 and #4. Here is the log:

    patching file sspellimp.cxx
    Hunk #1 FAILED at 1.
    Hunk #4 FAILED at 72.
    Hunk #5 succeeded at 353 (offset 5 lines).
    Hunk #6 succeeded at 436 (offset 5 lines).
    Hunk #7 succeeded at 444 (offset 5 lines).
    Hunk #8 succeeded at 544 (offset 5 lines).
    Hunk #9 succeeded at 559 (offset 5 lines).
    Hunk #10 succeeded at 620 (offset 5 lines).
    Hunk #11 succeeded at 634 (offset 5 lines).
    Hunk #12 succeeded at 643 (offset 5 lines).
    Hunk #13 succeeded at 654 (offset 5 lines).
    Hunk #14 succeeded at 679 (offset 5 lines).
    Hunk #15 succeeded at 699 (offset 5 lines).
    2 out of 15 hunks FAILED -- saving rejects to file sspellimp.cxx.rej

From what I could decipher from the .diff file, hunk #4 has THE code change. Is there a possibility to have the latest version, without patches, posted? That is, only the OOo interface since I already have the hspell 0.8 library with the new set_dict_dir function patched and compiled.

Ilan Shaviv.

P.S. As noted, I've got the binary version for OS X with hspell 0.6 packed and ready to be applied. Where should I post it so it will be available for other Mac users?
Hi Ilan,

http://people.debian.org/~rene/openoffice.org/hspell-sspellimp-patch-new.diff is what I use for building with 1.1.2. (source from Jul 12 ported to 1.1.2). It does not contain yet the fixes for the <word>. problem, though would have to merge that once the official fix is there :-)

Wrt hspell: just remove libhspell.a and hspell.h (and eventually libz.a too when you want to use system's zlib) and change libhspell.a to -lhspell (and analogous for libz.a)

That's how I change the source after extracting:

    # remove libz.a, libhspell.a and hspell.h since we want to build
    # against the system ones
    rm -f ./lingucomponent/source/spellcheck/hspell/libhspell.a
    rm -f ./lingucomponent/source/spellcheck/hspell/libz.a
    rm -f ./lingucomponent/source/spellcheck/hspell/hspell.h
    # exchange the makefile.mk patch with one which copes with the
    # changes for system myspell and use -lz and -lhspell
    cp debian/hspell-makefile-patch-new.diff \
        ./lingucomponent/source/spellcheck/hspell/makefile.mk.diff
    # port over to 1.1.2
    cp debian/hspell-sspellimp-patch-new.diff \
        ./lingucomponent/source/spellcheck/hspell/sspellimp.cxx.diff
    # change build_hspell.sh to reference the right ENVFILE
    perl -pi -e \
        's/LinuxIntelEnv.Set.sh/LinuxIntelEnv.Set.sh/' \
        ./lingucomponent/source/spellcheck/hspell/build_hspell
    cd ./lingucomponent/source/spellcheck/hspell && \
        sh ./build_hspell
    patching file makefile.mk
    patching file sspellimp.cxx
    patching file sspellimp.hxx
    patching file sreg.cxx
    [...]

Regards, René
Hi Nadav, err.. isn't g_ glib? I think it is suboptimal to use it as it introduces a new unneeded dependency. And use of the "normal" strdup is problematic for != GNU/Linux too because strdup is not really portable. That's why myspell uses a mystrdup... Anyway, maybe the overridden snprintf may work but I just wanted to give my 2¢ :) Regards, René
Hi Rene, I am not sure which code you're referring to - mine has neither g_* functions nor strdup(). My patch to hspell to fix this problem involved replacing calls to snprintf with the "%.*s" format (which unfortunately uses the user's locale) by a simple new function I wrote called "splice" (see http://ivrix.org.il/bugzilla/attachment.cgi?id=14&action=view). Alan's patch simply involves replacing snprintf by a version which doesn't use locale information. Kobi Zamir described in http://ivrix.org.il/bugzilla/show_bug.cgi?id=37 another workaround, of surrounding the hspell_trycorrect calls by set_locale (ugly!, but works).
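To illustrate the idea of replacing the "%.*s" formatting with a plain byte copy, here is a minimal sketch. The function name copy_prefix and its signature are assumptions made for this example; the actual "splice" function in the linked hspell patch may look quite different.

```c
#include <stddef.h>

/* Hypothetical stand-in for the "splice" idea (not the real hspell code).
 * Copies at most n bytes of src into dst, which has room for dstsize
 * bytes, and always NUL-terminates when dstsize > 0. The copy is purely
 * byte-wise, so no locale or multibyte decoding is involved.
 * Returns the number of bytes copied, excluding the terminating NUL. */
size_t copy_prefix(char *dst, size_t dstsize, const char *src, size_t n)
{
    size_t i = 0;

    if (dstsize == 0)
        return 0;
    while (i < n && i + 1 < dstsize && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

Since the bytes are never interpreted as characters, the result should not depend on the user's locale, which is the property the workaround is after.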
Hi, that workaround you posted uses g_* :-) (well, I added this in my working copy now without the g_* ;) ) Regards, René
What is the status? This issue is targeted to 1.1.3, but I do not think we can make it...
Pavel, Because of licensing issues (see earlier comments to this issue), we will not be submitting the spellchecker code for integration in the main source tree at this time. (The code is available at www.openoffice.org.il/hspell_src.tar.gz.) We are, however, submitting changes to scp and config_office for installing and building a Hebrew spellchecker. The necessary patches to 1.1.2 will be attached to this issue presently.
Created attachment 17211 [details] scp and config_office patches to 1.1.2 for hebrew spellchecker
retarget to 1.1.4.
retarget.
move target to OOo PleaseHelp since there is no short-term solution in sight.
I had some thoughts about this: It's true that we can neither link nor dlopen() libhspell directly inside OOo, but who says we need to? If I am not mistaken and/or overlooking something, there are two possibilities to add hspell support:
- We just call the hspell binary. As OOo then doesn't include hspell's code, it's ok (see the libpaper example).
- The hspell support is not integrated into OOo at all but distributed as a standalone extension (like voikko - http://voikko.sourceforge.net/). That would need an overhaul of what you currently do, since you currently need internal headers and could then use the public API only, but if it's possible for voikko it should probably be possible for hspell, too.
If it's a separate extension, it afais even *is* allowed to link against libhspell, as the extension itself is then GPL but links with OOo's LGPLed code (and not vice versa), which is OK. I've unfortunately neither the time nor the skills to do either, but...
reassign away from khendricks, assign to ayaniger who did the initial hspell things. ayaniger: what do you think about my last comment (especially the second part)?
ayaniger: any status on this? any comment to my proposal? A "native" (hspell) spellchecker in OOo or as an extension would be quite nice....
Hi Rene, Hebrew OOo is as of this date, January 6, 2008, without budget, so we do not have the resources to reply to your request. We are waiting for additional funding from the Israel Ministry of Finance. Regards, - yba
Rene: I think that calling the hspell binary is not a good option, since we want something that will work for the Windows version, and for Windows we can safely assume that there won't be a standalone binary. The authors of hspell have written that they have no intention to port to Windows. I haven't yet created a standalone extension to OOo, so there would be a learning curve involved and it wouldn't be trivial for me to do. At present, we don't have the resources for that.
I noticed this issue and decided to try the extension approach of hspell integration by modifying the Finnish spellchecking extension to use hspell instead of voikko. I managed to get a working proof-of-concept done in just a few hours. On Linux (at least Debian) this can be tried by installing subversion, openoffice.org-sdk and hspell and running the following commands:
    $ svn co https://voikko.svn.sourceforge.net/svnroot/voikko/branches/ooovoikko/hspell
    $ cd hspell
    $ /usr/lib/openoffice/sdk/setsdkenv_unix
    $ make oxt
    $ unopkg add build/hspell.oxt
There are still many references to Voikko, and the documentation has not been touched at all. While I do not know any Hebrew, pasting some Hebrew text from the net into OOo seems to lead to most words being accepted, and some of those flagged as errors get suggested corrections in Hebrew. So it must be working at least somehow. I will not work on this much more, since without knowing the language it is obviously quite difficult. But if you find this code useful, I can clean it up and answer any questions you might have about it. The original extension can be built on Linux, Windows and OS X and supports building both fully standalone versions (with all the required files in the extension) and system-integrated versions (where the spellchecking library has already been installed on the system, as in the Linux distributions).
hatapitk, thank you for your work! I would like to check the Hebrew side of it, but I am afraid that my knowledge of ooo is even smaller than yours of Hebrew: I have ooo from my distribution, and that's that. If you could tell me how I could test your port on my system (I don't even have setsdkenv_unix), or just show me a screenshot, it would be great.
Created attachment 51149 [details] Screenshot showing some random Hebrew news and hspell extension in use
reopen, there's ongoing development/discussion (thanks hatapitk), so RESOLVED - LATER doesn't fit that much anymore.
how typical, a passage about war :-( Hebrew spelling is fine. And the correction list is smarter than mere guessing. Frankly, I hoped for a bigger improvement over "my" wordlist approach. The thing is that modern Hebrew uses a quotation mark before the last letter to signify an acronym, such as צה"ל (IDF). The current implementation treats the quotation mark as a delimiter, which makes acronyms unidentifiable. To complicate matters, quotation marks are also used for the same task as in English, where they ARE delimiters. I suspect the situation with the single quote, which may also be part of a word, is not much different. For example ג'ינג'י (red-headed). Can you do something about this?
Unfortunately this cannot be fixed in the extension, because the code that identifies word boundaries is internal to OOo. It is possible to make language-specific changes to these rules, as we did for Finnish (see issue #58513). But I do not know what side effects adding " as a middle letter in Hebrew would have on other components (such as Hunspell Hebrew spellchecking). But if this is fixed inside OOo and hspell supports this, it should work without any changes to the extension.
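As a rough illustration of the kind of rule under discussion, the sketch below treats a quote character as part of a word only when a Hebrew letter sits directly on each side of it. This is an assumed heuristic written for this example, not OOo's breakiterator logic; the function names are hypothetical, and the text is taken as an array of Unicode code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Hebrew letters alef..tav occupy U+05D0..U+05EA in Unicode. */
static int is_hebrew_letter(uint32_t cp)
{
    return cp >= 0x05D0 && cp <= 0x05EA;
}

/* Hypothetical rule (illustration only): the ASCII double quote (U+0022)
 * or single quote (U+0027) at text[i] counts as word-internal when the
 * code points immediately before and after it are both Hebrew letters.
 * Returns 1 if word-internal, 0 if it should act as a delimiter. */
int quote_is_word_internal(const uint32_t *text, size_t len, size_t i)
{
    if (i == 0 || i + 1 >= len)
        return 0;                       /* no neighbour on one side */
    if (text[i] != 0x22 && text[i] != 0x27)
        return 0;                       /* not a quote character */
    return is_hebrew_letter(text[i - 1]) && is_hebrew_letter(text[i + 1]);
}
```

Under this rule the quote in צה"ל and both quotes in ג'ינג'י stay inside the word, while a quote at the start or end of a quoted phrase remains a delimiter. It would not cover abbreviations that end in a quote, and its side effects on other spellcheckers are exactly the open question raised above.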
Please make this issue depend on #99796. Thanks.
kaplan: this issue is not progressing right now at all anyways afaics, but ok :)
We don't do GPL here; we recommend the Apache License 2, which is GPL3 compatible and in general more distribution friendly. Dictionaries were removed as part of Apache IP Clearance process.
Reset assignee to the default "issues@openoffice.apache.org".
(In reply to Pedro Giffuni from comment #91)
> Dictionaries were removed as part of Apache IP Clearance process.
Dictionaries are provided as extensions (OXT).