Change spellchecking to Hunspell in TinyMCE


Author: Christoph Stoettner

Over the last few years, I have had issues with application servers running the Tiny Spellchecking service: they used a large amount of CPU and sometimes even hung. In the end, I disabled spellchecking in the Tiny editors’ config.js:

SharedDirectory/customization/javascript/tiny/editors/connections/config.js

...
// Set to false to disable Tiny's spell checking service in TinyMCE and Textbox.io.
spellingServiceEnabled: false,
...

Now, after updating to the current editor version, TinyMCE 5.10.2, we decided to re-enable the spellchecker, and for the first few days it looked like the issue was resolved. Sadly, after about a week, the application server hosting the spelling service started to use 800% CPU.

In the application server logs, we found messages like:

SystemOut.log of the application server running spellchecking service

Two things stand out: debug messages are logged even though no trace is enabled, and at the top of the screenshot a request ran for more than 1000 ms.

Support sent me the steps to disable the debug messages:

  1. Create a file called /opt/ephox/logback.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
   <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
       <target>System.out</target>
       <encoder>
           <pattern>%date{yyyy-MM-dd HH:mm:ss.SSSX} [%thread] %-5level %logger{36} - %msg%n</pattern>
       </encoder>
   </appender>
   <logger name="ironbark" level="WARN"/>
   <root level="INFO">
       <appender-ref ref="CONSOLE"/>
   </root>
</configuration>

The important part is line 9, the logger named ironbark. In TinyMCE 5.10.2 it is set to DEBUG; setting it to WARN or ERROR suppresses these log messages.

  2. Add a custom JVM property (Server > Server Types > WebSphere Application Servers > server name > Process Definition > Java Virtual Machine > Custom Properties) to the application server where you installed the spellchecker.
logback.configurationFile: /opt/ephox/logback.xml
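
Before restarting the application server, it can be worth confirming which level the file really sets. The following is only a small sketch: logger_level is a hypothetical helper name, and it assumes the name and level attributes appear in the same order and on one line, as in the file above.

```shell
#!/usr/bin/env bash
# Print the level configured for a named logger in a logback file.
# Assumption: the element is written as <logger name="..." level="..."/> on one line.
logger_level() {
  local file="$1" name="$2"
  sed -n "s/.*<logger name=\"$name\" level=\"\([A-Za-z]*\)\".*/\1/p" "$file"
}
```

Calling `logger_level /opt/ephox/logback.xml ironbark` should print WARN for the file shown above.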

After this, the performance was slightly better, but still not good.

Today, I got the following update from Tiny:

Broadly, we believe that the WinterTree spelling library is having problems with long words with possible hyphens, especially in German. In this case, we recommend trying the Hunspell library instead.

We can see that the problem language is always German, and the number of characters is higher than 20. Due to implementation aspects with how WinterTree’s spelling engine works, these cases can be particularly problematic.

The most egregious offender is:

Took 25270 milliseconds.

Which meant that it took over 25 seconds to generate suggestions for 1 word in a document. As you can imagine, when this starts happening, sending lots of words becomes a problem. However, there aren’t many words that take more than 1 second to generate, because this is the entire list in the logs sent to us.

In general, you could likely avoid this behavior by using Hunspell libraries, particularly for German. Here is our documentation about adding Hunspell dictionaries to Spellchecker Pro. You likely have specific separate instructions for setting up Hunspell, but it will be effectively the same under the hood, as it’s a server-only setting.

https://www.tiny.cloud/docs/tinymce/6/self-hosting-hunspell/

Tiny/HCL Support

I could stop here and point you to support, but I ran into some issues while activating Hunspell.

First, the webpage says, “Tiny provides two downloadable bundles of Hunspell dictionaries,” but I couldn’t find them. So I searched for other download options. The best match was the set of dictionaries included with LibreOffice: https://github.com/libreoffice/dictionaries, but its folder structure and naming do not match what Tiny expects.

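#!/usr/bin/env bash

# Clone the LibreOffice dictionaries and copy them into the layout Tiny expects:
# /opt/ephox/hunspell-dictionaries/<lang>/<lang>.aff and <lang>.dic
git clone https://github.com/LibreOffice/dictionaries.git /tmp/dictionaries
for i in af_ZA da de_DE en_AU en_CA en_GB en_US es fr hu it_IT nb_NO nl_NL nn pl pt_BR pt_PT sv_FI sv_SE ; do
  mkdir -p "/opt/ephox/hunspell-dictionaries/$i"
  # Quote the patterns so the shell does not expand them before find sees them.
  # If several files match a prefix, the last one copied wins.
  find /tmp/dictionaries -iname "$i*.aff" -exec cp {} "/opt/ephox/hunspell-dictionaries/$i/$i.aff" \;
  find /tmp/dictionaries -iname "$i*.dic" -exec cp {} "/opt/ephox/hunspell-dictionaries/$i/$i.dic" \;
done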

This script creates the expected folder structure and copies the dictionaries to the right place.

tree /opt/ephox/hunspell-dictionaries/
/opt/ephox/hunspell-dictionaries/
├── af_ZA
│   ├── af_ZA.aff
│   └── af_ZA.dic
├── da
│   ├── da.aff
│   └── da.dic
├── de_DE
│   ├── de_DE.aff
│   └── de_DE.dic
├── en_AU
│   ├── en_AU.aff
│   └── en_AU.dic
├── en_CA
│   ├── en_CA.aff
│   └── en_CA.dic
├── en_GB
│   ├── en_GB.aff
│   └── en_GB.dic
├── en_US
│   ├── en_US.aff
│   └── en_US.dic
├── es
│   ├── es.aff
│   └── es.dic
├── fr
│   ├── fr.aff
│   └── fr.dic
├── hu
│   ├── hu.aff
│   └── hu.dic
├── it_IT
│   ├── it_IT.aff
│   └── it_IT.dic
├── nb_NO
│   ├── nb_NO.aff
│   └── nb_NO.dic
├── nl_NL
│   ├── nl_NL.aff
│   └── nl_NL.dic
├── nn
│   ├── nn.aff
│   └── nn.dic
├── pl
│   ├── pl.aff
│   └── pl.dic
├── pt_BR
│   ├── pt_BR.aff
│   └── pt_BR.dic
├── pt_PT
│   ├── pt_PT.aff
│   └── pt_PT.dic
├── sv_FI
│   ├── sv_FI.aff
│   └── sv_FI.dic
└── sv_SE
    ├── sv_SE.aff
    └── sv_SE.dic

19 directories, 38 files
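
If find matches nothing for a language, the copy step leaves an empty folder behind, so a quick check of the result can help. This is a minimal sketch, not part of Tiny's documentation: check_dicts is a hypothetical helper that assumes the `<lang>/<lang>.aff` and `<lang>/<lang>.dic` layout shown in the tree above.

```shell
#!/usr/bin/env bash
# Report language folders that lack a non-empty <lang>.aff or <lang>.dic file.
# Returns non-zero if at least one file is missing or empty.
check_dicts() {
  local base="$1" dir lang ext rc=0
  for dir in "$base"/*/; do
    [ -d "$dir" ] || continue          # skip the literal pattern if nothing matched
    lang=$(basename "$dir")
    for ext in aff dic; do
      if [ ! -s "$dir$lang.$ext" ]; then
        echo "missing or empty: $dir$lang.$ext"
        rc=1
      fi
    done
  done
  return "$rc"
}
```

Running `check_dicts /opt/ephox/hunspell-dictionaries` prints nothing and returns 0 when every folder contains both files.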

Now we have to enable the Hunspell dictionaries in /opt/ephox/application.conf and restart the spellchecking service.

cat /opt/ephox/application.conf
ephox {
	allowed-origins {
		origins = [
			"http://cnx7-rh8-was.stoeps.home",
			"https://cnx7-rh8-was.stoeps.home",
			"https://cnx7-rh8.stoeps.home",
			"http://cnx7-rh8-was.stoeps.home:9081",
			"https://cnx7-rh8-was.stoeps.home:9444"
		]
	}
	spelling {
		hunspell-dictionaries-path = "/opt/ephox/hunspell-dictionaries"
	}
}
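
To rule out a typo in the path before restarting, the configured value can be read back and compared with the directory on disk. The sketch below assumes the setting sits on a single line with a quoted value, as in the file above; hunspell_path is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the value of hunspell-dictionaries-path from an application.conf file.
# Assumption: the key and its quoted value are on one line.
hunspell_path() {
  sed -n 's/^[[:space:]]*hunspell-dictionaries-path[[:space:]]*=[[:space:]]*"\(.*\)".*/\1/p' "$1"
}
```

For example, `dir=$(hunspell_path /opt/ephox/application.conf); [ -d "$dir" ] && echo "found $dir"` confirms that the configured folder exists.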

Don’t forget to re-enable spell checking in SharedDirectory/customization/javascript/tiny/editors/connections/config.js:

...
// Set to false to disable Tiny's spell checking service in TinyMCE and Textbox.io.
spellingServiceEnabled: true,
...

Results

I tested with WinterTree (default) and Hunspell.

Testing some long words with WinterTree

Here we see the first long word underlined in red. It generated the log message saying that the request took longer than 1000 ms:

[7/12/22 17:35:35:152 UTC] 00000132 SystemOut     O 2022-07-12 17:35:35.152Z [ioapp-compute-1] INFO  ironbark - request [ uuid-47ac0625-f6dc-4876-8127-59b50595cd0f ] Response => Status: 200 OK (12 ms)
[7/12/22 17:35:35:212 UTC] 00000139 SystemOut     O 2022-07-12 17:35:35.212Z [ioapp-compute-4] DEBUG ironbark - request [ uuid-ac3ac5bf-eb12-4e72-a98b-a9c93f288093 ] Spellall (100.0 % - 1 / 1 incorrect)
[7/12/22 17:35:35:212 UTC] 00000139 SystemOut     O 2022-07-12 17:35:35.212Z [ioapp-compute-4] DEBUG ironbark - request [ uuid-ac3ac5bf-eb12-4e72-a98b-a9c93f288093 ] Spellall (1 words) (BEGIN)
[7/12/22 17:35:38:865 UTC] 00000139 SystemOut     O 2022-07-12 17:35:38.865Z [ioapp-compute-4] WARN  ironbark -

          request [ uuid-ac3ac5bf-eb12-4e72-a98b-a9c93f288093 ] PERFORMANCE_ALERT: word took longer than 1000 milliseconds. Took 3652 milliseconds.

          * Language: de
          * Number of characters: 48
          * Number of hyphens: 0
          * Number of apostrophes: 0
          * Number of suggestions generated: 16


[7/12/22 17:35:38:865 UTC] 00000139 SystemOut     O 2022-07-12 17:35:38.865Z [ioapp-compute-4] DEBUG ironbark - request [ uuid-ac3ac5bf-eb12-4e72-a98b-a9c93f288093 ] Spellall (1 words) (END)
[7/12/22 17:35:38:866 UTC] 00000139 SystemOut     O 2022-07-12 17:35:38.866Z [ioapp-compute-4] INFO  ironbark - request [ uuid-9347efc7-7705-4bcb-911c-1506d1d3b90a ] Response => Status: 200 OK (3726 ms)

The log shows that this single word, 48 characters long, took about 3.7 seconds (3652 ms).
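
To see how often this happens on a server, the alert lines can be pulled out of SystemOut.log. This is a rough sketch that assumes the message format shown above ("Took N milliseconds."); slow_words is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# List the durations (in ms) of all PERFORMANCE_ALERT entries in a log file,
# slowest first.
slow_words() {
  local logfile="$1"
  grep 'PERFORMANCE_ALERT' "$logfile" \
    | sed -n 's/.*Took \([0-9][0-9]*\) milliseconds.*/\1/p' \
    | sort -rn
}
```

Something like `slow_words /path/to/SystemOut.log | head` then lists the worst offenders; the path is a placeholder for your own log location.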

Testing the same with Hunspell enabled

Here the result appeared faster, and no warning message was logged.

[7/12/22 20:10:12:798 UTC] 00000134 SystemOut     O 2022-07-12 20:10:12.798Z [ioapp-compute-4] DEBUG ironbark - request [ uuid-0958072a-bf4c-4cb6-8acd-e2e7e8fb2870 ] Spellall (7 words) (BEGIN)
[7/12/22 20:10:12:798 UTC] 00000134 SystemOut     O 2022-07-12 20:10:12.798Z [ioapp-compute-4] DEBUG ironbark - request [ uuid-0958072a-bf4c-4cb6-8acd-e2e7e8fb2870 ] Spellall (7 words) (END)
[7/12/22 20:10:12:800 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.800Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-199c0261-36e8-4173-807a-13a4a8ebce6b ] Spellall (0.0 % - 0 / 1 incorrect)
[7/12/22 20:10:12:801 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.800Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-199c0261-36e8-4173-807a-13a4a8ebce6b ] Spellall (1 words) (BEGIN)
[7/12/22 20:10:12:801 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.801Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-199c0261-36e8-4173-807a-13a4a8ebce6b ] Spellall (1 words) (END)
[7/12/22 20:10:12:801 UTC] 00000134 SystemOut     O 2022-07-12 20:10:12.801Z [ioapp-compute-4] INFO  ironbark - request [ uuid-4655f8a9-a466-4ad4-8874-d91f5fc8fc9b ] Response => Status: 200 OK (18 ms)
[7/12/22 20:10:12:801 UTC] 0000012f SystemOut     O 2022-07-12 20:10:12.801Z [ioapp-compute-1] INFO  ironbark - request [ uuid-9117e582-90f5-4246-bd17-56d00c12b975 ] Request => POST /tiny-spelling/2/suggestions
[7/12/22 20:10:12:803 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.803Z [ioapp-compute-2] INFO  ironbark - request [ uuid-938f2c71-701b-4312-9183-426b81829297 ] Response => Status: 200 OK (16 ms)
[7/12/22 20:10:12:803 UTC] 00000133 SystemOut     O 2022-07-12 20:10:12.803Z [ioapp-compute-3] INFO  ironbark - request [ uuid-67b5ef69-2079-43eb-908f-bd0017f715e2 ] Request => POST /tiny-spelling/2/suggestions
[7/12/22 20:10:12:808 UTC] 0000012f SystemOut     O 2022-07-12 20:10:12.808Z [ioapp-compute-1] DEBUG ironbark - request [ uuid-202c5596-0fae-4cc4-8793-7feb458b3b0c ] Incoming suggestions-V2 request for: 1 word(s) in language: de from API Key: none
[7/12/22 20:10:12:811 UTC] 0000012f SystemOut     O 2022-07-12 20:10:12.811Z [ioapp-compute-1] DEBUG ironbark - request [ uuid-202c5596-0fae-4cc4-8793-7feb458b3b0c ] Spellall (0.0 % - 0 / 1 incorrect)
[7/12/22 20:10:12:811 UTC] 0000012f SystemOut     O 2022-07-12 20:10:12.811Z [ioapp-compute-1] DEBUG ironbark - request [ uuid-202c5596-0fae-4cc4-8793-7feb458b3b0c ] Spellall (1 words) (BEGIN)
[7/12/22 20:10:12:812 UTC] 0000012f SystemOut     O 2022-07-12 20:10:12.811Z [ioapp-compute-1] DEBUG ironbark - request [ uuid-202c5596-0fae-4cc4-8793-7feb458b3b0c ] Spellall (1 words) (END)
[7/12/22 20:10:12:814 UTC] 0000012f SystemOut     O 2022-07-12 20:10:12.814Z [ioapp-compute-1] INFO  ironbark - request [ uuid-9117e582-90f5-4246-bd17-56d00c12b975 ] Response => Status: 200 OK (13 ms)
[7/12/22 20:10:12:817 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.817Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-8f7cc6c4-60f7-4535-ad4a-04a49aa4b389 ] Incoming suggestions-V2 request for: 1 word(s) in language: de from API Key: none
[7/12/22 20:10:12:819 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.819Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-8f7cc6c4-60f7-4535-ad4a-04a49aa4b389 ] Spellall (0.0 % - 0 / 1 incorrect)
[7/12/22 20:10:12:819 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.819Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-8f7cc6c4-60f7-4535-ad4a-04a49aa4b389 ] Spellall (1 words) (BEGIN)
[7/12/22 20:10:12:819 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.819Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-8f7cc6c4-60f7-4535-ad4a-04a49aa4b389 ] Spellall (1 words) (END)
[7/12/22 20:10:12:822 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.821Z [ioapp-compute-2] INFO  ironbark - request [ uuid-67b5ef69-2079-43eb-908f-bd0017f715e2 ] Response => Status: 200 OK (18 ms)
[7/12/22 20:10:12:854 UTC] 00000133 SystemOut     O 2022-07-12 20:10:12.854Z [ioapp-compute-3] INFO  ironbark - request [ uuid-0b4d740e-a06b-4a1a-a75e-e6a680a2d41d ] Request => POST /tiny-spelling/2/suggestions
[7/12/22 20:10:12:860 UTC] 00000135 SystemOut     O 2022-07-12 20:10:12.860Z [ioapp-compute-5] DEBUG ironbark - request [ uuid-15c6b13d-a3f7-4fbb-8b43-6bf6a6074b26 ] Incoming suggestions-V2 request for: 1 word(s) in language: de from API Key: none
[7/12/22 20:10:12:862 UTC] 00000135 SystemOut     O 2022-07-12 20:10:12.862Z [ioapp-compute-5] DEBUG ironbark - request [ uuid-15c6b13d-a3f7-4fbb-8b43-6bf6a6074b26 ] Spellall (0.0 % - 0 / 1 incorrect)
[7/12/22 20:10:12:862 UTC] 00000135 SystemOut     O 2022-07-12 20:10:12.862Z [ioapp-compute-5] DEBUG ironbark - request [ uuid-15c6b13d-a3f7-4fbb-8b43-6bf6a6074b26 ] Spellall (1 words) (BEGIN)
[7/12/22 20:10:12:862 UTC] 00000135 SystemOut     O 2022-07-12 20:10:12.862Z [ioapp-compute-5] DEBUG ironbark - request [ uuid-15c6b13d-a3f7-4fbb-8b43-6bf6a6074b26 ] Spellall (1 words) (END)
[7/12/22 20:10:12:864 UTC] 00000135 SystemOut     O 2022-07-12 20:10:12.864Z [ioapp-compute-5] INFO  ironbark - request [ uuid-0b4d740e-a06b-4a1a-a75e-e6a680a2d41d ] Response => Status: 200 OK (10 ms)

So for German spellchecking, Hunspell appears to work faster and provides suggestions even for long words. So far, no high CPU usage or performance warnings have appeared. I never thought about these long German words until I read the answer from Tiny Support. If your users write documents in German in Connections, I suggest you switch the spellchecker too.
