Change spellchecking to hunspell in TinyMCE

Author: Christoph Stoettner


Over the last few years I had issues with application servers running the Tiny spellchecking service: they used large amounts of CPU and sometimes even hung. In the end I disabled spellchecking in the Tiny editors' config.js.

SharedDirectory/customization/javascript/tiny/editors/connections/config.js

...
// Set to false to disable Tiny's spell checking service in TinyMCE and Textbox.io.
spellingServiceEnabled: false,
...

After updating to the current editor version, TinyMCE 5.10.2, we decided to re-enable the spellchecker, and for the first few days it looked as if the issue was really fixed. Sadly, after about a week the first application server hosting the spelling service started to use 800% CPU.

In the application server logs we found messages like:

SystemOut.log of the application server running spellchecking service

First of all, we see debug messages even though no trace was enabled, and at the top of the image we see that a request took more than 1000 ms.

Support sent me the steps to disable the debug messages:

  1. Create a file /opt/ephox/logback.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
   <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
       <target>System.out</target>
       <encoder>
           <pattern>%date{yyyy-MM-dd HH:mm:ss.SSSX} [%thread] %-5level %logger{36} - %msg%n</pattern>
       </encoder>
   </appender>
   <logger name="ironbark" level="WARN"/>
   <root level="INFO">
       <appender-ref ref="CONSOLE"/>
   </root>
</configuration>

The important part is line 9, the logger entry for ironbark. In TinyMCE 5.10.2 this level is set to DEBUG; setting it to WARN or ERROR prevents these log messages.

  2. Add a custom JVM property (Server > Server Types > WebSphere Application Servers > server name > Process Definition > Java Virtual Machine > Custom Properties) to the application server where you installed the spellchecker.
logback.configurationFile: /opt/ephox/logback.xml
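
Before restarting the server, it can help to confirm that the logback file really sets the ironbark logger to a quiet level. The following is only a minimal sketch in Python and not part of the instructions from support; the function names `logger_level` and `is_quiet` and the embedded sample are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Levels that, according to the text above, suppress the debug output.
QUIET_LEVELS = {"WARN", "ERROR"}


def logger_level(logback_xml, logger_name="ironbark"):
    """Return the level of the named <logger> element, or None if it is absent.

    Accepts str or bytes. When reading a real file, pass bytes
    (open(path, "rb").read()) so the XML declaration is honoured.
    """
    root = ET.fromstring(logback_xml)
    for logger in root.iter("logger"):
        if logger.get("name") == logger_name:
            return (logger.get("level") or "").upper()
    return None


def is_quiet(logback_xml, logger_name="ironbark"):
    """True when the logger exists and is set to WARN or ERROR."""
    return logger_level(logback_xml, logger_name) in QUIET_LEVELS


# Shortened sample (hypothetical excerpt of /opt/ephox/logback.xml).
SAMPLE = """<configuration>
   <logger name="ironbark" level="WARN"/>
   <root level="INFO"/>
</configuration>"""

print(logger_level(SAMPLE), is_quiet(SAMPLE))
```

For the real file you would call `is_quiet(open("/opt/ephox/logback.xml", "rb").read())`. A result of `False` means the logger is either missing or still set to a more verbose level such as DEBUG.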

After this the performance was slightly better, but still not good.

Today I got the following update from Tiny:

Broadly, we believe that WinterTree spelling library is having problems with long words with possible hyphens, especially in German. In this case, we recommending trying the Hunspell library instead.

We can see that the problem language is always German, and the number of characters is higher than 20. Due to implementation aspects with how WinterTree’s spelling engine works, these cases can be particularly problematic.

The most egregious offender is:

Took 25270 milliseconds.

Which meant that it took over 25 seconds to generate suggestions for 1 word in a document. As you can imagine, when this starts happening, sending lots of words becomes a problem. However, there aren’t many words that take more than 1 second to generate, because this is the entire list in the logs sent to us.

In general, you could likely avoid this behavior by using Hunspell libraries, particularly for German. Here is our documentation about adding Hunspell dictionaries to Spellchecker Pro. You likely have specific separate instructions for setting up Hunspell, but it will be effectively the same under the hood, as it’s a server-only setting.

https://www.tiny.cloud/docs/tinymce/6/self-hosting-hunspell/

Tiny/HCL Support

I could stop here and point you to support, but I ran into some issues while activating hunspell.

First of all, the webpage says “Tiny provides two downloadable bundles of Hunspell dictionaries”, but I couldn’t find them, so I searched for other download options. The best match was the set of dictionaries included with LibreOffice: https://github.com/libreoffice/dictionaries. However, its folder structure and file naming do not match what Tiny expects.

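#!/usr/bin/env bash
# Clone the LibreOffice dictionaries and copy them into the layout Tiny expects:
#   <target>/<lang>/<lang>.aff and <target>/<lang>/<lang>.dic
set -euo pipefail

target=/opt/ephox/hunspell-dictionaries
git clone https://github.com/LibreOffice/dictionaries.git /tmp/dictionaries
for i in af_ZA da de_DE en_AU en_CA en_GB en_US es fr hu it_IT nb_NO nl_NL nn pl pt_BR pt_PT sv_FI sv_SE ; do
  mkdir -p "$target/$i"
  # Quote the pattern so the shell does not expand it before find sees it.
  # If several files match the pattern, the last one found overwrites the earlier ones.
  find /tmp/dictionaries -iname "$i*.aff" -exec cp {} "$target/$i/$i.aff" \;
  find /tmp/dictionaries -iname "$i*.dic" -exec cp {} "$target/$i/$i.dic" \;
done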

This script creates the expected folder structure and copies the dictionaries to the right place.

tree /opt/ephox/hunspell-dictionaries/
/opt/ephox/hunspell-dictionaries/
├── af_ZA
│   ├── af_ZA.aff
│   └── af_ZA.dic
├── da
│   ├── da.aff
│   └── da.dic
├── de_DE
│   ├── de_DE.aff
│   └── de_DE.dic
├── en_AU
│   ├── en_AU.aff
│   └── en_AU.dic
├── en_CA
│   ├── en_CA.aff
│   └── en_CA.dic
├── en_GB
│   ├── en_GB.aff
│   └── en_GB.dic
├── en_US
│   ├── en_US.aff
│   └── en_US.dic
├── es
│   ├── es.aff
│   └── es.dic
├── fr
│   ├── fr.aff
│   └── fr.dic
├── hu
│   ├── hu.aff
│   └── hu.dic
├── it_IT
│   ├── it_IT.aff
│   └── it_IT.dic
├── nb_NO
│   ├── nb_NO.aff
│   └── nb_NO.dic
├── nl_NL
│   ├── nl_NL.aff
│   └── nl_NL.dic
├── nn
│   ├── nn.aff
│   └── nn.dic
├── pl
│   ├── pl.aff
│   └── pl.dic
├── pt_BR
│   ├── pt_BR.aff
│   └── pt_BR.dic
├── pt_PT
│   ├── pt_PT.aff
│   └── pt_PT.dic
├── sv_FI
│   ├── sv_FI.aff
│   └── sv_FI.dic
└── sv_SE
    ├── sv_SE.aff
    └── sv_SE.dic

19 directories, 38 files
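
Instead of reading the tree output by eye, the layout can also be checked with a few lines of Python. This is a sketch under the assumption that every language folder must contain one `.aff` and one `.dic` file named after the folder, as in the listing above; the function name `missing_dictionary_files` is made up for this example.

```python
from pathlib import Path


def missing_dictionary_files(base):
    """Return the expected <lang>/<lang>.aff and <lang>/<lang>.dic files
    that are missing or empty below the given base directory."""
    problems = []
    for lang_dir in sorted(p for p in Path(base).iterdir() if p.is_dir()):
        for ext in (".aff", ".dic"):
            candidate = lang_dir / f"{lang_dir.name}{ext}"
            # The copy script creates no file when find matched nothing,
            # so a missing or empty file points to a language without a match.
            if not candidate.is_file() or candidate.stat().st_size == 0:
                problems.append(str(candidate.relative_to(base)))
    return problems
```

Calling `missing_dictionary_files("/opt/ephox/hunspell-dictionaries")` should return an empty list when the copy script found a match for every language.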

Now we have to enable the hunspell-dictionaries in /opt/ephox/application.conf and restart the spellchecking service.

cat /opt/ephox/application.conf 
ephox {
	allowed-origins {
		origins = [
			"http://cnx7-rh8-was.stoeps.home",
			"https://cnx7-rh8-was.stoeps.home",
			"https://cnx7-rh8.stoeps.home",
			"http://cnx7-rh8-was.stoeps.home:9081",
			"https://cnx7-rh8-was.stoeps.home:9444"
		]
	}
	http {
		websphere.use-ssl-config = false
		trust-all-cert = true
	}
	spelling {
		hunspell-dictionaries-path: "/opt/ephox/hunspell-dictionaries"
	}
}
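
A typo in the path is easy to miss, so here is a small, hedged sketch that pulls the configured value out of application.conf. It uses a plain regular expression and not a real HOCON parser, so it would also match a commented-out line; the name `hunspell_path` and the shortened sample are assumptions for illustration.

```python
import re
from pathlib import Path

# Matches: hunspell-dictionaries-path: "/some/path"  (":" or "=" as separator)
PATH_SETTING = re.compile(r'hunspell-dictionaries-path\s*[:=]\s*"([^"]+)"')


def hunspell_path(conf_text):
    """Return the configured dictionary path as a Path, or None if not set."""
    match = PATH_SETTING.search(conf_text)
    return Path(match.group(1)) if match else None


# Shortened sample taken from the configuration shown above.
SAMPLE_CONF = """
ephox {
    spelling {
        hunspell-dictionaries-path: "/opt/ephox/hunspell-dictionaries"
    }
}
"""

print(hunspell_path(SAMPLE_CONF))
```

On the server you could read the real file with `Path("/opt/ephox/application.conf").read_text()` and then test `hunspell_path(...).is_dir()` before restarting the service.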

Results

I tested with WinterTree (default) and hunspell.

Testing some long words with WinterTree

Here we see the first long word underlined in red; it generated the log message that the request took longer than 1000 ms.

[7/12/22 17:35:35:152 UTC] 00000132 SystemOut     O 2022-07-12 17:35:35.152Z [ioapp-compute-1] INFO  ironbark - request [ uuid-47ac0625-f6dc-4876-8127-59b50595cd0f ] Response => Status: 200 OK (12 ms)
[7/12/22 17:35:35:212 UTC] 00000139 SystemOut     O 2022-07-12 17:35:35.212Z [ioapp-compute-4] DEBUG ironbark - request [ uuid-ac3ac5bf-eb12-4e72-a98b-a9c93f288093 ] Spellall (100.0 % - 1 / 1 incorrect)
[7/12/22 17:35:35:212 UTC] 00000139 SystemOut     O 2022-07-12 17:35:35.212Z [ioapp-compute-4] DEBUG ironbark - request [ uuid-ac3ac5bf-eb12-4e72-a98b-a9c93f288093 ] Spellall (1 words) (BEGIN)
[7/12/22 17:35:38:865 UTC] 00000139 SystemOut     O 2022-07-12 17:35:38.865Z [ioapp-compute-4] WARN  ironbark -

          request [ uuid-ac3ac5bf-eb12-4e72-a98b-a9c93f288093 ] PERFORMANCE_ALERT: word took longer than 1000 milliseconds. Took 3652 milliseconds.

          * Language: de
          * Number of characters: 48
          * Number of hyphens: 0
          * Number of apostrophes: 0
          * Number of suggestions generated: 16


[7/12/22 17:35:38:865 UTC] 00000139 SystemOut     O 2022-07-12 17:35:38.865Z [ioapp-compute-4] DEBUG ironbark - request [ uuid-ac3ac5bf-eb12-4e72-a98b-a9c93f288093 ] Spellall (1 words) (END)
[7/12/22 17:35:38:866 UTC] 00000139 SystemOut     O 2022-07-12 17:35:38.866Z [ioapp-compute-4] INFO  ironbark - request [ uuid-9347efc7-7705-4bcb-911c-1506d1d3b90a ] Response => Status: 200 OK (3726 ms)

We see that the word took about 3.6 seconds to process and was 48 characters long.
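
To find all slow words in a larger SystemOut.log, the alerts can be collected with a short script. The following Python sketch assumes that every alert looks like the one shown above: a line containing PERFORMANCE_ALERT followed by "* key: value" detail lines. The function name `parse_alerts` is hypothetical.

```python
import re

ALERT = re.compile(r"PERFORMANCE_ALERT: .*?Took (\d+) milliseconds")
DETAIL = re.compile(r"^\s*\*\s*(.+?):\s*(\S+)\s*$")


def parse_alerts(lines):
    """Collect each PERFORMANCE_ALERT together with the detail lines after it."""
    alerts, current = [], None
    for line in lines:
        hit = ALERT.search(line)
        if hit:
            current = {"took_ms": int(hit.group(1))}
            alerts.append(current)
            continue
        detail = DETAIL.match(line)
        if current is not None and detail:
            current[detail.group(1)] = detail.group(2)
        elif line.lstrip().startswith("["):
            # A new timestamped log line ends the detail block.
            current = None
    return alerts


# Excerpt of the log shown above.
SAMPLE_LOG = """\
[7/12/22 17:35:38:865 UTC] 00000139 SystemOut     O 2022-07-12 17:35:38.865Z [ioapp-compute-4] WARN  ironbark -

          request [ uuid-ac3ac5bf-eb12-4e72-a98b-a9c93f288093 ] PERFORMANCE_ALERT: word took longer than 1000 milliseconds. Took 3652 milliseconds.

          * Language: de
          * Number of characters: 48
          * Number of hyphens: 0
          * Number of apostrophes: 0
          * Number of suggestions generated: 16

[7/12/22 17:35:38:865 UTC] 00000139 SystemOut     O 2022-07-12 17:35:38.865Z [ioapp-compute-4] DEBUG ironbark - request [ uuid-ac3ac5bf-eb12-4e72-a98b-a9c93f288093 ] Spellall (1 words) (END)
"""

for alert in parse_alerts(SAMPLE_LOG.splitlines()):
    print(alert["took_ms"], alert.get("Language"), alert.get("Number of characters"))
```

Sorting the result with `sorted(alerts, key=lambda a: a["took_ms"], reverse=True)` puts the worst offenders first, which makes it easy to see whether the slow words are all long German words.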

Testing the same with hunspell enabled

Here the result appeared faster and no warning message is logged.

[7/12/22 20:10:12:798 UTC] 00000134 SystemOut     O 2022-07-12 20:10:12.798Z [ioapp-compute-4] DEBUG ironbark - request [ uuid-0958072a-bf4c-4cb6-8acd-e2e7e8fb2870 ] Spellall (7 words) (BEGIN)
[7/12/22 20:10:12:798 UTC] 00000134 SystemOut     O 2022-07-12 20:10:12.798Z [ioapp-compute-4] DEBUG ironbark - request [ uuid-0958072a-bf4c-4cb6-8acd-e2e7e8fb2870 ] Spellall (7 words) (END)
[7/12/22 20:10:12:800 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.800Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-199c0261-36e8-4173-807a-13a4a8ebce6b ] Spellall (0.0 % - 0 / 1 incorrect)
[7/12/22 20:10:12:801 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.800Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-199c0261-36e8-4173-807a-13a4a8ebce6b ] Spellall (1 words) (BEGIN)
[7/12/22 20:10:12:801 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.801Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-199c0261-36e8-4173-807a-13a4a8ebce6b ] Spellall (1 words) (END)
[7/12/22 20:10:12:801 UTC] 00000134 SystemOut     O 2022-07-12 20:10:12.801Z [ioapp-compute-4] INFO  ironbark - request [ uuid-4655f8a9-a466-4ad4-8874-d91f5fc8fc9b ] Response => Status: 200 OK (18 ms)
[7/12/22 20:10:12:801 UTC] 0000012f SystemOut     O 2022-07-12 20:10:12.801Z [ioapp-compute-1] INFO  ironbark - request [ uuid-9117e582-90f5-4246-bd17-56d00c12b975 ] Request => POST /tiny-spelling/2/suggestions
[7/12/22 20:10:12:803 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.803Z [ioapp-compute-2] INFO  ironbark - request [ uuid-938f2c71-701b-4312-9183-426b81829297 ] Response => Status: 200 OK (16 ms)
[7/12/22 20:10:12:803 UTC] 00000133 SystemOut     O 2022-07-12 20:10:12.803Z [ioapp-compute-3] INFO  ironbark - request [ uuid-67b5ef69-2079-43eb-908f-bd0017f715e2 ] Request => POST /tiny-spelling/2/suggestions
[7/12/22 20:10:12:808 UTC] 0000012f SystemOut     O 2022-07-12 20:10:12.808Z [ioapp-compute-1] DEBUG ironbark - request [ uuid-202c5596-0fae-4cc4-8793-7feb458b3b0c ] Incoming suggestions-V2 request for: 1 word(s) in language: de from API Key: none
[7/12/22 20:10:12:811 UTC] 0000012f SystemOut     O 2022-07-12 20:10:12.811Z [ioapp-compute-1] DEBUG ironbark - request [ uuid-202c5596-0fae-4cc4-8793-7feb458b3b0c ] Spellall (0.0 % - 0 / 1 incorrect)
[7/12/22 20:10:12:811 UTC] 0000012f SystemOut     O 2022-07-12 20:10:12.811Z [ioapp-compute-1] DEBUG ironbark - request [ uuid-202c5596-0fae-4cc4-8793-7feb458b3b0c ] Spellall (1 words) (BEGIN)
[7/12/22 20:10:12:812 UTC] 0000012f SystemOut     O 2022-07-12 20:10:12.811Z [ioapp-compute-1] DEBUG ironbark - request [ uuid-202c5596-0fae-4cc4-8793-7feb458b3b0c ] Spellall (1 words) (END)
[7/12/22 20:10:12:814 UTC] 0000012f SystemOut     O 2022-07-12 20:10:12.814Z [ioapp-compute-1] INFO  ironbark - request [ uuid-9117e582-90f5-4246-bd17-56d00c12b975 ] Response => Status: 200 OK (13 ms)
[7/12/22 20:10:12:817 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.817Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-8f7cc6c4-60f7-4535-ad4a-04a49aa4b389 ] Incoming suggestions-V2 request for: 1 word(s) in language: de from API Key: none
[7/12/22 20:10:12:819 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.819Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-8f7cc6c4-60f7-4535-ad4a-04a49aa4b389 ] Spellall (0.0 % - 0 / 1 incorrect)
[7/12/22 20:10:12:819 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.819Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-8f7cc6c4-60f7-4535-ad4a-04a49aa4b389 ] Spellall (1 words) (BEGIN)
[7/12/22 20:10:12:819 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.819Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-8f7cc6c4-60f7-4535-ad4a-04a49aa4b389 ] Spellall (1 words) (END)
[7/12/22 20:10:12:822 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.821Z [ioapp-compute-2] INFO  ironbark - request [ uuid-67b5ef69-2079-43eb-908f-bd0017f715e2 ] Response => Status: 200 OK (18 ms)
[7/12/22 20:10:12:854 UTC] 00000133 SystemOut     O 2022-07-12 20:10:12.854Z [ioapp-compute-3] INFO  ironbark - request [ uuid-0b4d740e-a06b-4a1a-a75e-e6a680a2d41d ] Request => POST /tiny-spelling/2/suggestions
[7/12/22 20:10:12:860 UTC] 00000135 SystemOut     O 2022-07-12 20:10:12.860Z [ioapp-compute-5] DEBUG ironbark - request [ uuid-15c6b13d-a3f7-4fbb-8b43-6bf6a6074b26 ] Incoming suggestions-V2 request for: 1 word(s) in language: de from API Key: none
[7/12/22 20:10:12:862 UTC] 00000135 SystemOut     O 2022-07-12 20:10:12.862Z [ioapp-compute-5] DEBUG ironbark - request [ uuid-15c6b13d-a3f7-4fbb-8b43-6bf6a6074b26 ] Spellall (0.0 % - 0 / 1 incorrect)
[7/12/22 20:10:12:862 UTC] 00000135 SystemOut     O 2022-07-12 20:10:12.862Z [ioapp-compute-5] DEBUG ironbark - request [ uuid-15c6b13d-a3f7-4fbb-8b43-6bf6a6074b26 ] Spellall (1 words) (BEGIN)
[7/12/22 20:10:12:862 UTC] 00000135 SystemOut     O 2022-07-12 20:10:12.862Z [ioapp-compute-5] DEBUG ironbark - request [ uuid-15c6b13d-a3f7-4fbb-8b43-6bf6a6074b26 ] Spellall (1 words) (END)
[7/12/22 20:10:12:864 UTC] 00000135 SystemOut     O 2022-07-12 20:10:12.864Z [ioapp-compute-5] INFO  ironbark - request [ uuid-0b4d740e-a06b-4a1a-a75e-e6a680a2d41d ] Response => Status: 200 OK (10 ms)

So for German spellchecking, hunspell seems to work faster and gives suggestions even for long words. No high CPU usage or warning messages have appeared so far. I never thought about these long German words until I read the answer from Tiny support. If your users write documents in Connections in German, I suggest you change the spellchecker too.
