Proofreading with Crowbook
==========================

Since version 0.9.1, Crowbook includes some proofreading features, which
can be enabled by setting one of the following output files:

* `output.proofread.html`
* `output.proofread.html_dir`
* `output.proofread.pdf`

This allows you to generate different files for
publishing and proofreading (you probably don't want to publish a
version that highlights your grammar errors or your repetitions).

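As a rough sketch, a book configuration that produces both regular and
proofreading copies might contain the lines below. The file names are
placeholders, and `output.html` and `output.pdf` are assumed here to be
the regular (non-proofreading) counterparts of the options listed above:

```yaml
# Regular output files (assumed option names; file names are placeholders)
output.html: my_book.html
output.pdf: my_book.pdf

# Proofreading copies, only generated when proofreading is enabled
output.proofread.html: my_book.proof.html
output.proofread.pdf: my_book.proof.pdf
```
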
Current proofreading features are:

* repetition detection;
* grammar check;
* highlighting non-breaking spaces.

Enabling proofreading
---------------------

Since proofreading can take quite a lot of time, particularly for a long
book, it is disabled by default. You'll have to run

```bash
$ crowbook --proofread my.book
```

or

```bash
$ crowbook -p my.book
```

to generate proofreading copies. Alternatively, if you want it to be
activated each time you run `crowbook` on this book (which is *not*
recommended for long books, particularly if you want to perform a
grammar check), you can set

```yaml
proofread: true
```

in the book configuration file.

Repetition detection
--------------------

Repetition detection is enabled with:

```yaml
proofread.repetitions: true
```

It uses the [Caribon](https://github.com/lise-henry/caribon) library to
detect repetitions in your text. Since the notion of a repetition
is relatively arbitrary, it is possible to adapt the settings. The
defaults are:

```yaml
# The maximum distance between two identical words to
# consider them a repetition
proofread.repetitions.max_distance: 25
# The minimal number of occurrences to consider it a repetition
proofread.repetitions.threshold: 2.0
# Ignore proper nouns (words starting with a capital letter,
# not at the beginning of a sentence)
proofread.repetitions.ignore_proper: true

# Activate fuzzy string matching
proofread.repetitions.fuzzy: true
# The maximal ratio of difference to consider
# that two words are identical
# (E.g., with 0.2, "Rust" and "Lust" won't be
# considered as the same word, but they will be with 0.5)
proofread.repetitions.fuzzy.threshold: 0.2
```

For more information, see
[Caribon](https://github.com/lise-henry/caribon)'s documentation.

> Currently, repetitions are not displayed in PDF proofreading
> output.

Grammar checking
----------------

Crowbook can also use [LanguageTool](https://languagetool.org/) to
detect grammar errors in your text. It is, however, a bit more
complex to activate.

First, you'll have to activate this feature in your book configuration
file:

```yaml
# Activate LanguageTool support
proofread.languagetool: true
# (Optional) Sets the port number to connect to (default below)
proofread.languagetool.port: 8081
```

You'll then have to download the stand-alone version of
[LanguageTool](https://languagetool.org/). It includes a server mode,
which you'll have to launch:

```bash
$ java -cp languagetool-server.jar org.languagetool.server.HTTPServer --port 8081
```

You can also use the LanguageTool GUI (`languagetool.jar`) and start
the server from the menu "Text Checking -> Options". This also allows
you to configure LanguageTool more precisely by activating or
deactivating rules.

You can then run Crowbook, and it will highlight grammar errors in
HTML or PDF proofreading output files.

> Note: running a grammar check on a long book (like a novel) can take
> up to a few minutes.

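Putting the pieces above together, a minimal sketch of a configuration
for grammar checking could look like the following. The port number 8082
and the output file name are arbitrary placeholders; whichever port you
pick presumably has to match the `--port` value passed to the
LanguageTool server:

```yaml
# Enable proofreading for every run (or use --proofread on the command line)
proofread: true

# Grammar check through a running LanguageTool server
proofread.languagetool: true
# Placeholder port: assumed to have to match the server's --port value
proofread.languagetool.port: 8082

# Placeholder file name for the HTML proofreading copy
output.proofread.html: my_book.proof.html
```
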
Highlighting non-breaking spaces
--------------------------------

The last proofreading feature is a bit less important. It is enabled or
disabled by setting `proofread.nb_spaces` to `true` or `false`, and it
highlights the different sorts of non-breaking spaces in HTML proofreading
output files. This can be useful in some cases, but it is mostly a debugging
feature to check that Crowbook's French cleaner correctly
replaces spaces with the appropriate non-breaking spaces in the relevant
places.

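For instance, a configuration fragment along these lines should turn the
feature on for an HTML proofreading copy (the file name is a
placeholder):

```yaml
# Highlight non-breaking spaces in the proofreading copy
proofread.nb_spaces: true
# Placeholder file name; the highlighting only applies to HTML proofreading output
output.proofread.html: my_book.proof.html
```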