zero-el - a Chinese input method framework built from scratch for Emacs
Created On: 2019-09-01 Updated On: 2021-07-27
zero-el is a new Chinese input method framework for emacs. A pinyin input method is included in zero-el.
Here is a screenshot of zero-el pinyin input in action:
How to use this input method
zero-el is tested to work with Emacs 24+ on Debian 9 stretch, Debian 10 buster, Debian 11 bullseye, and Ubuntu 18.04 LTS.
zero-el works in a Linux Xorg environment. Running on Wayland requires some updates to zero-panel.
zero-pinyin is the only input method included in zero-el, so it only works for pinyin users for now.
zero-pinyin depends on dbus, zero-pinyin-service, and zero-panel.
The zero-pinyin-service and zero-panel binary releases (deb packages) are only tested on the Debian and Ubuntu systems mentioned above. They should also work on other Debian derivatives released after Debian 9. Other OSes don't have binary releases at the moment. Building from source requires C libraries and the meson tool; see the project README file for details on building from source code.
Here is how to install and use zero-el:
- install dependencies

  add my private apt repo:

      # switch to root shell
      sudo -i
      # add my private apt repo:
      echo 'deb http://deb.emacsos.com/ debian main' > /etc/apt/sources.list.d/deb-emacsos.list
      # import my apt sign key
      curl https://deb.emacsos.com/apt-pub.key | apt-key add -
      # install zero-el dependencies
      apt update
      apt install zero-pinyin-service zero-panel

- install zero-input package from melpa
  - install zero-input from melpa or melpa-stable. Add melpa-stable to package-archives, then run M-x package-list-packages and M-x package-install RET zero-input RET
  - add to ~/.emacs.d/init.el:

        ;; general emacs package config, if you don't have them already.
        (package-initialize)
        (setq package-archives
              '(("gnu" . "http://elpa.gnu.org/packages/")
                ("melpa-stable" . "http://stable.melpa.org/packages/")
                ;; ("melpa" . "https://melpa.org/packages/")
                ))

        ;; zero-el config
        (require 'zero-input)
        (zero-input-set-default-im 'pinyin)
        ;; config any key you like for zero-toggle, this is used to toggle zero input
        ;; method.
        (global-set-key (kbd "<f5>") 'zero-input-mode)

- use zero-el

  Inside an emacs buffer, press F5 and start typing pinyin. Use SPC or the digit keys to select candidates. Typing Chinese punctuation selects the first candidate automatically, then inserts the punctuation. New phrases are saved to the user phrase db automatically. You can press Ctrl+digit key to delete a phrase. Both built-in and user-created phrases can be deleted.

  By default only a few punctuation marks are mapped to Chinese punctuation. Press C-c , , (M-x zero-input-cycle-punctuation-level) to get more Chinese punctuation mappings.

  zero-el supports full-width mode. You can enable it in the current buffer with C-c , . (M-x zero-input-toggle-full-width), or enable full-width for all buffers by default via (setq-default zero-input-full-width-p t)
Debug zero-pinyin problems
zero-pinyin-service and zero-panel are dbus services that are auto-started on first service invocation. You can check their logs in /var/log/syslog or via the journalctl command.
You can also test zero-pinyin-service and zero-panel by running them on the console. Since they perform a single-instance check on startup, kill any existing process before running them on the console.
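
As a quick check from inside emacs, the sketch below lists D-Bus names that look related to zero-el. It assumes both services live on the session bus and that their bus names contain the substring "zero"; neither detail is stated above, so adjust the pattern if nothing shows up. The my-zero-* function names are made up for this example.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)
(require 'dbus)

(defun my-zero-filter-service-names (names)
  "Return the members of NAMES that contain the substring \"zero\".
The substring is an assumption about how the services are named.
Matching follows `case-fold-search', so it ignores case by default."
  (cl-remove-if-not (lambda (name) (string-match-p "zero" name)) names))

(defun my-zero-list-dbus-services ()
  "Show zero-related D-Bus names on the session bus in the echo area."
  (interactive)
  (message "activatable: %S  running: %S"
           ;; Names the bus knows how to auto-start on first invocation.
           (my-zero-filter-service-names
            (dbus-list-activatable-names :session))
           ;; Names that currently have an owner, i.e. a live process.
           (my-zero-filter-service-names
            (dbus-list-names :session))))
```

Run M-x my-zero-list-dbus-services. A name that appears as activatable but not as running has simply not been started yet; a name missing from both lists suggests the package is not installed for this session.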
A word on phrase table and sogou cell dict
zero-pinyin uses the phrase table in libpyzy (specifically /usr/share/pyzy/db/open-phrase.db) to provide Chinese characters and phrases. libpyzy is the library from the ibus-pinyin project.
Sogou cell dicts are user-contributed phrase tables for sogou pinyin. They can be easily imported into the libpyzy db, and are thus supported by zero-pinyin. But I will leave that for another post.
Why zero-el is developed
Emacs Chinese input method support has been poor. For a long time, only XIM was supported. Sometimes XIM only works in CJK locales for emacs. When ibus-daemon or the SCIM daemon is restarted, existing emacs frames can no longer trigger the IM via the ibus/SCIM trigger key.
Emacs's built-in quail-based input method is limited in features and usability for Chinese input. For example, you can't type 强 (qiang), because it only has a "jiang" pronunciation in emacs chinese-py. Other problems of chinese-py include a limited phrase db. Pagination, cursor control, and candidate commit are more difficult than in other mainstream IMs in China. I do have pinyin and wubi input method files for use with quail from some time ago. They offer a better phrase table than the built-in chinese-py input method.
ibus-el was an emacs client that worked with the ibus daemon to provide input methods to emacs. But ibus-el has stopped working, and it is difficult to debug because of the lack of developer documentation. ibus's python binding no longer works with the latest ibus release, which makes it difficult to fix ibus-el. According to this post, it's unlikely ibus-el will work again.
So, to have more modern input method support in emacs, I started zero-el from scratch. With an emacs minor mode and dbus-based RPC (remote procedure call), I was able to make zero-framework and zero-pinyin a reality.
- Edit: As mentioned in the comments section, I discovered pyim after I created zero-el. Pyim is a Chinese input method for emacs. It has a much longer history (since 2008) and supports many more input methods than zero-el. You may wish to try both input methods and choose the one you prefer.
zero-el architecture
Notes:
- zero-panel is made to only receive calls and never send a response. So it keeps no local state. Candidate selection is done on the emacs side, not on the panel side. Clicking to select a candidate on the panel is deliberately not implemented. This keeps the panel code simple.
- I used an ad-hoc class implementation because I started coding a concrete input method before I realized that much of the code could be reused for other Chinese input methods. By then it was too late to rewrite everything in a more formal elisp OO system. I think using a proper OO system would clean up the code.
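
The "receive a call but not send a response" design in the first note can be pictured with a one-way D-Bus call from emacs. This is only a sketch: the service name, object path, interface, and method below are hypothetical placeholders, not the real zero-panel API, and the session bus is an assumption.

```elisp
;;; -*- lexical-binding: t -*-
(require 'dbus)

(defun my-panel-show-candidates (candidates)
  "Send CANDIDATES, a non-empty list of strings, to a hypothetical panel.
Every D-Bus name used here is a placeholder for illustration only."
  (dbus-call-method-asynchronously
   :session
   "com.example.Panel"          ; hypothetical service name
   "/com/example/Panel"         ; hypothetical object path
   "com.example.Panel"          ; hypothetical interface name
   "ShowCandidates"             ; hypothetical method name
   nil                          ; nil handler: no reply message is expected
   candidates))                 ; a list of strings is sent as a string array
```

Because the handler is nil, emacs neither blocks on nor processes a reply, which matches a panel that only displays what it is told while emacs keeps all selection state.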
How to write an input method for emacs?
An input method basically converts some sequence of characters to some other characters. Emacs has powerful key handling and text handling facilities, so it is quite easy to write simple input methods in emacs.
Here is a proof-of-concept input method written as an emacs minor mode. This is how I started.

    ;;; -*- lexical-binding: t -*-
    ;;; zero-quickdial --- quickdial input method written as an emacs minor mode.

    (defun zero-quickdial-insert-one ()
      (interactive)
      (insert "one"))

    (defun zero-quickdial-insert-two ()
      (interactive)
      (insert "two"))

    (defun zero-quickdial-insert-three ()
      (interactive)
      (insert "three"))

    (defvar zero-quickdial-mode-map
      ;; 49, 50 and 51 are the character codes of the keys 1, 2 and 3.
      '(keymap
        (49 . zero-quickdial-insert-one)
        (50 . zero-quickdial-insert-two)
        (51 . zero-quickdial-insert-three))
      "zero-quickdial-mode keymap")

    (define-minor-mode zero-quickdial-mode
      "a simple input method written as an emacs minor mode"
      ;; keyword form of the older positional INIT-VALUE LIGHTER KEYMAP arguments
      :lighter " Quickdial"
      :keymap zero-quickdial-mode-map)

    (provide 'zero-quickdial)

To use this input method:
Turn on the IM by M-x zero-quickdial-mode.
Now, typing 1 will insert one, typing 2 will insert two, and typing 3 will insert three.
Turn off the IM by M-x zero-quickdial-mode.
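
The same idea can be made table-driven, which is one step closer to a real input method. The following is a sketch rather than part of zero-el: all my-quickdial-* names are made up for this example, and it uses a single command that looks up the key that invoked it.

```elisp
;;; -*- lexical-binding: t -*-
(defvar my-quickdial-table
  '((?1 . "one") (?2 . "two") (?3 . "three"))
  "Alist mapping a typed character to the text inserted in its place.")

(defun my-quickdial-insert ()
  "Insert the text that `my-quickdial-table' maps the last typed key to."
  (interactive)
  ;; `last-command-event' holds the key that invoked this command.
  (let ((text (cdr (assq last-command-event my-quickdial-table))))
    (when text
      (insert text))))

(defvar my-quickdial-mode-map
  (let ((map (make-sparse-keymap)))
    ;; Bind every key in the table to the same lookup command.
    (dolist (pair my-quickdial-table)
      (define-key map (vector (car pair)) #'my-quickdial-insert))
    map)
  "Keymap for `my-quickdial-mode'.")

(define-minor-mode my-quickdial-mode
  "A table-driven variant of the quickdial proof of concept."
  :lighter " MyQuickdial"
  :keymap my-quickdial-mode-map)
```

Changing the text for an existing key only needs an edit to the table. A new key also needs a binding in the keymap, since the keymap is built once when the defvar is first evaluated.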
How to develop a new input method based on zero-framework
To get a quick start, check the zero-input-table.el file, which includes a minimal input method based on zero-framework. This is zero-input-table.el key code with some comments and tests removed:

    ;; -*- lexical-binding: t -*-
    ;; a demo table based input method based on zero-framework.el

    ;;==============
    ;; dependencies
    ;;==============

    (require 'cl-lib)  ; for cl-remove-if-not
    (require 'zero-input-framework)

    ;;===============================
    ;; basic data and emacs facility
    ;;===============================

    (defvar zero-input-table-table nil
      "The table used by zero-input-table input method, map string to string.")
    (defvar zero-input-table-sequence-initials nil
      "Used in `zero-input-table-can-start-sequence'.")

    ;;=====================
    ;; key logic functions
    ;;=====================

    (defun zero-input-table-sort-key (lhs rhs)
      "A predicate function to sort candidates.  Return t if LHS should sort before RHS."
      (string< (car lhs) (car rhs)))

    (defun zero-input-table-build-candidates (preedit-str &optional _fetch-size)
      "Build candidates by looking up PREEDIT-STR in `zero-input-table-table'."
      (mapcar 'cdr
              ;; `sort' is destructive and `cl-remove-if-not' may return a list
              ;; that shares structure with the table, so sort a copy.
              (sort (copy-sequence
                     (cl-remove-if-not
                      (lambda (pair) (string-prefix-p preedit-str (car pair)))
                      zero-input-table-table))
                    'zero-input-table-sort-key)))

    (defun zero-input-table-can-start-sequence (ch)
      "Return t if char CH can start a preedit sequence."
      (member (make-string 1 ch) zero-input-table-sequence-initials))

    ;;===============================
    ;; register IM to zero framework
    ;;===============================

    (zero-input-register-im
     'zero-input-table
     '((:build-candidates . zero-input-table-build-candidates)
       (:can-start-sequence . zero-input-table-can-start-sequence)))

    ;;============
    ;; public API
    ;;============

    (defun zero-input-table-set-table (alist)
      "Set the conversion table.

    the ALIST should be a list of (key . value) pairs.  when user type
    \(part of) key, the IM will show all matching value.

    To use demo data, you can call:
    \(zero-input-table-set-table
     \\='((\"phone\" . \"18612345678\")
       (\"mail\" . \"foo@example.com\")
       (\"map\" . \"https://ditu.amap.com/\")
       (\"m\" . \"https://msdn.microsoft.com/en-us\")
       (\"address\" . \"123 Happy Street\")))"
      (setq zero-input-table-table alist)
      (setq zero-input-table-sequence-initials
            (delete-dups (mapcar (lambda (pair) (substring (car pair) 0 1))
                                 zero-input-table-table))))

    ;;===========
    ;; test data
    ;;===========

    (unless zero-input-table-table
      (zero-input-table-set-table
       '(("phone" . "18612345678")
         ("mail" . "foo@example.com")
         ("map" . "https://ditu.amap.com/")
         ("m" . "https://msdn.microsoft.com/en-us")
         ("address" . "123 Happy Street"))))

    (provide 'zero-input-table)

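
To see what the candidate lookup produces, here is a standalone sketch of the same prefix-match-then-sort logic that runs without zero-framework. The names my-demo-table and my-table-candidates are made up for this example; the data is the demo table from the listing above.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

(defvar my-demo-table
  '(("phone" . "18612345678")
    ("mail" . "foo@example.com")
    ("map" . "https://ditu.amap.com/")
    ("m" . "https://msdn.microsoft.com/en-us")
    ("address" . "123 Happy Street"))
  "The demo data from zero-input-table.el, copied for this example.")

(defun my-table-candidates (table preedit-str)
  "Return the values in TABLE whose key starts with PREEDIT-STR.
The result is ordered by key, so shorter exact matches come first."
  (mapcar #'cdr
          (sort (cl-remove-if-not
                 (lambda (pair) (string-prefix-p preedit-str (car pair)))
                 ;; Filter a copy so the destructive `sort' cannot touch TABLE.
                 (copy-sequence table))
                (lambda (lhs rhs) (string< (car lhs) (car rhs))))))

(my-table-candidates my-demo-table "m")
;; => ("https://msdn.microsoft.com/en-us" "foo@example.com" "https://ditu.amap.com/")
(my-table-candidates my-demo-table "ma")
;; => ("foo@example.com" "https://ditu.amap.com/")
(my-table-candidates my-demo-table "x")
;; => nil
```

Typing m therefore offers three candidates, and each further character narrows the list, which is the behaviour the candidate panel shows while preediting.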
To write a real-world input method, you need to understand how zero-framework works. There are a few functions that you can override.
zero-framework manages input state and input logic via a finite state machine (FSM). Understanding the FSM is the key to understanding what the framework does and what features it provides.
Here is the FSM implemented in zero-input-framework.el:
state | action | next state | trigger action |
---|---|---|---|
IM_OFF | M-x zero-input-on or zero-input-mode | IM_WAITING_INPUT | turn on minor mode |
IM_WAITING_INPUT | type M-x zero-input-off or zero-input-mode | IM_OFF | turn off minor mode |
IM_WAITING_INPUT | type character that can start a sequence | IM_PREEDITING | update preedit str, show candidate list |
IM_WAITING_INPUT | type character that can not start a sequence | IM_WAITING_INPUT | insert character |
IM_WAITING_INPUT | type [,.?!\:] | IM_WAITING_INPUT | insert Chinese punctuation character |
IM_PREEDITING | type character (that is not SPC, digit keys) | IM_PREEDITING | update preedit str, update and show candidate list |
IM_PREEDITING | type RET | IM_WAITING_INPUT | commit preedit str, hide candidate list, reset preedit str |
IM_PREEDITING | type SPC | IM_WAITING_INPUT | commit first candidate or preedit str, reset preedit str |
IM_PREEDITING | type digit keys | IM_WAITING_INPUT | commit nth candidate if it exists, otherwise, append to preedit str |
IM_PREEDITING | type M-x zero-input-off or zero-input-mode | IM_OFF | reset IM, turn off minor mode |
IM_PREEDITING | type <backspace>, when preedit str is longer than 1 | IM_PREEDITING | update preedit str, update and show candidate list |
IM_PREEDITING | type <backspace>, when preedit str is length 1 | IM_WAITING_INPUT | reset IM |
IM_PREEDITING | focus in | IM_PREEDITING | show candidate list |
IM_PREEDITING | focus out | IM_PREEDITING | hide panel |
IM_PREEDITING | type [,.?!\:] | IM_WAITING_INPUT | commit first candidate or preedit str, insert Chinese punctuation |
IM_PREEDITING | type -/= | IM_PREEDITING | candidate page up/down |
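
The state column of this table can be modelled as a plain lookup, which makes it easy to trace a typing session by hand. This is a sketch and not the code in zero-input-framework.el: the state and event symbols are hypothetical names for the rows above, and the actions in the last column are left out.

```elisp
;;; -*- lexical-binding: t -*-
(defconst my-fsm-transitions
  '(((im-off . toggle) . im-waiting-input)
    ((im-waiting-input . toggle) . im-off)
    ((im-waiting-input . start-char) . im-preediting)
    ((im-waiting-input . other-char) . im-waiting-input)
    ((im-waiting-input . punctuation) . im-waiting-input)
    ((im-preediting . preedit-char) . im-preediting)
    ((im-preediting . ret) . im-waiting-input)
    ((im-preediting . spc) . im-waiting-input)
    ((im-preediting . digit) . im-waiting-input)
    ((im-preediting . toggle) . im-off)
    ((im-preediting . backspace) . im-preediting)       ; preedit str longer than 1
    ((im-preediting . backspace-last) . im-waiting-input) ; preedit str of length 1
    ((im-preediting . focus-in) . im-preediting)
    ((im-preediting . focus-out) . im-preediting)
    ((im-preediting . punctuation) . im-waiting-input)
    ((im-preediting . page-key) . im-preediting))
  "Alist of ((STATE . EVENT) . NEXT-STATE), one entry per table row.")

(defun my-fsm-next-state (state event)
  "Return the state after EVENT in STATE, or STATE itself if no row matches."
  (let ((row (assoc (cons state event) my-fsm-transitions)))
    (if row (cdr row) state)))

(defun my-fsm-run (state events)
  "Feed EVENTS to the FSM one by one from STATE and return the final state."
  (dolist (event events state)
    (setq state (my-fsm-next-state state event))))

;; Turn the IM on, type two pinyin letters, then commit with SPC.
(my-fsm-run 'im-off '(toggle start-char preedit-char spc))
;; => im-waiting-input
```

Keeping the transitions as data rather than nested conditionals makes each table row map to exactly one alist entry, so the sketch can be checked against the table line by line.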
- State meaning
  - IM_OFF: zero minor mode is off. zero should neither attempt any preedit nor do punctuation translation.
  - IM_WAITING_INPUT: User started typing in zero-mode
  - IM_PREEDITING: User typed some preedit character, but neither preedit string nor candidate is committed yet.
- For the functions you can override in an input method, the docstring is in their corresponding variable.
  - zero-build-candidates-func
  - zero-build-candidates-async-func
  - zero-can-start-sequence-func
  - zero-handle-preedit-char-func
  - zero-get-preedit-str-for-panel-func
  - zero-backspace-func
  - zero-preedit-start-func
  - zero-preedit-end-func
There are also init and shutdown functions, which don't have a variable holding a docstring.
The init function is called when this IM is turned on. The shutdown function is called when this IM is turned off.
Where to go from here
- Make zero-el and dependencies easier to install.
- Configure CI and CD for zero-pinyin-service and zero-panel.
- Write more documentation for zero-framework and zero-pinyin.
- Make zero-pinyin-service better for pinyin input.
- Make zero-panel prettier.
Known issues
- The Emacs dbus binding doesn't work on Windows with windbus, so zero-pinyin can't be ported to Windows.
- zero-panel doesn't work in GNOME 3 with GNOME Shell. This can be fixed, but I haven't looked into it.