zero-el - a Chinese input method framework built from scratch for emacs

Created On: 2019-09-01 Updated On: 2019-09-02

zero-el is a new Chinese input method framework for emacs. A pinyin input method is included in zero-el.

Here is a screenshot of zero-el pinyin input in action:

[Figure: zero-el in action]

How to use this input method

zero-el is tested on emacs 24+ under debian 9 (stretch), debian 10 (buster) and ubuntu 18.04 LTS.

zero-el works in a linux xorg environment. Wayland is not tested.

zero-pinyin is the only input method included in zero-el. So it only works for pinyin users for now.

zero-pinyin depends on dbus, zero-pinyin-service, zero-panel.

zero-pinyin-service and zero-panel are only released as binaries (deb packages), which are tested on the debian and ubuntu systems mentioned above. They should also work on other debian derivatives released after debian 9. Other OSes are not supported at the moment.

Here is how to install and use zero-el:

  1. install dependencies

    Add my private apt repo, then install the packages from a root shell:

    # switch to root shell
    sudo -i
    # add my private apt repo:
    echo 'deb https://apt.emacsos.com/debian /' > /etc/apt/sources.list.d/emacsos.list
    # import my apt sign key
    curl https://apt.emacsos.com/debian/apt-pub.key | apt-key add -
    # install zero-el dependencies
    apt update
    apt install zero-pinyin-service zero-panel
    
  2. install zero-el
    • Get the latest zero-el release from the gitlab tags page and unzip it to ~/fromsource/zero, or clone the git repo:
      mkdir -p ~/fromsource/
      cd ~/fromsource/
      git clone https://gitlab.emacsos.com/sylecn/zero-el.git zero
      
    • Install the s package (a string manipulation library) from melpa or melpa-stable: add melpa-stable to package-archives, then run M-x package-list-packages and M-x package-install RET s RET.
    • Add the following to ~/.emacs.d/init.el:
      ;; general emacs package config, if you don't have them already.
      (package-initialize)
      (setq package-archives
            '(("gnu" . "http://elpa.gnu.org/packages/")
              ("melpa-stable" . "http://stable.melpa.org/packages/")
              ;; ("melpa" . "https://melpa.org/packages/")
              ))
      
      ;; zero-el config
      (add-to-list 'load-path "~/fromsource/zero/")
      (require 'zero-pinyin)
      (zero-set-default-im 'pinyin)
      ;; config any key you like for zero-toggle, this is used to toggle zero input
      ;; method.
      (global-set-key (kbd "<f1>") 'zero-toggle)
      
  3. use zero-el

    Inside an emacs buffer, press F1 and start typing pinyin. Use SPC or the digit keys to select candidates. Typing a Chinese punctuation character selects the first candidate automatically, then inserts the punctuation. New phrases are saved to the user phrase db automatically. You can press Ctrl+digit key to delete a phrase. Both built-in and user created phrases can be deleted.

    By default only a few punctuation characters are mapped to Chinese punctuation. Press C-. (zero-cycle-punctuation-level) to get more Chinese punctuation mappings.

Debug zero-pinyin problems

zero-pinyin-service and zero-panel are dbus services that are started automatically on first service invocation. You can check their logs in /var/log/syslog or via the journalctl command.

You can also test zero-pinyin-service and zero-panel by running them on a console. Since they do a single instance check on startup, kill any existing process before running them on the console.
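As a quick first check from inside emacs, you can also ask the session bus which service names it knows about. This is only a sketch: the `my-zero-` function names are made up for this example, and it assumes that the two services register D-Bus names containing "zero" (adjust the pattern if yours differ). `dbus-list-activatable-names` and `dbus-list-names` are standard functions from emacs's dbus.el.

```elisp
;;; -*- lexical-binding: t -*-
(require 'dbus)
(require 'cl-lib)

(defun my-zero-filter-service-names (names &optional pattern)
  "Return the members of NAMES that match PATTERN, ignoring case.
PATTERN defaults to \"zero\", which is an assumption about how the
zero services are named on the bus."
  (let ((case-fold-search t)
        (re (or pattern "zero")))
    (cl-remove-if-not (lambda (name) (string-match-p re name)) names)))

(defun my-zero-list-dbus-services ()
  "Return zero-looking service names known to the session bus.
:activatable lists services the bus can start on demand,
:running lists names that are registered right now."
  (list :activatable (my-zero-filter-service-names
                      (dbus-list-activatable-names :session))
        :running (my-zero-filter-service-names
                  (dbus-list-names :session))))
```

Evaluate `(my-zero-list-dbus-services)` with M-:. Under the naming assumption above, an empty :activatable list would suggest that the deb packages are not installed for this session bus.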

A word on phrase table and sogou cell dict

zero-pinyin uses the phrase table in libpyzy (specifically /usr/share/pyzy/db/open-phrase.db) to provide Chinese characters and phrases. libpyzy is the library from the ibus-pinyin project.

Sogou cell dicts are user contributed phrase tables for sogou pinyin. A sogou cell dict can be easily imported into the libpyzy db, and is thus supported by zero-pinyin. But I will leave that for another post.

Why zero-el is developed

Emacs Chinese input method support has been poor. For a long time, only XIM was supported, and sometimes XIM only works for emacs in CJK locales. When ibus-daemon or the SCIM daemon is restarted, existing emacs frames can no longer trigger the IM via the ibus/SCIM trigger key.

The emacs built-in quail based input method is limited in features and usability for Chinese input. For example, you can't type 强 (qiang), because it only has a "jiang" pronunciation in emacs chinese-py. Other problems of chinese-py include a limited phrase db. Pagination, cursor control and candidate commit are more difficult than in other mainstream IMs in China. I do have a pinyin and a wubi input method file for use with quail from some time ago. They offer a better phrase table than the built-in chinese-py input method.

ibus-el was an emacs client that worked with the ibus daemon to provide input methods to emacs. But ibus-el has stopped working, and it is difficult to debug because of the lack of developer documentation. ibus's python binding no longer works with the latest ibus release, which makes it difficult to fix ibus-el. According to this post, it's unlikely ibus-el will work again.

So to have more modern input method support in emacs, I started zero-el from scratch. With an emacs minor mode and dbus based RPC (remote procedure call), I was able to make zero-framework and zero-pinyin a reality.

zero-el architecture

[Figure: zero-el architecture diagram]

Notes:

  • zero-panel is designed to only receive calls and never send a response, so it keeps no local state. Candidate selection is done on the emacs side, not on the panel side. Clicking on the panel to select a candidate is deliberately not implemented. This keeps the panel code simple.
  • I used an ad-hoc class implementation because I started coding a concrete input method before I realized that much of the code could be reused for other Chinese input methods. By then it was too late to rewrite everything in a more formal elisp OO system. I think using a proper OO system would clean up the code.
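To make the first note concrete, here is a minimal sketch of what a call without a response can look like from the emacs side. It is an illustration, not zero-el's code: the service, path, interface and method names are placeholders (not zero-panel's real D-Bus names), and `my-panel-show-candidates` is a made-up function. The one real piece is `dbus-call-method-asynchronously`, which expects no reply message when its HANDLER argument is nil.

```elisp
;;; -*- lexical-binding: t -*-
(require 'dbus)

;; All four names below are placeholders for illustration only.
(defconst my-panel-service "com.example.Panel")
(defconst my-panel-path "/com/example/Panel")
(defconst my-panel-interface "com.example.Panel")
(defconst my-panel-method "ShowCandidates")

(defun my-panel-show-candidates (candidates &optional call-fn)
  "Send CANDIDATES, a list of strings, to the panel without awaiting a reply.
CALL-FN defaults to `dbus-call-method-asynchronously'.  Pass another
function with the same argument list to try this without a bus."
  (funcall (or call-fn #'dbus-call-method-asynchronously)
           :session
           my-panel-service my-panel-path my-panel-interface
           my-panel-method
           nil          ; HANDLER is nil: no reply message is expected
           candidates))
```

Because nothing comes back from the panel, all state (preedit string, candidate list, current page) has to live on the emacs side, which matches the note above.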

How to write an input method for emacs?

An input method basically converts a sequence of characters into some other characters. Emacs has powerful key handling and text handling facilities, so it is quite easy to write simple input methods in emacs.

Here is the proof of concept input method written as emacs minor mode. This is how I started.

;;; -*- lexical-binding: t -*-
;;; zero-quickdial --- quickdial input method written as an emacs minor mode.

(defun zero-quickdial-insert-one ()
  (interactive)
  (insert "one"))

(defun zero-quickdial-insert-two ()
  (interactive)
  (insert "two"))

(defun zero-quickdial-insert-three ()
  (interactive)
  (insert "three"))

(defvar zero-quickdial-mode-map
  ;; a raw keymap: each entry maps a character to a command
  '(keymap
    (?1 . zero-quickdial-insert-one)
    (?2 . zero-quickdial-insert-two)
    (?3 . zero-quickdial-insert-three))
  "zero-quickdial-mode keymap")

(define-minor-mode zero-quickdial-mode
  "a simple input method written as an emacs minor mode"
  :lighter " Quickdial"
  :keymap zero-quickdial-mode-map)

(provide 'zero-quickdial)

To use this input method:

Turn on the IM with M-x zero-quickdial-mode.

Now typing 1 inserts one, typing 2 inserts two, and typing 3 inserts three.

Turn off the IM by running M-x zero-quickdial-mode again.
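The three insert commands differ only in the text they insert, so the same idea can be driven by a table. The sketch below is a variation for illustration, not code from zero-el. All `my-quickdial-` names are made up, and the keymap is built once from the table when the file is loaded.

```elisp
;;; -*- lexical-binding: t -*-

(defvar my-quickdial-table
  '((?1 . "one")
    (?2 . "two")
    (?3 . "three"))
  "Map a typed character to the text that should be inserted instead.")

(defun my-quickdial-self-insert ()
  "Insert the replacement text for the key that invoked this command."
  (interactive)
  (let ((text (cdr (assq last-command-event my-quickdial-table))))
    (when text
      (insert text))))

(defvar my-quickdial-mode-map
  (let ((map (make-sparse-keymap)))
    ;; bind every character in the table to the same command
    (dolist (pair my-quickdial-table)
      (define-key map (string (car pair)) #'my-quickdial-self-insert))
    map)
  "Keymap for `my-quickdial-mode'.")

(define-minor-mode my-quickdial-mode
  "A table driven variant of the quickdial input method."
  :lighter " MyQuickdial"
  :keymap my-quickdial-mode-map)
```

A real input method needs more than a fixed table, because the same key can mean different things depending on what was typed before. That is what the state machine in zero-framework is for.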

How to develop a new input method based on zero-framework

To get a quick start, check the zero-table.el file, which includes a minimal input method based on zero-framework. Here is the key code of zero-table.el, with some comments and tests removed:

;; -*- lexical-binding: t -*-
;; a demo table based input method based on zero-framework.el

;;==============
;; dependencies
;;==============

(require 'cl-lib) ; for `cl-remove-if-not'
(require 's)
(require 'zero-framework)

;;===============================
;; basic data and emacs facility
;;===============================

(defvar zero-table-table nil "zero-table's table, map string to string")
(defvar zero-table-sequence-initials nil "used in `zero-table-can-start-sequence'")

;;=====================
;; key logic functions
;;=====================

(defun zero-table-sort-key (lhs rhs)
  "a predicate function to sort candidates. return t if lhs
should sort before rhs."
  (string< (car lhs) (car rhs)))

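(defun zero-table-build-candidates (preedit-str &optional _fetch-size)
  "return the values whose key starts with preedit-str, sorted by key."
  (mapcar 'cdr
          (sort
           ;; `sort' is destructive and `cl-remove-if-not' may return the
           ;; original list when nothing is removed, so sort a copy.
           (copy-sequence
            (cl-remove-if-not
             (lambda (pair) (string-prefix-p preedit-str (car pair)))
             zero-table-table))
           'zero-table-sort-key)))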

(ert-deftest zero-table-build-candidates ()
  (should (equal (zero-table-build-candidates "ph") '("18612345678")))
  (should (equal (zero-table-build-candidates "m") '("https://msdn.microsoft.com/en-us"
                                                     "foo@example.com"
                                                     "https://ditu.amap.com/"))))

(defun zero-table-can-start-sequence (ch)
  "return non-nil if char ch can start a preedit sequence."
  (member (make-string 1 ch) zero-table-sequence-initials))

;;===============================
;; register IM to zero framework
;;===============================

(zero-register-im
 'zero-table
 '((:build-candidates . zero-table-build-candidates)
   (:can-start-sequence . zero-table-can-start-sequence)))

;;============
;; public API
;;============

(defun zero-table-set-table (alist)
  "set the conversion table.

the alist should be a list of (key . value) pairs. when user type
(part of) key, the IM will show all matching value.

e.g.
'((\"phone\" . \"18612345678\")
  (\"mail\" . \"foo@example.com\")
  (\"map\" . \"https://ditu.amap.com/\")
  (\"m\" . \"https://msdn.microsoft.com/en-us\")
  (\"address\" . \"123 Happy Street\"))
"
  (setq zero-table-table alist)
  (setq zero-table-sequence-initials
        (delete-dups (mapcar (lambda (pair) (substring (car pair) 0 1))
                             zero-table-table))))

;;===========
;; test data
;;===========

(unless zero-table-table
  (zero-table-set-table
   '(("phone" . "18612345678")
     ("mail" . "foo@example.com")
     ("map" . "https://ditu.amap.com/")
     ("m" . "https://msdn.microsoft.com/en-us")
     ("address" . "123 Happy Street"))))

(provide 'zero-table)
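To try this input method, something like the following init.el fragment should work. It is a sketch under one assumption: that `zero-set-default-im` accepts the name passed to `zero-register-im` (here `'zero-table`), the same way it accepts `'pinyin` in the install section. The table entries are sample data.

```elisp
;; in ~/.emacs.d/init.el
(add-to-list 'load-path "~/fromsource/zero/")
(require 'zero-table)

;; Replace the built-in test data with your own abbreviations.
(zero-table-set-table
 '(("phone" . "18612345678")
   ("mail" . "foo@example.com")))

;; Assumption: the IM name registered above is what this function expects.
(zero-set-default-im 'zero-table)
(global-set-key (kbd "<f1>") 'zero-toggle)
```

With this in place, pressing F1 and typing ph should offer 18612345678 as the only candidate, and SPC should commit it.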

To write a real world input method, you need to understand how zero-framework works. There are a few functions that you can override.

zero-framework manages input state and input logic with a finite state machine (FSM). This is the key to understanding what it does and what features it provides.

Here is the FSM implemented in zero-framework.el:

| state | action | next state | trigger action |
|-------+--------+------------+----------------|
| IM_OFF | M-x zero-on or zero-toggle | IM_WAITING_INPUT | turn on minor mode |
| IM_WAITING_INPUT | type M-x zero-off or zero-toggle | IM_OFF | turn off minor mode |
| IM_WAITING_INPUT | type character that can start a sequence | IM_PREEDITING | update preedit str, show candidate list |
| IM_WAITING_INPUT | type character that can not start a sequence | IM_WAITING_INPUT | insert character |
| IM_WAITING_INPUT | type [,.?!\:] | IM_WAITING_INPUT | insert Chinese punctuation character |
| IM_PREEDITING | type character (that is not SPC, digit keys) | IM_PREEDITING | update preedit str, update and show candidate list |
| IM_PREEDITING | type RET | IM_WAITING_INPUT | commit preedit str, hide candidate list, reset preedit str |
| IM_PREEDITING | type SPC | IM_WAITING_INPUT | commit first candidate or preedit str, reset preedit str |
| IM_PREEDITING | type digit keys | IM_WAITING_INPUT | commit nth candidate if it exists, otherwise, append to preedit str |
| IM_PREEDITING | type M-x zero-off or zero-toggle | IM_OFF | reset IM, turn off minor mode |
| IM_PREEDITING | type <backspace>, when preedit str is longer than 1 | IM_PREEDITING | update preedit str, update and show candidate list |
| IM_PREEDITING | type <backspace>, when preedit str is length 1 | IM_WAITING_INPUT | reset IM |
| IM_PREEDITING | focus in | IM_PREEDITING | show candidate list |
| IM_PREEDITING | focus out | IM_PREEDITING | hide panel |
| IM_PREEDITING | type [,.?!\:] | IM_WAITING_INPUT | commit first candidate or preedit str, insert Chinese punctuation |
| IM_PREEDITING | type -/= | IM_PREEDITING | candidate page up/down |
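
The transitions can also be read as a lookup from (state . event) to next state. The sketch below is a simplified model for illustration, not the code in zero-framework.el: the `my-zero-fsm-` names and the event symbols are made up, the trigger actions are left out, digit keys are omitted, and unknown events are assumed to leave the state unchanged.

```elisp
;;; -*- lexical-binding: t -*-

(defconst my-zero-fsm-transitions
  '(((im-off . toggle) . im-waiting-input)
    ((im-waiting-input . toggle) . im-off)
    ((im-waiting-input . start-char) . im-preediting)
    ((im-waiting-input . other-char) . im-waiting-input)
    ((im-waiting-input . punctuation) . im-waiting-input)
    ((im-preediting . char) . im-preediting)
    ((im-preediting . ret) . im-waiting-input)
    ((im-preediting . spc) . im-waiting-input)
    ((im-preediting . punctuation) . im-waiting-input)
    ((im-preediting . toggle) . im-off)
    ((im-preediting . focus-in) . im-preediting)
    ((im-preediting . focus-out) . im-preediting)
    ((im-preediting . page) . im-preediting))
  "Alist mapping (STATE . EVENT) to the next state.")

(defun my-zero-fsm-next-state (state event &optional preedit-length)
  "Return the state that follows STATE when EVENT happens.
PREEDIT-LENGTH is only consulted for backspace while preediting:
deleting the last preedit character returns to waiting for input."
  (cond
   ((and (eq state 'im-preediting) (eq event 'backspace))
    (if (> (or preedit-length 0) 1)
        'im-preediting
      'im-waiting-input))
   (t
    ;; `assoc' compares keys with `equal', so a fresh cons cell matches
    (or (cdr (assoc (cons state event) my-zero-fsm-transitions))
        state))))
```

For example, `(my-zero-fsm-next-state 'im-preediting 'backspace 1)` returns `im-waiting-input`, which corresponds to the "preedit str is length 1" row.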
  • State meaning
    • IM_OFF: zero minor mode is off. zero should neither attempt preediting nor translate punctuation.
    • IM_WAITING_INPUT: zero-mode is on and the user can start typing, but there is no preedit string yet.
    • IM_PREEDITING: the user has typed some preedit characters, but neither the preedit string nor a candidate has been committed yet.
  • An input method can override the functions below. The docstring for each one is on the corresponding variable.
    • zero-build-candidates-func
    • zero-build-candidates-async-func
    • zero-can-start-sequence-func
    • zero-handle-preedit-char-func
    • zero-get-preedit-str-for-panel-func
    • zero-backspace-func
    • zero-preedit-start-func
    • zero-preedit-end-func
  • There are also init and shutdown functions, which don't have a variable holding a docstring.

    The init function is called when the IM is turned on. The shutdown function is called when the IM is turned off.

Where to go from here

  • Make zero-el and its dependencies easier to install.
    • Configure CI and CD for zero-pinyin-service and zero-panel.
  • Write more documentation for zero-framework and zero-pinyin.
  • Make zero-pinyin-service better for pinyin input.
  • Make zero-panel prettier.

Known issues

  • The emacs dbus binding doesn't work on windows with windbus, so zero-pinyin can't be ported to windows.
  • zero-panel doesn't work in GNOME 3 on Wayland, which is the default GNOME session. This is because of a focus problem with gtk applications on Wayland. GNOME Classic and GNOME on Xorg sessions have no such problem.
