Publishing org-roam via GitLab CI

planted: 20/10/2021, last tended: 05/02/2022

I've set up a basic pipeline in GitLab CI to build and publish my org-roam wiki automatically. Every time I commit some changes to the git repo that I store it in, the pipeline runs my org-publish bizness and then rsyncs the output html to my server.

1. Why?

I've been publishing from my local machine up until now. It's OK, but it takes a long time to rebuild all of the pages, so I tend to just publish files individually as I create them. But sometimes I miss pages, and sometimes I forget to. I'm getting to the point where I'd like to have it auto-published after making changes.

Another approach could be to just make the build quicker… but I'm not sure how to do that just yet.

But generally, I do just like this kind of stuff, setting up continuous integration and things. Your mileage may vary - if you find this ops stuff a bit tedious, the benefits might not really be worth it.

2. Requirements

What do I want to happen?

  • each time I commit to the git repo that I store my raw wiki files in, I want the static html to be rebuilt and published to the web
    • this should include my various custom build things, like building backlinks, svg maps, etc
  • I want to still be able to publish locally from time to time, for local testing
  • I might still want to publish individual files to the web every now and again

3. How

To get a CI pipeline on GitLab you just need a .gitlab-ci.yml file in the root of the repo.

My .gitlab-ci.yml for the wiki has two stages in its pipeline: one to build HTML from the org files, and one to rsync the output to my server.

stages:
  - build-org
  - publish

The second one, the deployment stage, could be swapped out to deploy to GitLab Pages or something similar.
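If I did swap the rsync job out for GitLab Pages, a minimal sketch might look like this. It assumes the build job keeps exporting to the _posts artifact; GitLab Pages serves whatever a job named `pages` publishes as an artifact under public/, and artifacts from earlier stages are downloaded automatically. The branch rule is my own assumption about when deploys should happen.

```yaml
# Sketch: deploy to GitLab Pages instead of rsyncing to a server.
# Pages only picks up the public/ artifact from a job named `pages`.
pages:
  image: debian
  stage: publish
  script:
    # _posts/ arrives here from the build-org stage's artifacts
    - mv _posts public
  artifacts:
    paths:
      - public
  rules:
    # assumption: only deploy from the default branch
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
```

One caveat: project sites on GitLab Pages are usually served under a /<project-name>/ path unless a custom domain is set up, so root-relative links in the exported HTML might need adjusting.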

3.1. Building

org-generation:
  image:
    name: silex/emacs
  artifacts:
    paths:
      - _posts
  before_script:
    - apt-get update
    - apt-get install --yes sqlite3 git-restore-mtime
  script:
    - git restore-mtime
    - emacs -batch -q -l publish.el -f commonplace/publish-gitlab
  stage: build-org

I use the silex/emacs image for the org-generation job, which is just Ubuntu with Emacs installed. The job calls a function, commonplace/publish-gitlab, in my publish.el, which configures org-publish-project-alist. I used to have this configuration happen globally in publish.el, but now I need it parameterised based on where I'm calling it from (locally, or from CI), so I can pass in the source and output directories.

I started out using Alpine Linux images for the steps, because (I think?) they pull down quicker given they're so barebones. But I hit a couple of things I wasn't sure how to do on Alpine, so I've gone back to Debian/Ubuntu for now. Maybe I'll return to Alpine at a later date to speed up the build a bit.

3.2. Deploying

I found setting up ssh in GitLab CI a bit of a faff. Maybe that's just a first-time thing; it'll probably be easier in future now I'm used to it.

rsync:
  image: debian
  before_script:
    - apt-get update
    - apt-get install --yes rsync
    - 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client git -y )'
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan commonplace.doubleloop.net >> ~/.ssh/known_hosts
    - chmod 644 ~/.ssh/known_hosts
    - '[[ -f /.dockerenv ]] && echo -e "Host *\n\tStrictHostKeyChecking no\n\n" > ~/.ssh/config'
  script:
    - rsync -chavz _posts/ 37.218.246.201:/var/www/commonplace/
  stage: publish

4. A few problems I hit

4.1. ssh host key checking

I still have a problem with host key checking that I need to sort out better.

- '[[ -f /.dockerenv ]] && echo -e "Host *\n\tStrictHostKeyChecking no\n\n" > ~/.ssh/config'

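This solution is a hack: it effectively switches off host key verification for every host, which undoes the point of the ssh-keyscan step.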
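A less hacky option, and roughly what GitLab's documentation on using SSH keys in CI suggests, is to pin the server's host key in a CI/CD variable and write that to known_hosts, instead of scanning at run time or turning checking off. The sketch assumes a variable named SSH_KNOWN_HOSTS containing the output of running `ssh-keyscan commonplace.doubleloop.net` once from a trusted machine. One thing that may be behind the original problem: the keyscan step records the key under commonplace.doubleloop.net, but rsync connects to 37.218.246.201, and ssh looks up known_hosts by the name it was asked to connect to, so the scanned entry might never match. The sketch therefore connects by hostname, assuming that name points at the same server.

```yaml
rsync:
  image: debian
  stage: publish
  before_script:
    - apt-get update
    - apt-get install --yes rsync openssh-client
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    # SSH_KNOWN_HOSTS (assumed CI/CD variable) holds the pinned host key,
    # captured once with `ssh-keyscan commonplace.doubleloop.net`
    - echo "$SSH_KNOWN_HOSTS" >> ~/.ssh/known_hosts
    - chmod 644 ~/.ssh/known_hosts
  script:
    # connect by the same name the host key was recorded under
    - rsync -chavz _posts/ commonplace.doubleloop.net:/var/www/commonplace/
```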

4.2. Recent Changes sitemap

I use org-publish's :sitemap-sort-files set to anti-chronologically to produce a Recent Changes file. This uses the org files' mtimes to figure out when they last changed. When you clone a git repo you lose these, since git doesn't store mtimes and every file gets the time it was checked out.

So I added some steps that call git-restore-mtime to set the files' mtimes from their last git commits.

By default, the GitLab runner only fetches the last 50 commits of your repo, which means restore-mtime doesn't see the full history. I changed this with:

variables:
  GIT_DEPTH: 0

in the yaml. I don't think it's ideal, as doing a shallow clone is a good idea for performance. To be revisited.
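A possible middle ground, sketched here as just the keys to add to the two existing jobs (with the global GIT_DEPTH removed): scope `GIT_DEPTH: 0` to the build job, which needs full history for git restore-mtime, and skip the checkout entirely in the deploy job with `GIT_STRATEGY: none`, since it only uses the _posts artifact. Both are standard GitLab runner variables, and a job's `variables` override global ones; I haven't measured whether this makes a noticeable difference here.

```yaml
org-generation:
  # ...image, artifacts, before_script, script and stage as before
  variables:
    # full clone, so git restore-mtime sees every file's last commit
    GIT_DEPTH: 0

rsync:
  # ...image, before_script, script and stage as before
  variables:
    # no clone at all; this job only needs the _posts artifact
    GIT_STRATEGY: none
```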

4.3. Backlinks

I have a function that acts as an org-export preprocessor, generating backlinks and inserting them into the pages. It was working locally, but not on CI.

In order for backlinks to get generated correctly again, I needed to force org-roam's cache to be rebuilt on the CI server.

I added:

(org-roam-db-build-cache t)

To commonplace/publish-gitlab. The argument forces a clearout of the cache before rebuilding.

I also changed things to ensure I am using the absolute path of the project on the CI server. I'm not sure if this is necessary or not, but I thought maybe org-roam's DB query was expecting absolute paths, so I chucked it in.

(commonplace/configure (file-truename ".") "posts")

Unfortunately the cache rebuild slows the pipeline down a fair bit.

4.4. Speed

Build speed continues to be a massive issue. org-export/org-publish itself is really slow. Add in the org-roam cache rebuild, and there's a lot going on.

Around when I first introduced the pipeline, it took about 6 minutes - way too slow. Currently, as of Feb 2022 with over 3000 org files, it's taking around 30 minutes!!! This is horrendous. See: Speeding up org-export and org-publish.

Maybe switching back to Alpine would trim some time off, as would caching between stages. And if I really cared, I could set up my own Docker images with all the bits needed already installed.
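For the custom image idea, the build job would shrink to something like this, assuming an image with Emacs, sqlite3 and git-restore-mtime already installed has been built and pushed somewhere. The image path is a hypothetical placeholder, and building the image itself isn't shown.

```yaml
# Sketch: build job on a prebuilt image, so there is no apt-get step.
org-generation:
  image:
    # hypothetical placeholder for a self-built image with Emacs,
    # sqlite3 and git-restore-mtime baked in
    name: registry.gitlab.com/example/commonplace-ci:latest
  stage: build-org
  artifacts:
    paths:
      - _posts
  script:
    - git restore-mtime
    - emacs -batch -q -l publish.el -f commonplace/publish-gitlab
```

The saving would be whatever the apt-get update and install steps currently cost on each run, which the GitLab job timings should make visible.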

One nice thing is that having the jobs timed in GitLab gives me a bit of a grasp of where tweaks can be made.

5. TODO

6. Log

6.1. [2021-02-21 Sun]

6.1.1. Speed

Seems marginally quicker with the Debian-based image rather than Ubuntu for the rsync step, though only by about 10 seconds.

6.1.2. Backlinks issue

No error messages to note in the CI output. I see that locally I am using a slightly older version of org-roam - maybe it is actually just a breaking change in the new one?

No, still seems to work fine locally.

Ah, it seems links are stored in the roam db as absolute paths.

Might need to regenerate roam db on CI?

That seems to have fixed it.

6.2. [2021-02-20 Sat] First go at setting it up.

Why? I'm getting to the point where I'd like to have it auto-published after making changes.

6.3. set up .gitlab-ci.yml.

6.3.1. build stage

6.3.1.1. Issues
  • htmlize not installed. Need to find a way to only run the package installations on CI.
  • Opening output file: No such file or directory, /home/shared/commonplace/recentchanges.org. Referring to this directly somewhere?
    • resolved by making a configure function
      • this is nice actually, I can call this interactively as well
      • better than relying on it being globally configured in publish.el

6.3.2. publish stage

6.4. [2021-10-20 Wed]

The pipeline has become horribly slow lately. Not sure what changed, but now it is failing to publish at all because it is timing out on one of the steps.

It seems perhaps to be bailing on tracks.org which is a massive bullet list. Is that the problem? It didn't used to be. What changed?
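If it is the job-level time limit that's being hit, one stopgap while the underlying slowness gets investigated is GitLab's per-job `timeout` keyword. Only the relevant key of the existing org-generation job is shown; 2h is an arbitrary example value, and as I understand GitLab's docs, a job timeout can exceed the project setting but is still capped by the runner's own maximum.

```yaml
org-generation:
  # ...image, artifacts, before_script, script and stage as before
  # give this job more time than the project default (example value)
  timeout: 2h
```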

7. Resources


Peer Production License.