Publishing org-roam via GitLab CI
I've set up a basic pipeline in GitLab CI to build and publish my org-roam wiki automatically. Every time I commit some changes to the git repo that I store it in, the pipeline runs my org-publish bizness and then rsyncs the output html to my server.
1. Why?
I've been publishing from my local machine up until now. It's OK, but it takes a long time to rebuild all of the pages. As such, I tend to just publish files individually as I create them. But sometimes I miss pages, and sometimes I forget to publish at all. I'm getting to the point where I'd like to have it auto-published after making changes.
Another approach could be to just make the build quicker… but I'm not sure how to do that just yet.
But generally, I do just like this kind of stuff, setting up continuous integration and things. Your mileage may vary - if you find this ops stuff a bit tedious, the benefits might not really be worth it.
2. Requirements
What do I want to happen?
- each time I commit to the git repo that I store my raw wiki files in, I want the static html to be rebuilt and published to the web
- this should include my various custom build things, like building backlinks, svg maps, etc
- I want to still be able to publish locally from time to time, for local testing
- I might still want to publish individual files to the web every now and again
3. How
To get a CI pipeline on GitLab you just need a .gitlab-ci.yml file in the root of the repo.
My .gitlab-ci.yml for the wiki has two stages in its pipeline - one to build the org files, and one to rsync them to my server.
stages:
  - build-org
  - publish
The second one, the deployment stage, could be swapped out to deploy them to GitLab Pages or something similar.
3.1. Building
org-generation:
  image:
    name: silex/emacs
  artifacts:
    paths:
      - _posts
  before_script:
    - apt-get update
    - apt-get --yes --force-yes install sqlite3
    - apt-get --yes --force-yes install git-restore-mtime
  script:
    - git restore-mtime
    - emacs -batch -q -l publish.el -f commonplace/publish-gitlab
  stage: build-org
I use the silex/emacs image for the org-generation job, which is just Ubuntu with Emacs installed. The job calls a function commonplace/publish-gitlab in my publish.el. This function configures org-publish-project-alist. I used to have this configuration happen just globally in publish.el, but now I need it parameterised based on where I'm calling it from (locally, or from CI), to pass in the source / output directories.
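A minimal sketch of how the local/CI split could be driven from a single shell entry point. It relies on GitLab setting CI=true in every job. commonplace/publish-local is a hypothetical name for a local counterpart function; only commonplace/publish-gitlab is taken from my publish.el.

```shell
# Pick the publish entry point depending on where we are running.
# GitLab CI sets CI=true in every job. commonplace/publish-local is a
# hypothetical name for the local counterpart, used for illustration.
publish_function() {
  if [ "${CI:-}" = "true" ]; then
    echo "commonplace/publish-gitlab"
  else
    echo "commonplace/publish-local"
  fi
}

# Run the batch publish with whichever function applies.
publish() {
  emacs -batch -q -l publish.el -f "$(publish_function)"
}
```

Calling publish from the repo root would then do the right thing in both places, without the .gitlab-ci.yml and a local script drifting apart.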
I started out using Alpine Linux images for the steps, because (I think?) they pull down quicker given they're so barebones. But I hit a couple of things that I wasn't sure how to do on Alpine, so I reverted to Debian/Ubuntu for now. Maybe I'll go back to Alpine at a later date to speed up the build a bit.
3.2. Deploying
I found setting up ssh capabilities a bit of a faff in GitLab CI. Maybe it's just the first time you do it, probably easier in future now I'm used to it.
rsync:
  image: debian
  before_script:
    - apt-get update
    - apt-get --yes --force-yes install rsync
    - 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client git -y )'
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan commonplace.doubleloop.net >> ~/.ssh/known_hosts
    - chmod 644 ~/.ssh/known_hosts
    - '[[ -f /.dockerenv ]] && echo -e "Host *\n\tStrictHostKeyChecking no\n\n" > ~/.ssh/config'
  script:
    - rsync -chavz _posts/ 37.218.246.201:/var/www/commonplace/
  stage: publish
4. A few problems I hit
4.1. ssh host key checking
I still have a problem with host key checking that I need to sort out better.
- '[[ -f /.dockerenv ]] && echo -e "Host *\n\tStrictHostKeyChecking no\n\n" > ~/.ssh/config'
This solution is a hack.
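A possibly tidier alternative is to pin the server's host key in a CI/CD variable and write it into known_hosts, rather than turning StrictHostKeyChecking off. The sketch below assumes a variable named SSH_KNOWN_HOSTS (a name chosen here, not something from the pipeline above) holding the output of ssh-keyscan run once from a trusted machine. It may also be worth checking (this is a guess, not a confirmed diagnosis) whether the failure comes from the pipeline scanning commonplace.doubleloop.net but rsyncing to the IP address: ssh looks up known_hosts by the name it was given, so the pinned line would need to match whichever one rsync uses.

```shell
# Write a pinned host key (stored once in a CI variable) to known_hosts,
# instead of switching StrictHostKeyChecking off.
# SSH_KNOWN_HOSTS is an assumed variable name; its value would be the
# output of `ssh-keyscan <host>` captured from a trusted machine.
install_known_hosts() {
  mkdir -p "$HOME/.ssh"
  chmod 700 "$HOME/.ssh"
  printf '%s\n' "$SSH_KNOWN_HOSTS" >> "$HOME/.ssh/known_hosts"
  chmod 644 "$HOME/.ssh/known_hosts"
}
```

In the rsync job this would replace both the ssh-keyscan line and the StrictHostKeyChecking line in before_script.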
4.2. Recent Changes sitemap
I use org-publish's :sitemap-sort-files set to anti-chronologically to produce a Recent Changes file. This uses the org files' mtimes to figure out when they last changed. When you clone a git repo, you lose these.
So I added in some steps to call git-restore-mtime to set the files' mtimes based on their last git commits.
By default, the GitLab runner only pulls in the last 50 commits of your repo. Which means restore-mtime doesn't see all of the logs. I changed this with:
variables:
  GIT_DEPTH: 0
in the yaml. I don't think it's ideal, as doing a shallow clone is a good idea for performance. To be revisited.
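For reference, here is roughly what restoring mtimes from git involves, and why it needs the full history: each file's last commit has to be visible in the log. This is a simplified stand-in, not the real git-restore-mtime (which walks the log once rather than once per file). It assumes GNU touch for the `-d @seconds` form and does not handle filenames containing newlines.

```shell
# Simplified stand-in for git-restore-mtime: for every tracked file,
# set its mtime to the committer time of the last commit touching it.
# In a shallow clone, files last changed before the cutoff appear to
# have been changed in the oldest fetched commit, so dates come out wrong.
restore_mtimes() {
  repo_dir=$1
  git -C "$repo_dir" -c core.quotePath=false ls-files |
    while IFS= read -r file; do
      ts=$(git -C "$repo_dir" log -1 --format=%ct -- "$file")
      if [ -n "$ts" ]; then
        touch -d "@$ts" "$repo_dir/$file"
      fi
    done
}
```

Running one git log per file would be slow with thousands of org files, which is a good reason to stick with the packaged tool in the actual pipeline.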
4.3. Backlinks
I have a function that runs as a preprocessor for org-export, generating backlinks and inserting them into the pages. It was working locally, but not remotely.
In order for backlinks to get generated correctly again, I needed to force org-roam's cache to be rebuilt on the CI server.
I added:
(org-roam-db-build-cache t)
to commonplace/publish-gitlab. The argument forces a clearout of the cache before rebuilding.
I also changed things to ensure I am using the absolute path of the project on the CI server. I'm not sure if this is necessary or not, but I thought maybe org-roam's DB query was expecting absolute paths, so I chucked it in.
(commonplace/configure (file-truename ".") "posts")
Unfortunately the cache rebuild slows the pipeline down a fair bit.
4.4. Speed
Build speed continues to be a massive issue. org-export/org-publish itself is really slow. Add in the org-roam cache rebuild, and there's a lot going on.
Around when I first introduced the pipeline, it took about 6 minutes - way too slow. Currently, as of Feb 2022 with over 3000 org files, it's taking around 30 minutes!!! This is horrendous. Speeding up org-export and org-publish is what really needs looking at.
Maybe by switching back to Alpine again, I might trim some time off. And using caching between stages. And if I really cared, I could set up my own Docker images that have all the bits needed already added.
One nice thing is that having it timed in GitLab means I can get a bit of a grasp of where tweaks can be made.
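GitLab only reports the duration of a whole job, so for finer detail inside a job a small wrapper could time individual commands. A sketch, with timed being a made-up helper name rather than anything GitLab provides:

```shell
# Run a command, print how long it took (whole seconds) to stderr,
# and pass the command's exit status through unchanged.
timed() {
  label=$1
  shift
  start=$(date +%s)
  rc=0
  "$@" || rc=$?
  end=$(date +%s)
  echo "[timing] $label: $((end - start))s" >&2
  return "$rc"
}
```

For example, `timed "restore mtimes" git restore-mtime` followed by `timed "publish" emacs -batch -q -l publish.el -f commonplace/publish-gitlab` would show how the build job's time splits between the two steps.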
5. TODO
- [X] get Recent Changes timestamps working again
  - seems to work OK here? https://www.pank.eu/blog/blog-setup.html
  - possibly just use git-restore-mtime https://github.com/MestreLion/git-tools/
    - but does alpine linux have it?
- [X] get backlinks working again
- [ ] build the map svg in the pipeline
  - this would require my org-roam tweak though.
- [ ] find a way to only run the package installations on CI
- [ ] see if can get to bottom of host key verification issue
- [ ] boost performance
  - [ ] switch back to alpine? probably quicker
  - [ ] build own Docker image?
  - [ ] speed up org-publish?
6. Log
6.1.
6.1.1. Speed
Seems marginally quicker with the Debian based image rather than Ubuntu for the rsync step. Only by about 10 seconds though.
6.1.2. Backlinks issue
No error messages to note in the CI output. I see that locally I am using a slightly older version of org-roam - maybe it is actually just a breaking change in the new one?
No, still seems to work fine locally.
Ah, it seems links are stored in the roam db as absolute paths.
Might need to regenerate roam db on CI?
That seems to have fixed it.
6.2. First go at setting it up.
Why? I'm getting to the point where I'd like to have it auto-published after making changes.
6.3. set up .gitlab-ci.yml.
6.3.1. build stage
6.3.1.1. Issues
- htmlize not installed. need to find a way to only run the package installations on CI.
- possibly like here: https://gitlab.com/pages/org-mode
- Opening output file: No such file or directory, /home/shared/commonplace/recentchanges.org. Referring to this directly somewhere?
- resolved by making a configure function
- this is nice actually, I can call this interactively as well
- better than relying on it being globally configured in publish.el
6.3.2. publish stage
- Setup a simple CI/CD using Gitlab Pipeline | by Muhammad Ndako | Medium
- Using SSH keys with GitLab CI/CD | GitLab
- set up an ssh key just for gitlab (make sure no passphrase)
- Issue with Host Key verification failed.
- https://gitlab.com/gitlab-org/gitlab-runner/-/issues/3679
- ci tip: remove the build stage temporarily as that takes ages, just focus on the deploy stage while resolving that
- some kind of problem with recent changes page when publishing via gitlab ci, they all have today's date
6.4.
The pipeline has become horribly slow lately. Not sure what changed, but now it is failing to publish at all because it is timing out on one of the steps.
It seems perhaps to be bailing on tracks.org, which is a massive bullet list. Is that the problem? It didn't use to be. What changed?
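One rough way to find other candidates like this is to list the largest org files and try publishing those individually. File size is only a proxy for export time, so treat this as a first guess rather than a diagnosis. largest_org_files is a made-up helper name.

```shell
# List the N largest .org files under a directory (default 10),
# biggest first, as "bytes path" lines.
largest_org_files() {
  dir=$1
  count=${2:-10}
  find "$dir" -type f -name '*.org' -exec wc -c {} + |
    grep -v ' total$' |
    sort -rn |
    head -n "$count"
}
```

Running `largest_org_files . 5` from the root of the wiki repo would show whether tracks.org is an outlier or one of several similarly large files.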