As a user of Paperless-ngx, I know how important it is to have my files organized and accessible. However, sometimes I run into a problem when trying to import password-protected PDF files into the system. That’s why I decided to create a pre-consumption script for Paperless-ngx that would automatically remove the passwords using a dictionary file. In this tutorial, I’ll show you how I did it.
My environment
I run Paperless-ngx in a Docker container on a Synology NAS. The following steps should carry over to other Docker installations with little change. On a Paperless-ngx installation without Docker, paths and configuration will likely differ.
Step 1: Create a Dictionary File
The first step is to create a dictionary file. It contains every password the script should try when unlocking a PDF file, one per line. To create it:
- Open a text editor such as Notepad or TextEdit.
- Enter each password on a new line.
- Save the file as <paperless-ngx root>/scripts/passwords.txt.
123456
123456789
qwerty
password
12345
qwerty123
1q2w3e
12345678
111111
1234567890
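Hand-edited password lists tend to collect blank lines and duplicates over time. The sketch below shows one way to read such a file into a clean list. The helper name `load_passwords` is hypothetical and not part of Paperless-ngx or pikepdf. The `utf-8-sig` encoding is chosen on the assumption that the file may have been saved with a byte order mark, which some editors add.

```python
from pathlib import Path


def load_passwords(path):
    """Return the passwords in the file, in order, without blanks or duplicates."""
    seen = set()
    passwords = []
    # "utf-8-sig" reads plain UTF-8 and also drops a leading byte order mark.
    for line in Path(path).read_text(encoding="utf-8-sig").splitlines():
        candidate = line.strip()
        if candidate and candidate not in seen:
            seen.add(candidate)
            passwords.append(candidate)
    return passwords
```

Note that stripping whitespace means a password that really begins or ends with a space cannot be expressed in this file format.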
Step 2: Write the Pre-Consumption Script
Next, you’ll need to write the pre-consumption script. This script will use the dictionary file to automatically remove the passwords from the PDF files. For this tutorial, I’ll be using Python.
- Open a text editor such as Notepad or TextEdit.
- Copy the script below. Check that the path to passwords.txt in the open() call matches the scripts folder mapped in your Paperless-ngx Docker configuration.
- Save the file as <paperless-ngx root>/scripts/removepassword.py.
#!/usr/bin/env python
import pikepdf
import os

def unlock_pdf(file_path):
    password = None
    print("reading passwords")
    with open("/usr/src/paperless/scripts/passwords.txt", "r") as f:
        passwords = f.readlines()
    for p in passwords:
        password = p.strip()
        try:
            # allow_overwriting_input lets pikepdf save back to the path it opened.
            with pikepdf.open(file_path, password=password, allow_overwriting_input=True) as pdf:
                print("password is working:" + password)
                # Saving without encryption settings writes an unprotected PDF.
                pdf.save(file_path)
                break
        except pikepdf.PasswordError:
            print("password isn't working:" + password)
            continue
    if password is None:
        print("Empty password file")

# Paperless-ngx passes the working copy of the document in this environment variable.
file_path = os.environ.get('DOCUMENT_WORKING_PATH')
unlock_pdf(file_path)
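One gap in the script is that it stays silent when none of the passwords fit. The sketch below separates the dictionary loop from pikepdf so that this case becomes an explicit return value. Both `first_working_password` and the `try_open` callback are hypothetical names used for illustration only.

```python
def first_working_password(candidates, try_open):
    """Return the first candidate for which try_open(candidate) is true, else None."""
    for candidate in candidates:
        if try_open(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Stand-in for a real PDF: only "qwerty" unlocks it.
    found = first_working_password(["123456", "qwerty"], lambda p: p == "qwerty")
    print("no password matched" if found is None else "unlocked")
```

In the real script, `try_open` would wrap the `pikepdf.open(...)` call and return False when `pikepdf.PasswordError` is raised. Reporting only whether a match was found also keeps the passwords themselves out of the log.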
Step 3: Configure the pre-consumption script to be run
Finally, configure Paperless-ngx to run the Python script whenever a new file is processed.
- Open the Docker configuration file of Paperless-ngx: <paperless-ngx root>/{docker-config}.yml
- Make sure that the scripts folder is mounted into the Docker container under services.webserver.volumes.
- Under services.webserver.environment, set PAPERLESS_PRE_CONSUME_SCRIPT: /usr/src/paperless/scripts/removepassword.py
- Restart the Paperless-ngx Docker container.
version: "3.6"
services:
  broker:
    image: redis
    container_name: Paperless-NGX-REDIS
    restart: always
    volumes:
      - /volume1/docker/paperlessngx/redis:/data
  db:
    image: postgres
    container_name: Paperless-NGX-DB
    restart: always
    volumes:
      - /volume1/docker/paperlessngx/db:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless
  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    container_name: Paperless-NGX
    restart: always
    depends_on:
      - db
      - broker
    ports:
      - 8777:8000
    volumes:
      - /volume1/docker/paperlessngx/data:/usr/src/paperless/data
      - /volume1/docker/paperlessngx/media:/usr/src/paperless/media
      - /volume1/docker/paperlessngx/export:/usr/src/paperless/export
      - /volume1/scans:/usr/src/paperless/consume
      - /volume1/docker/paperlessngx/scripts:/usr/src/paperless/scripts
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      USERMAP_UID: 1027
      USERMAP_GID: 100
      PAPERLESS_TIME_ZONE: Europe/Berlin
      PAPERLESS_ADMIN_USER: robert
      PAPERLESS_ADMIN_PASSWORD: XXXXXXXXXXXX
      PAPERLESS_OCR_LANGUAGE: deu+eng
      PAPERLESS_PRE_CONSUME_SCRIPT: /usr/src/paperless/scripts/removepassword.py
      PAPERLESS_POST_CONSUME_SCRIPT: /usr/src/paperless/scripts/post-consumption.sh
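The value of PAPERLESS_PRE_CONSUME_SCRIPT is a path inside the container, so it only resolves if one of the webserver volumes maps it to a folder on the host. As a rough sanity check, the sketch below translates a container path back to its host path using the same "host:container" strings as the configuration above. The helper name `host_path_for` is hypothetical, and the sketch assumes simple two-part volume entries without extra options such as `:ro`.

```python
import posixpath


def host_path_for(container_path, volumes):
    """Map a container path to its host path, or return None if nothing mounts it."""
    for entry in volumes:
        host_dir, container_dir = entry.split(":", 1)
        container_dir = container_dir.rstrip("/")
        if container_path == container_dir or container_path.startswith(container_dir + "/"):
            relative = posixpath.relpath(container_path, container_dir)
            return posixpath.normpath(posixpath.join(host_dir, relative))
    return None


volumes = [
    "/volume1/scans:/usr/src/paperless/consume",
    "/volume1/docker/paperlessngx/scripts:/usr/src/paperless/scripts",
]
print(host_path_for("/usr/src/paperless/scripts/removepassword.py", volumes))
```

If the function returns None for the script path, the scripts folder is not mounted and Paperless-ngx will not be able to find removepassword.py.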