
Automatic password removal in Paperless-ngx

As a user of Paperless-ngx, I know how important it is to have my files organized and accessible. However, sometimes I run into a problem when trying to import password-protected PDF files into the system. That’s why I decided to create a pre-consumption script for Paperless-ngx that would automatically remove the passwords using a dictionary file. In this tutorial, I’ll show you how I did it.

My environment

I run Paperless-ngx in a Docker container on a Synology NAS. I expect the following steps to carry over easily to other Docker installations. On a Paperless-ngx installation without Docker, paths and configuration may differ.

Step 1: Create a Dictionary File

The first step is to create a dictionary file. This file contains the list of passwords that the script will try, one after another, when it opens a protected PDF file. To create a dictionary file:

  1. Open a text editor such as Notepad or TextEdit.
  2. Enter each password on a new line.
  3. Save the file as <paperless-ngx root>/scripts/passwords.txt.
123456
123456789
qwerty
password
12345
qwerty123
1q2w3e
12345678
111111
1234567890
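
Reading such a file robustly can be sketched as follows. The helper name `load_passwords` is hypothetical and not part of Paperless-ngx or pikepdf. It strips line endings, skips blank lines, and drops duplicates while keeping the original order, so a file saved with Windows line endings in Notepad still yields clean passwords.

```python
from pathlib import Path


def load_passwords(path):
    """Return the candidate passwords in file order, without blanks or duplicates."""
    seen = set()
    passwords = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        candidate = line.strip()
        if candidate and candidate not in seen:
            seen.add(candidate)
            passwords.append(candidate)
    return passwords
```

Note that stripping also removes leading and trailing spaces, so a password that really starts or ends with a space would not survive this helper.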

Step 2: Write the Pre-Consumption Script

Next, you’ll need to write the pre-consumption script. This script tries each password from the dictionary file and, once one works, saves the PDF file without password protection. For this tutorial, I’ll be using Python with the pikepdf library.

  1. Open a text editor such as Notepad or TextEdit.
  2. Copy the script below. Check that the file path in line 8 matches the script folder path in your Paperless-ngx Docker config.
  3. Save the file as <paperless-ngx root>/scripts/removepassword.py and make it executable (for example with chmod +x).
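#!/usr/bin/env python3
import os
import sys

import pikepdf

# Path inside the container; must match the scripts volume in your Docker config.
PASSWORD_FILE = "/usr/src/paperless/scripts/passwords.txt"


def unlock_pdf(file_path):
    print("reading passwords")
    with open(PASSWORD_FILE, "r", encoding="utf-8") as f:
        # Strip line endings and skip blank lines.
        passwords = [line.strip() for line in f if line.strip()]
    if not passwords:
        print("Empty password file")
        return
    for password in passwords:
        try:
            with pikepdf.open(file_path, password=password, allow_overwriting_input=True) as pdf:
                if pdf.is_encrypted:
                    # Saving without encryption settings writes an unprotected file.
                    pdf.save(file_path)
                    print("password removed")
                else:
                    print("file is not encrypted, nothing to do")
                return
        except pikepdf.PasswordError:
            # Wrong password; the password itself is not written to the log.
            continue
        except pikepdf.PdfError:
            # Not a readable PDF (for example an image); leave the file untouched.
            print("not a readable PDF, skipping")
            return
    print("no password from the dictionary matched")


if __name__ == "__main__":
    file_path = os.environ.get("DOCUMENT_WORKING_PATH")
    if file_path is None:
        sys.exit("DOCUMENT_WORKING_PATH is not set")
    unlock_pdf(file_path)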
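
Before wiring the script into Paperless-ngx, it can help to run it by hand against a copy of a protected PDF. The sketch below only imitates the one thing the script relies on, the DOCUMENT_WORKING_PATH environment variable; it is not how Paperless-ngx itself launches the script. The helper name `run_pre_consume` is hypothetical, and the script is started with the current Python interpreter, so pikepdf has to be installed for that interpreter.

```python
import os
import subprocess
import sys


def run_pre_consume(script_path, document_path):
    """Run a script with DOCUMENT_WORKING_PATH set and return (exit code, stdout)."""
    # Copy the current environment and add the variable the script reads.
    env = dict(os.environ, DOCUMENT_WORKING_PATH=str(document_path))
    result = subprocess.run(
        [sys.executable, str(script_path)],
        env=env,
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout
```

Calling `run_pre_consume("removepassword.py", "locked-copy.pdf")` (both file names are placeholders) returns the exit code and the printed messages. Use a copy of the document, because the script overwrites the file it is given.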

Step 3: Configure the pre-consumption script to be run

Finally, we need to configure Paperless-ngx to run the Python script whenever a new file is processed.

  1. Open the Docker configuration file of Paperless-ngx: <paperless-ngx root>/{docker-config}.yml
  2. Make sure that the script folder is mounted into the Docker container under services.webserver.volumes.
  3. Under services.webserver.environment, set PAPERLESS_PRE_CONSUME_SCRIPT: /usr/src/paperless/scripts/removepassword.py
  4. Restart the Paperless-ngx Docker container.
version: "3.6"
services:
  broker:
    image: redis
    container_name: Paperless-NGX-REDIS
    restart: always
    volumes:
      - /volume1/docker/paperlessngx/redis:/data

  db:
    image: postgres
    container_name: Paperless-NGX-DB
    restart: always
    volumes:
      - /volume1/docker/paperlessngx/db:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    container_name: Paperless-NGX
    restart: always
    depends_on:
      - db
      - broker
    ports:
      - 8777:8000
    volumes:
      - /volume1/docker/paperlessngx/data:/usr/src/paperless/data
      - /volume1/docker/paperlessngx/media:/usr/src/paperless/media
      - /volume1/docker/paperlessngx/export:/usr/src/paperless/export
      - /volume1/scans:/usr/src/paperless/consume
      - /volume1/docker/paperlessngx/scripts:/usr/src/paperless/scripts
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      USERMAP_UID: 1027
      USERMAP_GID: 100
      PAPERLESS_TIME_ZONE: Europe/Berlin
      PAPERLESS_ADMIN_USER: robert
      PAPERLESS_ADMIN_PASSWORD: XXXXXXXXXXXX
      PAPERLESS_OCR_LANGUAGE: deu+eng
      PAPERLESS_PRE_CONSUME_SCRIPT: /usr/src/paperless/scripts/removepassword.py      
      PAPERLESS_POST_CONSUME_SCRIPT: /usr/src/paperless/scripts/post-consumption.sh
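
If the script does not seem to run after the restart, a quick check of the mounted files can narrow things down. The following is a minimal sketch, meant to be run with Python inside the container using the container paths from the configuration above. The function name `check_hook_files` is hypothetical, and the checks cover only common setup mistakes (missing file, missing executable bit, missing shebang line), not everything Paperless-ngx requires.

```python
import os


def check_hook_files(script_path, password_path):
    """Return a list of problems found; an empty list means the basic checks passed."""
    problems = []
    if not os.path.isfile(script_path):
        problems.append(f"script not found: {script_path}")
    else:
        if not os.access(script_path, os.X_OK):
            problems.append(f"script is not executable: {script_path}")
        with open(script_path, "rb") as f:
            # The first line should be a shebang such as "#!/usr/bin/env python3".
            if not f.readline().startswith(b"#!"):
                problems.append(f"script has no shebang line: {script_path}")
    if not os.path.isfile(password_path):
        problems.append(f"password file not found: {password_path}")
    return problems
```

For the setup in this post, the call would be `check_hook_files("/usr/src/paperless/scripts/removepassword.py", "/usr/src/paperless/scripts/passwords.txt")`.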
This post is licensed under CC BY 4.0 by the author.
