Guide: how to generate video fingerprints at scale on Windows


Learn how to use one script to fingerprint multiple video files and upload prepared fingerprints to WebScan, platform-scanning software

Content

Introduction

Step 1. Prerequisites

Step 2. Prepare your files & metadata for bulk upload

Step 3. Run the script

Step 4. Scan the platforms

Introduction

WebScan is software that enables instant detection of illegal video copies on video-sharing platforms. It is based on WebKyte’s proprietary digital video fingerprinting technology.

Video fingerprinting extracts the distinctive elements of a video into a compact, unique string called a fingerprint. The technology is used to identify and compare video files at scale. A fingerprint cannot be converted back into the original video, so sharing it does not expose your content.

To run copy detection for a video on WebScan, you first generate a digital fingerprint of the video and upload it to the WebScan data server. This guide covers every step of that process.

Once uploaded, your titles will be available for copy identification on WebScan.

If you have any questions about WebScan, please reach out to us at hello@webkyte.com

Step 1. Prerequisites

To generate fingerprints locally, the WebKyte team has prepared a Docker image, so Docker must be installed before you run the script in step 3.

Please install and set up Docker by following these instructions:

https://docs.docker.com/desktop/install/windows-install/

To run the script and the Docker image on Windows, you need PowerShell. If it is not installed yet, follow this guide:

https://learn.microsoft.com/en-us/powershell/scripting/install/installing-powershell-on-windows?view=powershell-7.4
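
As a quick sanity check before moving on, the sketch below reports which of the required command-line tools cannot be found on the PATH. It is only an illustration and not part of the WebKyte tooling: the function name missing_tools is hypothetical, and the executable names are assumptions (pwsh is the PowerShell 7 executable; Windows PowerShell 5.1 is installed as powershell).

```python
import shutil


def missing_tools(tools=("docker", "pwsh")):
    """Return the names from `tools` that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Docker and PowerShell were found on PATH.")
```

Finding the executables does not prove that Docker Desktop is running, so the first real run of the script remains the definitive test.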

Step 2. Prepare your files & metadata for bulk upload

The fingerprint upload process consists of two steps. The first step involves fingerprinting videos and storing the generated fingerprints on your device. The second step entails uploading the fingerprints to WebScan.

Before you can run the script, there are three things you need to prepare: a folder with video files, metadata, and a folder for fingerprints.

1. Folders for video files and fingerprints

Open PowerShell and follow these steps:


# create a wk directory
mkdir wk

# go to the wk directory
cd wk

# create two more directories
mkdir videos
mkdir fingerprints

You can keep the video files in their original folder. In this case, make sure to add the full path to the folder with files when running the script in step 3.

2. Video files

Please add all the videos you want to fingerprint to the videos folder. The script mounts this folder into the Docker container in read-only mode, so your original files are not modified or moved at any point in the process.

Supported formats
The uploader supports all the main video formats: MP4, MOV, MKV, WMV, AVI, and others.

Use clean files
Make sure there are no watermarks, tech specs, or other extraneous elements in the video files. Clean files produce the most accurate digital fingerprints.

When using the WebKyte script, you do not share video files with us. The entire process of fingerprint generation occurs solely on your side. Only the prepared fingerprints are sent to the WebScan server.

3. Metadata

To connect a fingerprint to a specific title in the WebScan interface, it is essential to supply metadata.

Please add a file named upload.csv containing metadata to the wk folder.

When preparing a CSV file, please use this format and ensure there are no typos:


video1.mp4,Algiers,1938,0
video2.mp4,Captain Kidd,1945,0
video3.avi,First of Us,2023,1,1,1
video4.avi,First of Us,2023,1,1,2

The 1st value is the video file name (for example, video3.avi)

The 2nd value is an official title (for example, First of Us)

The 3rd value is the year of release (for example, 2023)

The 4th value is a type (0 for a movie, 1 for a series)

The 5th value is a season number (only for series)

The 6th value is an episode number (only for series)

All values must be separated by commas, with no extra spaces.

Each title’s metadata should be on a separate row in the CSV file.
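
Because a typo in upload.csv only surfaces once the script is already running, it can be worth checking the file first. The sketch below is one possible check of the rules listed above, not an official WebKyte validator: the function name validate_rows is hypothetical, and the four-digit year check is an assumption that goes beyond what the format description states.

```python
import csv
import io

MOVIE, SERIES = "0", "1"


def validate_rows(csv_text):
    """Return a list of (line_number, message) problems found in upload.csv text."""
    problems = []
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not row:
            continue  # ignore blank lines
        if any(value != value.strip() for value in row):
            problems.append((line_no, "extra spaces around a value"))
        if len(row) < 4:
            problems.append((line_no, "expected at least 4 values"))
            continue
        year, kind = row[2], row[3]
        # Assumption: a release year is always written with four digits.
        if not (year.isdigit() and len(year) == 4):
            problems.append((line_no, "year must be 4 digits"))
        if kind == MOVIE:
            if len(row) != 4:
                problems.append((line_no, "a movie row needs exactly 4 values"))
        elif kind == SERIES:
            if len(row) != 6 or not (row[4].isdigit() and row[5].isdigit()):
                problems.append((line_no, "a series row needs a numeric season and episode"))
        else:
            problems.append((line_no, "type must be 0 or 1"))
    return problems


if __name__ == "__main__":
    with open("upload.csv", newline="", encoding="utf-8") as handle:
        for line_no, message in validate_rows(handle.read()):
            print(f"line {line_no}: {message}")
```

An empty result means every row matches the format described above. It does not confirm that the titles themselves are spelled correctly.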

Here is an example of metadata for 4 titles: link

Step 3. Run the script to upload multiple fingerprints

1. Download the script upload.ps1 to the wk folder and clear its read-only flag:


# Download the script
Invoke-WebRequest -Uri "http://bit.ly/WKuploadWindows" -OutFile "upload.ps1"

# Clear the read-only flag on the script
# (if PowerShell refuses to run it later, check your execution policy with Get-ExecutionPolicy)
Set-ItemProperty -Path .\upload.ps1 -Name IsReadOnly $false

Full script text:


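# Usage: ./upload.ps1 <TOKEN> <CSV file> <videos folder> <fingerprints folder>

# Load System.Web so that [System.Web.HttpUtility] is available (required on Windows PowerShell 5.1)
Add-Type -AssemblyName System.Web

# Read CSV file, specifying headers since the file does not include them
$headers = @('VideoFile', 'Title', 'Year', 'Type', 'Season', 'Episode')
$data = Import-Csv -Path $args[1] -Header $headers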

foreach ($row in $data) {
    Write-Host "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
    Write-Host "Processing file started: $($row.VideoFile)"
    Write-Host ""

    Write-Host "Video file: $($row.VideoFile)"
    Write-Host "Title: $($row.Title)"
    Write-Host "Year: $($row.Year)"
    Write-Host "Type: $($row.Type)"
    if ($row.Type -eq "1") {
        Write-Host "Season: $($row.Season)"
        Write-Host "Episode: $($row.Episode)"
    }
    Write-Host ""

    Write-Host "Fingerprinting started"

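    # Prepare the Docker command.
    # Resolve-Path turns both relative and full folder paths into full paths for the volume mounts.
    $videosDir = (Resolve-Path $args[2]).Path
    $fingerprintsDir = (Resolve-Path $args[3]).Path
    $dockerCommand = 'docker run --rm -v "' + "$($videosDir):/videos:ro" + '" -v "' + "$($fingerprintsDir):/fingerprints" + '" registry.clients.webkontrol.com/wk-fingerprint "/videos/' + "$($row.VideoFile)" + '" "/fingerprints/' + "$($row.VideoFile).fp" + '"'
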
    # Print the Docker command
    Write-Host "Docker Command: $dockerCommand"

    # Execute the Docker command
    Invoke-Expression $dockerCommand

    Write-Host "Fingerprinting finished"
    Write-Host ""

    if ($row.Type -eq "0") {
        # Movie: the upload id is built from the title and the year
        $formattedTitle = $row.Title -replace " +", "_"
        $encodedId = [System.Web.HttpUtility]::UrlEncode($formattedTitle) + "_y$($row.Year)"
        Write-Host "Uploading id: $encodedId"

        $formattedTitle = $row.Title -replace " +", "%2520"
        $encodedTitle = [System.Web.HttpUtility]::UrlEncode($formattedTitle)

        $url = "https://scan.webkontrol.com/api/upload/$($encodedId)?type=0&title=$($encodedTitle)&year=$($row.Year)"
        Write-Host "Uploading url: $url"

    } else {
        # Series: the upload id also includes the season and episode numbers
        $formattedTitle = $row.Title -replace " +", "_"
        $encodedId = [System.Web.HttpUtility]::UrlEncode($formattedTitle) + "_y$($row.Year)_s$($row.Season)_e$($row.Episode)"
        Write-Host "Uploading id: $encodedId"

        $formattedTitle = $row.Title -replace " +", "%2520"
        $encodedTitle = [System.Web.HttpUtility]::UrlEncode($formattedTitle)

        $url = "https://scan.webkontrol.com/api/upload/$($encodedId)?type=1&title=$($encodedTitle)&year=$($row.Year)&season=$($row.Season)&episode=$($row.Episode)"
        Write-Host "Uploading url: $url"
    }

    Write-Host "Uploading started"

    # Uploading the fingerprint
    Invoke-RestMethod -Uri $url -Method Post -Headers @{ Authorization = "Token $($args[0])" } -InFile "$($args[3])/$($row.VideoFile).fp" -ContentType 'application/octet-stream'

    Write-Host "Uploading finished"
    Write-Host ""

    Write-Host "Processing file finished: $($row.VideoFile)"
    Write-Host "------------------------------------------------------------"
    Write-Host ""
}
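
To make the naming scheme in the script easier to follow, here is a small sketch of how the upload id appears to be derived from a metadata row: runs of spaces in the title become underscores, the result is URL-encoded, and the year (plus season and episode for series) is appended. The function name upload_id is hypothetical, and Python's quote only approximates [System.Web.HttpUtility]::UrlEncode, so ids for titles containing punctuation may differ from what the script actually sends.

```python
import re
from urllib.parse import quote


def upload_id(title, year, kind, season="", episode=""):
    """Approximate the upload id that the script builds for one CSV row."""
    # Collapse each run of spaces into a single underscore, then URL-encode.
    slug = quote(re.sub(" +", "_", title), safe="")
    identifier = f"{slug}_y{year}"
    # Like the script, treat every type other than "0" as a series.
    if kind != "0":
        identifier += f"_s{season}_e{episode}"
    return identifier


if __name__ == "__main__":
    print(upload_id("Captain Kidd", "1945", "0"))
    print(upload_id("First of Us", "2023", "1", "1", "2"))
```

Seen this way, two rows with the same title and year, and the same season and episode for series, resolve to the same id.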

2. View the contents of the wk directory. You should see the following files and sub-directories:


wk
├── fingerprints
├── upload.csv
├── upload.ps1
└── videos
    ├── video1.mp4
    ├── video2.mp4
    ├── video3.avi
    └── video4.avi
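
Before running the script, it can help to confirm that every file named in upload.csv is actually present in the videos folder. The sketch below is a hypothetical pre-flight check, not part of the WebKyte script. It assumes the layout shown above and that it is run from inside the wk folder.

```python
import csv
from pathlib import Path


def missing_videos(csv_path, videos_dir):
    """Return the file names listed in the CSV that do not exist in videos_dir."""
    videos = Path(videos_dir)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # The first value of every non-empty row is the video file name.
        names = [row[0] for row in csv.reader(handle) if row]
    return [name for name in names if not (videos / name).is_file()]


if __name__ == "__main__":
    for name in missing_videos("upload.csv", "videos"):
        print(f"Listed in upload.csv but not found in videos: {name}")
```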

3. Run the script:


# Run the script
./upload.ps1 <TOKEN> upload.csv C:\full\path\to\videos C:\full\path\to\fingerprints

1. Make sure to enter the full path to both the folder with video files and the folder for fingerprints

2. Replace <TOKEN> with your personal token. You can find it in your inbox.

Here is an example when videos and fingerprints directories are in the same directory as upload.ps1:


./upload.ps1 f1e6c0c19b1b7556a49ab7aed02549486b3d89 upload.csv videos fingerprints

4. Wait for the script to finish processing files.

How to use the script for new titles

Once new titles are released, you can upload them to WebScan with the same script.

1. Add new video files to the videos folder

2. Update the upload.csv file so it contains only metadata for the new titles

3. Run the script again
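
If you keep a master list of all your titles, the second step can be automated. The sketch below writes a new upload.csv containing only the rows whose fingerprint file does not exist yet. It relies on the script naming each fingerprint after the video file with an added .fp extension. The master file name all_titles.csv and both function names are hypothetical, and a local .fp file is only assumed to mean that the title was uploaded successfully.

```python
import csv
from pathlib import Path


def rows_without_fingerprint(rows, fingerprints_dir):
    """Keep the rows whose <video file>.fp is missing from fingerprints_dir."""
    fp_dir = Path(fingerprints_dir)
    return [row for row in rows if row and not (fp_dir / f"{row[0]}.fp").is_file()]


def write_pending_csv(source_csv, fingerprints_dir, target_csv):
    """Copy the not-yet-fingerprinted rows to target_csv and return how many there are."""
    with open(source_csv, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
    pending = rows_without_fingerprint(rows, fingerprints_dir)
    with open(target_csv, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(pending)
    return len(pending)


if __name__ == "__main__":
    count = write_pending_csv("all_titles.csv", "fingerprints", "upload.csv")
    print(f"{count} new title(s) written to upload.csv")
```

If an earlier upload failed after the fingerprint was generated, delete the corresponding .fp file so that the title is picked up again.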

Step 4. Scan the platforms

Once the fingerprints are successfully uploaded to WebScan, you can run copy detection for your titles.

Before you start: make sure to whitelist your official channels.

Email us the links to your official channels on Dailymotion, Vimeo, and other supported platforms to keep them out of matching results.

1. Go to https://scan.webkyte.com/ and log in to your account.

Your login and password were sent to your inbox.

2. Click New Search

3. Select the titles you want to check for copies

4. Select the platforms and click Match it! to start scanning

5. Wait for the status to change to «Completed», then explore the matching results
