Karaoke-Version.com - Download script?

FredAnderson

Code like this would solve it if you want to play around with it.

var songUrl = "https://www.karaoke-version.com/custombackingtrack/neil-young/heart-of-gold.html";

var urlObj = new URL(songUrl);
var domain = urlObj.hostname;

console.log(domain); // Outputs: www.karaoke-version.com

DaveT

Cool mate, and thanks for all the help. I will def dig into it at some point once I get a quiet period lol

DaveT

yeah download destination is one I would def mess with

DaveT

So... after a bit of messing around, I did this on a Mac. Get the script above working first of all. Then I found that downloading all your invoices (PDFs) is a quick way to get all your artist and song names. Open them all with Word and copy and paste just the centre section of each PDF, where your song names and artists are listed, into Excel. I did about 700 songs in less than 10 minutes this way.

Once they are all in Excel you need to be creative with renaming certain letters and spaces within the names you have. Use find and replace to do it. Start by replacing " : " with "/". Punctuation marks such as ( ) ? can all be removed, ' needs to be replaced with a -, and & needs to be replaced with the word and. Finally, replace all " " spaces with a "-". That's the hard bit done.

Now you need to add https://www.karaoke-version.com/custombackingtrack/ to the front of all the names you've created and .html to the end (CONCAT is what you want to use in Excel). Then use the LOWER function on your results to make all capitals lower case. This will give you correctly formatted website addresses for your files. Now add this to the front of the cells: "npm run start "

Now all your lines will look like this

npm run start https://www.karaoke-version.com/custombackingtrack/solomon-burke/everybody-needs-somebody-to-love.html
npm run start https://www.karaoke-version.com/custombackingtrack/the-jackson-5/can-you-feel-it.html
npm run start https://www.karaoke-version.com/custombackingtrack/michael-jackson/one-day-in-your-life.html

etc etc etc
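
For anyone who would rather skip the spreadsheet, the same renaming rules can be sketched in Python. This is only a rough sketch of the find-and-replace steps described above (drop ( ) ?, turn ' into -, turn & into "and", turn spaces into -, lower-case everything). The function names are made up for illustration, and real song names may well need extra rules that are not covered here.

```python
BASE = "https://www.karaoke-version.com/custombackingtrack/"


def slugify(name):
    """Apply the find-and-replace rules from the post to one name."""
    for ch in "()?":
        name = name.replace(ch, "")          # punctuation that is simply removed
    name = name.replace("'", "-")            # apostrophe becomes a hyphen
    name = name.replace("&", "and")          # ampersand becomes the word "and"
    # Runs of spaces become a single hyphen, then everything is lower-cased
    return "-".join(name.split()).lower()


def song_command(artist, title):
    """Build one 'npm run start <url>' line for an artist/title pair."""
    return f"npm run start {BASE}{slugify(artist)}/{slugify(title)}.html"


print(song_command("The Jackson 5", "Can You Feel It"))
```

Anything the rules get wrong will show up as a failed download, so it is worth spot-checking a few of the generated addresses in a browser first.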

Now comes the fun part: you can merge cells together with " ; " in between each cell. Merge about 20 or 30 cells at a time and you will have one long command made up of those lines chained together.

Copy and paste this into Terminal, hit return and leave your Mac to run in the background or overnight, and it will download all of those multitrack files automatically, one after the other!

If you do some reading up on Excel and how to fill a formula down, so that once you have done one cell you can drag it across all the others, it will go quickly for you. Between working out how to do it and the formatting, I now have about 25 terminal commands to copy and paste into Terminal to get ALL of my multitracks.

You could probably do more than 20 or 30 at a time, but why tempt fate lol
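
The merging step can be sketched in Python too. This is just an illustration of joining the "npm run start" lines with " ; " in fixed-size batches; `chain_commands` and the batch size are hypothetical names and values, not part of the download script.

```python
def chain_commands(commands, batch_size=20):
    """Join commands with ' ; ' in groups of batch_size, one string per group."""
    batches = []
    for start in range(0, len(commands), batch_size):
        batches.append(" ; ".join(commands[start:start + batch_size]))
    return batches


# The three example lines from the post, batched two at a time for the demo
commands = [
    "npm run start https://www.karaoke-version.com/custombackingtrack/solomon-burke/everybody-needs-somebody-to-love.html",
    "npm run start https://www.karaoke-version.com/custombackingtrack/the-jackson-5/can-you-feel-it.html",
    "npm run start https://www.karaoke-version.com/custombackingtrack/michael-jackson/one-day-in-your-life.html",
]

for batch in chain_commands(commands, batch_size=2):
    print(batch)
    print()
```

Each printed batch can be pasted into Terminal as one command. With ";" the shell moves on to the next song even if the previous one failed, so one bad address does not stop the rest of the batch.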

Let's see how it goes

Dave

FredAnderson

DaveT interesting. Seems like a good feature to add to the script, to go to the "downloads" page and scrape all the links and send them to a file.

FredAnderson

DaveT OK, here's a script to scrape the site. It's Python and it uses beautifulsoup4, so you will need to install that if you don't already have it:
pip install beautifulsoup4 requests
Obviously, fill in your own username and password.

import requests
from bs4 import BeautifulSoup

LOGIN_URL = 'https://www.karaoke-version.com/my/login.html'
BASE_URL = 'https://www.karaoke-version.com/my/download.html?page='
TARGET_PREFIX = '/custombackingtrack/'

# Your login credentials
payload = {
    'frm_login': 'YOUR_USERNAME',
    'frm_password': 'YOUR_PASSWORD'
}

def get_links_from_page(session, url):
    response = session.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    
    # Only keep links that start with the desired prefix
    links = [a['href'] for a in soup.find_all('a', href=True) if a['href'].startswith(TARGET_PREFIX)]
    
    return links

def get_number_of_pages(session, url):
    response = session.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    
    # Get all elements with the class 'mr-1'
    pagination_elements = soup.select('.mr-1')
    
    return len(pagination_elements)

all_links = []

with requests.Session() as session:
    # Login
    post = session.post(LOGIN_URL, data=payload)

    # Check if login was successful
    if post.status_code == 200:
        total_pages = get_number_of_pages(session, BASE_URL + '1')  # Assuming you start at page 1

        for page_num in range(1, total_pages + 1):
            current_page_url = BASE_URL + str(page_num)
            all_links.extend(get_links_from_page(session, current_page_url))

print(all_links)
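
The script above only prints the list. To send the links to a file instead, something along these lines might do. It is a sketch only: `save_links` is a made-up helper, and it assumes the scraped hrefs are site-relative paths like /custombackingtrack/..., which is what the prefix filter above keeps.

```python
from urllib.parse import urljoin

DOMAIN_URL = "https://www.karaoke-version.com"


def save_links(links, path, domain_url=DOMAIN_URL):
    """Write one absolute URL per line (de-duplicated, sorted) and return them."""
    # urljoin turns "/custombackingtrack/..." into a full https:// address
    urls = sorted({urljoin(domain_url, link) for link in links})
    with open(path, "w", encoding="utf-8") as f:
        for url in urls:
            f.write(url + "\n")
    return urls
```

Calling save_links(all_links, "links.txt") at the end of the script would write the file next to it; the file name is just an example.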

nickstomp

FredAnderson

Hi, I cleaned up the script quite a bit; it now returns all the links correctly.
To run it you need to install the dependencies with pip:

pip install -U beautifulsoup4 requests

You also need Python 3.11 or newer (for tomllib) and a file called config.toml in the current directory, like this:

frm_login="mylogin"
frm_password="mypassword"

#!/usr/bin/env python3

from urllib.parse import urljoin
import logging
import os
import requests
import sys
import tomllib

from bs4 import BeautifulSoup


# configure logging
logger = logging.getLogger(os.path.basename(__file__))
logging.basicConfig(level=logging.INFO)

# global variables
CONFIG_FILE = os.path.join(os.getcwd(), "config.toml")
DOMAIN_URL = "https://www.karaoke-version.com"
LOGIN_URL = f"{DOMAIN_URL}/my/login.html"
TARGET_PREFIX = "/custombackingtrack/"


def get_download_page_url(base_url, page):
    download_url = f"{base_url}/my/download.html?page={page:d}"
    return download_url

def get_number_of_pages(session, domain_url):
    # Trick: requesting a page number far beyond the end returns the last page
    logger.debug("Opening download page %d", 999)
    url = get_download_page_url(domain_url, 999)

    response = session.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

    # Get all elements with the class 'mr-1'
    pagination_elements = soup.select("a.mr-1")

    last_page_number = 1

    # Check if there are pagination elements
    if pagination_elements:
        # Get the last element's href attribute
        last_page_link = pagination_elements[-1]["href"]
        # Extract the page number from the href
        last_page_number = int(last_page_link.split("page=")[-1])

    return last_page_number

def get_song_urls(session, domain_url, total_pages):
    all_song_urls = set() # Using a set to automatically handle duplicates

    for page_number in range(1, total_pages + 1):
        url = get_download_page_url(domain_url, page_number)
        response = session.get(url)
        soup = BeautifulSoup(response.content, "html.parser")

        # Find all <a> elements within <td class="my-downloaded-files__song">
        song_elements = soup.select("td.my-downloaded-files__song a")

        # Extract the href attributes from the <a> elements and construct full URLs
        song_urls = [urljoin(domain_url, element["href"]) for element in song_elements]
        all_song_urls.update(song_urls) # Update the set with new URLs

    # Sort the collected URLs
    sorted_song_urls = sorted(all_song_urls)

    return sorted_song_urls


def run_main():
    logger.info("Load config from %s", CONFIG_FILE)
    with open(CONFIG_FILE, "rb") as f:
        my_config = tomllib.load(f)

    with requests.Session() as session:
        # Login
        logger.info("Open login page %s", LOGIN_URL)
        post = session.post(LOGIN_URL, data=my_config)

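        # Check that the login request went through. A 200 only shows the page
        # loaded, not that the credentials were accepted.
        if post.status_code != 200:
            sys.exit(f"Login request failed with HTTP {post.status_code}")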

        logger.info("Login successful. Getting number of pages")

        total_pages = get_number_of_pages(session, DOMAIN_URL)
        logger.info("Total number of page is %d", total_pages)

        sorted_urls = get_song_urls(session, DOMAIN_URL, total_pages)

        logger.info("Sorted Song URLs:")
        for url in sorted_urls:
            print(url)


if __name__ == "__main__":
    logger.debug("run main")
    sys.exit(run_main())

Now it returns a sorted list of all the correct links.
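
One small note on the pagination step: the script reads the last page number by splitting the href on "page=", which works as long as page is the last parameter in the link. I don't know whether the site ever adds other parameters, but a slightly more defensive variant could look like this. `page_number_from_href` is a hypothetical helper, not part of the script above.

```python
from urllib.parse import urlparse, parse_qs


def page_number_from_href(href, default=1):
    """Return the integer 'page' query parameter of href, or default."""
    query = parse_qs(urlparse(href).query)
    values = query.get("page")
    if not values:
        return default          # no page parameter at all
    try:
        return int(values[0])
    except ValueError:
        return default          # page parameter present but not a number
```

It could replace the int(last_page_link.split("page=")[-1]) line without changing anything else.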

DaveT

Brilliant work mate. I haven't had a chance to look at it in full, as my terminal commands are all running in the background and I don't want to disturb them. Bit annoying that whenever a download completes you lose the cursor/mouse to it lol. Your script looks like a much better way to recover the real links than my "manual" job of renaming and adding to what I have, but as long as mine works for me it's a bonus.

Getting all the multitracks was a job I had planned for so long. Once you get a few years down the line with almost 1000 tracks bought, the thought of having to go in and manually download all of them becomes a fog in front of you. The script with the command line has been a godsend, and kicked me up the ass to go and get them all.

My concern is that KV kicks me off my account for doing it, but I can't see anywhere on their site that says I can't. They openly say that the custom tracks are there so that you can edit till your heart is happy, so I don't see how or why they would block anyone for grabbing the individual tracks, no matter the method. It may impact their site bandwidth, which is a concern, but if they said it would cost me an extra few quid to do it, I would gladly pay it.

My plan is to set up my mixer with allocated channels per track (bass, strings, piano, brass, drums etc.) and then have all the tracks on the correct channels for live shows. Some songs are weaker on drums or bass or strings, and I could "mix" them live if I had the multitracks, plus I get bored at gigs mixing a few mics with just a stereo track. It will give me so much more scope to make the tracks sound better and to tune the individual instruments to the venue, plus relieve the boredom lol. Getting excited to set this all up now lol

DaveT

Now I've just found this: run multiple terminals and you can multi-thread the downloads!!!! Multiple tracks downloading at the one time lol

DaveT

So, quick update: I found that opening about 10 terminal windows and doing batches of 50 in each window turned out to work best and gave me maximum speed. Just over 850 songs all done as multitracks in just over 24 hours, including fixing the ones that failed because my spreadsheet find and replace missed a few things in the renaming (I had about 20 failures all in, but once I had corrected the addresses they all downloaded with no issues).

I now have a folder of 79 gig with over 8000 files in it to sort out lol

Dave

bens

The interest here is really nice to see, but I would suggest caution when using this script to download huge batches of songs, especially in parallel. As stated in the README, this may well be against their terms of use, and I'd hate to see anyone's account get banned for using such automation. Downloading in huge batches makes it trivial for your account to be flagged as a bot, especially with multiple operations running in parallel.

To answer some of the questions in this thread:

The mouse grabbing / focus stealing behavior is not something you can get around. A single misclick can interrupt the script's defined behavior, so it is recommended to just leave it alone for a few minutes to finish.
It would be fantastic if KV provided a true API for building such downloaders in a much nicer way, but since they don't, we rely on manual parsing and poking at a Chromium web driver. This is clunky, error prone, and difficult to extend with more options.
Changing the download path, however, might be as easy as setting the PUPPETEER_DOWNLOAD_PATH environment variable to a folder of your choosing. I haven't tried this. I may add this to the script as a documented option.

Also great catch on the co.uk issue! I can fix that one.

DaveT

If you're looking for suggestions, Ben: would it be possible for the script to put the downloaded files into a folder named after the link? That way bulk downloads don't all land in the one folder, but in a folder per song. It's very easy to rename the folders in bulk using the Mac app NameChanger to remove the likes of the http:// from the name and leave just the artist and song. I use it anyway to remove all the _ from the file names themselves.

Great work on the script by the way. It saved me a very large load of ball ache work lol. Very much appreciated, and I will be using it to download all new files I buy in future (usually about 30 or 40 every couple of weeks). Downloading the individual tracks by hand was always going to be a headache, even for a few files, so I never did it. I now find myself in a situation where I would prefer the individual tracks (live multitrack mixing to make up for the weak bass or drums in some of the files, and some panning to give better depth) for some higher profile live shows coming up. The other advantage is that if anything should ever happen to the company, at least having everything downloaded helps to protect your investment.

bens

DaveT thanks. Please do heed my warning about not running this in parallel. I'm pushing up some changes to allow you to change the download path. See the README for instructions.

I also fixed the co.uk issue.

I'd prefer we not hijack Peter's forum for this discussion. I would suggest further discussion happen on the GitHub project: https://github.com/subdigital/puppeteer-karaoke-version/discussions/3

DaveT

Just an update on the above. KV have added a new "help FAQ" section in. Looks like they are listening to what's being said or done. They have now stated on their page that to download individual tracks, you can modify and do this as often as you like. I think they know people are using the script to download the tracks automatically.

https://www.karaoke-version.com/help/custombackingtrack_262.html

yblanco

Talking about Karaoke-Version, is it normal that no matter which instrument is the stem, even if it’s only the click track, the file size remains the same? I expect the file size with all instruments to be much larger than just the click track.

Thanks for clarifying this

peter

Not with fixed-rate (constant bitrate) MP3 encoding. Then the file size depends only on the length of the track.
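
As a rough worked example of why that is: at a constant bitrate, the audio data size is just bitrate times duration, whatever is playing. The 320 kbps and 3-minute figures below are assumptions picked for illustration, not anything confirmed about Karaoke-Version's files, and tags and headers are ignored.

```python
def cbr_size_bytes(bitrate_kbps, duration_seconds):
    """Approximate audio data size of a constant-bitrate file, in bytes."""
    # 1 kbps = 1000 bits per second, 8 bits per byte
    return bitrate_kbps * 1000 * duration_seconds // 8


# Hypothetical 3-minute stem at an assumed 320 kbps
size = cbr_size_bytes(320, 180)
print(f"{size / 1_000_000:.1f} MB")  # 7.2 MB for a click track or a full mix alike
```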

yblanco

FredAnderson Great video Fred. Thanks so much for this. And Ben of course for the wonderful script

G3no

For me, it only started downloading the first track (click) and then waited for the others. Then I got an error message saying:

Waiting for download...
file:///Users/mycomputersname/puppeteer-karaoke-version/node_modules/puppeteer-core/lib/esm/puppeteer/common/WaitTask.js:68
this.terminate(new TimeoutError(`Waiting failed: ${options.timeout}ms exceeded`));
^

TimeoutError: Waiting for selector Your download will begin in a moment failed: Waiting failed: 120000ms exceeded
at Timeout.<anonymous> (file:///Users/mycomputersname/puppeteer-karaoke-version/node_modules/puppeteer-core/lib/esm/puppeteer/common/WaitTask.js:68:32)
at listOnTimeout (node:internal/timers:573:17)
at process.processTimers (node:internal/timers:514:7)

Can anyone help me? I'm a perfect noob. 🙁

DaveT

I seem to be having some issues with the script just now. It isn't downloading all of the tracks from the song. The song I'm trying just now has 11 tracks in it; the script runs as if everything has completed fine, but only 6 tracks are in the download folder. Any ideas?

Checking the track names, there aren't any duplicate names that could be overwriting each other. If I solo each track individually and download manually, all the files download perfectly.

puppeteer-karaoke-version@1.0.0 start
node src/index.js https://www.karaoke-version.com/custombackingtrack/stevie-wonder/uptight-everything-s-alright.html

Starting for song page: https://www.karaoke-version.com/custombackingtrack/stevie-wonder/uptight-everything-s-alright.html
Opening the browser...
soloing track 1
Waiting for download...
closing modal...
soloing track 2
Waiting for download...
closing modal...
soloing track 3
Waiting for download...
closing modal...
soloing track 4
Waiting for download...
closing modal...
soloing track 5
Waiting for download...
closing modal...
soloing track 6
Waiting for download...
closing modal...
soloing track 7
Waiting for download...
closing modal...
soloing track 8
Waiting for download...
closing modal...
soloing track 9
Waiting for download...
closing modal...
soloing track 10
Waiting for download...
closing modal...
soloing track 11
Waiting for download...
closing modal...
Done!
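
Since the log reports every track as done, the count has to be checked in the folder itself. A rough sketch for doing that after a run is below. It assumes the stems arrive as .mp3 files and that the folder only holds that one song; the folder path and `check_downloads` name are hypothetical. Chromium keeps unfinished downloads as .crdownload files, so any left behind suggest the browser was closed before a download finished.

```python
from pathlib import Path


def check_downloads(folder, expected_tracks, extension=".mp3"):
    """Compare the files in folder against the number of tracks expected."""
    files = list(Path(folder).iterdir())
    finished = sorted(p.name for p in files if p.suffix.lower() == extension)
    # Leftover partial downloads from a Chromium-based browser
    partial = sorted(p.name for p in files if p.suffix == ".crdownload")
    return {
        "finished": finished,
        "partial": partial,
        "missing_count": max(expected_tracks - len(finished), 0),
    }
```

For the song above, check_downloads("/path/to/downloads", 11) would report 5 missing if only 6 files arrived.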

Klar

FredAnderson How do you run this?
