multiobject_download (function)

multiobject_download(urlList, downloadDirectory, log, timeStamp=True, timeout=180, concurrentDownloads=10, resetFilename=False, credentials=False, longTime=False, indexFilenames=False)

Download multiple URL documents and place them in the specified download directory (or directories).

Key Arguments

  • urlList – list of document urls

  • downloadDirectory – directory(ies) to download the documents to - can be one directory path or a list of paths the same length as urlList

  • log – the logger

  • timeStamp – append a timestamp to the filename derived from the URL (ensures unique filenames)

  • longTime – use a longer timestamp when appending to the filename (greater uniqueness)

  • timeout – the timeout limit for downloads (secs)

  • concurrentDownloads – the number of concurrent downloads allowed at any one time

  • resetFilename – a string to reset all filenames to

  • credentials – basic HTTP credentials { 'username' : "…", 'password' : "…" }

  • indexFilenames – prepend each filename with its index (the position of the URL in urlList)
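
The filename options above (timeStamp, longTime, resetFilename, indexFilenames) interact, so a small sketch may help. This is not the library's implementation: `sketch_local_filename` is a hypothetical helper, and the "untitled" fallback, the `.html` default extension and the timestamp layout are inferred only from the example output in the Usage section. The three-digit index prefix and the shorter timestamp format are pure assumptions.

```python
import posixpath
from datetime import datetime
from urllib.parse import urlparse


def sketch_local_filename(url, index, now, timeStamp=True, longTime=False,
                          resetFilename=False, indexFilenames=False):
    """Hypothetical helper: guess a local filename for one URL."""
    # Use the last component of the URL path as the base name
    stem, ext = posixpath.splitext(posixpath.basename(urlparse(url).path))
    if not stem:
        stem = "untitled"  # inferred from the example output for URLs ending in "/"
    if not ext:
        ext = ".html"  # inferred from the example output
    if resetFilename:
        stem = resetFilename  # every file shares this base name
    if timeStamp:
        # longTime adds microseconds (assumed layout) for greater uniqueness
        fmt = "%Y%m%dt%H%M%S%f" if longTime else "%Y%m%dt%H%M%S"
        stem = "%s_%s" % (stem, now.strftime(fmt))
    if indexFilenames:
        stem = "%03d_%s" % (index, stem)  # assumed prefix format
    return stem + ext


now = datetime(2016, 3, 16, 16, 6, 50, 610780)
print(sketch_local_filename("https://en.wikipedia.org/wiki/Docstring", 1, now, longTime=True))
```

With resetFilename set and timeStamp disabled, every URL would map to the same name in this sketch, which is presumably why the timestamp and index options exist.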

Return

  • list of local paths to the downloaded (timestamped) documents, in the same order as the input urlList
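
As a rough illustration of two behaviours described above (one directory or one directory per URL, and results returned in input order under a concurrency cap), here is a minimal sketch using the standard library. It is an assumption-laden stand-in, not how `multiobject_download` is implemented: `sketch_download_all` and `fake_fetch` are hypothetical names, no network access happens, and timeouts are left out.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url, directory):
    """Hypothetical stand-in for a real download: returns the would-be local path."""
    return directory + "/" + url.rsplit("/", 1)[-1]


def sketch_download_all(urlList, downloadDirectory, fetch, concurrentDownloads=10):
    """Hypothetical helper: run fetch(url, directory) for every URL, a few at a time."""
    # Accept one directory for all URLs, or one directory per URL
    if isinstance(downloadDirectory, str):
        directories = [downloadDirectory] * len(urlList)
    else:
        directories = list(downloadDirectory)
        if len(directories) != len(urlList):
            raise ValueError("downloadDirectory list must be the same length as urlList")
    # max_workers caps how many fetches run at any one time
    with ThreadPoolExecutor(max_workers=concurrentDownloads) as pool:
        # map() yields results in input order, regardless of completion order
        return list(pool.map(fetch, urlList, directories))


print(sketch_download_all(["http://a/x", "http://b/y"], ["/d1", "/d2"], fake_fetch, 2))
```

Because the results line up with the inputs, `zip(urlList, results)` pairs each URL with its local path.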

Usage

# download the pages linked from the main list page
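import logging
from fundamentals.download import multiobject_download

# "log" is the logger passed to the function; a standard-library logger is assumed here
log = logging.getLogger(__name__)
localUrls = multiobject_download(
    urlList=["https://www.python.org/dev/peps/pep-0257/", "https://en.wikipedia.org/wiki/Docstring"],
    downloadDirectory="/tmp",
    log=log,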
    timeStamp=True,
    timeout=180,
    concurrentDownloads=2,
    resetFilename=False,
    credentials=False,  # { 'username' : "...", "password" : "..." }
    longTime=True
)

print(localUrls)
# OUT: ['/tmp/untitled_20160316t160650610780.html', '/tmp/Docstring_20160316t160650611136.html']
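
The credentials argument is described only as "basic http credentials". For readers unfamiliar with the term, the sketch below shows what HTTP Basic authentication generically looks like on the wire: the username and password are joined with a colon, base64-encoded, and sent in an Authorization header. How the library itself transmits the credentials is not documented here; `sketch_basic_auth_header` is a hypothetical helper for illustration only.

```python
import base64


def sketch_basic_auth_header(credentials):
    """Hypothetical helper: build an HTTP Basic Authorization header."""
    if not credentials:
        return {}  # credentials=False means no authentication
    token = "%s:%s" % (credentials["username"], credentials["password"])
    encoded = base64.b64encode(token.encode("utf-8")).decode("ascii")
    return {"Authorization": "Basic " + encoded}


print(sketch_basic_auth_header({"username": "Aladdin", "password": "open sesame"}))
```

Base64 is an encoding, not encryption, so basic credentials are only protected when the URLs use HTTPS.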