JGET Java Utility
JGet is a handy utility, written in Java, for downloading a series of files from the Web. It is not a typical web spider. Instead, it uses a "url pattern" that defines a sequence of urls, and that sequence is then downloaded. Here is a quick example of how it works:

A JGet url pattern might be, for instance, "http://www.sample.com/lib/pic{ENU march april june}{NSEQ 01 31}.gif". When entered in JGet, this pattern expands (the functions in curly braces are evaluated) into the following sequence of urls:
http://www.sample.com/lib/picmarch01.gif
http://www.sample.com/lib/picmarch02.gif
http://www.sample.com/lib/picmarch03.gif
...
http://www.sample.com/lib/picmarch31.gif
http://www.sample.com/lib/picapril01.gif
...
http://www.sample.com/lib/picjune01.gif
...
http://www.sample.com/lib/picjune31.gif

JGet then downloads these urls to the configured directory. The main advantage of this approach is that files are fetched directly by their names/urls rather than through links on other pages. This makes it easy to get files that are not linked from any page or that are referred to by dynamically generated urls, and it lets you be very specific about which files are downloaded.

The main features of JGet include:

  • Easy and flexible url patterns for quickly defining the url sequence to be downloaded
  • Either direct download of the expanded urls, or parsing of each expanded url for the urls it contains, downloading only those that match a regular expression filter
  • Extensible set of url pattern functions - just create a class implementing one interface
  • Graphical User Interface for easy entry of parameters, with a clear overview of download progress, or
  • Command Line Interface
  • Ready for localization
  • Multithreaded download - the number of parallel download threads can be set to optimize download performance
  • Open Source & Free
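The multithreaded download, together with the error limit of the -e option described under Command line, can be sketched with a fixed thread pool. This is an illustration under assumptions, not JGet's code: ParallelDownloader and its fetch function are hypothetical stand-ins (fetch downloads one url and reports success), and "errors tolerated" is read here as "skip the remaining urls once more than maxErrors downloads have failed".

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Sketch of a multithreaded download with an error limit (not JGet's code).
// 'fetch' is a hypothetical stand-in that downloads one url and returns
// true on success; real HTTP and file handling are omitted.
class ParallelDownloader {

    // Downloads all urls with 'threads' workers; once more than 'maxErrors'
    // downloads have failed, the remaining queued urls are skipped.
    // Returns the number of successful downloads.
    static int downloadAll(List<String> urls, int threads, int maxErrors,
                           Predicate<String> fetch) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger ok = new AtomicInteger();
        AtomicInteger errors = new AtomicInteger();
        for (String url : urls) {
            pool.submit(() -> {
                if (errors.get() > maxErrors) {
                    return; // error budget exhausted: skip this url
                }
                boolean success;
                try {
                    success = fetch.test(url);
                } catch (RuntimeException e) {
                    success = false; // treat unexpected failures as errors
                }
                if (success) {
                    ok.incrementAndGet();
                } else {
                    errors.incrementAndGet();
                }
            });
        }
        pool.shutdown(); // accept no new tasks; let queued ones finish
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return ok.get();
    }

    public static void main(String[] args) {
        List<String> urls = List.of("a.gif", "missing.gif", "b.gif");
        // fake fetcher: every url except "missing.gif" "downloads" fine
        int done = downloadAll(urls, 2, 5, url -> !url.equals("missing.gif"));
        System.out.println(done + " files downloaded"); // prints: 2 files downloaded
    }
}
```

With more than one thread the limit is approximate, since downloads already in progress still finish after the budget runs out.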

GUI
Here is a picture of the main application screen:

Url pattern - enter the url pattern here.
Extract urls - instead of saving the files at the expanded urls directly, JGet fetches the HTML text from each url, parses it for other urls, compares them with the given regular expression and downloads only the matching urls.
with this pattern (regex) - the regular expression used when "Extract urls" is selected.
Download/start - starts the download.
File/Settings ... - sets the program parameters, especially the directory where downloaded files are saved (this should be set before downloading). The settings are saved to the jget.properties file on program exit.
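The "Extract urls" step might look roughly like the following sketch. It assumes links appear as href or src attributes, resolves them against the page url with java.net.URI, and keeps every url in which the regular expression finds a match (JGet may instead require the whole url to match). Fetching the HTML is left out so the example runs offline; UrlExtractor is a hypothetical name, and a real HTML parser would be more robust than an attribute regex.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the "Extract urls" step (not JGet's code): find href/src values
// in HTML, resolve them against the page url, keep those matching a filter.
class UrlExtractor {

    // href="..." or src='...', case-insensitive; group 2 is the attribute value
    private static final Pattern LINK =
            Pattern.compile("(?i)\\b(href|src)\\s*=\\s*[\"']([^\"']+)[\"']");

    static Set<String> extract(String pageUrl, String html, String filterRegex) {
        Pattern filter = Pattern.compile(filterRegex);
        URI base = URI.create(pageUrl);
        Set<String> urls = new LinkedHashSet<>(); // keeps page order, drops duplicates
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            String absolute;
            try {
                absolute = base.resolve(m.group(2).trim()).toString();
            } catch (IllegalArgumentException e) {
                continue; // value is not a valid URI: ignore it
            }
            if (filter.matcher(absolute).find()) {
                urls.add(absolute);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<a href=\"pic1.gif\">1</a> <img src='/img/pic2.gif'>"
                + " <a href=\"page.html\">next</a>";
        // prints: [http://www.sample.com/lib/pic1.gif, http://www.sample.com/img/pic2.gif]
        System.out.println(extract("http://www.sample.com/lib/index.html", html, "\\.gif$"));
    }
}
```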

Url patterns
JGet accepts a so-called url pattern: a url with inserted "functions" that generate sequences of strings (like 1, 2, 3, ... or A, B, C, ...). The generated strings replace the function text in the url. A url pattern can contain as many functions as needed. Only one function changes at a time, starting from the rightmost, so all combinations are generated. See the example above for how this works in practice.
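The "rightmost function changes first" rule works like an odometer. The sketch below assumes the pattern has already been split into segments, each a list of strings (a literal part is a one-element list, a function part is the list of values it generates), and enumerates every combination in that order. PatternExpander is a hypothetical name, not JGet's class.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: expand a pre-split url pattern into all url combinations,
// rightmost segment changing fastest (not JGet's actual code).
class PatternExpander {

    static List<String> expand(List<List<String>> segments) {
        List<String> result = new ArrayList<>();
        for (List<String> s : segments) {
            if (s.isEmpty()) {
                return result; // an empty sequence yields no urls at all
            }
        }
        int n = segments.size();
        int[] index = new int[n]; // current position within each segment
        while (true) {
            StringBuilder url = new StringBuilder();
            for (int i = 0; i < n; i++) {
                url.append(segments.get(i).get(index[i]));
            }
            result.add(url.toString());
            // advance the odometer, starting from the rightmost segment
            int pos = n - 1;
            while (pos >= 0) {
                index[pos]++;
                if (index[pos] < segments.get(pos).size()) {
                    break;
                }
                index[pos] = 0; // wrap around and carry to the left
                pos--;
            }
            if (pos < 0) {
                return result; // every segment wrapped: all combinations done
            }
        }
    }

    public static void main(String[] args) {
        List<List<String>> segments = List.of(
                List.of("http://www.sample.com/lib/pic"),
                List.of("march", "april", "june"),
                List.of("01", "02"),
                List.of(".gif"));
        // picmarch01.gif, picmarch02.gif, picapril01.gif, ... picjune02.gif
        expand(segments).forEach(System.out::println);
    }
}
```

Because this sketch builds the whole list in memory, it only suits short sequences; a pattern relying on NSEQ's default end of Integer.MAX_VALUE would need a lazy, iterator-based version.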

Functions are enclosed in curly braces { }, which contain one or more tokens separated by spaces. The first token is the function name; the others are its parameters. Currently there are only two functions, but others can easily be added (see below):
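Splitting a pattern into literal text and {NAME param ...} calls, as described above, could be done with a simple scan like the one below. This is a sketch under stated assumptions: nested braces and escaped braces are not supported, and PatternParser and its Segment class are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: split a url pattern into literal text and {NAME p1 p2 ...} calls.
// Hypothetical helper, not JGet's parser; nested braces are not supported.
class PatternParser {

    static final class Segment {
        final String literal;      // plain text, or null for a function call
        final String function;     // function name such as "NSEQ", or null
        final List<String> params; // function parameters (possibly empty)

        Segment(String literal, String function, List<String> params) {
            this.literal = literal;
            this.function = function;
            this.params = params;
        }

        @Override
        public String toString() {
            return literal != null ? "'" + literal + "'" : function + params;
        }
    }

    static List<Segment> parse(String pattern) {
        List<Segment> segments = new ArrayList<>();
        int pos = 0;
        while (pos < pattern.length()) {
            int open = pattern.indexOf('{', pos);
            if (open < 0) { // no more functions: the rest is literal text
                segments.add(new Segment(pattern.substring(pos), null, List.of()));
                break;
            }
            if (open > pos) { // literal text before the function
                segments.add(new Segment(pattern.substring(pos, open), null, List.of()));
            }
            int close = pattern.indexOf('}', open);
            if (close < 0) {
                throw new IllegalArgumentException("Unclosed '{' at index " + open);
            }
            // tokens are separated by whitespace; the first one is the function name
            String[] tokens = pattern.substring(open + 1, close).trim().split("\\s+");
            if (tokens[0].isEmpty()) {
                throw new IllegalArgumentException("Empty function at index " + open);
            }
            segments.add(new Segment(null, tokens[0],
                    Arrays.asList(tokens).subList(1, tokens.length)));
            pos = close + 1;
        }
        return segments;
    }

    public static void main(String[] args) {
        // prints: ['http://www.sample.com/lib/pic', ENU[march, april, june], NSEQ[01, 31], '.gif']
        System.out.println(parse("http://www.sample.com/lib/pic{ENU march april june}{NSEQ 01 31}.gif"));
    }
}
```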

{NSEQ x y z} - numeric sequence: x is the first number (default 1), y is the last number (default Integer.MAX_VALUE) and z is the step (default 1). Each resulting string is padded with leading zeros to the length of the first parameter, so, for instance, if the first parameter is 001, the sequence is 001, 002, 003, ...
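A minimal sketch of the NSEQ rules, assuming non-negative numbers and a positive step. The limit argument is an addition of this sketch, not an NSEQ parameter: it keeps the default end of Integer.MAX_VALUE from producing billions of values. NumericSequence is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch of {NSEQ first last step} (not JGet's generator class).
// Assumes non-negative numbers and a positive step; 'limit' is an extra
// safeguard of this sketch because the default end is Integer.MAX_VALUE.
class NumericSequence {

    static List<String> nseq(String[] params, int limit) {
        String first = params.length > 0 ? params[0] : "1";
        long start = Long.parseLong(first);
        long end = params.length > 1 ? Long.parseLong(params[1]) : Integer.MAX_VALUE;
        long step = params.length > 2 ? Long.parseLong(params[2]) : 1;
        if (step <= 0) {
            throw new IllegalArgumentException("this sketch needs a positive step");
        }
        // pad to the written length of the first parameter: "001" -> "%03d"
        String format = "%0" + first.length() + "d";

        List<String> values = new ArrayList<>();
        for (long v = start; v <= end && values.size() < limit; v += step) {
            values.add(String.format(Locale.ROOT, format, v));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(nseq(new String[] {"001"}, 3));            // [001, 002, 003]
        System.out.println(nseq(new String[] {"1", "10", "3"}, 100)); // [1, 4, 7, 10]
    }
}
```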

{ENU token1 token2 ... tokenN} - enumeration: each parameter is one value of the sequence.

How do you add new functions? Just implement the interface ivan.jget.SeqGenerator. The implementing class has to be in the package ivan.jget.generators. The name of the new class is also the name of the function, and any parameters are passed as a string array to the setParams method. See the source of the existing functions for guidance; it is fairly straightforward.
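The real ivan.jget.SeqGenerator interface is not reproduced on this page; only its setParams(String[]) method is mentioned. The sketch below therefore declares a local stand-in interface with the same name and an assumed Iterator-style contract, then implements a hypothetical ASEQ function that generates letter sequences such as A, B, C. Check the actual interface in the JGet source before writing a real generator.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Stand-in for ivan.jget.SeqGenerator. The page only documents setParams;
// extending Iterator<String> is an ASSUMPTION of this sketch.
interface SeqGenerator extends Iterator<String> {
    void setParams(String[] params);
}

// Hypothetical {ASEQ from to} function: {ASEQ A D} -> A, B, C, D.
// In JGet the class would belong to package ivan.jget.generators
// (the package line is omitted so the sketch compiles on its own).
class ASEQ implements SeqGenerator {
    private char current = 'A'; // next letter to return
    private char last = 'Z';    // last letter, inclusive

    @Override
    public void setParams(String[] params) {
        if (params.length > 0) current = params[0].charAt(0);
        if (params.length > 1) last = params[1].charAt(0);
    }

    @Override
    public boolean hasNext() {
        return current <= last;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return String.valueOf(current++);
    }

    public static void main(String[] args) {
        SeqGenerator gen = new ASEQ();
        gen.setParams(new String[] {"A", "D"});
        StringBuilder out = new StringBuilder();
        while (gen.hasNext()) {
            out.append(gen.next()).append(' ');
        }
        System.out.println(out.toString().trim()); // prints: A B C D
    }
}
```

Naming the class ASEQ follows the rule that the class name is the function name, which suggests JGet looks generator classes up by name at run time.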

Command line

jget [-d directory] [-e n] [-x extractPattern] URLPattern

-d directory - directory where downloaded files are saved
-e n - number of errors tolerated before the download is stopped
-x extractPattern - from the expanded urls, extract other urls matching this extractPattern (regular expression) and download these extracted urls
URLPattern - the url pattern

If started with command line parameters, the program works in command line mode; otherwise it starts the GUI.
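The synopsis and the GUI/command line switch can be illustrated with a small argument parser. This is a sketch only: CommandLine and its Options class are hypothetical, unset options are simply left null (JGet's real defaults, such as the directory stored in jget.properties, are not reproduced), and the GUI start is represented by a message.

```java
// Sketch of parsing the documented synopsis (not JGet's code):
//   jget [-d directory] [-e n] [-x extractPattern] URLPattern
class CommandLine {

    static final class Options {
        String directory;      // -d, null if not given
        Integer maxErrors;     // -e, null if not given
        String extractPattern; // -x, null if not given (direct download)
        String urlPattern;     // required last argument
    }

    static Options parse(String[] args) {
        Options o = new Options();
        int i = 0;
        // options come as "-flag value" pairs before the url pattern
        while (i < args.length && args[i].startsWith("-")) {
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + args[i]);
            }
            String value = args[i + 1];
            switch (args[i]) {
                case "-d": o.directory = value; break;
                case "-e": o.maxErrors = Integer.parseInt(value); break;
                case "-x": o.extractPattern = value; break;
                default: throw new IllegalArgumentException("Unknown option " + args[i]);
            }
            i += 2;
        }
        if (i != args.length - 1) { // exactly one url pattern must remain
            throw new IllegalArgumentException(
                    "Usage: jget [-d directory] [-e n] [-x extractPattern] URLPattern");
        }
        o.urlPattern = args[i];
        return o;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            // no parameters: JGet would start its GUI instead
            System.out.println("No arguments given; the GUI would start here.");
            return;
        }
        Options o = parse(args);
        System.out.println("dir=" + o.directory + " errors=" + o.maxErrors
                + " extract=" + o.extractPattern + " pattern=" + o.urlPattern);
    }
}
```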

Downloading JGET

JGET is available under the GPL license. You can download the binary here and the complete source (Eclipse project) here.
To run the binary, use java -jar jget.jar. The GUI will start; do not forget to set the basic settings, especially the download directory, which should already exist.
Regarding the source: I experimented a little with AspectJ, but all aspects are development-time aspects, so AspectJ is not needed to build a fully functional product.


Last site update on 30/08/2013