A downloader script is a file (recommended location: <SCRIPTS> directory) which contains the code of one or more URL generation and/or data parsing methods.
Each download item can specify a URL generation method, called by the downloader to find out the location of new source data for the item; and a data parsing method, called by the downloader to incorporate new source data into the item's data file. It can also specify one or more arguments, which can be accessed by the script method(s) through the downloader's ActiveX interface.
Unlike a macro package, a downloader script is only loaded when it's needed, and it's not required to use the same scripting language as its peers. In principle, every method specified in every download item can reside in a different script file and rely on a different scripting engine for execution. In practice, it's convenient to keep related URL generation and data parsing methods in the same file and to write them in the same scripting language.
Script files, methods, engines and arguments can be specified using either the single item editor or the multiple item editor. The script editor (used to create and modify script files) is opened from the single item editor.
Simple URL examples
This is the simplest possible URL method (in JScript):
function static_URL()
{
DLInput.addUrl("http://some-server.com/quotes/some_file.htm", false);
}
All it does is add a single, constant URL to the download item's list of data sources. For this simple task, a static URL would be a better solution. But what if you wanted to update this item's data file with information collected not from one, but from two sources? Using static URLs, you could create two separate download items, one for each URL, but apart from being messy, this would not allow you to parse the two files together. With a URL method, you can simply add a second file to the download item's list:
function static_URLs()
{
DLInput.addUrl("http://some-server.com/quotes/some_file.htm", false);
DLInput.addUrl("http://some-server.com/quotes/some_other_file.htm", false);
// Just add more here!
}
This way, both files are downloaded and can then be parsed together using an appropriate parsing method (see below).
Another good reason to use a URL method is to generate ticker-dependent URLs. Suppose that you want to download quote files for 100 different stocks, all from the same server. Chances are that all files have long but near-identical URLs, with the only difference being a ticker symbol. Rather than typing the same near-identical URL into all download items, you could type it once into a URL method and then pass the ticker as an argument:
function ticker_URL()
{
// args(1) = ticker symbol
var url = "http://some-server.com/quotes/blah<TICKER>blah.htm"; // URL template
url = url.replace("<TICKER>", DLInput.args(1)); // Substitute actual ticker symbol
DLInput.addUrl(url, false);
}
The real merit of this solution becomes apparent when the constant ("blah") part of the URL suddenly changes, and you only have to modify one line in your URL method instead of 100 download items.
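One detail matters when a URL template contains the placeholder more than once: in JScript, as in JavaScript generally, replace called with a plain string pattern substitutes only the first occurrence. The sketch below shows one way to substitute every occurrence. The helper name fillTemplate and the sample URL are illustrative assumptions, not part of the downloader's interface.

```javascript
// Hypothetical helper (NOT part of the downloader's interface):
// replaces EVERY occurrence of a placeholder in a URL template.
function fillTemplate(template, placeholder, value) {
    // split/join treats the placeholder as literal text, so characters
    // such as "<" and ">" need no regular-expression escaping.
    return template.split(placeholder).join(value);
}

// Made-up URL template in which the ticker appears twice
var template = "http://some-server.com/quotes/<TICKER>/history_<TICKER>.htm";

// String pattern: only the first placeholder is substituted
var firstOnly = template.replace("<TICKER>", "ABC");

// Helper: both placeholders are substituted
var all = fillTemplate(template, "<TICKER>", "ABC");
```

Inside a URL method, the result of fillTemplate would be passed to DLInput.addUrl in the same way as in ticker_URL.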
Date-dependent URLs are quite common. Suppose that every night, some server posts a file containing the day's quotes for some asset(s), and that the file's name is the day's date in ISO format (four-digit year, two-digit month number, two-digit day number). A static URL would have to be updated every day. With a JScript method, you can update the filename like this:
function date_dependent_URL()
{
var url = "http://some-server.com/quotes/<DATE>.htm"; // URL template
url = url.replace("<DATE>", Aux.y + Aux.mm + Aux.dd); // Substitute actual date
DLInput.addUrl(url, false);
}
This method has a drawback: it requires that you update your database every day. But if old files are also kept on the server, you can use a loop to get all files from a specified number of days in the past and up to the present:
function date_loop_URLs()
{
// args(1) = number of days back
var url;
for (Aux.dayOffset = -DLInput.args(1); Aux.dayOffset <= 0; Aux.dayOffset++)
{
if (("Sa" == Aux.weekDay) || ("Su" == Aux.weekDay)) continue; // Skip weekends
url = "http://some-server.com/quotes/<DATE>.htm"; // URL template
url = url.replace("<DATE>", Aux.y + Aux.mm + Aux.dd);
DLInput.addUrl(url, false);
}
}
Note how the number of days is passed as an argument, making it easy to change without having to edit the script (you can use the multiple item editor to set the number of days for all download items in one go).
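The date arithmetic of such a loop can be tried outside the downloader with the standard Date object. The following is only a sketch: the function names pad2 and weekdayDatesBack are hypothetical, and it is an assumption that the Aux properties produce the same strings.

```javascript
// Hypothetical stand-alone model of the date loop (does NOT use the Aux interface).

// Format a number as two digits, e.g. 5 -> "05"
function pad2(n) {
    return (n < 10 ? "0" : "") + n;
}

// Return ISO-style date strings (yyyymmdd) for every weekday from
// `daysBack` days before `today` up to and including `today`.
function weekdayDatesBack(today, daysBack) {
    var result = [];
    for (var offset = -daysBack; offset <= 0; offset++) {
        // The Date constructor normalizes out-of-range day numbers,
        // so crossing a month or year boundary needs no special handling.
        var d = new Date(today.getFullYear(), today.getMonth(), today.getDate() + offset);
        var weekDay = d.getDay(); // 0 = Sunday, 6 = Saturday
        if (weekDay === 0 || weekDay === 6) continue; // Skip weekends
        result.push("" + d.getFullYear() + pad2(d.getMonth() + 1) + pad2(d.getDate()));
    }
    return result;
}

// Monday 8 January 2024, going four days back (Thursday to Monday)
var dates = weekdayDatesBack(new Date(2024, 0, 8), 4);
```

Each string in dates could then be substituted into the URL template, one call to DLInput.addUrl per date.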
URLs can be "crawled" (or "spidered"). When a file has been retrieved from a URL marked for crawling, the URL method is called again and given the opportunity to extract new URLs from that file's contents. Any URLs returned by this call are inserted into the download list immediately after the parent URL. They are therefore visited next, resulting in a "depth first" traversal of the link tree. This traversal order minimizes the risk that time-limited dynamic content (generated by the server in response to a previous request) will expire before it's retrieved.
URL crawling is a recursive and therefore potentially endless process. It is the URL method's responsibility to stop adding new URLs at a finite and reasonable link level. Keep in mind that the number of URLs at the bottom level of a link tree grows exponentially with link level. For instance, if every page links to 10 other pages, the progression is 1, 10, 100, 1000, 10000, 100000...
In order to know if and how to generate new URLs, a URL method used for crawling needs to know at which level it is in the link tree. This vital piece of information is provided by the property DLInput.currentLevel in the downloader's ActiveX interface. A currentLevel of 0 means root level (first call to the URL method); 1 means first level under the root, and so on.
The link level of a retrieved file's URL can also be crucial for determining if and how the data in the file should be parsed. To find out, use DLInput.urlLevel.
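The traversal order described above can be modelled in a few lines of plain JavaScript. This is a simplified simulation under stated assumptions, not the downloader's implementation: the link table, the function name simulateCrawl and the maxLevel cut-off are all made up for illustration.

```javascript
// Hypothetical link tree: each page maps to the pages it links to.
var links = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"]
};

// Simulate a download list in which the children of a crawled URL are
// inserted immediately after their parent. Returns the visiting order.
function simulateCrawl(rootUrl, maxLevel) {
    var list = [{ url: rootUrl, level: 0 }];
    var visited = [];
    for (var i = 0; i < list.length; i++) {
        var item = list[i];
        visited.push(item.url);
        // Stop adding URLs at a finite level, as a URL method must do
        if (item.level >= maxLevel) continue;
        var children = links[item.url] || [];
        var entries = [];
        for (var j = 0; j < children.length; j++) {
            entries.push({ url: children[j], level: item.level + 1 });
        }
        // Insert the children right after the parent (position i + 1)
        list.splice.apply(list, [i + 1, 0].concat(entries));
    }
    return visited;
}

var order = simulateCrawl("root", 2);   // crawl two levels below the root
var shallow = simulateCrawl("root", 1); // crawl one level below the root
```

Because children are placed directly after their parent, page a and everything below it is visited before page b, which is the "depth first" order.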
For a simple example of URL crawling, consider the case of a database-driven web site which generates XLS files on request and returns download links to them, using randomly generated filenames. In order to download these files, you could use something along these lines:
function crawler_URL()
{
// args(1) = ticker symbol
if (0 == DLInput.currentLevel)
{
// Return URL to download page
var url = "http://some-server.com/get<TICKER>.htm"; // URL template
url = url.replace("<TICKER>", DLInput.args(1)); // Substitute actual ticker symbol
DLInput.addUrl(url, true); // Add the URL to list, MARKED FOR CRAWLING
} else
if (1 == DLInput.currentLevel)
{
// Extract (and return) URL to XLS file from download page
var data = DLInput.read(1); // Get the download page's content
var re = /<a href=(data\/dynamdata\/\w+\.xls)>/i; // Prepare the regular expression to match the XLS URL (note the escaped dot)
var a = re.exec(data); // Apply the regular expression
if (null == a) return; // If no match, exit
DLInput.addUrl("http://some-server.com/" + a[1], false); // Add extracted URL to list (no more crawling!)
}
}
In the parsing method, you would then skip any URL whose DLInput.urlLevel is not 1.
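The link extraction performed at level 1 can be checked outside the downloader by applying the regular expression to a sample page. In this sketch the function name extractXlsUrl and the sample HTML are assumptions made for illustration. It also shows why the dot before "xls" should be escaped: an unescaped dot matches any character.

```javascript
// Hypothetical stand-alone version of the level-1 extraction step.
// Returns the absolute URL of the XLS file, or null if no link is found.
function extractXlsUrl(html) {
    // "\." matches a literal dot; a bare "." would match any character
    var re = /<a href=(data\/dynamdata\/\w+\.xls)>/i;
    var a = re.exec(html);
    return (null == a) ? null : "http://some-server.com/" + a[1];
}

// Made-up download page containing a randomly named XLS file
var page = "<html><body><A HREF=data/dynamdata/x7Kq92.xls>Download</A></body></html>";
var found = extractXlsUrl(page);

// A link that only resembles an XLS filename is rejected
var missing = extractXlsUrl("<a href=data/dynamdata/x7Kq92_xls>");
```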
Parser examples
A parser method can do one of two things: it can extract quotes from retrieved files and return them to the downloader through its ActiveX interface (letting the downloader take care of saving them according to a target format string); or it can extract any kind of data from any source and write it directly to one or more targets.
The first situation is the most common one. It's handled by using the script language's pattern matching functions to extract quotes and the downloader's DLOutput.addQuote or DLOutput.addQuoteEx method to return the results. For a simple example, consider the functional equivalent of the format string
d-mmm-20yy50,o,h,l,c,v
A JScript implementation might look like this:
function csv_table_parser()
{
// Functional equivalent of format string d-mmm-20yy50,o,h,l,c,v
var data = DLInput.read(1); // Get the data retrieved by the downloader
// Prepare the regular expression to match quote records in the data
var re = /(\d\d?)\-(\w\w\w)\-(\d\d),([\d.]+),([\d.]+),([\d.]+),([\d.]+),(\d+)[\r\n]+/;
var a = null;
// Find and parse all quote records in the data
do
{
a = re.exec(data); // Match next quote record
if (null != a)
{
data = data.substring(RegExp.lastIndex, data.length); // Remove matched record from data
a[2] = Aux.getMonthNumber(a[2], "US Jan-Dec"); // Interpret month name using named month set
a[3] = Aux.getFullYear(a[3], 50, 20); // Complement two-digit year
// Return extracted quote record using DLOutput interface
DLOutput.addQuote(a[3], a[2], a[1], a[4], a[5], a[6], a[7], a[8]);
}
}
while (null != a);
}
This loops through the retrieved data (assumed to be a table of quote records in CSV format) and uses a regular expression to match each record. Note the use of the month set name "US Jan-Dec" in the call to the interface method Aux.getMonthNumber.
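The record matching can be tried on sample data without the downloader. The sketch below is not the downloader's code: monthNumber and fullYear are hypothetical stand-ins for the Aux helpers, and the rule that two-digit years below 50 belong to the 2000s is an assumption about what the format string means. It uses a global regular expression, which remembers its position between calls to exec, instead of cutting matched records off the data.

```javascript
// Hypothetical stand-in for a month-name lookup (three-letter English names)
var MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
              "jul", "aug", "sep", "oct", "nov", "dec"];
function monthNumber(name) {
    var lower = name.toLowerCase();
    for (var i = 0; i < MONTHS.length; i++) {
        if (MONTHS[i] === lower) return i + 1;
    }
    return 0; // Unknown month name
}

// Hypothetical stand-in for year completion: ASSUMES that two-digit
// years below the pivot are 20yy and all others are 19yy.
function fullYear(yy, pivot) {
    var n = parseInt(yy, 10);
    return (n < pivot) ? 2000 + n : 1900 + n;
}

// Extract all quote records of the form d-mmm-yy,o,h,l,c,v
function parseCsvQuotes(data) {
    // The "g" flag makes exec continue after the previous match
    var re = /(\d\d?)-(\w\w\w)-(\d\d),([\d.]+),([\d.]+),([\d.]+),([\d.]+),(\d+)[\r\n]+/g;
    var quotes = [];
    var a;
    while ((a = re.exec(data)) !== null) {
        quotes.push({
            year: fullYear(a[3], 50), month: monthNumber(a[2]), day: parseInt(a[1], 10),
            open: parseFloat(a[4]), high: parseFloat(a[5]), low: parseFloat(a[6]),
            close: parseFloat(a[7]), volume: parseInt(a[8], 10)
        });
    }
    return quotes;
}

// Made-up sample data
var sample = "3-Jan-05,10.5,11,10.25,10.75,12000\r\n4-Jan-05,10.75,11.5,10.5,11.25,8000\r\n";
var quotes = parseCsvQuotes(sample);
```

In a real parsing method, each extracted record would be handed to DLOutput.addQuote rather than collected in an array.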
For a more interesting situation, consider a set of web pages which are updated every night to reflect the latest NAVs of a bank's mutual funds. Each fund is reported on its own row in an HTML table. You know the "ticker" symbols of all funds, but not on which page they will turn up (they move around from day to day). In order to get the NAV for a given fund, you therefore grab all report pages with a URL method like static_URLs, then you loop through them looking for the table row containing the right symbol:
function web_pages_parser()
{
// args(1) = symbol
var data = ""; // Temporary storage for web page being parsed
var urlNumber = 0; // Used to loop through retrieved pages
// Prepare regular expression to match quote line containing symbol
var s = "<TD[^>]*><[^>]*>" + DLInput.args(1) + "</[^>]*></TD>[\\s]*" +
"<TD[^>]*><[^>]*>[A-Z]{3}</[^>]*></TD>[\\s]*" +
"<TD[^>]*><[^>]*>([\\d.']+)</[^>]*></TD>[\\s]*" +
"<TD[^>]*><[^>]*>(\\d\\d)\\.(\\d\\d)\\.(\\d\\d\\d\\d)</[^>]*></TD>"; // "\\." matches a literal dot in the date
var re = new RegExp(s, "");
// Loop through all retrieved web pages
var a = null;
for (urlNumber = 1; urlNumber <= DLInput.nUrls; urlNumber++)
{
data = DLInput.read(urlNumber); // Get page content
a = re.exec(data); // Look for quote line containing symbol
if (null == a) continue; // If no match, skip this URL
a[1] = a[1].replace(/\'/g, ""); // Remove thousands separator from quote
// Return date and quote using DLOutput interface
DLOutput.addQuote(a[4], a[3], a[2], "", "", "", a[1], "");
break; // If we got this far, we have the day's quote for this symbol - done!
}
}
By the way, if downloading all report pages for all items looks like a monumental waste of time and bandwidth, don't worry: the downloader is smart enough to compare URLs and retrieve data from each one only once.
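A remark on web_pages_parser: it builds its regular expression from the symbol argument. If a symbol can contain characters that have a special meaning in regular expressions (a dot, for instance), escaping it first avoids false matches. The sketch below illustrates this on a made-up table row. The names escapeRegExp and findNav and the sample HTML are assumptions, not part of the downloader's interface.

```javascript
// Hypothetical helper: escape characters that are special in regular expressions
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical stand-alone row matcher. Returns the NAV and date found in
// the row for `symbol`, or null if the page has no such row.
function findNav(html, symbol) {
    var open = "<TD[^>]*><[^>]*>";  // Cell start plus one inner tag
    var close = "</[^>]*></TD>";    // Inner closing tag plus cell end
    var s = open + escapeRegExp(symbol) + close + "\\s*" +
            open + "[A-Z]{3}" + close + "\\s*" +                 // Currency code
            open + "([\\d.']+)" + close + "\\s*" +               // NAV
            open + "(\\d\\d)\\.(\\d\\d)\\.(\\d{4})" + close;     // Date as dd.mm.yyyy
    var a = new RegExp(s).exec(html);
    if (null == a) return null;
    // Remove the thousands separator from the NAV
    return { nav: a[1].replace(/'/g, ""), day: a[2], month: a[3], year: a[4] };
}

// Made-up table rows for two funds with similar symbols
var rowDot = "<TR><TD class=\"sym\"><B>AB.C</B></TD>\n<TD><B>CHF</B></TD>\n" +
             "<TD align=\"right\"><B>1'234.56</B></TD>\n<TD><B>05.03.2004</B></TD></TR>";
var rowOther = rowDot.replace("AB.C", "ABXC");

var hit = findNav(rowDot, "AB.C");    // Matches the intended row
var miss = findNav(rowOther, "AB.C"); // null: the escaped dot does not match "X"
```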