SQLite Forum

Better download file naming

(1) By nolasd on 2021-06-26 20:34:51 [link] [source]

Hello,

There are a couple of Sqlite packages in the Npackd repository: https://www.npackd.org/p?q=sqlite&sort=stars&category0=&repository=

The new versions are detected by downloading the contents of https://www.sqlite.org/download.html and searching via a regular expression like this one

32-bit DLL \(x86\) for SQLite version ([\d.]+).

... and creating a version using a template like this one:

https://www.sqlite.org/2021/sqlite-dll-win32-x86-${v0}${v1}0${v2}00.zip

Unfortunately the naming of the files like sqlite-amalgamation-3360000.zip does not allow a clear version detection. Is it 3.36 or 33.6? The usage of the current year in the URL does not help either.

Would it be possible to change the naming scheme? I guess other package managers also have this problem.

(2) By Larry Brasfield (larrybr) on 2021-06-26 21:06:47 in reply to 1 [link] [source]

Consulting Build Product Names, I see no possible ambiguity. The file naming is a fixed-field format, as documented there, and readily decoded by a simple routine written in virtually any modern programming language.

Can you encode more than one version using that format and get the same filename result? I do not see how to do that.
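For illustration, here is a minimal Python sketch of such a decoding routine. It assumes the fixed-field layout 3XXYYZZ mentioned later in this thread (one digit for the major version, then two digits each for the minor, patch, and branch fields). The function name decode_version is hypothetical.

```python
import re

def decode_version(filename: str) -> tuple:
    """Decode the 7-digit 3XXYYZZ field of an SQLite download filename."""
    # Assumed layout: 1 digit major, then 2 digits each for minor, patch, branch.
    m = re.search(r"-(\d)(\d{2})(\d{2})(\d{2})\.(?:zip|tar\.gz)$", filename)
    if m is None:
        raise ValueError(f"no 7-digit version field in {filename!r}")
    return tuple(int(g) for g in m.groups())

print(decode_version("sqlite-amalgamation-3360000.zip"))  # (3, 36, 0, 0)
```

Because every field has a fixed width, 3360000 can only be read as 3.36.0 and never as 33.6.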

(3) By Stephan Beal (stephan) on 2021-06-26 21:24:50 in reply to 2 [link] [source]

Can you encode more than one version using that format and get the same filename result? I do not see how to do that.

The OP is mistaken about that but not:

The usage of the current year in the URL does not help either.

Having to know the year of a release is irritating every January.

(4) By Larry Brasfield (larrybr) on 2021-06-26 23:27:25 in reply to 3 [link] [source]

Yeah, in my haste to correct a mistake on the internet, I overlooked that related valid issue.

Richard, in apparent response to the OP's expressed difficulty, has checked in a revision to the download page source which embeds an HTML comment. That comment's first line reads: "Download product data for scripts to read". Its subsequent lines comprise a CSV table whose column headers are: PRODUCT,FILENAME,VERSION,SIZE-IN-BYTES,SHA3-HASH . This will be documented in the Build Product Names section, either as-is or possibly with further improvements if worthwhile.

With that aid, script writers should have little trouble getting the version, filename, and so on for the downloadable objects.
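A rough Python sketch of how a script might read that table follows. The sample page text is made up: its size and hash values are placeholders, and the assumptions that each data row starts with the literal word PRODUCT and that the FILENAME column holds a path relative to https://sqlite.org/ are inferred from the regular expressions posted in this thread. The function name read_product_table is hypothetical.

```python
import re

# Made-up sample in the shape described above; size and hash are placeholders.
SAMPLE_PAGE = """<html><body>
<!-- Download product data for scripts to read
PRODUCT,FILENAME,VERSION,SIZE-IN-BYTES,SHA3-HASH
PRODUCT,2021/sqlite-dll-win32-x86-3360000.zip,3.36.0,0,placeholder-hash
PRODUCT,2021/sqlite-amalgamation-3360000.zip,3.36.0,0,placeholder-hash
-->
</body></html>"""

def read_product_table(page: str) -> list:
    """Return one dict per data row, keyed by the column headers."""
    comment = re.search(
        r"<!--\s*Download product data for scripts to read\s*\n(.*?)-->",
        page,
        re.DOTALL,
    )
    if comment is None:
        return []
    lines = [ln.strip() for ln in comment.group(1).splitlines() if ln.strip()]
    header = lines[0].split(",")
    # Columns are looked up by header name, so a reordered table still parses.
    return [dict(zip(header, ln.split(","))) for ln in lines[1:]]

for row in read_product_table(SAMPLE_PAGE):
    print(row["VERSION"], "https://sqlite.org/" + row["FILENAME"])
```

Reading the columns by name rather than by position keeps such a script working if the field order changes.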

(5) By doug (doug9forester) on 2021-06-27 02:17:03 in reply to 4 [link] [source]

I think the date of the version should be in the CSV table, too.

(6) By nolasd on 2021-06-28 18:18:05 in reply to 4 [link] [source]

Thank you for the changes. This unfortunately does not help me as I use a very primitive regular expression where the first search group should contain the version number. Maybe this helped some other package managers.

Is there any chance we can get something like

https://sqlite.org/downloads/sqlite-amalgamation-3.36.zip

?

(7) By Larry Brasfield (larrybr) on 2021-06-28 18:23:28 in reply to 6 [link] [source]

Will you please describe what you are trying to accomplish, absent any preconceived idea as to how to do it? I have found that guessing use cases sometimes misses the mark.

(8) By nolasd on 2021-06-28 18:30:40 in reply to 7 [link] [source]

Like every other package manager, I try to keep the packages up-to-date. This is done by downloading https://www.sqlite.org/download.html and searching via a regular expression like this one

32-bit DLL \(x86\) for SQLite version ([\d.]+).

... and creating a version using a template like this one:

https://www.sqlite.org/2021/sqlite-dll-win32-x86-${v0}${v1}0${v2}00.zip

This fails too often: every year because the year number changes and each time the version number changes from 1 to 2 digits and back.
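The digit-count failure can be reproduced with a short Python sketch. It assumes that ${v0}, ${v1} and ${v2} stand for the dot-separated components of the captured version and that the filename field follows the 3XXYYZZ layout mentioned later in this thread. Both function names are hypothetical.

```python
def naive_template(version: str) -> str:
    """Mimic the template ${v0}${v1}0${v2}00 by plain concatenation."""
    v = version.split(".")
    return f"{v[0]}{v[1]}0{v[2]}00"

def fixed_field(version: str) -> str:
    """Zero-pad minor, patch and branch to two digits each (3XXYYZZ)."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (4 - len(parts))  # missing patch/branch default to 0
    major, minor, patch, branch = parts
    return f"{major}{minor:02d}{patch:02d}{branch:02d}"

for ver in ("3.36.0", "3.8.11"):
    print(ver, naive_template(ver), fixed_field(ver))
# 3.36.0 3360000 3360000
# 3.8.11 3801100 3081100
```

The template agrees with the fixed-field encoding only while the minor version has two digits and the patch level has one, which matches the failure described above.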

(9) By RandomCoder on 2021-06-28 19:57:06 in reply to 8 [link] [source]

Would something like this not work (python code here, but I'd imagine it's straightforward enough)?

import re

# data holds the text of https://sqlite.org/download.html
m = re.search(r"PRODUCT,(.*sqlite-dll-win32.*zip),([0-9.]+),", data)
url = "https://sqlite.org/" + m.group(1)
ver = m.group(2)

(10) By nolasd on 2021-06-28 21:21:30 in reply to 9 [link] [source]

If I had a programming language, this would not be a problem. You're right.

But I don't have one. I have a regular expression and a template and the restriction that the group 1 should contain the version number.

You could say that this is too restrictive, but it covers 99% of all packages. That is why I'm asking here for a change.

If there are no other considerations like backwards compatibility, sqlite-amalgamation-3.36.zip is better for a human than sqlite-amalgamation-3360000.zip.

(11) By Larry Brasfield (larrybr) on 2021-06-28 21:45:19 in reply to 10 [link] [source]

What scripting system is implementing the RE matching? What class of RE can it use? (Perl RE, aka "extended regular expression", or TCL advanced RE, or must it be whatever egrep understands?)

I still do not understand the objective. Can you state what information is to be extracted and to what purpose? I am guessing that you want to get a URL via which the downloadable product can be obtained via HTTP(S) GET and an indication as to what version it is. That is a low-level objective, in service of something like "Detect when a newer version exists, know what it is, and fetch it."

Please state something more useful than a demand to change the existing product filenames. For all we know, others are relying on the existing format. Failing that, what are the constraints on the RE and what is this "template" and its function?

(12) By nolasd on 2021-06-29 20:00:15 in reply to 11 [link] [source]

There is no scripting available. The web application is in Java. A regular expression can be stored per package that is applied to the contents of a web page (https://www.sqlite.org/download.html).

The necessary information that should be extracted is the version number and the download URL. The objective is to detect the newest version of Sqlite.

There is one additional constraint that the version number must be in the first matching group.

I understand that other systems may already rely on the format of the download links, but the current format makes it too complicated and hard to read by humans. I would have to extend my application to handle Sqlite.

Something like https://www.sqlite.org/download/sqlite-amalgamation-3.36.zip could be done without any modifications to the web application on www.npackd.org.

(13.1) By Larry Brasfield (larrybr) on 2021-06-29 21:04:21 edited from 13.0 in reply to 12 [link] [source]

Please humor me by answering: What are the constraints on the RE and what is this "template" and its function?

Here is a head start on the answer. Please correct and amend as needed.

  1. A single regular expression, applied to a known web page content read via HTTP GET, must yield two data items within its capture groups.

  2. One data item is the product version number, which must be the entirety of the 1st capturing matching group.

  3. The other data item is a URL, which must be the entirety of another capture group, which must be the Nth matching capturing group where N is known and greater than 1.

  4. The regular expression must be one understood by java.util.regex.* and it may use non-capturing groups to facilitate meeting other constraints here.

  5. The template provides these inputs to the page read and processing:

     a. the known web page URL;

     b. the regular expression;

     c. and which capture group will contain the product URL.

  6. A successful match must provide a valid product URL and version, and if such are not both available at the known web page, the RE must not match.

  7. One template exists for each product, where "product" is a regularly published archive or binary, updated upon successive releases.

Pretend that I am able to help you if I correctly understand the constraints, but will not be able or willing to simply change the product URLs.

(14) By nolasd on 2021-06-29 21:13:40 in reply to 13.1 [link] [source]

I don't like your tone and I do not see any will from your side to do anything. I will stop posting here.

Bye.

(16) By Larry Brasfield (larrybr) on 2021-06-30 00:06:56 in reply to 14 [link] [source]

What you read as my "tone" is yet another attempt[1] to get past your insistence upon a single way to solve your problem[2] and understand what the solution constraints are. Richard spent some of his valuable time trying to help solve it. Given that the new HTML comment product table was not going to help you, as now generated, I thought that a reordering of fields or some extra field might well suffice. But I am unwilling to blindly stab at revisions to it. Hence my repeated effort to get at the actual constraints. You should not be surprised or offended that such efforts became increasingly explicit and again included dissuasion of yet another response resembling "If only the URL was formed as ...".

For what it's worth, to you or others with a similar issue, here is a Perl script which filters the download page content so as to yield the URL and version of a particular downloadable SQLite utility archive:

while ($_ = <>) {
  if (my ($urlfrag, $version) = m/^PRODUCT,(\d+\/sqlite-tools-win32-x86-\d+\.zip),(3\.[\d\.]+),/) {
    printf("URL: https://sqlite.org/%s , Version: %s\n", $urlfrag, $version);
  }
}

Piping the output of wget -O - https://sqlite.org/download.html 2>nul through that filter will emit the download URL and object version.

As far as I can tell, absent better information, this "solution" has only one shortfall as far as its regular expression is concerned: The two data items are in the wrong order for one particularly limited consumer. If that was the only shortfall, I would check in a revision of that HTML table generator today to put the fields in a more palatable order.

I have deferred documenting that HTML comment table structure because I thought it might evolve in response to suggestions from you and others. We already have one helpful suggestion from Doug. If we can get past attitude problems here, there may be other useful changes.


  1. Previous, unsuccessful attempts can be seen in posts 7 and 11.

  2. We presume that others may have similar issues, so the problem is not strictly "yours".

(15) By RandomCoder on 2021-06-29 21:16:07 in reply to 10 [link] [source]

Rather than getting the website to match the package manager, wouldn't this work for this package manager?

Discovery page (URL): 
https://sqlite.org/download.html
Discovery regular expression:
[0-9]+/sqlite-dll-win32[^,]+zip
Discovery package download URL pattern: 
https://sqlite.org/${match}
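
As a rough Python simulation of those three fields: the sketch below assumes that ${match} is replaced by the whole text matched by the discovery regular expression, which is a guess about how the web application behaves. The sample row is made up (size and hash are placeholders) and the function name discover is hypothetical.

```python
import re

# Made-up sample row; size and hash are placeholders.
PAGE = "PRODUCT,2021/sqlite-dll-win32-x86-3360000.zip,3.36.0,0,placeholder-hash\n"

DISCOVERY_RE = r"[0-9]+/sqlite-dll-win32[^,]+zip"
URL_PATTERN = "https://sqlite.org/${match}"

def discover(page: str, regex: str, url_pattern: str):
    """Return the download URL built from the first match, or None."""
    m = re.search(regex, page)
    if m is None:
        return None
    # Assumption: ${match} stands for the entire matched text.
    return url_pattern.replace("${match}", m.group(0))

print(discover(PAGE, DISCOVERY_RE, URL_PATTERN))
# https://sqlite.org/2021/sqlite-dll-win32-x86-3360000.zip
```

Since the year directory is part of the matched text, nothing in this configuration needs to change when the year rolls over.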

(17) By Larry Brasfield (larrybr) on 2021-06-30 00:21:03 in reply to 15 [link] [source]

I suppose by "this package manager" you mean the web application serving www.npackd.org whose source is at https://github.com/npackd/npackd-gae-web . If so, could you please say where you found enough documentation or examples to derive your suggested 6 lines? Is that part or all of the "template"?

(18) By RandomCoder on 2021-06-30 00:37:20 in reply to 17 [source]

I was curious, since I'm responsible for a small package manager, and this sort of problem isn't new to it. I've absolutely run across situations like SQLite where the version number alone isn't enough to find the download. I wasn't able to find enough documentation. I went looking at existing packages and found one that followed the pattern I used:

https://npackd.org/p/kodi

(I think you'll need to "login" to the website to see the form showing the regexp capture and replace group)

I'll admit I know nothing about this package manager beyond this little bit of digging. I was just curious how it worked, since the package managers I've worked with before have at least some basic scripting to solve this sort of issue. Furthermore, for all I know this npackd is somehow different from the npackd that nolasd was referring to.

(19) By Larry Brasfield (larrybr) on 2021-06-30 01:05:38 in reply to 18 [link] [source]

Thanks for the tips.

It appears that the usage of REs by that package collector is a bit more sophisticated than I was led to believe. And it looks as if my blind stab at the constraints in post #13.1 is close enough to right that a simple reordering of the HTML comment table fields will suffice to get npackd.org's package management able to get what the OP wanted. That should be easy enough to do.

(20.1) By Larry Brasfield (larrybr) on 2021-06-30 21:17:49 edited from 20.0 in reply to 1 [link] [source]

(Edited for tense.)

A change has just been checked in to the docs repo for the download page which has been pushed to the SQLite website. Consequently, filtering https://sqlite.org/download.html content through this RE: /^PRODUCT,(3\.[\d\.]+),(\d+\/sqlite-dll-win32-x86-\d+\.zip),/ will yield one match with 2 sub-match groups. Match group #1 (in java parlance) will be the version in conventional, dotted numeric format. Match group #2, when prepended with "https://sqlite.org/", provides the URL for the most recent downloadable binary named "sqlite-dll-win32-x86-3XXYYZZ.zip" where XXYYZZ is the documented, fixed-field filename version substring corresponding to the dotted version.

Because npackd.org's web app is fully capable of dealing with this scheme via appropriate values for its configuration "Discovery regular expression" and "Discovery package download URL pattern" fields (which can use arbitrarily numbered capture groups to compose that URL), I consider the OP's problem to be solved, whether or not he benefits from this change.

The HTML comment containing a CSV table which enables this solution is documented on the download page, near where file naming is documented.
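
The RE from post 20.1 can be tried out with a short Python sketch. The sample rows are made up (size and hash are placeholders) and the function name find_win32_dll is hypothetical. Multi-line matching is assumed so that ^ anchors at the start of each row; in java.util.regex the equivalent is the Pattern.MULTILINE flag or an inline (?m).

```python
import re

# Made-up sample rows in the reordered layout; size and hash are placeholders.
SAMPLE = """<!-- Download product data for scripts to read
PRODUCT,3.36.0,2021/sqlite-dll-win64-x64-3360000.zip,0,placeholder-hash
PRODUCT,3.36.0,2021/sqlite-dll-win32-x86-3360000.zip,0,placeholder-hash
-->"""

# Same expression as in post 20.1, without the Perl-style escaping of "/".
PATTERN = re.compile(
    r"^PRODUCT,(3\.[\d\.]+),(\d+/sqlite-dll-win32-x86-\d+\.zip),",
    re.MULTILINE,
)

def find_win32_dll(page: str):
    """Return (version, url) for the 32-bit DLL archive, or None."""
    m = PATTERN.search(page)
    if m is None:
        return None
    return m.group(1), "https://sqlite.org/" + m.group(2)

print(find_win32_dll(SAMPLE))
# ('3.36.0', 'https://sqlite.org/2021/sqlite-dll-win32-x86-3360000.zip')
```

Group 1 carries the dotted version and group 2 the relative path, which is the order that the first-group constraint described earlier in the thread requires.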