I want to obtain a list of all images on my wiki by file size. I'm not talking about the table that is generated by Special:ListFiles, which has way too much extraneous information for my purposes. I just mean a "raw" list of the names of the files, ordered from biggest to smallest, kind of like how pagegenerators.py will give you a list of things. So, something like:
File1.jpg File2.jpg File3.jpg
I'm not seeing an obvious way to do it through DPL; as far as I can tell "file size" isn't a parameter on offer. Anyone got any suggestions?
- From what I understand, there's no way to do that directly. Copying ListFiles into Excel is probably your best bet. You could also use the allimages API, but even then the files aren't listed by size. ʞooɔ
15:39, September 21, 2012 (UTC)
- Of course! API to the rescue! Why didn't I think of that? The answer, though perhaps an inelegant one, turned out to be http://tardis.wikia.com/api.php?action=query&list=allimages&ailimit=1000&&aiminsize=1000000&aiprop=size . Then it was just a matter of sprinkling some regex dust on the results to extract the raw title name. Now, this doesn't, in fact, sort by size, but rather alphabetically by file name. But at least it's a list of all files that are bigger than 1 MB.
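For anyone who wants the biggest-to-smallest ordering the query itself does not give, here is a minimal Python sketch of sorting the results locally. It assumes the query was run with `format=json` added (the URL above does not include it) and that each entry carries `title` and `size` fields; the helper name `titles_by_size` and the sample data are made up for illustration.

```python
def titles_by_size(api_response, strip_namespace=True):
    """Return file names from an allimages reply, ordered biggest first."""
    images = api_response.get("query", {}).get("allimages", [])
    # The API returns entries alphabetically, so the size sort happens here.
    ordered = sorted(images, key=lambda img: img.get("size", 0), reverse=True)
    names = []
    for img in ordered:
        title = img.get("title", img.get("name", ""))
        if strip_namespace and ":" in title:
            # Drop the "File:" namespace prefix to get the raw name.
            title = title.split(":", 1)[1]
        names.append(title)
    return names


# Invented sample in the assumed shape of a format=json reply.
sample = {
    "query": {
        "allimages": [
            {"name": "Big.png", "size": 2500000, "title": "File:Big.png"},
            {"name": "Mid.jpg", "size": 1800000, "title": "File:Mid.jpg"},
            {"name": "Small.gif", "size": 1200000, "title": "File:Small.gif"},
        ]
    }
}

print(" ".join(titles_by_size(sample)))
```

Working from the parsed JSON rather than regex on the raw page is a design choice, not a requirement; it just avoids depending on how the API page happens to be laid out.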
- Hey, since I'm not really that au fait with the API, I was wondering what you meant by importing into Excel? Do you have to specify a particular format that makes for a better Excel import? Is it literally copying and pasting, or do you do a data import? Do you think you could take a moment to walk me through how you go from the page generated by the above API URL to a usable spreadsheet in Excel? Thanks :)
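One possible route from the API output to a spreadsheet, sketched in Python: turn the reply into CSV text, save it as a `.csv` file, and open that in Excel. This is only an illustration, not necessarily the workflow meant above. It assumes a `format=json` reply whose entries have `name` and `size` fields; `allimages_to_csv`, the column headers, and the sample data are hypothetical.

```python
import csv
import io


def allimages_to_csv(api_response):
    """Render an allimages reply as CSV text, largest file first."""
    images = api_response.get("query", {}).get("allimages", [])
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "size_bytes"])  # header row chosen for this sketch
    for img in sorted(images, key=lambda i: i.get("size", 0), reverse=True):
        writer.writerow([img.get("name", ""), img.get("size", 0)])
    return buffer.getvalue()


# Invented sample in the assumed shape of a format=json reply.
sample = {
    "query": {
        "allimages": [
            {"name": "Small.gif", "size": 1200000},
            {"name": "Big.png", "size": 2500000},
        ]
    }
}

print(allimages_to_csv(sample))
```

Writing the returned text to a file ending in `.csv` should give something Excel can open directly, with the size column sortable as numbers.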
- Rather than doing this in Excel, I think this would be a great thing to do on the wiki itself, so that other editors can potentially do it too. I've put together a quick example of what such a tool could look like at w:c:mathmagician:Allimages. Take a look at it and let me know if that looks like something that might work for you. (Note: it's not fully built yet, it's just a working demo.)
- That's really rather fabulous, MM.
(Reset indent) Cool :) -- let me know if there are any more features you want for the tool. I could just do a little more testing and basically give it to you "as-is", if all you really care about is the file size.
Or, I could take a day or two to build even more features into it if you think they'd be potentially helpful. Examples of features that could be added to this tool:
- Ability to populate the table based on mime type. (e.g. I could add checkboxes into the interface so you can do something like "I want only videos" or "I want to search for only PNGs and GIFs")
- Ability to sort by timestamp
- Ability to sort by user who uploaded the file
- Ability to not only set minimum file size, but also maximum file size.
- Ability to look for images that begin with a certain prefix
The API can do all of these things, as I'm sure you saw when you were looking at it. It'd just be a matter of building a user interface to go along with the table that allows you to conveniently set these sorts of options (i.e. making it user friendly). And then packaging this tool in a way that's easy to install or copy onto other wikis in case other people want to use it.
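To make the options listed above concrete, here is a rough Python sketch of how they might map onto `list=allimages` query parameters. It is not the tool's actual code. The parameter names (`aimime`, `aiminsize`, `aimaxsize`, `aiprefix`, `aiuser`, `aisort`) and the two restrictions checked below follow the current MediaWiki API documentation as I understand it; the version installed on a given wiki may behave differently, and `build_allimages_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_allimages_url(api_base, min_size=None, max_size=None,
                        mime_types=None, prefix=None, user=None,
                        sort="name", limit=500):
    """Build an allimages query URL from optional filters."""
    # Assumed restrictions: prefix needs name sort, user needs timestamp sort.
    if prefix is not None and sort != "name":
        raise ValueError("prefix lookup assumes sort='name'")
    if user is not None and sort != "timestamp":
        raise ValueError("uploader filter assumes sort='timestamp'")
    params = {
        "action": "query",
        "list": "allimages",
        "format": "json",
        "ailimit": limit,
        "aiprop": "size|timestamp|user|mime",
        "aisort": sort,
    }
    if min_size is not None:
        params["aiminsize"] = min_size  # bytes
    if max_size is not None:
        params["aimaxsize"] = max_size  # bytes
    if mime_types:
        params["aimime"] = "|".join(mime_types)
    if prefix is not None:
        params["aiprefix"] = prefix
    if user is not None:
        params["aiuser"] = user
    return api_base + "?" + urlencode(params)


print(build_allimages_url("http://tardis.wikia.com/api.php",
                          min_size=1000000,
                          mime_types=["image/png", "image/gif"]))
```

A form-based interface would presumably collect these same values from checkboxes and text fields before assembling the request.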
If you do still want to know how to do this in Excel, hopefully Cook can explain that. I don't have Excel on my main computer, unfortunately, and I'm not very good with spreadsheets :P.
- Well, all I was really looking for was a way to output the raw file names, so that I could then add a category to the page and quickly delete them. (Tardis has a "ye shall not upload bigger than 1mb images rule".) However, this tool would surely be helpful in monitoring compliance after the initial round of deletions is done, so I certainly have a use for this. I suppose I'm also interested in just kicking the tires on it, since the same basic method would likely work with other API queries.
- All of which is a long way of saying that I'd like the following, please:
- Population by mime type
- Sorting by timestamp
- Prefix lookup
- Sorting by uploader
- I really am very excited by all this. Thanks for broadening my mind on the API possibilities.
- Alright, this proved to be a bit more time consuming than I originally thought, but I've finally put together a v1.0 of a form for the API queries that incorporates many of these ideas. The script is at w:c:dev:ListFiles, installation instructions and more can be found there! (the name is designed to remind you of Special:ListFiles, which is similar but less customizable).
- I'm sorry for taking over 48 hours to publicly thank you for this one, but I've been locked away doing a lot of little niggly things here and there. I freakin' love this thing. Maybe you are a genius after all ... :)