Some people have their own reasons to justify their actions. If it was worth his time and personal justification to let us know, I imagine he would have, knowing the value he granted to us all. But if someone doesn't want to let it be known, there isn't anything we can do but move on and start over.
Sad to see that site go. Does anyone know where else one can find specs on factory cams formatted the way he had them? I know TB has some specs on the front page, but not all the niggly numbers are there.
It would be great to get all this stuff in a torrent at some point; then it's far more difficult for anyone to take down. If I had my way it would all be public domain by now: the cars it supports are deprecated, and we should have a right to access all pertinent service information.
Yeah, I can make them available pretty soon as a flat archive of downloads. We have been trying to make the entire collection fully searchable, but I think the computing power required to do that is too expensive to maintain, so for now it'll just be download links. There's just over 1000 items in total.
Basically I just need to dump it into an S3 bucket and make a front-end to serve it all.
So where we're at is that it's hosted, but we're still organising the data. Scraping the document IDs automatically is difficult as they move around inside the documents, they are not terribly consistent in format, and many documents don't have visible part numbers at all. Many uploaders were nice enough to name their files after the Volvo part number of the document, but even then I expect 99% of visitors will want to filter by model and/or year, which is again not easy to automatically pluck from the documents with certainty.
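To show why plucking a year out of a document "with certainty" is hard, here is a minimal sketch. It assumes the text has already been extracted from the PDF, and the year range in the regex and the sample sentence are made up for illustration, not taken from the collection.

```python
import re
from collections import Counter

# Hypothetical range of years to look for (1940-1999); adjust to the collection.
YEAR_RE = re.compile(r"\b(19[4-9]\d)\b")

def guess_years(text: str) -> list[tuple[int, int]]:
    """Return (year, count) pairs found in the text, most frequent first."""
    counts = Counter(int(y) for y in YEAR_RE.findall(text))
    return counts.most_common()

# Made-up sample text: a print date sits next to the model years.
sample = "Printed 1971. Applies to 140 series, 1969 and 1970 models. Rev. 1970."
print(guess_years(sample))
```

The print date comes back as a candidate alongside the real model years, so a heuristic like this can only suggest labels for a person to confirm.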
We played around with indexing: we chucked the whole lot into ES (Elasticsearch) and used Ambar as a frontend, but it was resource-intensive and also not actually terribly useful. In a bunch of my scenarios I was looking for something like "rivet", but Volvo call it something really generic like "retaining fastener". It was good at finding wiring diagrams though.
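One common workaround for the "rivet" problem is to expand the query with the catalogue's own wording before searching. This is not something the thread says was tried. The sketch below is plain Python with a hypothetical synonym table and made-up document text. Elasticsearch has a synonym token filter that does the same kind of expansion, but whether it would be practical here is an open question.

```python
# Hypothetical table mapping everyday terms to the catalogue's wording.
SYNONYMS = {
    "rivet": ["retaining fastener"],
}

def expand_query(query: str, synonyms: dict[str, list[str]] = SYNONYMS) -> list[str]:
    """Return the original query plus any catalogue terms for its words."""
    terms = [query]
    for word in query.lower().split():
        for alt in synonyms.get(word, []):
            if alt not in terms:
                terms.append(alt)
    return terms

def search(docs: dict[str, str], query: str) -> list[str]:
    """Naive substring search over {filename: text} using the expanded terms."""
    terms = [t.lower() for t in expand_query(query)]
    return sorted(
        name for name, text in docs.items()
        if any(t in text.lower() for t in terms)
    )

# Made-up documents for illustration only.
docs = {
    "a.pdf": "Remove the retaining fastener from the door panel.",
    "b.pdf": "Wiring diagram, instrument cluster.",
}
print(search(docs, "rivet"))
```

The catch is that somebody still has to build the synonym table by hand, which is its own labelling job.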
Anyway, we ditched Ambar, and now we're looking at ES Service with Searchkit as a frontend out of S3.
If we don't get that working to a satisfactory level in the next few days, we'll switch to the alternative approach, which is to recruit a bunch of people to go through the documents and mark them up in a spreadsheet, then have a script break it all up into useful metadata and organise the files from that.
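A rough sketch of what that script could look like, assuming the spreadsheet is exported as CSV. The column names (`filename`, `model`, `year`, `subject`) and the folder layout are hypothetical, and the sample rows are illustrative only. It just computes where each file would go rather than moving anything.

```python
import csv
import io
from pathlib import PurePosixPath

def plan_moves(csv_text: str) -> dict[str, str]:
    """Map each original filename to a model/year/subject-based path."""
    moves = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Fall back to placeholder folders when a volunteer left a cell blank.
        model = row["model"].strip() or "unknown-model"
        year = row["year"].strip() or "unknown-year"
        subject = row["subject"].strip().lower().replace(" ", "-") or "untitled"
        ext = PurePosixPath(row["filename"]).suffix
        moves[row["filename"]] = str(PurePosixPath(model, year, subject + ext))
    return moves

# Hypothetical spreadsheet export; the columns are an assumption.
sheet = """filename,model,year,subject
100.pdf,122S,1964,Automatic transmission brochure
101.pdf,,,Wiring diagram
"""
for src, dst in plan_moves(sheet).items():
    print(src, "->", dst)
```

Two documents with the same model, year and subject would map to the same path, so a real version would need to handle collisions before touching any files.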
If we go the mechanical turk route, I'll post here looking for some volunteers to help out.
As an aside, "final" count is 14.6GB and 2190 documents/images after auto deduplication, but we've found there are some documents that sneak through (i.e. they're the same actual document but scanned by different people).
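For anyone wondering why rescans sneak through: the sketch below assumes the auto deduplication compares file contents byte-for-byte via a hash. That is my assumption, not a description of the actual process, and the file names and contents are made up.

```python
import hashlib

def dedupe_by_hash(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Group filenames by the SHA-256 digest of their contents."""
    groups: dict[str, list[str]] = {}
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        groups.setdefault(digest, []).append(name)
    return groups

# Made-up contents: two identical uploads and one rescan of the same pages.
files = {
    "100.pdf": b"scan by uploader A",
    "copy_of_100.pdf": b"scan by uploader A",
    "rescan.pdf": b"same pages, scanned by uploader B",
}
# Any group with more than one name is a set of exact duplicates.
dupes = [names for names in dedupe_by_hash(files).values() if len(names) > 1]
print(dupes)
```

Only byte-identical files end up in the same group. A second scan of the same paper document produces different bytes, so it counts as a separate file.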
Why don't you just post a library of GreenBooks .pdf files with labels as to models/years and Volvo subject? Trying to create a universal searchable database is far beyond my needs. If you want me to add the GB part number to the 19 files I uploaded, I would be happy to, but no one would search on that.
Okay, to clarify, that is what we're doing, and the end result will either be exactly that or that + searching. But either way we have to actually create all those labels, because right now we have 2000+ files with names like "100.pdf", which as it turns out is a 1964 122S marketing brochure introducing the availability of automatic transmissions.
So the "just give them model/year labels" part is most of the work. With search indexing, the indexer is very smart and does most of the work for you.