Ah! well, I did it as an experiment and used lots of messy batch files to make it. I didn’t do it one by one. Also it contains pieces of an idea that didn’t work (the scattered HTML files).
I did it all over a long weekend, figuring out what to do as I went along, how to autocreate the directory names, how to move 35,000+ txt files into the right places…
So it’s not a “ready for publication” thing but rather a “proof of concept”.
—
The very HARDEST part, oddly enough, was finding a source for DDC with enough granularity in a form I could steal for the project, as the DDC is copyrighted.
—
My final plan (not realized) was to also fill in EVERY one of the 1000 with SOMETHING, whether books or articles or papers written by anybody.
Then I’d have a complete library. But this was hard enough to do.
🙂
Since you know Dewey (I love Dewey) – here is the classifier I used. Pushing 35,000 little txt files through it was abusive of me but I tried to stagger.
Then there was figuring out how to take the DDC probabilities and tie them back to the txt files so I could move them automatically to the right folders. It was challenging but fun.
http://act-dl.base-search.net/webclassifier
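For anyone curious what the "tie the probabilities back to the files and move them" step might look like, here is a minimal Python sketch. It is not the batch files I actually used. It assumes you have already collected the classifier results into a dictionary that maps each file name to its DDC codes and probabilities; that dictionary shape, the function names, and the folder layout are all made up for illustration.

```python
import shutil
from pathlib import Path


def best_class(probabilities):
    """Return the DDC code with the highest probability."""
    return max(probabilities, key=probabilities.get)


def sort_into_folders(results, source_dir, library_dir):
    """Move each txt file into a folder named after its most likely DDC code.

    `results` is an assumed format: {file_name: {ddc_code: probability}}.
    Returns a dict of {file_name: ddc_code} for the files that were moved.
    """
    source_dir, library_dir = Path(source_dir), Path(library_dir)
    moved = {}
    for file_name, probabilities in results.items():
        if not probabilities:
            continue  # the classifier returned nothing for this file
        code = best_class(probabilities)
        target_dir = library_dir / code
        # Create the class folder on first use.
        target_dir.mkdir(parents=True, exist_ok=True)
        src = source_dir / file_name
        if src.is_file():
            shutil.move(str(src), str(target_dir / file_name))
            moved[file_name] = code
    return moved
```

Calling `sort_into_folders(results, "txt", "library")` would leave each file under something like `library/004/`, with the folders created as needed.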
—
So, imagine taking EVERYTHING you ever wrote and having each snippet automatically sorted into its proper classification. Then you have a future book all ready, just with pages out of order.
From there, you just have to put the pieces together. No rewriting needed. That was my idea anyway.
—
*EVENTUALLY* I wanted to put the images in too, but that turned out to be too much manual labor. I wanted a process that did “all at once” or not at all, and that is how I did it.
—
Oh, that’s another thing: I wanted to auto-hotlink them, but the thing I used to turn HTML into usable text files, properly formatted for width, stripped _everything_… and putting it back wasn’t worth the time to set up, as a lot of titles have quotes, which wrecks HTML anchoring. So plain txt it stayed.
Mostly, I like that it’s retro.
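On the quotes-in-titles problem: if I ever went back to it, escaping the titles before building the links would probably be the way around it. A small Python sketch, with a hypothetical `make_link` helper (I never built this):

```python
from html import escape
from urllib.parse import quote


def make_link(title, file_name):
    """Build an HTML anchor whose title text cannot break the markup."""
    href = quote(file_name)      # percent-encode spaces and similar characters
    safe_title = escape(title)   # turns &, <, > and both quote marks into entities
    return f'<a href="{href}" title="{safe_title}">{safe_title}</a>'
```

With the quotes turned into entities, a title like `He said "hi"` no longer ends the attribute early.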
I *think* this is the same data in Open Office format: https://data.mendeley.com/datasets/2ggd5pyngv/1
—
[My contributions got a few hundred downloads from that (two diff views of the data, I think). I don’t know WHAT ppl are doing with it, if anything. Steal my personality? Create an AI me? I dunno. But it feels good to contribute.]
Udut, Kenneth (2016), “Results of AI Dewey Decimal Classifier on Kenneth Udut online written output corpus (26 yrs) (brain)”, Mendeley Data, v1 http://dx.doi.org/10.17632/2ggd5pyngv.1
—