Ah! well, I did it as an experiment and used lots of messy batch files to make it. I didn’t do it one by one. Also it contains pieces of an idea that didn’t work (the scattered HTML files).

I did it all over a long weekend, figuring out what to do as I went along, how to autocreate the directory names, how to move 35,000+ txt files into the right places…

So it’s not a “ready for publication” thing but rather “proof of concept”

The very HARDEST part, oddly enough, was finding a source for DDC with enough granularity in a form I could steal for the project, as the DDC is copyrighted.

My final plan (not realized) was to also fill in EVERY one of the 1000 with SOMETHING, whether books or articles or papers written by anybody.

Then I’d have a complete library. But this was hard enough to do :)
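For anyone curious how to see what is still missing, here is a minimal sketch of checking which of the 1000 classes have nothing in them yet. It assumes one folder per three-digit class, named "000" through "999", under a single library folder; that layout and the name `empty_classes` are made up for illustration and are not necessarily how my folders are actually named.

```python
from pathlib import Path

def empty_classes(library_dir):
    """List the three-digit DDC codes that have no files yet.

    Assumes (hypothetically) one folder per class, named "000".."999".
    A missing folder and an empty folder both count as "nothing filed".
    """
    library_dir = Path(library_dir)
    missing = []
    for n in range(1000):
        code = f"{n:03d}"  # zero-padded, e.g. 4 -> "004"
        folder = library_dir / code
        has_files = folder.is_dir() and any(p.is_file() for p in folder.iterdir())
        if not has_files:
            missing.append(code)
    return missing
```

Calling `empty_classes("library")` would return the list of codes that still need SOMETHING put in them.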

Since you know Dewey (I love Dewey) – here is the classifier I used. Pushing 35,000 little txt files through it was abusive of me but I tried to stagger the requests.

Then there was figuring out how to take the DDC probabilities and tie them back to the txt files so I could move them automatically to the right folders. It was challenging but fun.
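What I actually used was a pile of messy batch files, so this is NOT my real process, just a rough Python sketch of the same idea: take the top-probability DDC code for each file, make a folder with that name if it doesn't exist, and move the file in. The shape of `results` (filename mapped to code/probability pairs) is an assumption for illustration; the classifier's real output looks different and would need parsing first. The function names are made up too.

```python
import shutil
from pathlib import Path

def best_class(scores):
    """Return the DDC code with the highest probability."""
    return max(scores, key=scores.get)

def file_into_classes(src_dir, dest_dir, results):
    """Move each txt file into a folder named after its top DDC code.

    results is a hypothetical mapping: {filename: {ddc_code: probability}}.
    Returns {filename: folder it was moved into}.
    """
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    moved = {}
    for name, scores in results.items():
        source = src_dir / name
        if not scores or not source.is_file():
            continue  # nothing classified, or the file is not on disk
        target_dir = dest_dir / best_class(scores)
        target_dir.mkdir(parents=True, exist_ok=True)  # autocreate the folder
        shutil.move(str(source), str(target_dir / name))
        moved[name] = target_dir
    return moved
```

So `file_into_classes("txt", "library", {"post1.txt": {"004": 0.7, "020": 0.2}})` would leave `post1.txt` sitting in `library/004/`.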

http://act-dl.base-search.net/webclassifier

So, imagine taking EVERYTHING you ever wrote and having each snippet automatically sorted into its proper classification. Then you have a future book all ready, just with pages out of order.

From there, you just have to put the pieces together. No rewriting needed. That was my idea anyway.

*EVENTUALLY* I wanted to put the images in too.. but that turned out to be too much manual labor. I wanted a process that did “all at once” or not at all and so I did.

Oh, that’s another thing: I wanted to auto-hotlink them, but the thing I used to turn HTML into usable text files, properly formatted for width, stripped _everything_… and putting it back wasn’t worth the time to set up, as a lot of titles have quotes, which wrecks HTML anchoring. So plain txt it stayed.
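For the record, the quotes-in-titles problem is the kind of thing escaping can handle. This is only a sketch of how the links could be built, not something I set up: `make_link` is a made-up helper, and the folder/title in the example are invented. It uses Python's standard `html.escape` and `urllib.parse.quote`.

```python
from html import escape
from urllib.parse import quote

def make_link(title, relative_path):
    """Build an HTML anchor that survives quotes in the title or path."""
    href = quote(relative_path)  # percent-encode spaces and quotes, keep "/"
    # escape() turns &, <, > and (with quote=True) quotes into entities
    return f'<a href="{escape(href, quote=True)}">{escape(title, quote=True)}</a>'

print(make_link('He said "hi"', '025/He said "hi".txt'))
```

The quotes in the title come out as `&quot;` and the ones in the path as `%22`, so neither can close the `href` attribute early.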

Mostly, I like that it’s retro.

I *think* this is the same data in Open Office format: https://data.mendeley.com/datasets/2ggd5pyngv/1

[My contributions got a few hundred downloads from that (two diff views of the data, I think). I don’t know WHAT ppl are doing with it, if anything. Steal my personality? Create an AI me? I dunno. But it feels good to contribute.]

Udut, Kenneth (2016), “Results of AI Dewey Decimal Classifier on Kenneth Udut online written output corpus (26 yrs) (brain)”, Mendeley Data, v1 http://dx.doi.org/10.17632/2ggd5pyngv.1

 

 
