the keywords are sources and the categories are destinations.
i formed the categories by repeatedly running latent Dirichlet allocation (LDA). i used a 30-day trial of WordStat 8, but it works the same if you find GitHub code that does it. you manually choose the number of categories you want it to find. i did 2, then 3, then 4, and so on through 67, then put all of that together in a single long spreadsheet so i could find out what the different runs had in common, since a lot of the categories it found were similar to each other but not identical.
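one way to do the "what do the runs have in common" step outside a spreadsheet, as a minimal sketch: score every pair of categories by how many keywords they share (Jaccard overlap) and keep the pairs above a cutoff. the category names, keywords and the 0.5 threshold below are made up for illustration, and this is not what WordStat does internally.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of keywords two categories have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(topics, threshold=0.5):
    """Return (name_a, name_b, score) for category pairs at or above threshold."""
    pairs = []
    for (name_a, words_a), (name_b, words_b) in combinations(topics.items(), 2):
        score = jaccard(words_a, words_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    # Highest overlap first.
    return sorted(pairs, key=lambda p: -p[2])


# Hypothetical categories from two runs (2 categories and 3 categories).
topics = {
    "k2_topic1": ["mind", "thought", "memory", "brain"],
    "k3_topic2": ["mind", "thought", "memory", "dream"],
    "k3_topic3": ["music", "sound", "rhythm", "song"],
}

pairs = similar_pairs(topics)
for name_a, name_b, score in pairs:
    print(f"{name_a} ~ {name_b}: {score:.2f}")
```

a lower threshold groups more loosely related categories together; a higher one only catches near-duplicates.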
i reformatted the Excel spreadsheet so that it had just two columns: keywords in one and categories in the other. that’s what i fed into Cytoscape, which is what i’m using for mapping, but yEd, Gephi or others may also work.
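the two-column reshaping can also be scripted. a minimal sketch, assuming the categories are already in a dict of category name to keyword list (the names, keywords and the `edges.csv` file name here are hypothetical): it writes one row per keyword/category pair, keyword first, so the keyword column can be picked as the source and the category column as the target when importing the table as a network.

```python
import csv


def to_edge_rows(topics):
    """Flatten {category: [keywords]} into (keyword, category) rows."""
    rows = []
    for category, keywords in topics.items():
        for keyword in keywords:
            rows.append((keyword, category))
    return rows


def write_edge_list(rows, path):
    """Write the rows as a two-column CSV with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["keyword", "category"])
        writer.writerows(rows)


# Hypothetical categories; in practice these come from the LDA runs.
topics = {
    "k2_topic1": ["mind", "thought"],
    "k3_topic3": ["music", "sound"],
}

rows = to_edge_rows(topics)
```

calling `write_edge_list(rows, "edges.csv")` then produces the file to import.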
the raw unformatted text was a 40 MB text file containing whatever writings of mine i had collected, one paragraph per line, but it could be one sentence or one whole document per line too.
so the whole pipeline is: raw text to WordStat 8 to Cytoscape.
but there are many ways to do this. this is just the one i stumbled upon through trial and error over the last couple of days.
but yeah i wanted my ontology to form itself from my own words because i was never happy with other people’s ontologies. so what’s mine?