Tuesday, September 4, 2018

UNnatural Language Processing

I was reading about Natural Language Processing - I'm trying to learn how to noodle around with Python. Why? Well, I was going to use it to extract all sorts of cool words from the influences upon D&D, and create tables for Mythic GM Emulator. That's down the road, I think, but I'm grokking the principles and it's not a bad way to learn.

I'm still mucking around with Python, but I did install it, read up on it, installed the Natural Language Toolkit, and a bunch of packages.

Meanwhile, I give you the 100 most common word-clusters/noun phrases from all the prose stories that HPL wrote hisself. It looks like the most common non-stop words (that is, words not on the list of most frequently occurring English words) are: Time, Old, House, and Night. But Never and Strange are fairly close behind (Time and Old are far and away more frequent). These were collected with software called KH Coder 3, and I got the corpus from GothicChic's PDF, converted to text. I will scrub the stopwords, maybe, but then all the fun stats change and you never know what you'll get that way.

old man
old ones
one night
mr. ward
great ones
great race
black stone
such things
first time
other gods
one thing
charles ward
dr. willett
terrible old man
old days
old house
young man
joseph curwen
other things
nameless city
other hand
one day
table of contents
next day
other side
strange things
young ward
ancient house
young men
great old
new england
old woman
one side
old town
old people
unknown kadath
old whateley
million years
old men
earth's gods
outside world
many years
old legends
many things
old folk
old world
cold waste
same time
old joseph curwen
black man
old sea
randolph carter
old tales
bearded man
dead city
last night
great abyss
one man
small hours
one place
sunset city
human world
old zadok
two years
ground floor
open space
old times
dead man
old bugs
stone floor
kind o
other times
human beings
certain things
the street
mrs. ward
old ephraim
old keziah
waking world
town street
other end
water street
great old ones
land city
outer world
new city
house of stone
aout o
white ship
great cthulhu
attic room
black abyss
one time
great deal
strange days
next morning
dark man
two men
other time
new york
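
For anyone who wants to noodle along in Python, here is a minimal sketch of the "most common non-stop words" idea mentioned before the list. It is not how KH Coder does it. The tiny stop-word set, the `top_words` helper, and the sample sentence are all made up for illustration, and it only counts single words, not noun phrases (that would need a part-of-speech tagger).

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word set for illustration only (an assumption);
# a real run would use a much longer list.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "was",
             "i", "it", "that", "on", "at", "had"}


def top_words(text, n=4, stopwords=STOPWORDS):
    """Return the n most frequent words in text that are not stop words."""
    # Lowercase everything and pull out runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Count only the words that are not in the stop-word set.
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)


# Made-up sample text, not a quote from the corpus.
sample = ("The old house stood in the old town. At night the old man "
          "watched the house, and the night was strange.")

print(top_words(sample, n=3))
```

Swapping `sample` for the full text of the corpus (read from a file) should give a rough single-word ranking, though the exact numbers will depend on which stop-word list gets used.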


WHO KNOWS WHAT THE FUTURE MIGHT BRING? THE STARS ARE NEARLY RIGHT

1 comment:

  1. Awesome! Python is a lot of fun. I used it for my random adventure creator for my Weird Adventure Wednesday posts. It's not as good at fine-grained text manipulation as Perl, but it's sooooo much cleaner. I have a bunch of data files that you're welcome to use if you are interested.
