I was reading about Natural Language Processing because I'm trying to learn how to noodle around with Python. Why? Well, I was going to use it to extract all sorts of cool words from the influences upon D&D and create tables for Mythic GM Emulator. That's down the road, I think, but I'm grokking the principles and it's not a bad way to learn.
I'm still mucking around with Python, but I did install it, read up on it, and install the Natural Language Toolkit and a bunch of packages.
Meanwhile, I give you the 100 most common word clusters (noun phrases) from all the prose stories that HPL wrote hisself. It looks like the most common non-stopwords (that is, words not on the list of the most frequently occurring English words) are Time, Old, House, and Night. Never and Strange are fairly close behind, though Time and Old are far and away the most frequent. These were collected by software called KHCoder 3, and I got the corpus from GothicChic's PDF, converted to text. I will scrub the stopwords, maybe, but then all the fun stats change and you never know what you'll get that way.
| old man |
| old ones |
| one night |
| mr. ward |
| great ones |
| great race |
| black stone |
| such things |
| first time |
| other gods |
| one thing |
| charles ward |
| dr. willett |
| terrible old man |
| old days |
| old house |
| young man |
| joseph curwen |
| other things |
| nameless city |
| other hand |
| one day |
| table of contents |
| next day |
| other side |
| strange things |
| young ward |
| ancient house |
| young men |
| great old |
| new england |
| old woman |
| one side |
| old town |
| old people |
| unknown kadath |
| old whateley |
| million years |
| old men |
| earth's gods |
| outside world |
| many years |
| old legends |
| many things |
| old folk |
| old world |
| cold waste |
| same time |
| old joseph curwen |
| black man |
| old sea |
| randolph carter |
| old tales |
| bearded man |
| dead city |
| last night |
| great abyss |
| one man |
| small hours |
| one place |
| sunset city |
| human world |
| old zadok |
| two years |
| ground floor |
| open space |
| old times |
| dead man |
| old bugs |
| stone floor |
| kind o |
| other times |
| human beings |
| certain things |
| the street |
| mrs. ward |
| old ephraim |
| old keziah |
| waking world |
| town street |
| other end |
| water street |
| great old ones |
| land city |
| outer world |
| new city |
| house of stone |
| aout o |
| white ship |
| great cthulhu |
| attic room |
| black abyss |
| one time |
| great deal |
| strange days |
| next morning |
| dark man |
| two men |
| other time |
| new york |
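For anyone who wants to noodle along in Python, here is a rough sketch of the counting idea behind a list like this one. It is not what KHCoder does: it uses only the standard library, a tiny made-up stopword list (`STOPWORDS` is my assumption, not a real published list), and plain adjacent word pairs as a crude stand-in for proper noun phrases, which would need part-of-speech tagging. The function names are hypothetical too.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked stopword list, for illustration only.
# A real run would want a much fuller list.
STOPWORDS = {
    "the", "a", "an", "of", "and", "in", "to", "was", "i", "it",
    "that", "had", "on", "at", "by", "his", "he",
}


def tokenize(text):
    """Lowercase the text and pull out runs of letters (inner apostrophes allowed)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def top_words(text, n=10):
    """Return the n most common words that are not in STOPWORDS."""
    words = [w for w in tokenize(text) if w not in STOPWORDS]
    return Counter(words).most_common(n)


def top_bigrams(text, n=10):
    """Return the n most common adjacent word pairs with no stopword in them.

    Pairs are taken across sentence boundaries as well, so this is only
    a rough approximation of phrase counting.
    """
    tokens = tokenize(text)
    pairs = [
        f"{a} {b}"
        for a, b in zip(tokens, tokens[1:])
        if a not in STOPWORDS and b not in STOPWORDS
    ]
    return Counter(pairs).most_common(n)


if __name__ == "__main__":
    # Made-up sample text, not taken from the corpus.
    sample = (
        "The terrible old man lived in the old house. "
        "One night the old man heard strange things in the old house."
    )
    print(top_words(sample, 3))
    print(top_bigrams(sample, 3))
```

Swapping the sample string for the full text of the corpus would give a quick first look at the counts, and changing what goes into `STOPWORDS` shows how much the stats shift once the stopwords are scrubbed.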
WHO KNOWS WHAT THE FUTURE MIGHT BRING? THE STARS ARE NEARLY RIGHT
Awesome! Python is a lot of fun. I used it for my random adventure creator for my Weird Adventure Wednesday posts. It's not as good at fine-grained text manipulation as Perl, but it's sooooo much cleaner. I have a bunch of data files that you're welcome to use if you are interested.