Around 1974 or so I attended a lecture by a Yale professor about monkeys: if you have a room full of monkeys, each banging on a typewriter, eventually you’ll get all of Shakespeare’s works. Using a random number generator to generate characters simulates this process with fewer bananas required.
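As a baseline, here is a minimal Python sketch of the unassisted monkey, where every key is equally likely. The `KEYS` alphabet and the `monkey_types` name are illustrative assumptions, not part of the original program.

```python
import random
import string

# Assumed "typewriter": lowercase letters plus the space bar.
KEYS = string.ascii_lowercase + " "

def monkey_types(n_chars, rng=None):
    """Return n_chars keystrokes, each key chosen with equal probability."""
    rng = rng or random.Random()
    return "".join(rng.choice(KEYS) for _ in range(n_chars))

if __name__ == "__main__":
    # Output differs on every run and is almost certainly gibberish.
    print(monkey_types(60))
```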
Of course, not all letters occur with the same frequency. If the typewriter noticed that the monkey typed a “Q”, then it could force the next letter to be “U” with a particular probability. The same goes for other letter pairs. One can imagine using some existing text as a model from which to calculate the frequencies of letter pairs. How often is “t” followed by “h”? By “a”? One can also imagine that extending such two-letter patterns to n-letter patterns can produce even better results.
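The letter-pair idea can be sketched in a few lines of Python. This is only an illustration: the function names and the tiny sample sentence are made up for the example, and a real model would be built from a much longer text.

```python
from collections import Counter, defaultdict

def follower_counts(text):
    """Map each character to a Counter of the characters that follow it."""
    table = defaultdict(Counter)
    for first, second in zip(text, text[1:]):
        table[first][second] += 1
    return table

def follower_probability(table, first, second):
    """Fraction of the occurrences of `first` that are followed by `second`."""
    total = sum(table[first].values())
    return table[first][second] / total if total else 0.0

if __name__ == "__main__":
    # Hypothetical sample text, far too short for meaningful statistics.
    sample = "the quick thing that the queen thought"
    table = follower_counts(sample)
    print("q -> u:", follower_probability(table, "q", "u"))
    print("t -> h:", follower_probability(table, "t", "h"))
    print("t -> a:", follower_probability(table, "t", "a"))
```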
I wrote a program in C about 20 years ago to simulate the monkeys. I wrote it again in VFP, and found it surprisingly simple to write.
This program can be used on other text, such as programs, or different languages, such as German.
Try experimenting with different pattern lengths by changing nPatLen.
WarAndPeace.Txt can be found at
http://www.calvinhsia.com/vfp/monkeys.asp

CLEAR ALL
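CLEAR
nPatLen=18    && pattern length, in characters
* One row per distinct pattern, with the number of times it occurs
CREATE CURSOR letters (chrs c(nPatLen), cnt i)
INDEX ON chrs TAG chrs
fd=FOPEN("warandpeace.txt")
IF fd<0
    ?"err opening file"
    RETURN
ENDIF
cPrior=""
nCnt=0
* Pass 1: slide an nPatLen-character window over the text and count each pattern
DO WHILE !FEOF(fd) AND !CHRSAW() AND nCnt<100000
    ch=FREAD(fd,1)
    nAsc=ASC(LOWER(ch))
    * Keep only letters, spaces, carriage returns and line feeds
    IF (nAsc>96 AND nAsc<=122) OR nAsc=32 OR nAsc=13 OR nAsc=10
        nCnt=nCnt+1
        IF LEN(cPrior)>=nPatLen
            * Window is full: drop the oldest character, append the new one
            cPrior=SUBSTR(cPrior,2)+ch
            IF SEEK(cPrior)
                REPLACE cnt WITH cnt+1
            ELSE
                INSERT INTO letters VALUES (cPrior,1)
            ENDIF
        ELSE
            * Still filling the first window
            cPrior=cPrior+ch
        ENDIF
    ENDIF
ENDDO
FCLOSE(fd)
LOCATE
BROWSE LAST NOWAIT
ACTIVATE WINDOW (PROGRAM())
* Pass 2: start from a random pattern and generate text until a key is pressed
GO INT(RAND()*RECCOUNT()+1)
cPrior=chrs
DO WHILE !CHRSAW()
    cLast=SUBSTR(cPrior,2)
    * Total count of all patterns that begin with the last nPatLen-1 characters
    SEEK cLast
    SUM cnt WHILE chrs = cLast TO nCnt
    nr=INT(RAND()*nCnt+1)
    * Walk those patterns, subtracting counts, to pick one weighted by frequency
    SEEK cLast
    SCAN WHILE nr>0
        nr=nr-cnt
    ENDSCAN
    SKIP -1    && step back to the pattern that was chosen
    cNewLet=RIGHT(chrs,1)
    ??cNewLet
    cPrior=SUBSTR(cPrior,2)+cNewLet
ENDDO
INKEY()    && consume the keystroke that stopped the loop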
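
For readers without VFP, the same two passes can be approximated in Python. This is a rough sketch rather than a line-for-line port: the function names are invented for the example, it counts every window including the first, and it scans all patterns for each letter instead of using an index, so it is slow on a large text.

```python
import random
from collections import Counter

def count_patterns(text, pat_len):
    """Count every run of pat_len consecutive characters in text."""
    return Counter(text[i:i + pat_len] for i in range(len(text) - pat_len + 1))

def next_letter(patterns, prefix, rng):
    """Pick the letter that follows prefix, weighted by pattern counts."""
    candidates = [(p, c) for p, c in patterns.items() if p.startswith(prefix)]
    if not candidates:
        return None  # prefix was only seen at the very end of the text
    # Same idea as the SUM / SCAN pair: draw 1..total, then subtract counts.
    pick = rng.randrange(sum(c for _, c in candidates)) + 1
    for pattern, count in candidates:
        pick -= count
        if pick <= 0:
            return pattern[-1]

def generate(text, pat_len, n_chars, rng=None):
    """Generate up to n_chars characters that imitate text."""
    rng = rng or random.Random()
    patterns = count_patterns(text, pat_len)
    current = rng.choice(list(patterns))  # random starting pattern
    out = []
    for _ in range(n_chars):
        letter = next_letter(patterns, current[1:], rng)
        if letter is None:
            break
        out.append(letter)
        current = current[1:] + letter
    return "".join(out)

if __name__ == "__main__":
    # Hypothetical stand-in for the contents of warandpeace.txt.
    model_text = "the cat sat on the mat and the rat sat on the hat " * 3
    print(generate(model_text, 4, 80))
```

Grouping the patterns by their first pat_len-1 characters in a dictionary would avoid the repeated scan, much as the index on chrs does in the VFP version.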