6 August 2010


A sends:

If you run the following commands on a Windows machine with Cygwin installed (the commands assume it is in c:\cygwin), you can produce a list of all words in the Wikileaks Afghan War Diary AFG.CSV file, each with the number of times it occurs. You can also produce the same list ordered by frequency.

http://cryptome.org/0002/afg/afg_list.txt.zip (2.9MB)

http://cryptome.org/0002/afg/afg_freqlist.txt.zip (2.8MB)

__________

Commands

REM This is a Windows batch file that runs a sequence of Cygwin Unix utilities.
REM It makes a list of all words in the Afghan War Diary CSV file, with frequencies.
REM Replace whitespace and punctuation with newlines, leaving one word per line.
REM The set is quoted so Cygwin does not treat the brackets as a filename wildcard.
c:\cygwin\bin\tr "[:space:][:punct:]" \n < afg.csv > afg.tr
REM Sort alphabetically, ignoring case.
c:\cygwin\bin\sort -f -b -d < afg.tr > afg.srt
REM Collapse identical adjacent lines; -c prefixes each word with its count.
c:\cygwin\bin\uniq -c < afg.srt > afglist.txt
REM List by frequency; -n sorts on the leading count numerically, lowest first.
c:\cygwin\bin\sort -n < afglist.txt > afgfreq.txt
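
For readers working in a Cygwin or other Unix shell rather than cmd.exe, the same steps can be sketched as a pipeline with no intermediate files. This is only an illustration, not part of the submission: the function names word_counts and word_freq are made up here, GNU versions of tr, sort and uniq are assumed, and unlike the batch file the sketch squeezes runs of delimiters so that no empty lines are counted.

```shell
#!/bin/sh
# Sketch only: word_counts and word_freq are hypothetical helper names.
# Assumes GNU tr, sort, uniq and grep, as shipped with Cygwin.

# Print every distinct word in file $1, prefixed by its count,
# in case-insensitive alphabetical order.
word_counts() {
    LC_ALL=C tr -s '[:space:][:punct:]' '\n' < "$1" |  # one word per line, runs squeezed
        grep -v '^$' |                                  # drop a possible leading empty line
        LC_ALL=C sort -f |                              # fold case; identical words end up adjacent
        uniq -c                                         # count each run of identical lines
}

# Same list, ordered by ascending count.
word_freq() {
    word_counts "$1" | sort -n
}

# Usage (from a Cygwin shell):
#   word_counts afg.csv > afglist.txt
#   word_freq   afg.csv > afgfreq.txt
```

Because uniq compares lines exactly, words that differ only in capitalization still get separate rows, as they do with the batch file.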

Sample output (afglist.txt, alphabetical):

     17 PAID
      4 Paid
    555 paid
      1 paided
      1 Paien
      2 Paienda
      2 paient
      1 PAIL
     23 pail
      1 PAILS
      1 Pails
      5 pails
      2 PAIMAKTHU
      2 Paiman
      1 PAIMONAR
    103 PAIN
     53 Pain
    469 pain
      1 PAINBAGH
      1 Painda
      2 Paindah
      1 Paindai
      1 Paindakhel
      2 PAINFUL
      1 Painful
      9 painful
     73 PAINKILLER
      7 Painkiller
      3 painkiller
      1 painkillers
      1 PAINOP
     14 PAINS
      4 Pains
     47 pains
      1 PAINSTAKINGLY
      1 painstakingly
      9 PAINT
     10 Paint
    100 paint
      6 PAINTED
      1 Painted
     43 painted
      2 painter
      2 painters
      1 Painting
     25 painting
      2 paintings
      1 paints
      1 painx7days
     14 PAIR
     63 Pair
     94 pair
      7 paired
      9 pairing
      2 Pairouz
     12 Pairs
     65 pairs
      2 PAITENT
      2 Paitent
      1 paitent
      1 paitent1
      2 PAITENTS
      1 paitents
      1 paitient
      1 Paiwar
      1 Paiyan
      2 Paj
      1 PAJA
      4 Pajak
      3 pajamas
      2 Pajan
      1 Pajero
      1 Pajhowk
      1 Pajhwak
     13 PAJHWOK
     60 Pajhwok
      4 Pajwahlk
      1 Pajwaye
      2 Pajwok
    395 PAK
      2 PAk
     56 Pak
      5 pak
      5 PAK1
      1 PAK10
      2 PAK5
      2 PAK6
      2 PAK7
      1 paka
      1 PAKAISTAN
      3 Pakastan
      2 PAKASTANI
      1 Pakastani
      1 Pakatya
      1 PAKEL
      1 Pakela
      2 PAKGOV
      1 Pakh
      3 Pakhta
     11 PAKI
     14 Paki
      1 Pakika
      1 PAKIML
      2 Pakiran
      1 PAKIs
      4 Pakis
      1 Pakisatn
      2 PAKISATNI
    280 PAKISTAN
   1608 Pakistan
      7 pakistan
      1 PAKISTANFIRE
     81 PAKISTANI
    436 Pakistani
      8 pakistani
      1 Pakistanies
      7 PAKISTANIS
     28 Pakistanis
      9 Pakistans
      1 Pakistanwhere
      1 Pakita
      1 Pakitani
      2 PAKITIKA
     10 PAKITKA
      9 Pakitka
      1 pakitstan
      1 Pakko
      1 PAKKTIKA
    985 PAKMIL
     20 PakMil
     46 Pakmil
      2 pakmil
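
The sample lists PAID, Paid and paid as three separate rows because uniq -c compares lines exactly. If one combined count per word is preferred, a possible variation is to lowercase everything before sorting. The sketch below is an assumption-laden illustration, not the submitter's method: word_counts_nocase is a hypothetical name, and GNU tr, sort and uniq are assumed.

```shell
#!/bin/sh
# Sketch only: word_counts_nocase is a hypothetical helper name.
# Assumes GNU tr, sort, uniq and grep, as shipped with Cygwin.

# Print every distinct word in file $1, lowercased, prefixed by its count.
word_counts_nocase() {
    LC_ALL=C tr -s '[:space:][:punct:]' '\n' < "$1" |  # one word per line
        LC_ALL=C tr '[:upper:]' '[:lower:]' |           # fold ASCII letters to lowercase
        grep -v '^$' |                                  # drop a possible leading empty line
        LC_ALL=C sort |                                 # group identical words together
        uniq -c                                         # count each group
}

# Usage (from a Cygwin shell):
#   word_counts_nocase afg.csv > afglist_nocase.txt
```

With this variation the three "paid" rows in the sample (17, 4 and 555) would collapse into a single row with a count of 576. The output file name in the usage comment is only an example.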


By frequency (end of afgfreq.txt, lowest count first, so the most frequent words come last)

  27610 OF
  27610 UPDATE
  27776 FRIEND
  28001 PRT
  28207 07
  28404 No
  28447 12
  29097 2009
  29113 Enemy
  29398 ANA
  29768 AND
  30763 None
  31141 FOB
  31410 SAF
  32054 by
  32975 SOUTH
  33550 TO
  33559 10
  33719 x
  33859 Event
  34588 DAM
  35262 ISAF
  35648 INJ
  37022 FALSE
  37426 MANAGER
  38022 an
  39048 S
  39199 ANP
  39947 EAST
  41243 SIGACTS
  42246 RPT
  42423 Action
  42482 for
  43946 were
  44320 is
  44425 CF
  45493 FF
  49040 THE
  49914 3
  49939 At
  50251 from
  52726 RED
  53697 2007
  54048 on
  58090 The
  61747 that
  71269 reported
  73062 UNKNOWN
  73315 SECRET
  77686 IED
  77998 with
  80351 was
  81760 2
  84327 at
  89211 in
  91684 00
  93430 1
  97215 RC
 103609 A
 114819 a
 122292 ENEMY
 134112 TF
 144324 of
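
Since the frequency list is in ascending order, the most common words sit at the very end of the file. A small sketch for pulling out the top rows in descending order follows; top_words is a hypothetical helper name, and its input is assumed to be a count-prefixed file in the format uniq -c produces, such as afglist.txt or afgfreq.txt.

```shell
#!/bin/sh
# Sketch only: top_words is a hypothetical helper name.

# Print the $2 highest-count rows (default 10) of count file $1,
# highest count first.
top_words() {
    sort -rn "$1" | head -n "${2:-10}"   # -r reverses the numeric order
}

# Usage (from a Cygwin shell):
#   top_words afgfreq.txt 20
```

Going by the listing above, the first three rows printed for afgfreq.txt should be of, TF and ENEMY.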

__________