ADHFAIS APP{ENDIX} 1: NgramReader, 2023 (OpenITI, Release ver. 2023.1.8)

General Instructions

Like Google Ngram Viewer (https://books.google.com/ngrams), [OpenITI] NgramReader charts diachronic frequencies of words and phrases, using the data of the OpenITI corpus. Unlike Google Ngram Viewer, however, it allows one to combine different morphological forms of the same lexical item, as well as to explore classes of objects. Why combine forms? Arabic morphology is complex, and the same word can appear in a large variety of forms: for example, kitāb, al-kitāb, wa-kitāb, and wa-l-kitāb are instances of the same lemma, and one might want to combine all or only some of these forms into a single entity. This approach also allows one to create thematic clusters of words (or classes). For example, one can combine Baġdād and Madīnaŧ al-salām in order to get all mentions of the ʿAbbāsid capital, or combine all the cities of Ḫurāsān in order to gauge the frequency of references to Ḫurāsān in general.

SEARCHES. The syntax for searches is as follows: #ItemForTheLegend #FirstSearchItem #SecondSearchItem #NthSearchItem; that is, each item must begin with #. #ItemForTheLegend is not a search item but the string that you want to show in the legend of the graph. Thus, if you were to look for mentions of Baġdād, your search line would look something like this: #Baġdād #bgdAd #bbgdAd #wbgdAd #wbbgdAd. Try searching for these tokens in one line and in separate lines to see the difference.

Regular expressions. The search line also supports regular expressions, which make searches simpler and more robust. For example, #Baġdād #bgdAd #bbgdAd #wbgdAd #wbbgdAd can also be written more concisely as #Baġdād #[wb]?bgdAd.

Simplified Buckwalter transliteration is used in the NgramReader: ء = c, ا = A, إ = A, أ = A, آ = A, ب = b, ة = o, ت = t, ث = v, ج = j, ح = H, خ = x, د = d, ذ = V, ر = r, ز = z, س = s, ش = E, ص = S, ض = D, ط = T, ظ = Z, ع = C, غ = g, ف = f, ق = q, ك = k, ل = l, م = m, ن = n, ه = h, ؤ = c, و = w, ى = y, ئ = c, ي = y.
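The search-line syntax can be illustrated with a minimal Python sketch. The function names parse_search_line and matching_tokens are hypothetical, and the sketch assumes that each search item is treated as a regular expression that must match a whole token; how the NgramReader itself anchors its matches is not stated here, so this is an illustration rather than a description of its implementation.

```python
import re

def parse_search_line(line):
    """Split '#Legend #item1 #item2' into (legend, [search items])."""
    parts = [p.strip() for p in line.split("#") if p.strip()]
    if len(parts) < 2:
        raise ValueError("expected a legend item and at least one search item")
    return parts[0], parts[1:]

def matching_tokens(items, tokens):
    """Return the tokens matched in full by at least one search item.

    Assumption: every item is a regex applied to the whole token.
    """
    patterns = [re.compile(item) for item in items]
    return [t for t in tokens if any(p.fullmatch(t) for p in patterns)]

# A toy token list standing in for the corpus vocabulary.
tokens = ["bgdAd", "bbgdAd", "wbgdAd", "wbbgdAd", "ktAb"]

legend, items = parse_search_line("#Baġdād #bgdAd #bbgdAd #wbgdAd #wbbgdAd")
explicit = matching_tokens(items, tokens)

_, short_items = parse_search_line("#Baġdād #[wb]?bgdAd")
short = matching_tokens(short_items, tokens)

_, stacked_items = parse_search_line("#Baġdād #w?b?bgdAd")
stacked = matching_tokens(stacked_items, tokens)

print(legend, explicit, short, stacked, sep="\n")
```

Under the whole-token assumption, [wb]?bgdAd allows only one optional prefix letter and therefore does not cover the doubly prefixed wbbgdAd, whereas w?b?bgdAd covers all four forms. It may be worth testing both variants in the NgramReader to see which behaviour applies.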

FILENAME PREFIX. You can use this option to automatically assign a specific prefix to the results that you may want to download. You can download data for the main search results, graphs as well as data for all the summaries that are generated for each search.

TYPES OF NGRAMS. You can search for unigrams (1), bigrams (2), or trigrams (3). Make sure to select the appropriate ngram type. By default, unigrams are activated. If you search for bigrams or trigrams, use underscores “_” instead of spaces; for example, kataba ilay-hi should be transliterated as ktb_Alyh. Note that you can only search for one type of ngram at a time. In most cases, it does not make sense to combine ngrams of different lengths in the same search, since the frequencies of unigrams are usually significantly higher than those of bigrams, and the frequencies of bigrams are usually significantly higher than those of trigrams.
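As an illustration, the following Python sketch converts an Arabic word or phrase into a search item using the simplified Buckwalter table given in these instructions. The function names transliterate and to_search_item are hypothetical, and silently dropping characters that are not in the table (such as vowel marks) is an assumption of this sketch, not documented behaviour of the NgramReader.

```python
# Simplified Buckwalter table, copied from the instructions.
TRANSLIT = {
    "ء": "c", "ا": "A", "إ": "A", "أ": "A", "آ": "A", "ب": "b",
    "ة": "o", "ت": "t", "ث": "v", "ج": "j", "ح": "H", "خ": "x",
    "د": "d", "ذ": "V", "ر": "r", "ز": "z", "س": "s", "ش": "E",
    "ص": "S", "ض": "D", "ط": "T", "ظ": "Z", "ع": "C", "غ": "g",
    "ف": "f", "ق": "q", "ك": "k", "ل": "l", "م": "m", "ن": "n",
    "ه": "h", "ؤ": "c", "و": "w", "ى": "y", "ئ": "c", "ي": "y",
}

def transliterate(word):
    """Map each Arabic letter to its Latin code.

    Assumption: characters outside the table (e.g. vowel marks) are dropped.
    """
    return "".join(TRANSLIT[ch] for ch in word if ch in TRANSLIT)

def to_search_item(phrase):
    """Return (search item, ngram type) for a phrase of one to three words."""
    words = phrase.split()
    if not 1 <= len(words) <= 3:
        raise ValueError("only unigrams, bigrams, and trigrams are supported")
    # Words of a bigram or trigram are joined with underscores, not spaces.
    return "_".join(transliterate(w) for w in words), len(words)

print(to_search_item("بغداد"))
print(to_search_item("كتب إليه"))
```

The second value returned is the ngram type (1, 2, or 3) that would need to be selected for the search.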

Graphs of Relative and Absolute Frequencies of Ngrams


The graph below will be shown only if you have searched for a single Ngram Group.

The graph below will be shown only if you have searched for a single Ngram Group. This graph might look better if you used a very long regular expression, since the expression will not be included in the legend of the graph.

Distribution of Ngrams over Book Types: Radar Graph

Distribution of Ngrams over Book Types: Chronological perspective

Distribution of Ngrams over Book Types

Books that score less than 0.5 (i.e. 50%) in every type (“genre”) are considered “undefined”, since none of the currently modeled types has a dominant score.


Results by Centuries (for each individual ngram)

Only ngrams that occur at least five times in the corpus are included.


Results by Authors


Results by Books

compsPelim shows the highest scoring type (“genre”). The score itself is given in compsValue; compsFinal shows the dominant type, unless the compsValue is smaller than 0.5 (i.e. 50%), in which case the type of the book is considered “undefined”.
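The rule described above can be sketched in a few lines of Python. The function name classify_book and the type names in the example are hypothetical; only the column names and the 0.5 threshold come from the description.

```python
def classify_book(scores, threshold=0.5):
    """Derive the three columns from a book's per-type scores.

    `scores` maps a type ("genre") name to a score between 0 and 1.
    """
    prelim = max(scores, key=scores.get)  # highest scoring type
    value = scores[prelim]                # its score
    # The type is dominant unless its score is smaller than the threshold.
    final = prelim if value >= threshold else "undefined"
    return {"compsPelim": prelim, "compsValue": value, "compsFinal": final}

# Hypothetical type names and scores, for illustration only.
print(classify_book({"history": 0.62, "poetry": 0.30}))
print(classify_book({"history": 0.45, "poetry": 0.40}))
```

In the first call the highest score passes the threshold, so compsFinal repeats compsPelim; in the second, no type reaches 0.5 and the book is labelled “undefined”.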
