
keyness

Functionality for keyness analysis.

source

Keyness

 Keyness (corpus:conc.corpus.Corpus, reference_corpus:conc.corpus.Corpus)

Class for keyness analysis reporting.

|                  | Type   | Details               |
|------------------|--------|-----------------------|
| corpus           | Corpus | Corpus instance       |
| reference_corpus | Corpus | Corpus for comparison |
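
A minimal instantiation sketch; the import paths are assumed from the signature above and this module's name, and the corpus paths are placeholders (a fuller example appears under Keyness.keywords below):

```python
from conc.corpus import Corpus    # per the signature: conc.corpus.Corpus
from conc.keyness import Keyness  # assumed: this module

# load previously built corpora from disk (placeholder paths)
target = Corpus().load('path/to/target_corpus')
reference = Corpus().load('path/to/reference_corpus')

keyness = Keyness(corpus=target, reference_corpus=reference)
```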

source

Keyness.keywords

 Keyness.keywords (effect_size_measure:str='log_ratio',
                   statistical_significance_measure:str='log_likelihood',
                   order:str|None=None, order_descending:bool=True,
                   statistical_significance_cut:float|None=None,
                   apply_bonferroni:bool=False,
                   min_document_frequency:int=0,
                   min_document_frequency_reference:int=0,
                   min_frequency:int=0, min_frequency_reference:int=0,
                   case_sensitive:bool=False, normalize_by:int=10000,
                   page_size:int=20, page_current:int=1,
                   show_document_frequency:bool=False,
                   exclude_tokens:list[str]=[],
                   exclude_tokens_text:str='',
                   restrict_tokens:list[str]=[],
                   restrict_tokens_text:str='',
                   exclude_punctuation:bool=True)

Get keywords for the corpus.

|                                  | Type         | Default        | Details |
|----------------------------------|--------------|----------------|---------|
| effect_size_measure              | str          | log_ratio      | effect size measure to use; currently only 'log_ratio' is supported |
| statistical_significance_measure | str          | log_likelihood | statistical significance measure to use; currently only 'log_likelihood' is supported |
| order                            | str \| None  | None           | default of None orders by effect size measure; results can also be ordered by: frequency, frequency_reference, document_frequency, document_frequency_reference, log_likelihood |
| order_descending                 | bool         | True           | whether ordering is descending or ascending |
| statistical_significance_cut     | float \| None | None          | statistical significance p-value to filter results, e.g. 0.05, 0.01 or 0.001; ignored if None or 0 |
| apply_bonferroni                 | bool         | False          | apply Bonferroni correction to the statistical significance cut-off |
| min_document_frequency           | int          | 0              | minimum document frequency in target corpus for a token to be included in the report |
| min_document_frequency_reference | int          | 0              | minimum document frequency in reference corpus for a token to be included in the report |
| min_frequency                    | int          | 0              | minimum frequency in target corpus for a token to be included in the report |
| min_frequency_reference          | int          | 0              | minimum frequency in reference corpus for a token to be included in the report |
| case_sensitive                   | bool         | False          | frequencies for tokens with or without case preserved |
| normalize_by                     | int          | 10000          | normalize frequencies by a number (e.g. 10000) |
| page_size                        | int          | 20             | number of rows to return; if 0, returns all |
| page_current                     | int          | 1              | current page; ignored if page_size is 0 |
| show_document_frequency          | bool         | False          | show document frequency in output |
| exclude_tokens                   | list         | []             | exclude specific tokens from report results |
| exclude_tokens_text              | str          | ''             | text explaining which tokens have been excluded, added to the report notes |
| restrict_tokens                  | list         | []             | restrict report to return results for a list of specific tokens |
| restrict_tokens_text             | str          | ''             | text explaining which tokens are included, added to the report notes |
| exclude_punctuation              | bool         | True           | exclude punctuation tokens |
| Returns                          | Result       |                | a Result object with the keywords table |
```python
from conc.corpus import Corpus
from conc.keyness import Keyness  # import paths assumed from the signature above and this module

# load the target corpus
gardenparty = Corpus().load(path_to_gardenparty_corpus)
# load the reference corpus
brown = Corpus().load(path_to_brown_corpus)
# instantiate the Keyness class
keyness = Keyness(corpus=gardenparty, reference_corpus=brown)
# generate and display the keywords report
keyness.keywords(show_document_frequency=True, min_document_frequency_reference=5,
                 statistical_significance_cut=0.0001, apply_bonferroni=True,
                 order_descending=True, page_current=1).display()
```
Keywords
Target corpus: Garden Party Corpus, Reference corpus: Brown Corpus
| Rank | Token        | Frequency | Frequency Reference | Document Frequency | Document Frequency Reference | Normalized Frequency | Normalized Frequency Reference | Relative Risk | Log Ratio | Log Likelihood |
|------|--------------|-----------|---------------------|--------------------|------------------------------|----------------------|--------------------------------|---------------|-----------|----------------|
| 1    | laura        | 86        | 14                  | 2                  | 6                            | 13.58                | 0.14                           | 95.10         | 6.57      | 402.74         |
| 2    | jug          | 30        | 6                   | 2                  | 5                            | 4.74                 | 0.06                           | 77.41         | 6.27      | 136.44         |
| 3    | grandma      | 73        | 15                  | 2                  | 5                            | 11.53                | 0.15                           | 75.34         | 6.24      | 330.64         |
| 4    | meadows      | 33        | 7                   | 1                  | 5                            | 5.21                 | 0.07                           | 72.98         | 6.19      | 148.73         |
| 5    | con          | 27        | 7                   | 1                  | 5                            | 4.26                 | 0.07                           | 59.71         | 5.90      | 117.62         |
| 6    | bye          | 25        | 7                   | 9                  | 7                            | 3.95                 | 0.07                           | 55.29         | 5.79      | 107.37         |
| 7    | velvet       | 14        | 5                   | 6                  | 5                            | 2.21                 | 0.05                           | 43.35         | 5.44      | 57.19          |
| 8    | shone        | 13        | 5                   | 7                  | 5                            | 2.05                 | 0.05                           | 40.25         | 5.33      | 52.21          |
| 9    | queer        | 15        | 6                   | 5                  | 6                            | 2.37                 | 0.06                           | 38.70         | 5.27      | 59.69          |
| 10   | gloves       | 17        | 7                   | 7                  | 5                            | 2.69                 | 0.07                           | 37.60         | 5.23      | 67.18          |
| 11   | cried        | 59        | 26                  | 12                 | 23                           | 9.32                 | 0.27                           | 35.13         | 5.13      | 229.24         |
| 12   | faintly      | 14        | 7                   | 7                  | 6                            | 2.21                 | 0.07                           | 30.96         | 4.95      | 52.61          |
| 13   | darling      | 36        | 18                  | 8                  | 13                           | 5.69                 | 0.18                           | 30.96         | 4.95      | 135.27         |
| 14   | sandy        | 11        | 6                   | 3                  | 6                            | 1.74                 | 0.06                           | 28.38         | 4.83      | 40.33          |
| 15   | alice        | 21        | 13                  | 2                  | 6                            | 3.32                 | 0.13                           | 25.01         | 4.64      | 74.09          |
| 16   | oh           | 149       | 93                  | 15                 | 62                           | 23.53                | 0.95                           | 24.80         | 4.63      | 524.30         |
| 17   | handkerchief | 14        | 9                   | 8                  | 6                            | 2.21                 | 0.09                           | 24.08         | 4.59      | 48.80          |
| 18   | charlotte    | 22        | 15                  | 1                  | 5                            | 3.47                 | 0.15                           | 22.71         | 4.51      | 75.22          |
| 19   | dear         | 78        | 54                  | 13                 | 36                           | 12.32                | 0.55                           | 22.36         | 4.48      | 265.31         |
| 20   | breathed     | 13        | 9                   | 7                  | 9                            | 2.05                 | 0.09                           | 22.36         | 4.48      | 44.22          |
Report based on word tokens
Filtered tokens by minimum document frequency in reference corpus (5)
Keywords filtered based on p-value 0.0001 with Bonferroni correction (based on 3378 tests)
Normalized Frequency is per 10,000 tokens
Total word tokens in target corpus: 63,311
Total word tokens in reference corpus: 980,144
Keywords: 243
Showing 20 rows
Page 1 of 13
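
As a cross-check on the reported measures, the sketch below recomputes Relative Risk, Log Ratio and Log Likelihood for the top keyword ('laura') from the raw counts in the report. The formulas are inferred from the report's own numbers rather than taken from conc's source: Log Ratio as the base-2 log of the ratio of normalized frequencies, and Log Likelihood as the two-term G2 statistic. It also shows the effect of the Bonferroni correction noted above (p-value cut of 0.0001 over 3,378 tests).

```python
import math
from scipy.stats import chi2

# raw counts for 'laura' from the report above
freq, freq_ref = 86, 14               # frequency in target / reference corpus
tokens, tokens_ref = 63_311, 980_144  # total word tokens in each corpus

# normalized frequency per 10,000 tokens (report: 13.58 vs 0.14)
nf = freq / tokens * 10_000
nf_ref = freq_ref / tokens_ref * 10_000

relative_risk = nf / nf_ref           # ~95.10
log_ratio = math.log2(relative_risk)  # ~6.57

# two-term log-likelihood (G2): observed counts vs the counts expected
# if both corpora shared the same rate for the token
expected = tokens * (freq + freq_ref) / (tokens + tokens_ref)
expected_ref = tokens_ref * (freq + freq_ref) / (tokens + tokens_ref)
log_likelihood = 2 * (freq * math.log(freq / expected)
                      + freq_ref * math.log(freq_ref / expected_ref))  # ~402.74

# Bonferroni-corrected cut: 0.0001 spread over the 3,378 tests, expressed
# as a critical G2 value (chi-squared with 1 degree of freedom)
alpha = 0.0001 / 3378                 # ~2.96e-08
critical_g2 = chi2.isf(alpha, df=1)   # a keyword's G2 must exceed this
```

Only tokens whose G2 clears this corrected threshold survive the cut, which is how the report above arrives at 243 keywords. Further pages of the report can be viewed by re-running keywords() with page_current=2 and so on.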