conc
collocates

Functionality for collocation analysis.

Collocates

 Collocates (corpus:conc.corpus.Corpus)

Class for collocation analysis reporting.

| Parameter | Type | Details |
|---|---|---|
| corpus | Corpus | Corpus instance |

Collocates.collocates

 Collocates.collocates (token_str:str, effect_size_measure:str='logdice',
                        statistical_significance_measure:str='log_likelihood',
                        order:str|None=None, order_descending:bool=True,
                        statistical_significance_cut:float|None=None,
                        apply_bonferroni:bool=False,
                        context_length:int|tuple[int,int]=5,
                        min_collocate_frequency:int=5, page_size:int=20,
                        page_current:int=1, exclude_punctuation:bool=True)

Report collocates for a given token string.

| Parameter | Type | Default | Details |
|---|---|---|---|
| token_str | str | | Token to search for |
| effect_size_measure | str | logdice | Effect size measure used to score collocations: logdice or mutual_information |
| statistical_significance_measure | str | log_likelihood | Statistical significance measure; currently only 'log_likelihood' is supported |
| order | str \| None | None | None (the default) orders by the collocation measure; results can also be ordered by collocate_frequency, frequency or log_likelihood |
| order_descending | bool | True | If True, order is descending; if False, ascending |
| statistical_significance_cut | float \| None | None | p-value cut-off used to filter results, e.g. 0.05, 0.01 or 0.001; ignored if None or 0 |
| apply_bonferroni | bool | False | Apply Bonferroni correction to the statistical significance cut-off |
| context_length | int \| tuple[int, int] | 5 | Window size per side in tokens. An int (e.g. 5) uses the same context length on the left and right; for independent control pass a tuple (context_length_left, context_length_right), e.g. (0, 5) |
| min_collocate_frequency | int | 5 | Minimum collocate frequency required for a token to be reported |
| page_size | int | 20 | Number of rows to return; 0 returns all |
| page_current | int | 1 | Current page; ignored if page_size is 0 |
| exclude_punctuation | bool | True | Exclude punctuation tokens |
| Returns | Result | | |

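To make the roles of context_length and min_collocate_frequency concrete, here is a minimal, self-contained sketch of window-based collocate counting. It is not Conc's implementation: the helper names normalize_context and count_collocates are hypothetical, the corpus is a plain Python list of tokens, and details such as how overlapping windows are handled may well differ in the library.

```python
from collections import Counter


def normalize_context(context_length):
    """Return (left, right) window sizes from an int or a 2-tuple."""
    if isinstance(context_length, int):
        return context_length, context_length
    left, right = context_length
    return left, right


def count_collocates(tokens, node, context_length=5, min_collocate_frequency=5):
    """Count tokens that occur within the context window around each `node` hit."""
    left, right = normalize_context(context_length)
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - left)
        counts.update(tokens[start:i])               # left context
        counts.update(tokens[i + 1:i + 1 + right])   # right context
    # Drop collocates that co-occur fewer times than the minimum.
    return {t: c for t, c in counts.items() if c >= min_collocate_frequency}


tokens = "measures to boost the economy and stimulate the economy".split()

# Left context only: two tokens before each occurrence of "economy".
left_only = count_collocates(tokens, "economy", context_length=(2, 0),
                             min_collocate_frequency=1)
print(left_only)

# Symmetric window of one token, keeping collocates seen at least twice.
symmetric = count_collocates(tokens, "economy", context_length=1,
                             min_collocate_frequency=2)
print(symmetric)
```

Passing a tuple such as (2, 0) restricts counting to one side of the node, which mirrors the (0, 5) example in the parameter description.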
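collocates = Collocates(reuters)  # reuters: a Corpus instance loaded earlier
for word in ["economy"]:
    # Body reconstructed from the report footer (p-value 0.0001, Bonferroni); .display() is assumed
    collocates.collocates(word, statistical_significance_cut=0.0001, apply_bonferroni=True).display()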
Collocates of "economy"
Reuters Corpus
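
| Rank | Token | Collocate Frequency | Frequency | Logdice | Log Likelihood |
|---|---|---|---|---|---|
| 1 | stimulate | 29 | 85 | 10.39 | 206.37 |
| 2 | boost | 20 | 222 | 9.60 | 84.59 |
| 3 | japanese | 35 | 944 | 9.52 | 88.82 |
| 4 | domestic | 27 | 700 | 9.39 | 70.45 |
| 5 | german | 23 | 537 | 9.35 | 64.41 |
| 6 | world | 35 | 1,173 | 9.32 | 75.37 |
| 7 | grew | 12 | 103 | 9.09 | 57.00 |
| 8 | sluggish | 10 | 44 | 8.94 | 61.75 |
| 9 | economy | 18 | 621 | 8.89 | 195.51 |
| 10 | measures | 13 | 288 | 8.87 | 37.66 |
| 11 | sectors | 10 | 89 | 8.85 | 46.76 |
| 12 | performance | 11 | 165 | 8.84 | 40.00 |
| 13 | signs | 10 | 107 | 8.81 | 43.03 |
| 14 | economists | 12 | 325 | 8.70 | 30.36 |
| 15 | impact | 11 | 249 | 8.69 | 31.43 |
| 16 | west | 20 | 964 | 8.69 | 30.92 |
| 17 | strength | 9 | 95 | 8.69 | 38.97 |
| 18 | good | 12 | 361 | 8.65 | 28.11 |
| 19 | shows | 8 | 65 | 8.58 | 38.90 |
| 20 | u.s. | 70 | 5,496 | 8.55 | 57.99 |
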
Report based on word tokens
Context tokens left: 5, context tokens right: 5
Filtered tokens by minimum collocation frequency (5)
Keywords filtered based on p-value 0.0001 with Bonferroni correction (based on 204 tests)
Unique collocates: 34
Showing 20 rows
Page 1 of 2
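
The footer reports a p-value of 0.0001 with Bonferroni correction based on 204 tests. The sketch below shows how such a cut-off could be applied. It assumes, and this is not confirmed by the page, that log-likelihood scores are converted to p-values using a chi-square distribution with 1 degree of freedom, which is the usual convention for this statistic. The function names are hypothetical and only the standard library is used.

```python
import math


def chi2_1df_p_value(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


def bonferroni_cut(p_value, n_tests):
    """Divide the significance cut-off by the number of tests performed."""
    return p_value / n_tests


# Values taken from the report footer: p-value 0.0001, 204 tests.
cut = bonferroni_cut(0.0001, 204)
print(f"adjusted cut-off: {cut:.2e}")

# Smallest log-likelihood score on the page shown above (row "good").
weakest_score = 28.11
passes = chi2_1df_p_value(weakest_score) < cut
print("weakest row passes the adjusted cut-off:", passes)
```

Under these assumptions even the lowest log-likelihood score on the page falls below the adjusted cut-off, which is consistent with every listed row surviving the filter.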
CPU times: user 82.1 ms, sys: 165 ms, total: 247 ms
Wall time: 159 ms
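
The Logdice column can be checked by hand with the standard logDice formula, 14 + log2(2 * f_xy / (f_x + f_y)). The sketch below assumes that the node frequency f_x for "economy" is the 621 shown in the Frequency column of the "economy" row, and that Conc follows the standard formula; neither is stated on this page, although the computed values agree with the table to two decimal places.

```python
import math


def log_dice(cooccurrence, node_frequency, collocate_frequency):
    """Standard logDice score: 14 + log2 of the Dice coefficient."""
    dice = 2 * cooccurrence / (node_frequency + collocate_frequency)
    return 14 + math.log2(dice)


# Assumption: frequency of "economy" itself, read from row 9 of the table.
NODE_FREQUENCY = 621

# (Collocate Frequency, Frequency) pairs copied from the table above.
rows = {
    "stimulate": (29, 85),
    "boost": (20, 222),
    "u.s.": (70, 5496),
}

scores = {
    token: log_dice(cofreq, NODE_FREQUENCY, freq)
    for token, (cofreq, freq) in rows.items()
}
for token, score in scores.items():
    print(f"{token}: {score:.2f}")
```

Because the score depends on the combined frequency of node and collocate, a very frequent token such as "u.s." ranks below "stimulate" despite co-occurring with "economy" more often.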