collocates = Collocates(reuters)
collocates
Functionality for collocation analysis.
About Conc’s Collocates functionality
Conc implements logDice as introduced in Rychlý’s (2008) paper “A Lexicographer-Friendly Association Score”. Conc’s implementation of Mutual Information is based on the formula in Rychlý’s paper.
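The two measures can be sketched in a few lines of Python. This is a minimal illustration of the formulas as given in Rychlý (2008), not Conc's own code: the function and parameter names are hypothetical, and whether Conc adjusts the Mutual Information expectation for window size is not stated here. The figures in the usage lines are read from the "stimulate" row of the example table further down this page, with 621 assumed to be the frequency of "economy" (taken from its own row).

```python
import math

def logdice(cooccurrence_frequency, node_frequency, collocate_frequency):
    """logDice = 14 + log2(2 * f_xy / (f_x + f_y)), following Rychlý (2008)."""
    dice = 2 * cooccurrence_frequency / (node_frequency + collocate_frequency)
    return 14 + math.log2(dice)

def mutual_information(cooccurrence_frequency, node_frequency,
                       collocate_frequency, corpus_size):
    """MI = log2(f_xy * N / (f_x * f_y)), following Rychlý (2008)."""
    return math.log2(cooccurrence_frequency * corpus_size
                     / (node_frequency * collocate_frequency))

# "stimulate" co-occurs with "economy" 29 times; assumed frequencies are
# 621 for "economy" and 85 for "stimulate" (read from the example table).
print(round(logdice(29, 621, 85), 2))  # 10.39

# The corpus size below is a made-up value for illustration only.
print(mutual_information(10, 100, 100, 1000))  # 0.0
```

Because logDice depends only on the three frequencies and not on corpus size, the first result can be compared directly with the Logdice column in the example report.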
Using the Collocates class
There are examples below showing how to use the Collocates class directly to output collocation tables. The recommended way to use this functionality is through the Conc class. This provides an interface to create frequency lists, concordances, collocation tables, keyword tables and more.
Collocates class API reference
Collocates
Collocates (corpus:conc.corpus.Corpus)
Class for collocation analysis reporting.
Parameter | Type | Details |
---|---|---|
corpus | Corpus | Corpus instance |
Collocates.collocates
Collocates.collocates (token_str:str, effect_size_measure:str='logdice', statistical_significance_measure:str='log_likelihood', order:str|None=None, order_descending:bool=True, statistical_significance_cut:float|None=None, apply_bonferroni:bool=False, context_length:int|tuple[int,int]=5, min_collocate_frequency:int=5, page_size:int=20, page_current:int=1, exclude_punctuation:bool=True)
Report collocates for a given token string.
Parameter | Type | Default | Details |
---|---|---|---|
token_str | str | | Token to search for |
effect_size_measure | str | logdice | Effect size measure used to score collocates: logdice or mutual_information |
statistical_significance_measure | str | log_likelihood | Statistical significance measure; currently only 'log_likelihood' is supported |
order | str \| None | None | Column to order results by; the default of None orders by the collocation measure. Other options: collocate_frequency, frequency, log_likelihood |
order_descending | bool | True | If True, order results descending; if False, ascending |
statistical_significance_cut | float \| None | None | p-value cut-off for filtering results, e.g. 0.05, 0.01 or 0.001; ignored if None or 0 |
apply_bonferroni | bool | False | Apply Bonferroni correction to the statistical significance cut-off |
context_length | int \| tuple[int, int] | 5 | Window size per side in tokens. An int (e.g. 5) sets the same context length on the left and right; for independent control pass a tuple (context_length_left, context_length_right), e.g. (0, 5) |
min_collocate_frequency | int | 5 | Minimum collocate frequency for a token to be included |
page_size | int | 20 | Number of rows to return; if 0, returns all |
page_current | int | 1 | Current page; ignored if page_size is 0 |
exclude_punctuation | bool | True | Exclude punctuation tokens |
Returns | Result | | |
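To make the context_length and min_collocate_frequency parameters concrete, here is a toy sketch of window-based collocate counting over a plain list of tokens. It is an illustration of the idea only, under the assumption that an int means a symmetric window and a tuple means (left, right) as described in the table; the function window_collocates is hypothetical and is not how Conc computes its results.

```python
from collections import Counter

def window_collocates(tokens, node, context_length=5, min_collocate_frequency=1):
    """Count tokens that fall within a window around each occurrence of node."""
    # An int gives the same window on both sides; a tuple is (left, right).
    if isinstance(context_length, int):
        left, right = context_length, context_length
    else:
        left, right = context_length

    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - left)
        end = min(len(tokens), i + right + 1)
        # Count the left and right context, skipping the node position itself.
        counts.update(tokens[start:i])
        counts.update(tokens[i + 1:end])

    # Drop collocates that occur fewer times than the minimum frequency.
    return {t: c for t, c in counts.items() if c >= min_collocate_frequency}

tokens = "the weak economy grew as the world economy recovered".split()

# Right context only: two tokens after each occurrence of "economy".
print(window_collocates(tokens, "economy", context_length=(0, 2)))

# Symmetric window of five tokens, keeping collocates seen at least 3 times.
print(window_collocates(tokens, "economy", context_length=5,
                        min_collocate_frequency=3))
```

In this sketch a second occurrence of the node inside the window is counted like any other token, which is consistent with "economy" appearing as a collocate of itself in the example report.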
Examples
See the note above about accessing this functionality through the Conc class.
query = 'economy'
collocates.collocates(query, order = None, order_descending = True, statistical_significance_cut = 0.0001, apply_bonferroni=True, effect_size_measure='logdice', context_length = 5, min_collocate_frequency = 5, page_current = 1).display()
Collocates of "economy"
Reuters Corpus
Rank | Token | Collocate Frequency | Frequency | Logdice | Log Likelihood |
---|---|---|---|---|---|
1 | stimulate | 29 | 85 | 10.39 | 206.37 |
2 | boost | 20 | 222 | 9.60 | 84.59 |
3 | japanese | 35 | 944 | 9.52 | 88.82 |
4 | domestic | 27 | 700 | 9.39 | 70.45 |
5 | german | 23 | 537 | 9.35 | 64.41 |
6 | world | 35 | 1,173 | 9.32 | 75.37 |
7 | grew | 12 | 103 | 9.09 | 57.00 |
8 | sluggish | 10 | 44 | 8.94 | 61.75 |
9 | economy | 18 | 621 | 8.89 | 195.51 |
10 | measures | 13 | 288 | 8.87 | 37.66 |
11 | sectors | 10 | 89 | 8.85 | 46.76 |
12 | performance | 11 | 165 | 8.84 | 40.00 |
13 | signs | 10 | 107 | 8.81 | 43.03 |
14 | economists | 12 | 325 | 8.70 | 30.36 |
15 | impact | 11 | 249 | 8.69 | 31.43 |
16 | west | 20 | 964 | 8.69 | 30.92 |
17 | strength | 9 | 95 | 8.69 | 38.97 |
18 | good | 12 | 361 | 8.65 | 28.11 |
19 | shows | 8 | 65 | 8.58 | 38.90 |
20 | u.s. | 70 | 5,496 | 8.55 | 57.99 |
Report based on word tokens
Context tokens left: 5, context tokens right: 5
Filtered tokens by minimum collocation frequency (5)
Keywords filtered based on p-value 0.0001 with Bonferroni correction (based on 204 tests)
Unique collocates: 34
Showing 20 rows
Page 1 of 2
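The report footer states that a p-value of 0.0001 was used with Bonferroni correction over 204 tests. The sketch below shows what that arithmetic looks like. It assumes the standard Bonferroni adjustment (dividing the cut-off by the number of tests) and assumes log-likelihood scores are compared against a chi-square distribution with one degree of freedom; whether Conc derives its p-values exactly this way is not confirmed here, and both function names are hypothetical.

```python
import math

def bonferroni_cut(p_value_cut, number_of_tests):
    """Standard Bonferroni adjustment: divide the cut-off by the test count."""
    return p_value_cut / number_of_tests

def log_likelihood_p_value(log_likelihood):
    """p-value of a score under chi-square with 1 degree of freedom.

    For one degree of freedom the survival function is erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(log_likelihood / 2))

adjusted_cut = bonferroni_cut(0.0001, 204)
print(f"{adjusted_cut:.2e}")  # 4.90e-07

# The lowest Log Likelihood on the first page of the report is 28.11 ("good").
print(log_likelihood_p_value(28.11) < adjusted_cut)  # True
```

Under these assumptions every row shown on the first page clears the adjusted cut-off, since a higher log-likelihood score gives a smaller p-value.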