Frequency

Frequency (corpus:conc.corpus.Corpus)

Class for frequency analysis reporting.

|        | Type   | Details         |
|--------|--------|-----------------|
| corpus | Corpus | Corpus instance |
Frequency.frequencies
Frequency.frequencies (case_sensitive:bool=False, normalize_by:int=10000,
page_size:int=20, page_current:int=1,
show_token_id:bool=False,
show_document_frequency:bool=False,
exclude_tokens:list[str]=[],
exclude_tokens_text:str='',
restrict_tokens:list[str]=[],
restrict_tokens_text:str='',
exclude_punctuation:bool=True)
Report frequent tokens.
|                         | Type   | Default | Details |
|-------------------------|--------|---------|---------|
| case_sensitive          | bool   | False   | report frequencies with or without case preserved |
| normalize_by            | int    | 10000   | normalize frequencies by a number (e.g. 10000) |
| page_size               | int    | 20      | number of rows to return; if 0, returns all |
| page_current            | int    | 1       | current page, ignored if page_size is 0 |
| show_token_id           | bool   | False   | show token_id in output |
| show_document_frequency | bool   | False   | show document frequency in output |
| exclude_tokens          | list   | []      | exclude specific tokens from the frequency report, e.g. to remove stopwords |
| exclude_tokens_text     | str    |         | text explaining which tokens have been excluded, added to the report notes |
| restrict_tokens         | list   | []      | restrict the frequency report to frequencies for a specific list of tokens |
| restrict_tokens_text    | str    |         | text explaining which tokens are included, added to the report notes |
| exclude_punctuation     | bool   | True    | exclude punctuation tokens |
| Returns                 | Result |         | a Result object with the frequency table |
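The normalized frequency reported by this method is a rate per `normalize_by` tokens, which makes counts comparable across corpora of different sizes. A minimal sketch of that arithmetic using `collections.Counter` (an illustration only, not conc's internal implementation):

```python
from collections import Counter

def normalized_frequencies(tokens, normalize_by=10000):
    """Count tokens and scale raw counts to a rate per normalize_by tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    # normalized frequency = raw count / total tokens * normalize_by
    return {tok: (n, n / total * normalize_by) for tok, n in counts.items()}

tokens = ["the", "cat", "sat", "on", "the", "mat"]
freqs = normalized_frequencies(tokens)
# "the" occurs 2 times in 6 tokens: 2 / 6 * 10000 ≈ 3333.33
```

On this scale, "the" in the Brown corpus example below (63,516 of 980,144 tokens) yields the 648.03 per-10,000 figure shown in the report.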
# load the corpus
brown = Corpus().load(path_to_brown_corpus)
# instantiate the Frequency class
freq_brown = Frequency(brown)
# run the frequencies method and display the results
freq_brown.frequencies(normalize_by=10000, page_size=20).display()
Frequencies of word tokens, Brown Corpus

| Rank | Token | Frequency | Normalized Frequency |
|------|-------|-----------|----------------------|
| 1    | the   | 63,516    | 648.03               |
| 2    | of    | 36,321    | 370.57               |
| 3    | and   | 27,787    | 283.50               |
| 4    | to    | 25,868    | 263.92               |
| 5    | a     | 22,190    | 226.40               |
| 6    | in    | 19,751    | 201.51               |
| 7    | that  | 10,409    | 106.20               |
| 8    | is    | 10,138    | 103.43               |
| 9    | was   | 9,931     | 101.32               |
| 10   | for   | 8,905     | 90.85                |
| 11   | with  | 7,043     | 71.86                |
| 12   | it    | 6,991     | 71.33                |
| 13   | he    | 6,772     | 69.09                |
| 14   | as    | 6,738     | 68.75                |
| 15   | his   | 6,523     | 66.55                |
| 16   | on    | 6,459     | 65.90                |
| 17   | be    | 6,365     | 64.94                |
| 18   | 's    | 5,285     | 53.92                |
| 19   | had   | 5,200     | 53.05                |
| 20   | by    | 5,156     | 52.60                |

Report based on word tokens
Normalized Frequency is per 10,000 tokens
Total word tokens: 980,144
Unique tokens: 42,907
Showing 20 rows
Page 1 of 2146
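The "Showing 20 rows, Page 1 of 2146" footer above follows standard page_size/page_current slicing: 42,907 unique tokens at 20 rows per page gives 2,146 pages. A sketch of that paging arithmetic (an assumption about the behaviour, not conc's internals):

```python
import math

def paginate(rows, page_size=20, page_current=1):
    """Return one page of rows plus the total page count; page_size=0 returns all rows."""
    if page_size == 0:
        return rows, 1
    total_pages = math.ceil(len(rows) / page_size)
    start = (page_current - 1) * page_size
    return rows[start:start + page_size], total_pages

# 42,907 unique tokens at 20 rows per page -> 2,146 pages
page, total_pages = paginate(list(range(42907)), page_size=20, page_current=1)
```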
# get a stop word list, here using spaCy's English stop words
from conc.core import get_stop_words
stop_words = get_stop_words(save_path, spacy_model = 'en_core_web_sm')
# exclude stop words and add a document frequency column to the report
freq_brown.frequencies(normalize_by=10000, show_document_frequency = True, exclude_tokens = stop_words, page_size=20).display()
Frequencies of word tokens, Brown Corpus

| Rank | Token  | Frequency | Document Frequency | Normalized Frequency |
|------|--------|-----------|--------------------|----------------------|
| 1    | said   | 1,944     | 315                | 19.83                |
| 2    | time   | 1,667     | 450                | 17.01                |
| 3    | new    | 1,595     | 390                | 16.27                |
| 4    | man    | 1,346     | 326                | 13.73                |
| 5    | like   | 1,287     | 366                | 13.13                |
| 6    | af     | 989       | 49                 | 10.09                |
| 7    | years  | 953       | 346                | 9.72                 |
| 8    | way    | 925       | 365                | 9.44                 |
| 9    | state  | 883       | 200                | 9.01                 |
| 10   | long   | 863       | 354                | 8.80                 |
| 11   | people | 851       | 286                | 8.68                 |
| 12   | world  | 848       | 274                | 8.65                 |
| 13   | year   | 831       | 242                | 8.48                 |
| 14   | little | 823       | 322                | 8.40                 |
| 15   | good   | 813       | 320                | 8.29                 |
| 16   | men    | 772       | 248                | 7.88                 |
| 17   | work   | 767       | 310                | 7.83                 |
| 18   | day    | 767       | 311                | 7.83                 |
| 19   | old    | 734       | 278                | 7.49                 |
| 20   | life   | 728       | 284                | 7.43                 |

Report based on word tokens
Tokens excluded from report: 306
Normalized Frequency is per 10,000 tokens
Total word tokens: 980,144
Unique tokens: 42,601
Showing 20 rows
Page 1 of 2131
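The exclude_tokens example above drops stop words before reporting; restrict_tokens does the opposite, keeping only a supplied list. Both behave like simple set filters on the frequency counts. A minimal sketch of that filtering behaviour (an illustration, not conc's implementation):

```python
from collections import Counter

def filter_counts(counts, exclude_tokens=(), restrict_tokens=()):
    """Drop excluded tokens; if restrict_tokens is given, keep only those tokens."""
    excluded = set(exclude_tokens)
    filtered = {t: n for t, n in counts.items() if t not in excluded}
    if restrict_tokens:
        restricted = set(restrict_tokens)
        filtered = {t: n for t, n in filtered.items() if t in restricted}
    return filtered

counts = Counter(["the", "man", "said", "the", "time"])
only_listed = filter_counts(counts, restrict_tokens=["man", "time"])
no_stops = filter_counts(counts, exclude_tokens=["the"])
```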