core

Core CorPress functions.

You can call any of these functions directly by importing CorPress with:

from corpress.core import *

If you want to gather data and output a corpus in one step, the corpress function is probably all you need. You can import it directly with:

from corpress.core import corpress

Note: for an example of how to use the corpress function, see the documentation home page or the repository README. The documentation and README are generated from this notebook, which you can download, edit and run locally.

get_api_url

 get_api_url (url:str, endpoint_type:str='posts', headers:dict=None)

Queries a URL to get the REST API route for the endpoint type provided.

	Type	Default	Details
url	str		the URL of the WordPress website
endpoint_type	str	posts	posts or pages
headers	dict	None	optional headers for requests
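
One way such a lookup can work is through WordPress's REST API discovery: a WordPress site normally advertises its API root in an HTTP `Link` header with the relation `https://api.w.org/`. The sketch below parses a header of that kind and appends the `wp/v2/posts` or `wp/v2/pages` route. It only illustrates the idea and is not necessarily how get_api_url is implemented; `build_endpoint_url` and the example header are hypothetical, and the sketch assumes the site uses pretty permalinks (a `/wp-json/` style root).

```python
import re
from typing import Optional

# Relation WordPress uses to advertise its REST API root in the Link header.
API_REL = "https://api.w.org/"


def build_endpoint_url(link_header: str, endpoint_type: str = "posts") -> Optional[str]:
    """Return the wp/v2 route for endpoint_type, or None if no API root is advertised.

    Hypothetical helper for illustration; not part of CorPress.
    """
    if endpoint_type not in ("posts", "pages"):
        raise ValueError("endpoint_type must be 'posts' or 'pages'")
    # A Link header may list several comma-separated targets: <url>; rel="..."
    for target, params in re.findall(r"<([^>]*)>([^,]*)", link_header):
        if re.search(r'rel="?' + re.escape(API_REL) + r'"?', params):
            return target.rstrip("/") + "/wp/v2/" + endpoint_type
    return None


# Made-up header of the kind a WordPress site sends with its front page.
example_header = (
    '<https://example.com/wp-json/>; rel="https://api.w.org/", '
    '<https://example.com/?p=1>; rel=shortlink'
)
endpoint = build_endpoint_url(example_header, "pages")
```

In practice the header value would come from an HTTP response for the site URL, which is where the optional headers parameter would be sent along with the request.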

get_json

 get_json (endpoint_url:str, endpoint_type:str='posts', headers:dict=None,
           params:dict=None, json_save_path:str=None,
           seconds_between_requests:int=5, max_pages:int=None)

Download and save JSON data from a specific REST API endpoint.

	Type	Default	Details
endpoint_url	str		the URL of the WordPress REST API endpoint
endpoint_type	str	posts	the type of data to download
headers	dict	None	optional headers for requests
params	dict	None	optional parameters to pass to the API
json_save_path	str	None	path to save the JSON data
seconds_between_requests	int	5	number of seconds to wait between requests, must be at least 1
max_pages	int	None	maximum number of pages to download
Returns	bool		True if successful, False otherwise
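
The interplay of seconds_between_requests and max_pages can be pictured as a paginated download loop. The WordPress REST API accepts `page` and `per_page` query parameters (with `per_page` capped at 100) and reports the page count in the `X-WP-TotalPages` response header. The sketch below is an assumption about the general shape of such a loop, not get_json's actual code; `download_pages`, `fake_fetch`, and the injectable `sleep` argument are hypothetical and exist so the loop can run without network access.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# A fetcher takes (endpoint_url, params) and returns (items, total_pages).
# A real one would read total_pages from the X-WP-TotalPages response header.
Fetcher = Callable[[str, Dict[str, int]], Tuple[List[dict], int]]


def download_pages(
    endpoint_url: str,
    fetch_page: Fetcher,
    seconds_between_requests: int = 5,
    max_pages: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Collect items page by page, pausing between requests.

    Hypothetical helper for illustration; not part of CorPress.
    """
    if seconds_between_requests < 1:
        raise ValueError("seconds_between_requests must be at least 1")
    items: List[dict] = []
    page = 1
    while True:
        batch, total_pages = fetch_page(endpoint_url, {"page": page, "per_page": 100})
        items.extend(batch)
        reached_cap = max_pages is not None and page >= max_pages
        if page >= total_pages or reached_cap:
            return items
        sleep(seconds_between_requests)  # pause before requesting the next page
        page += 1


def fake_fetch(endpoint_url: str, params: Dict[str, int]) -> Tuple[List[dict], int]:
    """Stand-in for an HTTP call: serves 3 pages of 2 items each."""
    page = params["page"]
    return [{"id": page * 10 + i} for i in range(2)], 3


url = "https://example.com/wp-json/wp/v2/posts"  # placeholder endpoint
all_items = download_pages(url, fake_fetch, seconds_between_requests=1,
                           sleep=lambda s: None)
capped = download_pages(url, fake_fetch, seconds_between_requests=1,
                        max_pages=2, sleep=lambda s: None)
```

With the stand-in fetcher, the uncapped call walks all three pages, while `max_pages=2` stops after the second.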

create_corpus

 create_corpus (corpus_format:str='txt', json_save_path:str=None,
                corpus_save_path:str=None, csv_save_file:str=None,
                include_title_in_text:bool=True, encoding:str='utf-8')

Create a corpus from downloaded JSON data in txt or csv format.

	Type	Default	Details
corpus_format	str	txt	format of the corpus files, txt or csv
json_save_path	str	None	path to JSON data
corpus_save_path	str	None	path to save corpus in txt format
csv_save_file	str	None	path to the CSV file that receives the corpus in CSV format (or the metadata when the corpus format is txt)
include_title_in_text	bool	True	include the title in the text file
encoding	str	utf-8	encoding to use
Returns	bool		True if successful, False if there are errors parsing the JSON
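
To make the effect of include_title_in_text concrete, the sketch below turns one WordPress post object into plain text. WordPress post JSON stores the HTML title and body under `title.rendered` and `content.rendered`. How create_corpus actually cleans the HTML is not documented here, so the stripping logic, `post_to_text`, and the sample post are all hypothetical.

```python
import html
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, ending each block-level element with a newline."""

    BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)

    def handle_endtag(self, tag: str) -> None:
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")


def post_to_text(post: dict, include_title_in_text: bool = True) -> str:
    """Convert one post object to plain text (hypothetical helper, not CorPress code)."""
    parser = _TextExtractor()
    parser.feed(post["content"]["rendered"])
    parser.close()  # flush any buffered text
    lines = [line.strip() for line in "".join(parser.parts).splitlines()]
    body = "\n".join(line for line in lines if line)
    if include_title_in_text:
        title = html.unescape(post["title"]["rendered"]).strip()
        return title + "\n\n" + body
    return body


# Made-up post in the shape returned by the wp/v2/posts route.
sample_post = {
    "title": {"rendered": "Hello &amp; welcome"},
    "content": {"rendered": "<p>First <em>paragraph</em>.</p>\n<p>Second paragraph.</p>"},
}
text = post_to_text(sample_post)
body_only = post_to_text(sample_post, include_title_in_text=False)
```

A string produced this way could then be written out with `open(path, "w", encoding="utf-8")`, which is the role the encoding parameter plays.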

result_reporting

 result_reporting (result:dict, output:bool=True)

Outputs the results of the corpress process.

	Type	Default	Details
result	dict		the result dictionary
output	bool	True	output the results
Returns	dict		returns the result dictionary

corpress

 corpress (url:str, endpoint_type:str='posts', headers:dict=None,
           params:dict=None, corpus_format:str='txt',
           json_save_path:str=None, corpus_save_path:str=None,
           csv_save_file:str=None, seconds_between_requests:int=5,
           max_pages:int=None, include_title_in_text:bool=True,
           output:bool=True, encoding:str='utf-8')

Retrieve data from the REST API of a WordPress site and create a corpus.

	Type	Default	Details
url	str		the URL of the WordPress website
endpoint_type	str	posts	posts or pages
headers	dict	None	optional headers for requests
params	dict	None	optional parameters to pass to the API
corpus_format	str	txt	format of the corpus files, txt or csv
json_save_path	str	None	path to save the JSON data
corpus_save_path	str	None	path to save the corpus in txt format
csv_save_file	str	None	path to the CSV file that receives the corpus in CSV format (or the metadata when the corpus format is txt)
seconds_between_requests	int	5	number of seconds to wait between requests
max_pages	int	None	maximum number of pages to download
include_title_in_text	bool	True	option to include the title in the text file
output	bool	True	option to output the results of the process
encoding	str	utf-8	encoding to use
Returns	dict		dictionary with results of each stage of the process and the number of texts in the corpus
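
As a rough sketch of a one-step call, the keyword arguments below follow the signature documented above. The site URL and every path are placeholders, and `build_corpus` is a hypothetical wrapper added here only so the settings can be inspected without CorPress installed.

```python
# Keyword arguments mirror the documented corpress signature.
# The site URL and all paths are placeholders, not real locations.
settings = {
    "url": "https://example.com",
    "endpoint_type": "posts",
    "corpus_format": "txt",
    "json_save_path": "example_json/",
    "corpus_save_path": "example_txt/",
    "csv_save_file": "example_metadata.csv",
    "seconds_between_requests": 5,
    "max_pages": 2,
}


def build_corpus(options: dict) -> dict:
    """Run the one-step pipeline and return its result dictionary."""
    # Imported lazily so this module loads even where CorPress is missing.
    from corpress.core import corpress

    return corpress(**options)
```

Calling `build_corpus(settings)` sends live requests to the site, so a small max_pages value is a sensible choice for a first trial run.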
