core
You can call any of these functions directly after importing everything from the CorPress core module with:
from corpress.core import *
If you want to gather data and output a corpus in one step, the corpress function is probably all you need. You can import it directly with:
from corpress.core import corpress
Note: for an example of how to use the corpress function, see the documentation home page or the repository README. The documentation and README are generated from this notebook, which you can download, edit and run locally.
get_api_url
get_api_url (url:str, endpoint_type:str='posts', headers:dict=None)
Queries a URL to get the REST API route for the endpoint type provided.
Name | Type | Default | Details |
---|---|---|---|
url | str | | the URL of the WordPress website |
endpoint_type | str | posts | posts or pages |
headers | dict | None | optional headers for requests |
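The internals of get_api_url are not shown on this page. As a rough illustration of how route discovery generally works on WordPress sites, the sketch below reads the REST API root from a `Link` response header (WordPress advertises it with `rel="https://api.w.org/"`) and appends the `wp/v2` route for the endpoint type. The function names `api_root_from_link_header` and `endpoint_route` are hypothetical and are not part of CorPress.

```python
import re

# Link relation WordPress uses to advertise its REST API root.
WP_API_REL = "https://api.w.org/"


def api_root_from_link_header(link_header):
    """Return the REST API root advertised in a Link header, or None.

    Simplified parsing: assumes the link targets contain no commas.
    """
    for part in link_header.split(","):
        match = re.match(r"\s*<([^>]+)>\s*;(.*)", part)
        if match and f'rel="{WP_API_REL}"' in match.group(2):
            return match.group(1)
    return None


def endpoint_route(api_root, endpoint_type="posts"):
    """Build the wp/v2 route for posts or pages under the given API root."""
    if endpoint_type not in ("posts", "pages"):
        raise ValueError("endpoint_type must be 'posts' or 'pages'")
    # Plain concatenation also works for roots of the form '?rest_route=/'.
    return api_root.rstrip("/") + "/wp/v2/" + endpoint_type


header = '<https://example.org/wp-json/>; rel="https://api.w.org/"'
root = api_root_from_link_header(header)
print(endpoint_route(root, "posts"))
```

In practice the header would come from a request to the site URL. The example uses a hard-coded header so that it runs offline.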
get_json
get_json (endpoint_url:str, endpoint_type:str='posts', headers:dict=None, params:dict=None, json_save_path:str=None, seconds_between_requests:int=5, max_pages:int=None)
Download and save JSON data from a specific REST API endpoint.
Name | Type | Default | Details |
---|---|---|---|
endpoint_url | str | | the URL of the WordPress REST API endpoint |
endpoint_type | str | posts | the type of data to download |
headers | dict | None | optional headers for requests |
params | dict | None | optional parameters to pass to the API |
json_save_path | str | None | path to save the JSON data |
seconds_between_requests | int | 5 | number of seconds to wait between requests, must be at least 1 |
max_pages | int | None | maximum number of pages to download |
Returns | bool | | True if successful, False otherwise |
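To make the roles of seconds_between_requests and max_pages concrete, here is a minimal sketch of a polite pagination loop. It is not the get_json implementation. It assumes a caller-supplied `fetch_page` function (hypothetical) that returns the items for one page together with the total page count, which WordPress reports in the `X-WP-TotalPages` response header. The `sleep` argument is injectable only so the loop can be exercised without waiting.

```python
import time


def fetch_all_pages(fetch_page, seconds_between_requests=5, max_pages=None,
                    sleep=time.sleep):
    """Collect pages one at a time, pausing between requests.

    fetch_page(page_number) must return (items, total_pages).
    """
    if seconds_between_requests < 1:
        raise ValueError("seconds_between_requests must be at least 1")
    pages = []
    page = 1
    while True:
        items, total_pages = fetch_page(page)
        pages.append(items)
        reached_end = page >= total_pages
        reached_cap = max_pages is not None and page >= max_pages
        if reached_end or reached_cap:
            break
        # Wait before the next request to avoid hammering the server.
        sleep(seconds_between_requests)
        page += 1
    return pages


def fake_fetch(page):
    """Stand-in for a real HTTP request: three pages, one item each."""
    return [f"post-{page}"], 3


waits = []
print(fetch_all_pages(fake_fetch, seconds_between_requests=1,
                      max_pages=2, sleep=waits.append))
```

Capping max_pages at a small number is a convenient way to trial a download before committing to a full crawl.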
create_corpus
create_corpus (corpus_format:str='txt', json_save_path:str=None, corpus_save_path:str=None, csv_save_file:str=None, include_title_in_text:bool=True, encoding:str='utf-8')
Create a corpus from downloaded JSON data in txt or csv format.
Name | Type | Default | Details |
---|---|---|---|
corpus_format | str | txt | format of the corpus files, txt or csv |
json_save_path | str | None | path to JSON data |
corpus_save_path | str | None | path to save corpus in txt format |
csv_save_file | str | None | path to CSV file to output corpus in CSV format (or metadata if txt corpus) |
include_title_in_text | bool | True | include the title in the text file |
encoding | str | utf-8 | encoding to use |
Returns | bool | | True if successful, False if there are errors parsing the JSON |
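As an illustration of what converting downloaded JSON into corpus text involves, the sketch below turns one WordPress post object into plain text. It relies on the standard WordPress REST API fields `title.rendered` and `content.rendered`, which hold HTML. The helpers `html_to_text` and `post_to_text` are hypothetical names, and the way create_corpus actually cleans text may differ.

```python
from html.parser import HTMLParser

# Tags that should start a new line in the plain-text output.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6",
              "blockquote"}


class _TextExtractor(HTMLParser):
    """Collect text nodes, inserting line breaks at block-level tags."""

    def __init__(self):
        super().__init__()  # character references are decoded by default
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Strip tags and normalise whitespace, one block per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = [" ".join(line.split()) for line in "".join(parser.chunks).splitlines()]
    return "\n".join(line for line in lines if line)


def post_to_text(post, include_title_in_text=True):
    """Return the text of one post, optionally preceded by its title."""
    body = html_to_text(post["content"]["rendered"])
    if include_title_in_text:
        title = html_to_text(post["title"]["rendered"])
        return f"{title}\n\n{body}"
    return body


sample_post = {
    "title": {"rendered": "Hello &amp; welcome"},
    "content": {"rendered": "<p>First <em>paragraph</em>.</p>\n<p>Second.</p>"},
}
print(post_to_text(sample_post))
```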
result_reporting
result_reporting (result:dict, output:bool=True)
Outputs the results of the corpress process.
Name | Type | Default | Details |
---|---|---|---|
result | dict | | the result dictionary |
output | bool | True | output the results |
Returns | dict | | returns the result dictionary |
corpress
corpress (url:str, endpoint_type:str='posts', headers:dict=None, params:dict=None, corpus_format:str='txt', json_save_path:str=None, corpus_save_path:str=None, csv_save_file:str=None, seconds_between_requests:int=5, max_pages:int=None, include_title_in_text:bool=True, output:bool=True, encoding:str='utf-8')
Retrieve data from the REST API of a WordPress site and create a corpus.
Name | Type | Default | Details |
---|---|---|---|
url | str | | the URL of the WordPress website |
endpoint_type | str | posts | posts or pages |
headers | dict | None | optional headers for requests |
params | dict | None | optional parameters to pass to the API |
corpus_format | str | txt | format of the corpus files, txt or csv |
json_save_path | str | None | path to save the JSON data |
corpus_save_path | str | None | path to save the corpus in txt format |
csv_save_file | str | None | path to CSV file to output corpus in CSV format (or metadata if txt corpus) |
seconds_between_requests | int | 5 | number of seconds to wait between requests |
max_pages | int | None | maximum number of pages to download |
include_title_in_text | bool | True | option to include the title in the text file |
output | bool | True | option to output the results of the process |
encoding | str | utf-8 | encoding to use |
Returns | dict | | dictionary with results of each stage of the process and the number of texts in the corpus |
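A call to corpress might be assembled as in the sketch below. Every keyword argument comes from the signature above, but the site URL, the output directory layout, and the helper names `corpress_kwargs` and `run` are assumptions made for illustration. Which paths are required for each corpus_format is also not stated on this page, so treat the combination shown as a guess to check against the README example.

```python
from pathlib import Path


def corpress_kwargs(site_url, out_dir):
    """Collect keyword arguments for corpress() for a small trial run.

    The paths under out_dir are hypothetical; choose your own layout.
    """
    out = Path(out_dir)
    return {
        "url": site_url,
        "endpoint_type": "posts",
        "corpus_format": "txt",
        "json_save_path": str(out / "json"),
        "corpus_save_path": str(out / "txt"),
        # With a txt corpus the CSV file is documented as holding metadata.
        "csv_save_file": str(out / "metadata.csv"),
        "seconds_between_requests": 5,
        "max_pages": 2,  # keep the first run small
    }


def run(site_url, out_dir):
    """Download posts and build the corpus in one step."""
    # Imported here so the helper above can be used without CorPress installed.
    from corpress.core import corpress

    return corpress(**corpress_kwargs(site_url, out_dir))
```

Calling `run("https://example.org", "my_corpus")` would then return the result dictionary described in the table. Here `https://example.org` is a placeholder for a real WordPress site.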