models
AnalysisGroup
Bases: UUIDTimeStampedModel
Abstract group to assign a record to for purposes of analysis.
Attributes:
Name | Type | Description |
---|---|---|
name | str | Name of the group. |
podcasts | QuerySet[Podcast] | Podcasts explicitly linked to group. |
seasons | QuerySet[Season] | Seasons explicitly linked to group. |
episodes | QuerySet[Episode] | Episodes explicitly linked to group. |
get_all_episodes
Get all episodes, explicit and implied, for this Analysis Group.
Source code in src/podcast_analyzer/models.py
get_all_people
Returns a QuerySet of all People that are associated with this group.
Source code in src/podcast_analyzer/models.py
get_all_podcasts
Returns a QuerySet of all Podcast objects for this group, both explicitly assigned and implied by Season and Episode objects.
Source code in src/podcast_analyzer/models.py
get_all_seasons
Returns a QuerySet of all Season objects for this group, both explicit and implied.
Source code in src/podcast_analyzer/models.py
get_counts_by_release_frequency
Get counts of podcasts by release frequency.
NOTE: This is based on podcasts' current release frequency. We can't reliably calculate this based on isolated seasons and episodes.
Source code in src/podcast_analyzer/models.py
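As a rough illustration of what "counts by release frequency" means, the sketch below tallies frequency labels in memory. The function name `count_by_release_frequency` is hypothetical and not part of the library; the real method presumably aggregates in the database, which this does not attempt to reproduce.

```python
from collections import Counter


def count_by_release_frequency(frequencies: list[str]) -> dict[str, int]:
    """Hypothetical helper: tally podcasts by their current release frequency label."""
    return dict(Counter(frequencies))


# Labels taken from the documented release_frequency choices.
counts = count_by_release_frequency(["weekly", "daily", "weekly", "unknown"])
print(counts)
```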
get_itunes_categories_with_count
For all associated podcasts, explicit or implicit, return their associated distinct categories with counts.
Source code in src/podcast_analyzer/models.py
get_median_duration_timedelta
Return the median duration of episodes as a timedelta.
Source code in src/podcast_analyzer/models.py
get_num_dormant_podcasts
Get the number of connected podcasts, explicit or implicit, that are dormant.
get_num_podcasts_using_trackers
Feeds that contain what appears to be third-party tracking data.
get_num_podcasts_with_donation_data
Feeds that contain structured donation/funding data.
get_num_podcasts_with_itunes_data
get_num_podcasts_with_podcast_index_data
get_total_duration_seconds
Calculate the total duration of all episodes, explicit and implied for this group.
Source code in src/podcast_analyzer/models.py
median_episode_duration
num_episodes
Returns the number of episodes associated with this group, whether directly or via an assigned season or podcast.
num_people
Returns the total number of people detected from episodes associated with this group.
num_podcasts
Returns the total number of podcasts in this group, both explicit and implied.
num_seasons
Returns the number of seasons associated with this group, both direct associations and implicit associations due to an assigned feed.
ArtUpdate
Bases: Model
Model for capturing art update events. Useful for debugging.
Attributes:
Name | Type | Description |
---|---|---|
podcast | Podcast | Podcast that this update relates to. |
timestamp | datetime | Timestamp when the update was requested. |
reported_mime_type | str | The mime_type returned by the remote server. |
actual_mime_type | str | The actual mime_type of the file. |
valid_file | bool | Whether the file was valid and of the allowed mime types. |
Episode
Bases: UUIDTimeStampedModel
Represents a single episode of a podcast.
Attributes:
Name | Type | Description |
---|---|---|
podcast | Podcast | The podcast this episode belongs to. |
guid | str | GUID of the episode. |
title | str \| None | Title of the episode. |
ep_type | str | Episode type, e.g. full, bonus, trailer. |
season | Season \| None | Season the episode belongs to. |
ep_num | int \| None | Episode number. |
release_datetime | datetime \| None | Date and time the episode was released. |
episode_url | str \| None | URL of the episode page. |
mime_type | str \| None | Reported mime type of the episode. |
download_url | str \| None | URL of the episode file. |
itunes_duration | int \| None | Duration of the episode in seconds. |
file_size | int \| None | Size of the episode file in bytes. |
itunes_explict | bool | Does this episode have the explicit flag? |
show_notes | str \| None | Show notes for the episode, if provided. |
cw_present | bool | Did we detect a content warning? |
transcript_detected | bool | Did we detect a transcript? |
hosts_detected_from_feed | QuerySet[Person] | Hosts found in the feed information. |
guests_detected_from_feed | QuerySet[Person] | Guests found in the feed information. |
analysis_group | QuerySet[AnalysisGroup] | Analysis Groups this is assigned to. |
duration
property
Attempts to convert the duration of the episode into a timedelta for better display.
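A minimal sketch of the seconds-to-timedelta conversion this property describes, assuming a missing `itunes_duration` is passed through as `None`. The helper name `duration_from_seconds` is hypothetical; the real property may handle missing values differently.

```python
from __future__ import annotations

from datetime import timedelta


def duration_from_seconds(itunes_duration: int | None) -> timedelta | None:
    """Hypothetical helper: convert a duration in seconds to a timedelta."""
    if itunes_duration is None:
        # Assumption: an unknown duration stays unknown rather than becoming zero.
        return None
    return timedelta(seconds=itunes_duration)


print(duration_from_seconds(3725))  # 1:02:05
```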
create_or_update_episode_from_feed
classmethod
create_or_update_episode_from_feed(
podcast: Podcast,
episode_dict: dict[str, Any],
*,
update_existing_episodes: bool = False
) -> bool
Given a dict of episode data from podcastparser, create or update the episode and return a bool indicating if a record was touched.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
podcast | Podcast | The instance of the podcast being updated. | required |
episode_dict | dict[str, Any] | A dict representing the episode as created by | required |
update_existing_episodes | bool | Update data in existing records? Default: False | False |
Returns: True if a record was created or updated, otherwise False.
Source code in src/podcast_analyzer/models.py
get_file_size_in_mb
ItunesCategory
Bases: TimeStampedModel
iTunes categories.
Attributes:
Name | Type | Description |
---|---|---|
name | str | Name of the category. |
parent_category | ItunesCategory \| None | Relation to another category as parent. |
Person
Bases: UUIDTimeStampedModel
People detected from structured data in podcast feed. Duplicates are possible if data is tracked lazily.
Attributes:
Name | Type | Description |
---|---|---|
name | str | Name of the person. |
url | str \| None | Reported URL of the person. |
img_url | str \| None | Reported image URL of the person. |
hosted_episodes | QuerySet[Episode] | Episodes this person has hosted. |
guest_appearances | QuerySet[Episode] | Episodes in which this person appears as a guest. |
distinct_podcasts
Get a count of the number of unique podcasts this person has appeared on.
get_distinct_podcasts
Return a queryset of the distinct podcasts this person has appeared in.
Source code in src/podcast_analyzer/models.py
get_podcasts_with_appearance_counts
Provide podcast appearance data for each distinct podcast they have appeared on.
Source code in src/podcast_analyzer/models.py
get_potential_merge_conflicts
Checks the person record against a given merge target and returns data on any potential merge conflicts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target | Person | The person whose merge conflicts should be checked against. | required |
Returns:
Name | Type | Description |
---|---|---|
PersonMergeConflictData | PersonMergeConflictData | The merge conflicts data on the proposed target. |
Source code in src/podcast_analyzer/models.py
get_total_episodes
has_guested
has_hosted
Counts the number of episodes where they have been listed as a host.
merge_person
staticmethod
merge_person(
source_person: Person,
destination_person: Person,
*,
conflict_data: PersonMergeConflictData | None = None,
dry_run: bool = False
) -> int
Merge one person record into another and update all existing episode links. In cases where a conflict appears, such as an overlap in episodes or in additional attributes such as url or img_url, the destination record always wins.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source_person | Person | The person that will be merged into another record. | required |
destination_person | Person | The person record where the source_person will be merged into. | required |
conflict_data | PersonMergeConflictData | You can optionally provide this data in advance if you have already calculated it. | None |
dry_run | bool | Whether to actually do the merge or simply report the number of affected records. | False |
Returns:
Name | Type | Description |
---|---|---|
int | int | The number of affected records or the number of records updated. |
Source code in src/podcast_analyzer/models.py
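The "destination always wins" rule and the `dry_run` flag can be illustrated with a plain-Python sketch. Everything here (`PersonStub`, `merge_stub`, string episode ids) is a hypothetical stand-in, not the library's implementation. It also assumes that the returned count is the number of episode links moved and that an empty destination `url` is filled from the source, neither of which the documentation confirms.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PersonStub:
    """Hypothetical in-memory stand-in for a Person record."""

    name: str
    url: str | None = None
    hosted: set[str] = field(default_factory=set)   # episode ids hosted
    guested: set[str] = field(default_factory=set)  # episode ids guested on


def merge_stub(source: PersonStub, destination: PersonStub, *, dry_run: bool = False) -> int:
    """Move source's episode links to destination; on overlap the destination wins."""
    already_linked = destination.hosted | destination.guested
    host_moves = source.hosted - already_linked
    guest_moves = source.guested - already_linked
    affected = len(host_moves) + len(guest_moves)
    if dry_run:
        # Report only; leave both records untouched.
        return affected
    destination.hosted |= host_moves
    destination.guested |= guest_moves
    if destination.url is None:
        # Assumption: fill a missing value, never overwrite an existing one.
        destination.url = source.url
    source.hosted.clear()
    source.guested.clear()
    return affected


src = PersonStub("A. Host", url="https://example.com/a", hosted={"e1", "e2"}, guested={"e3"})
dst = PersonStub("A Host", hosted={"e2"})
print(merge_stub(src, dst, dry_run=True))  # 2, nothing changed yet
print(merge_stub(src, dst))                # 2, links now moved
```

Running the dry run first gives a cheap preview of how many links a merge would touch before committing to it.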
PersonMergeConflictData
dataclass
PersonMergeConflictData(
source_person: Person,
destination_person: Person,
common_episodes: QuerySet[Episode],
common_host_episodes: QuerySet[Episode],
common_guest_episodes: QuerySet[Episode],
_common_ids: list[uuid.UUID] | None = None,
)
Dataclass for sending back a structured list of potential merge conflicts between two Person records.
Attributes:
Name | Type | Description |
---|---|---|
source_person | Person | The source for the merge. |
destination_person | Person | The destination record for the merge. |
common_episodes | QuerySet[Episode] | Episodes where both records appear. |
common_host_episodes | QuerySet[Episode] | Episodes where both records appear as hosts. |
common_guest_episodes | QuerySet[Episode] | Episodes where both records appear as guests. |
common_id_list
Get the list of any potential common episodes and store it in an attribute before returning for caching.
Returns:
Type | Description |
---|---|
list[UUID] | The list of ids for any potential common episodes. |
Source code in src/podcast_analyzer/models.py
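The compute-once-then-cache pattern described for `common_id_list` might look like the sketch below. `ConflictStub` is a hypothetical stand-in that holds plain id lists instead of QuerySets, and the behavior shown for `is_conflict_free` (true when no common ids exist) is an assumption.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


@dataclass
class ConflictStub:
    """Hypothetical stand-in using id lists in place of QuerySets."""

    host_ids: list[uuid.UUID]
    guest_ids: list[uuid.UUID]
    _common_ids: list[uuid.UUID] | None = None

    def common_id_list(self) -> list[uuid.UUID]:
        if self._common_ids is None:
            # Compute once, de-duplicating while preserving order, then cache.
            self._common_ids = list(dict.fromkeys([*self.host_ids, *self.guest_ids]))
        return self._common_ids

    def is_conflict(self, episode_id: uuid.UUID) -> bool:
        return episode_id in self.common_id_list()

    def is_conflict_free(self) -> bool:
        # Assumption: "conflict free" means no common episodes at all.
        return not self.common_id_list()


shared = uuid.uuid4()
stub = ConflictStub(host_ids=[shared], guest_ids=[shared])
print(stub.is_conflict(shared), stub.is_conflict_free())  # True False
```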
is_conflict
Checks if the supplied episode is one with a conflict.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
episode | Episode | Episode to check. | required |
Returns:
Name | Type | Description |
---|---|---|
bool | bool | Whether the merge conflicts were found. |
Source code in src/podcast_analyzer/models.py
is_conflict_free
Are any potential conflicts present?
Returns:
Name | Type | Description |
---|---|---|
bool | bool | Whether the merge conflicts were found. |
Source code in src/podcast_analyzer/models.py
Podcast
Bases: UUIDTimeStampedModel
Model for a given podcast feed.
Attributes:
Name | Type | Description |
---|---|---|
title | str | The title of the podcast. |
rss_feed | str | The URL of the RSS feed of the podcast. |
podcast_cover_art_url | str \| None | The remote URL of the podcast cover art. |
podcast_cached_cover_art | File \| None | The cached cover art. |
last_feed_update | datetime \| None | When the podcast feed was last updated. |
dormant | bool | Whether the podcast is dormant or not. |
last_checked | datetime | When the podcast feed was last checked. |
author | str \| None | The author of the podcast. |
language | str \| None | The language of the podcast. |
generator | str \| None | The reported generator of the feed. |
email | str \| None | The email listed in the feed. |
site_url | str \| None | The URL of the podcast site. |
itunes_explicit | bool \| None | Whether the podcast has an explicit tag on iTunes. |
itunes_feed_type | str \| None | The feed type of the podcast feed. |
description | str \| None | The provided description of the podcast. |
release_frequency | str | The detected release frequency. One of: daily, often, weekly, biweekly, monthly, adhoc, unknown. |
feed_contains_itunes_data | bool | Whether the podcast feed contains iTunes data. |
feed_contains_podcast_index_data | bool | Whether the podcast feed contains podcast index elements. |
feed_contains_tracking_data | bool | Whether the podcast feed contains third-party tracking data. |
feed_contains_structured_donation_data | bool | Whether the feed contains donation links. |
funding_url | str \| None | Provided URL for donations/support. |
probable_feed_host | str \| None | Current assessment of the feed hosting company. |
itunes_categories | QuerySet[ItunesCategory] | The listed iTunes categories. |
tags | list[str] | The list of keywords/tags declared in the feed. |
analysis_group | QuerySet[AnalysisGroup] | The associated analysis groups. |
median_episode_duration_timedelta
property
Returns the median duration as a timedelta.
total_duration_timedelta
property
Returns the total duration of the podcast as a timedelta object.
ReleaseFrequency
Bases: TextChoices
Choices for release frequency.
afetch_podcast_cover_art
async
Does an async request to fetch the cover art of the podcast.
Source code in src/podcast_analyzer/models.py
alast_release_date
async
Do an async fetch of the last release date.
Source code in src/podcast_analyzer/models.py
analyze_feed
async
Does additional analysis on release schedule, probable host, and if 3rd party tracking prefixes appear to be present.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
episode_limit | int | Limit the result to the last n episodes. Zero for no limit. Default 0. | 0 |
full_episodes_only | bool | Exclude bonus episodes and trailers from analysis. Default True. | True |
Source code in src/podcast_analyzer/models.py
analyze_feed_for_third_party_analytics
async
Check if we spot any known analytics trackers.
Source code in src/podcast_analyzer/models.py
analyze_host
async
Attempt to determine the host for a given podcast based on what information we can see.
Source code in src/podcast_analyzer/models.py
calculate_median_release_difference
async
staticmethod
Given a queryset of episodes, calculate the median difference and return it.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
episodes | QuerySet[Episode] | Episodes to use for calculation. | required |
Returns: A timedelta object representing the median difference between releases.
Source code in src/podcast_analyzer/models.py
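One way to picture the "median difference between releases" is to sort the release datetimes, take the gaps between consecutive episodes, and return the median gap. The sketch below does this on a plain list. The function name `median_release_gap` and the choice to return `None` for fewer than two episodes are assumptions, and the real method is async and works on a QuerySet.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from statistics import median


def median_release_gap(release_datetimes: list[datetime]) -> timedelta | None:
    """Hypothetical helper: median gap between consecutive releases."""
    ordered = sorted(release_datetimes)
    if len(ordered) < 2:
        # Assumption: no gap can be computed from fewer than two releases.
        return None
    gaps = [
        (later - earlier).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return timedelta(seconds=median(gaps))


releases = [
    datetime(2024, 1, 1),
    datetime(2024, 1, 8),
    datetime(2024, 1, 15),
    datetime(2024, 1, 29),
]
print(median_release_gap(releases))  # 7 days, 0:00:00
```

A median is less sensitive than a mean to a single long hiatus, which is presumably why it suits release-schedule detection.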
calculate_next_refresh_time
Given a podcast object, calculate the ideal next refresh time.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
last_release_date | datetime | Provide the last release date of an episode. | required |
Returns: Datetime for next refresh.
Source code in src/podcast_analyzer/models.py
fetch_podcast_cover_art
Does a synchronous request to fetch the cover art of the podcast.
Source code in src/podcast_analyzer/models.py
get_feed_data
Fetch a remote feed and return the rendered dict.
Returns:
Type | Description |
---|---|
dict[str, Any]
|
A dict from the |
Source code in src/podcast_analyzer/models.py
last_release_date
Return the most recent episode's release datetime.
Source code in src/podcast_analyzer/models.py
median_episode_duration
process_cover_art_data
process_cover_art_data(
cover_art_data: BytesIO,
cover_art_url: str,
reported_mime_type: str | None,
) -> None
Takes the received art from a given art update and then attempts to process it.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cover_art_data | BytesIO | The received art data. | required |
cover_art_url | str | The file name of the art data. | required |
reported_mime_type | str | Mime type reported by the server to be validated. | required |
Source code in src/podcast_analyzer/models.py
refresh_feed
Fetches the source feed and updates the record. This is best handled as a scheduled task in a worker process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
update_existing_episodes | bool | Update existing episodes with new data? | False |
Returns:
Type | Description |
---|---|
int | An int representing the number of added episodes. |
Source code in src/podcast_analyzer/models.py
schedule_next_refresh
Given a podcast object, schedule its next refresh in the worker queue.
Source code in src/podcast_analyzer/models.py
set_dormant
async
Check whether the latest episode is more than 65 days old, and set dormant to true if so.
Source code in src/podcast_analyzer/models.py
set_release_frequency
async
Calculate and set the release frequency.
Source code in src/podcast_analyzer/models.py
total_duration_seconds
Returns the total duration of all episodes in seconds.
Source code in src/podcast_analyzer/models.py
total_episodes
update_episodes_from_feed_data
update_episodes_from_feed_data(
episode_list: list[dict[str, Any]],
*,
update_existing_episodes: bool = False
) -> int
Given a list of feed items representing episodes, process them into records.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
episode_list | list[dict[str, Any]] | The | required |
update_existing_episodes | bool | Update existing episodes? | False |
Returns:
Type | Description |
---|---|
int | The number of episodes created or updated. |
Source code in src/podcast_analyzer/models.py
update_podcast_metadata_from_feed_data
Given the parsed feed data, update the podcast channel level metadata in this record.
Source code in src/podcast_analyzer/models.py
PodcastAppearanceData
dataclass
PodcastAppearanceData(
podcast: Podcast,
hosted_episodes: QuerySet[Episode],
guested_episodes: QuerySet[Episode],
)
Dataclass for sending back structured appearance data for an individual on a single podcast.
Attributes:
Name | Type | Description |
---|---|---|
podcast | Podcast | Podcast the data relates to. |
hosted_episodes | QuerySet[Episode] | Episodes hosted by them. |
guested_episodes | QuerySet[Episode] | Episodes where they appeared as a guest. |
Season
Bases: UUIDTimeStampedModel
A season for a given podcast.
Attributes:
Name | Type | Description |
---|---|---|
podcast | Podcast | The podcast the season belongs to. |
season_number | int | The season number. |
analysis_group | QuerySet[AnalysisGroup] | Analysis Groups this is assigned to. |
TimeStampedModel
Bases: Model
An abstract model with created and modified timestamp fields.
UUIDTimeStampedModel
Bases: TimeStampedModel
Base model for all our object records.
Attributes:
Name | Type | Description |
---|---|---|
id | UUIDField | Unique ID. |
created | DateTimeField | Creation time. |
modified | DateTimeField | Modification time. |
cached_properties | list[str] | Names of cached properties that should be dropped on refresh_from_db. |
refresh_from_db
Also clear out cached_properties.
Source code in src/podcast_analyzer/models.py
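The idea of dropping cached properties on refresh can be sketched without Django. The example below uses `functools.cached_property` from the standard library, which stores its value in the instance `__dict__` under the property's name, so removing that key forces a recompute. The class and method names are hypothetical; the real model does this inside `refresh_from_db`.

```python
from __future__ import annotations

from functools import cached_property


class CachedPropertyResetMixin:
    """Hypothetical mixin: forget the cached values named in cached_properties."""

    cached_properties: list[str] = []

    def clear_cached_properties(self) -> None:
        for name in self.cached_properties:
            # pop with a default so properties never accessed are skipped safely.
            self.__dict__.pop(name, None)


class Example(CachedPropertyResetMixin):
    cached_properties = ["expensive"]

    def __init__(self) -> None:
        self.calls = 0

    @cached_property
    def expensive(self) -> int:
        self.calls += 1
        return self.calls


e = Example()
print(e.expensive, e.expensive)  # 1 1 (second access is served from the cache)
e.clear_cached_properties()
print(e.expensive)               # 2 (recomputed after the cache was cleared)
```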
calculate_median_episode_duration
Given an iterable of episode objects, calculate the median duration.
If not a QuerySet, first convert to a queryset to order and extract values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
episodes | Iterable[Episode] | An iterable of episode objects, e.g. a list or QuerySet | required |
Returns:
Name | Type | Description |
---|---|---|
int | int | The median duration in seconds. |
Source code in src/podcast_analyzer/models.py
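For intuition, a median over episode durations can be sketched on plain integers. The helper `median_duration_seconds` is hypothetical. It assumes episodes without a known duration are skipped, that an empty input yields 0, and that an even-count median is rounded to an int; the library function may resolve each of these differently.

```python
from __future__ import annotations

from collections.abc import Iterable
from statistics import median


def median_duration_seconds(durations: Iterable[int | None]) -> int:
    """Hypothetical helper: median of known durations, in whole seconds."""
    known = [d for d in durations if d is not None]
    if not known:
        # Assumption: no data is reported as zero seconds.
        return 0
    return round(median(known))


print(median_duration_seconds([600, None, 1800, 1200]))  # 1200
```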
podcast_art_directory_path
Used for caching the podcast channel cover art.