✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
Year 2016 has 114 single article sections, that will be nullified.
Year 2017 has 226 single article sections, that will be nullified.
Year 2018 has 174 single article sections, that will be nullified.
Year 2019 has 121 single article sections, that will be nullified.
Year 2020 has 74 single article sections, that will be nullified.
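The counts above come from a step whose code is not shown in this output. A minimal sketch of one way it could work is given here in plain Python rather than DataFrame code. It assumes a "single article section" is a section that appears on exactly one article within a given year, and that "nullified" means the section value is set to None. The function name `nullify_single_article_sections` and the sample rows are hypothetical.

```python
from collections import Counter


def nullify_single_article_sections(rows):
    """Blank out sections that occur on only one article within a year.

    Assumption: rows are dicts with at least "year" and "section" keys.
    Returns (cleaned_rows, nullified_count_per_year).
    """
    # Count how many articles each (year, section) pair has.
    counts = Counter(
        (r["year"], r["section"]) for r in rows if r["section"] is not None
    )
    cleaned = []
    nullified_per_year = Counter()
    for r in rows:
        key = (r["year"], r["section"])
        if r["section"] is not None and counts[key] == 1:
            # The section is used by a single article in this year: drop the label.
            cleaned.append({**r, "section": None})
            nullified_per_year[r["year"]] += 1
        else:
            cleaned.append(dict(r))
    return cleaned, dict(nullified_per_year)


# Hypothetical sample data, for illustration only.
rows = [
    {"year": 2016, "section": "world", "title": "a"},
    {"year": 2016, "section": "world", "title": "b"},
    {"year": 2016, "section": "oddities", "title": "c"},
    {"year": 2017, "section": "world", "title": "d"},
]
cleaned, nullified = nullify_single_article_sections(rows)
for year, n in sorted(nullified.items()):
    print(f"Year {year} has {n} single article sections, that will be nullified.")
```

Counting per (year, section) pair means a section that is common in one year can still be nullified in another year where it appears only once, as "world" is for 2017 in the sample.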
date: date32[day]
year: int64
month: int64
day: int64
author: string
title: string
article: string
url: string
section: string
publication: string
title_word_count: int64
article_word_count: int64
title_textblob_sentiment: double
article_textblob_sentiment: double
vader_prob_positive_title: double
vader_prob_negative_title: double
vader_prob_neutral_title: double
vader_compound_title: double
simple_topic: string
Index(['date', 'year', 'month', 'day', 'author', 'title', 'article', 'url',
       'section', 'publication', 'title_word_count', 'article_word_count',
       'title_textblob_sentiment', 'article_textblob_sentiment',
       'vader_prob_positive_title', 'vader_prob_negative_title',
       'vader_prob_neutral_title', 'vader_compound_title', 'simple_topic'],
      dtype='object')
Topics Processor 💬
This notebook cleans up news sections and assigns them better-structured topics.
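As an illustration of how a raw section label might be turned into a `simple_topic` value, the sketch below uses a small keyword table. The table `TOPIC_KEYWORDS`, the fallback label "other", and the function `simple_topic` are all hypothetical. The notebook's actual mapping is not shown here and may be quite different.

```python
import re

# Hypothetical keyword table: topic -> word prefixes that signal it.
TOPIC_KEYWORDS = {
    "politics": ("politic", "election"),
    "business": ("business", "market", "financ"),
    "sports": ("sport",),
}


def simple_topic(section):
    """Map a raw section label to a coarse topic, or None if there is no section."""
    if section is None:
        return None
    # Split into lowercase words so prefixes match whole words only
    # (for example, "transport" must not match the "sport" prefix).
    words = re.findall(r"[a-z]+", section.lower())
    for topic, prefixes in TOPIC_KEYWORDS.items():
        if any(word.startswith(prefix) for word in words for prefix in prefixes):
            return topic
    return "other"


print(simple_topic("US Politics"))
print(simple_topic(" Markets "))
print(simple_topic("Travel"))
```

Matching on word prefixes instead of raw substrings keeps the table short while avoiding accidental hits inside longer words.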
Notebook Properties
src.engineering.word_counts_and_sentiments
64 GB RAM, 4 CPUs
Dec 10 2023
Data
all_the_news
input
Delta
AllTheNews
catalog/text_eda/all_the_news.delta