🗞️

Web Scrape News Articles With notion-py Into A Notion Database

Category: Tutorial
Tags: Tech, Notion
Key Points: Step By Step Tutorial
Date: Jun 20, 2020
Word Count: 495

The Notion Web Clipper only goes so far. Articles clipped with the browser extension can't fill in specific database properties, and sometimes the body of the page doesn't come through either. I want to show you how I pieced together a small news stream project with notion-py. I'll also break down each part of the code so you can learn the basics of notion-py along the way.
Why a personal news stream? As a news junkie, I wanted a place to clip interesting headlines and articles into a calendar view. However, I found it tiresome to fill in properties such as "category," "publication date," and "source URL" by hand. I'm still working on the full project, but I thought I'd share a web scraping tutorial in the meantime.

What The End Result Will Look Like

Switch the database from table view to gallery view → in the view's properties settings, set Card Preview to Page Cover.

Step By Step Tutorial

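Install the following: Python 3, notion-py, md2notion, and newspaper3k (pip install notion md2notion newspaper3k). Step 4 calls newspaper3k's nlp(), which also needs NLTK's punkt tokenizer data (python -c "import nltk; nltk.download('punkt')").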
Before you start, you need two things: your Notion token_v2, which is a cookie your browser stores while you're logged in to Notion, and the link to the database table you want to manipulate.

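Step 1 → Import

```python
# notion-py: unofficial Python client for Notion
from notion.client import NotionClient
# md2notion: uploads a Markdown file to a Notion page as blocks
from md2notion.upload import upload
# newspaper3k: downloads and parses news articles
from newspaper import Article
```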

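Step 2 → Plug in links

```python
# Authenticate with the token_v2 cookie copied from your browser
client = NotionClient(token_v2="INSERT TOKEN V2 HERE")
# Open the database (collection view) that will receive the articles
cv = client.get_collection_view("INSERT TABLE LINK HERE")
```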
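Because token_v2 gives access to your Notion account, you may prefer not to paste it into the script. Here is a minimal sketch that reads the token and table link from environment variables instead. The variable names NOTION_TOKEN_V2 and NOTION_TABLE_URL and the helper load_notion_settings are hypothetical choices made for this example.

```python
import os


def load_notion_settings(env=os.environ):
    """Return (token_v2, table_url) read from environment variables.

    NOTION_TOKEN_V2 and NOTION_TABLE_URL are hypothetical names used
    for this sketch; pick whatever names suit your setup.
    """
    token = env.get("NOTION_TOKEN_V2")
    table_url = env.get("NOTION_TABLE_URL")
    if not token or not table_url:
        raise RuntimeError("Set NOTION_TOKEN_V2 and NOTION_TABLE_URL before running")
    return token, table_url
```

With this helper, Step 2 becomes token, table_url = load_notion_settings(), followed by NotionClient(token_v2=token) and client.get_collection_view(table_url).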

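Step 3 → Convert lists to strings

```python
# Join a list of strings (e.g. authors or keywords) into a single string
def converttostr(input_seq, seperator):
    final_str = seperator.join(input_seq)
    return final_str

# Text placed between the joined items
seperator = ", "
```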
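newspaper3k's meta_keywords comes from the page's keywords meta tag. When a page doesn't have that tag, the list can contain only an empty string, and items may carry stray whitespace. Here is a sketch of a variant that strips each item and skips blanks. The name join_nonempty is a hypothetical helper, not part of any library.

```python
def join_nonempty(items, separator=", "):
    """Join strings with separator, stripping whitespace and skipping blank or None items."""
    return separator.join(s.strip() for s in items if s and s.strip())
```

To use it, write row.keywords = join_nonempty(toi_article.meta_keywords) in Step 5.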

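Step 4 → Download article

```python
url = "INSERT ARTICLE URL HERE"
# Create the Article, fetch its HTML, and extract title, text, authors, etc.
toi_article = Article(url, language="en")
toi_article.download()
toi_article.parse()
# nlp() adds .keywords and .summary (needs NLTK's punkt data)
toi_article.nlp()
```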
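Editing the script for every new article gets old quickly. Here is a minimal sketch that takes the URL as the first command-line argument instead. The helper get_article_url and the script name clip_article.py are hypothetical.

```python
import sys


def get_article_url(argv=None):
    """Return the article URL given as the first command-line argument."""
    args = sys.argv[1:] if argv is None else argv
    if not args:
        # clip_article.py is a hypothetical script name
        raise SystemExit("usage: python clip_article.py ARTICLE_URL")
    return args[0]
```

You would then write url = get_article_url() and run python clip_article.py followed by the article's URL.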

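Step 5 → Add information to Notion table

notion-py exposes each database property as an attribute named after the property, in lowercase with spaces turned into underscores. For example, a property called "Word Count" is set through row.word_count. Your table needs a property with a matching name and type for every line below.

```python
# Add a new row; the row is also a page whose content we edit later
row = cv.collection.add_row()
page = row

# Article title -> title property (text)
row.title = toi_article.title
# Publication date -> date property (date); skipped when newspaper3k finds no date
if toi_article.publish_date:
    row.date = toi_article.publish_date
# Keywords from the page's meta tags -> keywords property (text)
row.keywords = converttostr(toi_article.meta_keywords, seperator)
# Authors -> authors property (text)
row.authors = converttostr(toi_article.authors, seperator)
# Meta description -> summary property (text)
row.summary = toi_article.meta_description
# Article URL -> url property (url)
row.url = toi_article.url
# Site the article came from (e.g. CNN's home page) -> source property (url)
row.source = toi_article.source_url
# Approximate word count (spaces plus hyphens, plus one) -> word count property (number)
row.word_count = sum(map(toi_article.text.strip().count, [" ", "-"])) + 1
```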
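The word count one-liner counts spaces and hyphens, which makes it inaccurate in a few ways. Double spaces inflate the total, words separated only by a line break are not counted separately, and hyphenated compounds count as several words. Here is a sketch of a different approximation using Python's re module. It counts runs of word characters and treats hyphenated or apostrophe-joined words as one. The name count_words is hypothetical.

```python
import re

# One word = letters/digits, optionally joined by single hyphens or apostrophes
WORD_RE = re.compile(r"\w+(?:[-']\w+)*")


def count_words(text):
    """Approximate word count: hyphenated and apostrophe-joined words count once."""
    return len(WORD_RE.findall(text))
```

In Step 5, this would be row.word_count = count_words(toi_article.text). Whether "state-of-the-art" should be one word or four is a judgment call. This version picks one.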

Step 6 → Add media to cover image and page icon

```python
# Use the article's preview image as the page cover, if it has one
if toi_article.meta_img:
    page.set("format.page_cover", toi_article.meta_img)
# Use a newspaper emoji as the page icon
page.set("format.page_icon", "📰")
```
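Some pages have no preview image, or they give a relative path that can't serve as a cover. Here is a minimal sketch that returns the first candidate that is an absolute http(s) URL. newspaper3k's top_image is a natural second candidate after meta_img, but whether either one is set depends on the page. The helper pick_cover is hypothetical.

```python
from urllib.parse import urlparse


def pick_cover(*candidates, default=None):
    """Return the first candidate that is an absolute http(s) URL, else default."""
    for url in candidates:
        if url and urlparse(url).scheme in ("http", "https"):
            return url
    return default
```

In Step 6 you could compute cover = pick_cover(toi_article.meta_img, toi_article.top_image). Then call page.set("format.page_cover", cover) only when cover is not None.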

Step 7 → Send body of article to markdown file

```python
# Add a quote block titled "The Article" to mark where the body text starts
page.children.add_new("quote", title="The Article")

# Save the article body to a Markdown file (replace 'article.md' with your file name)
with open("article.md", "w", encoding="utf-8") as f:
    f.write(toi_article.text)
```
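A fixed article.md is overwritten on every run and stays in your working directory afterward. Here is a minimal sketch that uses Python's tempfile module to write the body to a fresh temporary .md file instead. The helper write_temp_markdown is hypothetical, and the caller is responsible for deleting the file.

```python
import os
import tempfile


def write_temp_markdown(text):
    """Write text to a new temporary .md file (UTF-8) and return its path."""
    fd, path = tempfile.mkstemp(suffix=".md", text=True)
    # os.fdopen wraps the descriptor so the with-block closes it
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    return path
```

You would call md_path = write_temp_markdown(toi_article.text) and open md_path instead of "article.md" when uploading.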

Step 8 → Send markdown file to Notion page

```python
# Upload the Markdown file's contents into the row's page
# (replace 'article.md' with your file name)
with open("article.md", "r", encoding="utf-8") as mdFile:
    upload(mdFile, page)
```
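If you'd rather not leave the Markdown file behind, you can delete it once the upload finishes, even if the upload raises an error. Here is a sketch of that try/finally pattern. The helper upload_and_remove is hypothetical. It takes the upload function as a parameter, so you would pass md2notion's upload, which keeps the sketch runnable on its own.

```python
import os


def upload_and_remove(path, page, upload_fn):
    """Upload a Markdown file to a Notion page, then always delete the file.

    upload_fn should behave like md2notion's upload(mdFile, notionPage).
    """
    try:
        with open(path, "r", encoding="utf-8") as md_file:
            upload_fn(md_file, page)
    finally:
        os.remove(path)
```

In the tutorial script, this would be upload_and_remove("article.md", page, upload).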