The Notion Web Clipper only goes so far. Clipped articles can't populate specific database properties, and sometimes the body of the page doesn't come through either. In this post I'll show how I pieced together a small news stream project with notion-py, breaking down each element of my code to cover the basics of the library.
Why a personal news stream? As a news junkie, I wanted a place to clip interesting headlines and articles into a calendar view, but manually adding properties like "category," "publication date," and "source url" got tiresome. I'm still finishing this project, but I thought I'd share a web scraping tutorial in the meantime.
What The End Result Will Look Like
Change the view from table to gallery, then in the properties settings set Card Preview to Page Cover.
Step By Step Tutorial
Before you start, you need two things: your Notion token_v2 (a cookie set by notion.so in your browser) and the link to the table you want to manipulate in Notion.
Step 1: Import
    from notion.client import NotionClient
    from md2notion.upload import upload
    import newspaper
    from newspaper import Article
    import os
    import sys
Step 2: Plug in links
    client = NotionClient(token_v2="INSERT TOKEN V2 HERE")
    cv = client.get_collection_view("INSERT TABLE LINK HERE")
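Since token_v2 gives full access to your Notion account, you may prefer not to paste it into the script itself. Here is a minimal sketch that reads both values from environment variables instead. The variable names NOTION_TOKEN_V2 and NOTION_TABLE_URL and the helper load_notion_settings are my own placeholders, not something notion-py defines.

```python
import os

# Hypothetical variable names; use whatever fits your setup.
TOKEN_VAR = "NOTION_TOKEN_V2"
TABLE_VAR = "NOTION_TABLE_URL"


def load_notion_settings(env=os.environ):
    """Return (token, table_url), raising a clear error if either is missing."""
    missing = [name for name in (TOKEN_VAR, TABLE_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[TOKEN_VAR], env[TABLE_VAR]


# Intended use with the client from this step (left commented out
# because it needs a real token and table link):
# token, table_url = load_notion_settings()
# client = NotionClient(token_v2=token)
# cv = client.get_collection_view(table_url)
```

Failing early with the names of the missing variables tends to be easier to debug than a login error from Notion later on.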
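Step 3: Convert lists to strings
    # newspaper returns authors and keywords as lists, but the Notion
    # text properties used below expect a single string
    def converttostr(input_seq, seperator):
        final_str = seperator.join(input_seq)
        return final_str

    seperator = ", "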
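Scraped metadata is often messy: keyword lists can contain blanks, repeats, or stray whitespace, and str.join fails if an entry is not a string. As a sketch of a more forgiving variant (join_clean is a hypothetical name of mine, not part of newspaper or notion-py), you could clean the list before joining:

```python
def join_clean(items, separator=", "):
    """Join items into one string, skipping blanks and repeats (order preserved)."""
    seen = set()
    cleaned = []
    for item in items or []:
        if item is None:
            continue
        text = str(item).strip()
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return separator.join(cleaned)


print(join_clean(["politics", " economy ", "", "politics"]))  # politics, economy
```

It can stand in for converttostr wherever a list needs to become a single text value.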
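Step 4: Download article
    url = "INSERT ARTICLE URL HERE"
    toi_article = Article(url, language="en")
    toi_article.download()  # fetch the page HTML
    toi_article.parse()     # extract title, authors, text and metadata
    toi_article.nlp()       # run keyword and summary extraction (needs NLTK's 'punkt' data)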
Step 5: Add information to Notion table
    # Add a new row; the row is also the page we fill in later.
    # The property names below must match the columns in your table.
    row = cv.collection.add_row()
    page = row

    # Add article title to the title property (text)
    row.title = toi_article.title
    # Add article date to the date property (date)
    row.date = toi_article.publish_date
    # Add article's keywords to the keywords property (text)
    row.keywords = converttostr(toi_article.meta_keywords, seperator)
    # Add article's authors to the authors property (text)
    row.authors = converttostr(toi_article.authors, seperator)
    # Add summary to the summary property (text)
    row.summary = toi_article.meta_description
    # Add url to the url property (url)
    row.url = toi_article.url
    # Add source of article (e.g. CNN) to the source property (url)
    row.source = toi_article.source_url
    # Add a rough word count to the word count property (number):
    # spaces plus hyphens, plus one
    row.word_count = sum(map(toi_article.text.strip().count, [' ', '-'])) + 1
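The word count above only looks at spaces and hyphens, so words separated by a paragraph break are counted as one, and an empty article still counts as one word. A small sketch of an alternative that keeps the same "hyphens split words" convention but splits on any whitespace (count_words is a hypothetical helper name):

```python
def count_words(text):
    """Count whitespace-separated words, treating hyphens as word breaks."""
    return len(text.replace("-", " ").split())


sample = "Don't stop.\n\nWell-known facts"
print(count_words(sample))  # 5
print(count_words(""))      # 0
```

With this in place, the last assignment could become row.word_count = count_words(toi_article.text).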
Step 6: Add a cover image and page icon
    # Use the article's main image as the page cover
    page.set("format.page_cover", toi_article.meta_img)
    # Use an emoji as the page icon
    page.set("format.page_icon", "📰")
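Not every article declares a meta image, and some sites give a relative path rather than a full URL. A hedged sketch of a guard: pick the first candidate that looks like an absolute http(s) URL and skip the cover otherwise. pick_cover_url is a hypothetical helper, and falling back to newspaper's top_image attribute is my assumption about what makes a sensible second choice.

```python
from urllib.parse import urlparse


def pick_cover_url(*candidates):
    """Return the first candidate that looks like an absolute http(s) URL, else None."""
    for candidate in candidates:
        if not candidate:
            continue
        url = candidate.strip()
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            return url
    return None


# Intended use with the objects from the steps above (commented out
# because it needs a downloaded article and a Notion page):
# cover = pick_cover_url(toi_article.meta_img, toi_article.top_image)
# if cover:
#     page.set("format.page_cover", cover)

print(pick_cover_url("", "https://example.com/lead.jpg"))  # https://example.com/lead.jpg
```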
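Step 7: Send body of article to markdown file
    # Add a quote block that labels the article body on the Notion page
    page.children.add_new("quote", title="The Article")

    # Create the markdown file; replace 'article.md' with your file title.
    # Writing as UTF-8 matches the encoding used to read the file in step 8.
    with open("article.md", "w", encoding="utf-8") as f:
        print(toi_article.text, file=f)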
Step 8: Send markdown file to Notion page
    # Replace 'article.md' with your file title
    with open("article.md", "r", encoding="utf-8") as mdFile:
        newPage = page
        upload(mdFile, newPage)
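Steps 7 and 8 only work reliably when the file is written and read with the same encoding, since news text is full of curly quotes and accented characters. Here is a small sketch of that round trip using a temporary file, so no stray article.md is left behind. write_markdown is a hypothetical helper, and the sample sentence is made up for illustration.

```python
import os
import tempfile


def write_markdown(text, directory=None):
    """Write text to a temporary .md file as UTF-8 and return its path."""
    handle = tempfile.NamedTemporaryFile(
        "w", suffix=".md", encoding="utf-8", delete=False, dir=directory
    )
    with handle:
        handle.write(text)
    return handle.name


path = write_markdown("Café prices rose 3%, “unexpected” say analysts.\n")
with open(path, "r", encoding="utf-8") as md_file:
    # In the tutorial this is where upload(md_file, page) would go.
    print(md_file.read())
os.remove(path)  # clean up once the upload is done
```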