How can I quickly collect activity posts into a roundup?

nearby
Original poster (文学峸)

妖妖灵 of 美语世界 (is she a moderator?) came to ask 邻兄 how to compile activity posts into a roundup. So 邻兄 did two things:

1. Asked her to address 邻兄 as '虎哥' (she did). In for a penny, in for a pound: to serve everyone, the program is made public here. 邻兄 is a peerless master of Java and Excel (don't come to me for advice, I really don't have time to answer questions), but 邻兄 is no Python expert. Having only just started learning Python, 邻兄 now writes everything in Python to get familiar with it.
2. Added many explanatory notes to 邻兄's Python program for her.

Original program file name: ParseSXZJ_html.py

You can copy/paste the code below into a Python program. If you have Python 3 installed on your computer, you can then follow the instructions below to make the collection of activity posts almost fully automatic.

Good Luck! No follow-up questions will be answered!

# Author: nearby, moderator of 书香之家, March 2022
#
# Usage of this Python program:
#   0. Make sure that you have Internet access and Python 3 installed on your computer (or use Cloud)!
#   1. Place this file in a folder. Say, in a folder named "wxc"
#   2. Create a sub-folder named "data" inside "wxc" in which all your data files will be generated
#   3. Go to your forum (论坛), search for your activity (活动) title. You will get one or more pages. Remember how many pages there are.
#      If you do not know how to do this, just skip this step, I will then assume that there are 3 pages (150 entries, which is more than usual)
#   4. Execute this program. You will be prompted for the name of your activity, and
#      the number of pages you obtained in step 3 (if you do not know the number of pages, just hit ENTER)
#      Example:
#          春天的畅想
#          3 (or Hit ENTER key)
#   5. The result is stored inside 'data/sxzj-out.html'. You can then copy/paste the source code of
#      'data/sxzj-out.html' into your WXC new page. Done!
#
# Note: By default the entries are organized in reverse chronological order.
# Should you need them to be placed in chronological order, please do:
#   Comment out the statement: mylist.reverse() by placing # in front of it, like: #mylist.reverse()
#

import requests  # third-party package; install it with: pip install requests

notargets = ['跟帖', '输入关键词', '内容查询', 'input name', '当前', '首页', '上一页', '尾页', '尾页', '下一页']
notargets.append('archive')
# This is how SXZJ (书香之家) works. When 无忧 starts an activity, she always marks her activity like this.
notargets.append('##活动##')
# notargets.append('汇总')


def isInside(line, notargets_array):
    # Return True if the line contains any of the strings to be excluded
    for t in notargets_array:
        if t in line:
            return True
    return False
# END


# the line looks like <a href="/sxsj/76799.html" target="_blank">【<em>春天的畅想</em>】春天属于女人</a>
# I need it to be <a href="https://bbs.wenxuecity.com/sxsj/76799.html" target="_blank">【<em>春天的畅想</em>】春天属于女人</a>
def addHttp(line):
    at = line.split('href="')
    line2 = '<a href="https://bbs.wenxuecity.com' + at[1]
    return line2
# END


def processOneFile(target, html, mylist):
    # split the text by newline character to get an array of strings
    lines = html.text.split('\n')  # renamed from "all", which shadows a Python built-in
    length = len(lines)
    i = 0
    while i < length:
        line = lines[i]
        # only lines that carry a link can be rewritten by addHttp
        if (target in line) and ('href="' in line) and (not isInside(line, notargets)):
            line = addHttp(line)
            print(line)
            # append the two lines that follow the link,
            # without running past the end of the page
            i = i + 1
            line2 = lines[i] if i < length else ''
            i = i + 1
            line3 = lines[i] if i < length else ''
            line += line2 + " " + line3
            mylist.append(line)
        i = i + 1
# END of FUNCTIONS


# ---- main starts here ----
print()
print('# Author: nearby, moderator of 书香之家, March 2022')
print()
print('Usage of this Python program:')
print('\t0. Make sure that you have Internet access and Python 3 installed on your computer (or use Cloud)!')
print('\t1. Place this file in a folder. Say, in a folder named "wxc"')
print('\t2. Create a sub-folder named "data" inside "wxc" in which all your data files will be generated')
print('\t3. Go to your forum (论坛), search for your activity (活动) title. You will get one or more pages. Remember how many pages there are.')
print('\t\t If you do not know how to do this, just skip this step, I will then assume that there are 3 pages (150 entries, which is more than usual)')
print('\t4. Execute this program. You will be prompted for the name of your activity, and')
print('\t\tthe number of pages you obtained in step 3')
print('\t\tExample:')
print('\t\t\t春天的畅想')
print('\t\t\t3')
print('\t5. The result is stored inside "data/sxzj-out.html". You can then copy/paste the source code of')
print('\t\t"data/sxzj-out.html" into your WXC new page. Done!')
print('Note, by default the entries are organized in reverse chronological order.')
print('Should you need them to be placed in chronological order, please do:')
print('\t Change the statement: mylist.reverse() to be:')
print('\t\t#mylist.reverse()')
print("\n\n")

target = input('What is the title of your activity (活动)?: ')
pages = 3  # default, means there are at most 150 entries
temp = input('How many pages are there when you search for the activity in WXC? (If you do not know, just hit ENTER): ')
if temp != '':
    pages = int(temp)

mylist = []

# this is the output file.
html2 = open('data/sxzj-out.html', 'w', encoding='utf-8')

url = 'https://bbs.wenxuecity.com/bbs/archive.php?SubID=sxsj&pos=bbs&keyword=' + target + '&username='
f = requests.get(url)
processOneFile(target, f, mylist)

for i in range(1, pages):
    url = 'https://bbs.wenxuecity.com/bbs/archive.php?page=' + str(i) + '&SubID=sxsj&pos=bbs&keyword=' + target + '&username='
    f = requests.get(url)
    processOneFile(target, f, mylist)

mylist.reverse()
for li in mylist:
    html2.write("<p>" + li + "\n")
html2.close()
print(str(len(mylist)) + " entries")

Good Luck! No follow-up questions will be answered!
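The program above builds its search URLs by gluing the activity title straight into the query string. As a side note, here is a minimal sketch of building the same URLs with the standard library's urlencode, so that a title containing "&" or spaces stays intact. The helper name build_search_url and the constant ARCHIVE_URL are made up for illustration; the parameter names (page, SubID, pos, keyword, username) are copied from the URLs in the program. Whether the site accepts this exact encoding (UTF-8, with "+" for spaces) is an assumption, not something tested here.

```python
from urllib.parse import urlencode

# Search endpoint copied from the URLs used in the program above.
ARCHIVE_URL = "https://bbs.wenxuecity.com/bbs/archive.php"


def build_search_url(keyword, page=None, sub_id="sxsj"):
    """Build one search URL (hypothetical helper, not part of the original program)."""
    params = {}
    if page is not None:
        params["page"] = page
    params["SubID"] = sub_id
    params["pos"] = "bbs"
    params["keyword"] = keyword
    params["username"] = ""
    # urlencode percent-encodes every value (UTF-8 by default), so characters
    # such as "&" or spaces in the title cannot break the query string.
    return ARCHIVE_URL + "?" + urlencode(params)


print(build_search_url("春天的畅想"))
print(build_search_url("Q&A night", page=1))
```

The first call gives the URL without a page parameter, like the program's first request; the second adds page=1, like the requests made inside its loop.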
WXCTEATIME
Kudos!
可能成功的P
Kudos!
尘凡无忧
Kudos to 邻兄, the spirit of sharing is commendable... Kudos also to 邻兄's wisdom, such as resolutely refusing to talk... :)
尘凡无忧
By the way, 妖妖灵 is a moderator of the 美语 forum. :) Also, I was stunned by the words "peerless master".... LOL
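nearby
Once the program starts, you only need to enter the activity name and everything is taken care of. 邻兄 won't be replying any further. Thanks to the friends at 书香 (and to 茶兄, 小p, and 忧忧 upthread)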
lovecat08
Praise!
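nearby
Just curious: I didn't see her among the moderators' names. Who is she? :-) Just blowing my own horn; bragging is all about blowing hard :-)

When I was little, my mom used to tease me: the city wall is that thick, and you could still blow it down!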

尘凡无忧
LOL, kudos to that blowing power.... :)
WXCTEATIME
She is a moderator. A person can have many outfits, right? :)
kirn
Posting a program without even bothering to write a manual for it is actually pretty wicked
nearby
I have to scold 小k: half of the program is manual, and it explains how to use it twice

尘凡无忧
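Honestly, those who understand this get it at a glance, and those who don't have too much catching up to do... 邻兄 is sharing this for free, after all; this work should really be done by Wenxuecity's technical department...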
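kirn
As someone who has used similar simple Python ("big snake") programs before, I can tell you, rather pitifully, that it was the file names and so on that got me confused. Unless you use it often, you forget it in no time.. you can't even find which

directory it was in.... You would have to run an activity at least once a week.....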
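On the point about losing track of file names and directories: a minimal sketch, assuming the output should live in a "data" folder next to the program file, of letting the code create and locate that folder itself. The helpers ensure_data_dir and output_path are hypothetical and not part of the posted program; the file name sxzj-out.html is the one the program writes.

```python
from pathlib import Path


def ensure_data_dir(base_dir):
    """Create base_dir/data if it is missing and return its path (hypothetical helper)."""
    data_dir = Path(base_dir) / "data"
    # parents=True also creates base_dir when needed; exist_ok=True makes reruns harmless.
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir


def output_path(base_dir, filename="sxzj-out.html"):
    """Return the full path of the output file inside the data folder."""
    return ensure_data_dir(base_dir) / filename
```

Inside a script, calling output_path(Path(__file__).resolve().parent) would place the result beside the program file no matter which directory the script is started from, and printing the returned path shows exactly where the file went.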

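kirn
Is there a technical department? I thought it was mainly a marketing department... The volunteers, though, are every one of them amazingly skilled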
鲁冰花
A 邻兄 who doesn't want to be a moderator is no good kitty.. I'll take a detour. :)
尘凡无忧
There is. :)
妖妖灵
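Wow, wow, wow, 虎哥 is a real-life Lei Feng!!! Thank you so much!!! I'm taking this home right away to study it carefully!!!

The reverse in this program is what I have been looking for all along, and you put it to use without breaking a sweat. So impressive!! I bow down!!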

mylist.reverse()
for li in mylist:
    html2.write("<p>" + li+"\n")
html2.close()
print(str(len(mylist)) + " entries")
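For anyone else curious about the reverse in the lines quoted here, a small sketch of the two common ways to flip a list in Python. The sample entries are made up for illustration; in the program, mylist is filled from the search result pages.

```python
# Made-up sample entries standing in for the collected links.
mylist = ["entry C (newest)", "entry B", "entry A (oldest)"]

# reversed() returns an iterator and leaves the original list untouched;
# list() turns that iterator into a new, flipped list.
flipped_copy = list(reversed(mylist))

# list.reverse() flips the list in place and returns None,
# which is why the program calls it on its own line.
result = mylist.reverse()

print(flipped_copy)
print(mylist)
print(result)
```

Because reverse() changes mylist itself, commenting that one line out (as the usage notes in the program suggest) is enough to keep the entries in the order they were collected.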
庄文雅
Kudos to moderator 邻, excellent in both literary talent and high tech.
老林子里的夏天
Real talent! My head is spinning, so I'll take a detour… :~)
老键
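邻兄 has shown that bringing some technical innovation to the forum is not actually hard. I have suggested many times that the forum pilot a mode in which replies are not displayed but a thread is bumped up automatically whenever it gets a reply; that is not very hard either. 邻兄 should be Wenxuecity's technical consultant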
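nearby
I hope this helps 妖妹. 虎哥 tested it on four forums, especially on yours and your activities, and it worked on all of them. My first activity roundup was also done by hand, and it wore me out :-)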
尘凡无忧
老键, come and join the activity... :)
尘凡无忧
Forgot to say: 邻兄, please ask the site admin to pin this post on the right side of the forum so it is kept for reference...
老键
Ah, I didn't notice you were running an activity. A programming contest? I'm fairly decent at Python
浮云驰
I don't understand it, but it sounds awesome. 邻兄 is mighty!
applebee3
Kudos to 邻兄, so full of kindness!
尘凡无忧
Haha. It's the 人间情色 activity. I have read your 情色 writing.... LOL