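A Python program for counting forum users' activity. 邻兄 refuses to post it as a reply, so it is left here for the record; use it if you need it.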

nearby
Original poster (文学城)
# Author: nearby, moderator of 书香之家, August 2022
#
# This program analyzes the activity of all users in a WXC forum, for example 书香之家 (sxsj).
# For each user it counts the number of original posts (主帖) and the number of replies (跟帖).
# The result is written to a .csv file. Note: Chinese characters may not display correctly when
# the CSV file is opened directly, so view the result in Notepad or another text editor and
# then copy/paste it into an Excel file.
#

import requests


# us_dict: a dictionary. key = username, value = list of two counters:
# the first element is the number of original posts (主帖),
# the second element is the number of replies (跟帖).
# Reads the module-level flag `guanshui`, which is set in the main section below.
def processOneFile(us_dict, html):
    lines = html.text.split('\n')
    length = len(lines)
    i = 0
    while i < length:
        line = lines[i].strip()
        jump = 6
        if line == '<!-- -->':
            i = i + 1
            line = lines[i].strip()
            # the literal below is matched against the page source; it marks an ad slot in the list
            if line == '<!-- 列表中插广告 -->':
                jump = 9
            i = i + jump
            # print(lines[i].strip())
            # this is an original post. get the user name first
            i = i + 3
            # the line looks like: <a class="b" href="https://passport.wenxuecity.com/members/index.php?act=profile&amp;cid=ling_yin_shi">ling_yin_shi</a>
            user = lines[i].strip().split('>')[1].split('<')[0]
            # add one original post for this user
            if user in us_dict:
                L = us_dict[user]
                L[0] = L[0] + 1
            else:
                L = [1, 0]
                us_dict[user] = L
            # Now, process the replies
            i = i + 1
            line = lines[i].strip()
            while line != '</div>':
                # target this line: <a class="b" href="https://passport.wenxuecity.com/members/index.php?act=profile&amp;cid=FionaRawson">FionaRawson</a> -
                if line.startswith('<a class="b" href='):
                    sub_user = line.split('>')[1].split('<')[0]
                    # add one reply for this user. Here, the guanshui variable is used:
                    # when it is True, replies under the user's own post are skipped.
                    if sub_user != user or not guanshui:
                        if sub_user in us_dict:
                            L = us_dict[sub_user]
                            L[1] = L[1] + 1
                        else:
                            L = [0, 1]
                            us_dict[sub_user] = L
                i = i + 1
                line = lines[i].strip()
        i = i + 1


# ---- main starts here ----
print()
print('# Author: nearby, moderator of 书香之家, August 2022')
print()

subid = 'sxsj'
temp = input('What is the English name of your forum?\n' +
             'For example, 书香之家 is sxsj, 美语世界 is mysj, 文化走廊 is culture, 诗词欣赏 is poetry: ')
if len(temp) >= 2:
    subid = temp

numPages = 200
temp = input('How many pages would you like to search? If you do not know, just press ENTER ' +
             'and the program will search 200 pages by default. ')
if len(temp) >= 1:
    numPages = int(temp)

guanshui = False  # Use this variable because of kirn's talking about 灌水 (flooding a thread) :-)
temp = input('Discard replies that a user made under his/her own post? (1=yes, 0=no, default=0)\n' +
             'Sometimes a user only replies under his/her own post. If yes, such replies will be discarded. ')
# an empty answer (just ENTER) keeps the default of 0 instead of crashing in int('')
if temp.strip() and int(temp) > 0:
    guanshui = True
print('guanshui=' + str(guanshui))

users = dict()
for i in range(1, numPages + 1):
    url = 'https://bbs.wenxuecity.com/' + subid + '/?page=' + str(i)
    f = requests.get(url)
    processOneFile(users, f)

print("\n---------------\n")
with open('sxzj-out.csv', 'w', encoding='utf-8') as html2:
    for u in users:
        L = users[u]
        print(u + ',' + str(L[0]) + ',' + str(L[1]))
        html2.write(u + ',' + str(L[0]) + ',' + str(L[1]) + '\n')

print("\n")
print("\n")
print("Please check the file sxzj-out.csv. The result is in it! Thanks for using this program. ---- 虎哥 / Nearby / 邻兄")
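The program's header notes that the CSV output is awkward for viewing Chinese characters. One possible workaround, offered only as a sketch and not as part of the original program, is to write the file with Python's 'utf-8-sig' encoding, which prepends a byte-order mark that Excel generally uses to detect UTF-8. The helper name write_activity_csv, the header row, and the sort order below are hypothetical; the input is assumed to be the same {username: [posts, replies]} dictionary the program builds.

```python
import csv


def write_activity_csv(users, path):
    """Write {username: [posts, replies]} to a CSV file, most active users first.

    Hypothetical helper, not part of the original program.
    """
    # Sort by total activity (posts + replies), descending; ties are broken by name.
    rows = sorted(users.items(), key=lambda kv: (-(kv[1][0] + kv[1][1]), kv[0]))
    # 'utf-8-sig' writes a byte-order mark at the start of the file.
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', encoding='utf-8-sig', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(['user', 'posts', 'replies'])
        for name, (posts, replies) in rows:
            writer.writerow([name, posts, replies])
    return rows
```

It could be called as write_activity_csv(users, 'sxzj-out.csv') in place of the final write loop. Using the csv module also quotes any username that happens to contain a comma, which plain string concatenation would not.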
尘凡无忧
Upvoting sight unseen. 邻兄 is too nice. :)
kirn
You're such a hacker!
FionaRawson
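All I can do is admire this... Let me use this thread to tell 无忧 something: 无忧 mentioned earlier extending the COVID (新冠) event by a week. I thought it over:

my series does have a follow-up, but I can't post it within a week, because I have to wait until two "exposé films" finish shooting. So far I have only seen the trailer, and it will be at least another month or two before they are finished and released.

Do whatever suits you. If there is a follow-up, I'll just keep posting it in 书香.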

 

尘凡无忧
Ah, I was just writing above and hadn't even seen this... Great minds think alike; let's shake on it. :)
lovecat08
Impressed. What a hack!
尘凡无忧
The event is extended to September 10. I know 高妹 still has a lot she wants to say... but go by your own schedule. :)
FionaRawson
Thank you
妖妖灵
虎哥, how would you translate 活雷锋 (a "living Lei Feng", i.e. a selfless helper) into English? :)
望沙
ling_yin_shi
This takes real skill. And it's love, too. :)

小林子 is remarkable.

nearby
The updated version is easier to use and more convenient for compiling statistics. Recorded here by 邻兄, 2022-11-08.