Long Luo's Life Notes

每一天都是奇迹

By Long Luo

This article is the solution Java HashSet Solution of Problem 2215. Find the Difference of Two Arrays.

Intuition

We can use \(\texttt{HashSet}\) to record the distinct integers of the two arrays and judge whether a distinct integer in \(\textit{nums}_1\) which not present in \(\textit{nums}_2\).

We can only use \(2\) \(\texttt{HashSet}\), not \(4\) \(\texttt{HashSet}\).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
public List<List<Integer>> findDifference(int[] nums1, int[] nums2) {
Set<Integer> set1 = new HashSet<>();
Set<Integer> set2 = new HashSet<>();
for (int num : nums1) {
set1.add(num);
}
for (int num : nums2) {
set2.add(num);
set1.remove(num);
}

for (int num : nums1) {
set2.remove(num);
}

List<List<Integer>> ans = new ArrayList<>();
ans.add(new ArrayList<>(set1));
ans.add(new ArrayList<>(set2));
return ans;
}
阅读全文 »

By Long Luo

This article is the solution 3 Approaches: Brute Force, Recursion and Two Pointers of Problem 680. Valid Palindrome II.

Here shows 3 Approaches to slove this problem: Brue Force, Recursion and Two Pointers.

Brue Force

Let’s start from the simplest method:

  • if \(len(str) \le 2\), definitely return \(\textit{true}\);
  • if the string is a palindrome, return \(\textit{true}\);
  • if not, enumerate each position as the deleted position, and then check the remaining strings is it a palindrome.

Time complexity of this approach is \(O(n^2)\), time limit will be exceeded.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
public static boolean validPalindrome_bf(String s) {
int len = s.length();
if (len <= 2 || validPalindrome(s, 0, len - 1)) {
return true;
}

for (int i = 0; i < len; i++) {
String str = s.substring(0, i) + s.substring(i + 1, len);
if (validPalindrome(str, 0, str.length() - 1)) {
return true;
}
}

return false;
}

public static boolean validPalindrome(String s, int left, int right) {
int len = s.length();
if (left >= len || right < 0 || left > right) {
return false;
}

while (left < right) {
if (s.charAt(left) != s.charAt(right)) {
return false;
}
left++;
right--;
}

return true;
}

Analysis

  • Time Complexity: \(O(n^2)\)
  • Space Complexity: \(O(1)\)
阅读全文 »

By Long Luo

微博热搜爬虫

前几天外甥女要我帮她完成一个小作业,用 Python 完成一个爬虫,于是在网上找了点资料1 和 Github 上找了个爬虫2 的例子,改简单了点,只爬取 微博热搜 数据并存储到 Excel 中,使用 Python Jupyter 编写。

具体代码见:https://github.com/longluo/spider/blob/master/weibohot.ipynb 。

微博爬虫

微博爬虫 3 的代码及功能太多太复杂,但我目前只需要爬取下列信息:

  1. 个人信息
  2. 具体时间段全部博文;
  3. 将数据存入 Excel 和数据库中。

所以需要精简其代码。

微博API

微博获取个人信息API:https://weibo.cn/1565668374/info

微博获取个人全部博文API:https://weibo.cn/1565668374?page=1

具体代码见:https://github.com/longluo/spider/blob/master/weibo.py 。

博客爬虫

在写完上述两个爬虫之后,趁热打铁,也把之前自己一直想做完的功能做完了!

功能

  1. 获取某网站的全部博文比如 https://www.longluo.me/ 。
  2. 爬取内容:文章标题、发布时间、分类、链接、正文(HTML格式)等。

分析

  1. 向目标网页发出请求

Header 是请求头信息的,添加的键值对越多,目标网站就越认为你是真实的用户,但我的网站没有反爬虫措施,所以 Header 里面就只有 User-Agent 键值对,用来描述你的操作系统、浏览器等信息。

  1. 获取网站博客的文章标题、发布时间和链接;
  2. 数据写入Excel中;
  3. 数据存入数据库中。

实现

获取全部博文标题、发布时间和链接

全部博文列表位于 https://www.longluo.me/archives/ 中,首先我们要获取的是总共有多少页呢?

总页数

查看网页源代码,我们可以看见全部页数为 <span class="space">&hellip;</span><a class="page-number" href="/archives/page/48/">48</a>,所以我们只需要解析出来即可。

参考网络代码4 ,利用BeautifulSoup解析HTML,然后利用正则表达式匹配,代码如下所示:

1
2
3
4
5
6
7
8
9
10
def get_total_page(self):
html = self.get_html(self.archivesUrl)

page_number = re.compile('[0-9]+')

soap = BeautifulSoup(html, 'html.parser')

page_item = soap.find_all('a', class_="page-number")
total_pages = re.findall(page_number, str(page_item))[-1]
return int(total_pages)
阅读全文 »

By Long Luo

This article is the solution Simple Two Pointers Solution of Problem 125. Valid Palindrome.

Very easy, we can use Two Pointers to solve this problem.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
public boolean isPalindrome(String s) {
if (s == null || s.length() <= 1) {
return true;
}

int n = s.length();
int left = 0;
int right = n - 1;
while (left < right) {
while (left < n && left < right && !(Character.isDigit(s.charAt(left)) || Character.isAlphabetic(s.charAt(left)))) {
left++;
}

while (right > 0 && left < right && !(Character.isDigit(s.charAt(right)) || Character.isAlphabetic(s.charAt(right)))) {
right--;
}

if (Character.toUpperCase(s.charAt(left)) != Character.toUpperCase(s.charAt(right))) {
return false;
}

while (left < right && Character.toUpperCase(s.charAt(left)) == Character.toUpperCase(s.charAt(right))) {
left++;
right--;
}
}

return true;
}
阅读全文 »

By Long Luo

This article is the solution Two Pointers: Sliding Window with HashSet of Problem 3. Longest Substring Without Repeating Characters.

Intuition

We can just simulate the process.

Using a Sliding Window from left to right of the string, and a HashSet to make every character is unique in the sliding window.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
public static int lengthOfLongestSubstring(String s) {
if (s == null || s.length() <= 1) {
return s.length();
}

int ans = 0;
int len = s.length();
Set<Character> set = new HashSet<>();
int left = 0;
int right = 0;
while (right < len) {
while (right < len && !set.contains(s.charAt(right))) {
set.add(s.charAt(right));
right++;
}

ans = Math.max(ans, right - left);
set.remove(s.charAt(left));
left++;
}

return ans;
}
阅读全文 »
0%