
靖待's Tech Blog

A Fresh Little IT Journey | Study for the rise of China

A Bug That Got Me… Thinking

Requirements in brief

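Convert an Excel sheet of student information to JSON.
One step in the code fills a JSON template with the data from each Excel row, replacing the template's default values.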

Original code

import xlrd  # note: xlrd 2.0+ reads only .xls; .xlsx needs xlrd < 2.0
import json


class GetStudentInfo(object):
    def __init__(self, student_info_path):
        self.student_info_path = student_info_path
        # Template whose default values are replaced for each student
        self.template = {
            "name": "ZhangSan",
            "sex": "female",
            "grade": "6",
            "age": "12",
            "id": "0"
        }

    def create_new_student(self, name, student_id):
        new_student = self.template
        new_student['name'] = name
        new_student['id'] = student_id
        return new_student

    def get_whole_stu_info(self):
        students = {}
        tables = xlrd.open_workbook(self.student_info_path)
        table = tables.sheets()[0]
        # Row 0 is the header, so data rows start at row + 1
        for row in range(0, table.nrows - 1):
            name = table.cell_value(row + 1, 0)
            student_id = table.cell_value(row + 1, 1)
            new_student = self.create_new_student(name, student_id)
            students[str(row)] = new_student
        self.get_new_file(students)

    def get_new_file(self, students):
        with open('./output.json', 'w', encoding='utf-8') as file:
            json.dump(students, file, indent=4, ensure_ascii=False)


if __name__ == '__main__':
    student_info_path = './student_info.xlsx'
    data = GetStudentInfo(student_info_path)
    data.get_whole_stu_info()

The output file is:

{
    "0": {
        "name": "GouDong",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 4.0
    },
    "1": {
        "name": "GouDong",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 4.0
    },
    "2": {
        "name": "GouDong",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 4.0
    },
    "3": {
        "name": "GouDong",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 4.0
    }
}

Locating the bug

def create_new_student(self, name, student_id):
    new_student = self.template  # this line is the problem!
    print(id(new_student))
    new_student['name'] = name
    new_student['id'] = student_id
    return new_student

It turns out that the id of new_student is the same on every call:

2110590375320
2110590375320
2110590375320
2110590375320

Solution

Change the method to:

# Requires "import copy" at the top of the file
def create_new_student(self, name, student_id):
    new_student = copy.deepcopy(self.template)  # fix (option 1)
    # new_student = copy.copy(self.template)  # fix (option 2)
    print(id(new_student))
    new_student['name'] = name
    new_student['id'] = student_id
    return new_student

The printed ids now differ:

2392740865832
2392740866072
2392740866152
2392740865752

The new output file is:

{
    "0": {
        "name": "LiuBo",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 1.0
    },
    "1": {
        "name": "BoCai",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 2.0
    },
    "2": {
        "name": "CaiGou",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 3.0
    },
    "3": {
        "name": "GouDong",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 4.0
    }
}
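
As an aside, the copy module is not the only way to get a fresh dict per student. The following is a minimal sketch, assuming a flat template like the one above; TEMPLATE and make_student are hypothetical names, not part of the original class. Dict unpacking builds a new dict on every call, so the template itself is never modified.

```python
# Hypothetical module-level template, mirroring the one in the post.
TEMPLATE = {
    "name": "ZhangSan",
    "sex": "female",
    "grade": "6",
    "age": "12",
    "id": "0",
}


def make_student(name, student_id):
    """Return a new dict: the template's entries with name and id overridden."""
    return {**TEMPLATE, "name": name, "id": student_id}


first = make_student("LiuBo", 1.0)
second = make_student("BoCai", 2.0)

print(first is second)                 # False: two distinct dict objects
print(first["name"], second["name"])   # LiuBo BoCai
print(TEMPLATE["name"])                # ZhangSan: the template is untouched
```

Like copy.copy, this only copies the top level, so it is safe here only because every template value is an immutable string.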

Analysis

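In the original code, new_student = self.template makes new_student refer to self.template on every call, and students[str(row)] = new_student makes students[str(row)] refer to that same object. Every entry in students is therefore the same dict, so each time new_student is modified, all entries change with it and end up holding the values of the last row.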
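
The aliasing described above can be reproduced without Excel. This is a minimal sketch; the rows list is a hypothetical stand-in for the spreadsheet data, and the template is shortened to two keys.

```python
template = {"name": "ZhangSan", "id": "0"}
rows = [("LiuBo", 1), ("BoCai", 2)]  # hypothetical stand-in for the Excel rows

students = {}
for index, (name, student_id) in enumerate(rows):
    new_student = template           # an alias, not a copy
    new_student["name"] = name       # this mutates the template itself
    new_student["id"] = student_id
    students[str(index)] = new_student

# Every value in students is the very same object as the template.
print(all(entry is template for entry in students.values()))  # True
print(students["0"]["name"])  # BoCai: the first entry shows the last row
```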

Avoiding this problem comes down to understanding shallow copies and deep copies.

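  • Assignment: only creates another name (a reference) for the existing object; id() returns the same value as for the original. (Like a shortcut to a file.)
  • Shallow copy: the top level is copied, but everything inside is still a reference. (A new object with a new address is created, and the addresses of the original's contents are placed in it, like a folder that contains nothing but shortcuts.)
  • Deep copy: the new object gets its own memory, and so does everything inside it. The result is a full clone that is independent of the original, so changing the original leaves the clone untouched. (A new object with a new address is created, and clones with new addresses are placed in it, like a folder that contains real files rather than shortcuts.)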
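
The three cases can be compared side by side on a nested list. This is a small sketch using only the standard copy module; the variable names are illustrative.

```python
import copy

original = [[1, 2], [3, 4]]
alias = original                  # assignment: same object
shallow = copy.copy(original)     # new outer list, shared inner lists
deep = copy.deepcopy(original)    # new outer list, new inner lists

print(alias is original)          # True
print(shallow is original)        # False
print(shallow[0] is original[0])  # True: the inner list is shared
print(deep[0] is original[0])     # False: the inner list was cloned

original[0].append(99)            # mutate an inner list in place
print(alias[0])    # [1, 2, 99]
print(shallow[0])  # [1, 2, 99]
print(deep[0])     # [1, 2]
```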

Next, distinguish between three things in Python: the "=" sign, copy.copy, and copy.deepcopy.

  • The "=" sign: assignment
a = 1
b = a
print('original a', a, 'id', id(a))
print('b', b, 'id', id(b))
b = 2
print('a now', a, 'id', id(a))
print('b after the change', b, 'id', id(b))
# output
# original a 1 id 140720364485696
# b 1 id 140720364485696
# a now 1 id 140720364485696
# b after the change 2 id 140720364485728

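The id of b changed because b = 2 does not modify the object that b referred to; it rebinds the name b to a different object (the integer 2), while a still refers to 1.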

c = [1, 2]
print('original c', c, 'id', id(c))
d = c
print('d', d, 'id', id(d))
d[0] = 3
print('c now', c, 'id', id(c))
print('d after the change', d, 'id', id(d))
# output
# original c [1, 2] id 2758022890888
# d [1, 2] id 2758022890888
# c now [3, 2] id 2758022890888
# d after the change [3, 2] id 2758022890888

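The id of d did not change because d[0] = 3 only modifies an element inside the list. c and d still refer to the same list, so the change is visible through both names.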

e = {
    "name": "ZhangSan",
    "id": "0"
}
print('original e', e, 'id', id(e))
f = e
print('f', f, 'id', id(f))
f['id'] = '1'
print('e now', e, 'id', id(e))
print('f after the change', f, 'id', id(f))
# output
# original e {'name': 'ZhangSan', 'id': '0'} id 2001978290072
# f {'name': 'ZhangSan', 'id': '0'} id 2001978290072
# e now {'name': 'ZhangSan', 'id': '1'} id 2001978290072
# f after the change {'name': 'ZhangSan', 'id': '1'} id 2001978290072

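The id of f did not change because f['id'] = '1' only modifies a value inside the dict. e and f still refer to the same dict, which is exactly what happened to self.template in the original code.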

  • copy.copy: shallow copy
  • copy.deepcopy: deep copy
      Official documentation: the copy module

The difference between shallow and deep copying is only relevant for compound objects (objects that contain other objects, like lists or class instances):
A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.

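Note the key words "references" and "copies": a shallow copy inserts references to the original's contents, while a deep copy inserts copies of them.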
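
This also explains why both fixes work for the student template: all of its values are strings, so there is nothing nested to share. The sketch below assumes a hypothetical "scores" list in the template (not in the original code) to show where copy.copy would stop being enough.

```python
import copy

# "scores" is a hypothetical nested field added for illustration.
template = {"name": "ZhangSan", "scores": []}

shallow_a = copy.copy(template)
shallow_b = copy.copy(template)
shallow_a["name"] = "LiuBo"        # rebinding a top-level key is independent
shallow_a["scores"].append(90)     # mutating the inner list is shared

print(shallow_b["name"])    # ZhangSan
print(shallow_b["scores"])  # [90]: the inner list is the same object

deep_a = copy.deepcopy(template)
deep_a["scores"].append(75)        # affects only the deep copy's own list
print(template["scores"])   # [90]
print(deep_a["scores"])     # [90, 75]
```

With nested mutable values in the template, copy.deepcopy is the safer of the two options.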
