Some thinking prompted by a bug

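Requirements in brief

Convert an Excel sheet of student information to JSON. One step of the code takes each row of the sheet and substitutes its values for the defaults in a JSON template.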

Original code

import xlrd
import json

class GetStudentInfo(object):
    def __init__(self, student_info_path):
        self.student_info_path = student_info_path
        self.template = {
            "name": "ZhangSan",
            "sex": "female",
            "grade": "6",
            "age": "12",
            "id": "0"
        }

    def create_new_student(self, name, student_id):
        new_student = self.template
        new_student['name'] = name
        new_student['id'] = student_id
        return new_student

    def get_whole_stu_info(self):
        students = {}
        tables = xlrd.open_workbook(self.student_info_path)
        table = tables.sheets()[0]
        for row in range(0, table.nrows - 1):
            name = table.cell_value(row + 1, 0)
            student_id = table.cell_value(row + 1, 1)
            new_student = self.create_new_student(name, student_id)
            students[str(row)] = new_student
        self.get_new_file(students)

    def get_new_file(self, students):
        with open('./output.json', 'w', encoding='utf-8') as file:
            json.dump(students, file, indent=4, ensure_ascii=False)

if __name__ == '__main__':
    # Note: xlrd 2.0 and later read only .xls files; opening .xlsx needs xlrd < 2.0.
    student_info_path = './student_info.xlsx'
    data = GetStudentInfo(student_info_path)
    data.get_whole_stu_info()

The output file is:

{
    "0": {
        "name": "GouDong",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 4.0
    },
    "1": {
        "name": "GouDong",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 4.0
    },
    "2": {
        "name": "GouDong",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 4.0
    },
    "3": {
        "name": "GouDong",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 4.0
    }
}

Locating the bug

    def create_new_student(self, name, student_id):
        new_student = self.template  # this line is the problem: no copy is made
        print(id(new_student))
        new_student['name'] = name
        new_student['id'] = student_id
        return new_student

Every call prints the same id for new_student:

2110590375320
2110590375320
2110590375320
2110590375320
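
The same behaviour can be reproduced without an Excel file. The following is a minimal sketch, assuming a hard-coded list named rows (hypothetical, standing in for the spreadsheet rows) and a module-level template instead of the class attribute:

```python
# Hypothetical stand-in for the rows read from the spreadsheet.
rows = [("LiuBo", 1.0), ("BoCai", 2.0), ("CaiGou", 3.0), ("GouDong", 4.0)]
template = {"name": "ZhangSan", "sex": "female", "grade": "6", "age": "12", "id": "0"}

students = {}
for index, (name, student_id) in enumerate(rows):
    new_student = template          # no copy: just another name for the same dict
    new_student["name"] = name
    new_student["id"] = student_id
    students[str(index)] = new_student

# Every entry is the very same object as template, so all show the last row.
all_same_object = all(entry is template for entry in students.values())
names = [entry["name"] for entry in students.values()]
print(all_same_object)  # True
print(names)            # ['GouDong', 'GouDong', 'GouDong', 'GouDong']
```

The is operator compares object identity directly, which is the same check as comparing the id() values by eye.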

Fix

Change the method as follows (this also needs import copy at the top of the file):

    def create_new_student(self, name, student_id):
        new_student = copy.deepcopy(self.template)  # after the fix (option 1)
        # new_student = copy.copy(self.template)  # after the fix (option 2)
        print(id(new_student))
        new_student['name'] = name
        new_student['id'] = student_id
        return new_student

Now the printed ids differ:

2392740865832
2392740866072
2392740866152
2392740865752

The new output file is:

{
    "0": {
        "name": "LiuBo",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 1.0
    },
    "1": {
        "name": "BoCai",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 2.0
    },
    "2": {
        "name": "CaiGou",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 3.0
    },
    "3": {
        "name": "GouDong",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 4.0
    }
}
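
Another way to get a fresh dict per student, shown here only as a sketch and not as the method used in this post, is to build the new dict with dict unpacking. The standalone function below is a hypothetical rewrite of create_new_student that takes the template as a parameter. It makes a shallow copy, which is enough here because every value in the template is a string.

```python
template = {"name": "ZhangSan", "sex": "female", "grade": "6", "age": "12", "id": "0"}

def create_new_student(template, name, student_id):
    # {**template, ...} builds a brand-new dict: the template's items first,
    # then the overriding "name" and "id" entries.
    return {**template, "name": name, "id": student_id}

first = create_new_student(template, "LiuBo", 1.0)
second = create_new_student(template, "BoCai", 2.0)

print(first is second)   # False: two separate dicts
print(first["name"])     # LiuBo
print(template["name"])  # ZhangSan: the template is left untouched
```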

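Analysis

In the original code, new_student = self.template makes new_student refer to the same dict as self.template on every call, and students[str(row)] = new_student makes each students[str(row)] refer to that same dict as well. So every time new_student is modified, all the entries in students show the latest values.

Avoiding this comes down to shallow copying versus deep copying. Because every value in the template is an immutable string, either kind of copy fixes the bug here.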

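Assignment: only creates another name (a reference) for the existing object; id() of the new name equals that of the original. (Like a shortcut.)
Shallow copy: the first level is copied, but everything inside it is a reference. (A new object is created at a new address, and it holds references to the original's items, like a folder filled with shortcuts.)
Deep copy: the new object gets its own memory, different from the original's, and so does everything inside it. It is a complete clone that is independent of the original, so changing the original does not change the clone. (A new object is created at a new address and filled with clones that also have new addresses, like a folder filled with actual files rather than shortcuts.)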
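
The three cases can be told apart with identity checks. This is a small sketch using a nested list as the example object; the variable names are chosen for illustration only.

```python
import copy

original = [[1, 2], [3, 4]]

alias = original                  # assignment: same object
shallow = copy.copy(original)     # shallow copy: new outer list, shared inner lists
deep = copy.deepcopy(original)    # deep copy: new outer list, new inner lists

print(alias is original)          # True
print(shallow is original)        # False
print(shallow[0] is original[0])  # True
print(deep is original)           # False
print(deep[0] is original[0])     # False
```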

Next, be clear about the difference between the "=" sign, copy.copy and copy.deepcopy in Python.

The "=" sign: assignment

a = 1
b = a
print('original a', a, 'id', id(a))
print('b', b, 'id', id(b))
b = 2
print('a now', a, 'id', id(a))
print('b after the change', b, 'id', id(b))
# output
# original a 1 id 140720364485696
# b 1 id 140720364485696
# a now 1 id 140720364485696
# b after the change 2 id 140720364485728

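After b = 2, the id of b has changed: the assignment rebinds the name b to a different object (the integer 2) instead of modifying the object that a refers to, so a is unaffected.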

c = [1, 2]
print('original c', c, 'id', id(c))
d = c
print('d', d, 'id', id(d))
d[0] = 3
print('c now', c, 'id', id(c))
print('d after the change', d, 'id', id(d))
# output
# original c [1, 2] id 2758022890888
# d [1, 2] id 2758022890888
# c now [3, 2] id 2758022890888
# d after the change [3, 2] id 2758022890888

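After the change, the id of d is the same as before, because only an element inside the list was modified in place. c and d still name the same list, so the change is visible through c too.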
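
The contrast between changing an object in place and rebinding a name can be put side by side. A short sketch reusing the names c and d from the listing above:

```python
c = [1, 2]
d = c

d[0] = 3                  # in-place change: c and d are still the same list
print(c)                  # [3, 2]
print(d is c)             # True

d = [9, 9]                # rebinding: d now names a different list
print(c)                  # [3, 2]
print(d is c)             # False
```

Only the in-place change is visible through c; rebinding d leaves the list that c refers to alone, exactly as b = 2 left a alone.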

e = {
    "name": "ZhangSan",
    "id": "0"
}
print('original e', e, 'id', id(e))
f = e
print('f', f, 'id', id(f))
f['id'] = '1'
print('e now', e, 'id', id(e))
print('f after the change', f, 'id', id(f))
# output
# original e {'name': 'ZhangSan', 'id': '0'} id 2001978290072
# f {'name': 'ZhangSan', 'id': '0'} id 2001978290072
# e now {'name': 'ZhangSan', 'id': '1'} id 2001978290072
# f after the change {'name': 'ZhangSan', 'id': '1'} id 2001978290072

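After the change, the id of f is the same as before, because only a value inside the dict was modified in place. This is exactly the situation in the buggy code.

copy.copy: shallow copy
copy.deepcopy: deep copy
Official documentation: the copy module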

The difference between shallow and deep copying is only relevant for compound objects (objects that contain other objects, like lists or class instances): A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original. A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.

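The key words are "references" and "copies": a shallow copy inserts references to the objects found in the original, while a deep copy inserts copies of them.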
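
For the flat template in this post the distinction makes no practical difference, but it does once a value is itself mutable. The sketch below assumes a hypothetical "scores" list in the template, which is not part of the original code:

```python
import copy

# "scores" is a hypothetical nested field added for illustration.
template = {"name": "ZhangSan", "scores": [0, 0]}

shallow = copy.copy(template)
deep = copy.deepcopy(template)

shallow["name"] = "LiuBo"    # replaces a top-level value: template is unaffected
shallow["scores"][0] = 90    # changes the shared inner list: template IS affected

print(template)  # {'name': 'ZhangSan', 'scores': [90, 0]}
print(shallow)   # {'name': 'LiuBo', 'scores': [90, 0]}
print(deep)      # {'name': 'ZhangSan', 'scores': [0, 0]}
```

With a nested template like this, copy.deepcopy would be the safer of the two options.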