A Brief Analysis of a Python Deep Copy vs. Shallow Copy Bug

  Some thinking prompted by a bug...

Requirements in brief

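  Convert an Excel sheet of student information to JSON. One step of the code takes the data from each Excel row and substitutes it for the default values in a JSON template.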

Original code

import xlrd
import json

class GetStudentInfo(object):
    def __init__(self, student_info_path):
        self.student_info_path = student_info_path
        self.template = {
            "name": "ZhangSan",
            "sex": "female",
            "grade": "6",
            "age": "12",
            "id": "0"
        }

    def create_new_student(self, name, student_id):
        new_student = self.template
        new_student['name'] = name
        new_student['id'] = student_id
        return new_student

    def get_whole_stu_info(self):
        students = {}
        tables = xlrd.open_workbook(self.student_info_path)
        table = tables.sheets()[0]
        for row in range(0, table.nrows - 1):
            name = table.cell_value(row + 1, 0)
            student_id = table.cell_value(row + 1, 1)
            new_student = self.create_new_student(name, student_id)
            students[str(row)] = new_student
        self.get_new_file(students)

    def get_new_file(self, students):
        with open('./output.json', 'w', encoding='utf-8') as file:
            json.dump(students, file, indent=4, ensure_ascii=False)

if __name__ == '__main__':
    student_info_path = './student_info.xlsx'
    data = GetStudentInfo(student_info_path)
    data.get_whole_stu_info()

  The output file is:

{
    "0": {
        "name": "GouDong",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 4.0
    },
    "1": {
        "name": "GouDong",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 4.0
    },
    "2": {
        "name": "GouDong",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 4.0
    },
    "3": {
        "name": "GouDong",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 4.0
    }
}

Locating the bug

    def create_new_student(self, name, student_id):
        new_student = self.template  # This line is the problem!
        print(id(new_student))
        new_student['name'] = name
        new_student['id'] = student_id
        return new_student

  It turns out that new_student has the same id on every call:

2110590375320
2110590375320
2110590375320
2110590375320

Solution

Change it to:

    def create_new_student(self, name, student_id):
        # Requires `import copy` at the top of the module.
        new_student = copy.deepcopy(self.template)  # fixed (option 1)
        # new_student = copy.copy(self.template)  # fixed (option 2)
        print(id(new_student))
        new_student['name'] = name
        new_student['id'] = student_id
        return new_student

  The ids printed are now different:

2392740865832
2392740866072
2392740866152
2392740865752

  The new output file is:

{
    "0": {
        "name": "LiuBo",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 1.0
    },
    "1": {
        "name": "BoCai",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 2.0
    },
    "2": {
        "name": "CaiGou",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 3.0
    },
    "3": {
        "name": "GouDong",
        "sex": "female",
        "grade": "6",
        "age": "12",
        "id": 4.0
    }
}

Analysis

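  In the original code, new_student = self.template binds new_student to self.template on every call, and students[str(row)] = new_student binds students[str(row)] to that same object. Every entry in students therefore refers to the one template dict, so each time new_student is modified, all the values in students change to the latest version.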
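  The aliasing can be reproduced without any Excel file. The following is a minimal sketch: the names rows and build_students_buggy are made up for illustration and are not part of the original code, and the template is shortened to two fields.

```python
template = {"name": "ZhangSan", "id": "0"}
rows = [("LiuBo", 1.0), ("BoCai", 2.0), ("CaiGou", 3.0)]  # stand-in for the Excel rows


def build_students_buggy(template, rows):
    """Hypothetical helper that repeats the mistake from the original code."""
    students = {}
    for index, (name, student_id) in enumerate(rows):
        new_student = template  # no copy: just another name for template
        new_student["name"] = name
        new_student["id"] = student_id
        students[str(index)] = new_student
    return students


students = build_students_buggy(template, rows)

# Every key maps to the very same dict object, which is also the template itself.
all_same = all(value is template for value in students.values())
print(all_same)               # True
print(students["0"]["name"])  # CaiGou (the last row written)
print(template["name"])       # CaiGou (the template was overwritten too)
```

  Note that the template itself ends up holding the last row's data, which is easy to miss when only the output file is inspected.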

  Avoiding this problem brings us to the difference between shallow copies and deep copies.

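  • Assignment: only an alias, a reference to the existing object; id() returns the same value as for the original. (Like a shortcut.)
  • Shallow copy: the top level is copied, but the items inside are all references. (A new object is created with a new address, and it holds references to the original items, like a folder containing nothing but shortcuts.)
  • Deep copy: the new object gets its own memory address, different from the original's. It is a full clone with no connection to the original, so changing the original leaves the clone unchanged. (A new object is created with a new address, and the mutable objects inside it are clones with new addresses as well, like a folder containing actual files rather than shortcuts. Immutable values such as strings and numbers can safely stay shared.)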
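  The three cases above can be told apart with the is operator. This is a small sketch rather than anything from the original code; the nested "scores" list is a hypothetical field added so that the shallow/deep difference becomes visible.

```python
import copy

original = {"name": "ZhangSan", "scores": [90, 85]}  # "scores" is a hypothetical nested field

alias = original                # assignment: no new object
shallow = copy.copy(original)   # new outer dict, shared inner list
deep = copy.deepcopy(original)  # new outer dict and new inner list

print(alias is original)                        # True
print(shallow is original)                      # False
print(shallow["scores"] is original["scores"])  # True
print(deep["scores"] is original["scores"])     # False

original["scores"].append(70)   # mutate the nested list in place
print(shallow["scores"])        # [90, 85, 70]
print(deep["scores"])           # [90, 85]
```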

  Next, be clear about the difference between "=", copy.copy, and copy.deepcopy in Python.

  • "=": assignment
a = 1
b = a
print('original a', a, 'address', id(a))
print('b', b, 'address', id(b))
b = 2
print('a now', a, 'address', id(a))
print('b after modification', b, 'address', id(b))
# output
# original a 1 address 140720364485696
# b 1 address 140720364485696
# a now 1 address 140720364485696
# b after modification 2 address 140720364485728

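After the change, b's address is different, because b = 2 rebinds the name b to another object (the integer 2) instead of modifying the object that a refers to. a is unaffected.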

c = [1, 2]
print('original c', c, 'address', id(c))
d = c
print('d', d, 'address', id(d))
d[0] = 3
print('c now', c, 'address', id(c))
print('d after modification', d, 'address', id(d))
# output
# original c [1, 2] address 2758022890888
# d [1, 2] address 2758022890888
# c now [3, 2] address 2758022890888
# d after modification [3, 2] address 2758022890888

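  After the change, d's address is the same, because d[0] = 3 only modifies an element inside the list. c and d still refer to the same list, so c changes too.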

e = {
    "name": "ZhangSan",
    "id": "0"
}
print('original e', e, 'address', id(e))
f = e
print('f', f, 'address', id(f))
f['id'] = '1'
print('e now', e, 'address', id(e))
print('f after modification', f, 'address', id(f))
# output
# original e {'name': 'ZhangSan', 'id': '0'} address 2001978290072
# f {'name': 'ZhangSan', 'id': '0'} address 2001978290072
# e now {'name': 'ZhangSan', 'id': '1'} address 2001978290072
# f after modification {'name': 'ZhangSan', 'id': '1'} address 2001978290072

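  After the change, f's address is the same, because f['id'] = '1' only modifies a value inside the dict. This is exactly what happened to self.template in the original code.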

  • copy.copy: shallow copy
  • copy.deepcopy: deep copy

  From the official documentation of the copy module:

The difference between shallow and deep copying is only relevant for compound objects (objects that contain other objects, like lists or class instances): A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original. A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.

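  Note the emphasized wording ("references" versus "copies"): a shallow copy stores references to the original's contents, while a deep copy stores copies of them.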
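  This also suggests why option 2 (copy.copy) was enough to fix the bug here: every value in the template is a string, which is immutable, so new_student['name'] = name replaces a reference in the new dict rather than mutating anything shared. A sketch, using a hypothetical standalone make_student function in place of the original method:

```python
import copy

template = {"name": "ZhangSan", "sex": "female", "grade": "6", "age": "12", "id": "0"}


def make_student(name, student_id):
    """Hypothetical helper: return a fresh student dict based on template."""
    new_student = copy.copy(template)  # a shallow copy is enough for a flat dict
    new_student["name"] = name
    new_student["id"] = student_id
    return new_student


first = make_student("LiuBo", 1.0)
second = make_student("BoCai", 2.0)

print(first["name"], second["name"])  # LiuBo BoCai
print(template["name"])               # ZhangSan
print(first is second)                # False
```

  If the template later gained a nested list or dict, copy.deepcopy would be the safer choice, since a shallow copy would share that nested object between all students.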
