Importing Data: gs_restore or MERGE INTO? See Which One Suits You Better

2022-03-18 12:00 https://my.oschina.net/gaussdb/blog/5493050 Gauss松鼠会

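1. Importing Data with the gs_restore Command

Scenarios

gs_restore is the import tool that openGauss provides as the counterpart to gs_dump. It imports files exported by gs_dump into a database. The supported file formats are the custom archive format, the directory archive format, and the tar archive format.

gs_restore provides the following two functions.

Import into a database

If a database is specified, the data is imported into that database. For a parallel import, the password for connecting to the database must be specified. Generated columns are updated automatically during the import and stored like ordinary columns.

Import into a script file

If no target database is specified, gs_restore creates a script containing the SQL statements needed to rebuild the database and writes it to a file or to standard output. This script is equivalent to the plain-text format file exported by gs_dump.

During an import, gs_restore lets users select which content to import and supports sorting the pending content before the data is imported.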

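Procedure

Note: By default, gs_restore imports data in append mode. To avoid data anomalies caused by repeated imports, it is recommended to use the "-c" and "-e" options when importing. "-c" cleans (drops) database objects that already exist in the target database before recreating them. "-e" makes gs_restore exit if an error occurs while sending SQL statements to the database; by default it continues and displays a series of error messages after the import.

Log in to the primary database node as the operating system user omm.

Use the gs_restore command to import the definitions of all database objects into backupdb from the export file of the entire postgres database.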

gs_restore -U jack /home/omm/backup/MPPDB_backup.tar -p 8000 -d backupdb -s -e -c
Password:

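Table 1 Common parameters

Parameter	Description	Example
-U	User name used to connect to the database.	-U jack
-W	Password of the connecting user. If the host authentication policy is trust, the database administrator is not asked for a password, so -W is not needed. If -W is omitted and the user is not a database administrator, the user is prompted for a password.	-W abcd@123
-d	Connects to the database dbname and imports the data directly into it.	-d backupdb
-p	TCP port or local Unix domain socket suffix on which the server listens for connections.	-p 8000
-e	Exits if an error occurs while sending SQL statements to the database. By default, the failed task is skipped, the import continues, and a series of error messages is displayed after the import.	-
-c	Cleans (drops) database objects that already exist in the target database before recreating them.	-
-s	Imports only the schema definitions, not the data. Current sequence values are not imported either.	-

For other parameters, see "Server Tools > gs_restore" in the Tool Reference.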
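
The recommendation to pass "-c" and "-e" on every re-import can be captured in a small helper. The following is a minimal sketch, not part of openGauss: the function name build_restore_command and its keyword arguments are hypothetical, it uses only the options listed in Table 1, and it only builds the command line without running it.

```python
import shlex

def build_restore_command(dump_path, dbname, port, user=None,
                          clean=True, exit_on_error=True, schema_only=False):
    """Build a gs_restore argument list from the options in Table 1 (hypothetical helper)."""
    args = ["gs_restore"]
    if user:
        args += ["-U", user]
    args += [dump_path, "-p", str(port), "-d", dbname]
    if schema_only:
        args.append("-s")  # definitions only, no data
    if exit_on_error:
        args.append("-e")  # stop at the first SQL error
    if clean:
        args.append("-c")  # drop existing objects before recreating them
    return args

cmd = build_restore_command("/home/omm/backup/MPPDB_backup.tar", "backupdb", 8000,
                            user="jack", schema_only=True)
print(shlex.join(cmd))
# gs_restore -U jack /home/omm/backup/MPPDB_backup.tar -p 8000 -d backupdb -s -e -c
```

Because clean and exit_on_error default to True, a caller has to opt out of the safe behavior explicitly. The returned list could be handed to subprocess.run on a host where gs_restore is installed.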

Examples

Example 1: Run gs_restore to import the data and object definitions of the postgres database from the specified MPPDB_backup.dmp file (custom archive format).

gs_restore backup/MPPDB_backup.dmp -p 8000 -d backupdb
Password:
gs_restore[2017-07-21 19:16:26]: restore operation successful
gs_restore: total time: 13053  ms

Example 2: Run gs_restore to import the data and object definitions of the postgres database from the specified MPPDB_backup.tar file (tar archive format).

gs_restore backup/MPPDB_backup.tar -p 8000 -d backupdb 
gs_restore[2017-07-21 19:21:32]: restore operation successful
gs_restore[2017-07-21 19:21:32]: total time: 21203  ms

Example 3: Run gs_restore to import the data and object definitions of the postgres database from the specified MPPDB_backup directory (directory archive format).

gs_restore backup/MPPDB_backup -p 8000 -d backupdb
gs_restore[2017-07-21 19:26:46]: restore operation successful
gs_restore[2017-07-21 19:26:46]: total time: 21003  ms

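Example 4: Run gs_restore to import the definitions of all objects of the postgres database into the backupdb database. Before the import, the database contains complete definitions and data. After the import, backupdb contains only the object definitions, and its tables hold no data.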

gs_restore /home/omm/backup/MPPDB_backup.tar -p 8000 -d backupdb -s -e -c 
Password:
gs_restore[2017-07-21 19:46:27]: restore operation successful
gs_restore[2017-07-21 19:46:27]: total time: 32993  ms

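Example 5: Run gs_restore to import all definitions and data of the PUBLIC schema from the MPPDB_backup.dmp file. Existing objects are dropped first during the import. If an existing object has cross-schema dependencies, manual intervention is required.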

gs_restore backup/MPPDB_backup.dmp -p 8000 -d backupdb -e -c -n PUBLIC
gs_restore: [archiver (db)] Error while PROCESSING TOC:
gs_restore: [archiver (db)] Error from TOC entry 313; 1259 337399 TABLE table1 gaussdba
gs_restore: [archiver (db)] could not execute query: ERROR:  cannot drop table table1 because other objects depend on it
DETAIL:  view t1.v1 depends on table table1
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
Command was: DROP TABLE public.table1;

Manually drop the dependent objects, then recreate them after the import completes.

gs_restore backup/MPPDB_backup.dmp -p 8000 -d backupdb -e -c -n PUBLIC
gs_restore[2017-07-21 19:52:26]: restore operation successful
gs_restore[2017-07-21 19:52:26]: total time: 2203  ms
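
Before dropping anything by hand, it helps to list exactly which objects block the import. The sketch below is an illustration only: it parses the error text copied from the failed run in Example 5, and the function name find_dependents is hypothetical. It assumes that each dependency appears on its own line in the form "<kind> <name> depends on <kind> <name>", with the "DETAIL:" prefix only on the first one.

```python
import re

# Error output copied from the failed run in Example 5.
log = """\
gs_restore: [archiver (db)] Error while PROCESSING TOC:
gs_restore: [archiver (db)] Error from TOC entry 313; 1259 337399 TABLE table1 gaussdba
gs_restore: [archiver (db)] could not execute query: ERROR:  cannot drop table table1 because other objects depend on it
DETAIL:  view t1.v1 depends on table table1
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
Command was: DROP TABLE public.table1;
"""

# Matches lines such as "DETAIL:  view t1.v1 depends on table table1".
DEPENDS = re.compile(r"^(?:DETAIL:\s+)?(\w+) (\S+) depends on (\w+) (\S+)$", re.M)

def find_dependents(text):
    """Return one dict per dependency line found in the error output (hypothetical helper)."""
    return [
        {"kind": m[1], "name": m[2], "on_kind": m[3], "on_name": m[4]}
        for m in DEPENDS.finditer(text)
    ]

dependents = find_dependents(log)
for dep in dependents:
    print(f"{dep['kind']} {dep['name']} blocks dropping {dep['on_kind']} {dep['on_name']}")
# view t1.v1 blocks dropping table table1
```

The resulting list names the objects that have to be dropped before the rerun and recreated afterwards.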

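Example 6: Run gs_restore to import the definition of the table hr.staffs in the hr schema from the MPPDB_backup.dmp file. Before the import, the hr.staffs table does not exist, and the hr schema must already exist.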

gs_restore backup/MPPDB_backup.dmp -p 8000 -d backupdb -e -c -s -n hr -t staffs
gs_restore[2017-07-21 19:56:29]: restore operation successful
gs_restore[2017-07-21 19:56:29]: total time: 21000  ms

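Example 7: Run gs_restore to import the data of the table hr.staffs in the hr schema from the MPPDB_backup.dmp file. Before the import, the hr.staffs table contains no data, and the hr schema must already exist.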

gs_restore backup/MPPDB_backup.dmp -p 8000 -d backupdb -e -a -n hr -t staffs
gs_restore[2017-07-21 20:12:32]: restore operation successful
gs_restore[2017-07-21 20:12:32]: total time: 20203  ms

Example 8: Run gs_restore to import the definition of the specified table hr.staffs. Before the import, the hr.staffs table contains data.

human_resource=# select * from hr.staffs;
 staff_id | first_name  |  last_name  |  email   |    phone_number    |      hire_date      | employment_id |  salary  | commission_pct | manager_id | section_id 
----------+-------------+-------------+----------+--------------------+---------------------+---------------+----------+----------------+------------+------------
      200 | Jennifer    | Whalen      | JWHALEN  | 515.123.4444       | 1987-09-17 00:00:00 | AD_ASST       |  4400.00 |                |        101 |         10
      201 | Michael     | Hartstein   | MHARTSTE | 515.123.5555       | 1996-02-17 00:00:00 | MK_MAN        | 13000.00 |                |        100 |         20

gsql -d human_resource -p 8000

gsql ((openGauss x.x.x build 50dc16a6) compiled at 2020-11-29 05:49:21 commit 1071 last mr 1373)
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

human_resource=# drop table hr.staffs CASCADE;
NOTICE:  drop cascades to view hr.staff_details_view
DROP TABLE

gs_restore /home/omm/backup/MPPDB_backup.tar -p 8000 -d human_resource -n hr -t staffs -s -e 
Password:
restore operation successful
total time: 904  ms

human_resource=# select * from hr.staffs;
 staff_id | first_name | last_name | email | phone_number | hire_date | employment_id | salary | commission_pct | manager_id | section_id 
----------+------------+-----------+-------+--------------+-----------+---------------+--------+----------------+------------+------------
(0 rows)

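Example 9: Run gs_restore to import the definitions and data of the two specified tables staffs and areas. Before the import, the staffs and areas tables do not exist.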

human_resource=# \d
                                 List of relations
 Schema |        Name        | Type  |  Owner   |             Storage              
--------+--------------------+-------+----------+----------------------------------
 hr     | employment_history | table | omm | {orientation=row,compression=no}
 hr     | employments        | table | omm | {orientation=row,compression=no}
 hr     | places             | table | omm | {orientation=row,compression=no}
 hr     | sections           | table | omm | {orientation=row,compression=no}
 hr     | states             | table | omm | {orientation=row,compression=no}
(5 rows)

gs_restore /home/gaussdb/backup/MPPDB_backup.tar -p 8000 -d human_resource -n hr -t staffs -n hr -t areas 
Password:
restore operation successful
total time: 724  ms

human_resource=# \d
                                 List of relations
 Schema |        Name        | Type  |  Owner   |             Storage              
--------+--------------------+-------+----------+----------------------------------
 hr     | areas              | table | omm | {orientation=row,compression=no}
 hr     | employment_history | table | omm | {orientation=row,compression=no}
 hr     | employments        | table | omm | {orientation=row,compression=no}
 hr     | places             | table | omm | {orientation=row,compression=no}
 hr     | sections           | table | omm | {orientation=row,compression=no}
 hr     | staffs             | table | omm | {orientation=row,compression=no}
 hr     | states             | table | omm | {orientation=row,compression=no}
(7 rows)

human_resource=# select * from hr.areas;
 area_id |       area_name        
---------+------------------------
       4 | Middle East and Africa
       1 | Europe
       2 | Americas
       3 | Asia
(4 rows)
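
Example 9 selects each table with its own "-n schema -t table" pair. When the table list comes from a script, the pairs can be generated instead of typed. This is a minimal sketch that only mirrors the pattern shown in Example 9; the helper name table_selectors is hypothetical and the command is printed, not run.

```python
def table_selectors(tables):
    """Expand (schema, table) pairs into repeated "-n schema -t table" arguments."""
    args = []
    for schema, table in tables:
        args += ["-n", schema, "-t", table]
    return args

base = ["gs_restore", "/home/gaussdb/backup/MPPDB_backup.tar",
        "-p", "8000", "-d", "human_resource"]
args = base + table_selectors([("hr", "staffs"), ("hr", "areas")])
print(" ".join(args))
# gs_restore /home/gaussdb/backup/MPPDB_backup.tar -p 8000 -d human_resource -n hr -t staffs -n hr -t areas
```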

Example 10: Run gs_restore to import the hr schema, including all object definitions and data in the schema.

gs_restore /home/omm/backup/MPPDB_backup1.dmp -p 8000 -d backupdb -n hr -e
Password:
restore operation successful
total time: 702  ms

Example 11: Run gs_restore to import the hr and hr1 schemas together, importing only the object definitions in the two schemas.

gs_restore /home/omm/backup/MPPDB_backup2.dmp -p 8000 -d backupdb -n hr -n hr1 -s
Password:
restore operation successful
total time: 665  ms

Example 12: Run gs_restore to import the export file of the human_resource database into the backupdb database.

openGauss=# create database backupdb;
CREATE DATABASE

gs_restore /home/omm/backup/MPPDB_backup.tar -p 8000 -d backupdb
restore operation successful
total time: 23472  ms

gsql -d backupdb -p 8000 -r

gsql ((openGauss x.x.x build 50dc16a6) compiled at 2020-11-29 05:49:21 commit 1071 last mr 1373)
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

backupdb=# select * from hr.areas;
 area_id |       area_name        
---------+------------------------
       4 | Middle East and Africa
       1 | Europe
       2 | Americas
       3 | Asia
(4 rows)

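Example 13: User user1 does not have permission to import the data in the export file into the backupdb database, but role role1 does. To import the file into backupdb, set --role to role1 in the import command so that the import runs with the permissions of role1.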

human_resource=# CREATE USER user1 IDENTIFIED BY "1234@abc";
human_resource=# CREATE ROLE role1 with SYSADMIN IDENTIFIED BY "abc@1234";

gs_restore -U user1 /home/omm/backup/MPPDB_backup.tar -p 8000 -d backupdb --role role1 --rolepassword abc@1234
Password:
restore operation successful
total time: 554  ms

gsql -d backupdb -p 8000 -r 

gsql ((openGauss x.x.x build 50dc16a6) compiled at 2020-11-29 05:49:21 commit 1071 last mr 1373)
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

backupdb=# select * from hr.areas;
 area_id |       area_name        
---------+------------------------
       4 | Middle East and Africa
       1 | Europe
       2 | Americas
       3 | Asia
(4 rows)

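2. Updating Data in a Table

2.1 Updating a Table with DML Commands

openGauss supports standard data manipulation language (DML) commands for updating tables.

Procedure

Assume that the table customer_t exists with the following structure: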

CREATE TABLE customer_t
( c_customer_sk             integer,   
  c_customer_id             char(5),    
  c_first_name              char(6),    
  c_last_name               char(8) 
) ;

The following DML commands can be used to update the data in the table.

Use INSERT to insert data into the table.

Insert one row into the table customer_t.

INSERT INTO customer_t (c_customer_sk, c_customer_id, c_first_name,c_last_name) VALUES (3769, 5, 'Grace','White');

Insert multiple rows into the table customer_t.

INSERT INTO customer_t (c_customer_sk, c_customer_id, c_first_name,c_last_name) VALUES    
(6885, 1, 'Joes', 'Hunter'),    
(4321, 2, 'Lily','Carter'),    
(9527, 3, 'James', 'Cook'),
(9500, 4, 'Lucy', 'Baker');

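For more information about INSERT, see "Inserting Data into a Table".

Use UPDATE to update data in the table. The following statement sets the c_customer_id field to 0; because it has no WHERE clause, it applies to every row.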
```
UPDATE customer_t SET c_customer_id = 0;
```
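For more information about UPDATE, see UPDATE.

Use DELETE to delete rows from the table.

A WHERE clause can specify the rows to delete. Without one, all rows in the table are deleted and only the table structure remains.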
```
DELETE FROM customer_t WHERE c_last_name = 'Baker';
```
For more information about DELETE, see DELETE.

Use TRUNCATE to quickly delete all rows from the table.
```
TRUNCATE TABLE customer_t;
```
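For more information about TRUNCATE, see TRUNCATE.

When clearing a table, DELETE removes the rows one at a time, whereas TRUNCATE removes the data by releasing the data pages that store the table, so TRUNCATE is faster than DELETE.

DELETE removes only the data and does not release the storage space. TRUNCATE removes the data and also releases the storage space.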
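
The effect of each DML command above can be checked by counting rows after every step. The sketch below replays the customer_t statements against an in-memory SQLite database from the Python standard library. SQLite is only a stand-in here and is not openGauss: it has no TRUNCATE, so that step is left out, and its storage behavior says nothing about the DELETE versus TRUNCATE difference described above. The helper row_count is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not an openGauss connection
cur = conn.cursor()
cur.execute("""
    CREATE TABLE customer_t (
        c_customer_sk integer,
        c_customer_id char(5),
        c_first_name  char(6),
        c_last_name   char(8)
    )
""")

def row_count():
    """Return the current number of rows in customer_t."""
    return cur.execute("SELECT count(*) FROM customer_t").fetchone()[0]

rows = [
    (3769, "5", "Grace", "White"),
    (6885, "1", "Joes", "Hunter"),
    (4321, "2", "Lily", "Carter"),
    (9527, "3", "James", "Cook"),
    (9500, "4", "Lucy", "Baker"),
]
cur.executemany("INSERT INTO customer_t VALUES (?, ?, ?, ?)", rows)
after_insert = row_count()

# UPDATE without WHERE changes every row.
cur.execute("UPDATE customer_t SET c_customer_id = '0'")
ids = [r[0] for r in cur.execute("SELECT DISTINCT c_customer_id FROM customer_t")]

# DELETE with WHERE removes only the matching row.
cur.execute("DELETE FROM customer_t WHERE c_last_name = 'Baker'")
after_delete_one = row_count()

# DELETE without WHERE removes all rows but keeps the table itself.
cur.execute("DELETE FROM customer_t")
after_delete_all = row_count()

print(after_insert, ids, after_delete_one, after_delete_all)
# 5 ['0'] 4 0
conn.close()
```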

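2.2 Updating and Inserting Data by Merging

When all or a large amount of the data in one table needs to be added to an existing table, openGauss provides the MERGE INTO statement, which adds the new data to the existing table efficiently by merging the two tables.

MERGE INTO matches the data in the target table against the data in the source table using a join condition. Where the condition matches, it performs an UPDATE on the target table; where it does not match, it performs an INSERT on the target table. This makes it convenient to merge two tables with UPDATE and INSERT in a single statement instead of running them separately.

Prerequisites

The user performing the MERGE INTO operation must have the UPDATE and INSERT permissions on the target table and the SELECT permission on the source table.

Procedure

Create the source table products and insert data.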

openGauss=# CREATE TABLE products 
( product_id INTEGER, 
  product_name VARCHAR2(60), 
  category VARCHAR2(60) 
);
    
openGauss=# INSERT INTO products VALUES 
(1502, 'olympus camera', 'electrncs'),
(1601, 'lamaze', 'toys'),
(1666, 'harry potter', 'toys'),
(1700, 'wait interface', 'books');

Create the target table newproducts and insert data.

openGauss=# CREATE TABLE newproducts 
( product_id INTEGER, 
  product_name VARCHAR2(60), 
  category VARCHAR2(60) 
); 
    
openGauss=# INSERT INTO newproducts VALUES 
(1501, 'vivitar 35mm', 'electrncs'),
(1502, 'olympus ', 'electrncs'),
(1600, 'play gym', 'toys'),
(1601, 'lamaze', 'toys'), 
(1666, 'harry potter', 'dvd');

Use the MERGE INTO statement to merge the data from the source table products into the target table newproducts.

MERGE INTO newproducts np    
USING products p    
ON (np.product_id = p.product_id )    
WHEN MATCHED THEN     
  UPDATE SET np.product_name = p.product_name, np.category = p.category 
WHEN NOT MATCHED THEN     
  INSERT VALUES (p.product_id, p.product_name, p.category) ;

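For the parameters used in the statement above, see Table 2. For more information, see MERGE INTO.

Table 2 MERGE INTO statement parameters

Parameter	Description	Example
INTO clause	Specifies the target table to update or insert into. The target table can be given an alias.	Value: newproducts np Description: the target table named newproducts with the alias np.
USING clause	Specifies the source table. The source table can be given an alias.	Value: products p Description: the source table named products with the alias p.
ON clause	Specifies the join condition between the target table and the source table. Fields used in the join condition cannot be updated.	Value: np.product_id = p.product_id Description: the join condition is that the product_id field of the target table newproducts equals the product_id field of the source table products.
WHEN MATCHED clause	When source and target data match on the join condition, the WHEN MATCHED clause is chosen and an UPDATE is performed. Only one WHEN MATCHED clause can be specified. The WHEN MATCHED clause can be omitted; in that case, nothing is done for rows that satisfy the ON condition. If the target table has a distribution column, that column cannot be updated.	Value: WHEN MATCHED THEN UPDATE SET np.product_name = p.product_name, np.category = p.category Description: when the ON condition is satisfied, the product_name and category fields of the target table newproducts are replaced with the values of the corresponding fields in the source table products.
WHEN NOT MATCHED clause	When source and target data do not match on the join condition, the WHEN NOT MATCHED clause is chosen and an INSERT is performed. Only one WHEN NOT MATCHED clause can be specified. The WHEN NOT MATCHED clause can be omitted. The INSERT clause cannot contain multiple VALUES. The WHEN MATCHED and WHEN NOT MATCHED clauses can appear in either order, and either one can be omitted, but not both.	Value: WHEN NOT MATCHED THEN INSERT VALUES (p.product_id, p.product_name, p.category) Description: rows of the source table products that do not satisfy the ON condition are inserted into the target table newproducts.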

Query the merged target table newproducts.

SELECT * FROM newproducts;

The following information is returned:

 product_id |  product_name  | category
------------+----------------+-----------
       1501 | vivitar 35mm   | electrncs
       1502 | olympus camera | electrncs
       1666 | harry potter   | toys
       1600 | play gym       | toys
       1601 | lamaze         | toys
       1700 | wait interface | books
(6 rows)
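
The result above can be reproduced without a database by replaying the MERGE INTO logic on plain Python lists. This is a simplified sketch of the matching rule only, not how openGauss executes the statement. The function merge_into is hypothetical, and it assumes that the join key is unique in both tables, which holds for the sample data.

```python
def merge_into(target, source, key, update_cols):
    """Simulate MERGE INTO target USING source ON target.key = source.key (unique keys assumed)."""
    index = {row[key]: row for row in target}  # target rows by join key
    for src in source:
        match = index.get(src[key])
        if match is not None:
            # WHEN MATCHED THEN UPDATE SET ...
            for col in update_cols:
                match[col] = src[col]
        else:
            # WHEN NOT MATCHED THEN INSERT VALUES (...)
            new_row = dict(src)
            target.append(new_row)
            index[src[key]] = new_row
    return target

COLS = ("product_id", "product_name", "category")

def rows(*tuples):
    """Turn value tuples into row dicts keyed by column name."""
    return [dict(zip(COLS, t)) for t in tuples]

products = rows(
    (1502, "olympus camera", "electrncs"),
    (1601, "lamaze", "toys"),
    (1666, "harry potter", "toys"),
    (1700, "wait interface", "books"),
)
newproducts = rows(
    (1501, "vivitar 35mm", "electrncs"),
    (1502, "olympus ", "electrncs"),
    (1600, "play gym", "toys"),
    (1601, "lamaze", "toys"),
    (1666, "harry potter", "dvd"),
)

merge_into(newproducts, products, "product_id", ["product_name", "category"])
result = sorted(tuple(r[c] for c in COLS) for r in newproducts)
for row in result:
    print(row)
```

Rows 1502 and 1666 are updated from the source, 1601 is matched but keeps the same values, 1700 is inserted, and 1501 and 1600 are untouched because they have no counterpart in products. The output is sorted by product_id, so the row order differs from the query result shown above.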