A MongoDB Journey (Part 1)

Covers data types, the shell, basic operations, and more.

This article uses MongoDB 3.4.2.

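Data types

MongoDB adds several general-purpose types on top of JSON's key/value pairs:

null
Boolean - true and false
Number - the shell defaults to 64-bit floating point; use NumberInt for 32-bit integers
String
Date - new Date()
Regular expression
Array
Embedded document
Object id - a 12-byte id that uniquely identifies a document; stored under the _id key as an ObjectId value
Binary data - a string of arbitrary bytes; cannot be used directly in the shell
Code - arbitrary JavaScript code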

ObjectId

12 bytes in total (byte offsets are zero-based):

0-3 timestamp
4-6 machine identifier
7-8 PID
9-11 counter
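
To make the layout concrete, here is a small plain-JavaScript sketch (runnable in Node.js, no database needed) that slices the 24 hex characters of an ObjectId into the four fields listed above. parseObjectId is a made-up helper name, and the layout is assumed to be the one this article describes for 3.4; other server versions may use bytes 4-8 differently.

```javascript
// Split a 24-character hex ObjectId into its four fields.
// Assumption: the 3.4-era layout (timestamp, machine, PID, counter).
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  return {
    timestampSeconds: parseInt(hex.slice(0, 8), 16), // bytes 0-3
    machine: hex.slice(8, 14),                       // bytes 4-6
    pid: parseInt(hex.slice(14, 18), 16),            // bytes 7-8
    counter: parseInt(hex.slice(18, 24), 16)         // bytes 9-11
  };
}

const parsed = parseObjectId("58e0d8ae0eb1a56f1497eda5");
// The timestamp is seconds since the Unix epoch.
console.log(new Date(parsed.timestampSeconds * 1000).toISOString());
console.log(parsed);
```

The id used here is one that appears in the insert examples of this article; its timestamp decodes to 2 April 2017 (UTC), around when those examples were run.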

shell

Start the shell without connecting to a database
```
mongo --nodb
```
Start the mongod server

mongod --dbpath /Users/yrq/Desktop/db

Connect

mongo
# the default address is mongodb://127.0.0.1:27017

List databases and collections

# 1. List databases
> show dbs
admin  0.000GB
local  0.000GB
test   0.000GB
# 2. Show the current database
> db
test
# 3. List the collections in the current database
> show collections
foo
users

Basic operations

Insert

Single insert

Use the insert method, which accepts a document object.

> string = {"content":"Hello mongo"}
{ "content" : "Hello mongo" }
> db.foo.insert(string)
> db.foo.find()
{ "_id" : ObjectId("58e0d8ae0eb1a56f1497eda5"), "content" : "Hello mongo" }

Batch insert

Use the insert method with an array of documents as its argument (batchInsert is deprecated as of mongo 3.2.3).

> db.foo.insert([{"no":1},{"no":2},{"no":3}])
BulkWriteResult({
	"writeErrors" : [ ],
	"writeConcernErrors" : [ ],
	"nInserted" : 3,
	"nUpserted" : 0,
	"nMatched" : 0,
	"nModified" : 0,
	"nRemoved" : 0,
	"upserted" : [ ]
})
> db.foo.find()
{ "_id" : ObjectId("58e1ab966c4ede97e9d75aab"), "no" : 1 }
{ "_id" : ObjectId("58e1ab966c4ede97e9d75aac"), "no" : 2 }
{ "_id" : ObjectId("58e1ab966c4ede97e9d75aad"), "no" : 3 }

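Delete

Use the remove method. A removal cannot be undone or recovered. Passing an empty query document, as below, removes every document in the collection.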

> db.foo.remove({})
WriteResult({ "nRemoved" : 3 })
> db.foo.find()
>

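Update

Use the update method, which accepts a query document and an update document. Passing a plain document as the second argument, as here, replaces the whole matched document.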

> string.date = "20170403"
20170403
> db.foo.update({"content":"Hello mongo"},string)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.foo.find()
{ "_id" : ObjectId("58e0d8ae0eb1a56f1497eda5"), "content" : "Hello mongo", "date" : "20170403" }

Using modifiers

"$set" - sets the value of a field, creating the field if it does not exist

> db.foo.update({"content":"Hello mongo"},{"$set":{"user":"yrq","content":"Hello"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.foo.find()
{ "_id" : ObjectId("58e1be6cb6647fe45f462cf2"), "content" : "Hello", "user" : "yrq" }

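"$inc" - increments or decrements the value of an existing key; if the key does not exist, it is created with the given amount as its value

For example, user yrq wrote an excellent comment and gains 500 points: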

> db.foo.update({"user":"yrq"},{"$inc":{"score":500}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.foo.find()
{ "_id" : ObjectId("58e1be6cb6647fe45f462cf2"), "content" : "Hello", "user" : "yrq", "score" : 500 }

Then user yrq posts a message that breaks the rules and loses 1000 points:

> db.foo.update({"user":"yrq"},{"$inc":{"score":-1000}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.foo.find()
{ "_id" : ObjectId("58e1be6cb6647fe45f462cf2"), "content" : "Hello", "user" : "yrq", "score" : -500 }

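"$push" - appends an element to the end of an array, creating the array if it does not exist

User yrq has finished reading a book, which needs to be added to the user's record: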

> db.foo.update({"user":"yrq"},{"$push":{"books":{"book":"Know JS","author":"aaa"}}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.foo.find()
{ "_id" : ObjectId("58e1be6cb6647fe45f462cf2"), "content" : "Hello", "user" : "yrq", "books" : [ { "book" : "Know JS", "author" : "aaa" } ] }

Later yrq reads another book, which is added the same way:

> db.foo.update({"user":"yrq"},{"$push":{"books":{"book":"Know Mongo","author":"bbb"}}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.foo.find()
{ "_id" : ObjectId("58e1be6cb6647fe45f462cf2"), "content" : "Hello", "user" : "yrq", "books" : [ { "book" : "Know JS", "author" : "aaa" }, { "book" : "Know Mongo", "author" : "bbb" } ] }
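
The three modifiers above can be modeled on a plain JavaScript object, which makes their "create if missing" behavior easy to see without a running server. This is only a sketch: applyUpdate is a hypothetical helper, not a MongoDB API, and it covers just $set, $inc, and $push on top-level fields.

```javascript
// Apply a small subset of update modifiers to a plain object, in place.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$set || {})) {
    doc[field] = value;                      // overwrite or create
  }
  for (const [field, delta] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + delta;  // a missing field starts at 0
  }
  for (const [field, item] of Object.entries(update.$push || {})) {
    if (doc[field] === undefined) doc[field] = [];  // create the array
    if (!Array.isArray(doc[field])) {
      throw new Error(field + " is not an array");
    }
    doc[field].push(item);                          // append at the end
  }
  return doc;
}

const user = { content: "Hello mongo" };
applyUpdate(user, { $set: { user: "yrq", content: "Hello" } });
applyUpdate(user, { $inc: { score: 500 } });
applyUpdate(user, { $inc: { score: -1000 } });
applyUpdate(user, { $push: { books: { book: "Know JS", author: "aaa" } } });
console.log(JSON.stringify(user));
```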

upsert

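An upsert is a special kind of update: if no document matches the query, a new document is created from the query document and the update document; if a match is found, it is updated as usual.

Pass true as the third argument of update to perform an upsert (the options document form { upsert: true } is equivalent)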

> db.foo.update({"user":"yrq"},{"$inc":{"score":500}},true)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.foo.find()
{ "_id" : ObjectId("58e1be6cb6647fe45f462cf2"), "content" : "Hello", "user" : "yrq", "score" : 500 }

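The result here is the same as a plain update because a matching document already exists, but an upsert performs the find-or-create step as a single atomic operation, which is more efficient than querying first and then inserting or updating.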
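
The find-or-create logic can be sketched over an in-memory array. upsertInc is a hypothetical helper that only handles equality queries and a single $inc; the real server does this atomically, which a plain array cannot show.

```javascript
// Model an upsert with a single $inc over an array of plain objects.
function upsertInc(collection, query, field, delta) {
  const doc = collection.find(d =>
    Object.keys(query).every(k => d[k] === query[k]));
  if (doc) {
    doc[field] = (doc[field] || 0) + delta;  // match found: normal update
    return { nMatched: 1, nUpserted: 0 };
  }
  // No match: seed the new document from the query, then apply the update.
  collection.push(Object.assign({}, query, { [field]: delta }));
  return { nMatched: 0, nUpserted: 1 };
}

const foo = [];
console.log(upsertInc(foo, { user: "yrq" }, "score", 500)); // creates the document
console.log(upsertInc(foo, { user: "yrq" }, "score", 500)); // updates it
console.log(JSON.stringify(foo));
```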

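Query

find and findOne

Use the find and findOne methods: the former returns all matching documents, the latter returns only the first match.

The first argument of find determines which documents are returned; {} matches every document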

> db.foo.find({"content" : "Hello mongo"})
{ "_id" : ObjectId("58e1be6cb6647fe45f462cf2"), "content" : "Hello mongo" }

The second argument of find specifies which keys to return

> db.foo.find({},{_id:1})
{ "_id" : ObjectId("58e1be6cb6647fe45f462cf2") }

Query conditions

First, insert some user data:

> db.users.find()
{ "_id" : ObjectId("58e2420bb6647fe45f462cf6"), "name" : "yrq", "age" : 30, "email" : "yrq@163.com" }
{ "_id" : ObjectId("58e24213b6647fe45f462cf7"), "name" : "yy", "age" : 25, "email" : "yy@163.com" }
{ "_id" : ObjectId("58e2421fb6647fe45f462cf8"), "name" : "dxy", "age" : 22, "email" : "dxy@163.com" }

Comparison queries

"$lt", "$lte", "$gt", and "$gte" correspond to <, <=, >, and >=, and can be combined to query a range. Find users older than 23 and no older than 26:

> db.users.find({"age":{"$gt":23,"$lte":26}})
{ "_id" : ObjectId("58e24213b6647fe45f462cf7"), "name" : "yy", "age" : 25, "email" : "yy@163.com" }

Another operator, "$ne", means "not equal":

> db.users.find({"age":{"$ne":25}})
{ "_id" : ObjectId("58e2420bb6647fe45f462cf6"), "name" : "yrq", "age" : 30, "email" : "yrq@163.com" }
{ "_id" : ObjectId("58e2421fb6647fe45f462cf8"), "name" : "dxy", "age" : 22, "email" : "dxy@163.com" }

OR queries

"$in" matches any of several values for a single key, for example all users aged 22 or 25:

> db.users.find({"age":{"$in":[22,25]}})
{ "_id" : ObjectId("58e24213b6647fe45f462cf7"), "name" : "yy", "age" : 25, "email" : "yy@163.com" }
{ "_id" : ObjectId("58e2421fb6647fe45f462cf8"), "name" : "dxy", "age" : 22, "email" : "dxy@163.com" }

"$in" accepts values of different types.

"$or" is more general: it matches any of the given conditions across multiple keys, for example users named yy or aged 22:

> db.users.find({"$or":[{"name":"yy"},{"age":22}]})
{ "_id" : ObjectId("58e24213b6647fe45f462cf7"), "name" : "yy", "age" : 25, "email" : "yy@163.com" }
{ "_id" : ObjectId("58e2421fb6647fe45f462cf8"), "name" : "dxy", "age" : 22, "email" : "dxy@163.com" }

$not

"$not" is a meta-conditional that can be applied on top of any other condition, for example users whose age is neither 22 nor 25:

> db.users.find({"age":{"$not":{"$in":[22,25]}}})
{ "_id" : ObjectId("58e2420bb6647fe45f462cf6"), "name" : "yrq", "age" : 30, "email" : "yrq@163.com" }
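
How these operators combine can be sketched with a tiny in-memory matcher. It is a simplified model, not MongoDB's implementation: matches and matchesCondition are made-up names, and only the operators covered so far plus a top-level "$or" are handled.

```javascript
// Evaluate one field condition, such as { $gt: 23, $lte: 26 }, against a value.
function matchesCondition(value, condition) {
  if (condition === null || typeof condition !== "object") {
    return value === condition;  // plain equality
  }
  // Every operator in the condition document must hold (implicit AND).
  return Object.entries(condition).every(([op, arg]) => {
    switch (op) {
      case "$lt":  return value < arg;
      case "$lte": return value <= arg;
      case "$gt":  return value > arg;
      case "$gte": return value >= arg;
      case "$ne":  return value !== arg;
      case "$in":  return arg.includes(value);
      case "$not": return !matchesCondition(value, arg);
      default: throw new Error("unsupported operator: " + op);
    }
  });
}

// Evaluate a whole query document, including a top-level $or.
function matches(doc, query) {
  return Object.entries(query).every(([key, condition]) =>
    key === "$or"
      ? condition.some(sub => matches(doc, sub))
      : matchesCondition(doc[key], condition));
}

const users = [
  { name: "yrq", age: 30 },
  { name: "yy", age: 25 },
  { name: "dxy", age: 22 }
];
const names = q => users.filter(u => matches(u, q)).map(u => u.name);
console.log(names({ age: { $gt: 23, $lte: 26 } }));
console.log(names({ $or: [{ name: "yy" }, { age: 22 }] }));
console.log(names({ age: { $not: { $in: [22, 25] } } }));
```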

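Type-specific queries

null

A plain { "y" : null } query also matches documents that lack the y key altogether. To match only documents whose value is null, check for null and also use "$exists" to require that the key is present: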

> db.users.find({"y":{"$in":[null],"$exists":true}})
{ "_id" : ObjectId("58e2420bb6647fe45f462cf6"), "name" : "yrq", "age" : 30, "email" : "yrq@163.com", "y" : null }
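
The difference between "null" and "null and present" is easy to miss, so here is a minimal sketch of the two checks on plain objects. Both function names are invented for illustration; the first models a plain { "y" : null } query, which also matches documents without the key.

```javascript
// Plain { y: null } semantics: the key is absent OR its value is null.
function matchesNull(doc, key) {
  return !(key in doc) || doc[key] === null;
}

// { y: { $in: [null], $exists: true } } semantics: present AND null.
function matchesNullAndExists(doc, key) {
  return key in doc && doc[key] === null;
}

const docs = [
  { name: "yrq", y: null },
  { name: "yy" },
  { name: "dxy", y: 1 }
];
console.log(docs.filter(d => matchesNull(d, "y")).map(d => d.name));
console.log(docs.filter(d => matchesNullAndExists(d, "y")).map(d => d.name));
```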

Regular expressions

For example, find users whose name contains the letter r, case-insensitively:

> db.users.find({"name":/r/i})
{ "_id" : ObjectId("58e2420bb6647fe45f462cf6"), "name" : "yrq", "age" : 30, "email" : "yrq@163.com", "y" : null }

Arrays

Cursors

find returns its results through a cursor; adjusting the cursor's behavior controls what is finally returned.

# 1. Build the query
> var cursor = db.users.find()
# 2. Send the query; the shell fetches the results, and hasNext() reports whether any remain
> cursor.hasNext()
true
# 3. Get the next document in the results
> cursor.next()
{
	"_id" : ObjectId("58e2420bb6647fe45f462cf6"),
	"name" : "yrq",
	"age" : 18,
	"email" : "yrq@163.com",
	"y" : null,
	"sex" : "male"
}
# 4. Evaluate the cursor directly; the first document was already returned, so only the remaining two are shown
> cursor
{ "_id" : ObjectId("58e24213b6647fe45f462cf7"), "name" : "yy", "age" : 25, "email" : "yy@163.com" }
{ "_id" : ObjectId("58e2421fb6647fe45f462cf8"), "name" : "dxy", "age" : 22, "email" : "dxy@163.com" }
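# 5. Evaluating it again prints nothing: all results have been returned and the cursor has been released (by default an idle cursor is also released after 10 minutes)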
> cursor
>

Query option methods

limit - caps the number of results returned; the argument is the upper bound

For example, return at most two user documents:

> db.users.find().limit(2)
{ "_id" : ObjectId("58e2420bb6647fe45f462cf6"), "name" : "yrq", "age" : 18, "email" : "yrq@163.com", "y" : null, "sex" : "male" }
{ "_id" : ObjectId("58e24213b6647fe45f462cf7"), "name" : "yy", "age" : 25, "email" : "yy@163.com" }

skip - skips the given number of documents

For example, skip the first two user documents:

> db.users.find().skip(2)
{ "_id" : ObjectId("58e2421fb6647fe45f462cf8"), "name" : "dxy", "age" : 22, "email" : "dxy@163.com" }

sort - sorts the results

For example, sort the users by age in ascending order:

> db.users.find().sort({"age":1})
{ "_id" : ObjectId("58e2420bb6647fe45f462cf6"), "name" : "yrq", "age" : 18, "email" : "yrq@163.com", "y" : null, "sex" : "male" }
{ "_id" : ObjectId("58e2421fb6647fe45f462cf8"), "name" : "dxy", "age" : 22, "email" : "dxy@163.com" }
{ "_id" : ObjectId("58e24213b6647fe45f462cf7"), "name" : "yy", "age" : 25, "email" : "yy@163.com" }
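
Combined, the three options give simple paging. The sketch below models that on an in-memory array: page is a hypothetical helper, it sorts on one numeric key only, and it applies sort first, then skip, then limit.

```javascript
// Sort by one numeric key, then skip and limit, without mutating the input.
function page(docs, { sortKey, direction = 1, skip = 0, limit = Infinity }) {
  return docs
    .slice()                                               // copy first
    .sort((a, b) => direction * (a[sortKey] - b[sortKey])) // 1 asc, -1 desc
    .slice(skip, skip + limit);
}

const people = [
  { name: "yrq", age: 18 },
  { name: "yy", age: 25 },
  { name: "dxy", age: 22 }
];
console.log(page(people, { sortKey: "age" }).map(p => p.name));
console.log(page(people, { sortKey: "age", skip: 1, limit: 1 }).map(p => p.name));
```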

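Indexes

An index is like the table of contents of a dictionary and speeds up queries. When an index exists, a query first looks in the index and then jumps straight to the location of the target document, which can make queries faster by orders of magnitude.

Usage

First, empty the users collection from earlier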
```
> db.users.remove({})
```
Insert 100,000 random user documents

> for(var i=0;i<100000;i++){
...   db.users.insert(
...     {
...       "number":i,
...       "user":"user_"+i,
...       "age":Math.floor(Math.random()*120),
...       "created": new Date()
...     }
...   );
... }

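Query the user named user_10000. The explain method shows that the query scanned all 100,000 documents and took 61 ms.

In newer versions explain defaults to queryPlanner mode; pass "executionStats" to see the actual execution statistics.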

> db.users.find({"user":"user_10000"}).explain("executionStats").executionStats
{
  "executionSuccess" : true,
  "nReturned" : 1,
  "executionTimeMillis" : 61,
  "totalKeysExamined" : 0,
  "totalDocsExamined" : 100000,
  "executionStages" : {
    "stage" : "COLLSCAN",
    "filter" : {
      "user" : {
        "$eq" : "user_10000"
      }
    },
    "nReturned" : 1,
    "executionTimeMillisEstimate" : 59,
    "works" : 100002,
    "advanced" : 1,
    "needTime" : 100000,
    "needYield" : 0,
    "saveState" : 783,
    "restoreState" : 783,
    "isEOF" : 1,
    "invalidates" : 0,
    "direction" : "forward",
    "docsExamined" : 100000
  }
}

Add an ascending index on the user key (ensureIndex is a deprecated alias of createIndex since MongoDB 3.0)

> db.users.ensureIndex({"user":1})

Run the query again: it now examines a single index key and a single document, and takes 7 ms

> db.users.find({user:"user_10000"}).explain("executionStats").executionStats
{
  "executionSuccess" : true,
  "nReturned" : 1,
  "executionTimeMillis" : 7,
  "totalKeysExamined" : 1,
  "totalDocsExamined" : 1,
  "executionStages" : {
      "stage" : "FETCH",
      "nReturned" : 1,
      "executionTimeMillisEstimate" : 0,
      "works" : 2,
      "advanced" : 1,
      "needTime" : 0,
      "needYield" : 0,
      "saveState" : 0,
      "restoreState" : 0,
      "isEOF" : 1,
      "invalidates" : 0,
      "docsExamined" : 1,
      "alreadyHasObj" : 0,
      ...
      }
  }
}
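
The gap between COLLSCAN and an index lookup can be imitated in plain JavaScript. In this sketch a sorted array with binary search stands in for the index (MongoDB's real index is a B-tree, so this is an analogy only), and buildIndex, collScan, and indexScan are invented names.

```javascript
// Build a sorted list of [key, position] pairs as a stand-in for an index.
function buildIndex(docs, field) {
  return docs
    .map((d, pos) => [d[field], pos])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Collection scan: examine every document.
function collScan(docs, field, value) {
  let examined = 0;
  let found = null;
  for (const d of docs) {
    examined++;
    if (d[field] === value) found = d;
  }
  return { found, examined };
}

// Index scan: binary search the sorted keys, then fetch one document.
function indexScan(docs, index, value) {
  let lo = 0;
  let hi = index.length - 1;
  let steps = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    steps++;
    if (index[mid][0] === value) return { found: docs[index[mid][1]], steps };
    if (index[mid][0] < value) lo = mid + 1;
    else hi = mid - 1;
  }
  return { found: null, steps };
}

const docs = [];
for (let i = 0; i < 100000; i++) docs.push({ number: i, user: "user_" + i });
const index = buildIndex(docs, "user");
console.log("documents examined by scan:", collScan(docs, "user", "user_10000").examined);
console.log("comparisons with index:", indexScan(docs, index, "user_10000").steps);
```

With 100,000 entries the binary search needs at most 17 comparisons, while the scan always touches every document.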

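Compound indexes

When a query filters or sorts on several keys, a compound index can help; which index suits best depends on the type of query.

Key direction

When sorting on multiple keys, the direction of each index key can matter.

The indexes {"user" : 1, "age" : 1} and {"user" : 1, "age" : -1} are different.
The indexes {"user" : 1, "age" : -1} and {"user" : -1, "age" : 1} are equivalent, because an index can be traversed in either direction.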

Implicit indexes

An index on N keys also gives you, for free, an index on every prefix of those N keys.

For example, the index {"age" : 1, "created" : 1, "user" : 1} also acts as an index on {"age" : 1} and on {"age" : 1, "created" : 1}.
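
The prefix rule is mechanical enough to write down. The following sketch lists the proper prefixes of a compound key pattern; indexPrefixes is a hypothetical helper, and it relies on JavaScript objects keeping their string keys in insertion order.

```javascript
// Return every proper prefix of a compound index key pattern.
function indexPrefixes(keyPattern) {
  const entries = Object.entries(keyPattern);
  const prefixes = [];
  for (let n = 1; n < entries.length; n++) {
    prefixes.push(Object.fromEntries(entries.slice(0, n)));
  }
  return prefixes;
}

console.log(JSON.stringify(indexPrefixes({ age: 1, created: 1, user: 1 })));
```

Note that {"created" : 1} alone is not in the list: only leading runs of keys count as prefixes.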

Management

Viewing

Use the getIndexes() method on a collection to list all of its indexes.

> db.users.getIndexes()
[
	{
		"v" : 2,
		"key" : {
			"_id" : 1
		},
		"name" : "_id_",
		"ns" : "test.users"
	},
	{
		"v" : 2,
		"key" : {
			"user" : 1
		},
		"name" : "user_1",
		"ns" : "test.users"
	}
]

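The _id index is created automatically; user is the index created earlier. v is the index version, key holds the indexed keys and their directions, and name is the index identifier.

Naming

The name field identifies the index. The default is generated as "key_direction_key_direction..."; for example, the ascending index on user is named "user_1". A name can also be given when the index is created.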

> db.users.ensureIndex({"user":1},{"name":"u"})
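
The default naming rule can be reproduced in a few lines. defaultIndexName is an illustrative helper written for this article, not a shell function.

```javascript
// Join each key and its direction with underscores: key_direction_key_direction...
function defaultIndexName(keyPattern) {
  return Object.entries(keyPattern)
    .map(([key, direction]) => key + "_" + direction)
    .join("_");
}

console.log(defaultIndexName({ user: 1 }));
console.log(defaultIndexName({ user: 1, age: -1 }));
```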

Dropping

Use the dropIndex() method to drop an index, identifying it by the name field from its index information.

> db.users.dropIndex("user_1")

Special collections and indexes

Views

New in version 3.4

Read-only views can be created on top of existing collections.

Creating

> db.createView("yrq_view","comments",{$match:{"name":"yrq"}})
{ "ok" : 1 }
> show collections
comments
yrq_view

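The view shows up in the list like a collection. The arguments are the view name, the source collection, and the aggregation pipeline.

Properties
- Views are read-only; attempting to write to one raises an error
- Views can use the indexes of the source collection
- Views cannot be renamed
- Views do not support mapReduce, the $text operator, or the geoNear command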

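Capped collections

As the name suggests, a capped collection has a fixed size. It behaves like a circular queue: once it is full, inserting new data removes the oldest data.

Create a capped collection with a given size (in bytes)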

> db.createCollection("capped_collection",{"capped":true,"size":10000})
{ "ok" : 1 }

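A maximum number of documents can be specified as well; whichever limit is reached first, size or count, inserting a new document pushes the oldest ones out

> db.createCollection("capped_collection",{"capped":true,"size":10000,"max":100})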
{ "ok" : 1 }
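
The circular-queue behavior can be mimicked with an array that evicts its oldest element. CappedList is a made-up class that models only the document-count limit ("max"); the byte-size limit of a real capped collection is left out.

```javascript
// A list that keeps at most `max` documents, dropping the oldest first.
class CappedList {
  constructor(max) {
    this.max = max;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.max) {
      this.docs.shift();  // evict the oldest document
    }
  }
}

const log = new CappedList(3);
for (let no = 1; no <= 5; no++) log.insert({ no });
console.log(log.docs.map(d => d.no));
```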

An existing regular collection can be converted to a capped one; the reverse (capped to non-capped) is not possible

> db.runCommand({"convertToCapped":"users","size":100000})
{ "ok" : 1 }

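TTL indexes

A TTL index gives documents a lifetime: a document is removed once its indexed date field is older than the configured number of seconds. MongoDB runs the TTL cleanup once a minute.

Create an index that keeps documents for one day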

> db.users.ensureIndex({"created":1},{"expireAfterSeconds":60*60*24})
{
  "createdCollectionAutomatically" : false,
  "numIndexesBefore" : 1,
  "numIndexesAfter" : 2,
  "ok" : 1
}

Use the collMod command to change the expiry time

> db.runCommand({"collMod":"users",index:{keyPattern : {created : 1}, expireAfterSeconds : 3600}})
{
  "expireAfterSeconds_old" : 86400,
  "expireAfterSeconds_new" : 3600,
  "ok" : 1
}
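
The expiry rule itself is a single date comparison. Here is a hedged sketch of it; isExpired is a hypothetical function, and it assumes that documents whose indexed field is not a Date never expire.

```javascript
// Decide whether a document has outlived its TTL at time `now`.
function isExpired(doc, field, expireAfterSeconds, now) {
  const value = doc[field];
  if (!(value instanceof Date)) return false;  // assumption: non-dates never expire
  return now.getTime() >= value.getTime() + expireAfterSeconds * 1000;
}

const oneDay = 60 * 60 * 24;
const doc = { user: "user_98882", created: new Date("2017-04-07T02:10:05Z") };
console.log(isExpired(doc, "created", oneDay, new Date("2017-04-07T12:00:00Z")));
console.log(isExpired(doc, "created", oneDay, new Date("2017-04-08T12:00:00Z")));
```

Because the cleanup runs periodically, an expired document may stay in the collection for a short while after this check becomes true.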

Text indexes

A text index together with the $text operator supports full-text search over string content.

Create a text index

> db.users.ensureIndex({"user":"text"})
{
  "createdCollectionAutomatically" : false,
  "numIndexesBefore" : 2,
  "numIndexesAfter" : 3,
  "ok" : 1
}

Query through the text index with the $text operator

> db.users.find({$text:{$search:"user_98882"}})
{ "_id" : ObjectId("58e6f4fd81ede7e9078b2fad"), "number" : 98882, "user" : "user_98882", "age" : 96, "created" : ISODate("2017-04-07T02:10:05.305Z") }

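For more on the $text operator, see the official documentation

Geospatial indexes

MongoDB provides two kinds of geospatial index: 2d indexes (flat maps) and 2dsphere indexes (spherical surfaces).

For details, see the official documentation: 2d Indexes and 2dsphere Indexes

Aggregation

Call aggregate() on a collection to transform and combine documents by chaining several stages into a pipeline.

Suppose we want the three users with the most comments; the following aggregation does that: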

> db.comments.aggregate(
... {"$project":{"name":1}},
... {"$group":{"_id":"$name","count":{"$sum":1}}},
... {"$sort":{"count":-1}},
... {"$limit":3})
{ "_id" : "dxy", "count" : 18 }
{ "_id" : "yrq", "count" : 10 }
{ "_id" : "rizu", "count" : 7 }

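Here:

$project is a projection: {fieldname:1} keeps a field and {fieldname:0} excludes it.
$group groups documents, here by name; each document with the same name adds one to that group's count.
$sort sorts the results; descending order is used because we want the users with the most comments.
$limit caps the number of documents returned.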
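
The same four steps can be followed on an in-memory array. The sketch below is an approximation of the pipeline's logic in plain JavaScript, not how the server executes it; topCommenters and the sample data are invented.

```javascript
// Count comments per name, sort by count descending, keep the first n.
function topCommenters(comments, n) {
  const counts = new Map();
  for (const { name } of comments) {                // $project + $group
    counts.set(name, (counts.get(name) || 0) + 1);
  }
  return [...counts]
    .map(([name, count]) => ({ _id: name, count }))
    .sort((a, b) => b.count - a.count)              // $sort: { count: -1 }
    .slice(0, n);                                   // $limit: n
}

const sampleComments = ["dxy", "yrq", "dxy", "rizu", "dxy", "yrq"]
  .map(name => ({ name, content: "hi" }));
console.log(topCommenters(sampleComments, 2));
```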

Pipeline operators

$match

Filters for the documents that match a condition, so that later stages operate only on those.

> db.comments.aggregate({"$match":{"name":"rizu"}})
{ "_id" : ObjectId("58e8cc36d574ca3ad45a97cd"), "name" : "rizu", "date" : ISODate("2017-04-08T11:40:38.893Z"), "content" : "hi" }
...

$project

A projection: it can extract and rename fields, and can also apply arithmetic and string operators.

Extract and exclude fields

> db.comments.aggregate({"$project":{"name":1,"_id":0}})
{ "name" : "yrq" }
...

Rename a field

> db.comments.aggregate({"$project":{"user":"$name"}})
{ "_id" : ObjectId("58e8cbeed574ca3ad45a97ac"), "user" : "yrq" }
...

$group

Groups documents by the distinct values of a field, for example grouping by date and counting the comments for each date.

> db.comments.aggregate({"$group":{"_id":"$date","total":{"$sum":1}}})
{ "_id" : ISODate("2017-04-08T11:40:38.897Z"), "total" : 1 }
{ "_id" : ISODate("2017-04-08T11:40:38.896Z"), "total" : 2 }
{ "_id" : ISODate("2017-04-08T11:39:26.703Z"), "total" : 2 }
...

$unwind

Splits each element of an array into its own document. First, give user yrq an array of hobbies.

> db.comments.update({"name":"yrq"},{$set:{"hobby":[{"sport":"soccer"},{"lang":"en"}]}})

Use the $unwind operator to split the hobby array.

> db.comments.aggregate({"$unwind":"$hobby"})
{ "_id" : ObjectId("58e8cbeed574ca3ad45a97ac"), "name" : "yrq", "date" : ISODate("2017-04-08T11:39:26.667Z"), "content" : "hello", "hobby" : { "sport" : "soccer" } }
{ "_id" : ObjectId("58e8cbeed574ca3ad45a97ac"), "name" : "yrq", "date" : ISODate("2017-04-08T11:39:26.667Z"), "content" : "hello", "hobby" : { "lang" : "en" } }
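
In plain JavaScript the same splitting is a flatMap. This sketch models the default behavior as I understand it: documents whose field is missing, null, or an empty array produce no output, and a non-array value is treated as a single element. unwind is a made-up function name.

```javascript
// Emit one copy of the document per array element of `field`.
function unwind(docs, field) {
  return docs.flatMap(doc => {
    const value = doc[field];
    if (value === undefined || value === null) return [];  // dropped
    const items = Array.isArray(value) ? value : [value];
    return items.map(item => Object.assign({}, doc, { [field]: item }));
  });
}

const comments = [
  { name: "yrq", content: "hello", hobby: [{ sport: "soccer" }, { lang: "en" }] },
  { name: "yy", content: "h" }
];
console.log(JSON.stringify(unwind(comments, "hobby")));
```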

$limit

Caps the number of documents returned.

> db.comments.aggregate({"$limit":2})
{ "_id" : ObjectId("58e8cbeed574ca3ad45a97ac"), "name" : "yrq", "date" : ISODate("2017-04-08T11:39:26.667Z"), "content" : "hello" }
{ "_id" : ObjectId("58e8cbeed574ca3ad45a97ad"), "name" : "yrq", "date" : ISODate("2017-04-08T11:39:26.699Z"), "content" : "hello" }

$skip

Skips the given number of documents.

> db.comments.aggregate({"$skip":10},{"$limit":2})
{ "_id" : ObjectId("58e8cbfed574ca3ad45a97b6"), "name" : "yy", "date" : ISODate("2017-04-08T11:39:42.989Z"), "content" : "h" }
{ "_id" : ObjectId("58e8cbfed574ca3ad45a97b7"), "name" : "yy", "date" : ISODate("2017-04-08T11:39:42.990Z"), "content" : "h" }

MapReduce

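MapReduce runs in several steps:

map - applies an operation to every document in the collection; the operation may do nothing or emit some keys and values.
shuffle - an intermediate step that groups by key, collecting the emitted values into a list per key.
reduce - reduces a list of values to a single value and returns it; shuffling then continues until each key's list holds exactly one value.

Example: find every key in the comments collection and count how often it occurs. First write a map function that walks over all keys of a document and calls emit to return a count of {count:1} for each key.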

> map = function(){
...  for(var key in this){
...    emit(key,{count:1});
... }};

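Next write the reduce function, which totals the emitted counts. reduce may be run repeatedly on the output of the map phase and of earlier reduce calls, so the document it returns must be usable as an element of reduce's second argument.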

> reduce = function(key,emits){
...   var total = 0;
...   for(var i in emits){
...     total += emits[i].count;
...   }
...   return {"count":total};
... }

Run the mapReduce command.

> db.comments.mapReduce(map,reduce,{out:{"inline":1}})
{
	"results" : [
		{
			"_id" : "_id",
			"value" : {
				"count" : 40
			}
		},
		{
			"_id" : "content",
			"value" : {
				"count" : 40
			}
		},
		{
			"_id" : "date",
			"value" : {
				"count" : 40
			}
		},
		{
			"_id" : "hobby",
			"value" : {
				"count" : 1
			}
		},
		{
			"_id" : "name",
			"value" : {
				"count" : 40
			}
		}
	],
	"timeMillis" : 18,
	"counts" : {
		"input" : 40,
		"emit" : 161,
		"reduce" : 4,
		"output" : 5
	},
	"ok" : 1
}

The first argument is the map function, the second is the reduce function, and the third sets options such as query, out, sort, and limit.
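
To see the map, shuffle, and reduce steps without a server, here is a simplified single-pass model in plain JavaScript. It differs from MongoDB in two labeled ways: emit is passed to the map function as an argument instead of being a global, and reduce runs once per key rather than incrementally. All function names are invented.

```javascript
// Run map over every document, group emitted values by key, then reduce.
function mapReduce(docs, map, reduce) {
  const lists = new Map();  // key -> list of emitted values
  const emit = (key, value) => {
    if (!lists.has(key)) lists.set(key, []);
    lists.get(key).push(value);                 // shuffle: group by key
  };
  for (const doc of docs) map.call(doc, emit);  // map: `this` is the document

  const results = [];
  for (const [key, values] of lists) {
    // A key with a single value is passed through without calling reduce.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

function mapKeys(emit) {
  for (const key in this) emit(key, { count: 1 });
}

function sumCounts(key, emits) {
  let total = 0;
  for (const e of emits) total += e.count;
  return { count: total };  // same shape as the emitted values
}

const comments = [
  { _id: 1, name: "yrq", content: "hello" },
  { _id: 2, name: "yy", content: "h", hobby: ["soccer"] }
];
console.log(JSON.stringify(mapReduce(comments, mapKeys, sumCounts)));
```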

Aggregation commands

count

Returns the number of documents in the collection.
```
> db.comments.count()
40
```

distinct

Finds the distinct values of a given key; takes the collection and the key name.

> db.runCommand({"distinct":"comments","key":"name"})
{ "values" : [ "yrq", "yy", "dxy", "rizu" ], "ok" : 1 }

group

The group command is deprecated as of version 3.4; use db.collection.aggregate() with the $group operator, or db.collection.mapReduce(), instead.

aggregate & mapReduce

Covered above.