json - mongodb schema optimization -


extension of earlier query @ mongodb find query

under season different vendors sells different fruits , arrange them inaccorance shelf number. below records have inserted in mongodb - vendor.json

{   "_id" : "vendor1",   "shelf_1": ["pear","banana"],   "shelf_2" : ["grapes", "muskmelon", "apricot"],   "shelf_3" : ["pineapple, "kiwi fruit"],   "shelf_4" : ["orange"],   "shelf_5" : ["guava","lemon"] }  {   "_id" : "vendor2",   "shelf_1": ["mango","banana"],   "shelf_2" : ["grapes", "muskmelon", "peach"],   "shelf_3" : ["pear, "pulm"],   "shelf_4" : ["jackfruit"],   "shelf_5" : ["apple","apricot"],   "shelf_6": ["avocado","cherry"],   "shelf_7" : ["clementine", "date", "fig"],   "shelf_8" : ["guava, "honeydew melon"],   "shelf_9" : ["lemon"],   "shelf_10" : ["kiwi fruit","elderberry"],   "shelf_11": ["mysore raspberry","mountain apple"],   "shelf_12" : ["starfruit", "scrub cherry", "pomegranate"],   "shelf_13" : ["sugar apple, "tropical appricot"],   "shelf_14" : ["chinese chestnut",passion fruit],   "shelf_15" : ["raspberry","wax apple"],   "shelf_16": ["blueberries"],   "shelf_17" : ["strawberry", "ugli fruit", "watermelon"],   "shelf_18" : ["quince, "satsuma","quince"],   "shelf_19" : ["pineapple"],   "shelf_20" : ["peanut","orange","blackcurrant","lime","nectarine"] } {   "_id" : "vendor3",   "shelf_1": ["mango","banana"],   "shelf_2" : ["jackfruit"],   "shelf_3" : ["lemon, "pulm","pineapple"],   "shelf_4" : ["orange","guava"],   "shelf_5" : ["apple","apricot"],   "shelf_6": ["avocado","cherry"],   "shelf_7" : ["pomegranate", "date", "fig"],   "shelf_8" : ["watermelon"],   "shelf_9" : ["kiwi fruit","strawberry"] } 

i have added index on both shelf , on each fruit. here each shelf's contains unique fruit , arrangement of fruits in shelf's different different vendors.

i want use above schema to

  1. find fruit shelf available vendor, when vendor known to
  2. find total number of shelf used particular vendor. suggestion regarding schema have created run above 2 queries

while on normalizing imposes serious problems, have under normalized schema.

  1. it not scale well. of time of writing, there 16mb size limit on bson documents. if have large vendor, (theoretically) might run problems. imagine walmart thousands of shelves in different locations. remember facebook has pays huge amounts of money because vastly underestimated need scale.
  2. with current schema, have have arbitrary number of indices when want index shelves. setting aside other problems: building index isn't free, when done in background.
  3. only 1 index utilized per query. need reduce number of indices.
  4. the questions asking don't need schema. both of time, vendor known. can use more conventional approach, easy , efficient queries.

here how it. i'd have vendor schema stuff name , location. next, i'd have shelf schema. each shelf have reference vendor, in sql. thing references "weak", say. since vendor known, _id query shelf schema.

the vendor schema

this pretty straightforward

{   '_id': new objectid(),   'name': 'acme mart',   'location': {     type: 'point',     coordinates: [ 34.180278, -118.328333 ]   } } 

the shelf schema

actually quite easy, too

{   _id: new objectid(),   vendor: "idofvendor",   description: "shelf 14"   contents: ["apples", "oranges", "kiwi" ] } 

the indices

setting aside geospatial index need vendors location field, here indices need

// if want search name db.vendors.ensureindex({name:1})  // use compound index here db.shelves.ensureindex({vendor:1,contents:1}) 

you use text index on contents, enabling search find both "apples" , "apple" found via "apples", that's on decide.

your queries

since vendor known, , _id, can find shelves containing kiwi:

db.shelves.find({vendor:"idofvendor", contents: "kiwi"}) 

counting number of shelves becomes simpler:

db.shelves.find({vendor:"idofvendor"}).count()