Hey friends, I'm Scott Hanselman and it's another episode
of Azure Friday. I'm here with Emily Lawton, she's a software
engineer on Cosmos DB.
How are you?
>> I'm great, Scott.
Thanks for having me, happy to be here.
>> Yeah, it's cool.
So I've used Cosmos DB with some Mongo apps where I literally
just swapped the URL out for the connection string and
it totally worked.
>> Yeah.
>> And then I just said to myself,
wow that's amazing, it's a hundred percent compatible.
Is that true, is it 100% compatible?
>> So, that's why I'm thrilled to be here today, because we
have a number of new features that we've been working on in the past
couple months, and even the past couple weeks,
that will enable Mongo customers to migrate much more complex and
powerful applications to Cosmos.
And with these updates, we can much more confidently say
we support all of the most sought-after Mongo features, and
we're fully invested in making sure that any
developers coming over from Mongo
have the proper tools and
resources to succeed in Cosmos.
>> Now there's, I imagine,
a pie chart of all the features that Cosmos could have, right?
>> Yeah, there's a lot.
>> There's a lot of stuff, right?
And you're doing like some percentage of them, but
it's less than 100%.
>> Yes. >> But still 80%,
90% some big number cuz it works really well.
When a Mongo application talks to Cosmos and
asks for something it doesn't support, what happens? How does it say,
I don't know how to do that?
>> Yeah, we're constantly working on improving this experience,
and just the other day, we were working on updating those messages.
So we have a lot of features in private preview, so
you need to email us.
We want to make sure we're giving much gentler error messages,
to let them know that, you know, we do support this in preview, or
we're working on it.
>> Okay.
Yes [CROSSTALK] >> So if you're an active
user of this and you really want to engage with the Cosmos
team, then you can talk to them via private previews.
>> Yeah. >> And then when you start to
use a new feature, it could say,
we're totally working on this feature, don't fret.
>> Yeah, and all the things we support are completely based on,
we heard from this customer yesterday who really needs this
feature and [CROSSTALK] >> Really?
>> We run off and
fight fires every day to make sure we help them out.
>> What are some new things that are happening in Cosmos space?
>> So the three things I'm going to discuss today are some tips
and tricks for a successful migration, support for
the aggregation pipeline and support for unique indexes.
>> And aggregation pipeline is a feature that people who use
Mongo really like and it's one [CROSSTALK]
>> They love it.
>> Of their most asked-for ones.
>> Yeah, yeah.
So, one of the major pain points that our Mongo customers hit
when moving over to Cosmos is simply just migrating their
data.
So we have a few updates to help squash any errors that
they're running into there.
When customers have a lot of data, we generally ask them to go
to the MongoDB download center and get mongoimport.
And a lot of times when they run this they'll hit throttling
errors.
And this is largely because collections, when created in
Cosmos, default to 1,000 provisioned RUs.
Oftentimes this isn't enough for a large data migration.
So one of the first steps we want our customers to know is
that it's best if you go into the Azure portal and
pre-create the collections you wanna import data into.
>> Mm-hm. >> And
bump up the RUs during the migration, then once the migration is
complete, go back and decrease the RUs.
So this is one way to help get past these issues.
Another thing is that we've added two new options to
the mongoimport command.
>> Mm-hm.
>> They are the number of insertion workers and the batch size.
And I've included a link here which goes through a very
detailed process of how to compute these values.
And when computed correctly, you won't see any throttling, and
your migration will be much smoother, and you'll get up and
going with Cosmos much quicker, right?
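For reference, a mongoimport invocation with those two options might look roughly like this; the host, credentials, database, collection, file, and the specific values for --numInsertionWorkers and --batchSize are all placeholders you'd compute for your own account, per the linked doc:

    mongoimport --host myaccount.documents.azure.com:10255 \
        -u myaccount -p <account-key> \
        --ssl --sslAllowInvalidCertificates \
        --db mydb --collection runners \
        --file runners.json \
        --numInsertionWorkers 4 --batchSize 24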
>> Do you have a rule of thumb or general sense,
cuz you said large.
>> Yeah. >> And like, I did a migration,
but I don't know what I'm doing.
And I used an application called MongoChef Core.
And I basically opened up two connections, one to Cosmos and
one to Mongo.
And I dragged and dropped.
I don't know if that was bad.
>> Yeah, no, so it's a lot of experimentation to get the right
parameters here to make sure your migration is smooth.
>> Mm-hm.
>> But I swear by this link, if you go to it,
it goes through the details of how to compute them [CROSSTALK]
>> That is the bible for
Cosmos DB migration.
>> Yes. >> Cool, so
if I follow those then I would have a good experience.
>> You will.
>> Okay cool, thank you.
>> The next update that I'd love to talk about today is our
support for the aggregation pipeline.
>> Okay.
>> So we know in today's data-driven age,
it's incredibly important to give our developers tools
to make sure that they can quickly gain insights into their
data.
So Mongo
developed this framework called the aggregation pipeline,
which is essentially an alternative to map-reduce.
But it's much more lightweight, it's easier to get ramped up on,
and it's used whenever the complexity
of map-reduce is unwarranted.
So I'll go through some demonstrations of the things
that you can do with aggregation pipeline.
>> Please. >> This is something our team
has been working on for [CROSSTALK]
>> And out of curiosity,
what would I have seen if I'd tried to do these things before?
Would Cosmos have given an error, or said unsupported?
>> It would have said, yeah, we don't support this.
>> Okay, but it returns an error that
indicates a lack of support.
>> Yes, yes.
>> Not like a general- >> Not just some-
>> 500 error.
>> Yeah, of course.
>> Okay, that's cool.
>> So, one of my favorite hobbies is running.
So, let's connect to my account and
I've loaded up the results of the 2017 Boston Marathon here.
>> And you're in the Mongo shell, you're just in
the Windows [CROSSTALK] >> This is the Mongo shell.
>> The Mongo shell.
Is it bad that the shell and the server versions don't match?
Does that mean anything?
>> In this scenario it's not a problem.
There were some new operators that were added in 3.4.
>> Okay, not a big deal.
>> Yeah.
So, let's first just do-
>> Did you say you ran the Boston Marathon?
>> No, I did not run it.
I wish!
So an aggregate command takes an array of stages.
>> Okay.
>> So, let's just experiment with a match stage first.
A match is, essentially, just like a SQL WHERE clause.
>> Okay. >> So let's say we wanna find
all the data we have about the runner Galen Rupp.
Galen is an American
professional long-distance runner.
He runs for the Nike Oregon project,
he's a role model of mine.
>> That's cool. >> So let's see [CROSSTALK]
>> I'm from Oregon,
I grew up down the street from Nike.
>> Nice. >> If I bump in to him
while I'm walking, then I will say hi.
>> [CROSSTALK] So this is what a match query looks like and so
we see this is a document that is associated with Galen Rupp.
We see he's from Portland, his official time was 2 hours and
9 minutes.
>> And his overall place was second.
>> It's about 26 miles.
>> Yes. >> That's like 12 miles an hour.
>> It's 26.2, yeah.
>> I don't think I can run like that.
>> [LAUGH] So he had a very impressive finish.
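As a rough sketch, the match stage she ran probably looked something like this in the Mongo shell; the collection name marathon and the field names name, country, officialTime, and overallPlace used here and in the examples below are illustrative assumptions, not the demo's actual schema:

    db.marathon.aggregate([
        // match stage: filter to the documents for one runner
        { $match: { name: "Rupp, Galen" } }
    ])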
So this type of query isn't that interesting.
So let's do some more interesting types of [CROSSTALK]
>> Out of curiosity,
is that the kind of aggregate query where
there are ways that I could have queried for
that without using it, [CROSSTALK] that I was aware of?
>> Yeah, you could also do a simple find query and
specify the filter that you wanna apply.
And I should note, this is just a tip for
once you get started using the aggregation pipeline:
you generally wanna do the match
stage as early on in the pipeline as you can,
to filter documents out and make sure you have fewer documents
as you go through the pipeline.
>> I see, so it is a pipeline, like a sales pipeline,
it's like the smaller it gets, the less data you have to deal
with further down in the pipeline.
>> Yeah. >> Okay, cool.
>> So let's find all the runners that actually ran faster than
three hours.
So here we'll use a conditional operator,
less than, still the same type of match query.
And now we'll get a long list of all the runners that
ran faster than three hours.
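A sketch of that query, with the same assumed schema; it also assumes times are stored as "H:MM:SS" strings, so a string comparison against "3:00:00" happens to work for this data set:

    db.marathon.aggregate([
        // conditional operator: official time less than three hours
        { $match: { officialTime: { $lt: "3:00:00" } } }
    ])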
This is not that interesting because it's hard to kind of
visualize or scroll through these results.
But now let's do something a little bit more interesting.
Let's say we want to group the runners by their country and see
what the average place of the runners in those groups is.
>> So here we add the group stage.
And all group stages require an underscore ID field,
which is what you're grouping the documents by.
>> If I understand your tip from before, if I had flipped this
around and did the group first and then the match,
it's kind of silly because I'd be taking everyone and
grouping them by country, and then chopping them up by time?
>> Yes, yes.
So let's go ahead and make sure I copy that one.
>> Feels like PowerShell for data.
>> [LAUGH] Kind of, and
again we get a list of an array of all of the countries that had
runners that ran faster than three hours.
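A sketch of that pipeline, again with the assumed field names; the match stage runs first, and the group stage's _id is the field being grouped by:

    db.marathon.aggregate([
        { $match: { officialTime: { $lt: "3:00:00" } } },
        // group stage: _id is what the documents are grouped by
        { $group: { _id: "$country" } }
    ])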
>> And is it sorted?
>> This is not sorted.
I've got another query that will, am I getting ahead of us?
>> No, pardon me.
>> So for the next one, let's find the number of runners in each of
those groups, which will be a little bit more interesting.
And here we can specify, within the group stage, an accumulator,
and we're gonna use the sum accumulator and sum up one for
each runner in each group. >> Sum up one, okay, so
then it will be the count of the number of people
under three hours per country.
>> Yes.
>> Okay, so you're doing a projection at this point,
you're making a new kind of-
>> Yeah, we created a new field.
And so here we see, again it's kind of hard to understand.
>> Austria had eight, that's good, Egypt had one.
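A sketch of the counting version, same assumptions as above:

    db.marathon.aggregate([
        { $match: { officialTime: { $lt: "3:00:00" } } },
        // sum accumulator: add 1 per runner to count each group
        { $group: { _id: "$country", count: { $sum: 1 } } }
    ])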
>> Yeah, and then the next thing we'll do is,
like I said earlier, find
the average overall place of each of the countries.
But let's order them in a way such that we can actually see
which country had the fastest average time.
>> That's a good point,
otherwise I'm just scrolling around, okay.
>> And we're going to sort by average place, and we pass
the negative one because we want to sort in descending order.
>> Negative one is descending, okay.
And here we see.
>> Zimbabwe.
>> Zimbabwe actually had the fastest, or the lowest
average place, which was 14 but they only had one runner.
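A sketch of that sorted-average pipeline, same assumptions as above:

    db.marathon.aggregate([
        { $match: { officialTime: { $lt: "3:00:00" } } },
        // average accumulator over each country's overall places
        { $group: { _id: "$country", avgPlace: { $avg: "$overallPlace" } } },
        // -1 sorts descending, so the lowest (fastest) average
        // place prints last, at the bottom of the shell output
        { $sort: { avgPlace: -1 } }
    ])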
So the last query we will do is, let's see
who had the most runners who finished in less than three hours.
>> My wife's from Zimbabwe.
>> [LAUGH] >> Okay, cool.
>> And here we see the United States had the most runners that
finished faster than three hours.
>> Are we last, though?
>> We are not.
>> We sorted last on this.
>> You can also find that.
Well, no, I sorted in ascending order this time.
>> Okay good.
>> So you can see that we had over a thousand runners which is
over ten times more than the next country had.
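A sketch of that last query, same assumptions; sorting ascending by count puts the country with the most sub-three-hour finishers at the bottom of the output:

    db.marathon.aggregate([
        { $match: { officialTime: { $lt: "3:00:00" } } },
        { $group: { _id: "$country", count: { $sum: 1 } } },
        // ascending sort: the biggest count prints last
        { $sort: { count: 1 } }
    ])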
>> Which makes sense cuz it was in Boston.
>> It was in Boston.
>> That is pretty slick and it's coming back so fast though.
>> Yeah, so this data set had, I think, 25,000 documents, so
it's not huge.
But we're continually working on making sure this
pipeline is scalable.
Because when you pair it with all of the features
that come with Cosmos, like autoscaling with provisioned
RUs, auto-sharding, and automatic indexing, it makes for
a really synergistic effect and a great experience.
>> Yeah, I mean you're getting the feeling of your favorite
database, your Mongo experience, and the familiarity
that I can see that you have with the syntax, but
it's a global database, it's geo-replicated,
it's the best of both worlds.
>> Yep, exactly.
>> That's pretty slick.
>> So it's super fun to work on.
>> And then, quickly, one other update that I'd like to discuss
is support for unique indexes.
This was another thing that our customers have been badgering us
for, so we went ahead and delivered.
And as the name suggests,
unique indexes just ensure that the fields you specify
them on cannot have duplicate values.
>> Interesting, that feels like it blurs the line a little bit
between document database, we just throw anything into it.
>> Yeah.
>> And relational database where the ID matters.
>> Yeah.
You can imagine if you had some sort of collection
containing account data that had a tax ID field or something.
You'd wanna make sure that field is unique, so
that you don't have multiple accounts with the same tax ID.
>> Okay.
>> And so quickly, here's the syntax for
how you create a unique index.
Once you have this feature enabled on your account,
you can go ahead and create it on any field.
For instance, this is on the user_id field, and
then you just specify the unique attribute as true.
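The shell syntax she's describing is standard Mongo createIndex; the collection name accounts here is an assumption for illustration:

    // create a unique index on the user_id field; unique: true
    // rejects any write that would duplicate an existing value
    db.accounts.createIndex({ user_id: 1 }, { unique: true })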
>> That seems pretty straightforward.
>> Yeah.
>> What if it already has a bunch of data in it?
>> So one thing to know is that you cannot actually create a unique
index on a field whose existing data does not follow
the constraint, right?
>> Okay, I see, so you can only do that at the time of
creation, before the data is modified?
>> Yes, well,
you can still actually create it once the data is already
there.
But we'll just go through and we'll make sure that you don't
already have duplicate values there, cuz you can't do that.
>> And then if you did, you would deal with that?
>> We're still working on a better experience there.
>> But it sounds like it's pretty straightforward.
I can just make a temporary collection,
spin through it, project over to it, and de-dupe.
Very cool, and is this all happening right now, or
do I have to sign up for a preview?
How do I get involved?
>> So these two features are currently in private preview,
which means you need to email us; I've included the link.
And we'll enable this,
we're very quick about enabling it on accounts.
And by the end of the year this will be in public preview, and
you can actually go into the Azure portal and,
in a self-serve fashion, enable it on your accounts.
>> Very cool. >> And
take advantage of all these new updates.
>> And you were saying that you are doing all of these features
in this order because people are asking for them.
>> Yeah. >> And if they email you,
like real people are on the other end of that email and
a team is excited to hear about what you're using with Mongo and
Cosmos and how to make it better.
>> Yes.
>> Very cool. Thanks so much for your time.
>> Thank you for having me.
>> All right, I am learning all about MongoDB and
Azure Cosmos DB here on Azure Friday.
[MUSIC]