Tuesday, April 1, 2008

Detailed Look: Key components in LINQ to SQL and their Key Roles

As part of this blog, I plan to have an ongoing set of articles that take a detailed look into some part of the .NET Framework. I plan to bring as much knowledge as I can find on the topic, but drill down into a subset of its components, to raise awareness and start discussions. This will bring about more understanding for me and the .NET community. So if you have suggestions, comments, or a different idea on the subject, please share it.

Before I begin on discussing its key components, I would like to point you to some very good articles written about LINQ to SQL, which should be read to have a deeper understanding of this new technology, and prepare you to be able to use it comfortably. This is a brief listing of articles written to help you better understand how to use LINQ to SQL:

Scott Gu’s Multi-part Tutorials:

· Part 1: Introduction to LINQ to SQL

· Part 2: Defining our Data Model Classes

· Part 3: Querying our Database

· Part 4: Updating our Database

· Part 5: Binding UI using the ASP:LinqDataSource Control

· Part 6: Retrieving Data Using Stored Procedures

· Part 7: Updating our Database using Stored Procedures

· Part 8: Executing Custom SQL Expressions

Rick Strahl has many articles written on practical usages of LINQ:
http://www.west-wind.com/WebLog/ShowPosts.aspx?Category=LINQ

Pro LINQ by Joseph C. Rattz, Jr. (a book I have read cover to cover multiple times and actually reference some of his explanations).

Today I would like to discuss some of LINQ to SQL’s key components, specifically the DataContext and Entity objects. I have read a lot of articles dealing with how to use LINQ to SQL effectively and how to structure it. Here, however, I would like to drill down into what these two very specific parts are and what they are responsible for. Understanding these objects more fully will help you use them correctly.

Before we drill down into what the DataContext and Entity objects are and what they are responsible for, let’s take a brief look at the key difference between LINQ to SQL and LINQ to Objects / XML.

What are the key differences between LINQ to SQL and LINQ to Objects / XML?

· LINQ to SQL needs a DataContext object (the generic base class or a custom inherited version).

· LINQ to SQL returns IQueryable<T>
LINQ to Objects / XML returns IEnumerable<T>

· Normal LINQ queries are performed on arrays/collections that implement IEnumerable<T>.

LINQ to SQL queries are performed on classes that implement the IQueryable<T> interface, such as Table<T>.

· LINQ to SQL is translated to SQL, unlike LINQ to Objects / XML queries, which are compiled to Intermediate Language (IL).

LINQ to SQL is translated to SQL by way of Expression Trees, which allow the query to be evaluated as a single unit and translated to appropriate and optimal SQL statements.

· Normal LINQ is executed in local machine memory; LINQ to SQL is translated to SQL calls and executed on the specified database.
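
To make the IEnumerable<T> / IQueryable<T> distinction a little more concrete, here is a small sketch that runs without a database. It is only an illustration: QueryKindsDemo and RootNode are names I made up, and AsQueryable() over an array stands in for a real Table<T>. The point is that the same lambda becomes a compiled delegate in one case and an inspectable expression tree in the other.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class QueryKindsDemo
{
    // Returns the node type at the root of a predicate's expression tree.
    // (Hypothetical helper, only here to show that the lambda is data.)
    public static ExpressionType RootNode(Expression<Func<int, bool>> predicate)
    {
        return predicate.Body.NodeType;
    }

    public static void Main()
    {
        int[] numbers = { 1, 5, 8, 13 };

        // LINQ to Objects: the lambda is compiled to a delegate and run in memory.
        IEnumerable<int> inMemory = numbers.Where(n => n > 5);

        // IQueryable<T>: the same lambda is captured as an expression tree,
        // which a provider such as LINQ to SQL can translate instead of running.
        IQueryable<int> queryable = numbers.AsQueryable().Where(n => n > 5);

        Console.WriteLine(inMemory.Count());       // 2
        Console.WriteLine(queryable.Expression);   // textual form of the tree
        Console.WriteLine(RootNode(n => n > 5));   // GreaterThan
    }
}
```

A LINQ to SQL provider walks a tree like this one to build its SQL; here the in-memory provider simply compiles it and runs it locally.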

What are the similarities shared between LINQ to SQL and LINQ to Objects / XML?

The similarity shared between all flavors of LINQ is the concept of Deferred Loading and Execution. I will not go into detail on this topic here, as it has been discussed countless times in various other blog articles. Suffice it to say that it allows you to define a LINQ query that describes what you want to query, but doesn’t actually run the query until you use its results.

Quick Example:
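var query = from customer in db.Customers
            where customer.City == "Paris"
            select customer;

// Nothing has been sent to the database yet; the query runs when it is enumerated.
foreach (var cust in query)
{
    Console.WriteLine(cust.CompanyName);

    // The orders are loaded on first access (deferred loading).
    foreach (var order in cust.Orders)
    {
        Console.WriteLine(order.OrderID);
    }
}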

This allows you to define what you query, through various code paths, until you actually need to use the query for data.

Example:
NorthwindDataContext northwind = new NorthwindDataContext();
var products = from p in northwind.Products
               select p;

// Narrow the query further; still nothing has been executed.
if (somecondition)
{
    products = from p in products
               where p.Discontinued
               select p;
}

// The query is executed here, when it is enumerated.
foreach (Product p in products)
{
    // do something
}
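
If you want to watch deferred execution happen without a database, the following sketch uses LINQ to Objects over a plain list. DeferredDemo and its Run method are hypothetical names of mine; the behaviour it shows (the query sees an item added after the query was defined) is the same deferral described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static List<string> Run(bool onlyShortNames)
    {
        List<string> cities = new List<string> { "Paris", "London" };

        // Defining the query does not run it.
        IEnumerable<string> query = from c in cities
                                    select c;

        // Compose further along a code path; still nothing has run.
        if (onlyShortNames)
        {
            query = from c in query
                    where c.Length <= 5
                    select c;
        }

        // Added after the query was defined, but before it is enumerated.
        cities.Add("Rome");

        // Execution happens here, so "Rome" is part of the results.
        return query.ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run(true).ToArray()));   // Paris, Rome
        Console.WriteLine(string.Join(", ", Run(false).ToArray()));  // Paris, London, Rome
    }
}
```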


Responsibilities of the Entity Classes and the DataContext Class

Let us continue on to our discussion of what the DataContext and Entity objects’ responsibilities are and what they actually do for us. We will discuss the core concepts of what the DataContext and the Entity classes do, with a description of how they do it. For the Entity class part, we will get down and dirty with how it actually accomplishes its responsibilities, since we can see how it does so in the code generated for us.

Please note that all of this is handled automatically if you use SqlMetal or the Object Relational Designer (O/R Designer) in Visual Studio 2008. The point of this knowledge is to understand what is automatically provided to us, both to better understand these objects and in case we decide to implement these features ourselves.

The DataContext class is responsible for identity tracking, change tracking, and change processing. All of this is automatically handled by the base DataContext class.

Identity Tracking:
When a record is queried from the database for the first time since the instantiation of the DataContext object, that record is recorded in an identity table using its primary key, and an entity object is created and stored in cache. Subsequent queries that determine that the same record should be returned will check the identity table, and if the record exists in the identity table, the already existing entity object will be returned from the cache. This is an important concept to grasp, so I will reiterate it in a slightly different way. When a query is executed, if a record in the database matches the search criteria, and its entity object is already cached, the already cached entity object is returned. This means that the actual data returned by the query may not be the same as the record in the database. The query determines which entities will be returned based on the data in the database, but the DataContext’s identity tracking service determines WHAT data is returned. Lucky for us, we can refresh the DataContext’s cache and have it retain our changes. I will show an example of this later.
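
To make the identity tracking behaviour concrete, here is a minimal sketch of the idea using a dictionary keyed by primary key. This is not how the DataContext is implemented internally; IdentityCache and Materialize are hypothetical names, and the Customer class is a bare stand-in for a generated entity. It only demonstrates the consequence described above: the second "query" gets the cached object back, stale values and all.

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
    public string CustomerID { get; set; }
    public string City { get; set; }
}

// Hypothetical stand-in for the DataContext's identity tracking service.
public class IdentityCache
{
    private readonly Dictionary<string, Customer> _byKey =
        new Dictionary<string, Customer>();

    // "row" plays the role of a record freshly read from the database.
    public Customer Materialize(Customer row)
    {
        Customer cached;
        if (_byKey.TryGetValue(row.CustomerID, out cached))
        {
            return cached;   // already tracked: the fresh row's values are ignored
        }
        _byKey.Add(row.CustomerID, row);
        return row;
    }
}

public static class IdentityDemo
{
    public static void Main()
    {
        IdentityCache cache = new IdentityCache();

        Customer first = cache.Materialize(
            new Customer { CustomerID = "ALFKI", City = "Berlin" });

        // Pretend the row changed in the database before the second query.
        Customer second = cache.Materialize(
            new Customer { CustomerID = "ALFKI", City = "Paris" });

        Console.WriteLine(ReferenceEquals(first, second)); // True
        Console.WriteLine(second.City);                    // Berlin
    }
}
```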

Change Tracking:
Once the identity tracking service creates an entity object in its cache, change tracking begins for that object. Change tracking works by storing the original values of an entity object. Change tracking for an entity object continues until you call the SubmitChanges method. Calling the SubmitChanges method causes the entity objects’ changes to be saved to the database, the original values to be forgotten, and the changed values to become the original values. This allows the change tracking to start over. This works fine as long as the entity objects are retrieved from the database. However, merely creating a new entity object by instantiating it will not provide any identity or change tracking until the DataContext is aware of its existence. To make the DataContext aware, hand the entity object to the Table property that represents its table: call the InsertOnSubmit or Attach method on the DataContext’s Table property, passing the new entity instance as a parameter. When this is done, the DataContext will begin identity and change tracking on that entity object.
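
The "remember the original values, compare, then reset on submit" cycle can be sketched in a few lines. Again, this is only an illustration under my own assumptions: ChangeTracker, Track, and IsModified are hypothetical names, only one property is tracked, and nothing is written to a database.

```csharp
using System;
using System.Collections.Generic;

public class Product
{
    public int ProductID { get; set; }
    public string Name { get; set; }
}

// Hypothetical tracker that remembers one original value per entity.
public class ChangeTracker
{
    private readonly Dictionary<Product, string> _originalNames =
        new Dictionary<Product, string>();

    // Tracking starts only once the tracker is told about the object.
    public void Track(Product p)
    {
        _originalNames[p] = p.Name;
    }

    public bool IsModified(Product p)
    {
        string original;
        return _originalNames.TryGetValue(p, out original) && original != p.Name;
    }

    // "Saves" modified entities, then makes the current values the new originals.
    public int SubmitChanges()
    {
        int saved = 0;
        foreach (Product p in new List<Product>(_originalNames.Keys))
        {
            if (IsModified(p))
            {
                saved++;
                _originalNames[p] = p.Name;
            }
        }
        return saved;
    }
}

public static class ChangeTrackingDemo
{
    public static void Main()
    {
        ChangeTracker tracker = new ChangeTracker();
        Product chai = new Product { ProductID = 1, Name = "Chai" };

        Console.WriteLine(tracker.IsModified(chai));  // False (not tracked yet)
        tracker.Track(chai);
        chai.Name = "Chai Tea";
        Console.WriteLine(tracker.IsModified(chai));  // True
        Console.WriteLine(tracker.SubmitChanges());   // 1
        Console.WriteLine(tracker.IsModified(chai));  // False (tracking starts over)
    }
}
```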

Change Processor:
When you call the SubmitChanges() method, the DataContext object’s change processor manages the update to the database. First, the change processor will add any newly inserted entity objects to its list of tracked entity objects. Next, it will order all changed entity objects based on dependency. Then, if no transaction is in scope, it will create a transaction so that all SQL commands carried out during this invocation of SubmitChanges will have transactional integrity. It uses SQL Server’s default isolation level of ReadCommitted, which means that data read will not be physically corrupted and only committed data will be read, but since the lock is shared, nothing prevents the data from changing before the end of the transaction. Lastly, it will enumerate through the ordered list of changed entity objects, creating the necessary SQL statements and executing them.
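
The "order by dependency" step is the least obvious one, so here is a rough sketch of what such an ordering can look like. I do not know the exact algorithm the change processor uses; this is a plain depth-first ordering, and PendingChange, DependsOn, and ChangeProcessorSketch are all names I invented. The transaction handling is left out entirely.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pending change: a label plus the changes it depends on.
public class PendingChange
{
    public string Name;
    public List<PendingChange> DependsOn = new List<PendingChange>();

    public PendingChange(string name)
    {
        Name = name;
    }
}

public static class ChangeProcessorSketch
{
    // Orders changes so that each one comes after everything it depends on.
    // Assumes the dependencies contain no cycles.
    public static List<PendingChange> Order(IEnumerable<PendingChange> changes)
    {
        List<PendingChange> ordered = new List<PendingChange>();
        HashSet<PendingChange> visited = new HashSet<PendingChange>();
        foreach (PendingChange change in changes)
        {
            Visit(change, visited, ordered);
        }
        return ordered;
    }

    private static void Visit(PendingChange change,
                              HashSet<PendingChange> visited,
                              List<PendingChange> ordered)
    {
        if (!visited.Add(change)) return;          // already handled
        foreach (PendingChange dependency in change.DependsOn)
        {
            Visit(dependency, visited, ordered);   // dependencies go first
        }
        ordered.Add(change);
    }

    public static void Main()
    {
        PendingChange customer = new PendingChange("INSERT Customer");
        PendingChange order = new PendingChange("INSERT Order");
        PendingChange detail = new PendingChange("INSERT Order_Detail");
        order.DependsOn.Add(customer);
        detail.DependsOn.Add(order);

        // Prints the Customer insert first, then the Order, then the Order_Detail.
        foreach (PendingChange change in Order(new[] { detail, order, customer }))
        {
            Console.WriteLine(change.Name);
        }
    }
}
```

Even though the changes are handed over in the "wrong" order, the parent rows come out ahead of the child rows that reference them, which is what the foreign keys need.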


Entity classes are responsible for change notification, graph consistency, and implementing good practices.

Change Notification:
The DataContext must be able to monitor this Entity class and know what changed and when it is changed. The Entity class must notify the DataContext that something has changed, in some form or fashion.

Graph Consistency:
When the relationship between two entity objects changes, such as between a Customer and its Orders, both sides must be kept in sync. The reference on each side of the relationship must be properly updated so that the two entity objects refer to each other (or no longer refer to each other if the relationship is removed).

Implementing Good Practices:
Entity classes should implement INotifyPropertyChanging and INotifyPropertyChanged. If you decide to write an entity class by hand and do not implement these interfaces for change notification, the DataContext will need to keep two copies of each entity object: the current one and a copy holding the original values, which it compares to determine what changed (highly inefficient).

Call OnCreated from the constructor of the entity class. It is a partial method that gives you a hook for your own initialization logic. Also declare partial methods named On[PropertyName]Changing / On[PropertyName]Changed for each property, and call them before and after each set of that property.

Example of a property for an Entity class and how it accomplishes change notification (as taken from the book Pro LINQ):

[Column(Storage="_ShipCountry", DbType="NVarChar(15)")]
public string ShipCountry
{
    get
    {
        return this._ShipCountry;
    }
    set
    {
        if ((this._ShipCountry != value))
        {
            this.OnShipCountryChanging(value);
            this.SendPropertyChanging();
            this._ShipCountry = value;
            this.SendPropertyChanged("ShipCountry");
            this.OnShipCountryChanged();
        }
    }
}

As you can see in this example, the get part of this property is relatively simple. When we take a look at the set part of this property, we can see it calls SendPropertyChanging and SendPropertyChanged, which raise the PropertyChanging and PropertyChanged events defined by INotifyPropertyChanging and INotifyPropertyChanged. This lets the DataContext know that this property is about to change and, afterwards, that it has actually changed.

The other two odd-looking methods to notice are OnShipCountryChanging and OnShipCountryChanged. These are the partial methods that you can implement (in your own partial definition of this entity class) to add validation or additional logic if you so choose. Partial methods are like lightweight event handlers, in that you can declare them in one partial class definition and actually define them in another. But if they are never defined in another partial definition, the compiler removes all traces that the partial method ever existed. So it is a very efficient way to add functionality: if you never use it, it isn’t added to the compiled code, hence the comparison to a lightweight event.
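
Here is a small, self-contained sketch of hooking a partial method for validation. The Shipment class is hypothetical (it is not one of the generated Northwind classes) and the change notification calls are left out to keep the focus on the partial method itself.

```csharp
using System;

// Part 1: what a code generator might emit (hypothetical class).
public partial class Shipment
{
    private string _shipCountry;

    // Declaration only. If no other part implements it,
    // the compiler removes the method and every call to it.
    partial void OnShipCountryChanging(string value);

    public string ShipCountry
    {
        get { return _shipCountry; }
        set
        {
            if (_shipCountry != value)
            {
                OnShipCountryChanging(value);
                _shipCountry = value;
            }
        }
    }
}

// Part 2: your own file, adding validation without touching generated code.
public partial class Shipment
{
    partial void OnShipCountryChanging(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            throw new ArgumentException("ShipCountry must not be empty.");
        }
    }
}

public static class PartialMethodDemo
{
    public static void Main()
    {
        Shipment shipment = new Shipment();
        shipment.ShipCountry = "France";              // passes validation

        try
        {
            shipment.ShipCountry = "";                // rejected by the hook
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine(ex.Message);
        }

        Console.WriteLine(shipment.ShipCountry);      // France
    }
}
```

Because the hook runs before the backing field is assigned, a rejected value never reaches the entity.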

Example of an Entity class constructor and how it sets up graph consistency (as taken from the book Pro LINQ):

public Order()
{
    this._Order_Details = new EntitySet<Order_Detail>(
        new Action<Order_Detail>(this.attach_Order_Details),
        new Action<Order_Detail>(this.detach_Order_Details));

    this._Customer = default(EntityRef<Customer>);

    OnCreated();
}

private void attach_Order_Details(Order_Detail entity)
{
    this.SendPropertyChanging();
    entity.Order = this;
}

private void detach_Order_Details(Order_Detail entity)
{
    this.SendPropertyChanging();
    entity.Order = null;
}

This part of the example sets up the relationships of the Order entity object: its child objects, Order_Details, and its parent object, Customer. The interesting thing to note here is that the parent object is of type EntityRef and the child object is of type EntitySet. EntityRef and EntitySet are generic types that allow deferred loading to occur. That is, they simply state what the relationships are, but do not actually load the references until it is absolutely necessary. The main difference is that EntityRef holds a single object, the parent, while EntitySet is a collection of objects that represents a set of children.

You can see in the example above that EntitySet gets instantiated with two Action delegates. These specify how to attach and how to detach. EntitySet will use them when children are added to or removed from this relationship. The attach delegate runs when you manually add a new or existing Order_Detail object to this Order object, and the detach delegate runs when you remove it. This setup exists because adding or removing a relationship has to be done on both sides. If you are adding a child object to a parent object (such as adding an Order_Detail object instance to an Order object), you must tell the Order_Detail object that it is now a child of the Order object (setting its parent), and tell the Order object that it now contains a new child object, that Order_Detail instance. The example above deals with this new child object and how to attach it to this parent object (as you can see in the attach_Order_Details method above).

Simply put, LINQ to SQL will use these two Action delegates to assign an Order to an Order_Detail, or to remove the assignment of an Order from an Order_Detail. This and the previous example show how Change Notification works in LINQ to SQL, and how the DataContext knows what to update, remove, and insert.

[Association(Name="Order_Order_Detail", Storage="_Order_Details", OtherKey="OrderID")]
public EntitySet<Order_Detail> Order_Details
{
    get
    {
        return this._Order_Details;
    }
    set
    {
        this._Order_Details.Assign(value);
    }
}

In the Order_Details (children reference) example above, the get part simply returns the EntitySet collection of child objects and maintains the deferred loading until you absolutely need it. When the child objects are needed, it executes the relevant SQL and grabs the child objects. It then calls the set part of this property, which re-assigns the EntitySet with the loaded child objects.

[Association(Name="Customer_Order", Storage="_Customer", ThisKey="CustomerID",
    IsForeignKey=true)]
public Customer Customer
{
    get
    {
        return this._Customer.Entity;
    }
    set
    {
        Customer previousValue = this._Customer.Entity;
        if (((previousValue != value)
            || (this._Customer.HasLoadedOrAssignedValue == false)))
        {
            this.SendPropertyChanging();
            if ((previousValue != null))
            {
                this._Customer.Entity = null;
                previousValue.Orders.Remove(this);
            }
            this._Customer.Entity = value;
            if ((value != null))
            {
                value.Orders.Add(this);
                this._CustomerID = value.CustomerID;
            }
            else
            {
                this._CustomerID = default(string);
            }
            this.SendPropertyChanged("Customer");
        }
    }
}

In the Customer (parent reference) example, we will skip the get part of the Property as it is fairly obvious what it is doing and focus on the set part.

Customer previousValue = this._Customer.Entity;

You can see that the first line of the set method stores a reference to the Customer that is currently assigned. Don’t let the fact that it is calling Entity on the _Customer member confuse you. _Customer is of type EntityRef, so in order to get the actual customer, we have to read its Entity property.

if (((previousValue != value)
    || (this._Customer.HasLoadedOrAssignedValue == false)))

The statement above checks whether the value being assigned differs from the Customer that is currently assigned, or whether no value has been loaded or assigned yet. If it is the same customer that is already assigned, there is nothing more to do.

this.SendPropertyChanging();

As part of the Change Tracking / Notification piece, this notifies the DataContext that we are about to change the Customer property.

if ((previousValue != null))
{
    this._Customer.Entity = null;
    previousValue.Orders.Remove(this);
}

This next part of the code determines whether the previous Customer object is null. If it isn’t null, it clears out the relationship between the previous parent and this child object. The two inner lines simply remove the parent object reference and the child reference to this object (in the previous parent). Calling the Remove method above will cause the Customer class’s detach_Orders method (similar to the detach_Order_Details method in the earlier example) to get called and the passed Order object to be removed. In the detach_Orders method, the passed Order object’s Customer property is set to null; it looks like the following:

private void detach_Orders(Order entity)
{
    this.SendPropertyChanging();
    entity.Customer = null;
}

As you can see, when the Customer property is set to null, this will cause the Order object’s Customer property’s set method to be called, which is the method that invoked the code that called the detach_Orders method. So the very method that started this process of removal is getting called recursively.

set
{
    Customer previousValue = this._Customer.Entity;
    if (((previousValue != value)
        || (this._Customer.HasLoadedOrAssignedValue == false)))

Remember how we set the Customer parent object to null before we called the Remove method that triggered detach_Orders? Because of this, previousValue is null, and since the value being passed in is also null (and a value has already been assigned), the condition is false and the setter returns without doing anything else.

So, once the recursive call to the set method returns, the prior parent no longer references this child object, and this child object no longer references the prior parent. We are now back at the next line of our code.

this._Customer.Entity = value;
if ((value != null))
{
    value.Orders.Add(this);
    this._CustomerID = value.CustomerID;
}
else
{
    this._CustomerID = default(string);
}

The first line above assigns the new parent object to the _Customer member to maintain the parent relationship. The next line checks that the value is not null. If it isn’t null, the current Order object is added to the new Customer’s collection of child Order objects, so that the new parent’s child reference points to this object, and _CustomerID is set from the new parent. If it is null, _CustomerID is simply assigned its default value.

Adding the Order causes the Customer class’s attach_Orders method to be called. This will assign the passed Customer to the current Order object’s Customer property, resulting in the Order object’s Customer property’s set part being called again (the second part of the recursion).

if (((previousValue != value)
    || (this._Customer.HasLoadedOrAssignedValue == false)))

Just like before, this line breaks the recursion. Remember the line “this._Customer.Entity = value;”: before recursing, we set the Order object’s Customer to the new Customer, which is the same Customer that attach_Orders passes to the set part again. Since they are the same, this exits the recursion the same way the detach did.

The last things of relevance are setting the Customer ID from the new Customer parent, and letting the DataContext know that this property was changed via the this.SendPropertyChanged("Customer") call.

This whole system is required to maintain a one-to-many relationship between the Entity objects and to maintain the graph consistency between them. If you decide to write an Entity class by hand, you must remember to implement a feature similar to this, as the Entity class is responsible for its own graph consistency. A typical approach (if you really want to make your own entity class) is to let the O/R Designer generate these classes for you, and then copy the generated code into your own Entity class. This will cut down on the amount of work you need to do to maintain change tracking and graph consistency.
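
If you want to experiment with the recursion outside of LINQ to SQL, here is a stripped-down sketch of the same two-sided pattern. It deliberately avoids the real EntitySet and EntityRef types: ChildSet is a hypothetical, minimal stand-in I wrote for this illustration, and the change notification calls and foreign key handling are omitted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for EntitySet<T>: a list that runs a callback on add/remove.
public class ChildSet<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly Action<T> _onAdd;
    private readonly Action<T> _onRemove;

    public ChildSet(Action<T> onAdd, Action<T> onRemove)
    {
        _onAdd = onAdd;
        _onRemove = onRemove;
    }

    public int Count { get { return _items.Count; } }

    public void Add(T item)
    {
        if (_items.Contains(item)) return;   // already a child: nothing to do
        _items.Add(item);
        _onAdd(item);
    }

    public void Remove(T item)
    {
        if (_items.Remove(item)) _onRemove(item);
    }
}

public class Customer
{
    public readonly ChildSet<Order> Orders;

    public Customer()
    {
        // attach / detach callbacks, in the spirit of attach_Orders / detach_Orders
        Orders = new ChildSet<Order>(o => o.Customer = this, o => o.Customer = null);
    }
}

public class Order
{
    private Customer _customer;

    public Customer Customer
    {
        get { return _customer; }
        set
        {
            Customer previous = _customer;
            if (previous == value) return;      // the guard that ends the recursion
            if (previous != null)
            {
                _customer = null;               // clear first, so the detach callback is a no-op
                previous.Orders.Remove(this);
            }
            _customer = value;                  // assign first, so the attach callback is a no-op
            if (value != null) value.Orders.Add(this);
        }
    }
}

public static class GraphDemo
{
    public static void Main()
    {
        Customer alice = new Customer();
        Customer bob = new Customer();
        Order order = new Order();

        order.Customer = alice;
        order.Customer = bob;                   // moves the order between parents

        Console.WriteLine(alice.Orders.Count);  // 0
        Console.WriteLine(bob.Orders.Count);    // 1
        Console.WriteLine(order.Customer == bob); // True
    }
}
```

Setting the parent on the child or adding the child to the parent both end up in the same consistent state, because each side's update is already in place by the time the other side calls back.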

It may take a couple of re-reads to fully get what is accomplished here, but it will be worth it to fully understand how all of the change tracking and graph consistency “magic” occurs. If you truly want a deeper understanding of these concepts, I highly recommend reading the book “Pro LINQ” I mentioned earlier. It opened my eyes to a lot of the inner workings of LINQ in general.

One key component not covered in this article is the concept of Expression Trees, which is essential to how LINQ to SQL works. In future blog articles I will try to get more in depth with this concept, and how it relates to the .NET Framework 3.5 as a whole. For now I highly recommend reading the article posted here. Another thing I would like to touch on in the future is the inner workings of the DataContext, and how it accomplishes its responsibilities.

Look for more Detailed Look blog articles in the future, as well as Tips and Tricks (quick excerpts on useful features in .NET), Common Design Patterns (review of common design patterns and how they relate to you as a .NET developer), and Hair Pullers (those things that are frustrating gotchas and simple ways around them).