Sunday, October 4, 2009

CookieContainer domain handling bug fix

CookieContainer has a bug in the way it handles domain names, reported here:
https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=478521
Microsoft decided not to fix it in .NET 2.0/3.0/3.5.

I want to inspect the problem and write my own hack to work around this issue in .NET versions earlier than 4.0.

First, there are three objects related to cookie management in this issue: Cookie, CookieCollection and CookieContainer. Cookie is the cookie object, CookieCollection is simply a collection of cookies, and CookieContainer is a container of Cookie or CookieCollection objects that manages domain URI, path and expiry.

The issue resides in CookieContainer, which does not handle domains properly.

Domain handling is important because it tells the client-side browser or processor which cookies are visible to client-side code and can be sent back to the server. The theory goes something like this:

A site on a different domain can't access cookies that belong to another domain. For example, http://www.yahoo.com can't see any cookies from google.com.
A sub domain can access cookies for itself, for all of its parent sub domains and for the domain itself. For example, http://groups.google.com can see cookies from groups.google.com and .google.com.
A parent sub domain, or the domain itself, can't access cookies from any of its child sub domains. So http://domain.com can't see cookies from sub.domain.com.
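
The three rules above can be sketched as a small helper. This is only a simplified model for illustration, not the logic CookieContainer or RFC 2109 actually uses: the class and method names are hypothetical, and it assumes a leading dot on the cookie domain can be ignored for the comparison.

```csharp
using System;

static class CookieDomainRules
{
    // Returns true when a cookie stored for cookieDomain should be visible
    // to a request for requestHost, according to the three rules above.
    // Assumption: a leading dot on the cookie domain is ignored.
    public static bool IsVisibleTo(string requestHost, string cookieDomain)
    {
        string host = requestHost.ToLowerInvariant();
        string domain = cookieDomain.TrimStart('.').ToLowerInvariant();

        if (domain.Length == 0) return false;
        if (host == domain) return true;   // the (sub) domain itself

        // A parent domain matches only on a label boundary, so
        // "notgoogle.com" does not match "google.com".
        return host.EndsWith("." + domain, StringComparison.Ordinal);
    }
}
```

For example, IsVisibleTo("groups.google.com", ".google.com") is true, while IsVisibleTo("domain.com", "sub.domain.com") is false.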

However, I am a little confused about the leading dot in the RFC 2109 standard, as in .domain.com and .groups.domain.com, compared to a domain without the dot.

Back to the issue: CookieContainer has three methods that I am interested in inspecting. They are Add, GetCookies and SetCookies.

To properly inspect the CookieContainer object, I should not rely on these three methods alone. Instead, I want to use reflection to see what happens inside the object.
After some study with reflection, I found these fields:

CookieContainer cc has m_domainTable field

Hashtable m_domainTable
Hashtable Key: string domain
Hashtable Value: PathList pathList

PathList pathList has m_list field

SortedList m_list
Key: string path
Value: CookieCollection colCookies

So from the CookieContainer object, we can reach a Cookie object by this path: domain string > path string > cookies. All cookies are grouped by domain and then by path.
This is the code that helped me study the CookieContainer object. I got it from http://channel9.msdn.com/forums/TechOff/260235-Bug-in-CookieContainer-where-do-I-report/ Thanks to JohnWinner:

// Requires: using System.Collections; using System.Collections.Generic; using System.Net;
public List<Cookie> GetAllCookies(CookieContainer cc)
{
    List<Cookie> lstCookies = new List<Cookie>();

    // m_domainTable is a private field: domain key -> PathList
    Hashtable table = (Hashtable)cc.GetType().InvokeMember("m_domainTable", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.GetField | System.Reflection.BindingFlags.Instance, null, cc, new object[] { });

    foreach (object pathList in table.Values)
    {
        // m_list is a private field of PathList: path -> CookieCollection
        SortedList lstCookieCol = (SortedList)pathList.GetType().InvokeMember("m_list", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.GetField | System.Reflection.BindingFlags.Instance, null, pathList, new object[] { });
        foreach (CookieCollection colCookies in lstCookieCol.Values)
            foreach (Cookie c in colCookies) lstCookies.Add(c);
    }

    return lstCookies;
}

Now I want to test whether the Add, GetCookies and SetCookies methods behave correctly.

Testing the Add method:

This is the code I use to test it:

Uri u = new Uri("http://sub.domain.com");
Cookie c1 = new Cookie("test1", HttpUtility.UrlEncode("www.domain.com"), "/", "www.domain.com");
Cookie c11 = new Cookie("test11", HttpUtility.UrlEncode(".www.domain.com"), "/", ".www.domain.com");
Cookie c2 = new Cookie("test2", HttpUtility.UrlEncode("sub.domain.com"), "/", "sub.domain.com");
Cookie c21 = new Cookie("test21", HttpUtility.UrlEncode("sub.domain.com"), "/", "sub.domain.com");
Cookie c22 = new Cookie("test22", HttpUtility.UrlEncode(".sub.domain.com"), "/", ".sub.domain.com");
Cookie c3 = new Cookie("test3", HttpUtility.UrlEncode(".domain.com"), "/", ".domain.com");
Cookie c31 = new Cookie("test31", HttpUtility.UrlEncode(".domain.com"), "/", ".domain.com");
Cookie c4 = new Cookie("test4", HttpUtility.UrlEncode("domain.com"), "/", "domain.com");
CookieContainer cc = new CookieContainer();

I test adding the cookies like this:

cc.Add(c2);
cc.Add(c21);
cc.Add(c3);
cc.Add(c31);
cc.Add(c4);

and like this:

cc.Add(u, c2);
cc.Add(u, c21);
cc.Add(u, c3);
cc.Add(u, c31);
cc.Add(u, c4);

Note that c1 and c11 have been removed from the Add(Uri, Cookie) and Add(Uri, CookieCollection) tests, since using the uri sub.domain.com throws an exception: The 'Domain'='www.domain.com' part of the cookie is invalid. I think www is considered a different sub domain, unrelated to the sub domain in the uri (sub.domain.com).
Maybe a www.domain.com cookie is meant to make the cookie visible in domain.com but not in its sub domains.

After checking the table, lstCookieCol, colCookies and c objects in the GetAllCookies method above, I can conclude the following:

Add and SetCookies methods:

Add(Cookie) vs Add(Uri, Cookie) overloads:
The two overloads store the Hashtable m_domainTable key in different ways. The key is the domain name.
Add(Cookie) stores the cookie under a domain key taken directly from the Domain property of the cookie.
Add(Uri, Cookie) stores the cookie under a domain key that follows the RFC 2109 standard, so every domain key starts with a dot. All cookies with domain .sub.domain.com and sub.domain.com are therefore merged under the .sub.domain.com hashtable key. Note that this change to the domain name only affects the hashtable domain key, not the Domain property of the cookie. The Domain property of the cookie remains unchanged.

I got this result by checking the value of table.Keys. The other two Add overloads, Add(CookieCollection) and Add(Uri, CookieCollection), behave the same as Add(Cookie) and Add(Uri, Cookie) respectively. The SetCookies(Uri, cookieHeader) method also uses Add() internally.

So now we have two different ways in which cookies are stored in the CookieContainer object.
#1 The key is taken directly from the Domain property of the cookie. This applies to Add(Cookie), which I think is a BUG.
#2 The key starts with a dot. A dot is added at the beginning of the cookie's Domain property if it doesn't have one. This applies to Add(Uri, Cookie).

From my point of view, #1 is a bug because it stores .sub.domain.com and sub.domain.com in different groups, while logically both cookies should be visible to http://sub.domain.com.
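
The difference between #1 and #2 can be modelled with two tiny key functions. This is only a sketch of the behaviour observed above, not the real CookieContainer implementation; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class DomainKeyModel
{
    // Model of #1, Add(Cookie): the key is the Domain property, unchanged.
    public static string KeyForAdd(string cookieDomain)
    {
        return cookieDomain;
    }

    // Model of #2, Add(Uri, Cookie): the key always starts with a dot.
    public static string KeyForAddWithUri(string cookieDomain)
    {
        return cookieDomain.StartsWith(".", StringComparison.Ordinal)
            ? cookieDomain
            : "." + cookieDomain;
    }

    // Counts how many distinct hashtable keys a set of cookie domains produces.
    public static int CountKeys(IEnumerable<string> domains, Func<string, string> keyFor)
    {
        HashSet<string> keys = new HashSet<string>();
        foreach (string d in domains) keys.Add(keyFor(d));
        return keys.Count;
    }
}
```

With the domains sub.domain.com and .sub.domain.com, the #1 model produces two separate keys, while the #2 model merges both under .sub.domain.com.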

GetCookies method:

While inspecting the Add method overloads, I also checked the CookieCollection returned from cc.GetCookies(u).
To make the result easier to check, I use this code (because List has better debugging visualization than CookieCollection):

List<Cookie> lstCookies = MyGetCookies(cc, new Uri("http://sub.domain.com"));

// Copies the result of GetCookies into a List<Cookie> for easier debugging.
public List<Cookie> MyGetCookies(CookieContainer cc, Uri u)
{
    List<Cookie> lstCookies = new List<Cookie>();
    CookieCollection colCookies = cc.GetCookies(u);
    foreach (Cookie c in colCookies)
        lstCookies.Add(c);
    return lstCookies;
}

With GetCookies(Uri), retrieving cookies stored as described in #1 and #2 gives different results. Neither result is correct, and each shows a different problem. Problem #1 and Problem #2 correspond to #1 and #2 respectively.

Problem #1: It can't retrieve cookies for the current sub domain whose key starts with a dot, nor cookies from the parent domain whose key does not start with a dot. So I only get cookies c2, c21, c3 and c31.
Problem #2: It can't retrieve cookies for the current sub domain, since a dot was added at the beginning of the sub domain key. I only get cookies c3, c31 and c4.

I expected all 6 cookies to be retrieved. After a few days, I realized that Problem #1 and Problem #2 are the same thing: the way GetCookies retrieves cookies. It can't retrieve cookies for the current sub domain under a key that starts with a dot, or for a parent domain under a key that does not start with a dot. Since Problem #2 uses Add(Uri, Cookie) and every domain key starts with a dot, it can only retrieve cookies from the parent domains.

Recall that we have 2 domain keys, .sub.domain.com and .domain.com. When the Uri is http://sub.domain.com, only the cookies under .domain.com can be retrieved. To let GetCookies retrieve the current sub domain's cookies stored under the .sub.domain.com key, we need another key without the leading dot: sub.domain.com. That means we should have these 2 keys: sub.domain.com and .domain.com.

Now consider that the sub domain has a sub domain of its own. sub1.sub.domain.com needs the .sub.domain.com key in order to retrieve the cookies. So I conclude that every domain needs both a dot and a non-dot version of its key. GetCookies can then retrieve cookies from the current sub domain and all its parents.

Since the Add(Uri, Cookie) method only generates dot domain keys, we need to copy each one to a new non-dot domain key. Here is the function that does it.

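// Requires: using System.Collections; using System.Net;
private void BugFix_CookieDomain(CookieContainer cookieContainer)
{
    // Read the private m_domainTable field (domain key -> PathList) via reflection.
    Hashtable table = (Hashtable)typeof(CookieContainer).InvokeMember("m_domainTable",
        System.Reflection.BindingFlags.NonPublic |
        System.Reflection.BindingFlags.GetField |
        System.Reflection.BindingFlags.Instance,
        null,
        cookieContainer,
        new object[] { });
    // Copy the keys first, because the table is modified inside the loop.
    ArrayList keys = new ArrayList(table.Keys);
    foreach (string key in keys)
    {
        if (key.Length > 0 && key[0] == '.')
        {
            // Mirror ".domain.com" to "domain.com"; an existing non-dot entry is overwritten.
            string newKey = key.Remove(0, 1);
            table[newKey] = table[key];
        }
    }
}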

Thanks to SalarSoft, who wrote this bug fix; I found it in the source of ASProxy. It assumes that Add(Uri, Cookie) is always used, so that dot domain keys are created. For more robust code, you can modify it so that every dot domain key is mirrored to a non-dot domain key and vice versa; then you can also use the Add(Cookie) method.

In conclusion, CookieContainer can be fixed with this function under two simple conditions:
- Don't use Add(Cookie); always use Add(Uri, Cookie).
- Call BugFix_CookieDomain each time you add a cookie, or before you retrieve cookies with GetCookies, or before the system uses the container.
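
The two-way mirroring suggested above could look roughly like the sketch below. It works on a plain Hashtable, so it would still need the same reflection step as BugFix_CookieDomain to reach m_domainTable, and I have not tested it against a real CookieContainer. The names DomainKeyMirror and MirrorKeys are hypothetical. The sketch never overwrites an existing entry, which also means that when both the dot and non-dot keys already exist, their contents are not merged.

```csharp
using System;
using System.Collections;

static class DomainKeyMirror
{
    // Adds the missing counterpart for every key:
    // ".domain.com" <-> "domain.com". Existing entries are left untouched.
    public static void MirrorKeys(Hashtable table)
    {
        // Snapshot the keys, because the table is modified inside the loop.
        ArrayList keys = new ArrayList(table.Keys);
        foreach (object keyObj in keys)
        {
            string key = keyObj as string;
            if (string.IsNullOrEmpty(key)) continue;

            string counterpart = key[0] == '.' ? key.Substring(1) : "." + key;
            if (counterpart.Length > 0 && !table.ContainsKey(counterpart))
                table[counterpart] = table[key];
        }
    }
}
```

Both keys then point to the same value object, so a cookie added later under one of the two keys is visible under the other as well, as long as the entry itself is not replaced.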

CallMeLaNN

5 comments:

  1. Thanks,
    someone has posted a workaround on the Microsoft bug report page:

    I've written a small function to work around the problem,
    it adds the "www." to each Uri which seems to "solve" the problem:

    public static Uri FixUriForCookies(string Url)
    {
    Uri Uri = new Uri(Url);

    if (!Uri.Host.StartsWith("www"))
    {
    if (Uri.Scheme == "http")
    {
    Uri = new Uri(Url.Replace("http://", "http://www."));
    }
    else if (Uri.Scheme == "https")
    {
    Uri = new Uri(Url.Replace("https://", "https://www."));
    }
    }
    return Uri;
    }

    I didn't test it myself, but I doubt it works for all URLs, such as sub-domains!

  2. This may sound obvious, but this also applies to using the CookieContainer in HttpWebRequest and WebClient.
    It's really a pain in the butt, especially since Microsoft decided to fix this so late. So, thank you VERY much for the fix! I've applied it by executing BugFix_CookieDomain after every SetCookies-call.

  3. Hi,

    Fix doesn't work right for me.
    I'm doing some requests on a domain and get cookies back for domain starting with ".", then I add my own for that domain (without www) and they end up in the collection as domain without ".".
    When I call your fixing code afterwards, the cookies I just added get overwritten by the ones that were there before.

    Ideas?

  4. //bug fix, exists only in 3.5 FW, please wrap it with defines
    //http://dot-net-expertise.blogspot.com/2009/10/cookiecontainer-domain-handling-bug-fix.html
    if(!value.Contains("://www.")) //we are going to hit the bug
    {
    string urlWWW = value.Replace("://", "://www.");
    Uri uriWWW = new Uri(urlWWW);
    foreach (Cookie c in _cookieContainer.GetCookies(uriWWW))
    if (c.Domain.StartsWith("."))
    request.Headers["Cookie"] += c.Name + "=" + c.Value + ";"; //manually add the cookies (the HTTP request header is named "Cookie")
    }
    //~bug fix
