HttpWebRequest - C# WebRequest returning strange characters mixed in with HTML code
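I have code that fetches search results from Google, and I noticed that the HTML it retrieves contains extra characters compared with a web browser's response. I also noticed that Google seems to be forcing HTTPS, which might be the issue. If anyone can help me figure this out, I'd appreciate it.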
    string url = "http://www.google.com/search?hl=en&safe=off&q=test";
    HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(url);
    myRequest.Proxy = null;
    myRequest.Method = "GET";
    myRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0";
    myRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    myRequest.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");
    myRequest.Headers.Add("Accept-Language", "en-us,en;q=0.5");
    WebResponse myResponse = myRequest.GetResponse();
    StreamReader sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8);
    string result = sr.ReadToEnd();
    sr.Close();
    myResponse.Close();
    TextWriter tw2 = new StreamWriter(Directory.GetCurrentDirectory() + "\\google.html");
    tw2.WriteLine(result);
    tw2.Close();

Here is a comparison between the result from my code and the result from the web browser. The first one is from the code; notice the ‎ near the end. (The other slight difference doesn't affect anything, and is probably because of different headers or something.)
From the code:

    speedtest.net ookla - global broadband speed <em>test</em></a></h3><div class="s"><div><div class="f kv" style="white-space:nowrap"><cite class="_md"><cite class="visurl">speedtest.net</cite><cite class="visurl"></cite></cite>‎<div

From the web browser:

    speedtest.net ookla - global broadband speed <em>test</em></a></h3><div class="s"><div><div class="f kv _xu" style="white-space:nowrap"><cite class="_md">www.speed<b>test</b>.net/</cite><div
Something is wrong with your regex. It's totally normal to have non-ANSI characters in a Unicode response, and you must expect to have them too. We are living in the Unicode epoch now. The characters must be there, because they are present in Google's response. And it's not a bug, it's a feature. :)
There is absolutely nothing wrong with WebRequest.
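If the extra character gets in the way of later parsing, one option is to strip it after reading the response. The sketch below assumes the stray character is U+200E (LEFT-TO-RIGHT MARK), an invisible character in the Unicode "Format" (Cf) category; that is an inference from the output above, not something confirmed in the question. `FormatCharCleaner` and `StripFormatChars` are hypothetical names made up for this example.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class FormatCharCleaner
{
    // Hypothetical helper: returns the input without characters in the
    // Unicode "Format" (Cf) category, which includes U+200E LEFT-TO-RIGHT MARK.
    public static string StripFormatChars(string html)
    {
        var sb = new StringBuilder(html.Length);
        foreach (char c in html)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.Format)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Assumed sample: the tail of the fetched HTML with U+200E before <div.
        string sample = "</cite>\u200E<div";
        string cleaned = StripFormatChars(sample);

        Console.WriteLine(cleaned);                        // prints "</cite><div"
        Console.WriteLine(sample.Length - cleaned.Length); // prints "1"
    }
}
```

Note that the Cf category also covers characters such as the soft hyphen and the zero-width joiner, so this removes more than just U+200E. If only that one character is a problem, `result.Replace("\u200E", "")` is a narrower alternative.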