C# - How to Convert Unicode Text File With URLs to ANSI using URL Encoding


I have large text files containing URLs, encoded in UCS-2 little endian. They contain all kinds of links: Arabic, Chinese, Japanese, Korean, Russian, and any other language you can think of in a URL.

My goal is to create a script that automatically URL-encodes all of these links and saves them in an ANSI-encoded file.

Example:

These are some of the original links:

http://ejje.weblio.jp/content/あきれて物が言えない
https://ru.wikipedia.org/wiki/Дактиль
http://zh.wikipedia.org/zh/垃圾食品
http://abunawaf.com/سيارات-الملوك-وورثتهم-صور
http://ko.wiktionary.org/wiki/가능해지다

These need to become:

http://ejje.weblio.jp/content/%e3%81%82%e3%81%8d%e3%82%8c%e3%81%a6%e7%89%a9%e3%81%8c%e8%a8%80%e3%81%88%e3%81%aa%e3%81%84
https://ru.wikipedia.org/wiki/%d0%94%d0%b0%d0%ba%d1%82%d0%b8%d0%bb%d1%8c
http://zh.wikipedia.org/zh/%e5%9e%83%e5%9c%be%e9%a3%9f%e5%93%81
http://abunawaf.com/%d8%b3%d9%8a%d8%a7%d8%b1%d8%a7%d8%aa-%d8%a7%d9%84%d9%85%d9%84%d9%88%d9%83-%d9%88%d9%88%d8%b1%d8%ab%d8%aa%d9%87%d9%85-%d8%b5%d9%88%d8%b1
http://ko.wiktionary.org/wiki/%ea%b0%80%eb%8a%a5%ed%95%b4%ec%a7%80%eb%8b%a4

I've used C# for that. I tried using the HttpUtility.UrlPathEncode method like this:

    // Requires: using System; using System.IO; using System.Text; using System.Web;
    static void Main(string[] args)
    {
        string path = @"c:\temp\test.txt";
        string enpath = @"c:\temp\entest.txt";

        string[] lines = File.ReadAllLines(path);

        // 72 is the hard-coded number of lines in the test file.
        for (int i = 0; i < 72; i++)
        {
            Console.Write(HttpUtility.UrlPathEncode(lines[i]) + Environment.NewLine);
            System.IO.File.AppendAllText(enpath, HttpUtility.UrlPathEncode(lines[i]) + Environment.NewLine, Encoding.ASCII);
        }

        Console.ReadLine();
    }

It seems to convert them fine except for one small bug: if the URL contains a question mark, it doesn't convert anything after it. This is a big handicap for me, because I have a lot of links that contain question marks.

Example:

http://www.alkousy.com/showthread.php?4113-ÇáÚáã-ÈÇááøóå-åæ-ßäÒ-ÇáÃäÈíÇÁ-ææÑËÊåã-ãä-ÇáãÄãäíä 

is being converted as:

http://www.alkousy.com/showthread.php?4113-?????-???????-??-???-????????-???????-??-???????? 

This is totally unacceptable for me, and I'm looking for a solution. I've tried Uri.EscapeDataString as well, but that one converts everything, including the // and the :

Is there a quick solution without custom coding everything?

Use the Uri class instead:

    var url = "http://www.alkousy.com/あきれて物が言.php?4113-ÇáÚáã-ÈÇááøóå-åæ-ßäÒ-ÇáÃäÈíÇ";
    var uri = new Uri(url, UriKind.Absolute);
    Console.WriteLine(uri.GetComponents(UriComponents.AbsoluteUri, UriFormat.UriEscaped));

Which will output:

    http://www.alkousy.com/%e3%81%82%e3%81%8d%e3%82%8c%e3%81%a6%e7%89%a9%e3%81%8c%e8%a8%80.php?4113-%c3%87%c3%a1%c3%9a%c3%a1%c3%a3-%c3%88%c3%87%c3%a1%c3%a1%c3%b8%c3%b3%c3%a5-%c3%a5%c3%a6-%c3%9f%c3%a4%c3%92-%c3%87%c3%a1%c3%83%c3%a4%c3%88%c3%ad%c3%87
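To make the contrast with Uri.EscapeDataString from the question concrete, here is a minimal comparison sketch. The class name EscapeComparison and the method names AsData and AsUri are hypothetical helpers introduced only for illustration.

```csharp
using System;

public static class EscapeComparison
{
    // Treats the whole string as opaque data, so reserved characters
    // such as ':', '/' and '?' are percent-encoded as well.
    public static string AsData(string url)
    {
        return Uri.EscapeDataString(url);
    }

    // Parses the string as a URI first, so the scheme and the
    // delimiters are left alone and only the remaining characters
    // that need escaping are percent-encoded.
    public static string AsUri(string url)
    {
        return new Uri(url, UriKind.Absolute)
            .GetComponents(UriComponents.AbsoluteUri, UriFormat.UriEscaped);
    }
}
```

Calling AsData on "http://a.com/x?y=1" encodes the colon, slashes, question mark and equals sign, while AsUri keeps the URL structure intact.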

The Uri class parses the string as an actual URI, so it knows not to encode the protocol or the delimiters. You can adjust your code like this:

    // Requires: using System; using System.IO; using System.Text;
    static void Main(string[] args)
    {
        string path = @"c:\temp\test.txt";
        string enpath = @"c:\temp\entest.txt";

        string[] lines = File.ReadAllLines(path);

        // 72 is the hard-coded line count carried over from the question.
        for (int i = 0; i < 72; i++)
        {
            var uri = new Uri(lines[i], UriKind.Absolute);
            var escaped = uri.GetComponents(UriComponents.AbsoluteUri, UriFormat.UriEscaped);
            Console.WriteLine(escaped);
            System.IO.File.AppendAllText(enpath, escaped + Environment.NewLine, Encoding.ASCII);
        }

        Console.ReadLine();
    }

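Based on the comments, you can implement it as a foreach loop with Uri.TryCreate, which removes the hard-coded line count and skips lines that are not valid absolute URIs: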

    foreach (var line in lines)
    {
        Uri uri;
        if (Uri.TryCreate(line, UriKind.Absolute, out uri))
        {
            var escaped = uri.GetComponents(UriComponents.AbsoluteUri, UriFormat.UriEscaped);
            Console.WriteLine(escaped);
            System.IO.File.AppendAllText(enpath, escaped + Environment.NewLine, Encoding.ASCII);
        }
    }
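Putting the pieces together, the whole conversion could be sketched as below. This is only one possible arrangement, not the answerer's code: the class UrlFileEncoder and its methods TryEncodeUrl and ConvertFile are hypothetical names. It assumes the input is UTF-16 little endian (which covers the UCS-2 LE files from the question) and passes Encoding.Unicode to make that explicit. It also writes the output once instead of appending per line. It targets the path and query only; internationalized host names are a separate concern not handled here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class UrlFileEncoder
{
    // Tries to percent-encode one line as an absolute URL.
    // Returns false (and an empty string) when the line is not a valid absolute URI.
    public static bool TryEncodeUrl(string line, out string escaped)
    {
        escaped = string.Empty;
        if (!Uri.TryCreate(line.Trim(), UriKind.Absolute, out var uri))
            return false;

        escaped = uri.GetComponents(UriComponents.AbsoluteUri, UriFormat.UriEscaped);
        return true;
    }

    // Reads a UTF-16 LE text file of URLs and writes the encoded URLs as ASCII.
    public static void ConvertFile(string inputPath, string outputPath)
    {
        var encoded = new List<string>();

        // Encoding.Unicode is .NET's name for UTF-16 little endian.
        foreach (var line in File.ReadLines(inputPath, Encoding.Unicode))
        {
            string escaped;
            if (TryEncodeUrl(line, out escaped))
                encoded.Add(escaped);
        }

        // One write for the whole file; the escaped URLs are expected to be pure ASCII.
        File.WriteAllLines(outputPath, encoded, Encoding.ASCII);
    }
}
```

With the paths from the question, usage would be UrlFileEncoder.ConvertFile(@"c:\temp\test.txt", @"c:\temp\entest.txt"). Lines that fail to parse are silently dropped in this sketch; logging them may be preferable for large files.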
