ruby rails invalid byte sequence in UTF-8 when reading csv [solved]

Reading a CSV file with the standard library in Ruby 1.9 or Ruby 2 raises:

CSV.read(csv_file_path)
# => ArgumentError: invalid byte sequence in UTF-8

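The cause is that the CSV file is not UTF-8 encoded. Once you know the file's actual encoding, tell CSV to transcode it. My file is in the Chinese GB18030 encoding, so the fix is: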

CSV.read(csv_file_path, encoding: 'GB18030:utf-8')
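The fix above can be tried end to end with a small sketch. The sample rows and the temporary file are made up for illustration. It writes GB18030 bytes to disk and reads them back with the same `encoding:` option, so every field comes back as a UTF-8 string.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data, converted to GB18030 bytes to mimic the problem file.
gb_bytes = "名称,数量\n苹果,3\n".encode('GB18030')

rows = Tempfile.create(['sample', '.csv']) do |file|
  file.binmode          # write the bytes exactly as they are
  file.write(gb_bytes)
  file.flush
  # "external:internal": the file is GB18030, strings are handed back as UTF-8
  CSV.read(file.path, encoding: 'GB18030:UTF-8')
end

p rows
p rows[1][0].encoding
```

The part before the colon is the file's encoding on disk; the part after it is the encoding the program wants to work with.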

ruby1.9 regexp match named captures

Capturing parentheses in Ruby:

str = "this is test data"
str =~ /(da)(ta)/

p $~
# => #<MatchData "data" 1:"da" 2:"ta">
p $~.to_a
# => ["data", "da", "ta"]
p $1
# => "da"
p $2
# => "ta"

Ruby 1.9 finally added named captures, which replace the hard-to-read $1, $2 style.

p RUBY_VERSION
#=> "1.9.2"

str = "this is test data"
/(?<d>da)(?<t>ta)/ =~ str

p d
#=> "da"
p t
#=> "ta"

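# *** Note the order: named captures become local variables only when the regexp literal is on the left of =~ ***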
str =~ /(?<ddd>da)(ta)/

p ddd
#=> undefined local variable or method `ddd' for main:Object
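If the operand order is hard to control, one alternative is to read the named groups from the MatchData object instead of relying on local variables. This is a minimal sketch using the same string and pattern as above; the names `pattern`, `m`, `first` and `second` are illustrative only.

```ruby
str = "this is test data"
pattern = /(?<d>da)(?<t>ta)/

# String#match returns a MatchData object; named groups are looked up by key,
# so it does not matter which side of the call the regexp sits on.
m = str.match(pattern)

first  = m[:d]    # lookup by symbol
second = m["t"]   # lookup by string works too
names  = m.names  # names of all named groups in the pattern

p first, second, names
```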

invalid multibyte char (US-ASCII) ruby1.9

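With Ruby 1.9 + Rails 3, writing Chinese text in a helper method raised an invalid multibyte char (US-ASCII) exception. It turns out that Ruby 1.9 reads source files as US-ASCII by default, which is surprising. (Ruby 2.0 and later default to UTF-8.)
The fix is to add the following as the first line of the file: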

# encoding: utf-8

In Rails 3, make sure application.rb contains:

# Configure the default encoding used in templates for Ruby 1.9.
config.encoding = "utf-8"
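To check which encoding Ruby actually used for a source file, a small sketch like the following can help. The variable names are illustrative. The magic comment only takes effect on the first line of the file (or the second, after a shebang line).

```ruby
# encoding: utf-8

# __ENCODING__ reports the encoding Ruby used to read this source file.
source_encoding = __ENCODING__

# String literals take on the source encoding.
greeting = "你好"

p source_encoding
p greeting.encoding
p greeting.length    # characters
p greeting.bytesize  # bytes
```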

Further reading:

http://blog.grayproductions.net/articles/ruby_19s_three_default_encodings
http://blog.grayproductions.net/articles/understanding_m17n

ruby1.9 hash

Hashes preserve insertion order
Ruby 1.8:

ruby-1.8.7-p299 > h = {}
 => {}
ruby-1.8.7-p299 > h[1] = 1
 => 1
ruby-1.8.7-p299 > h
 => {1=>1}
ruby-1.8.7-p299 > h[0] = 1
 => 1
ruby-1.8.7-p299 > h
 => {0=>1, 1=>1}

Ruby 1.9:

irb(main):001:0> RUBY_VERSION
=> "1.9.3"
irb(main):002:0> h = {}
=> {}
irb(main):003:0> h[1] = 1
=> 1
irb(main):004:0> h[0] = 0
=> 0
irb(main):005:0> h
=> {1=>1, 0=>0}
irb(main):006:0>
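The same behaviour as a script, for Ruby 1.9 or later: iteration follows insertion order, and sorting by key has to be asked for explicitly. The variable names are illustrative only.

```ruby
h = {}
h[1] = 1
h[0] = 0

# Keys come back in the order they were inserted, not in sorted order.
insertion_order = h.keys

# Hash#sort returns an array of [key, value] pairs sorted by key;
# Hash[] turns that array back into a hash with the new order.
sorted = Hash[h.sort]

p insertion_order
p sorted.keys
```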

JSON-like literal syntax:

Not supported in Ruby 1.8.
Ruby 1.9:

irb(main):001:0> RUBY_VERSION
=> "1.9.3"
irb(main):002:0> {a: 1}
=> {:a=>1}
irb(main):003:0> {a_b: 1}
=> {:a_b=>1}
irb(main):004:0> {a-b: 1}
SyntaxError: (irb):3: syntax error, unexpected tLABEL
irb(main):005:0> {"a": 1}
SyntaxError: (irb):4: syntax error, unexpected ':', expecting tASSOC
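For keys that are not valid bare labels, such as one containing a dash, a quoted symbol with the older `=>` form works in Ruby 1.9. The sketch below uses illustrative variable names. Ruby 2.2 and later also accept the `{"a": 1}` form, which likewise produces a symbol key.

```ruby
# The label shorthand works for keys that look like plain identifiers.
plain = {a: 1, a_b: 1}

# A key with a dash needs a quoted symbol and the hash-rocket form.
dashed = {:"a-b" => 1}

p plain
p dashed.keys.first
p dashed[:"a-b"]
```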

References

ruby-19-internals-ordered-hash/
on_the_horizon_ten_things_i_li.html
ruby-1-9-hash-with-a-dash-in-a-key