Sqawk:SQL 与 Awk 的融合:将 SQL 应用于文本型数据文件
Sqawk: A fusion of SQL and Awk: Applying SQL to text-based data files

原始链接: https://github.com/jgarzik/sqawk

Sqawk是一个命令行SQL工具,用于处理分隔符分隔的文件,例如CSV和TSV,类似于`awk`。它将数据加载到内存表中,允许你直接对文件内容执行SQL查询。你可以使用`sqawk -s "SELECT * FROM data" data.csv`来查询data.csv。 Sqawk支持标准的SQL操作,例如SELECT、WHERE过滤、UPDATE和DELETE。使用`--write`将修改保存回原始文件。它还可以顺序处理多个SQL语句。 主要功能包括使用`DISTINCT`查找唯一值,使用`INNER JOIN`连接表,以及使用`-F`指定不同的分隔符。例如,`sqawk -F '\t' -s "SELECT * FROM employees" employees.tsv`处理制表符分隔的文件。可以在命令中添加`-v`选项以启用详细日志记录。Sqawk采用MIT许可证授权。

Hacker News 最新 | 过去 | 评论 | 提问 | 展示 | 招聘 | 提交 登录 Sqawk:SQL 和 Awk 的融合:将 SQL 应用于基于文本的数据文件 (github.com/jgarzik) 25 分,来自 ossusermivami,4 小时前 | 隐藏 | 过去 | 收藏 | 3 条评论 augusto-moura 1小时前 | 下一条 [–] 同样相关的是,SQLite 支持将 CSV 文件加载为虚拟表 [1][1]:https://sqlite.org/csv.html 回复 mrbungie 50 分钟前 | 父级 | 下一条 [–] 别忘了 DuckDB [1].[1] https://duckdb.org/docs/stable/data/csv/overview.html 回复 smitty1e 2小时前 | 上一条 [–] 现在我必须开发一个象形文字扩展,这样我就可以“像埃及人一样用 Sqawk 了”。我先走了。 回复 指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请 YC | 联系方式 搜索:

原文

Crates.io Docs.rs MIT licensed

Sqawk is an SQL-based command-line tool for processing delimiter-separated files (CSV, TSV, etc.), inspired by the classic awk command. It loads data into in-memory tables, executes SQL queries against these tables, and writes the results back to the console or files.

sqawk -s "SELECT * FROM data" data.csv

This loads data.csv into an in-memory table called "data" and performs a SELECT query.

Filtering data with WHERE clause

sqawk -s "SELECT * FROM employees WHERE salary > 50000" employees.csv
sqawk -s "UPDATE data SET status = 'active' WHERE id = 5" data.csv --write

This updates the status field to 'active' for rows with id = 5 and saves the changes back to data.csv.

sqawk -s "DELETE FROM data WHERE id = 5" data.csv --write

This removes rows with id = 5 and saves the changes back to data.csv.

sqawk -s "UPDATE data SET status = 'inactive' WHERE last_login < '2023-01-01'" -s "DELETE FROM data WHERE status = 'inactive' AND last_login < '2022-01-01'" -s "SELECT * FROM data" data.csv --write

This executes multiple SQL statements in sequence: first marking recent inactive accounts, then removing very old inactive accounts, and finally showing the results.

sqawk -s "SELECT * FROM users" -s "SELECT * FROM orders" users.csv orders.csv

Finding unique values with DISTINCT

# Get unique values from a single column
sqawk -s "SELECT DISTINCT category FROM products" products.csv

# Get unique combinations of columns
sqawk -s "SELECT DISTINCT department, role FROM employees" employees.csv

# Use DISTINCT with ORDER BY for sorted unique values
sqawk -s "SELECT DISTINCT region FROM customers ORDER BY region" customers.csv

Join tables with INNER JOIN

# Join users and orders using INNER JOIN with ON condition
sqawk -s "SELECT users.name, orders.product_id, orders.date FROM users INNER JOIN orders ON users.id = orders.user_id" users.csv orders.csv

# Join with additional WHERE filtering
sqawk -s "SELECT users.name, orders.product_id, orders.date FROM users INNER JOIN orders ON users.id = orders.user_id WHERE orders.product_id > 100" users.csv orders.csv

# Using DISTINCT with JOINs to find unique customer-product pairs
sqawk -s "SELECT DISTINCT users.name, products.name FROM users INNER JOIN orders ON users.id = orders.user_id INNER JOIN products ON orders.product_id = products.product_id" users.csv orders.csv products.csv
# Process a tab-delimited file (TSV)
sqawk -F '\t' -s "SELECT * FROM employees WHERE salary > 70000" employees.tsv

# Process a colon-delimited file
sqawk -F ':' -s "SELECT id, name, email FROM contacts" contacts.txt
sqawk -s "SELECT * FROM data WHERE value > 100" data.csv -v
sqawk -s "DELETE FROM data WHERE status = 'expired'" data.csv --write

By default, sqawk doesn't modify input files. Use the --write flag to save changes back to the original files.

For more detailed information, see:

Licensed under the MIT License (LICENSE or http://opensource.org/licenses/MIT).

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you shall be licensed as MIT, without any additional terms or conditions.

联系我们 contact @ memedata.com